Jungtaek Lim created SPARK-55131:
------------------------------------
Summary: The default delimiter of StringAppendOperator (merge
operator for RocksDB) conflicts when merge is used with non-existence value
Key: SPARK-55131
URL: https://issues.apache.org/jira/browse/SPARK-55131
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 4.2.0
Reporter: Jungtaek Lim
The default delimiter of StringAppendOperator (the merge operator for RocksDB)
causes a conflict when merge is applied to a key that has no existing value.
When there is an existing value, applying merge produces the following:
<size of the encoded value (4 bytes, big endian)><encoded
value><delimiter><...>{color:#FF0000}<delimiter><size of the encoded
value><encoded value>{color}
*red color refers to the newly added content
Reading the value is straightforward: read the size, read the encoded value
based on the size, skip one byte (for the delimiter), and loop.
When there is no existing value, applying merge produces the following:
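The read loop for this case can be sketched as follows. This is a minimal
model of the layout described above, not Spark's actual code; the names
`encode_entry` and `read_entries` are hypothetical.

```python
import struct
from typing import List

DELIMITER = b","  # 0x2C, the default delimiter named in this issue


def encode_entry(value: bytes) -> bytes:
    """Prefix the value with its size as a 4-byte big-endian signed int."""
    return struct.pack(">i", len(value)) + value


def read_entries(buf: bytes) -> List[bytes]:
    """Read size, read value, skip one delimiter byte, loop."""
    entries = []
    pos = 0
    while pos < len(buf):
        (size,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        entries.append(buf[pos:pos + size])
        pos += size
        pos += 1  # skip the one-byte delimiter (absent after the last entry)
    return entries


# Layout when a value already exists: entry, delimiter, entry, ...
merged = DELIMITER.join(encode_entry(v) for v in (b"a", b"b,c", b"def"))
entries = read_entries(merged)
```

Because every entry is length-prefixed, a delimiter byte inside a value
(as in `b"b,c"`) does not confuse this loop; the delimiter is only skipped,
never searched for.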
{color:#FF0000}<delimiter><size of the encoded value><encoded value>{color}
Reading the value is not the same as in the case above: we need to skip the
delimiter at the start before we can apply the same read logic.
Therefore, the delimiter must be a byte that can never be a valid first byte
of the size, so that a leading delimiter can be told apart from a size.
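To make the difference concrete, the sketch below models the no-existing-value
layout and shows what a reader sees if it does not skip the leading byte. It
assumes the layout exactly as described in this issue; `encode_entry` and
`merge_into_missing_key` are hypothetical helper names, not Spark or RocksDB
APIs.

```python
import struct

DELIMITER = b","  # 0x2C, the default delimiter named in this issue


def encode_entry(value: bytes) -> bytes:
    """Prefix the value with its size as a 4-byte big-endian signed int."""
    return struct.pack(">i", len(value)) + value


def merge_into_missing_key(operand: bytes) -> bytes:
    # Assumption taken from the description above: with no existing value,
    # the stored bytes still start with the delimiter.
    return DELIMITER + operand


buf = merge_into_missing_key(encode_entry(b"a"))

# Reading the size at offset 0 consumes the delimiter as the high byte.
(naive_size,) = struct.unpack_from(">i", buf, 0)

# Skipping the leading delimiter first yields the real size.
(real_size,) = struct.unpack_from(">i", buf, 1)
```

Here `naive_size` is 0x2C000000, a positive and therefore plausible size, so
the reader has no way to reject it from the bytes alone.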
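The default delimiter is "," (44, 0x2C), which does not satisfy this: 0x2C is
a valid first byte of a non-negative size. Since we do not allow a negative
size, any byte that would make the 4-byte size negative when read as its first
byte (that is, any byte with the high bit set, 0x80 to 0xFF) can be used as a
delimiter.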
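The following sketch checks which bytes qualify and shows a reader that
handles both layouts once such a delimiter is used. The choice of 0x80 is
purely illustrative; this issue does not state which byte is picked, and
`read_entries` is a hypothetical name rather than Spark's implementation.

```python
import struct
from typing import List


def makes_size_negative(first_byte: int) -> bool:
    """True if a 4-byte big-endian signed int starting with this byte is < 0."""
    (size,) = struct.unpack(">i", bytes([first_byte, 0, 0, 0]))
    return size < 0


candidates = [b for b in range(256) if makes_size_negative(b)]

DELIMITER = b"\x80"  # hypothetical choice; any byte in `candidates` would do


def encode_entry(value: bytes) -> bytes:
    return struct.pack(">i", len(value)) + value


def read_entries(buf: bytes) -> List[bytes]:
    # A valid size never starts with the delimiter byte, so a leading
    # delimiter is unambiguous and can be skipped safely.
    pos = 1 if buf[:1] == DELIMITER else 0
    entries = []
    while pos < len(buf):
        (size,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        entries.append(buf[pos:pos + size])
        pos += size + 1  # value bytes plus the one-byte delimiter
    return entries


a, bc = encode_entry(b"a"), encode_entry(b"bc")
with_existing = a + DELIMITER + bc                 # first value was put
without_existing = DELIMITER + a + DELIMITER + bc  # first value was merged
```

Both `with_existing` and `without_existing` decode to the same two entries,
and `makes_size_negative(0x2C)` is false, which is why the default "," cannot
serve this purpose.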