[jira] [Updated] (SPARK-55131) The default delimiter of StringAppendOperator (merge operator for RocksDB) conflicts when merge is used with non-existence value

Jungtaek Lim (Jira) Thu, 22 Jan 2026 04:46:35 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-55131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jungtaek Lim updated SPARK-55131:
---------------------------------
    Description: 
The default delimit of StringAppendOperator (merge operator for RocksDB) 
conflicts when merge is used with non-existence value.

When there is an existing value, applying merge would be following:

<size of the encoded value (4 bytes, big endian)><encoded 
value><delimiter><...>{color:#ff0000}<delimiter><size of the encoded 
value><encoded value>{color}

*red color refers to the newly added content

Reading the value would be straightforward, read the size, read the encoded 
value based on the size, skip one byte (for delimiter), loop.

When there is no existing value, applying merge would be following:

{color:#ff0000}<delimiter><size of the encoded value><encoded value>{color}

Reading the value would not be the same with the case in above - we need to 
skip reading delimiter at the start to apply the same read logic.

-That said, we need to ensure delimiter to be an "invalid value" of the first 
byte of the size.-

-The value of default delimit is "," (44, 0x2C) which does not satisfy it. 
Since we do not allow negative value for the size, any value making the size (4 
bytes) to be negative can be used as a delimiter.-

^ The above is dependent on the endianness and we tend to use sun's Unsafe 
module for putting some value into byte array & UnsafeRow which follows 
system's default endianness. The above is only safe when the system uses big 
endian, which is guaranteed by JVM but that's abstracted within JVM and Unsafe 
module breaks the abstraction. Making it worse, most common systems use little 
endian.

Alternative is to ensure the delimiter to be available in the first byte of the 
value, regardless of whether the value is written from put vs merge.

  was:
The default delimit of StringAppendOperator (merge operator for RocksDB) 
conflicts when merge is used with non-existence value.

When there is an existing value, applying merge would be following:

<size of the encoded value (4 bytes, big endian)><encoded 
value><delimiter><...>{color:#ff0000}<delimiter><size of the encoded 
value><encoded value>{color}

*red color refers to the newly added content

Reading the value would be straightforward, read the size, read the encoded 
value based on the size, skip one byte (for delimiter), loop.

When there is no existing value, applying merge would be following:

{color:#ff0000}<delimiter><size of the encoded value><encoded value>{color}

Reading the value would not be the same with the case in above - we need to 
skip reading delimiter at the start to apply the same read logic.

-That said, we need to ensure delimiter to be an "invalid value" of the first 
byte of the size.-

-The value of default delimit is "," (44, 0x2C) which does not satisfy it. 
Since we do not allow negative value for the size, any value making the size (4 
bytes) to be negative can be used as a delimiter.-

^ The above is dependent on the endianness and we tend to use sun's Unsafe 
module for putting some value into byte array & UnsafeRow which follows 
system's default endianness. The above is only safe when the system uses big 
endian, which is guaranteed by JVM but that's abstracted within JVM and Unsafe 
module breaks the abstraction.

Alternative is to ensure the delimiter to be available in the first byte of the 
value, regardless of whether the value is written from put vs merge.


> The default delimiter of StringAppendOperator (merge operator for RocksDB) 
> conflicts when merge is used with non-existence value
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55131
>                 URL: https://issues.apache.org/jira/browse/SPARK-55131
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> The default delimit of StringAppendOperator (merge operator for RocksDB) 
> conflicts when merge is used with non-existence value.
> When there is an existing value, applying merge would be following:
> <size of the encoded value (4 bytes, big endian)><encoded 
> value><delimiter><...>{color:#ff0000}<delimiter><size of the encoded 
> value><encoded value>{color}
> *red color refers to the newly added content
> Reading the value would be straightforward, read the size, read the encoded 
> value based on the size, skip one byte (for delimiter), loop.
> When there is no existing value, applying merge would be following:
> {color:#ff0000}<delimiter><size of the encoded value><encoded value>{color}
> Reading the value would not be the same with the case in above - we need to 
> skip reading delimiter at the start to apply the same read logic.
> -That said, we need to ensure delimiter to be an "invalid value" of the first 
> byte of the size.-
> -The value of default delimit is "," (44, 0x2C) which does not satisfy it. 
> Since we do not allow negative value for the size, any value making the size 
> (4 bytes) to be negative can be used as a delimiter.-
> ^ The above is dependent on the endianness and we tend to use sun's Unsafe 
> module for putting some value into byte array & UnsafeRow which follows 
> system's default endianness. The above is only safe when the system uses big 
> endian, which is guaranteed by JVM but that's abstracted within JVM and 
> Unsafe module breaks the abstraction. Making it worse, most common systems 
> use little endian.
> Alternative is to ensure the delimiter to be available in the first byte of 
> the value, regardless of whether the value is written from put vs merge.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-55131) The default delimiter of StringAppendOperator (merge operator for RocksDB) conflicts when merge is used with non-existence value

Reply via email to