[ https://issues.apache.org/jira/browse/CASSANDRA-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911199#comment-17911199 ]

Dmitry Konstantinov edited comment on CASSANDRA-20190 at 1/8/25 6:38 PM:
-------------------------------------------------------------------------

||Entity||Memory methods used||Serialize/deserialize logic||Notes||
|BloomFilter (OffHeapBitSet),
assuming the new format|setByte
getByte|OffHeapBitSet#serialize
{code:java}
DataOutputPlus.write(bytes, 0, bytes.size()); // -> list of native-order byte buffers -> copied as is
{code}
OffHeapBitSet#deserialize
{code:java}
FBUtilities.copy(in, new MemoryOutputStream(memory), byteCount); // read using byte[] chunks
{code}|Order agnostic|
|IndexSummary.offsets|getInt|IndexSummaryBuilder
{code:java}
offsets = new SafeMemoryWriter(4 * maxExpectedEntries).order(ByteOrder.LITTLE_ENDIAN);
// it was native before CASSANDRA-17723
...
offsets.writeInt((int) entries.length()); {code}
IndexSummary.IndexSummarySerializer#serialize
{code:java}
int offset = t.offsets.getInt(i * 4) + baseOffset;
// our serialization format for this file uses native byte order, so if this is different to the
// default Java serialization order (BIG_ENDIAN) we have to reverse our bytes
offset = Integer.reverseBytes(offset);
out.writeInt(offset); {code}
IndexSummary.IndexSummarySerializer#deserialize
{code:java}
FBUtilities.copy(in, new MemoryOutputStream(offsets), offsets.size()); {code}
copy file content as is to memory| * file format: LE (for offsets)
 * memory format: LE|
|IndexSummary.entries|getLong|IndexSummaryBuilder
{code:java}
entries = new SafeMemoryWriter(expectedEntrySize * maxExpectedEntries).order(ByteOrder.LITTLE_ENDIAN); // was native before CASSANDRA-17723
...
entries.writeLong(indexStart); {code}
IndexSummary.IndexSummarySerializer#serialize
out.write(t.entries, 0, t.entriesLength); -> list of native order byte buffers -> copy them as is| * file format: LE (for positions)
 * memory format: LE|
|CompressionMetadata|setLong
getLong|CompressionMetadata.Writer#doPrepare
{code}
out.writeLong(offsets.getLong(i * 8L)); 
{code}
CompressionMetadata#readChunkOffsets
{code}
offsets.setLong(i * 8L, input.readLong());{code}| * file format: BE
 * memory format: LE|
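
To make the IndexSummary rows above easier to follow, here is a minimal, self-contained sketch (not code from the tree; class and variable names are illustrative) of why reversing the bytes before DataOutput.writeInt produces a little-endian file layout that matches the little-endian off-heap memory:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianOffsetSketch
{
    public static void main(String[] args) throws IOException
    {
        int offset = 0x11223344;

        // DataOutput.writeInt always emits big-endian bytes, so reversing the
        // int first leaves little-endian bytes in the file.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file))
        {
            out.writeInt(Integer.reverseBytes(offset));
        }

        byte[] bytes = file.toByteArray();
        System.out.printf("bytes on disk: %02x %02x %02x %02x%n",
                          bytes[0] & 0xff, bytes[1] & 0xff, bytes[2] & 0xff, bytes[3] & 0xff);
        // prints "44 33 22 11", i.e. little-endian

        // Reading the same bytes back with LITTLE_ENDIAN order recovers the value,
        // matching the LE layout kept in the off-heap offsets memory.
        int readBack = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println("round-trips: " + (readBack == offset));
    }
}
{code}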
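
And a similar illustrative sketch for the CompressionMetadata row: the chunk-offset file stays big-endian (plain DataOutput/DataInput), while the off-heap copy is kept in native order, i.e. little-endian on x86/ARM. A direct ByteBuffer stands in for Memory here, and the names are again made up for the example:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ChunkOffsetsOrderSketch
{
    public static void main(String[] args) throws IOException
    {
        long chunkOffset = 0x0102030405060708L;

        // Writer side: DataOutput.writeLong emits big-endian bytes into the file.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file))
        {
            out.writeLong(chunkOffset);
        }

        // Reader side: readLong reassembles the big-endian value, and the off-heap
        // copy (a direct ByteBuffer standing in for Memory) keeps native order,
        // i.e. little-endian on x86/ARM, so the in-memory byte layout is flipped.
        long readBack;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray())))
        {
            readBack = in.readLong();
        }
        ByteBuffer offsets = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
        offsets.putLong(0, readBack);

        System.out.println("value round-trips: " + (offsets.getLong(0) == chunkOffset));
        System.out.printf("first byte in file  : %02x%n", file.toByteArray()[0] & 0xff); // 01 (big-endian)
        System.out.printf("first byte in memory: %02x%n", offsets.get(0) & 0xff);        // 08 on LE hardware
    }
}
{code}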


> MemoryUtil.setInt/getInt and similar use the wrong endianness
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-20190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20190
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Branimir Lambov
>            Priority: Normal
>
> `NativeCell`, `NativeClustering` and `NativeDecoratedKey` use the above 
> methods from `MemoryUtil` to write and read data from native memory. As far 
> as I can see they are meant to write data in big endian. They do not (they 
> always convert to little endian).
> Moreover, they disagree with their `ByByte` versions on big-endian machines 
> (which is likely only an issue on architectures that require aligned access; 
> x86 and arm should be fine).
> The same is true for the methods in `Memory`, used by compression metadata as 
> well as index summaries.
> We need to verify that this does not cause any problems, and to change the 
> methods to behave as expected and document the behaviour by explicitly using 
> `ByteOrder.LITTLE_ENDIAN` for any data that may have been persisted on disk 
> with the wrong endianness.
> The current MemoryUtil behaviour:
> ||Native order||MemoryUtil.setX||MemoryUtil.setXByByte||MemoryUtil.getX||MemoryUtil.getXByByte||
> |BE|LE|BE|LE|BE|
> |LE|LE|LE|LE|LE|
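
For completeness, a hypothetical probe (not part of the ticket) showing what a raw Unsafe.putInt leaves in native memory on a given platform: Unsafe writes in the platform's native byte order, which is the root of the setX vs setXByByte disagreement summarised in the table above.
{code:java}
import java.lang.reflect.Field;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

public class NativeOrderProbe
{
    public static void main(String[] args) throws Exception
    {
        // sun.misc.Unsafe has no public constructor; grab the singleton reflectively.
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        long address = unsafe.allocateMemory(4);
        try
        {
            // Unsafe.putInt writes in the platform's native byte order.
            unsafe.putInt(address, 0x11223344);

            StringBuilder layout = new StringBuilder();
            for (int i = 0; i < 4; i++)
                layout.append(String.format("%02x ", unsafe.getByte(address + i) & 0xff));

            System.out.println("native order : " + ByteOrder.nativeOrder());
            System.out.println("memory layout: " + layout); // "44 33 22 11" on x86/ARM (LE)
        }
        finally
        {
            unsafe.freeMemory(address);
        }
    }
}
{code}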


