[jira] [Comment Edited] (CASSANDRA-20190) MemoryUtil.setInt/getInt and similar use the wrong endianness

Dmitry Konstantinov (Jira) Wed, 26 Mar 2025 06:13:30 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938608#comment-17938608 ]


Dmitry Konstantinov edited comment on CASSANDRA-20190 at 3/26/25 1:12 PM:
--------------------------------------------------------------------------

A possible approach is to add a heuristic check to
org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize.

Once offsets and entries are loaded into memory, we can read offsets.getInt(0)
(and/or entries.getLong(..)) and check whether it is suspiciously big, for example
with offsets.getInt(0) > Integer.reverseBytes(offsets.getInt(0)) or something similar,
and then throw an IOException.
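A minimal sketch of that heuristic, assuming the first offset is expected to be a small non-negative int. The class and method names (SummaryEndiannessCheck, looksByteSwapped, checkFirstOffset) are hypothetical and not Cassandra APIs. The comparison against Integer.reverseBytes is done as unsigned, so a legitimately small value whose reversed form happens to be negative (e.g. 128 reverses to 0x80000000) is not flagged:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper illustrating the proposed check; not part of Cassandra.
class SummaryEndiannessCheck {

    // True if the value is larger (as an unsigned int) than its byte-reversed form,
    // which is what a small number read with the wrong byte order looks like.
    static boolean looksByteSwapped(int value) {
        return Integer.compareUnsigned(value, Integer.reverseBytes(value)) > 0;
    }

    // Hypothetical hook to call after offsets are loaded; throwing lets the caller rebuild.
    static void checkFirstOffset(int firstOffset) throws IOException {
        if (looksByteSwapped(firstOffset))
            throw new IOException("Index summary offsets look byte-swapped, first offset = " + firstOffset);
    }

    public static void main(String[] args) {
        // A small offset stored big-endian, then read back in both byte orders.
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(0, 16);
        int correct = buf.getInt(0);                                 // 16
        int swapped = buf.order(ByteOrder.LITTLE_ENDIAN).getInt(0);  // 0x10000000
        System.out.println(looksByteSwapped(correct));  // false
        System.out.println(looksByteSwapped(swapped));  // true
        try {
            checkFirstOffset(swapped);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Being a heuristic, it can misfire on a genuinely large value (0x01000000 reverses to 1 and would be flagged), so it only makes sense for fields known to be small.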

We load summaries via the following call tree, if I read the code correctly:
 * org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#openComponents <-- it should rebuild the summary if it was not loaded
 * org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#loadSummary <-- it will catch the IOException and return null as the summary
 * org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#loadOrDeleteCorrupted <-- it will delete the summary file if we throw an IOException
 * org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#load
 * org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize

So it looks like this would be the cheapest way to do such detection.
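The call tree above can be modelled roughly as follows. This is a simplified sketch: the method names mirror the Cassandra ones, but every signature and body here is hypothetical, and it assumes loadOrDeleteCorrupted rethrows after deleting the file, which is what the annotations suggest:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified, hypothetical model of the summary-loading fallback chain.
class SummaryLoadingSketch {

    static final class Summary {
        final int firstOffset;
        final boolean rebuilt;
        Summary(int firstOffset, boolean rebuilt) { this.firstOffset = firstOffset; this.rebuilt = rebuilt; }
    }

    // ~ IndexSummarySerializer#deserialize: the proposed heuristic throws on byte-swapped data.
    static Summary deserialize(int firstOffset) throws IOException {
        if (Integer.compareUnsigned(firstOffset, Integer.reverseBytes(firstOffset)) > 0)
            throw new IOException("suspicious first offset: " + firstOffset);
        return new Summary(firstOffset, false);
    }

    // ~ IndexSummaryComponent#loadOrDeleteCorrupted: delete the file, then propagate (assumed).
    static Summary loadOrDeleteCorrupted(Path file, int firstOffset) throws IOException {
        try {
            return deserialize(firstOffset);
        } catch (IOException e) {
            Files.deleteIfExists(file);
            throw e;
        }
    }

    // ~ BigSSTableReaderLoadingBuilder#loadSummary: swallow the IOException and return null.
    static Summary loadSummary(Path file, int firstOffset) {
        try {
            return loadOrDeleteCorrupted(file, firstOffset);
        } catch (IOException e) {
            return null;
        }
    }

    // ~ BigSSTableReaderLoadingBuilder#openComponents: rebuild when nothing was loaded.
    static Summary openComponents(Path file, int firstOffset) {
        Summary loaded = loadSummary(file, firstOffset);
        return loaded != null ? loaded : rebuildSummary();
    }

    // Hypothetical stand-in for regenerating the summary.
    static Summary rebuildSummary() {
        return new Summary(16, true);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("Summary", ".db");
        Summary ok = openComponents(file, 16);
        System.out.println(ok.rebuilt + " " + Files.exists(file));   // false true
        Summary bad = openComponents(file, Integer.reverseBytes(16));
        System.out.println(bad.rebuilt + " " + Files.exists(file));  // true false
    }
}
```

The point of the design is that no new error path is needed: an IOException from deserialize already flows into the existing delete-and-rebuild handling.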

 


was (Author: dnk):
A possible approach is to add a heuristic check to
org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize.

Once offsets and entries are loaded into memory, we can read offsets.getInt(0)
(and/or entries.getLong(..)) and check whether it is suspiciously big, for example
with offsets.getInt(0) > Integer.reverseBytes(offsets.getInt(0)) or something similar,
and then throw an IOException.

We load summaries using the following call tree if I read the code correctly:
 * org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#openComponents <-- it should rebuild the summary
 * org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#loadSummary <-- it will delete the summary file
 * org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#loadOrDeleteCorrupted <-- it will catch the IOException
 * org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#load
 * org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize

So it looks like this would be the cheapest way to do such detection.

 

> MemoryUtil.setInt/getInt and similar use the wrong endianness
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-20190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20190
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Branimir Lambov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> `NativeCell`, `NativeClustering` and `NativeDecoratedKey` use the above 
> methods from `MemoryUtil` to write and read data from native memory. As far 
> as I can see, they are meant to write data in big endian. They do not (they
> always use little endian).
> Moreover, they disagree with their `ByByte` versions on big-endian machines
> (this is only likely to be an issue on aligned-access architectures; x86 and ARM
> should be fine).
> The same is true for the methods in `Memory`, used by compression metadata as 
> well as index summaries.
> We need to verify that this does not cause any problems, and to change the 
> methods to behave as expected and document the behaviour by explicitly using 
> `ByteOrder.LITTLE_ENDIAN` for any data that may have been persisted on disk 
> with the wrong endianness.
>  
> The current MemoryUtil behaviour (before the fix):
> ||Native order||MemoryUtil.setX||MemoryUtil.setXByByte||MemoryUtil.getX||MemoryUtil.getXByByte||
> |BE|LE|BE|LE|BE|
> |LE|LE|LE|LE|LE|
> In short: MemoryUtil.setX/getX is always LE, while MemoryUtil.setXByByte/getXByByte
> uses the native order.
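To make the table concrete, here is a sketch that models the four methods with java.nio.ByteBuffer rather than MemoryUtil's raw-address calls. The helper names are hypothetical, and the native order is passed in as a parameter so both rows can be simulated on any machine: setInt/getInt always use little endian, while the ByByte variants use the given native order. readPersistedInt illustrates the suggested direction of the fix, reading persisted data with an explicit ByteOrder.LITTLE_ENDIAN:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the pre-fix MemoryUtil behaviour; MemoryUtil itself works on raw addresses.
class EndiannessTableSketch {

    // setX/getX: always little endian, whatever the platform.
    static void setInt(ByteBuffer mem, int value) {
        mem.duplicate().order(ByteOrder.LITTLE_ENDIAN).putInt(0, value);
    }

    static int getInt(ByteBuffer mem) {
        return mem.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    // setXByByte/getXByByte: native order (a parameter here so either platform can be simulated).
    static void setIntByByte(ByteBuffer mem, int value, ByteOrder nativeOrder) {
        mem.duplicate().order(nativeOrder).putInt(0, value);
    }

    static int getIntByByte(ByteBuffer mem, ByteOrder nativeOrder) {
        return mem.duplicate().order(nativeOrder).getInt(0);
    }

    // Data already persisted via setX must be read with an explicit little-endian order.
    static int readPersistedInt(byte[] persisted) {
        return ByteBuffer.wrap(persisted).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    public static void main(String[] args) {
        System.out.println("this machine: " + ByteOrder.nativeOrder());
        int value = 0x01020304;
        for (ByteOrder simulated : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
            ByteBuffer mem = ByteBuffer.allocate(4);
            setInt(mem, value);
            // BIG_ENDIAN prints 0x04030201 (the BE-row mismatch); LITTLE_ENDIAN prints 0x01020304.
            System.out.printf("%s: setInt then getIntByByte -> 0x%08x%n", simulated, getIntByByte(mem, simulated));
        }
        System.out.printf("persisted -> 0x%08x%n", readPersistedInt(new byte[] {4, 3, 2, 1}));
    }
}
```

The real ByByte methods assemble values with shifts one byte at a time; using ByteBuffer's order setting gives the same byte layout and keeps the sketch short.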



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
