[
https://issues.apache.org/jira/browse/CASSANDRA-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910671#comment-17910671
]
Dmitry Konstantinov edited comment on CASSANDRA-20190 at 1/7/25 3:57 PM:
-------------------------------------------------------------------------
MemoryUtil part:
||Class||Used methods||Usage||
|NativeClustering|MemoryUtil.setShort/getShort|stores metadata such as sizes and the null bitmap; the metadata is never exposed outside this class in raw (undecoded) form as part of any buffer or array|
|NativeDecoratedKey|MemoryUtil.setInt/getInt|stores the key length; it is never exposed outside this class in raw (undecoded) form as part of any buffer or array|
|NativeCell|MemoryUtil.setLong/getLong, MemoryUtil.setInt/getInt|stores metadata such as size, timestamp and deletion time; the metadata is never exposed outside this class in raw (undecoded) form as part of any buffer or array|
Based on source code analysis, none of these three classes exposes the written
values in raw form, so any byte order would work here; from a performance point
of view, native order makes the most sense.
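A minimal sketch of why any order is safe for these classes, assuming a java.nio.ByteBuffer stands in for the off-heap memory that MemoryUtil addresses directly (the class and method names below are hypothetical, not Cassandra code). A value written and read back in the same order always round-trips; only the raw byte layout differs, and that layout matters only if the bytes escape the class. Native order is simply the one that needs no byte swap on the current CPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderRoundTrip {
    // Write then read back in the same order, as a class does with its own private metadata.
    static int roundTrip(int value, ByteOrder order) {
        ByteBuffer mem = ByteBuffer.allocateDirect(Integer.BYTES).order(order);
        mem.putInt(0, value);
        return mem.getInt(0);
    }

    // The raw bytes another component would see if they escaped the class.
    static byte[] rawBytes(int value, ByteOrder order) {
        ByteBuffer mem = ByteBuffer.allocate(Integer.BYTES).order(order);
        mem.putInt(0, value);
        return mem.array();
    }

    public static void main(String[] args) {
        int length = 0x01020304;
        ByteOrder[] orders = {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN, ByteOrder.nativeOrder()};
        for (ByteOrder order : orders) {
            // The round trip succeeds for every order; only the first raw byte differs.
            System.out.println(order + ": round trip ok = " + (roundTrip(length, order) == length)
                    + ", first raw byte = " + rawBytes(length, order)[0]);
        }
    }
}
```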
So, do we agree that the best option is to introduce unconditional
NativeEndianUtil and BigEndianUtil classes and use NativeEndianUtil for the
MemoryUtil use cases above?
For CASSANDRA-20173 I will use BigEndianUtil within the NativeAccessor logic.
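A rough sketch of what the two unconditional utilities could look like. Only the class names NativeEndianUtil and BigEndianUtil come from the proposal; the method signatures are assumptions, and a byte[] accessed through java.lang.invoke VarHandle views stands in for the raw addresses (and Unsafe access) that MemoryUtil actually uses. The byte order is fixed when each VarHandle is created, so there is no per-call check of the platform order. Short accessors would follow the same pattern.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class EndianUtilSketch {
    // Hypothetical: always the platform's native order, for values that never leave their class.
    static final class NativeEndianUtil {
        private static final VarHandle INT =
                MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
        private static final VarHandle LONG =
                MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

        private NativeEndianUtil() {}

        static void setInt(byte[] mem, int offset, int value) { INT.set(mem, offset, value); }
        static int getInt(byte[] mem, int offset) { return (int) INT.get(mem, offset); }
        static void setLong(byte[] mem, int offset, long value) { LONG.set(mem, offset, value); }
        static long getLong(byte[] mem, int offset) { return (long) LONG.get(mem, offset); }
    }

    // Hypothetical: always big-endian, for bytes that are exposed in raw form.
    static final class BigEndianUtil {
        private static final VarHandle INT =
                MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
        private static final VarHandle LONG =
                MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

        private BigEndianUtil() {}

        static void setInt(byte[] mem, int offset, int value) { INT.set(mem, offset, value); }
        static int getInt(byte[] mem, int offset) { return (int) INT.get(mem, offset); }
        static void setLong(byte[] mem, int offset, long value) { LONG.set(mem, offset, value); }
        static long getLong(byte[] mem, int offset) { return (long) LONG.get(mem, offset); }
    }

    public static void main(String[] args) {
        byte[] mem = new byte[16];
        BigEndianUtil.setInt(mem, 0, 0x0A0B0C0D);
        System.out.println(mem[0]); // 10 on every platform: most significant byte first
        NativeEndianUtil.setLong(mem, 8, 42L);
        System.out.println(NativeEndianUtil.getLong(mem, 8)); // 42
    }
}
```

With this split, NativeCell, NativeClustering and NativeDecoratedKey would call only NativeEndianUtil, while code whose bytes are exposed in raw form (such as the NativeAccessor logic) would call BigEndianUtil.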
> MemoryUtil.setInt/getInt and similar use the wrong endianness
> -------------------------------------------------------------
>
> Key: CASSANDRA-20190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20190
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Branimir Lambov
> Priority: Normal
>
> `NativeCell`, `NativeClustering` and `NativeDecoratedKey` use the above
> methods from `MemoryUtil` to write data to and read data from native memory.
> As far as I can see, they are meant to write data in big-endian order. They
> do not: they always convert to little endian.
> Moreover, on big-endian machines they disagree with their `ByByte` versions
> (this is only likely to be an issue on architectures that require aligned
> access; x86 and ARM should be fine).
> The same is true for the methods in `Memory`, used by compression metadata as
> well as index summaries.
> We need to verify that this does not cause any problems, change the methods
> to behave as expected, and document the behaviour by explicitly using
> `ByteOrder.LITTLE_ENDIAN` for any data that may have been persisted on disk
> with the wrong endianness.
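A small illustration of the mismatch described in the issue, assuming a ByteBuffer in place of native memory. The helper names are hypothetical, and the code mimics the described pattern (swap the bytes on big-endian hosts before a native-order store) rather than reproducing MemoryUtil or Memory; it does not model the ByByte variants. Whatever the host, the first writer produces little-endian bytes where big-endian was expected, and data already written that way is read back safely by naming ByteOrder.LITTLE_ENDIAN explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class LegacyOrderSketch {
    // Mimics the described behaviour: on big-endian hosts the value is byte-swapped
    // before a native-order store, so memory always ends up little-endian.
    static byte[] writeAlwaysLittleEndian(int value) {
        int v = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ? Integer.reverseBytes(value) : value;
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.nativeOrder()).putInt(0, v).array();
    }

    // The layout the callers were apparently meant to get: big-endian (network order).
    static byte[] writeBigEndian(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.BIG_ENDIAN).putInt(0, value).array();
    }

    // Reading bytes that may already be persisted with the legacy layout:
    // name the order explicitly instead of relying on platform-dependent behaviour.
    static int readLegacy(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] legacy = writeAlwaysLittleEndian(value);
        System.out.println(Arrays.toString(legacy));                // [4, 3, 2, 1]
        System.out.println(Arrays.toString(writeBigEndian(value))); // [1, 2, 3, 4]
        System.out.println(readLegacy(legacy) == value);            // true
    }
}
```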
--
This message was sent by Atlassian Jira
(v8.20.10#820010)