[I] [Java] Inconsistent Byte Order in Serialization Breaks Cross-Platform Compatibility [fory]

via GitHub Tue, 05 Aug 2025 02:29:10 -0700


LouisLou2 opened a new issue, #2440:
URL: https://github.com/apache/fory/issues/2440


   ### Search before asking
   
   - [x] I have searched the [issues](https://github.com/apache/fory/issues) 
and found no similar issues.
   
   
   ### Version
   
   latest commit
   
   ### Component(s)
   
   Java
   
   ### Minimal reproduce step
   
   The core of the logic is as follows:
   
   1.  **Platform-dependent constants are defined at class-load time.** The 
code checks the machine's native endianness and sets the shift constants 
(`HI_BYTE_SHIFT`, `LO_BYTE_SHIFT`) accordingly: 8 and 0 on a Big-Endian 
machine, 0 and 8 on a Little-Endian machine.
   
       ```java
       // StringUTF16.java
       static {
         if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
           HI_BYTE_SHIFT = 8; LO_BYTE_SHIFT = 0;
         } else {
           HI_BYTE_SHIFT = 0; LO_BYTE_SHIFT = 8;
         }
       }
       ```
   
   2.  **These constants are used to serialize multi-byte data.** For example, 
when writing a `char` (2 bytes):
   
       ```java
       // in StringSerializer::offHeapWriteCharsUTF16
       tmpArray[i]     = (byte) (c >> HI_BYTE_SHIFT);
       tmpArray[i + 1] = (byte) (c >> LO_BYTE_SHIFT);
       ```
   
   **Logical Consequence:**
   
   * On a Little-Endian machine, this logic assembles bytes in Little-Endian 
order (`[low_byte, high_byte]`).
   * On a Big-Endian machine, the exact same code assembles bytes in Big-Endian 
order (`[high_byte, low_byte]`).
   
   The serialized output is therefore inherently tied to the architecture of 
the machine that created it.
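
To make the consequence concrete, here is a minimal sketch that replays both branches of the static initializer on a single machine. The class `ShiftOrderDemo` and its `encode` method are hypothetical names for illustration, not part of Fory or the JDK; only the shift values and the two-byte write pattern come from the snippets above.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Fory or the JDK.
final class ShiftOrderDemo {

  // Mirrors the two-byte write pattern quoted in the issue, but picks the
  // shift constants from a flag instead of ByteOrder.nativeOrder().
  static byte[] encode(char c, boolean bigEndianHost) {
    int hiByteShift = bigEndianHost ? 8 : 0;
    int loByteShift = bigEndianHost ? 0 : 8;
    byte[] out = new byte[2];
    out[0] = (byte) (c >> hiByteShift);
    out[1] = (byte) (c >> loByteShift);
    return out;
  }

  public static void main(String[] args) {
    char c = (char) 0x1234;
    System.out.println("big-endian host:    " + Arrays.toString(encode(c, true)));
    System.out.println("little-endian host: " + Arrays.toString(encode(c, false)));
  }
}
```

For the value `0x1234`, the Big-Endian branch yields `[18, 52]` (`0x12, 0x34`) and the Little-Endian branch yields `[52, 18]`, so the same input produces two different byte sequences.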
   
   
   ### What did you expect to see?
   
   I expect the serialized byte stream for any given object to be identical, 
regardless of the host machine's endianness. A serialization framework must 
enforce a single, canonical byte order to ensure data portability.
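
As a rough sketch of what a canonical byte order could look like, the helper below always writes the low byte first, whatever the host is. It assumes Little-Endian is chosen as the canonical order, which this issue does not state; `CanonicalUtf16`, `writeChars`, and `readChars` are hypothetical names, not Fory's API.

```java
import java.util.Arrays;

// Hypothetical helper; not Fory's actual API.
final class CanonicalUtf16 {

  // Writes each char low byte first (assumed canonical order: Little-Endian).
  // The shifts are fixed, so the output does not depend on the host.
  static byte[] writeChars(char[] chars) {
    byte[] out = new byte[chars.length * 2];
    for (int i = 0; i < chars.length; i++) {
      out[2 * i] = (byte) chars[i];            // low byte
      out[2 * i + 1] = (byte) (chars[i] >> 8); // high byte
    }
    return out;
  }

  // Reads the same fixed layout back into chars.
  static char[] readChars(byte[] bytes) {
    char[] out = new char[bytes.length / 2];
    for (int i = 0; i < out.length; i++) {
      int lo = bytes[2 * i] & 0xFF;
      int hi = bytes[2 * i + 1] & 0xFF;
      out[i] = (char) ((hi << 8) | lo);
    }
    return out;
  }

  public static void main(String[] args) {
    char[] input = "Fory".toCharArray();
    byte[] encoded = writeChars(input);
    System.out.println(Arrays.toString(encoded));
    System.out.println(Arrays.equals(input, readChars(encoded)));
  }
}
```

Because neither method consults `ByteOrder.nativeOrder()`, a writer and a reader on different architectures would agree on the layout.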
   
   ### What did you see instead?
   
   Instead, the generated byte stream's endianness is coupled to the host 
machine's native architecture. Data serialized on a Little-Endian machine is in 
Little-Endian format, while the same data serialized on a Big-Endian machine is 
in Big-Endian format. This prevents cross-platform data exchange.
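
The exchange failure can be simulated without Big-Endian hardware by using `java.nio.ByteBuffer` with explicit byte orders. This sketch assumes the reader decodes in its own native order; the issue only shows the write path, so that is an assumption. `ByteOrderMismatchDemo` and `writeThenRead` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; simulates a writer and a reader whose byte orders differ.
final class ByteOrderMismatchDemo {

  // Encodes one char in the writer's order, then decodes it in the reader's order.
  static char writeThenRead(char c, ByteOrder writer, ByteOrder reader) {
    ByteBuffer buf = ByteBuffer.allocate(2).order(writer);
    buf.putChar(c);
    buf.flip();
    return buf.order(reader).getChar();
  }

  public static void main(String[] args) {
    char c = (char) 0x1234;
    char same = writeThenRead(c, ByteOrder.LITTLE_ENDIAN, ByteOrder.LITTLE_ENDIAN);
    char mixed = writeThenRead(c, ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN);
    System.out.println("same order:  0x" + Integer.toHexString(same));
    System.out.println("mixed order: 0x" + Integer.toHexString(mixed));
  }
}
```

With matching orders the value `0x1234` survives the round trip; with mismatched orders it is read back as `0x3412`, so string data would be silently corrupted rather than rejected.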
   
   ### Anything Else?
   
   Please note that this report is based on a logical analysis of the code; 
it has not been empirically tested on a physical Big-Endian machine. If 
there is any problem, please point it out in the comments.
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


