felipecrv commented on code in PR #41299:
URL: https://github.com/apache/arrow/pull/41299#discussion_r1583502818


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,17 @@ Variable shape tensor
    Values inside each **data** tensor element are stored in row-major/C-contiguous
    order according to the corresponding **shape**.
 
+UUID
+====
+
+* Extension name: `arrow.uuid`.
+
+* The storage type of the extension is ``FixedSizeBinary`` with a length of 16 bytes.
+
+.. note::
+   A specific UUID version is not required or guaranteed. This extension represents
+   UUIDs as FixedSizeBinary(16) and does not interpret the bytes in any way.

Review Comment:
   You should specify which byte (0 or 15) is the most-significant byte.
   
   For instance, Java treats the 0-th byte as the MSB [1], which is consistent with the convention that network byte order is big-endian [2]:
   
   ```java
       private UUID(byte[] data) {
           long msb = 0;
           long lsb = 0;
           assert data.length == 16 : "data must be 16 bytes in length";
           for (int i=0; i<8; i++)
               msb = (msb << 8) | (data[i] & 0xff);
           for (int i=8; i<16; i++)
               lsb = (lsb << 8) | (data[i] & 0xff);
           this.mostSigBits = msb;
           this.leastSigBits = lsb;
       }
   ```
   
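   When porting this code to C/C++, be careful: Java's binary I/O conventions (`DataInputStream`, the default `ByteBuffer` order) are big-endian, unlike most architectures we use today, which are little-endian. The shift-based loop above is endianness-independent, but copying the 8 bytes directly into a 64-bit integer is not.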
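
   To illustrate the porting caveat, here is a minimal sketch comparing the shift-based loop with a read that uses an explicit byte order. The class `UuidByteOrderSketch` and its method names are hypothetical and exist only for this example; `ByteBuffer` with `LITTLE_ENDIAN` order stands in for a raw 64-bit load on a little-endian CPU.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;

   // Hypothetical helper class for illustration; not part of Arrow or the JDK.
   class UuidByteOrderSketch {
       // Same shift-and-or approach as the JDK constructor quoted above:
       // the byte at index `from` ends up as the most significant byte.
       static long readBigEndian(byte[] data, int from) {
           long value = 0;
           for (int i = from; i < from + 8; i++) {
               value = (value << 8) | (data[i] & 0xff);
           }
           return value;
       }

       // Reads the same 8 bytes with an explicit byte order. With LITTLE_ENDIAN
       // this approximates a raw 64-bit load on a little-endian machine.
       static long readWithOrder(byte[] data, int from, ByteOrder order) {
           return ByteBuffer.wrap(data).order(order).getLong(from);
       }

       public static void main(String[] args) {
           byte[] data = new byte[16];
           for (int i = 0; i < 16; i++) {
               data[i] = (byte) i; // bytes 00 01 02 ... 0f
           }
           System.out.printf("shift loop:    %016x%n", readBigEndian(data, 0));
           System.out.printf("BIG_ENDIAN:    %016x%n",
                   readWithOrder(data, 0, ByteOrder.BIG_ENDIAN));
           System.out.printf("LITTLE_ENDIAN: %016x%n",
                   readWithOrder(data, 0, ByteOrder.LITTLE_ENDIAN));
       }
   }
   ```

   Run as written, the first two lines print `0001020304050607` and the third prints `0706050403020100`, so only the little-endian raw read disagrees with the MSB-at-byte-0 convention.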
   
   An advantage of putting the MSB at byte 0 is that when you parse a UUID string, you read the string from the MSB to the LSB and write the UUID data from byte 0 to byte 15.
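
   As a rough sketch of that parsing order, the hypothetical `UuidParseSketch.parse` below walks the string left to right and fills a 16-byte array from index 0 upward. It is deliberately lenient (hyphens are skipped wherever they appear) and is not how Arrow or the JDK parses UUIDs.

   ```java
   // Hypothetical helper class for illustration; not part of Arrow or the JDK.
   class UuidParseSketch {
       // Parses a hex UUID string into 16 bytes with the MSB at index 0.
       // Assumption: hyphens are simply skipped, so their positions are not validated.
       static byte[] parse(String text) {
           byte[] out = new byte[16];
           int nibbles = 0; // number of hex digits consumed so far
           for (int i = 0; i < text.length(); i++) {
               char c = text.charAt(i);
               if (c == '-') {
                   continue;
               }
               int digit = Character.digit(c, 16);
               if (digit < 0 || nibbles == 32) {
                   throw new IllegalArgumentException("not a UUID: " + text);
               }
               if ((nibbles & 1) == 0) {
                   out[nibbles / 2] = (byte) (digit << 4); // high nibble comes first
               } else {
                   out[nibbles / 2] |= (byte) digit;       // then the low nibble
               }
               nibbles++;
           }
           if (nibbles != 32) {
               throw new IllegalArgumentException("not a UUID: " + text);
           }
           return out;
       }
   }
   ```

   With this layout, the first hex pair of the string lands in byte 0 and the last pair in byte 15, so reading bytes 0 to 7 with the shift loop from the Java snippet above reproduces what `java.util.UUID.fromString(...).getMostSignificantBits()` returns for the same string.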
   
   [1] https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/java.base/share/classes/java/util/UUID.java#L116-L126
   [2] https://stackoverflow.com/questions/13514614/why-is-network-byte-order-defined-to-be-big-endian



