felipecrv commented on code in PR #41299:
URL: https://github.com/apache/arrow/pull/41299#discussion_r1583502818
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,17 @@ Variable shape tensor
Values inside each **data** tensor element are stored in
row-major/C-contiguous
order according to the corresponding **shape**.
+UUID
+====
+
+* Extension name: `arrow.uuid`.
+
+* The storage type of the extension is ``FixedSizeBinary`` with a length of 16
bytes.
+
+.. note::
+ A specific UUID version is not required or guaranteed. This extension
represents
+ UUIDs as FixedSizeBinary(16) and does not interpret the bytes in any way.
Review Comment:
You should specify which byte (0 or 15) is the most-significant byte.
For instance, Java treats the 0-th byte as the MSB [1], which is consistent
with the convention that big-endian is the network byte order [2]:
```java
private UUID(byte[] data) {
    long msb = 0;
    long lsb = 0;
    assert data.length == 16 : "data must be 16 bytes in length";
    for (int i=0; i<8; i++)
        msb = (msb << 8) | (data[i] & 0xff);
    for (int i=8; i<16; i++)
        lsb = (lsb << 8) | (data[i] & 0xff);
    this.mostSigBits = msb;
    this.leastSigBits = lsb;
}
```
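Since that constructor is private, here is a minimal sketch of the same byte-0-is-MSB interpretation using only public APIs. The class name `UuidBytes` and its methods are hypothetical, made up for illustration; the sketch relies on `ByteBuffer` being big-endian by default.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not part of the JDK or Arrow.
final class UuidBytes {
    // Treat data[0] as the most-significant byte, as the JDK constructor does.
    static UUID fromBytes(byte[] data) {
        if (data.length != 16) {
            throw new IllegalArgumentException("data must be 16 bytes in length");
        }
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian unless order() is changed
        long msb = buf.getLong();               // bytes 0..7
        long lsb = buf.getLong();               // bytes 8..15
        return new UUID(msb, lsb);
    }

    // Inverse: write the most-significant bits first, so byte 0 is the MSB.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < 16; i++) {
            data[i] = (byte) i;
        }
        // prints 00010203-0405-0607-0809-0a0b0c0d0e0f
        System.out.println(fromBytes(data));
    }
}
```

With this layout the textual form of the UUID lists the stored bytes in order, which is what makes the convention easy to document.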
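When porting this code to C/C++, be careful: Java's byte-order conventions
are big-endian (`ByteBuffer` and `DataInput` default to it), unlike most
architectures we use today, which are little-endian. The shift-based loops
above give the same result on any host, but loading the 16 bytes directly
into two 64-bit integers (e.g. with `memcpy`) does not.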
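To make the porting caveat concrete, this sketch reads the same 8 bytes under both byte orders. The class `EndianDemo` and its method names are hypothetical; the little-endian read stands in for what a raw 64-bit load would produce on a little-endian host.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, for illustration only.
final class EndianDemo {
    // Byte at `offset` becomes the most-significant byte of the result.
    static long readBigEndian(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 8).order(ByteOrder.BIG_ENDIAN).getLong();
    }

    // Byte at `offset` becomes the least-significant byte of the result,
    // as a plain 64-bit load would do on a little-endian machine.
    static long readLittleEndian(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        long big = readBigEndian(data, 0);
        long little = readLittleEndian(data, 0);
        System.out.println(String.format("%016x", big));    // 0001020304050607
        System.out.println(String.format("%016x", little)); // 0706050403020100
        // Swapping the bytes of one reading recovers the other.
        System.out.println(Long.reverseBytes(little) == big); // true
    }
}
```

A port that loads the halves with a native 64-bit read would therefore need a byte swap on little-endian hosts to agree with the Java interpretation.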
An advantage of putting the MSB at byte 0 is that when you parse a UUID
string, you read the string from the MSB to the LSB and write the UUID data
from byte 0 to byte 15.
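A rough sketch of that parsing property, assuming the canonical 8-4-4-4-12 hex form. The class `UuidParse` and its `parse` method are hypothetical names, and the validation is deliberately minimal: hyphens are simply stripped rather than checked for position.

```java
// Hypothetical parser, for illustration only.
final class UuidParse {
    // Reads hex digits left to right (MSB first) and writes bytes 0..15 in order.
    static byte[] parse(String s) {
        String hex = s.replace("-", "");
        if (hex.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits");
        }
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = parse("00112233-4455-6677-8899-aabbccddeeff");
        System.out.println(bytes[0] & 0xff);  // 0
        System.out.println(bytes[15] & 0xff); // 255
    }
}
```

No reordering or buffering is needed: the i-th pair of hex digits lands in byte i.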
[1]
https://github.com/openjdk/jdk/blob/819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5/src/java.base/share/classes/java/util/UUID.java#L116-L126
[2]
https://stackoverflow.com/questions/13514614/why-is-network-byte-order-defined-to-be-big-endian