steveloughran commented on code in PR #3481:
URL: https://github.com/apache/parquet-java/pull/3481#discussion_r3241591960
##########
parquet-variant/src/main/java/org/apache/parquet/variant/VariantUtil.java:
##########
@@ -299,6 +299,31 @@ static int readUnsigned(ByteBuffer bytes, int pos, int
numBytes) {
return result;
}
+ /**
+ * Fast little-endian unsigned read using bulk ByteBuffer operations.
+ * Requires the buffer to have {@link java.nio.ByteOrder#LITTLE_ENDIAN} byte
order.
+ * Adapted from Apache Iceberg's VariantUtil.readLittleEndianUnsigned.
+ */
+ static int readUnsignedLittleEndian(ByteBuffer buffer, int pos, int
numBytes) {
+ switch (numBytes) {
+ case 1:
+ return buffer.get(pos) & U8_MAX;
+ case 2:
+ return buffer.getShort(pos) & U16_MAX;
+ case 3:
+ return (buffer.getShort(pos) & U16_MAX) | ((buffer.get(pos + 2) &
U8_MAX) << 16);
+ case 4:
+ int v = buffer.getInt(pos);
+ if (v < 0) {
Review Comment:
maybe use `org.apache.parquet.Preconditions.checkArgument()` here
##########
parquet-variant/src/main/java/org/apache/parquet/variant/Variant.java:
##########
@@ -56,17 +81,35 @@ public Variant(byte[] value, int valuePos, int valueLength,
byte[] metadata, int
}
public Variant(ByteBuffer value, ByteBuffer metadata) {
- // The buffers are read a single-byte at a time, so the endianness of the
input buffers
- // is not important.
- this.value = value.asReadOnlyBuffer();
- this.metadata = metadata.asReadOnlyBuffer();
+ this.value = value.asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
+ this.metadata = metadata.asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
// There is currently only one allowed version.
if ((metadata.get(metadata.position()) & VariantUtil.VERSION_MASK) !=
VariantUtil.VERSION) {
throw new UnsupportedOperationException(String.format(
"Unsupported variant metadata version: %d",
metadata.get(metadata.position()) & VariantUtil.VERSION_MASK));
}
+
+ // Pre-compute dictionary size for lazy metadata cache allocation.
+ int pos = this.metadata.position();
+ int metaOffsetSize = ((this.metadata.get(pos) >> 6) & 0x3) + 1;
+ if (this.metadata.remaining() > 1) {
+ this.dictSize = VariantUtil.readUnsignedLittleEndian(this.metadata, pos
+ 1, metaOffsetSize);
Review Comment:
see #3562 ; dictSize must be <= variant metadata size
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]