bryanck commented on code in PR #5168:
URL: https://github.com/apache/iceberg/pull/5168#discussion_r912547916
##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java:
##########
@@ -358,7 +358,8 @@ class FixedLengthDecimalReader extends BaseReader {
     protected void nextVal(
         FieldVector vector, int idx, ValuesAsBytesReader valuesReader, int typeWidth, byte[] byteArray) {
       valuesReader.getBuffer(typeWidth).get(byteArray, 0, typeWidth);
-      ((DecimalVector) vector).setBigEndian(idx, byteArray);
+      byte[] vectorBytes = DecimalVectorUtil.padBigEndianBytes(byteArray, DecimalVector.TYPE_WIDTH);
Review Comment:
One more thing to note: the benchmark isn't quite right. Decimal(20,5) takes
9 bytes, so it is stored as a fixed-length byte array rather than with int or
long encoding. Fixed-length byte arrays aren't dictionary encoded in Parquet
v1, which explains why the decimal benchmark is much slower than the other
data types (which are dictionary encoded).
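
For reference, a minimal sketch of the arithmetic behind the 9-byte figure and
of what padding to the vector width involves. The class and method names below
are hypothetical and are not the Iceberg or Arrow APIs. The precision
thresholds (up to 9 digits as int, up to 18 as long) are an assumption
consistent with the int/long/fixed split described above, and padBigEndian is
only an illustration of sign extension, not the actual body of
DecimalVectorUtil.padBigEndianBytes.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative sketch only: names here are hypothetical, not Iceberg or Arrow APIs.
final class DecimalWidthSketch {

  private DecimalWidthSketch() {}

  // Smallest number of bytes whose signed two's-complement range can hold
  // every unscaled value with the given number of decimal digits.
  static int minBytesForPrecision(int precision) {
    BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    // bitLength() excludes the sign bit, so add one bit, then round up to bytes.
    return (maxUnscaled.bitLength() + 1 + 7) / 8;
  }

  // Assumed mapping from precision to physical type: values that fit in
  // 4 or 8 bytes use int or long, everything else a fixed-length byte array.
  static String physicalType(int precision) {
    if (precision <= 9) {
      return "INT32";
    }
    if (precision <= 18) {
      return "INT64";
    }
    return "FIXED_LEN_BYTE_ARRAY(" + minBytesForPrecision(precision) + ")";
  }

  // Sign-extends a big-endian two's-complement value to newLength bytes,
  // e.g. from a 9-byte Parquet value to a 16-byte decimal slot.
  static byte[] padBigEndian(byte[] bigEndian, int newLength) {
    if (bigEndian.length > newLength) {
      throw new IllegalArgumentException(
          "Value of " + bigEndian.length + " bytes does not fit in " + newLength);
    }
    byte[] result = new byte[newLength];
    int offset = newLength - bigEndian.length;
    // A set high bit in the first byte means the value is negative,
    // so the leading padding must be 0xFF instead of 0x00.
    if (bigEndian.length > 0 && bigEndian[0] < 0) {
      Arrays.fill(result, 0, offset, (byte) 0xFF);
    }
    System.arraycopy(bigEndian, 0, result, offset, bigEndian.length);
    return result;
  }
}
```

With these definitions, minBytesForPrecision(20) yields 9 and
minBytesForPrecision(38) yields 16, so a 9-byte value has to be widened before
it can fill a 16-byte slot.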
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]