[GitHub] [iceberg] bryanck commented on a diff in pull request #5168: Arrow: Pad decimal bytes before passing to decimal vector

GitBox Sun, 03 Jul 2022 16:10:02 -0700


bryanck commented on code in PR #5168:
URL: https://github.com/apache/iceberg/pull/5168#discussion_r912547916



##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java:
##########
@@ -358,7 +358,8 @@ class FixedLengthDecimalReader extends BaseReader {
     protected void nextVal(
         FieldVector vector, int idx, ValuesAsBytesReader valuesReader, int typeWidth, byte[] byteArray) {
       valuesReader.getBuffer(typeWidth).get(byteArray, 0, typeWidth);
-      ((DecimalVector) vector).setBigEndian(idx, byteArray);
+      byte[] vectorBytes = DecimalVectorUtil.padBigEndianBytes(byteArray, DecimalVector.TYPE_WIDTH);

Review Comment:
   One more thing to note: the benchmark isn't quite right. Decimal(20,5)
needs 9 bytes, so it is stored as a fixed-length byte array rather than with
int or long encoding. Fixed-length byte arrays aren't dictionary encoded in
Parquet v1, which explains why the decimal benchmark is much slower than the
other data types (which are dictionary encoded).
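
   To make the 9-byte figure concrete, here is a minimal sketch of the
arithmetic. The class and method names (DecimalWidth, requiredBytes,
physicalType) are hypothetical and not part of the PR or of Iceberg. The
precision thresholds (up to 9 digits for int, up to 18 for long) follow the
Parquet decimal rules; the sketch only illustrates the type choice and is not
the code the writer actually runs.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only.
class DecimalWidth {
  // Smallest number of bytes whose signed two's-complement range
  // can hold the largest unscaled value, 10^precision - 1.
  static int requiredBytes(int precision) {
    BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    // bitLength() excludes the sign bit, so add one, then round up to whole bytes.
    return (maxUnscaled.bitLength() + 1 + 7) / 8;
  }

  // Physical type a decimal of the given precision maps to, assuming the
  // usual rule: int for <= 9 digits, long for <= 18, fixed-length bytes otherwise.
  static String physicalType(int precision) {
    if (precision <= 9) {
      return "INT32";
    }
    if (precision <= 18) {
      return "INT64";
    }
    return "FIXED_LEN_BYTE_ARRAY(" + requiredBytes(precision) + ")";
  }

  public static void main(String[] args) {
    for (int precision : new int[] {9, 18, 20, 38}) {
      System.out.println("precision " + precision + " -> " + physicalType(precision));
    }
  }
}
```

   With precision 20 this yields FIXED_LEN_BYTE_ARRAY(9). A benchmark meant
to exercise the int or long decimal paths would need a precision of 18 or
less.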



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


