[GitHub] [iceberg] yyanyy commented on a change in pull request #1946: Avro metrics support: create MetricsAwareDatumWriter and some refactors for Avro

GitBox Fri, 18 Dec 2020 12:54:39 -0800


yyanyy commented on a change in pull request #1946:
URL: https://github.com/apache/iceberg/pull/1946#discussion_r546085784




##########
File path: api/src/main/java/org/apache/iceberg/FieldMetrics.java
##########
@@ -30,15 +28,15 @@
   private final long valueCount;
   private final long nullValueCount;
   private final long nanValueCount;
-  private final ByteBuffer lowerBound;
-  private final ByteBuffer upperBound;
+  private final Object lowerBound;
+  private final Object upperBound;

Review comment:
       I think if we convert this to `ByteBuffer` now we may still need to 
check type when doing truncation (based on metrics mode), and I think string 
with non-unicode characters will not vend the same result if truncated by 
`BinaryUtil.truncateBinary`, so we will either convert the byte buffer back to 
char sequence and use `UnicodeUtil.truncateString` or create a new 
`BinaryUtil.truncateString`. Whereas if we do conversion later when evaluating 
metrics, I think the [code needed for the conversion 
itself](https://github.com/apache/iceberg/pull/1935/files#diff-32411391d7c2962fe8c090ca56f1e9f63cb9bd473f0e55a9a1581863e9542deaR109)
 isn't that bad since we know the type of the field, and that's the reason for 
me to do this change. 
   
   But one thing that may worth noting is that for the current approach, in 
order ensure the `Conversions.toByteBuffer` could work, for certain writers I 
have to make sure the min/max from the value writers return the type that 
`Conversions.toByteBuffer` knows how to translate, if the data type in `write` 
is not of that type (that is, usage of [this 
method](https://github.com/apache/iceberg/pull/1935/files#diff-6ccdc1daa757d215170dc57a4b3d4f31a62c8b2993ae2289511e97b2df52ef02R641-R652)).
 I think we still need to maintain a similar function for translation in each 
value writer if we return bytebuffer for bounds in field metrics, but it will 
directly translate input data type to byte buffer instead of doing two hops, 
and that might be easier to understand. 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] yyanyy commented on a change in pull request #1946: Avro metrics support: create MetricsAwareDatumWriter and some refactors for Avro

Reply via email to