yyanyy opened a new pull request #1963:
URL: https://github.com/apache/iceberg/pull/1963


   This change is a smaller PR broken down from #1935. 
   
   This change adds a field id to the constructors of the Avro primitive value 
writers, makes these writers track stats such as value count, min, and max, and 
exposes a `metrics` method that can be called to collect `FieldMetrics`. 
However, nothing calls this method yet. 
   This change doesn't include any tests; they will be added in the next PR, 
once end-to-end integration is set up. 
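   
   As a rough illustration of the idea (not the code in this PR), a leaf writer that takes a field id and tracks value count, min, and max could look like the sketch below. The names `SketchFieldMetrics` and `TrackingIntWriter` are hypothetical, and the real `FieldMetrics` class carries more than what is shown here.

```java
// Hypothetical, simplified stand-in for per-field metrics; not the real FieldMetrics class.
final class SketchFieldMetrics<T> {
  final int fieldId;
  final long valueCount;
  final T lowerBound; // null when no value was written
  final T upperBound; // null when no value was written

  SketchFieldMetrics(int fieldId, long valueCount, T lowerBound, T upperBound) {
    this.fieldId = fieldId;
    this.valueCount = valueCount;
    this.lowerBound = lowerBound;
    this.upperBound = upperBound;
  }
}

// Hypothetical leaf writer for int values that tracks stats as it writes.
final class TrackingIntWriter {
  private final int fieldId;
  private long valueCount = 0;
  private Integer min = null;
  private Integer max = null;

  TrackingIntWriter(int fieldId) {
    this.fieldId = fieldId;
  }

  void write(int value) {
    // A real writer would also encode the value to the Avro output here.
    valueCount++;
    if (min == null || value < min) {
      min = value;
    }
    if (max == null || value > max) {
      max = value;
    }
  }

  // Collects the stats gathered so far for this field.
  SketchFieldMetrics<Integer> metrics() {
    return new SketchFieldMetrics<>(fieldId, valueCount, min, max);
  }
}
```

   In this sketch the bounds stay in their original value type, and any truncation or conversion is left to whoever consumes the metrics.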
   
   Please note: regarding the change to the signature of `FieldMetrics`, the 
alternative would be to keep `ByteBuffer` as the return type for the lower/upper 
bounds of `FieldMetrics` and pass each field's metrics mode to its leaf value 
writer during construction, so that truncation and conversion to a byte buffer 
can happen when metrics are collected from these writers. I think it's doable, 
but it would touch a lot of method signatures, including adding the metrics mode 
to the constructor of every leaf writer and adding the metrics config to every 
datum writer (e.g. `DataWriter`, `GenericAppenderFactory`). It does, however, 
avoid computing min/max for fields that don't need them. Please let me know if 
you are interested, and I'll post a new commit to this PR so that the two 
implementations can be compared.  
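   
   To make the alternative more concrete, here is a minimal sketch of a leaf writer that receives a metrics mode at construction. Everything in it is hypothetical (the `SketchMode` enum, the `ModeAwareStringWriter` class, and its constructor arguments), and it simplifies truncation: only the lower bound is truncated, since truncating an upper bound would also need rounding up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical metrics modes, for illustration only.
enum SketchMode { NONE, COUNTS, TRUNCATE, FULL }

// Hypothetical leaf writer that knows its field's metrics mode up front.
final class ModeAwareStringWriter {
  private final int fieldId;
  private final SketchMode mode;
  private final int truncateLength; // only used with TRUNCATE
  private long valueCount = 0;
  private String min = null;
  private String max = null;

  ModeAwareStringWriter(int fieldId, SketchMode mode, int truncateLength) {
    this.fieldId = fieldId;
    this.mode = mode;
    this.truncateLength = truncateLength;
  }

  void write(String value) {
    // A real writer would also encode the value to the Avro output here.
    if (mode == SketchMode.NONE) {
      return; // no stats requested for this field
    }
    valueCount++;
    if (mode == SketchMode.COUNTS) {
      return; // skip min/max tracking when only counts are needed
    }
    if (min == null || value.compareTo(min) < 0) {
      min = value;
    }
    if (max == null || value.compareTo(max) > 0) {
      max = value;
    }
  }

  int fieldId() {
    return fieldId;
  }

  long valueCount() {
    return valueCount;
  }

  // Truncation and conversion to a byte buffer happen at collection time.
  ByteBuffer lowerBound() {
    if (min == null) {
      return null;
    }
    String bound = min;
    if (mode == SketchMode.TRUNCATE && bound.length() > truncateLength) {
      // Simplification: cuts by UTF-16 code units and may split a surrogate pair.
      bound = bound.substring(0, truncateLength);
    }
    return ByteBuffer.wrap(bound.getBytes(StandardCharsets.UTF_8));
  }

  // Simplification: the upper bound is returned untruncated in every mode.
  ByteBuffer upperBound() {
    return max == null ? null : ByteBuffer.wrap(max.getBytes(StandardCharsets.UTF_8));
  }
}
```

   The cost is visible even in this small sketch: the mode has to be threaded through every leaf writer's constructor, which is the signature churn mentioned above.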
   




