kbendick commented on pull request #2069:
URL: https://github.com/apache/iceberg/pull/2069#issuecomment-761919569


   For ORC metrics, which currently don't populate `nanValueCounts` in the 
metrics and for which it seems difficult to populate `nanValueCounts`, will 
this still work when we create `Metrics` using the non-deprecated constructors 
that require passing `nanValueCounts`? This is in reference to this issue for 
removing the `Metrics` constructors that don't take in `nanValueCounts` maps: 
https://github.com/apache/iceberg/issues/2044
   
   I suppose when we go to deprecate the old `Metrics` constructors, we can 
simply pass `null` into the `Metrics` constructor for `nanValueCounts` with ORG 
in order to 
   
   Based off of my understanding of ORC's `ColumnStatistics` and the notes in 
this PR about `upperBound ` and `lowerBound` being `NaN` in ORC if _any_ value 
in the file is `NaN`. To me, I fail to see how we can get the count of NaN 
values in the following code block (though we can always continue to pass in 
`null` to match the existing behavior, provided the various metrics evaluators 
are aware that `null` for that Map entry for nullValueCounts doesnt convey much 
meaning for ORC).
   
   The relevant code for populating ORC metrics is here: 
https://github.com/apache/iceberg/blob/29915e3522f9a808420a2e16fb1c2cf3ae165a74/orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java#L100-L172


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to