emkornfield commented on PR #3098: URL: https://github.com/apache/parquet-java/pull/3098#issuecomment-2557642415
I can try to look in more detail, but stats can certainly be used here. I imagine they are most useful for repeated fields that mostly have 0 or 1 elements, when trying to filter on predicates such as "more than 0 elements" or "more than 1 element". E.g. if the histogram shows 0 observed rep_levels of 1, then one knows for sure that all lists are of length 0 or 1 (whether there are any lists of length 0 or 1 can be determined by inspecting the def level histogram). For larger-cardinality lists the filtering power diminishes significantly (it's hard to distinguish, based on histograms alone, between many very small lists and one very large one).
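
To make the check concrete, here is a rough sketch of the reasoning above. It is not parquet-java API: the class and method names are hypothetical, the histograms are passed as plain `long[]` arrays indexed by level, and it assumes a single-level list column (max repetition level 1) where `repeatedDefLevel` is the definition level at which a list element exists.

```java
// Hypothetical helper, not part of parquet-java. Histograms are indexed by level:
// histogram[i] is the number of values observed with level i.
class RepLevelHistogramCheck {

    // Assumes max repetition level 1. A rep level of 1 means "another element of
    // the current list", so a zero count means no list has more than one element.
    public static boolean allListsHaveAtMostOneElement(long[] repLevelHistogram) {
        return repLevelHistogram.length < 2 || repLevelHistogram[1] == 0;
    }

    // Any value recorded below repeatedDefLevel stands for a null or empty list
    // (or a null ancestor), i.e. a list of length 0.
    public static boolean mayContainEmptyOrNullList(long[] defLevelHistogram, int repeatedDefLevel) {
        for (int level = 0; level < repeatedDefLevel && level < defLevelHistogram.length; level++) {
            if (defLevelHistogram[level] > 0) {
                return true;
            }
        }
        return false;
    }

    // Upper bound only: the histogram cannot tell many short lists from one long
    // list, so a single list could have absorbed every continuation value.
    public static long maxPossibleListLength(long[] repLevelHistogram) {
        long continuations = repLevelHistogram.length < 2 ? 0 : repLevelHistogram[1];
        return continuations + 1;
    }
}
```

With a rep level histogram of `{4, 0}`, a predicate like "list has more than 1 element" can be ruled out. With `{2, 6}` the only safe statement is that no list is longer than 7, which is the loss of filtering power for larger lists mentioned above.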
