raunaqmorarka opened a new pull request, #216:
URL: https://github.com/apache/parquet-format/pull/216

   ### Jira
   
     - https://issues.apache.org/jira/browse/PARQUET-2352
   
   This updates the spec to allow truncation of row group min_value/max_value 
statistics so that readers can take advantage of row group pruning for 
predicates on columns containing long strings.
   https://issues.apache.org/jira/browse/PARQUET-1685 already added a feature 
to parquet-mr that allows users to deviate from the current spec and configure 
truncation of row group statistics.
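   A minimal sketch of why truncated statistics still support pruning, assuming one common truncation scheme (similar in spirit to the parquet-mr feature above, but not its actual code): the min is cut to a byte prefix, which never compares greater than the original, and the max is cut to a prefix whose last non-0xFF byte is incremented, so it still compares greater than every value in the row group. The helpers `truncate_min`, `truncate_max` and `can_skip_row_group` are hypothetical, and values are compared as unsigned byte strings.

```python
def truncate_min(value: bytes, length: int) -> bytes:
    # A prefix never compares greater than the full value, so it remains a valid lower bound.
    return value[:length]


def truncate_max(value: bytes, length: int) -> bytes:
    if len(value) <= length:
        return value
    prefix = bytearray(value[:length])
    # Increment the last byte that is not 0xFF, dropping trailing 0xFF bytes,
    # so the result compares strictly greater than the original value.
    while prefix:
        if prefix[-1] < 0xFF:
            prefix[-1] += 1
            return bytes(prefix)
        prefix.pop()
    # Every byte of the prefix was 0xFF: no shorter upper bound exists, keep the value as-is.
    return value


def can_skip_row_group(min_stat: bytes, max_stat: bytes, literal: bytes) -> bool:
    # For a predicate `col = literal`, the row group can be skipped when the literal lies
    # outside [min_stat, max_stat]. Truncated bounds only widen this interval, so a skip
    # decision based on them is still correct.
    return literal < min_stat or literal > max_stat


# Usage: a row group of long URLs whose statistics are truncated to 8 bytes.
lo = truncate_min(b"https://example.com/a/very/long/path/1", 8)
hi = truncate_max(b"https://example.com/a/very/long/path/9", 8)
print(lo, hi)                                  # b'https://' b'https:/0'
print(can_skip_row_group(lo, hi, b"ftp://x"))  # True
```

   The trade-off is precision for size: a truncated interval may keep a row group that exact statistics would have skipped, but it never skips one that contains a match.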
   
   Since truncation may already have been applied and cannot be explicitly 
detected, attempts to push down min/max aggregation to Parquet have avoided 
implementing it for string columns (e.g. 
https://issues.apache.org/jira/browse/SPARK-36645).
   Given the above situation, the spec should be updated to allow truncation of 
min/max row group stats. This would align the spec with the current reality that 
min/max row group stats for string columns may already be truncated.
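   A small sketch of why aggregate pushdown is unsafe once truncation is possible. The data and the helpers `max_via_scan`/`max_via_stats` are hypothetical, and the truncated statistic assumes a writer that keeps a 6-byte prefix and increments its last byte. A truncated max_value is only an upper bound, so answering MAX(col) from statistics can return a value that never appears in the column, and the reader has no way to tell.

```python
# Hypothetical row groups of a string column.
row_groups = [
    [b"apple", b"kiwi"],
    [b"lemon", b"watermelon salad"],
]


def max_via_scan(groups):
    """Reference answer: MAX(col) computed by reading every value."""
    return max(v for group in groups for v in group)


def max_via_stats(max_stats):
    """Aggregate pushdown: MAX(col) answered from per-row-group max statistics only."""
    return max(max_stats)


# Exact statistics, as the current spec requires.
exact_stats = [max(group) for group in row_groups]

# The same statistics as a truncating writer might store them: b"watermelon salad"
# cut to b"waterm" with the last byte incremented. Nothing records the truncation.
truncated_stats = [b"kiwi", b"watern"]

print(max_via_stats(exact_stats) == max_via_scan(row_groups))      # True
print(max_via_stats(truncated_stats) == max_via_scan(row_groups))  # False
print(max_via_stats(truncated_stats) >= max_via_scan(row_groups))  # True (still an upper bound)
```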
   


-- 
This is an automated message from the Apache Git Service.