Ryan Blue created PARQUET-411:
---------------------------------

             Summary: Format: Add a flag when min/max are truncated
                 Key: PARQUET-411
                 URL: https://issues.apache.org/jira/browse/PARQUET-411
             Project: Parquet
          Issue Type: Bug
          Components: parquet-format
    Affects Versions: format-2.3.1
            Reporter: Ryan Blue


PARQUET-372 drops page and column chunk stats when values are larger than 4k to 
avoid storing very large values in page headers and the file footer. An 
alternative approach is to truncate the values, which would still allow 
filtering on page stats. The problem with truncation is that the value in 
stats may no longer be the true min or max, so engines that use these values 
as the result of aggregations like {{min(col)}} would return incorrect data. 
We should consider adding metadata that records when a value has been 
truncated, so that truncated values can still be used for filtering.
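
To make the idea concrete, here is a minimal sketch of how truncation with a flag might work. It is not the parquet-format API: the names {{TruncatedStat}}, {{is_exact}}, {{truncate_min}} and {{truncate_max}} are hypothetical, and it assumes values are compared as unsigned bytes in lexicographic order. A truncated min is a prefix of the real value, so it is still a valid lower bound; a truncated max has to be rounded up to stay a valid upper bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TruncatedStat:
    """Hypothetical stat holder; not a parquet-format structure."""
    value: bytes
    is_exact: bool  # False means `value` is only a bound, not the true min/max


def truncate_min(value: bytes, limit: int) -> TruncatedStat:
    """Return a lower bound for `value` that is at most `limit` bytes long."""
    if len(value) <= limit:
        return TruncatedStat(value, True)
    # A prefix always sorts <= the full value under byte-wise comparison.
    return TruncatedStat(value[:limit], False)


def truncate_max(value: bytes, limit: int) -> Optional[TruncatedStat]:
    """Return an upper bound for `value` of at most `limit` bytes, or None."""
    if len(value) <= limit:
        return TruncatedStat(value, True)
    prefix = bytearray(value[:limit])
    # Drop trailing 0xFF bytes, since they cannot be incremented.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # No shorter upper bound exists; the caller would drop the stat.
        return None
    # Incrementing the last byte makes the prefix sort above the full value.
    prefix[-1] += 1
    return TruncatedStat(bytes(prefix), False)


def usable_for_aggregation(stat: TruncatedStat) -> bool:
    """Only exact stats may be returned as the result of min(col)/max(col)."""
    return stat.is_exact
```

With a flag like this, a reader could keep using truncated values to skip pages during filtering, and fall back to scanning the data for {{min(col)}} or {{max(col)}} whenever the flag says the value is not exact.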



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
