[ https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002576#comment-15002576 ]
Jason Altekruse commented on DRILL-4070:
----------------------------------------

I have confirmed that we behave properly for newly written files. The filter Parquet currently uses accepts only version numbers greater than 1.8. Our new "fork" is called 1.8.1-drill_r0; it is just a solid Maven release version of the current 1.8.2-SNAPSHOT, the tip of Parquet master. Files written with it are read appropriately and have their statistics respected.

I also confirmed that creating a set of auto-partitioned files with Drill 1.2, where the version number is not in the range the new Parquet changes accept as valid, causes 1.3 to fail at pruning.

I don't think any changes should be made to Drill to solve this issue. Unfortunately, externally created files could have bad statistics because of the previous bug, so if we made Drill behave differently we might cause incorrect results over files created by other tools.

I am checking whether the old files we wrote contain a unique version number, but it appears that they just contain "parquet-mr" with no version number. So unfortunately it doesn't look like we could modify Parquet to special-case the old Drill files by looking for our older version string.

I think we need to work on a separate migration utility that rewrites the footers in the cases where we know the files were produced with Drill.

> Metadata Caching : min/max values are null for varchar columns in auto partitioned data
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> git.commit.id.abbrev=e78e286
> The metadata cache file created contains incorrect values for min/max fields
> for varchar columns. The data is also partitioned on the varchar column.
> {code}
> refresh table metadata fewtypes_varcharpartition;
> {code}
> As a result, partition pruning is not happening. This was working after
> DRILL-3937 was fixed (d331330efd27dbb8922024c4a18c11e76a00016b).
> I attached the data set and the cache file.
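
To make the version gate described in the comment above concrete, here is a minimal sketch of how a reader might decide whether to trust a file's min/max statistics from its writer string. It is not Parquet's actual implementation: the function name `statistics_trusted`, the assumed writer-string layout ("parquet-mr version X.Y.Z..."), and the reading of "greater than 1.8" as "1.8.0 or later" are all assumptions made for illustration.

```python
import re

# Assumption: writers identify themselves as "parquet-mr version X.Y.Z[-suffix] ...".
_CREATED_BY = re.compile(r"^parquet-mr version (\d+)\.(\d+)\.(\d+)")

# Assumption: "greater than 1.8" is modelled here as "1.8.0 or later".
MIN_TRUSTED = (1, 8, 0)


def statistics_trusted(created_by, min_version=MIN_TRUSTED):
    """Return True if statistics from this writer would be used (hypothetical helper)."""
    if not created_by:
        return False
    match = _CREATED_BY.match(created_by)
    if match is None:
        # Covers the bare "parquet-mr" string reported for files written by older Drill.
        return False
    version = tuple(int(part) for part in match.groups())
    return version >= min_version


if __name__ == "__main__":
    for writer in ("parquet-mr version 1.8.1-drill_r0",
                   "parquet-mr",
                   "parquet-mr version 1.6.0"):
        print(writer, "->", statistics_trusted(writer))
```

Under these assumptions a file stamped 1.8.1-drill_r0 passes the gate, while a file carrying only "parquet-mr" cannot be told apart from any other unversioned writer. That is why the comment proposes rewriting footers with a migration utility rather than special-casing old Drill files.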