[
https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053534#comment-15053534
]
Rahul Challapalli commented on DRILL-4070:
------------------------------------------
I tested the migration tool written by Parth, so this issue can be closed.
> Files written with versions of Drill before v1.3 record metadata that is
> indistinguishable from bad metadata from other Parquet creators
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4070
> URL: https://issues.apache.org/jira/browse/DRILL-4070
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.3.0
> Reporter: Rahul Challapalli
> Assignee: Parth Chandra
> Priority: Blocker
> Fix For: 1.3.0
>
> Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> Drill uses the parquet-mr library to write Parquet files. The metadata
> signature that Drill 1.2 and earlier versions produced is
> indistinguishable from older footers written by other tools (such as Pig and
> Hive). A known bug in how those tools wrote metadata caused the
> statistics to be incorrect. To work around this, the parquet-mr library adopted
> a behavior of ignoring statistics from the old form of the Parquet footer.
> With 1.3, Drill upgraded to the latest version of parquet-mr and has now
> started ignoring these statistics as well. This ensures correct results but
> produces performance regressions relative to Drill 1.1 and 1.2 when querying
> partitioned Parquet files generated by those versions.
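
The trade-off described above can be sketched roughly as follows. This is an illustration only, not the actual parquet-mr or Drill code: the class name, the method names, and the footer key "example.writer.version" are all hypothetical, and the real rule that decides whether a footer is in the "old form" is not reproduced here. The sketch assumes only that a reader trusts min/max statistics when the footer carries some marker identifying a known-good writer, and otherwise falls back to scanning everything.

```java
import java.util.Collections;
import java.util.Map;

public class FooterStatsPolicy {
    // Hypothetical key that a newer writer (or a migration tool) would add
    // to the footer's key/value metadata. Not a real Drill or Parquet key.
    public static final String WRITER_MARKER_KEY = "example.writer.version";

    /** Statistics are trusted only when the footer carries the marker. */
    public static boolean trustStatistics(Map<String, String> footerKeyValues) {
        String marker = footerKeyValues.get(WRITER_MARKER_KEY);
        return marker != null && !marker.isEmpty();
    }

    /**
     * Decides whether a row group must be read for the predicate
     * "column == wanted", given the row group's recorded min and max.
     */
    public static boolean mustScan(Map<String, String> footerKeyValues,
                                   long min, long max, long wanted) {
        if (!trustStatistics(footerKeyValues)) {
            // Statistics ignored: always correct, but nothing is pruned.
            return true;
        }
        // Statistics trusted: skip row groups that cannot contain the value.
        return wanted >= min && wanted <= max;
    }

    public static void main(String[] args) {
        Map<String, String> oldFooter = Collections.emptyMap();
        Map<String, String> markedFooter =
                Collections.singletonMap(WRITER_MARKER_KEY, "1.3.0");

        // Same statistics, same predicate; only the footer marker differs.
        System.out.println(mustScan(oldFooter, 10, 20, 99));    // true
        System.out.println(mustScan(markedFooter, 10, 20, 99)); // false
    }
}
```

With an unmarked footer, every row group is read even when its recorded range excludes the requested value, which is where the performance regression comes from. A migration tool that rewrites footers would move old files from the first case to the second.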