[ 
https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021629#comment-15021629
 ] 

ASF GitHub Bot commented on DRILL-4070:
---------------------------------------

GitHub user jaltekruse opened a pull request:

    https://github.com/apache/drill/pull/278

    DRILL-4070: Add note about parquet file migration in 1.3

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaltekruse/incubator-drill gh-pages

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #278
    
----
commit 452426579980151535c20e0f61b02fd60207172d
Author: Jason Altekruse <[email protected]>
Date:   2015-11-23T07:00:47Z

    Add note about parquet file migration in 1.3

----


> Files written with versions of Drill before v1.3 record metadata that is 
> indistinguishable from bad metadata from other Parquet creators
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 1.3.0
>
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> Drill uses the parquet-mr library to write Parquet files. The metadata 
> signature produced by Drill 1.2 and earlier is 
> indistinguishable from older footers written by other tools (such as Pig and 
> Hive). There was a known bug when those tools wrote metadata that caused the 
> statistics to be incorrect. To correct this, the parquet-mr library adopted a 
> behavior of ignoring statistics from the old form of the Parquet footer. 
> With 1.3, Drill upgraded to the latest version of parquet-mr and has now 
> started ignoring these statistics as well. This ensures correct results but 
> produces performance regressions (compared to Drill 1.1 and 1.2) when querying 
> against partitioned Parquet files generated in Drill 1.1 and 1.2. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
