[jira] [Commented] (DRILL-4028) Merge Drill parquet modifications back into the mainline project

ASF GitHub Bot (JIRA) Tue, 03 Nov 2015 18:14:57 -0800

    [ https://issues.apache.org/jira/browse/DRILL-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988742#comment-14988742 ]


ASF GitHub Bot commented on DRILL-4028:
---------------------------------------

GitHub user jaltekruse opened a pull request:

    https://github.com/apache/drill/pull/236

    DRILL-4028: Get off parquet fork

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaltekruse/incubator-drill parquet-update-squash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #236
    
----
commit afb72c81bbba69346c48c77852f2429bae47dea4
Author: Jason Altekruse <[email protected]>
Date:   2015-09-04T18:09:23Z

    DRILL-4028: Part 1 - Remove references to the shaded version of a
    Jackson @JsonCreator annotation from parquet, replace with proper
    fasterxml version.

commit 0f51a6bf341699aa7f14457b2c49097e84fff936
Author: Jason Altekruse <[email protected]>
Date:   2015-09-04T18:17:21Z

    DRILL-4028: Part 2 - Fixing imports using the wrong parquet packages
    after rebase.
    
    clean up imports in generated source template

commit 4feb538da813f2f1a974337f5e6874866c3cd350
Author: Jason Altekruse <[email protected]>
Date:   2015-09-14T18:13:04Z

    DRILL-4028: Part 3 - Fixing issues with Drill parquet read and write
    path after merging the Drill parquet fork back into mainline.
    
    Fixed the issue with the writer: the RecordConsumer needed to be
    flushed in the ParquetRecordWriter.
    
    Consolidate page reading code
    
    Fix buffer sizes; the uncompressed and compressed sizes were swapped.
    
    The issue was a mismatch in the usage of byte buffers. Even though the
    position of a buffer was being set, it seemed to be ignored in the
    setSafe method on the varbinary vector, which appears to read from the
    beginning of the buffer, so I needed to pass in the offset explicitly.
    I'm not sure this is how ByteBuffers are supposed to be used, but we
    seem to use this pattern commonly, so I'm not sure it could be easily
    refactored.
    
    Added some test code to print out additional context when an ordered
    comparison of two datasets fails in a test.
    
    Removing usage of Drill classes from DirectCodecFactory, getting it
    ready to be moved into the parquet codebase.
    
    Fix up parquet API usage in Hive Module.
    
    Fix dictionary reading. I think the changes may speed up reading
    dictionary-encoded files by avoiding an extra copy.
    
    Adding unit test to read and write all types in parquet; the decimal
    types and interval year have some issues.
    
    Use direct codec factory from new package in the parquet library now
    that it has been moved.
    
    Moving the test for Direct Codec Factory out of the Drill source as
    the class itself has been moved.
    
    Small fix after consolidating two different ByteBuffer based
    implementations of BytesInput.
    
    Small fixes to accommodate interface changes.
    
    Small changes to remove direct references to DirectCodecFactory. This
    class is not accessible outside of parquet, but an instance with the
    same contract is now accessible through a new factory method on
    CodecFactory.
    
    Fixed failing test using miniDFS when reading a larger parquet file.

----
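
The ByteBuffer behavior described in the Part 3 commit message (a position
that is set but then ignored by the consumer) can be reproduced with plain
java.nio. The sketch below is only an illustration under stated assumptions:
BufferOffsetDemo, copyAbsolute and copyRelative are hypothetical names, and
copyAbsolute merely stands in for any consumer that reads with absolute
indexing. It is not Drill's setSafe implementation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferOffsetDemo {

    // Hypothetical consumer that uses absolute indexing. It never consults
    // src.position(), so the caller has to supply the start offset.
    static byte[] copyAbsolute(ByteBuffer src, int start, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = src.get(start + i); // absolute get: position is ignored
        }
        return out;
    }

    // Relative bulk read: starts at the current position. A duplicate is
    // used so the caller's position is left untouched.
    static byte[] copyRelative(ByteBuffer src, int length) {
        byte[] out = new byte[length];
        src.duplicate().get(out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50});
        page.position(2);

        // Reads from index 0 even though the position is 2.
        System.out.println(Arrays.toString(copyAbsolute(page, 0, 3)));
        // Passing the position as an explicit offset gives the intended bytes.
        System.out.println(Arrays.toString(copyAbsolute(page, page.position(), 3)));
        // A relative read honors the position without an explicit offset.
        System.out.println(Arrays.toString(copyRelative(page, 3)));
    }
}
```

In java.nio, absolute accessors such as get(int) are defined to ignore the
position, while relative accessors start from it. Code that mixes the two
styles has to pass the offset explicitly, which matches the workaround the
commit message describes.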


> Merge Drill parquet modifications back into the mainline project
> ----------------------------------------------------------------
>
>                 Key: DRILL-4028
>                 URL: https://issues.apache.org/jira/browse/DRILL-4028
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>             Fix For: 1.3.0
>
>
> Drill has been maintaining a fork of Parquet for over a year. The changes 
> need to make it back into the main repository so we don't have to bother 
> merging in all of the new changes from the master repository into the fork.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
