[ https://issues.apache.org/jira/browse/DRILL-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988742#comment-14988742 ]
ASF GitHub Bot commented on DRILL-4028:
---------------------------------------
GitHub user jaltekruse opened a pull request:
https://github.com/apache/drill/pull/236
DRILL-4028: Get off parquet fork
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jaltekruse/incubator-drill parquet-update-squash
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/236.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #236
----
commit afb72c81bbba69346c48c77852f2429bae47dea4
Author: Jason Altekruse <[email protected]>
Date: 2015-09-04T18:09:23Z
DRILL-4028: Part 1 - Remove references to the shaded version of a Jackson
@JsonCreator annotation from parquet, replace with proper fasterxml version.
commit 0f51a6bf341699aa7f14457b2c49097e84fff936
Author: Jason Altekruse <[email protected]>
Date: 2015-09-04T18:17:21Z
DRILL-4028: Part 2 - Fixing imports using the wrong parquet packages after
rebase.
Clean up imports in the generated source template.
commit 4feb538da813f2f1a974337f5e6874866c3cd350
Author: Jason Altekruse <[email protected]>
Date: 2015-09-14T18:13:04Z
DRILL-4028: Part 3 - Fixing issues with the Drill parquet read and write path
after merging the Drill parquet fork back into mainline.
Fixed the issue with the writer; the RecordConsumer needed to be flushed in
the ParquetRecordWriter.
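The writer fix follows a general rule for buffered writers: records still held in a buffer never reach the output unless they are flushed before the writer finishes. A minimal sketch, assuming a hypothetical BufferedRecordSink that batches records in memory; it is a stand-in for illustration, not the parquet RecordConsumer API.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushBeforeClose {
  // Hypothetical stand-in for a consumer that batches records in memory.
  static class BufferedRecordSink {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed; // where flushed records end up
    private final int batchSize;

    BufferedRecordSink(List<String> committed, int batchSize) {
      this.committed = committed;
      this.batchSize = batchSize;
    }

    void add(String record) {
      pending.add(record);
      if (pending.size() >= batchSize) {
        flush(); // only full batches are written automatically
      }
    }

    // Moves any buffered records to the committed output.
    void flush() {
      committed.addAll(pending);
      pending.clear();
    }
  }

  // Writes the records and returns what reached the output; flushOnClose controls
  // whether the final partial batch is flushed before the writer finishes.
  static List<String> write(List<String> records, int batchSize, boolean flushOnClose) {
    List<String> out = new ArrayList<>();
    BufferedRecordSink sink = new BufferedRecordSink(out, batchSize);
    for (String r : records) {
      sink.add(r);
    }
    if (flushOnClose) {
      sink.flush();
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("a", "b", "c", "d", "e");
    System.out.println(write(rows, 2, false)); // [a, b, c, d]  (last record lost)
    System.out.println(write(rows, 2, true));  // [a, b, c, d, e]
  }
}
```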
Consolidate page reading code
Fix buffer sizes; the uncompressed and compressed sizes were backwards.
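Parquet page headers carry both a compressed and an uncompressed page size, and the decompression target has to be sized by the uncompressed one. A minimal sketch of what goes wrong when the two are swapped, assuming zlib from java.util.zip as a stand-in for the column codec; the class and method names are hypothetical and this is not the Drill page reader.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class PageSizes {
  // Compresses a page body with zlib (standing in for the real column codec).
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[256];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Decompresses into a buffer of outputSize bytes and returns how many bytes came back.
  static int decompressInto(byte[] compressed, int outputSize) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] page = new byte[outputSize];
      return inflater.inflate(page);
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt page", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    Arrays.fill(data, (byte) 'x');
    byte[] compressed = compress(data);
    int uncompressedSize = data.length;
    int compressedSize = compressed.length;

    // Correct: the output buffer is sized by the uncompressed size.
    System.out.println(decompressInto(compressed, uncompressedSize)); // 1000
    // Swapped: sizing by the compressed size silently truncates the page.
    System.out.println(decompressInto(compressed, compressedSize));
  }
}
```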
The issue was a mismatch in how byte buffers were used. Even though the
position of a buffer was being set, the setSafe method on the varbinary vector
seemed to ignore it and read from the beginning of the buffer, so I needed to
pass in the offset explicitly. I'm not sure this is how ByteBuffers are
supposed to be used, but we use this pattern commonly, so I'm not sure it
could be easily refactored.
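The behavior described here matches the split between relative and absolute access in java.nio.ByteBuffer: relative reads such as get() start at the buffer's position, while absolute reads such as get(int index) ignore it. A method that reads with absolute indexes therefore needs the start offset passed in. A minimal sketch, assuming a hypothetical copyFrom helper in place of the vector's setSafe; it is not Drill's API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferOffsets {
  // Hypothetical copy routine that reads with absolute indexes, so the
  // buffer's position is never consulted; the caller must supply the start.
  static byte[] copyFrom(ByteBuffer buf, int start, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = buf.get(start + i); // absolute get: ignores position()
    }
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.wrap("headerVALUE".getBytes(StandardCharsets.US_ASCII));
    buf.position(6); // the value starts after the 6-byte "header"

    // Wrong: relies on the position being honored, so it reads from index 0.
    String wrong = new String(copyFrom(buf, 0, 5), StandardCharsets.US_ASCII);
    // Right: pass the offset explicitly.
    String right = new String(copyFrom(buf, buf.position(), 5), StandardCharsets.US_ASCII);

    System.out.println(wrong); // heade
    System.out.println(right); // VALUE
  }
}
```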
Added some test code to print out additional context when an ordered
comparison of two datasets fails in a test.
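One way to give a failing ordered comparison more context is to report the first mismatching row together with its neighbors from both datasets. A minimal sketch, assuming rows are already rendered as strings; firstMismatch is a hypothetical helper, not Drill's test framework.

```java
import java.util.List;

public class OrderedCompare {
  // Compares two result sets row by row. On the first mismatch (or a length
  // difference) returns a message with neighboring rows; returns null on a match.
  static String firstMismatch(List<String> expected, List<String> actual, int context) {
    int n = Math.min(expected.size(), actual.size());
    for (int i = 0; i < n; i++) {
      if (!expected.get(i).equals(actual.get(i))) {
        return describe(i, expected, actual, context);
      }
    }
    if (expected.size() != actual.size()) {
      return describe(n, expected, actual, context);
    }
    return null;
  }

  private static String describe(int row, List<String> expected, List<String> actual, int context) {
    return "Mismatch at row " + row + "\n"
        + "expected: " + window(expected, row, context) + "\n"
        + "actual:   " + window(actual, row, context);
  }

  // Rows in [row - context, row + context], clipped to the list bounds.
  private static List<String> window(List<String> rows, int row, int context) {
    int from = Math.max(0, row - context);
    int to = Math.min(rows.size(), row + context + 1);
    return rows.subList(from, Math.max(from, to));
  }

  public static void main(String[] args) {
    List<String> expected = List.of("1", "2", "3", "4", "5");
    List<String> actual = List.of("1", "2", "9", "4", "5");
    System.out.println(firstMismatch(expected, actual, 1));
  }
}
```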
Removing usage of Drill classes from DirectCodecFactory, getting it ready
to be moved into the parquet codebase.
Fix up parquet API usage in Hive Module.
Fix dictionary reading. I think these changes may also speed up reading
dictionary-encoded files by avoiding an extra copy.
Adding a unit test to read and write all types in parquet; the decimal types
and interval year have some issues.
Use direct codec factory from new package in the parquet library now that
it has been moved.
Moving the test for Direct Codec Factory out of the Drill source as the
class itself has been moved.
Small fix after consolidating two different ByteBuffer based
implementations of BytesInput.
Small fixes to accommodate interface changes.
Small changes to remove direct references to DirectCodecFactory; this class
is not accessible outside of parquet, but an instance with the same contract is
now accessible through a new factory method on CodecFactory.
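The general pattern is an implementation class hidden from callers, with a public factory method that returns it typed as the shared contract. A minimal sketch, assuming hypothetical names throughout: a private nested class plays the role of the class that is not visible outside parquet, and the identity "codec" only copies bytes in place of real compression.

```java
public class CodecFactorySketch {
  // The public contract callers program against.
  public interface Codec {
    byte[] transform(byte[] input);
  }

  // Hidden implementation: code outside this class cannot name or construct it.
  private static final class DirectCodec implements Codec {
    @Override
    public byte[] transform(byte[] input) {
      return input.clone(); // identity "codec": just copies the bytes
    }
  }

  private CodecFactorySketch() {}

  // Public factory method: hands out an instance with the same contract
  // without exposing the implementing class.
  public static Codec createDirectCodec() {
    return new DirectCodec();
  }

  public static void main(String[] args) {
    Codec codec = createDirectCodec();
    System.out.println(codec.transform(new byte[] {1, 2, 3}).length); // 3
  }
}
```

Callers depend only on Codec, so the implementation can move between packages or projects without breaking them.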
Fixed failing test using miniDFS when reading a larger parquet file.
----
> Merge Drill parquet modifications back into the mainline project
> ----------------------------------------------------------------
>
> Key: DRILL-4028
> URL: https://issues.apache.org/jira/browse/DRILL-4028
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Jason Altekruse
> Assignee: Jason Altekruse
> Fix For: 1.3.0
>
>
> Drill has been maintaining a fork of Parquet for over a year. The changes
> need to make it back into the main repository so we don't have to bother
> merging in all of the new changes from the master repository into the fork.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)