GitHub user parthchandra opened a pull request:
https://github.com/apache/drill/pull/162
Drill 2908: Int96 support
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/parthchandra/incubator-drill DRILL-2908
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/162.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #162
----
commit a9eb4314ff56497fd2f892d79ee930231261f1d5
Author: Jason Altekruse <[email protected]>
Date: 2015-04-14T23:27:59Z
DRILL-2908: Enable reading the INT96 type from Parquet files.
Column chunk metadata can be out of order relative to the column order in the
schema. Both are exposed as lists, which makes it look as if they correspond
position by position, but they need not, so we build our own map between
column names and their indexes in the column chunk list.
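A minimal sketch of that name-to-index mapping, assuming a hypothetical `ChunkMeta` type that stands in for one entry of a row group's column chunk list (these are not Drill's or parquet-mr's actual classes). Lookups go through the map by column name instead of by list position.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one entry of a row group's column chunk metadata list.
final class ChunkMeta {
    final String[] path;    // column path, e.g. {"a", "b"} for nested column a.b
    final long valueCount;

    ChunkMeta(String[] path, long valueCount) {
        this.path = path;
        this.valueCount = valueCount;
    }
}

final class ColumnChunkIndex {
    // Joins a column path into a single lookup key, e.g. {"a", "b"} -> "a.b".
    static String key(String[] path) {
        return String.join(".", path);
    }

    // Maps each column name to its position in the column chunk list, so readers
    // never assume the chunk list follows the schema's column order.
    static Map<String, Integer> build(List<ChunkMeta> chunks) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < chunks.size(); i++) {
            index.put(key(chunks.get(i).path), i);
        }
        return index;
    }

    // Finds the chunk for a schema column by name rather than by position.
    static ChunkMeta chunkFor(String[] schemaPath, List<ChunkMeta> chunks,
                              Map<String, Integer> index) {
        Integer i = index.get(key(schemaPath));
        if (i == null) {
            throw new IllegalArgumentException("no column chunk for " + key(schemaPath));
        }
        return chunks.get(i);
    }
}
```

Joining the path with "." keeps the sketch short; a field name that itself contains a dot would collide, so a more careful version might key on the path segments as a list instead.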
Support reading VARBINARY and INT96 in the new reader.
Support the version 2 data page header; the Java Parquet library only
dictionary-encodes fixed-length byte arrays when the writer version is set to 2.0.
This appears to work in the vectorized reader; a test case is still needed.
Fix the complex reader, which was using the wrong field to determine the length
to read.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetFixedWidthDictionaryReaders.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java
Add a UDF for reading Impala timestamps from VARBINARY.
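For context on the timestamp UDF: Impala writes timestamps to Parquet as INT96 values, commonly described as 8 little-endian bytes of nanoseconds since midnight followed by a 4-byte little-endian Julian day number. A minimal sketch of decoding such a value into epoch milliseconds, assuming that layout; the class and method names are hypothetical and this is not the UDF added by this commit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Int96Timestamp {
    // Julian day number of the Unix epoch, 1970-01-01.
    static final int JULIAN_DAY_OF_EPOCH = 2440588;
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Decodes a 12-byte INT96 value: bytes 0..7 hold nanoseconds of the day and
    // bytes 8..11 hold the Julian day, both little-endian.
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        int julianDay = buf.getInt();    // bytes 8..11
        // Sub-millisecond precision is truncated by the integer division.
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY + nanosOfDay / 1_000_000L;
    }

    // Builds an INT96 value from its two parts, e.g. for tests.
    static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }
}
```

The result is a plain UTC-style millisecond count; the sketch makes no attempt at time zone handling, which depends on how the writer interpreted the timestamp.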
Fix reading of fixed-length binary and INT96 columns in the vectorized Parquet
reader.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
Fix a bug reading fixed-length binary and INT96 data from Parquet when the
data is plain-encoded.
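As background for the plain-encoding fix: in Parquet's PLAIN encoding, FIXED_LEN_BYTE_ARRAY values (and INT96 values, which are 12 bytes wide) are stored back to back with no length prefix, so the width has to come from the column's type length; BYTE_ARRAY values, by contrast, each carry a 4-byte little-endian length prefix. A minimal sketch of both layouts, assuming an already decompressed page with repetition and definition levels stripped; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PlainBinaryDecoder {
    // PLAIN FIXED_LEN_BYTE_ARRAY (INT96 uses the same layout with a width of 12):
    // values are stored back to back, each exactly typeLength bytes, no prefix.
    static List<byte[]> readFixed(byte[] page, int offset, int valueCount, int typeLength) {
        if (offset + (long) valueCount * typeLength > page.length) {
            throw new IllegalArgumentException("page too short for " + valueCount + " values");
        }
        List<byte[]> values = new ArrayList<>(valueCount);
        int pos = offset;
        for (int i = 0; i < valueCount; i++) {
            values.add(Arrays.copyOfRange(page, pos, pos + typeLength));
            pos += typeLength;
        }
        return values;
    }

    // PLAIN BYTE_ARRAY, for contrast: each value is preceded by a
    // 4-byte little-endian length.
    static List<byte[]> readVariable(byte[] page, int offset, int valueCount) {
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);
        List<byte[]> values = new ArrayList<>(valueCount);
        for (int i = 0; i < valueCount; i++) {
            int len = buf.getInt();
            byte[] value = new byte[len];
            buf.get(value);
            values.add(value);
        }
        return values;
    }
}
```

Applying the variable-length logic to a fixed-length column (or the reverse) would read value bytes as lengths, which is the kind of mismatch the fix above addresses.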
commit c02614ce1e2dc3d7782aa617493deda8f8a269dd
Author: Parth Chandra <[email protected]>
Date: 2015-09-15T22:27:17Z
DRILL-2908: Fix Parquet handling of variable-length vectors when the encoding
changes across pages. Add unit tests. Add options to make the Parquet page size
and dictionary page size configurable at the session level.
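A column chunk can begin with dictionary-encoded pages and switch to plain-encoded pages partway through; the Java Parquet writer, for example, falls back to plain encoding when the dictionary grows past its size limit. A reader therefore has to choose the decoder per page rather than once per column chunk. A minimal sketch with a hypothetical, simplified page model (already-decoded dictionary ids instead of RLE/bit-packed data); none of these types are Drill's or parquet-mr's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified page encodings.
enum PageEncoding { PLAIN, DICTIONARY }

// Hypothetical page: dictionary pages hold ids into the chunk's dictionary,
// plain pages hold their values inline.
final class DataPage {
    final PageEncoding encoding;
    final int[] dictionaryIds;  // set when encoding == DICTIONARY
    final String[] plainValues; // set when encoding == PLAIN

    private DataPage(PageEncoding encoding, int[] dictionaryIds, String[] plainValues) {
        this.encoding = encoding;
        this.dictionaryIds = dictionaryIds;
        this.plainValues = plainValues;
    }

    static DataPage dictionary(int... ids) {
        return new DataPage(PageEncoding.DICTIONARY, ids, null);
    }

    static DataPage plain(String... values) {
        return new DataPage(PageEncoding.PLAIN, null, values);
    }
}

final class VarLengthColumnReader {
    // Decodes every page of one column chunk, re-checking the encoding on each
    // page because it may change from DICTIONARY to PLAIN mid-chunk.
    static List<String> readAll(String[] dictionary, List<DataPage> pages) {
        List<String> out = new ArrayList<>();
        for (DataPage page : pages) {
            switch (page.encoding) {
                case DICTIONARY:
                    if (dictionary == null) {
                        throw new IllegalStateException("dictionary page missing");
                    }
                    for (int id : page.dictionaryIds) {
                        out.add(dictionary[id]);
                    }
                    break;
                case PLAIN:
                    for (String value : page.plainValues) {
                        out.add(value);
                    }
                    break;
                default:
                    throw new IllegalStateException("unsupported encoding: " + page.encoding);
            }
        }
        return out;
    }
}
```

Keeping the per-page decision inside the loop means a reader set up for the first page's encoding cannot silently misread later pages.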
----