[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943852#comment-13943852
]
David Chen edited comment on TAJO-30 at 3/22/14 2:38 AM:
---------------------------------------------------------
I ended up special-casing handling of the {{NULL_TYPE}} columns. On the write
side, {{NULL_TYPE}} columns are ignored and not written to the Parquet file. On
the read side, I added code to handle the case where {{NULL_TYPE}} columns are
in the projection.
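To illustrate the idea only (this is not the code from the patch): the sketch below uses hypothetical stand-ins, {{Type}}, {{Column}}, {{writableColumns}} and {{readRow}}, none of which are Tajo or Parquet APIs. It assumes that skipping a {{NULL_TYPE}} column on write means it is simply left out of the stored schema, and that a {{NULL_TYPE}} column in the projection is filled with a null on read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullTypeSketch {
    // Hypothetical stand-in for Tajo's column data types.
    enum Type { INT4, TEXT, NULL_TYPE }

    // Hypothetical stand-in for a column in a Tajo schema.
    static final class Column {
        final String name;
        final Type type;

        Column(String name, Type type) {
            this.name = name;
            this.type = type;
        }
    }

    // Write side: only non-NULL_TYPE columns are kept for the stored file.
    static List<String> writableColumns(List<Column> schema) {
        List<String> names = new ArrayList<>();
        for (Column c : schema) {
            if (c.type != Type.NULL_TYPE) {
                names.add(c.name);
            }
        }
        return names;
    }

    // Read side: NULL_TYPE columns in the projection have no stored value,
    // so they are filled with null instead of being looked up.
    static List<Object> readRow(List<Column> projection, Map<String, Object> stored) {
        List<Object> row = new ArrayList<>();
        for (Column c : projection) {
            row.add(c.type == Type.NULL_TYPE ? null : stored.get(c.name));
        }
        return row;
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
            new Column("id", Type.INT4),
            new Column("unused", Type.NULL_TYPE),
            new Column("name", Type.TEXT));

        // Values as they would exist in the file: the NULL_TYPE column is absent.
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", 1);
        stored.put("name", "tajo");

        System.out.println(writableColumns(schema));
        System.out.println(readRow(schema, stored));
    }
}
```

The point of the read-side branch is that the projected row keeps one slot per projected column even though the file holds fewer columns.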
All the tajo-storage tests now pass. I am posting a patch.
FYI, as of now, there is a test in Tajo Core Backend that is failing
(org.apache.tajo.benchmark.TestTPCH). This appears to be unrelated to my changes
because this test fails on the master branch as well.
was (Author: davidzchen):
I ended up special-casing handling of the {{NULL_TYPE}} columns. On the write
side, {{NULL_TYPE}} columns are ignored and not written to the Parquet file. On
the read side, I added code to handle the case where {{NULL_TYPE}} columns are
in the projection.
All the tajo-storage tests now pass. I am posting a patch.
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: David Chen
> Labels: Parquet
> Attachments: TAJO-30.patch
>
>
> Parquet is a columnar storage format developed by Twitter and Cloudera.
> Implement Parquet (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
> * {{ParquetScanner}} and {{ParquetAppender}} - A FileScanner and a
> FileAppender for reading and writing Parquet.
> * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and
> writer for serializing/deserializing to Tajo Tuples.
> * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform
> conversion between Parquet and Tajo records.
> * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's
> internal representation.
> * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to
> materialize a Tajo Tuple.
> * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.
--
This message was sent by Atlassian JIRA
(v6.2#6252)