[ 
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943852#comment-13943852
 ] 

David Chen edited comment on TAJO-30 at 3/22/14 2:38 AM:
---------------------------------------------------------

I ended up special-casing the handling of {{NULL_TYPE}} columns. On the write 
side, {{NULL_TYPE}} columns are skipped and not written to the Parquet file. On 
the read side, I added code to handle the case where {{NULL_TYPE}} columns 
appear in the projection.
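
A minimal sketch of the two cases, for illustration only. It is not the code 
in the attached patch and does not use the Tajo or Parquet APIs. The 
{{Type}} enum, the {{Column}} class, and the {{writableColumns}} and 
{{project}} helpers are hypothetical stand-ins; only the {{NULL_TYPE}} name 
comes from the description above. It also assumes that a projected 
{{NULL_TYPE}} column is read back as a null value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullTypeColumns {
    // Hypothetical stand-in for the column types; only NULL_TYPE is named in the comment.
    enum Type { INT4, TEXT, NULL_TYPE }

    // Hypothetical minimal column descriptor (name plus type).
    static final class Column {
        final String name;
        final Type type;

        Column(String name, Type type) {
            this.name = name;
            this.type = type;
        }
    }

    // Write side: drop NULL_TYPE columns so they never reach the file schema.
    static List<Column> writableColumns(List<Column> schema) {
        List<Column> out = new ArrayList<>();
        for (Column c : schema) {
            if (c.type != Type.NULL_TYPE) {
                out.add(c);
            }
        }
        return out;
    }

    // Read side: build one output row for the projection. A NULL_TYPE column
    // was never stored, so it is filled with null (an assumption of this
    // sketch) instead of being looked up in the stored row.
    static Object[] project(List<Column> projection, Map<String, Object> storedRow) {
        Object[] tuple = new Object[projection.size()];
        for (int i = 0; i < projection.size(); i++) {
            Column c = projection.get(i);
            tuple[i] = (c.type == Type.NULL_TYPE) ? null : storedRow.get(c.name);
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
            new Column("id", Type.INT4),
            new Column("unused", Type.NULL_TYPE),
            new Column("name", Type.TEXT));

        // Only "id" and "name" would be written.
        System.out.println(writableColumns(schema).size());

        // A stored row holds values for the written columns only.
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", 1);
        stored.put("name", "a");

        // The projection still includes the NULL_TYPE column.
        System.out.println(Arrays.toString(project(schema, stored)));
    }
}
```

The point of the sketch is that the file schema and the projected schema can 
differ in length, so the read side has to map projected columns back to 
stored ones by name or index rather than assume the positions line up.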

All the tajo-storage tests now pass. I am posting a patch.

FYI, one test in Tajo Core Backend is currently failing 
({{org.apache.tajo.benchmark.TestTPCH}}). This appears to be unrelated to my 
changes, because the test also fails on the master branch.


> Parquet Integration
> -------------------
>
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: David Chen
>              Labels: Parquet
>         Attachments: TAJO-30.patch
>
>
> Parquet is a columnar storage format developed by Twitter and Cloudera. 
> Implement Parquet (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
>  * {{ParquetScanner}} and {{ParquetAppender}} - {{FileScanner}} and 
> {{FileAppender}} implementations for reading and writing Parquet files.
>  * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and 
> writer for serializing/deserializing to Tajo Tuples.
>  * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform 
> conversion between Parquet and Tajo records.
>  * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's 
> internal representation.
>  * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to 
> materialize a Tajo Tuple.
>  * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.


