[jira] [Comment Edited] (TAJO-30) Parquet Integration

David Chen (JIRA) Tue, 25 Mar 2014 13:26:06 -0700

    [ 
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946636#comment-13946636
 ]


David Chen edited comment on TAJO-30 at 3/25/14 8:23 PM:
---------------------------------------------------------

Thanks for the clarification, Hyunsik.

I have fixed the way NULLs are handled. I will post an updated patch after 
adding more Javadoc comments and doing some cleanup.

By the way, I have opened a few tickets:

 * TAJO-709 - Add {{.reviewboardrc}} and clean up {{request-patch-review.py}} 
to use {{rbt}}
 * TAJO-710 - Support nested schemas and non-scalar types
 * TAJO-711 - Add Avro storage support. This is not for switching from Protobuf 
to Avro as discussed in TAJO-28 but to add a {{FileScanner}} and 
{{FileAppender}} for Avro. We currently store most of our data in Avro, and 
having Avro storage support is a must for us. I have a pretty good 
understanding of Avro and have contributed some improvements to Parquet's 
Avro integration, so I would be happy to pick this up after we commit this 
patch.

Thanks,
David


was (Author: davidzchen):
Thanks for the clarification, Hyunsik.

I have fixed the way NULLs are handled. I will post an updated patch after 
adding some more Javadoc comments and cleanup.

By the way, I have opened a few tickets:

 * TAJO-709 - Add {{.reviewboardrc}} and clean up {{request-patch-review.py}} 
to use {{rbt}}
 * TAJO-710 - Support nested schemas
 * TAJO-711 - Add Avro storage support. This is not for switching from Protobuf 
to Avro as discussed in TAJO-28 but to add a {{FileScanner}} and 
{{FileAppender}} for Avro. We currently store most of our data in Avro, and 
having Avro storage support is a must for us. I have a pretty good 
understanding of Avro and have contributed some improvements to Parquet's 
Avro integration, so I would be happy to pick this up after we commit this 
patch.

Thanks,
David

> Parquet Integration
> -------------------
>
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: David Chen
>              Labels: Parquet
>         Attachments: TAJO-30.patch, null_handling.patch
>
>
> Parquet is a columnar storage format developed by Twitter and Cloudera. 
> Implement Parquet (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
>  * {{ParquetScanner}} and {{ParquetAppender}} - {{FileScanner}} and 
> {{FileAppender}} implementations for reading and writing Parquet files.
>  * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and 
> writer for serializing/deserializing to Tajo Tuples.
>  * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform 
> conversion between Parquet and Tajo records.
>  * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's 
> internal representation.
>  * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to 
> materialize a Tajo Tuple.
>  * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.
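
To make the schema-conversion step in the list above more concrete, here is a minimal, self-contained sketch of mapping a flat table schema to Parquet's textual message-type form. It is not taken from the attached patch: the class {{SchemaSketch}}, the {{TajoType}} enum, and the type mapping are hypothetical stand-ins chosen for illustration, and no Tajo or Parquet library is used. Every column is declared {{optional}} on the assumption that any column may hold NULL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not the TajoSchemaConverter from the patch.
public class SchemaSketch {
    // Hypothetical stand-in for a small subset of scalar column types.
    enum TajoType { BOOLEAN, INT4, INT8, FLOAT4, FLOAT8, TEXT }

    // Assumed mapping from a column type to a Parquet primitive type name.
    static String toParquetPrimitive(TajoType type) {
        switch (type) {
            case BOOLEAN: return "boolean";
            case INT4:    return "int32";
            case INT8:    return "int64";
            case FLOAT4:  return "float";
            case FLOAT8:  return "double";
            case TEXT:    return "binary";
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Builds the textual form of a flat Parquet message type.
    // Columns are emitted as "optional" so that a NULL value can be
    // represented by simply omitting the field in a record.
    static String toParquetSchema(String name, Map<String, TajoType> columns) {
        StringBuilder sb = new StringBuilder("message ").append(name).append(" {\n");
        for (Map.Entry<String, TajoType> column : columns.entrySet()) {
            sb.append("  optional ")
              .append(toParquetPrimitive(column.getValue()))
              .append(' ')
              .append(column.getKey())
              .append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, TajoType> columns = new LinkedHashMap<>();
        columns.put("id", TajoType.INT4);
        columns.put("name", TajoType.TEXT);
        System.out.print(toParquetSchema("table_schema", columns));
    }
}
```

Nested schemas and non-scalar types are deliberately left out of this sketch; the comment above tracks them separately as TAJO-710.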



--
This message was sent by Atlassian JIRA
(v6.2#6252)
