[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943956#comment-13943956
]
David Chen commented on TAJO-30:
--------------------------------
[~hyunsik], [~jihoonson] Thanks! I look forward to iterating on your feedback.
[~hjkim] Thanks for running the benchmark! Really excited that we're already
seeing performance gains with Parquet. I'll add the missing property.
[~hsaputra] Agreed. In the meantime, I'm adding some more detailed Javadoc
comments in the source code and a package-info.java that will give an overview
of the Parquet support code, including a mapping of Tajo to Parquet data types.
Yes, I have moved my changes to https://github.com/davidzchen/tajo/tree/parquet
after seeing the great news that Tajo has graduated to a TLP. :)
Congratulations!
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: David Chen
> Labels: Parquet
> Attachments: TAJO-30.patch
>
>
> Parquet is a columnar storage format developed by Twitter. Implement Parquet
> (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
> * {{ParquetScanner}} and {{ParquetAppender}} - {{FileScanner}} and
> {{FileAppender}} implementations for reading and writing Parquet.
> * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and
> writer for serializing/deserializing to Tajo Tuples.
> * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform
> conversion between Parquet and Tajo records.
> * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's
> internal representation.
> * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to
> materialize a Tajo Tuple.
> * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.
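
As a rough illustration of the kind of type mapping a schema converter such as
{{TajoSchemaConverter}} has to perform, here is a minimal, self-contained
sketch. It is not code from the TAJO-30 patch: the class name
SchemaMappingSketch, both enums, and the specific pairings are assumptions
chosen for illustration. It covers only a subset of Tajo types and ignores
Parquet's logical annotations (for example, marking a BINARY column as UTF-8
text).

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of a Tajo-to-Parquet type mapping; not the patch's code. */
final class SchemaMappingSketch {

    /** A subset of Tajo column types, listed here for illustration only. */
    enum TajoType { BOOLEAN, INT2, INT4, INT8, FLOAT4, FLOAT8, TEXT, BLOB }

    /** Parquet's primitive (physical) types. */
    enum ParquetType { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY }

    // Assumed pairings: each Tajo type goes to the narrowest Parquet
    // primitive that can hold it without loss.
    private static final Map<TajoType, ParquetType> TO_PARQUET = new EnumMap<>(TajoType.class);

    static {
        TO_PARQUET.put(TajoType.BOOLEAN, ParquetType.BOOLEAN);
        TO_PARQUET.put(TajoType.INT2, ParquetType.INT32);   // Parquet has no 16-bit primitive
        TO_PARQUET.put(TajoType.INT4, ParquetType.INT32);
        TO_PARQUET.put(TajoType.INT8, ParquetType.INT64);
        TO_PARQUET.put(TajoType.FLOAT4, ParquetType.FLOAT);
        TO_PARQUET.put(TajoType.FLOAT8, ParquetType.DOUBLE);
        TO_PARQUET.put(TajoType.TEXT, ParquetType.BINARY);  // variable-length bytes
        TO_PARQUET.put(TajoType.BLOB, ParquetType.BINARY);
    }

    /** Returns the Parquet primitive assumed for the given Tajo type. */
    static ParquetType toParquet(TajoType type) {
        ParquetType mapped = TO_PARQUET.get(type);
        if (mapped == null) {
            throw new IllegalArgumentException("No Parquet mapping for " + type);
        }
        return mapped;
    }
}
```

Note that the mapping is not one-to-one in this sketch: INT2 and INT4 both
land on INT32, and TEXT and BLOB both land on BINARY, so a reader going the
other way would need the Tajo schema (or extra metadata) to recover the
original type.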
--
This message was sent by Atlassian JIRA
(v6.2#6252)