[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943919#comment-13943919
]
hyoungjunkim commented on TAJO-30:
----------------------------------
Thanks, David. I have really been wanting Parquet file format support.
I have already run a quick test of your patch against TPC-H data at scale factor 100, using the following query:
{noformat}
select sum(l_extendedprice*l_discount) as revenue from lineitem_100 where
l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01' and l_discount >= 0.05
and l_discount <= 0.07 and l_quantity < 24;
{noformat}
- TextFile: 23.113 sec
- Parquet File: 10.996 sec
The following property is missing from the patch
(tajo-storage/src/main/resource/storage-default.xml):
{code}
<property>
<name>tajo.storage.scanner-handler.parquet.class</name>
<value>org.apache.tajo.storage.parquet.ParquetScanner</value>
</property>
{code}
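To illustrate why the missing entry matters, here is a minimal sketch of how a scanner class name could be resolved from a property key of this form. It is not Tajo's actual code: the class {{ScannerHandlerLookup}} and the method {{scannerClassFor}} are hypothetical, and {{java.util.Properties}} stands in for the real configuration object. It assumes the key is built from the lower-cased store type.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical illustration only; not part of the TAJO-30 patch.
public class ScannerHandlerLookup {

    // Key pattern taken from the property above:
    // tajo.storage.scanner-handler.<store type>.class
    static String scannerClassFor(Properties conf, String storeType) {
        String key = "tajo.storage.scanner-handler."
                + storeType.toLowerCase(Locale.ROOT) + ".class";
        String className = conf.getProperty(key);
        if (className == null) {
            // Without the entry there is nothing to map the store type to.
            throw new IllegalStateException("No scanner configured for key: " + key);
        }
        return className;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("tajo.storage.scanner-handler.parquet.class",
                "org.apache.tajo.storage.parquet.ParquetScanner");

        // Resolves because the property is present.
        System.out.println(scannerClassFor(conf, "PARQUET"));

        // Simulates the patch as submitted: the entry is absent.
        try {
            scannerClassFor(new Properties(), "PARQUET");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the lookup fails as soon as the entry is absent, which is the situation with storage-default.xml as it stands in the patch.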
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: David Chen
> Labels: Parquet
> Attachments: TAJO-30.patch
>
>
> Parquet is a columnar storage format developed by Twitter. Implement Parquet
> (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
> * {{ParquetScanner}} and {{ParquetAppender}} - FileScanner and FileAppender
> implementations for reading and writing Parquet.
> * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and
> writer for serializing/deserializing to Tajo Tuples.
> * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform
> conversion between Parquet and Tajo records.
> * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's
> internal representation.
> * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to
> materialize a Tajo Tuple.
> * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.