[jira] [Commented] (TAJO-30) Parquet Integration

hyoungjunkim (JIRA) Fri, 21 Mar 2014 22:25:34 -0700

    [ https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943919#comment-13943919 ]


hyoungjunkim commented on TAJO-30:
----------------------------------

Thanks, David. I've really been wanting Parquet file format support.
I ran a quick test of your patch with TPC-H scale factor 100 data:
{noformat}
select sum(l_extendedprice * l_discount) as revenue
from lineitem_100
where l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01'
  and l_discount >= 0.05 and l_discount <= 0.07
  and l_quantity < 24;
{noformat}

- TextFile: 23.113 sec
- Parquet File: 10.996 sec
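For reference, here is how the speedup works out from those two timings. Both numbers come from a single run of one query, so treat the ratio as indicative rather than as a benchmark result. The class and method names are just for illustration.

```java
// Computes speedup and relative time saved from two wall-clock timings.
final class Speedup {
    private Speedup() {}

    // Ratio of baseline time to new time; a value > 1 means the new format is faster.
    static double speedup(double baselineSec, double newSec) {
        return baselineSec / newSec;
    }

    // Fraction of the baseline time that the new format saves.
    static double reduction(double baselineSec, double newSec) {
        return (baselineSec - newSec) / baselineSec;
    }

    public static void main(String[] args) {
        double text = 23.113;    // TextFile, seconds
        double parquet = 10.996; // Parquet, seconds
        System.out.printf("speedup: %.2fx%n", speedup(text, parquet));
        System.out.printf("time saved: %.1f%%%n", 100 * reduction(text, parquet));
    }
}
```

This prints a speedup of about 2.10x, i.e. Parquet saves roughly 52% of the TextFile query time.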

The patch is missing the following property in tajo-storage/src/main/resource/storage-default.xml:
{code}
  <property>
    <name>tajo.storage.scanner-handler.parquet.class</name>
    <value>org.apache.tajo.storage.parquet.ParquetScanner</value>
  </property>
{code}
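Without that entry, nothing maps the Parquet store type to a scanner class. The sketch below illustrates that kind of lookup under stated assumptions. It uses java.util.Properties as a stand-in for Tajo's configuration object and assumes the key is built by lowercasing the store type into tajo.storage.scanner-handler.<type>.class. ScannerHandlerLookup, scannerKey and resolveScannerClass are hypothetical names, not Tajo's API.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch of resolving a scanner class from configuration.
// Properties stands in for Tajo's configuration; lowercasing the store
// type to build the key is an assumption, not confirmed Tajo behavior.
final class ScannerHandlerLookup {
    private ScannerHandlerLookup() {}

    // Builds the config key for a store type, e.g. PARQUET -> ...parquet.class
    static String scannerKey(String storeType) {
        return "tajo.storage.scanner-handler." + storeType.toLowerCase(Locale.ROOT) + ".class";
    }

    // Returns the configured scanner class name, failing loudly if the entry is missing.
    static String resolveScannerClass(Properties conf, String storeType) {
        String key = scannerKey(storeType);
        String className = conf.getProperty(key);
        if (className == null) {
            throw new IllegalStateException(
                "No scanner registered for store type " + storeType + " (missing " + key + ")");
        }
        return className;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("tajo.storage.scanner-handler.parquet.class",
            "org.apache.tajo.storage.parquet.ParquetScanner");
        System.out.println(resolveScannerClass(conf, "PARQUET"));

        // With an empty configuration the lookup fails, as with the missing property.
        try {
            resolveScannerClass(new Properties(), "PARQUET");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```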

> Parquet Integration
> -------------------
>
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: David Chen
>              Labels: Parquet
>         Attachments: TAJO-30.patch
>
>
> Parquet is a columnar storage format developed by Twitter. Implement Parquet 
> (http://parquet.io/) support for Tajo.
> The implementation consists of the following:
>  * {{ParquetScanner}} and {{ParquetAppender}} - FileScanner and FileAppenders 
> for reading and writing Parquet.
>  * {{TajoParquetReader}} and {{TajoParquetWriter}} - Top-level reader and 
> writer for serializing/deserializing to Tajo Tuples.
>  * {{TajoReadSupport}} and {{TajoWriteSupport}} - Abstractions to perform 
> conversion between Parquet and Tajo records.
>  * {{TajoRecordMaterializer}} - Materializes Tajo Tuples from Parquet's 
> internal representation.
>  * {{TajoRecordConverter}} - Used by {{TajoRecordMaterializer}} to 
> materialize a Tajo Tuple.
>  * {{TajoSchemaConverter}} - Converts between Tajo and Parquet schemas.
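As a rough illustration of the kind of decision a schema converter such as {{TajoSchemaConverter}} has to make, here is a hypothetical Tajo-to-Parquet type mapping. The enums are local stand-ins (not Tajo's or parquet-mr's classes) covering only a subset of types, and the mapping itself is an assumption, not necessarily what the patch does.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for a subset of Tajo column types (not Tajo's own enum).
enum TajoType { BOOLEAN, INT2, INT4, INT8, FLOAT4, FLOAT8, TEXT, BLOB }

// Hypothetical stand-in for a subset of Parquet primitive types
// (INT96 and FIXED_LEN_BYTE_ARRAY are omitted here).
enum ParquetPrimitive { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY }

// Assumed Tajo -> Parquet primitive mapping; the patch's converter may differ.
final class SchemaMappingSketch {
    private static final Map<TajoType, ParquetPrimitive> MAPPING =
        new EnumMap<>(TajoType.class);

    static {
        MAPPING.put(TajoType.BOOLEAN, ParquetPrimitive.BOOLEAN);
        // Parquet has no 16-bit primitive type, so INT2 is widened to INT32.
        MAPPING.put(TajoType.INT2, ParquetPrimitive.INT32);
        MAPPING.put(TajoType.INT4, ParquetPrimitive.INT32);
        MAPPING.put(TajoType.INT8, ParquetPrimitive.INT64);
        MAPPING.put(TajoType.FLOAT4, ParquetPrimitive.FLOAT);
        MAPPING.put(TajoType.FLOAT8, ParquetPrimitive.DOUBLE);
        // Variable-length text and raw bytes are both stored as BINARY.
        MAPPING.put(TajoType.TEXT, ParquetPrimitive.BINARY);
        MAPPING.put(TajoType.BLOB, ParquetPrimitive.BINARY);
    }

    private SchemaMappingSketch() {}

    // Returns the Parquet primitive used to store a Tajo column of the given type.
    static ParquetPrimitive toParquet(TajoType type) {
        ParquetPrimitive primitive = MAPPING.get(type);
        if (primitive == null) {
            throw new IllegalArgumentException("Unsupported Tajo type: " + type);
        }
        return primitive;
    }

    public static void main(String[] args) {
        for (TajoType type : TajoType.values()) {
            System.out.println(type + " -> " + toParquet(type));
        }
    }
}
```

A real converter also has to carry column names and nullability (Parquet's optional/required repetition), which this sketch leaves out.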



--
This message was sent by Atlassian JIRA
(v6.2#6252)