[jira] [Commented] (TAJO-30) Parquet Integration

David Chen (JIRA) Tue, 05 Nov 2013 19:22:03 -0800

    [ 
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814566#comment-13814566
 ]


David Chen commented on TAJO-30:
--------------------------------

Hi all,

I am new to the Tajo project. I am very excited about Tajo's capabilities and 
am interested in contributing to the project.

I am one of the main engineers working on deploying Parquet at LinkedIn and 
have made a number of contributions to the Parquet project such as adding 
support for the FIXED_LEN_BYTE_ARRAY data type and a number of Avro support 
improvements.

I am excited to see that Parquet support is planned for Tajo as well. Because 
of Parquet's generic design, adding Tajo integration would mostly involve 
writing a FileReader, a FileWriter, and a SchemaConverter, so that Tajo can 
automatically convert the schema and records to Tajo's internal representation 
on the read side, and back to Parquet's on the write side. This is the 
approach that most of the packages under parquet-mr take, such as 
parquet-avro, parquet-thrift, etc.
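
To make the SchemaConverter idea a bit more concrete, here is a rough, 
self-contained sketch of a two-way type mapping. Everything in it is an 
assumption for illustration: TajoType, ParquetType, and SchemaConverterSketch 
are hypothetical stand-ins, not the real Tajo or parquet-mr classes, and the 
particular type pairs are only one plausible mapping. A real converter would 
work with the projects' actual schema classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the real
// Tajo or parquet-mr schema classes.
enum TajoType { BOOLEAN, INT4, INT8, FLOAT4, FLOAT8, TEXT }

enum ParquetType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY }

final class SchemaConverterSketch {
    private SchemaConverterSketch() {}

    // Write side: map one Tajo column type to an assumed Parquet primitive.
    static ParquetType toParquet(TajoType type) {
        switch (type) {
            case BOOLEAN: return ParquetType.BOOLEAN;
            case INT4:    return ParquetType.INT32;
            case INT8:    return ParquetType.INT64;
            case FLOAT4:  return ParquetType.FLOAT;
            case FLOAT8:  return ParquetType.DOUBLE;
            case TEXT:    return ParquetType.BINARY;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Read side: the inverse of the mapping above.
    static TajoType toTajo(ParquetType type) {
        switch (type) {
            case BOOLEAN: return TajoType.BOOLEAN;
            case INT32:   return TajoType.INT4;
            case INT64:   return TajoType.INT8;
            case FLOAT:   return TajoType.FLOAT4;
            case DOUBLE:  return TajoType.FLOAT8;
            case BINARY:  return TajoType.TEXT;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Convert a whole schema (column name -> type), preserving column order.
    static Map<String, ParquetType> toParquetSchema(Map<String, TajoType> schema) {
        Map<String, ParquetType> out = new LinkedHashMap<>();
        for (Map.Entry<String, TajoType> column : schema.entrySet()) {
            out.put(column.getKey(), toParquet(column.getValue()));
        }
        return out;
    }

    static Map<String, TajoType> toTajoSchema(Map<String, ParquetType> schema) {
        Map<String, TajoType> out = new LinkedHashMap<>();
        for (Map.Entry<String, ParquetType> column : schema.entrySet()) {
            out.put(column.getKey(), toTajo(column.getValue()));
        }
        return out;
    }
}
```

The FileReader and FileWriter would then use a mapping like this to translate 
records column by column, in the same spirit as parquet-avro and 
parquet-thrift.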

Min, have you started working on Parquet support for Tajo? If not, would it be 
fine if I take this ticket?

Thanks!
David

> Parquet Integration
> -------------------
>
>                 Key: TAJO-30
>                 URL: https://issues.apache.org/jira/browse/TAJO-30
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>            Assignee: Dongmin Yu
>              Labels: Parquet
>
> Parquet is a very promising file format developed by Twitter. We need to 
> investigate the applicability of Parquet. If possible, we should implement 
> a Parquet port.
> http://parquet.io/



--
This message was sent by Atlassian JIRA
(v6.1#6144)
