[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814566#comment-13814566
]
David Chen commented on TAJO-30:
--------------------------------
Hi all,
I am new to the Tajo project. I am very excited about Tajo's capabilities and
am interested in contributing to the project.
I am one of the main engineers working on deploying Parquet at LinkedIn and
have made a number of contributions to the Parquet project such as adding
support for the FIXED_LEN_BYTE_ARRAY data type and a number of Avro support
improvements.
I am excited to see that Parquet support is planned for Tajo as well. Because
of Parquet's generic design, adding Tajo integration would mostly involve
writing a FileReader, a FileWriter, and a SchemaConverter so that Tajo can
automatically convert schemas and records to Tajo's internal representation
on the read side, and back to Parquet's representation on the write side.
This is the approach that most of the packages under parquet-mr take, such as
parquet-avro and parquet-thrift.
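
To make the schema-conversion piece concrete, here is a minimal, self-contained
sketch in Java. It is an illustration only: the class name, both enums, and
the type mapping are hypothetical stand-ins, not actual Tajo or parquet-mr
classes, and a real converter would also need to handle logical-type
annotations (for example, marking text columns as UTF-8) and nested records.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a schema converter. None of these types are real
 * Tajo or parquet-mr classes; they only mirror the shape of the problem.
 */
final class TajoParquetSchemaSketch {

    /** Assumed stand-in for a handful of Tajo column types. */
    enum TajoType { BOOLEAN, INT4, INT8, FLOAT4, FLOAT8, TEXT, BLOB }

    /** Assumed stand-in for a subset of Parquet primitive types. */
    enum ParquetType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY }

    private TajoParquetSchemaSketch() {}

    /** Write side: map one Tajo column type to a Parquet primitive type. */
    static ParquetType toParquet(TajoType type) {
        switch (type) {
            case BOOLEAN: return ParquetType.BOOLEAN;
            case INT4:    return ParquetType.INT32;
            case INT8:    return ParquetType.INT64;
            case FLOAT4:  return ParquetType.FLOAT;
            case FLOAT8:  return ParquetType.DOUBLE;
            case TEXT:    // text and raw bytes both land in BINARY here;
            case BLOB:    // a real converter would annotate TEXT as UTF-8
                return ParquetType.BINARY;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Convert a whole schema (column name to type), preserving column order. */
    static Map<String, ParquetType> toParquetSchema(Map<String, TajoType> tajoSchema) {
        Map<String, ParquetType> parquetSchema = new LinkedHashMap<>();
        for (Map.Entry<String, TajoType> column : tajoSchema.entrySet()) {
            parquetSchema.put(column.getKey(), toParquet(column.getValue()));
        }
        return parquetSchema;
    }
}
```

The read side would apply the inverse mapping, and the reader and writer
would then use the converted schema to translate individual records.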
Min, have you started working on Parquet support for Tajo? If not, would it be
fine if I take this ticket?
Thanks!
David
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: Dongmin Yu
> Labels: Parquet
>
> Parquet is a very promising file format developed by Twitter. We need to
> investigate the applicability of Parquet. If possible, we should implement
> a Parquet port.
> http://parquet.io/
--
This message was sent by Atlassian JIRA
(v6.1#6144)