[
https://issues.apache.org/jira/browse/TAJO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940105#comment-13940105
]
Hyunsik Choi commented on TAJO-30:
----------------------------------
Hi [~davidzchen],
I have a plan for nested schemas. Currently, Tajo only supports a flat schema,
like a relational DBMS. Since a flat schema is a special case of a nested one,
extending Tajo to a nested data model will not break compatibility.
I'm thinking that Tajo should adopt the Parquet data model (= that of protobuf
or BigQuery). When considering a nested data model, I had two main points in
mind, and the Parquet data model satisfies both. The first point is the
processing model for nested data. The Parquet data model is the same as that
of BigQuery, and BigQuery has already defined a concrete processing model,
including flattening, cross products over repeated fields, and aggregation
over repeated fields [1][2]. The second point is the file format. Parquet is a
native file format for this model, and it already includes an efficient record
assembly method. Besides, Parquet is already mature and widely used in many
systems.
[1] http://research.google.com/pubs/pub36632.html
[2] https://developers.google.com/bigquery/docs/data
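To make the processing model a bit more concrete, here is a rough sketch in
Python (not Tajo code) of flattening a record over its repeated fields, which
yields a cross product when more than one repeated field is involved, and of
a simple aggregation over a repeated field. The record layout and the names
flatten and count_within_record are hypothetical, and emitting a single None
for an empty repeated field is just an assumption of this sketch, not
necessarily what BigQuery or Parquet tooling does.

```python
from itertools import product


def flatten(record, repeated_fields):
    """Yield one flat row per combination of values of the repeated fields."""
    # Non-repeated (scalar) fields are copied into every output row.
    scalars = {k: v for k, v in record.items() if k not in repeated_fields}
    # Assumption of this sketch: an empty or missing repeated field
    # contributes a single None, so the record is not dropped.
    value_lists = [record.get(f) or [None] for f in repeated_fields]
    # product() over several repeated fields gives their cross product.
    for combo in product(*value_lists):
        row = dict(scalars)
        row.update(zip(repeated_fields, combo))
        yield row


def count_within_record(record, repeated_field):
    """Aggregate over a repeated field inside one record (here: a count)."""
    return len(record.get(repeated_field, []))


# Hypothetical nested record with two independent repeated fields.
record = {"id": 1, "tags": ["a", "b"], "scores": [10, 20, 30]}
rows = list(flatten(record, ["tags", "scores"]))
```

With this record, flattening over both tags and scores gives 2 x 3 flat rows,
while count_within_record keeps one result per nested record.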
I think we need three stages for this work. First, we can start with a small
change to improve our schema system. Then, we will add a physical operator
that simply flattens one nested row into a number of flat rows. Finally, we
will address query optimization issues such as projection/filter push-down on
nested schemas and add physical operators that process nested rows directly.
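As an illustration of the first and last stages, the following Python sketch
(again, not Tajo code) shows one possible shape for a nested schema and how a
projection could be pushed down by pruning the schema to the requested leaf
columns. The Field class, leaf_paths, prune, and the example doc schema are
all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str
    type: str = "record"      # a primitive type name, or "record" for a group
    repeated: bool = False
    children: List["Field"] = field(default_factory=list)


def leaf_paths(f, prefix=""):
    """List the dotted paths of all leaf columns under a field."""
    path = prefix + f.name
    if not f.children:
        return [path]
    out = []
    for child in f.children:
        out.extend(leaf_paths(child, path + "."))
    return out


def prune(f, wanted, prefix=""):
    """Return a copy of the schema keeping only the wanted leaf paths."""
    path = prefix + f.name
    if not f.children:
        return f if path in wanted else None
    kept = [p for p in (prune(c, wanted, path + ".") for c in f.children)
            if p is not None]
    # A group with no surviving children is dropped entirely.
    return Field(f.name, f.type, f.repeated, kept) if kept else None


# Hypothetical nested schema: a document with a repeated group of links.
doc = Field("doc", children=[
    Field("id", "int64"),
    Field("links", repeated=True, children=[
        Field("url", "text"),
        Field("rank", "int64"),
    ]),
])
projected = prune(doc, {"doc.links.url"})
```

A scanner given the pruned schema would only need to read the doc.links.url
column, which is the kind of saving a columnar format like Parquet is meant to
enable.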
If you have any ideas, feel free to share them with us.
Thanks,
Hyunsik
> Parquet Integration
> -------------------
>
> Key: TAJO-30
> URL: https://issues.apache.org/jira/browse/TAJO-30
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
> Assignee: David Chen
> Labels: Parquet
>
> Parquet is a very promising file format developed by Twitter. We need to
> investigate the applicability of Parquet. If possible, we will implement a
> Parquet port.
> http://parquet.io/