[
https://issues.apache.org/jira/browse/TAJO-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970429#comment-13970429
]
Hyunsik Choi commented on TAJO-711:
-----------------------------------
This is my comment on the concept of a schema-evolving table.
A few days ago, I discussed your idea with Hyoungjun offline. We were very
happy to see such an interesting idea. Hyoungjun gave me some additional
suggestions, and I have added some concrete ideas of my own to them.
Before discussing the idea, I'd like to state some assumptions and define some
terms.
* A partitioned table has a schema.
** Let us call this schema 'parent schema'.
* Each partition has its own schema.
** Let us call this schema 'partition schema'.
* Let us call this kind of table 'a schema-evolving table'.
(I know my naming sense is not great. These are temporary names, and I hope
someone will suggest better ones.)
The rough idea is as follows:
* Even though a schema is actually an ordered list of fields, we treat it as
just a set of fields when dealing with the relationship between the parent
schema and the partition schemas.
* The parent schema of a schema-evolving table must be a superset of the fields
in all partition schemas.
* The field set of each partition schema must be a subset of the parent schema.
* Fields with the same name must have the same data type in all partition
schemas, including the parent schema.
* Partition schemas may differ from one another.
* The order of fields may differ among partitions (because we treat the fields
as a set).
* Fields newly introduced by a new partition are appended to the end of the
parent schema.
** This schema maintenance is performed when 'ALTER TABLE ADD PARTITION' is
executed.
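To make the rules above concrete, here is a minimal Java sketch of how the parent schema could be maintained when 'ALTER TABLE ADD PARTITION' runs. The class {{SchemaEvolvingTable}} and its methods are hypothetical and not part of Tajo's catalog API. A schema is simplified to a map from field name to type name (e.g. INT8, TEXT); a LinkedHashMap keeps the parent's column order, while the checks treat the fields as a set.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the parent-schema rules (not Tajo's catalog API).
 * A schema is modelled as field name -> type name.
 */
class SchemaEvolvingTable {
    // Parent schema; insertion order is the column order.
    private final LinkedHashMap<String, String> parent = new LinkedHashMap<>();

    /** Read-only view of the parent schema, in column order. */
    Map<String, String> parentSchema() {
        return Collections.unmodifiableMap(parent);
    }

    /** Simulates ALTER TABLE ADD PARTITION: validate types, then append new fields. */
    void addPartition(Map<String, String> partitionSchema) {
        // Pass 1: fields with the same name must have the same data type.
        for (Map.Entry<String, String> f : partitionSchema.entrySet()) {
            String existing = parent.get(f.getKey());
            if (existing != null && !existing.equals(f.getValue())) {
                throw new IllegalArgumentException("type conflict on field '"
                        + f.getKey() + "': " + existing + " vs " + f.getValue());
            }
        }
        // Pass 2: append fields the parent has not seen yet to its tail.
        // putIfAbsent leaves existing keys where they are and adds new keys at the end.
        for (Map.Entry<String, String> f : partitionSchema.entrySet()) {
            parent.putIfAbsent(f.getKey(), f.getValue());
        }
    }

    /** True if the partition schema is a subset of the parent with matching types. */
    boolean isCompatible(Map<String, String> partitionSchema) {
        for (Map.Entry<String, String> f : partitionSchema.entrySet()) {
            if (!f.getValue().equals(parent.get(f.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SchemaEvolvingTable t = new SchemaEvolvingTable();
        Map<String, String> p1 = new LinkedHashMap<>();
        p1.put("id", "INT8");
        p1.put("name", "TEXT");
        t.addPartition(p1);

        // Different field order and one new field; still compatible.
        Map<String, String> p2 = new LinkedHashMap<>();
        p2.put("email", "TEXT");
        p2.put("id", "INT8");
        t.addPartition(p2);

        System.out.println(t.parentSchema().keySet()); // [id, name, email]
    }
}
```

Validating every field before appending anything means a partition with a type conflict is rejected without leaving the parent schema half-updated.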
In the planning phase, Tajo will use only the parent schema, and it will then
rewrite the projection plan for each partition if needed. If a field required
by a query does not exist in a given partition, that field will be treated as
NULL when processing that partition.
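The per-partition projection rewrite could look roughly like the following Java sketch. {{PartitionProjector}} is a hypothetical helper, not Tajo's planner or executor API. It resolves, once per partition, where each field requested from the parent schema lives in that partition's tuples, and emits null for any field the partition lacks.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: project a partition's tuple onto the fields requested
 * from the parent schema, filling NULL where the partition has no such field.
 */
class PartitionProjector {
    // For each requested column, its index in the partition tuple, or -1 if absent.
    private final int[] sourceIndex;

    PartitionProjector(List<String> requestedFields, List<String> partitionFields) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < partitionFields.size(); i++) {
            position.put(partitionFields.get(i), i);
        }
        sourceIndex = new int[requestedFields.size()];
        for (int i = 0; i < requestedFields.size(); i++) {
            sourceIndex[i] = position.getOrDefault(requestedFields.get(i), -1);
        }
    }

    /** Rewrites one partition tuple into the query's column order. */
    Object[] project(Object[] partitionTuple) {
        Object[] out = new Object[sourceIndex.length];
        for (int i = 0; i < sourceIndex.length; i++) {
            // A missing field becomes NULL for this partition.
            out[i] = sourceIndex[i] >= 0 ? partitionTuple[sourceIndex[i]] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        // The query asks for parent-schema fields; this partition stores
        // "name" before "id" and has no "email" field at all.
        PartitionProjector p = new PartitionProjector(
                List.of("id", "name", "email"), List.of("name", "id"));
        System.out.println(Arrays.toString(p.project(new Object[] {"alice", 1L})));
        // [1, alice, null]
    }
}
```

Resolving field positions once per partition keeps the per-row work to a simple array lookup, which is one way the per-partition projection rewrite could be made cheap.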
> Add Avro storage support
> ------------------------
>
> Key: TAJO-711
> URL: https://issues.apache.org/jira/browse/TAJO-711
> Project: Tajo
> Issue Type: New Feature
> Reporter: David Chen
> Assignee: David Chen
> Attachments: TAJO-711.patch, TAJO-711.patch,
> TAJO-711_140415_rebased.patch, TAJO-711_20140413_20:36:40.patch,
> TAJO-711_20140413_21:00:34.patch, TAJO-711_20140413_21:46:27.patch,
> TAJO-711_20140414_11:07:13.patch, TAJO-711_20140415_11:13:43.patch
>
>
> Add {{FileScanner}} and {{FileAppender}} for reading from and writing to Avro.
--
This message was sent by Atlassian JIRA
(v6.2#6252)