[jira] [Comment Edited] (TAJO-711) Add Avro storage support

Hyunsik Choi (JIRA) Tue, 15 Apr 2014 22:15:31 -0700

    [ 
https://issues.apache.org/jira/browse/TAJO-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970429#comment-13970429
 ]


Hyunsik Choi edited comment on TAJO-711 at 4/16/14 5:13 AM:
------------------------------------------------------------

This is my comment on the concept of a schema-evolving table.

A few days ago, I discussed your idea with Hyoungjun offline. We were very 
happy to see your interesting idea. I got some additional suggestions from 
Hyoungjun, and I have added some concrete ideas of my own to them.

I'd like to state some assumptions and define some terms before I discuss the 
idea.

 * A partitioned table has a schema.
  ** Let us call this schema 'parent schema'.
 * Each partition has its own schema.
  ** Let us call this schema 'partition schema'.
 * Let us call this kind of table 'a schema-evolving table'.

 (I know that these names are not great. They are temporary, and I hope that 
someone suggests better ones.)

The rough idea is as follows:

 * Even though a schema is actually an ordered set of fields, we treat the 
schema as just a set of fields when we deal with the relationship between the 
parent schema and the partition schemas.
 * The schema of a schema-evolving table must be a superset of all fields in 
the partition schemas.
 * The field set of each partition schema must be a subset of the parent 
schema.
 * Fields with the same name in any partition schema or in the parent schema 
must have the same data type.
 * Partition schemas can differ from one another.
 * The order of schema fields can differ among partitions. (This is because 
we treat the fields as a set.)
 * Fields newly introduced by a new partition are appended to the tail of the 
parent schema.
   ** This schema maintenance will be performed when 'ALTER TABLE ADD 
PARTITION' is executed.
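
To make the rules above concrete, here is a minimal sketch of the maintenance 
step that could run on 'ALTER TABLE ADD PARTITION'. It is not Tajo code: the 
function name add_partition_schema, the SchemaConflictError class, and the 
representation of a schema as a list of (name, type) pairs are all 
hypothetical, and the type names are only illustrative strings.

```python
from typing import List, Tuple

# Hypothetical representation: a schema is an ordered list of (name, type) pairs.
Field = Tuple[str, str]


class SchemaConflictError(ValueError):
    """Raised when a field name is reused with a different data type."""


def add_partition_schema(parent: List[Field], partition: List[Field]) -> List[Field]:
    """Return the parent schema after registering a new partition schema.

    Fields are matched by name only, so the order of fields in the
    partition schema does not matter. Fields that the parent does not
    know yet are appended to the tail of the parent schema.
    """
    parent_types = dict(parent)
    merged = list(parent)
    for name, dtype in partition:
        if name in parent_types:
            # Same-name fields must have the same data type.
            if parent_types[name] != dtype:
                raise SchemaConflictError(
                    "field %r is %s in the parent schema but %s in the partition"
                    % (name, parent_types[name], dtype)
                )
        else:
            # A newly introduced field goes to the tail of the parent schema.
            merged.append((name, dtype))
            parent_types[name] = dtype
    return merged


parent = [("id", "INT4"), ("name", "TEXT")]
parent = add_partition_schema(parent, [("score", "FLOAT8"), ("id", "INT4")])
print(parent)
```

With this sketch the parent schema stays a superset of every partition schema 
by construction, and existing field positions in the parent never change.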

In the planning phase, Tajo will use only the parent schema, and then it will 
rewrite the projection plan for each partition if needed. When a field 
required by a query has no corresponding field in a certain partition, that 
field will be NULL while the partition is processed. As a result, processing 
multiple partitions with different schemas will output tuples with the same 
schema via the same projection.
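
The per-partition rewrite could look roughly like the sketch below. Again, 
this is only an illustration under my own assumptions, not the planner's 
actual code: build_projection and project are hypothetical names, rows are 
plain tuples, and None stands in for NULL.

```python
from typing import List, Optional, Sequence, Tuple


def build_projection(required: Sequence[str],
                     partition_fields: Sequence[str]) -> List[Optional[int]]:
    """Map each required field to its position in the partition schema.

    A required field that the partition does not have maps to None,
    which later becomes a NULL value in the output tuple.
    """
    position = {name: i for i, name in enumerate(partition_fields)}
    return [position.get(name) for name in required]


def project(row: Sequence[object], projection: Sequence[Optional[int]]) -> Tuple:
    """Apply a rewritten projection to one row of a partition."""
    return tuple(None if i is None else row[i] for i in projection)


# The query needs these fields, resolved against the parent schema.
required = ["id", "score"]

# Two partitions with different schemas and different field orders.
old_partition = build_projection(required, ["id", "name"])
new_partition = build_projection(required, ["score", "id"])

print(project((1, "a"), old_partition))   # 'score' is missing in this partition
print(project((9.5, 2), new_partition))
```

Both partitions yield tuples shaped like (id, score), so operators above the 
scan only ever see the schema derived from the parent schema.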


> Add Avro storage support
> ------------------------
>
>                 Key: TAJO-711
>                 URL: https://issues.apache.org/jira/browse/TAJO-711
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: David Chen
>            Assignee: David Chen
>         Attachments: TAJO-711.patch, TAJO-711.patch, 
> TAJO-711_140415_rebased.patch, TAJO-711_20140413_20:36:40.patch, 
> TAJO-711_20140413_21:00:34.patch, TAJO-711_20140413_21:46:27.patch, 
> TAJO-711_20140414_11:07:13.patch, TAJO-711_20140415_11:13:43.patch
>
>
> Add {{FileScanner}} and {{FileAppender}} for reading from and writing to Avro.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
