[jira] [Commented] (TAJO-283) Add Table Partitioning

Hyunsik Choi (JIRA) Mon, 16 Dec 2013 02:57:04 -0800

    [ https://issues.apache.org/jira/browse/TAJO-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849020#comment-13849020 ]


Hyunsik Choi commented on TAJO-283:
-----------------------------------

Currently, as you mentioned, we have implemented only the DDL and 'INSERT 
OVERWRITE INTO' for partitioned tables. 

Please take a look at the TestInsertQuery::testInsertOverwritePartition* unit 
tests. First, you need to create a partitioned table with a 'PARTITION BY' 
clause. Then, you can store data into the partitioned table by executing an 
'INSERT OVERWRITE INTO' statement. Later, Tajo will also support CTAS with a 
partitioned table.

Basically, the logical planner knows whether or not an insert statement 
targets a partitioned table. If it does, StoreTableNode will have the 
partition information. Then, PhysicalPlannerImpl::createStorePlan chooses a 
proper partitioned store executor. Note that Tajo has also used the word 
'partition' to mean shuffle, so the names can be very confusing. We will 
refactor those names as soon as possible.
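
The dispatch described above can be pictured with a minimal sketch. It is 
not Tajo's code: StorePlanSketch, StoreNode, StoreExec, and hasPartition are 
hypothetical names invented for illustration, and the real StoreTableNode and 
PhysicalPlannerImpl::createStorePlan may look quite different. It only shows 
the idea of choosing a store executor depending on whether the logical node 
carries partition information.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types only imitate the roles described above
// and are NOT Tajo's real StoreTableNode or PhysicalPlannerImpl.
final class StorePlanSketch {

    /** Stand-in for a logical store node; partitionColumns is empty for a plain table. */
    static final class StoreNode {
        final String tableName;
        final List<String> partitionColumns;

        StoreNode(String tableName, String... partitionColumns) {
            this.tableName = tableName;
            this.partitionColumns = Arrays.asList(partitionColumns);
        }

        /** True when the node carries partition information. */
        boolean hasPartition() {
            return !partitionColumns.isEmpty();
        }
    }

    /** Stand-in for a physical store executor. */
    interface StoreExec {
        String describe();
    }

    /** Picks an executor based on whether the node carries partition information. */
    static StoreExec createStorePlan(final StoreNode node) {
        if (node.hasPartition()) {
            return () -> "partitioned store into " + node.tableName
                    + " by " + node.partitionColumns;
        }
        return () -> "plain store into " + node.tableName;
    }
}
```

In this sketch, a node created without partition columns gets the plain 
executor, and a node created with at least one partition column gets the 
partitioned one.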

In addition, partitioned tables in Tajo are still under heavy development. 
Now, I'm implementing the query optimization part for partition pruning. I 
think that the partitioned table code needs more refactoring and refinement 
steps.
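
For readers unfamiliar with the term, partition pruning can be sketched 
roughly as follows. This is only an illustration and not the optimizer code 
mentioned above: PartitionPruningSketch and prune are hypothetical names, and 
the "column=value" directory naming is an assumption borrowed from the 
Hive-style layout, which Tajo may or may not use in this form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of partition pruning; not Tajo's optimizer code.
final class PartitionPruningSketch {

    /**
     * Keeps only the partition paths that contain a "column=value" segment
     * matching an equality predicate, so the other partitions are never scanned.
     * The "column=value" path layout is an assumption (Hive-style naming).
     */
    static List<String> prune(List<String> partitionPaths, String column, String value) {
        String wanted = column + "=" + value;
        List<String> kept = new ArrayList<>();
        for (String path : partitionPaths) {
            // A path may have several levels, e.g. "year=2013/month=12".
            for (String segment : path.split("/")) {
                if (segment.equals(wanted)) {
                    kept.add(path);
                    break;
                }
            }
        }
        return kept;
    }
}
```

A real optimizer would evaluate arbitrary predicates against partition keys 
recorded in the catalog. The sketch handles a single equality predicate only 
to keep the idea visible.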

> Add Table Partitioning
> ----------------------
>
>                 Key: TAJO-283
>                 URL: https://issues.apache.org/jira/browse/TAJO-283
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, physical operator, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>
> Table partitioning provides many facilities for maintaining large tables. 
> First of all, it enables the data management system to prune input data that 
> is not actually necessary. In addition, it gives the system more optimization 
> opportunities that exploit the physical layout.
> Basically, Tajo should follow the RDBMS-style partitioning system, including 
> range, list, hash, and so on. In order to keep Hive compatibility, we need to 
> add a Hive partition type that does not exist in existing DBMS systems.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
