[ 
https://issues.apache.org/jira/browse/TAJO-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010914#comment-14010914
 ] 

Hyunsik Choi commented on TAJO-20:
----------------------------------

{{INSERT OVERWRITE INTO}}, which removes all existing table data and inserts new 
data, is already implemented in the current Tajo, so the grammar and many other 
parts are in place. However, the {{INSERT INTO}} statement, which preserves 
existing data and adds new data, is not implemented yet. This feature is 
necessary, and it would be very nice if someone took this issue.

As you asked, here is a more detailed description.

Many parts are already implemented in the current Tajo. The key to this issue 
is to determine the file name pattern for newly written data files and to 
enable each task to write its output under the determined file name. Currently, 
each worker names its files {{part-<execution block id>-<queryunit id>}}, where 
a query unit corresponds to a task in MapReduce (MR).

Example:
{code}
part-02-000001
part-02-000002
{code}
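
A minimal Java sketch of how such names could be produced is shown here for illustration. The class and method names are hypothetical, not Tajo's actual code, and the zero-padding widths (2 and 6 digits) are only inferred from the example above.

```java
// Hypothetical helper; not part of Tajo's code base.
final class PartFileName {
    private PartFileName() {}

    // Builds a name of the form part-<execution block id>-<queryunit id>.
    // The padding widths are an assumption taken from the example names.
    static String format(int executionBlockId, int queryUnitId) {
        return String.format("part-%02d-%06d", executionBlockId, queryUnitId);
    }
}
```

For instance, {{PartFileName.format(2, 1)}} yields {{part-02-000001}}.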

If possible, it would be nice for newly written file names to continue from the 
last written file name. However, this approach may require non-trivial changes. 

We can get the last file name in GlobalEngine in TajoMaster, and we can convey 
the filename prefix and the last number via the {{QueryContext}} object, which 
is propagated throughout all paths of a query. As I mentioned above, each query 
unit generates its output filename according to the query unit id (i.e., task 
id). To continue from the number of the last written file, we need to modify 
the file name only if the filename prefix and last number are given.
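
The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: {{AppendNaming}}, {{LastFile}}, {{parse}}, and {{outputName}} are hypothetical names, the real {{QueryContext}} API is not used, query unit ids are assumed to start at 1, and the number is assumed to be the part after the final dash of the file name.

```java
// Hypothetical sketch; none of these names exist in Tajo.
final class AppendNaming {
    private AppendNaming() {}

    // Prefix and trailing number taken from the last existing file name.
    static final class LastFile {
        final String prefix;
        final int lastNumber;

        LastFile(String prefix, int lastNumber) {
            this.prefix = prefix;
            this.lastNumber = lastNumber;
        }
    }

    // Splits e.g. "part-02-000002" into prefix "part-02-" and number 2.
    // Assumes the number follows the final dash.
    static LastFile parse(String lastFileName) {
        int dash = lastFileName.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("unexpected file name: " + lastFileName);
        }
        String prefix = lastFileName.substring(0, dash + 1);
        int number = Integer.parseInt(lastFileName.substring(dash + 1));
        return new LastFile(prefix, number);
    }

    // Without a LastFile, keeps the current naming scheme. With one,
    // continues numbering after the last written file, offset by the
    // query unit id (assumed to start at 1).
    static String outputName(LastFile last, int executionBlockId, int queryUnitId) {
        if (last == null) {
            return String.format("part-%02d-%06d", executionBlockId, queryUnitId);
        }
        return String.format("%s%06d", last.prefix, last.lastNumber + queryUnitId);
    }
}
```

In this sketch the master would call {{parse}} once on the last file name and pass the result along with the query, while each task would call {{outputName}} with its own query unit id. An ordinary query passes no {{LastFile}} and keeps today's file names.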

This description is just my idea. Feel free to suggest your own. 

Best regards,
Hyunsik

> INSERT INTO ... SELECT
> ----------------------
>
>                 Key: TAJO-20
>                 URL: https://issues.apache.org/jira/browse/TAJO-20
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>
> We should support 'INSERT INTO ... SELECT' statement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
