[ https://issues.apache.org/jira/browse/TAJO-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010914#comment-14010914 ]
Hyunsik Choi commented on TAJO-20:
----------------------------------
{{INSERT OVERWRITE INTO}}, which removes all table data and inserts new data,
is already implemented in the current Tajo, so the grammar and many other parts
are in place. However, the {{INSERT INTO}} statement, which preserves existing
data and adds new data, is not implemented. This feature is necessary. It would
be very nice if someone took this issue.
As you asked, here is a more detailed description.
Many parts are already implemented in the current Tajo. The key to this issue
is to determine the file name pattern used for newly written data files and to
enable each task to write its output under the determined file names.
Currently, each worker writes its files as
{{part-<execution block id>-<queryunit id>}}, where a query unit corresponds to
a Task in MR.
Example:
{code}
part-02-000001
part-02-000002
{code}
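As a rough sketch of that naming pattern: the class and method names below are
hypothetical, and the zero-padding widths are only inferred from the example
names above, not taken from Tajo's source.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tajo: builds names of the form
// part-<execution block id>-<queryunit id>.
final class PartFileNames {
    private PartFileNames() {}

    // Assumption: 2 digits for the execution block id and 6 digits for the
    // query unit id, as suggested by "part-02-000001".
    static String format(int executionBlockId, int queryUnitId) {
        return String.format(Locale.ROOT, "part-%02d-%06d",
                executionBlockId, queryUnitId);
    }
}
```

For instance, {{PartFileNames.format(2, 1)}} would give {{part-02-000001}}.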
If possible, it would be nice if newly written file names continued from the
last written file name. However, this approach may require non-trivial changes.
We can get the last file name in GlobalEngine in TajoMaster, and we can convey
the filename prefix and the last number via the {{QueryContext}} object, which
is propagated throughout all paths of a query. As I mentioned above, each query
unit generates its output filename according to the query unit id (i.e., task
id). To continue from the number of the last written file, we need to modify
the file name only if the filename prefix and last number are given.
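To make the idea concrete, here is a minimal sketch. Everything in it is an
assumption for illustration: the class and method names are hypothetical, the
prefix is taken to be everything before the last '-' of the last file name,
query unit ids are assumed to start at 1, and nothing here uses the real
{{QueryContext}} API.

```java
import java.util.Locale;

// Hypothetical sketch, not Tajo code.
final class AppendFileNames {
    private AppendFileNames() {}

    // Prefix of the last written file, e.g. "part-02" for "part-02-000002".
    static String prefixOf(String lastFileName) {
        return lastFileName.substring(0, lastFileName.lastIndexOf('-'));
    }

    // Trailing number of the last written file, e.g. 2 for "part-02-000002".
    static int lastNumberOf(String lastFileName) {
        return Integer.parseInt(
                lastFileName.substring(lastFileName.lastIndexOf('-') + 1));
    }

    // lastPrefix and lastNumber stand in for values that would be conveyed
    // to each task; both are null when the query is not an INSERT INTO.
    static String outputName(int executionBlockId, int queryUnitId,
                             String lastPrefix, Integer lastNumber) {
        if (lastPrefix == null || lastNumber == null) {
            // Nothing was conveyed: keep the current naming scheme.
            return String.format(Locale.ROOT, "part-%02d-%06d",
                    executionBlockId, queryUnitId);
        }
        // Continue numbering after the last written file.
        return String.format(Locale.ROOT, "%s-%06d",
                lastPrefix, lastNumber + queryUnitId);
    }
}
```

With {{part-02-000002}} as the last written file, the task with query unit id
1 would write {{part-02-000003}} under these assumptions, while a query that
conveys no prefix and number keeps the existing names.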
This is just my idea. Feel free to suggest your own.
Best regards,
Hyunsik
> INSERT INTO ... SELECT
> ----------------------
>
> Key: TAJO-20
> URL: https://issues.apache.org/jira/browse/TAJO-20
> Project: Tajo
> Issue Type: New Feature
> Reporter: Hyunsik Choi
>
> We should support 'INSERT INTO ... SELECT' statement.