[ 
https://issues.apache.org/jira/browse/HIVE-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593030#comment-15593030
 ] 

Sahil Takiar commented on HIVE-14535:
-------------------------------------

Hey [~sershe],

Very interesting feature. I think this could also benefit Hive-on-S3 write
performance (ref: HIVE-14269), particularly the changes to the
{{FileSinkOperator}}. If I understand correctly, the changes cause the
{{FileSinkOperator}} to write directly to the final Hive table location rather
than to a staging directory. On blobstores like S3, this should significantly
improve performance, since data no longer needs to be copied from a staging
directory to the final directory. We were thinking of implementing something
similar in HIVE-14271. Do you think it would be reasonable to commit the
changes to the {{FileSinkOperator}} without the rest of the MM tables support?
I know there are some concerns that this "direct output committer" approach
could cause data corruption issues. Was this considered explicitly in the
design? If so, could you expand on why those data corruption issues would
occur?
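
To make the staging-vs-direct trade-off concrete, here is a minimal sketch (not Hive code) that uses a local temp directory as a stand-in for HDFS/S3. The names run_attempt, visible_rows, simulate, the "_staging" directory and the part-file naming are all hypothetical; the reader skipping names that start with "_" or "." assumes Hadoop's hidden-file convention. It shows one likely source of the corruption concern: with direct writes, a crashed task attempt leaves a partial file in the table directory, so after a retry readers see duplicated rows.

```python
import os
import shutil
import tempfile

STAGING = "_staging"  # hypothetical staging dir; leading "_" hides it from readers


def run_attempt(final_dir, task, attempt, rows, direct, crash_after=None):
    """Simulate one task attempt writing one data file.

    direct=True writes straight into final_dir; direct=False writes into a
    hidden staging dir and only moves the file into final_dir on success.
    crash_after, if set, aborts the attempt after that many rows.
    """
    out_dir = final_dir if direct else os.path.join(final_dir, STAGING)
    os.makedirs(out_dir, exist_ok=True)
    name = f"part-{task:05d}-attempt{attempt}"  # hypothetical naming scheme
    with open(os.path.join(out_dir, name), "w") as f:
        for i, row in enumerate(rows):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("attempt crashed")  # partial file is left behind
            f.write(row + "\n")
    if not direct:
        # Commit: move from staging into the final location. On HDFS this is a
        # rename; on S3 it turns into copy + delete, which is the slow part.
        os.replace(os.path.join(out_dir, name), os.path.join(final_dir, name))


def visible_rows(final_dir):
    """Read every non-hidden file, skipping names starting with '_' or '.'."""
    rows = []
    for name in sorted(os.listdir(final_dir)):
        path = os.path.join(final_dir, name)
        if name.startswith(("_", ".")) or not os.path.isfile(path):
            continue
        with open(path) as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows


def simulate(direct):
    """Attempt 0 crashes mid-write, attempt 1 (the retry) succeeds."""
    final_dir = tempfile.mkdtemp()
    try:
        try:
            run_attempt(final_dir, 0, 0, ["a", "b", "c"], direct, crash_after=2)
        except RuntimeError:
            pass  # the framework would now retry task 0
        run_attempt(final_dir, 0, 1, ["a", "b", "c"], direct)
        return visible_rows(final_dir)
    finally:
        shutil.rmtree(final_dir)


print("staging:", simulate(direct=False))  # staging: ['a', 'b', 'c']
print("direct: ", simulate(direct=True))   # direct:  ['a', 'b', 'a', 'b', 'c']
```

In this sketch the direct path avoids the copy but exposes leftovers from failed or speculative attempts, because readers trust whatever is in the directory. Judging by the issue title, MM tables avoid relying on directory contents by having the metastore keep track of which files belong to the table.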

> add micromanaged tables to Hive (metastore keeps track of the files)
> --------------------------------------------------------------------
>
>                 Key: HIVE-14535
>                 URL: https://issues.apache.org/jira/browse/HIVE-14535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Design doc: 
> https://docs.google.com/document/d/1b3t1RywfyRb73-cdvkEzJUyOiekWwkMHdiQ-42zCllY
> Feel free to comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
