[ https://issues.apache.org/jira/browse/HIVE-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593030#comment-15593030 ]
Sahil Takiar commented on HIVE-14535:
-------------------------------------

Hey [~sershe],

Very interesting feature. I think this could also benefit Hive-on-S3 write performance (ref: HIVE-14269), particularly the changes to the {{FileSinkOperator}}. If I understand correctly, the changes cause the {{FileSinkOperator}} to write directly to the final Hive table location rather than to a staging directory. On blobstores (like S3), this should significantly improve performance, since data doesn't need to be copied from a staging directory to the final directory. We were thinking of implementing something similar in HIVE-14271.

Do you think it would be reasonable to commit the changes to the {{FileSinkOperator}} without the rest of the MM tables support?

I know there are some concerns that this "direct output committer" approach could cause data corruption issues. Is this something that was considered explicitly in the design? If so, could you expand on why those data corruption issues would occur?

> add micromanaged tables to Hive (metastore keeps track of the files)
> --------------------------------------------------------------------
>
>                 Key: HIVE-14535
>                 URL: https://issues.apache.org/jira/browse/HIVE-14535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Design doc: https://docs.google.com/document/d/1b3t1RywfyRb73-cdvkEzJUyOiekWwkMHdiQ-42zCllY
> Feel free to comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)