[ https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912828#action_12912828 ]
Vaibhav Aggarwal commented on HIVE-1620:
----------------------------------------

I tried to create a new class, S3FileSinkOperator, but extending the existing FileSinkOperator class turns out to involve a lot of complexity. Most of the required changes are in the createBucketFiles() method. Overriding that method would duplicate a lot of code, which would be very hard to maintain. The method needs to be refactored into smaller methods before FileSinkOperator can be extended cleanly. I should be able to do that, but it seems to defeat the purpose of not changing FileSinkOperator much.

Please let me know if you are OK with refactoring the FileSinkOperator class into smaller methods.

Based on my investigation, I still feel that the current approach is better. You will notice that there are very few changes to FileSinkOperator in the current patch. I have just introduced a new variable, "fsSupportsMove", which is always used in parallel with isNativeTable (an existing boolean variable). The only reason I chose not to reuse the isNativeTable variable is to allow the functionality of non-native tables to grow independently of file systems that do not support move.

Please review the patch one more time with the above argument in mind and let me know which approach you think is best.

Thanks
Vaibhav

> Patch to write directly to S3 from Hive
> ---------------------------------------
>
>                 Key: HIVE-1620
>                 URL: https://issues.apache.org/jira/browse/HIVE-1620
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly to S3.
> This patch allows users to specify an S3 location as the table output location
> and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.
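
For readers following the design argument in the comment above, here is a minimal sketch of what a flag such as "fsSupportsMove" could gate. It is not the code from HIVE-1620.patch. The class SketchFileSink, its methods, and the "_tmp." prefix are hypothetical names chosen for illustration, and the sketch assumes the sink normally stages output in a temporary path and renames it on commit, which is the step a store without a cheap move would skip.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a file sink; this is NOT the real Hive FileSinkOperator. */
class SketchFileSink {
    // Assumption: false for stores (such as S3) where a move is not a cheap metadata operation.
    protected final boolean fsSupportsMove;
    // Records the file system actions that would be taken, so the behavior is easy to inspect.
    protected final List<String> actions = new ArrayList<>();

    SketchFileSink(boolean fsSupportsMove) {
        this.fsSupportsMove = fsSupportsMove;
    }

    /** Builds an illustrative temporary path next to the final path. */
    String tmpPathFor(String finalPath) {
        int slash = finalPath.lastIndexOf('/');
        return finalPath.substring(0, slash + 1) + "_tmp." + finalPath.substring(slash + 1);
    }

    /** Path the writer opens: a temporary path if a move is possible later, else the final path. */
    String outputPathFor(String finalPath) {
        return fsSupportsMove ? tmpPathFor(finalPath) : finalPath;
    }

    /** Writes the file and commits it; the rename happens only when output was staged. */
    List<String> writeAndCommit(String finalPath) {
        String out = outputPathFor(finalPath);
        actions.add("write " + out);
        if (fsSupportsMove) {
            actions.add("rename " + out + " -> " + finalPath);
        }
        return actions;
    }
}
```

With the flag set to true the sketch records a write to the temporary path followed by a rename; with it set to false it records a single write to the final location. Keeping this as a separate boolean, rather than reusing isNativeTable, matches the reasoning in the comment: the two conditions can then diverge later without touching each other's code paths.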
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.