[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

Vaibhav Aggarwal (JIRA) Mon, 20 Sep 2010 20:47:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912828#action_12912828
 ]


Vaibhav Aggarwal commented on HIVE-1620:
----------------------------------------

I tried to create a new class, S3FileSinkOperator, but extending the existing
FileSinkOperator class turns out to involve a lot of complexity.

Most of the required changes are in the createBucketFiles() method.
Overriding that method would lead to a lot of duplicated code that would be
very hard to maintain. To extend FileSinkOperator cleanly, that method needs to
be refactored into smaller methods. I could do that, but it seemed to defeat
the purpose of not changing FileSinkOperator much. Please let me know whether
you are OK with refactoring the FileSinkOperator class into smaller methods.
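
To make the refactoring option concrete, here is a minimal sketch of the
template-method idea: the large createBucketFiles() becomes a short driver that
calls small protected hooks, so a subclass overrides only the step that differs.
SketchFileSinkOperator, SketchS3FileSinkOperator, bucketFileName() and
outputDir() are hypothetical names invented for illustration; they are not
Hive's actual classes or methods, and the real createBucketFiles() does far
more (serializers, record writers, etc.). The file and directory naming is
illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for FileSinkOperator.
class SketchFileSinkOperator {

    // The big method becomes a short driver that calls small, overridable steps.
    public final List<String> createBucketFiles(String finalDir, int numBuckets) {
        List<String> paths = new ArrayList<>();
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            paths.add(outputDir(finalDir) + "/" + bucketFileName(bucket));
        }
        return paths;
    }

    // Hook: file name for one bucket (illustrative naming only).
    protected String bucketFileName(int bucket) {
        return String.format("%06d_0", bucket);
    }

    // Hook: directory the task writes into. Default behaviour: a temporary
    // directory whose contents are moved to the final location at commit time.
    protected String outputDir(String finalDir) {
        return finalDir + "/_tmp";
    }
}

// Hypothetical subclass: only the step that differs for S3 is overridden,
// so none of the driver logic has to be copied.
class SketchS3FileSinkOperator extends SketchFileSinkOperator {
    @Override
    protected String outputDir(String finalDir) {
        return finalDir; // write straight to the final location, no move step
    }
}
```

The cost of this design is exactly the concern raised above: FileSinkOperator
itself has to be split up before any subclass can benefit.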

Based on my investigations, I still feel that the current approach is better.
You will notice that the current patch makes very few changes to
FileSinkOperator.
I have just introduced a new variable, "fsSupportsMove", which is always
parallel to isNativeTable (an existing boolean variable).
The only reason I chose not to reuse the isNativeTable variable is to allow the
functionality of non-native tables to evolve independently of support for file
systems that cannot move files.
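
A minimal sketch of why two flags are kept instead of one, assuming (this is
an interpretation, not taken from the patch) that "parallel" means the new flag
is checked in the same places as isNativeTable, and that output goes to a
temporary location and is moved at commit only when both are true. The class
SinkFlagsSketch and the helpers schemeSupportsMove() and writeToTmpThenMove()
are hypothetical, as is the rule that s3:// and s3n:// URIs do not support a
cheap move.

```java
// Hypothetical sketch, not Hive code.
class SinkFlagsSketch {
    final boolean isNativeTable;   // property of the table (native vs. non-native)
    final boolean fsSupportsMove;  // property of the target file system

    SinkFlagsSketch(boolean isNativeTable, boolean fsSupportsMove) {
        this.isNativeTable = isNativeTable;
        this.fsSupportsMove = fsSupportsMove;
    }

    // Assumption: S3 URIs are treated as not supporting move; everything
    // else (e.g. hdfs://) is assumed to support it.
    static boolean schemeSupportsMove(String uri) {
        return !(uri.startsWith("s3://") || uri.startsWith("s3n://"));
    }

    // Write to a temporary location and move at commit time only when the
    // table is native AND the file system supports move; otherwise write
    // directly to the final location. Because the flags are separate, the
    // non-native-table logic can change without affecting the S3 case.
    boolean writeToTmpThenMove() {
        return isNativeTable && fsSupportsMove;
    }
}
```

Reusing isNativeTable alone would make an S3-backed native table look like a
non-native table, coupling two concerns that may need to diverge later.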

Please review the patch once more with the above argument in mind, and let me
know which approach you think is best.

Thanks
Vaibhav

> Patch to write directly to S3 from Hive
> ---------------------------------------
>
>                 Key: HIVE-1620
>                 URL: https://issues.apache.org/jira/browse/HIVE-1620
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly
> to S3.
> This patch allows users to specify an S3 location as the table output
> location, and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch makes Hive's integration with S3 better and faster.
