[jira] [Updated] (HIVE-14165) Enable faster S3 Split Computation

Abdullah Yousufi (JIRA) Wed, 27 Jul 2016 11:30:04 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Abdullah Yousufi updated HIVE-14165:
------------------------------------
    Description: Split size computation may be improved by the optimizations 
for listFiles() in HADOOP-13208 (was: During split computation when a large 
number of files are required to be listed from S3, instead of executing 1 API 
call per file, one can optimize by listing 1000 files in each API call. This 
would reduce the amount of time required for listing files.

Qubole has this optimization in place as detailed here: 
https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0)
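
The saving described in the earlier description can be illustrated with a rough
sketch. It is not Hive or Hadoop code: the function names and the example key
prefix are hypothetical, and the page size of 1000 is taken from the
description above. It only counts how many requests each strategy would issue.

```python
import math

# Keys returned per list call, as stated in the description above.
PAGE_SIZE = 1000


def calls_per_file(num_files):
    """Requests needed when each file is looked up individually."""
    return num_files


def calls_batched(num_files, page_size=PAGE_SIZE):
    """Requests needed when each call returns up to page_size files."""
    return math.ceil(num_files / page_size)


def list_in_pages(keys, page_size=PAGE_SIZE):
    """Yield successive pages of keys, mimicking a paginated listing."""
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


# Hypothetical table layout with 2500 files under one prefix.
keys = [f"warehouse/t/part-{i:05d}" for i in range(2500)]
pages = list(list_in_pages(keys))
print(calls_per_file(len(keys)), calls_batched(len(keys)), len(pages))
```

Under these assumptions, 2500 files take 2500 requests one at a time but only
3 requests when listed in pages of 1000.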

> Enable faster S3 Split Computation
> ----------------------------------
>
>                 Key: HIVE-14165
>                 URL: https://issues.apache.org/jira/browse/HIVE-14165
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Abdullah Yousufi
>            Assignee: Abdullah Yousufi
>
> Split size computation may be improved by the optimizations for listFiles() 
> in HADOOP-13208



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
