[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdullah Yousufi updated HIVE-14165:
------------------------------------
    Description: 
During split computation, when a large number of files must be listed from S3, 
instead of executing one API call per file, one can optimize by listing up to 
1000 files in each API call. This would reduce the amount of time required for 
listing files.

Qubole has this optimization in place as detailed here: 
https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0
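
As an illustration of the batched approach, here is a minimal sketch using the 
AWS SDK for Java's ListObjectsV2 call, which returns up to 1000 keys per 
request and pages with a continuation token. This is not the actual Hive/S3A 
listing code, and the bucket and prefix names are placeholders:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

import java.util.ArrayList;
import java.util.List;

public class BatchedS3Listing {

    // Lists all objects under a prefix using paged LIST calls (up to 1000 keys
    // per request) instead of issuing one metadata call per file.
    public static List<S3ObjectSummary> listAll(AmazonS3 s3, String bucket, String prefix) {
        List<S3ObjectSummary> summaries = new ArrayList<>();
        ListObjectsV2Request request = new ListObjectsV2Request()
                .withBucketName(bucket)
                .withPrefix(prefix)
                .withMaxKeys(1000);           // S3 returns at most 1000 keys per call
        ListObjectsV2Result result;
        do {
            result = s3.listObjectsV2(request);
            summaries.addAll(result.getObjectSummaries());
            // The continuation token pages through the remaining keys, 1000 at a time
            request.setContinuationToken(result.getNextContinuationToken());
        } while (result.isTruncated());
        return summaries;
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // "my-bucket" and "warehouse/table1/" are placeholder values
        List<S3ObjectSummary> files = listAll(s3, "my-bucket", "warehouse/table1/");
        System.out.println("Listed " + files.size() + " files");
    }
}
{code}

With this pattern, listing N files costs roughly N/1000 LIST requests rather 
than N per-file calls, which is where the speedup during split computation 
would come from.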

  was:
During split computation when a large of files are required to be listed from 
S3 then instead of executing 1 API call per file, one can optimize by listing 
1000 files in each API call. Thereby reducing the amount of time required for 
listing files.
Qubole has this optimization in place as detailed here: 
https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0


> Enable faster S3 Split Computation by listing files in blocks
> -------------------------------------------------------------
>
>                 Key: HIVE-14165
>                 URL: https://issues.apache.org/jira/browse/HIVE-14165
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.1.0
>            Reporter: Abdullah Yousufi
>            Assignee: Abdullah Yousufi
>
> During split computation, when a large number of files must be listed from 
> S3, instead of executing one API call per file, one can optimize by listing 
> up to 1000 files in each API call. This would reduce the amount of time 
> required for listing files.
> Qubole has this optimization in place as detailed here: 
> https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
