[ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergio Peña updated HIVE-14165:
-------------------------------
Issue Type: Sub-task (was: Improvement)
Parent: HIVE-14269
> Enable faster S3 Split Computation by listing files in blocks
> -------------------------------------------------------------
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Abdullah Yousufi
> Assignee: Abdullah Yousufi
>
> During split computation, when a large number of files must be listed from
> S3, we can list up to 1000 files per API call instead of executing one API
> call per file. This would reduce the number of API calls, and therefore the
> time required for listing files.
> Qubole has this optimization in place as detailed here:
> https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0
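
To illustrate the idea in the description, here is a minimal sketch of listing files in blocks and counting the API calls it takes. It is not Hive or Qubole code: FakeBucket, list_page, and list_in_blocks are hypothetical names, the bucket is an in-memory stand-in for S3, and the integer continuation token is an assumption made only to keep the example self-contained.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 1000  # files per listing call, the figure quoted in the issue


class FakeBucket:
    """Hypothetical in-memory stand-in for an S3 bucket; counts API calls."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)
        self.calls = 0

    def head(self, key: str) -> bool:
        """One call per file, as in the unoptimized path."""
        self.calls += 1
        return key in self.keys

    def list_page(self, prefix: str, start: int = 0,
                  max_keys: int = PAGE_SIZE) -> Tuple[List[str], Optional[int]]:
        """One call returns up to max_keys keys plus a continuation index."""
        self.calls += 1
        matching = [k for k in self.keys if k.startswith(prefix)]
        page = matching[start:start + max_keys]
        nxt = start + max_keys
        # None signals that there are no more pages to fetch.
        return page, (nxt if nxt < len(matching) else None)


def list_in_blocks(bucket: FakeBucket, prefix: str) -> List[str]:
    """Collect every key under prefix, one page per API call."""
    files: List[str] = []
    token: Optional[int] = 0
    while token is not None:
        page, token = bucket.list_page(prefix, token)
        files.extend(page)
    return files


if __name__ == "__main__":
    keys = [f"warehouse/t/part-{i:05d}" for i in range(2500)]

    per_file = FakeBucket(keys)
    for k in keys:
        per_file.head(k)

    batched = FakeBucket(keys)
    list_in_blocks(batched, "warehouse/t/")

    print("per-file calls:", per_file.calls)
    print("batched calls:", batched.calls)
```

Under these assumptions, 2500 files cost 2500 calls when checked one at a time and 3 calls when listed in pages of 1000, which is the saving the issue is after.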
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)