[
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806158#comment-15806158
]
Vihang Karajgaonkar commented on HIVE-14165:
--------------------------------------------
Thanks for the patch [~stakiar]. It seems the previous implementation ignored
zero-length files when computing the splits, whereas
FileInputFormat.getSplits() creates an empty split for each zero-length file. I
am not sure how this affects execution, so it may be worthwhile to test. Also,
if needed, you could filter out the empty splits before adding them to
{{FetchInputFormatSplit[] inputSplit}}.
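
A minimal sketch of the filtering suggested above, offered only as an
illustration. The Split class and dropEmptySplits() are hypothetical stand-ins
and are not Hive's FetchInputFormatSplit or Hadoop's InputSplit; the sketch
assumes a split's length is available before it is added to the array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptySplitFilter {

    /** Hypothetical stand-in for an input split; only the length matters here. */
    public static final class Split {
        public final String path;
        public final long length;

        public Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Returns only the splits that cover at least one byte, preserving order. */
    public static List<Split> dropEmptySplits(List<Split> splits) {
        List<Split> nonEmpty = new ArrayList<>();
        for (Split split : splits) {
            if (split.length > 0) {
                nonEmpty.add(split);
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
                new Split("part-00000", 128L),
                new Split("part-00001", 0L),   // split for a zero-length file
                new Split("part-00002", 64L));
        for (Split split : dropEmptySplits(splits)) {
            System.out.println(split.path + " " + split.length);
        }
    }
}
```

Whether dropping these splits is actually required depends on how the fetch
path handles a zero-length split, which is the part that still needs testing.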
> Remove Hive file listing during split computation
> -------------------------------------------------
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Abdullah Yousufi
> Assignee: Sahil Takiar
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch,
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch,
> HIVE-14165.patch
>
>
> The Hive-side listing in FetchOperator.java is unnecessary, since Hadoop's
> FileInputFormat.java will list the files during split computation anyway to
> determine their size. One way to remove this is to catch the
> InvalidInputException thrown by FileInputFormat#getSplits() on the
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.