Rajesh Balamohan created TEZ-4245:
-------------------------------------
Summary: Optimise split grouping when locality information is set to null/empty
Key: TEZ-4245
URL: https://issues.apache.org/jira/browse/TEZ-4245
Project: Apache Tez
Issue Type: Improvement
Reporter: Rajesh Balamohan
In object stores like S3, locality information always shows up as "localhost".
Having this information in input splits slows down scheduling, as explained in
https://issues.apache.org/jira/browse/HIVE-14060. Systems like Hive therefore
remove "localhost" information from splits.
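The Hive-side workaround of dropping "localhost" from split locations can be sketched as a small filter. This is a hypothetical Java helper (`stripLocalhost` is not a Hive or Tez API, and this is not Hive's actual code). It assumes a split's locations are available as a `String[]` of host names, as returned by Hadoop's `InputSplit.getLocations()`.

```java
import java.util.Arrays;

public class LocalityFilter {
    // Hypothetical helper: remove "localhost" (and null/blank entries) from a split's
    // reported locations, so an object-store split ends up with no locality hints at all.
    static String[] stripLocalhost(String[] locations) {
        if (locations == null) {
            return new String[0];
        }
        return Arrays.stream(locations)
                .filter(h -> h != null && !h.trim().isEmpty() && !h.equalsIgnoreCase("localhost"))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // An S3-style split that only reports "localhost" loses all locality hints.
        System.out.println(Arrays.toString(stripLocalhost(new String[] {"localhost"})));
        // Real host names are kept; only the "localhost" entry is dropped.
        System.out.println(Arrays.toString(stripLocalhost(new String[] {"node1", "localhost"})));
        // A null location array is normalised to an empty array.
        System.out.println(stripLocalhost(null).length);
    }
}
```

After this filtering, a split that reported only "localhost" looks the same as one that reported null or an empty array.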
Splits without any usable locality information ("localhost", null, or empty)
should be treated the same way, so that split grouping can form meaningful
groups based on cluster size. This avoids creating a small number of oversized
split groups, which can significantly increase runtime due to sequential
processing (i.e., the same map task gets many inputs, and the system ends up
spending its time in open/seek/close calls on object stores).
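The proposed behaviour can be sketched as two steps: treat "localhost", null, and empty locations identically as "no locality", then size the groups from cluster capacity rather than from the (meaningless) host. This is a hypothetical Java sketch, not Tez's grouping code. `hasNoLocality`, `groupByClusterSize`, `availableSlots`, and the `waves` multiplier are illustrative assumptions, and the round-robin assignment ignores split sizes, which a real grouper would take into account.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalityFreeGrouping {
    // True when a split carries no usable locality: null, empty, or only "localhost"/blank entries.
    static boolean hasNoLocality(String[] locations) {
        if (locations == null || locations.length == 0) {
            return true;
        }
        for (String h : locations) {
            if (h != null && !h.trim().isEmpty() && !h.equalsIgnoreCase("localhost")) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical: spread location-less splits over a group count derived from cluster
    // capacity (available task slots times an assumed wave multiplier), so the work is
    // not collapsed into a few oversized groups processed sequentially by one task.
    static List<List<Integer>> groupByClusterSize(int numSplits, int availableSlots, double waves) {
        List<List<Integer>> result = new ArrayList<>();
        if (numSplits <= 0) {
            return result;
        }
        int desired = (int) Math.max(1, Math.round(availableSlots * waves));
        int groups = Math.min(desired, numSplits);
        for (int g = 0; g < groups; g++) {
            result.add(new ArrayList<>());
        }
        // Round-robin assignment of split indices to groups (ignores split sizes).
        for (int i = 0; i < numSplits; i++) {
            result.get(i % groups).add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // "localhost", null, and empty locations are all treated the same way.
        String[][] splitLocations = {
            {"localhost"}, null, {}, {"localhost"}, null,
            {}, {"localhost"}, null, {}, {"localhost"}
        };
        boolean allLocalityFree = true;
        for (String[] locs : splitLocations) {
            allLocalityFree &= hasNoLocality(locs);
        }
        System.out.println(allLocalityFree);

        // 10 location-less splits on a cluster with 4 free slots and 1 wave -> 4 groups.
        for (List<Integer> g : groupByClusterSize(splitLocations.length, 4, 1.0)) {
            System.out.println(g.size() + " " + g);
        }
    }
}
```

With 10 location-less splits and 4 available slots, this prints `true` followed by four groups of sizes 3, 3, 2, and 2, instead of routing every split to the single "localhost" bucket.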