[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340988#comment-15340988 ]
Gopal V commented on HIVE-14060:
--------------------------------
This happens to any FS which calls FileSystem.listLocatedStatus via super().
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L697
> Hive: Remove bogus "localhost" from Hive splits
> -----------------------------------------------
>
> Key: HIVE-14060
> URL: https://issues.apache.org/jira/browse/HIVE-14060
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Gopal V
> Assignee: Gopal V
> Attachments: HIVE-14060.1.patch
>
>
> On remote filesystems like Azure, GCP and S3, the splits contain a filler
> location of "localhost".
> This is worse than having no location information at all - on large clusters
> YARN waits up to 200[1] seconds for a heartbeat from "localhost" before
> allocating a container.
> To speed up this process, the split affinity provider should scrub the bogus
> "localhost" from the locations, so that "*" (any-host) containers can be
> allocated on each heartbeat instead.
> [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat interval of 5s = 200s
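The scrubbing step described above can be sketched as a plain filter over a split's host list. This is a hypothetical illustration, not the code in HIVE-14060.1.patch: the class and method names (SplitLocationScrubber, scrubLocations, localityDelaySeconds) are invented here, and it assumes split locations arrive as a String[] of host names. An empty result is taken to mean "no locality preference", which lets the scheduler hand out a "*" container right away. The second helper only reproduces the footnote's arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Hive or the HIVE-14060 patch.
final class SplitLocationScrubber {
    // Filler host reported for remote filesystems (Azure, GCP, S3) per the issue text.
    private static final String BOGUS_HOST = "localhost";

    /**
     * Returns the split's hosts with every "localhost" entry removed.
     * An empty array means "no locality preference", so a "*" container
     * can be allocated on the next heartbeat instead of waiting.
     */
    static String[] scrubLocations(String[] hosts) {
        if (hosts == null) {
            return new String[0];
        }
        List<String> kept = new ArrayList<>();
        for (String host : hosts) {
            if (host != null && !BOGUS_HOST.equalsIgnoreCase(host)) {
                kept.add(host);
            }
        }
        return kept.toArray(new String[0]);
    }

    // Footnote [1]: node-locality-delay (missed scheduling opportunities)
    // multiplied by the heartbeat interval gives the worst-case wait.
    static long localityDelaySeconds(int nodeLocalityDelay, long heartbeatSeconds) {
        return nodeLocalityDelay * heartbeatSeconds;
    }

    public static void main(String[] args) {
        String[] remote = {"localhost"};
        String[] mixed = {"node1.example.com", "localhost", "node2.example.com"};
        System.out.println(Arrays.toString(scrubLocations(remote))); // []
        System.out.println(Arrays.toString(scrubLocations(mixed)));  // [node1.example.com, node2.example.com]
        System.out.println(localityDelaySeconds(40, 5));             // 200
    }
}
```

Only the bogus entry is dropped: real hosts in a mixed list survive, so HDFS splits keep their locality while remote-FS splits become location-free.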
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)