[
https://issues.apache.org/jira/browse/FLINK-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843397#comment-16843397
]
Felix seibert edited comment on FLINK-12550 at 5/19/19 11:23 AM:
-----------------------------------------------------------------
After opening PR #8478 yesterday, I have some additional considerations.
The status quo is the following:
To check if an input split is locally available for a taskmanager, the hostname
of the taskmanager is compared to the hostname of the input split. This happens
in [this
line|https://github.com/apache/flink/blob/4fa387164cea44f8e0bac1aadab11433c0f0ff2b/flink-core/src/main/java/org/apache/flink/api/common/io/LocatableInputSplitAssigner.java#L223]:
{code:java}
if (h != null &&
        NetUtils.getHostnameFromFQDN(h.toLowerCase()).equals(flinkHost))
{code}
h is the hostname of a machine hosting the input split; flinkHost is the
hostname of the taskmanager that is looking for an input split.
NetUtils.getHostnameFromFQDN() truncates at the first occurrence of a ".". So,
if a split is present on "host.domain", and the hostname of the taskmanager is
"host.domain" too, we actually check whether "host".equals("host.domain"),
which is not true. PR #8478 applies getHostnameFromFQDN() to the taskmanager
hostname as well, so it seems that this problem is fixed.
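The mismatch can be reproduced without Flink. The following is only a sketch:
firstLabel() is a hypothetical stand-in that mimics the truncation described
above (it is not the actual NetUtils code), and the class and method names are
made up for illustration. It compares the status quo with the comparison as I
understand PR #8478 to perform it.

```java
import java.util.Locale;

public class StatusQuoSketch {

    // Hypothetical stand-in for NetUtils.getHostnameFromFQDN(): keeps only the
    // part before the first "." (assumed behaviour, as described in the comment).
    static String firstLabel(String fqdn) {
        int dot = fqdn.indexOf('.');
        return dot == -1 ? fqdn : fqdn.substring(0, dot);
    }

    // Status quo: only the hostname of the split is truncated.
    static boolean isLocalStatusQuo(String splitHost, String flinkHost) {
        return splitHost != null
                && firstLabel(splitHost.toLowerCase(Locale.ROOT)).equals(flinkHost);
    }

    // Assumed shape of the PR #8478 change: both hostnames are truncated.
    static boolean isLocalWithPr(String splitHost, String flinkHost) {
        return splitHost != null
                && firstLabel(splitHost.toLowerCase(Locale.ROOT)).equals(firstLabel(flinkHost));
    }

    public static void main(String[] args) {
        // Same machine, hostname with a dot.
        System.out.println(isLocalStatusQuo("host.domain", "host.domain")); // false
        System.out.println(isLocalWithPr("host.domain", "host.domain"));    // true
    }
}
```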
BUT. What if there is a taskmanager on host "host.cluster1.domain", and an
input split on host "host.cluster2.domain"? isLocal() would recognize this
split as being on the same host as the taskmanager, which is clearly not the
case.
So to me it looks like getHostnameFromFQDN() shouldn't be applied to either of
the two compared hostnames.
Or is there any reason why it should be applied?
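To make the collision concrete, here is a small sketch under the same
assumptions as before: firstLabel() is a hypothetical helper that mimics the
truncation, and truncatedMatch()/fullMatch() are illustrative names, not Flink
methods. It contrasts comparing truncated hostnames with comparing the full,
lower-cased hostnames.

```java
import java.util.Locale;

public class HostCollisionSketch {

    // Hypothetical stand-in for the truncation: keep the part before the first ".".
    static String firstLabel(String fqdn) {
        int dot = fqdn.indexOf('.');
        return dot == -1 ? fqdn : fqdn.substring(0, dot);
    }

    // Comparison with truncation applied to both hostnames.
    static boolean truncatedMatch(String splitHost, String flinkHost) {
        return firstLabel(splitHost.toLowerCase(Locale.ROOT))
                .equals(firstLabel(flinkHost.toLowerCase(Locale.ROOT)));
    }

    // Comparison of the full hostnames, ignoring case only.
    static boolean fullMatch(String splitHost, String flinkHost) {
        return splitHost.toLowerCase(Locale.ROOT)
                .equals(flinkHost.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String split = "host.cluster2.domain";
        String taskManager = "host.cluster1.domain";
        System.out.println(truncatedMatch(split, taskManager)); // true (false positive)
        System.out.println(fullMatch(split, taskManager));      // false
    }
}
```

Note that fullMatch("host", "host.domain") is false as well. Whether a short
name on one side and a fully qualified name on the other occurs in practice is
exactly the open question above.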
> hostnames with a dot never receive local input splits
> -----------------------------------------------------
>
> Key: FLINK-12550
> URL: https://issues.apache.org/jira/browse/FLINK-12550
> Project: Flink
> Issue Type: Bug
> Components: API / DataSet
> Affects Versions: 1.8.0
> Reporter: Felix seibert
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LocatableInputSplitAssigner (in package api.common.io) fails to assign local
> input splits to hosts whose hostname contains a dot ("."). To reproduce, add
> the following test to LocatableSplitAssignerTest and execute it. It will
> always fail. In my mind, this is contrary to the expected behaviour, which is
> that the host should obtain the one split that is stored on the very same
> machine.
>
> {code:java}
> @Test
> public void testLocalSplitAssignmentForHostWithDomainName() {
>     try {
>         String hostNameWithDot = "testhost.testdomain";
>         // load one split
>         Set<LocatableInputSplit> splits = new HashSet<LocatableInputSplit>();
>         splits.add(new LocatableInputSplit(0, hostNameWithDot));
>         // get next split for the host
>         LocatableInputSplitAssigner ia = new LocatableInputSplitAssigner(splits);
>         InputSplit is = ia.getNextInputSplit(hostNameWithDot, 0);
>         // there should be exactly zero remote and one local assignment
>         assertEquals(0, ia.getNumberOfRemoteAssignments());
>         assertEquals(1, ia.getNumberOfLocalAssignments());
>     }
>     catch (Exception e) {
>         e.printStackTrace();
>         fail(e.getMessage());
>     }
> }
> {code}
> I also experienced this error in practice, and will later today open a pull
> request to fix it.
>
> Note: I'm not sure if I selected the correct component category.
>