[
https://issues.apache.org/jira/browse/HADOOP-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106097#comment-13106097
]
Daryn Sharp commented on HADOOP-7510:
-------------------------------------
Fully qualified hostnames:
- The reason for using fully qualified hostnames in the token service is that
not every machine in a cluster has the same resolv.conf domain search path.
This is highly likely for the client host submitting the job versus the
cluster itself. Clusters spanning different networks are also likely to have
different domain search paths.
- If a short name is used then host resolution may either fail or resolve to a
different host depending on the machine performing the resolution. This may
result in jobs failing if the task cannot find a token for the connection.
- Short hostnames may result in multiple tokens being acquired for the same
machine because "host" and "host.domain.com" appear to be different. Ex. the
user acquires a token using fetchdt with host "host", but paths in the job are
"hdfs://host.domain.com", so the JT will unnecessarily acquire another token.
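To illustrate the duplicate-token point above, here is a minimal sketch. It is
not Hadoop code: the class {{TokenServiceSketch}}, its methods, and the
"host:port" key format are assumptions made up for this example. It only shows
that a cache keyed by the literal hostname treats "host" and "host.domain.com"
as unrelated services.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a token cache keyed by a "host:port" service string. */
final class TokenServiceSketch {
    private final Map<String, String> tokensByService = new HashMap<>();

    /** Builds the lookup key. The "host:port" format is an assumption of this sketch. */
    static String serviceKey(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    /** Stores a token under the hostname exactly as the caller supplied it. */
    void addToken(String host, int port, String token) {
        tokensByService.put(serviceKey(host, port), token);
    }

    /** Returns the cached token, or null when no key matches exactly. */
    String findToken(String host, int port) {
        return tokensByService.get(serviceKey(host, port));
    }

    public static void main(String[] args) {
        TokenServiceSketch cache = new TokenServiceSketch();
        // Token fetched with the short name.
        cache.addToken("host", 8020, "token-from-fetchdt");
        // The job path uses the qualified name, so the lookup misses (null)
        // and a second token would have to be acquired.
        System.out.println(cache.findToken("host.domain.com", 8020));
    }
}
```

If both callers qualified the name before building the key, the second lookup
would hit the existing entry.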
Use of {{ResolverConfiguration}} (most of this is documented in the code):
- {{InetSocketAddress}} provides no API to get the fully qualified host with
the search domain.
- {{getCanonicalHostName}} always returns the A record, which cannot be used if
a CNAME was given, because ip changes behind the CNAME would not be detected.
Ex. "nn" is a CNAME to "nn1". The CNAME is flipped to "nn2". If
getCanonicalHostName() is used, it will return "nn1", therefore the client will
not switch to "nn2" as expected.
- I originally attempted to use {{getAllByName}} on the resolved ip, and then
prefix match against the hosts returned. However, Java only records the A
record (canonical name) for an ip, so this has the same problem as the prior
point.
- The method of resolving follows the same standard that Unix and Java use to
resolve unqualified hostnames. I even confirmed this via packet capture.
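As a rough illustration of search-domain qualification, here is a simplified
sketch. It is not the code in the patch: {{HostQualifierSketch}} and
{{qualify}} are hypothetical names, the search list is passed in rather than
read from {{ResolverConfiguration}}, and the resolver is injected as a
predicate so that no real DNS lookups are needed. It also ignores details such
as the resolv.conf ndots option and literal ip addresses.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: qualifies a hostname using a domain search list. */
final class HostQualifierSketch {
    /**
     * Returns the first candidate name that the resolver accepts.
     * A name with a trailing dot is treated as absolute. A dotted name is
     * tried as given first. Otherwise each search domain is appended in order.
     */
    static String qualify(String host, List<String> searchDomains,
                          Predicate<String> resolves) {
        // A trailing dot marks an absolute name: never apply the search list.
        if (host.endsWith(".")) {
            return host.substring(0, host.length() - 1);
        }
        // A dotted name may already be fully qualified.
        if (host.contains(".") && resolves.test(host)) {
            return host;
        }
        for (String domain : searchDomains) {
            String candidate = host + "." + domain;
            if (resolves.test(candidate)) {
                return candidate;
            }
        }
        // Nothing matched: fall back to the name as given.
        return host;
    }
}
```

With a real resolver, the predicate could wrap {{InetAddress.getByName}} and
return false on {{UnknownHostException}}. Because the result depends on the
search list order, two machines with different search paths can qualify the
same short name differently, which is why the qualified name is what gets
stored.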
{{HftpFileSystem}}
- {{getCanonicalServiceName}} was doing the same thing as the {{FileSystem}}
base class. I removed it because the redundancy would cause maintenance issues
if the default implementation changes.
{{DFSClient}}
- We need to renew/cancel tokens with the same configuration used to get the
token.
- Current implementation calls {{createRPCNamenode}} which unnecessarily forces
RPC and eschews the {{RetryProxy}}. {{createNamenode}} abstracts both of these
details.
- Exception/retry policies appear to be changed only for file creation, thus
not an issue.
- Looking deeper, we actually need to instantiate {{DFSClient}} to get the
configured timeout/retry for socket connects. Doing so also tags the client
with the job id for easier debugging.
{{DistributedFileSystem}}
- You're right. I'll need to merge up and evaluate any impact. My initial
reaction is that we are pushing the token renewal far too deep into the stack.
Completely bypassing the filesystem prevents the renewal from using a
{{DFSClient}} configured identically to the one used to get the token. I'll
investigate.
{{JobClient}}
- Same reasons as DFSClient.
- Should be using an identically configured client as used to acquire the token.
- Should not assume RPC was used to acquire the token. The client abstracts
the underlying protocol.
Thanks for taking time to review this important feature!
> Tokens should use original hostname provided instead of ip
> ----------------------------------------------------------
>
> Key: HADOOP-7510
> URL: https://issues.apache.org/jira/browse/HADOOP-7510
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: HADOOP-7510-2.patch, HADOOP-7510-3.patch,
> HADOOP-7510-4.patch, HADOOP-7510-5.patch, HADOOP-7510-6.patch,
> HADOOP-7510.patch
>
>
> Tokens currently store the ip:port of the remote server. This precludes
> tokens from being used after a host's ip is changed. Tokens should store the
> hostname used to make the RPC connection. This will enable new processes to
> use their existing tokens.