Steve Loughran created HADOOP-17915:
---------------------------------------

             Summary: ABFS AbfsDelegationTokenManager to generate 
canonicalServiceName if DT plugin doesn't
                 Key: HADOOP-17915
                 URL: https://issues.apache.org/jira/browse/HADOOP-17915
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/azure
    Affects Versions: 3.3.1
            Reporter: Steve Loughran
            Assignee: Steve Loughran


Currently in {{AbfsDelegationTokenManager}}, any 
{{CustomDelegationTokenManager}} only provides a canonical service name if it
implements {{BoundDTExtension}} and its {{getCanonicalServiceName()}} method.

If this doesn't hold, {{AbfsDelegationTokenManager}} returns null, which causes 
{{AzureBlobFileSystem.getCanonicalServiceName()}}
to call {{super.getCanonicalServiceName()}}, *which resolves the IP address of 
the abfs endpoint, and then the FQDN of that IP address*.

If a storage account is served over >1 endpoint, then the DT will only have a 
valid service name for one of the possible
endpoints, so it will _only work if all processes get the same IP address when they 
look up the storage account address_.

Fix

# DT plugins SHOULD generate the canonical service name
# If they don't, and DTs are enabled, {{AbfsDelegationTokenManager}} is to create 
a default one.
# {{AzureBlobFileSystem.getCanonicalServiceName()}} MUST NOT call the 
superclass.


The default canonical service name of a store will be {{"abfs://" + 
FsURI.getHost() + "/"}}, so all containers in the same storage account have the same 
service name.

{code}
abfs://<container>@stevel-testing.dfs.core.windows.net/path
{code}

maps to 
{code}
abfs://stevel-testing.dfs.core.windows.net/ 
{code}
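A minimal sketch of that mapping, using only {{java.net.URI}} rather than any Hadoop class. The class name {{DefaultServiceNameSketch}}, the method name and the container names {{container1}} and {{container2}} are hypothetical and only there for illustration; the real change may derive the name differently.

```java
import java.net.URI;

/** Illustrative only: not the ABFS implementation. */
final class DefaultServiceNameSketch {

  /** Build the proposed default: "abfs://" + host + "/". */
  static String defaultCanonicalServiceName(URI fsUri) {
    // getHost() drops the "container@" user-info part of the authority,
    // so only the storage account endpoint remains.
    return "abfs://" + fsUri.getHost() + "/";
  }

  public static void main(String[] args) {
    // Hypothetical container names in the same storage account.
    URI first = URI.create(
        "abfs://container1@stevel-testing.dfs.core.windows.net/path");
    URI second = URI.create(
        "abfs://container2@stevel-testing.dfs.core.windows.net/other");

    System.out.println(defaultCanonicalServiceName(first));
    System.out.println(defaultCanonicalServiceName(second));
  }
}
```

Both URIs yield the same service name, which is why a single DT would cover every container in the storage account.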

This will mean that only one DT will be created per storage account; applications 
will not need to list all containers which deployed processes will wish to 
interact with. Today's behaviour, based on rDNS lookup of the storage account, is 
possibly slightly broader in that all storage accounts which map to the same 
IP address share a DT. The proposed scheme will still be much broader than that of 
S3A, where every bucket has its own unique service name, so apps need to list all 
target filesystems at launch time (easy for MR, a source of trouble in Spark).

Fix: straightforward. 

Test
* no DTs: service name == null
* DTs: will match proposed pattern, even if extension returns null.
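The two test cases can be sketched against stand-in types. Everything here ({{TokenPluginSketch}}, {{TokenManagerSketch}}, {{FileSystemSketch}}) is a hypothetical simplification, not the real Hadoop classes or their signatures; it assumes a null token manager means DTs are disabled.

```java
import java.net.URI;

/** Hypothetical stand-in for a DT plugin; not a Hadoop interface. */
interface TokenPluginSketch {
  /** May return null when the plugin offers no service name. */
  String getCanonicalServiceName();
}

/** Hypothetical stand-in for the token manager's fallback. */
final class TokenManagerSketch {
  private final URI fsUri;
  private final TokenPluginSketch plugin;

  TokenManagerSketch(URI fsUri, TokenPluginSketch plugin) {
    this.fsUri = fsUri;
    this.plugin = plugin;
  }

  String getCanonicalServiceName() {
    String fromPlugin = plugin.getCanonicalServiceName();
    if (fromPlugin != null) {
      return fromPlugin;               // the plugin's name wins
    }
    // Fallback: the default proposed in this issue.
    return "abfs://" + fsUri.getHost() + "/";
  }
}

/** Hypothetical stand-in for the filesystem; never defers to a superclass. */
final class FileSystemSketch {
  private final TokenManagerSketch tokenManager; // null: DTs disabled (assumption)

  FileSystemSketch(URI fsUri, TokenManagerSketch tokenManager) {
    this.tokenManager = tokenManager;
  }

  String getCanonicalServiceName() {
    return tokenManager == null ? null : tokenManager.getCanonicalServiceName();
  }

  public static void main(String[] args) {
    URI uri = URI.create("abfs://container1@stevel-testing.dfs.core.windows.net/");

    // Case 1: no DTs, so the service name is null.
    System.out.println(new FileSystemSketch(uri, null).getCanonicalServiceName());

    // Case 2: DTs enabled, plugin returns null, so the default is used.
    TokenManagerSketch manager = new TokenManagerSketch(uri, () -> null);
    System.out.println(new FileSystemSketch(uri, manager).getCanonicalServiceName());
  }
}
```

In this sketch no code path performs a DNS or rDNS lookup, which is the property the fix is after.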



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
