[ 
https://issues.apache.org/jira/browse/HADOOP-17915?focusedWorklogId=651327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-651327
 ]

ASF GitHub Bot logged work on HADOOP-17915:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Sep/21 20:57
            Start Date: 15/Sep/21 20:57
    Worklog Time Spent: 10m 
      Work Description: steveloughran opened a new pull request #3442:
URL: https://github.com/apache/hadoop/pull/3442


   
   This PR ensures that
   
   * `FileSystem.getCanonicalServiceName()` is *never* used to build an abfs 
service name from the IP address of the storage account's endpoint.
   * Instead, if DTs are enabled, the `AbfsDelegationTokenManager` gets it from 
the DT plugin if the plugin implements `getCanonicalServiceName()` and returns a 
non-null value; otherwise it derives it from the FS URI.
   * If DTs are disabled, the service name is null.
   
   The fallback calculation of the Canonical Service Name is 
`"abfs://" + fsURI.getHost() + "/"`. As a result:
   
   1. The scheme is always abfs, even for abfss stores.
   1. The container is stripped from the service name.
   1. So all abfs containers for the same storage a/c will have the same 
Canonical Service Name
   1. and share a single DT in job submission.
   1. *And*: if a DT is issued for one of the containers in job submission, all 
other containers for the same storage a/c will use that DT,
   1. even if the caller didn't explicitly name them.
   
   That is consistent with using the IP address of the storage a/c's endpoint 
to identify the storage account.
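
The fallback above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in the PR: the class name `FallbackServiceName`, the method `fromUri`, the account name `example` and the containers `data` and `logs` are all hypothetical. It relies on `java.net.URI` treating the container as the user-info part of the authority, so `getHost()` returns just the storage account endpoint.

```java
import java.net.URI;

// Hypothetical helper, not the actual Hadoop implementation.
class FallbackServiceName {

  /** Builds "abfs://" + host + "/", dropping the container and the abfss scheme. */
  static String fromUri(URI fsUri) {
    return "abfs://" + fsUri.getHost() + "/";
  }

  public static void main(String[] args) {
    // Two containers in the same (made-up) storage account, one via abfss.
    URI secure = URI.create("abfss://data@example.dfs.core.windows.net/tables/t1");
    URI plain = URI.create("abfs://logs@example.dfs.core.windows.net/");

    // Both lines print abfs://example.dfs.core.windows.net/
    System.out.println(fromUri(secure));
    System.out.println(fromUri(plain));
  }
}
```

Because both URIs map to the same name, a token collected for one container would be looked up under the same key for the other.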
   
   Today, `abfs://[email protected]/` and
    `abfs://[email protected]/` will both have their CSN 
map to the same endpoint hostname *if the DT plugin doesn't return a schema*
   
   If the DT plugin returns a schema, which it should, this is all moot. This 
is only a fallback for when it doesn't.
   
   Tests updated to match new behavior.
   
   ### How was this patch tested?
   
   azure cardiff
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 651327)
    Remaining Estimate: 0h
            Time Spent: 10m

> ABFS AbfsDelegationTokenManager to generate canonicalServiceName if DT plugin 
> doesn't
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17915
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17915
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently in {{AbfsDelegationTokenManager}}, any 
> {{CustomDelegationTokenManager}} only provides a canonical service name if it
> implements {{BoundDTExtension}} and its {{getCanonicalServiceName()}} method.
> If this doesn't hold, {{AbfsDelegationTokenManager}} returns null, which 
> causes {{AzureBlobFileSystem.getCanonicalServiceName()}}
> to call {{super.getCanonicalServiceName()}}, *which resolves the IP address of 
> the abfs endpoint, and then the FQDN of that IP address*.
> If a storage account is served over >1 endpoint, then the DT will only have a 
> valid service name for one of the possible
> endpoints, so it will _only work if all processes get the same IP address when 
> they look up the storage account address_.
> Fix
> # DT plugins SHOULD generate the canonical service name
> #  If they don't, and DTs are enabled: {{AbfsDelegationTokenManager}} to 
> create a default one
> # and {{AzureBlobFileSystem.getCanonicalServiceName()}} MUST NOT call 
> superclass.
> The default canonical service name of a store will be {{abfs:// + 
> FsURI.getHost() + "/"}}, so all containers in the same storage account have 
> the same service name
> {code}
> abfs://[email protected]/path
> {code}
> maps to 
> {code}
> abfs://stevel-testing.dfs.core.windows.net/ 
> {code}
> This will mean that only one DT will be created per storage a/c; Applications 
> will not need to list all containers which deployed processes will wish to 
> interact with. Today's behaviour, based on rDNS lookup of storage account, is 
> possibly slightly broader in that all storage accounts which map to the same 
> IPAddr share a DT. The proposed scheme will still be much broader than that 
> of S3A, where every bucket has its unique service name, so apps need to list 
> all target filesystems at launch time (easy for MR, source of trouble in 
> spark).
> Fix: straightforward. 
> Test
> * no DTs: service name == null
> * DTs: will match proposed pattern, even if extension returns null.
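
The selection order and the two test cases above can be sketched as follows. This is a minimal illustration, assuming a simplified plugin interface: `NamingPlugin`, `ServiceNameSelection` and `select` are hypothetical names and are not the Hadoop classes named in the issue. The default name is passed in rather than computed here.

```java
// Hypothetical sketch of the selection order; not the real ABFS classes.
class ServiceNameSelection {

  /** Stand-in for a DT plugin that may or may not supply a service name. */
  interface NamingPlugin {
    String getCanonicalServiceName(); // may return null
  }

  /**
   * Returns null when DTs are disabled; otherwise the plugin's name if it
   * supplies one, else the default derived from the filesystem URI.
   */
  static String select(boolean dtEnabled, NamingPlugin plugin, String defaultName) {
    if (!dtEnabled) {
      return null;
    }
    String fromPlugin = (plugin == null) ? null : plugin.getCanonicalServiceName();
    return (fromPlugin != null) ? fromPlugin : defaultName;
  }

  public static void main(String[] args) {
    String fallback = "abfs://stevel-testing.dfs.core.windows.net/";
    System.out.println(select(false, () -> "custom", fallback)); // DTs off
    System.out.println(select(true, () -> null, fallback));      // plugin returns null
    System.out.println(select(true, () -> "custom", fallback));  // plugin supplies a name
  }
}
```

The superclass lookup never appears in this flow, which matches the requirement that the filesystem MUST NOT fall back to the rDNS-based name.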



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

