suman kumari created HBASE-23122:
------------------------------------
Summary: FSHDFSUtils#isSameHdfs doesn't handle azure wasb
filesystems correctly.
Key: HBASE-23122
URL: https://issues.apache.org/jira/browse/HBASE-23122
Project: HBase
Issue Type: Bug
Components: Filesystem Integration
Reporter: suman kumari
FSHDFSUtils#isSameHdfs retrieves the canonical service name from Hadoop to
determine whether the source and destination are on the same filesystem. This
method, getCanonicalServiceName(), returns an IP address for the filesystem.
That address can be the same for two filesystems that are actually backed by
separate storage accounts, which causes isSameHdfs to incorrectly return true
even when the filesystems are different.
It seems this API should not be used to check whether the source and target are
on the same filesystem. According to the Hadoop API
[declaration|https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/fs/FileSystem.html#getCanonicalServiceName--]:
"The token cache is the *only user* of the canonical service name, and uses
it to lookup this FileSystem's service tokens."
This error was found while doing a bulk load in HBase from one filesystem to
another. Because getCanonicalServiceName() returned the same address for both
storage accounts, the two filesystems were identified as the same filesystem.
When the HBase bulk load command runs, it tries to find the file on the default
filesystem and therefore fails with a FileNotFoundException.
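
To illustrate the distinction described above, here is a minimal sketch of one
possible alternative check: comparing the scheme and authority of the two
filesystem URIs instead of the canonical service name. This is not the HBase
implementation and not necessarily the fix for this issue. The class name, the
helper isSameFileSystem, and the container and account names are hypothetical,
and it assumes wasb URIs of the form wasb://container@account.blob.core.windows.net/path.

```java
import java.net.URI;
import java.util.Objects;

public class SameFileSystemSketch {

    // Hypothetical helper, not part of HBase or Hadoop. Two URIs are treated as
    // the same filesystem only when both the scheme and the authority match.
    // The authority of a wasb URI contains the container and storage account,
    // so two accounts that resolve to the same IP address still differ here.
    public static boolean isSameFileSystem(URI src, URI dst) {
        String srcScheme = src.getScheme();
        String dstScheme = dst.getScheme();
        if (srcScheme == null || dstScheme == null) {
            // Without a scheme we cannot tell; answer conservatively.
            return false;
        }
        if (!srcScheme.equalsIgnoreCase(dstScheme)) {
            return false;
        }
        // Exact comparison is conservative: a case difference yields "different".
        return Objects.equals(src.getAuthority(), dst.getAuthority());
    }

    public static void main(String[] args) {
        // Assumed example URIs; the account names are made up for illustration.
        URI staging = URI.create("wasb://data@account1.blob.core.windows.net/hbase/staging");
        URI target = URI.create("wasb://data@account2.blob.core.windows.net/hbase/data");
        URI sameAccount = URI.create("wasb://data@account1.blob.core.windows.net/hbase/data");

        System.out.println(isSameFileSystem(staging, target));
        System.out.println(isSameFileSystem(staging, sameAccount));
    }
}
```

The comparison errs on the side of reporting "different", which for a bulk load
would mean copying files rather than assuming they already live on the
destination filesystem.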
--
This message was sent by Atlassian Jira
(v8.3.4#803005)