SeongHoon Ku created HDFS-17855:
-----------------------------------
Summary: ViewFS with linkMergeSlash generates invalid paths during
listStatus/listLocatedStatus operations, causing InvalidPathException or
incorrect path resolution
Key: HDFS-17855
URL: https://issues.apache.org/jira/browse/HDFS-17855
Project: Hadoop HDFS
Issue Type: Bug
Components: viewfs
Affects Versions: 3.4.1, 2.10.2
Environment: * Hadoop version: 2.10.2
* Configuration: ViewFS with linkMergeSlash enabled
* Affected applications: JobHistoryServer, Hive, any application using ViewFS
with linkMergeSlash
Reporter: SeongHoon Ku
h1. Issue Type
*Bug*
----
h1. Components
* fs
* viewfs
----
h1. Affects Versions
* 2.10.2 (verified)
* Likely affects 3.x versions as well
----
h1. Description
When ViewFS is configured with {{linkMergeSlash}}, directory listing operations
that return a *RemoteIterator* generate invalid paths, causing {{InvalidPathException}}
errors in applications using the FileContext API.
Affected:
* Applications using the *FileContext API (ViewFs)* with {{listLocatedStatus()}} or
{{listStatusIterator()}}
* Examples: JobHistoryServer, Hive/Tez applications
* The failure occurs in the {{ViewFs$WrappingRemoteIterator.next()}} method
h2. Configuration Example
{code:xml}
<property>
<name>fs.defaultFS</name>
<value>viewfs://hadoop-cluster</value>
</property>
<property>
<name>fs.viewfs.mounttable.hadoop-cluster.linkMergeSlash</name>
<value>hdfs://hadoop-cluster</value>
</property>
{code}
h2. Error Stack Trace
*JobHistoryServer:*
{noformat}
org.apache.hadoop.fs.InvalidPathException: Invalid path name relative paths not
allowed:
hadoop-cluster/user/history/done/2021
    at org.apache.hadoop.fs.AbstractFileSystem.checkPath(AbstractFileSystem.java:370)
    at org.apache.hadoop.fs.AbstractFileSystem.makeQualified(AbstractFileSystem.java:428)
    at org.apache.hadoop.fs.viewfs.ViewFs$WrappingRemoteIterator.next(ViewFs.java:848)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:238)
{noformat}
*Hive (Tez):*
{noformat}
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.io.IOException: cannot find dir = viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...
{noformat}
*Observed pattern:*
* Invalid path: {{viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...}}
* Correct path: {{viewfs://hadoop-cluster/tmp/hive/...}}
* The working directory ({{/user/hive}}) and the mount table name ({{hadoop-cluster}}) are prepended to the path
----
h1. Root Cause
h2. Technical Analysis
When {{linkMergeSlash}} is configured, the ViewFS root node is created with its
path name ({{fullPath}}) incorrectly set to {{mountTableName}} instead of
{{"/"}}.
*Bug location in {{InodeTree.java}}:*
{code:java}
// Current (buggy) code
if (isMergeSlashConfigured) {
  root = new INodeLink<T>(mountTableName, ugi, // "hadoop-cluster" - BUG!
      initAndGetTargetFs(), mergeSlashTarget);
  mountPoints.add(new MountPoint<T>("/", (INodeLink<T>) root));
  rootFallbackLink = null;
}
{code}
This causes {{root.fullPath}} to be set to the cluster name (e.g.,
{{"hadoop-cluster"}}) instead of {{"/"}}.
h2. Impact Chain
# During path resolution ({{InodeTree.java}}), {{root.fullPath}} is used as
{{ResolveResult.resolvedPath}}:
{code:java}
if (root.isLink()) {
  // root.fullPath holds mountTableName here, not "/"
  ResolveResult<T> res = new ResolveResult<T>(ResultKind.EXTERNAL_DIR,
      getRootLink().getTargetFileSystem(), root.fullPath, remainingPath);
  return res;
}
{code}
# During path conversion in {{ViewFileSystem.getChrootedPath()}} (line 563):
{code:java}
return this.makeQualified(
    suffix.length() == 0 ? f : new Path(res.resolvedPath, suffix));
// Creates: new Path("hadoop-cluster", "user/history/done")
// Result: "hadoop-cluster/user/history/done" (RELATIVE PATH!)
{code}
# {{makeQualified()}} then prepends the working directory to this relative path:
{noformat}
Expected: viewfs://hadoop-cluster/user/history/done
Actual:   viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done
{noformat}
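The impact chain above can be illustrated with a small, self-contained string model. This is only a sketch under stated assumptions: {{MergeSlashPathModel}}, {{join}}, {{qualify}}, and {{listedPath}} are hypothetical names, and the methods merely imitate how {{new Path(parent, child)}} and {{makeQualified()}} treat relative paths. It is not Hadoop's implementation. It shows that a root {{fullPath}} of {{"hadoop-cluster"}} yields the malformed path, while {{"/"}} yields the expected one.

```java
/** Simplified string model of the impact chain; NOT Hadoop's actual Path code. */
final class MergeSlashPathModel {

    /** Joins parent and child with a single slash, imitating new Path(parent, child). */
    static String join(String parent, String child) {
        if (child.isEmpty()) {
            return parent;
        }
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    /** Imitates makeQualified(): relative paths are resolved against the working directory. */
    static String qualify(String scheme, String authority, String workingDir, String path) {
        String absolute = path.startsWith("/") ? path : join(workingDir, path);
        return scheme + "://" + authority + absolute;
    }

    /** Path a listing would report when the root node carries the given fullPath. */
    static String listedPath(String rootFullPath, String workingDir, String suffix) {
        // Steps 1 and 2: the root's fullPath becomes the parent of the remaining suffix.
        String resolved = join(rootFullPath, suffix);
        // Step 3: the result is qualified against the (assumed) working directory.
        return qualify("viewfs", "hadoop-cluster", workingDir, resolved);
    }

    public static void main(String[] args) {
        // Root fullPath set to the mount table name (the reported behavior).
        System.out.println(listedPath("hadoop-cluster", "/user/mapred", "user/history/done"));
        // Root fullPath set to "/" (the behavior this report argues for).
        System.out.println(listedPath("/", "/user/mapred", "user/history/done"));
    }
}
```

In the FileContext code path ({{AbstractFileSystem.checkPath()}}) the relative path is rejected with {{InvalidPathException}} rather than being prefixed. The model covers only the prefixing behavior described in step 3.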
h2. Why linkMergeSlash Should Use "/"
{{linkMergeSlash}} is designed to merge the entire ViewFS root with a single
target directory. Therefore:
* ViewFS root ({{/}}) = Target directory specified by linkMergeSlash
* The root node's {{fullPath}} should naturally be {{/}}
* This is consistent with the {{MountPoint}} entry, which is already registered
with {{/}}
----
h1. Testing
h2. Test Cases
Added comprehensive test cases in {{TestViewFileSystemLinkMergeSlash.java}}:
# *{{testListStatusReturnsCorrectPaths()}}*
** Verifies {{listStatus()}} returns proper ViewFS paths
** Checks scheme, authority, and path correctness
# *{{testListLocatedStatusReturnsCorrectPaths()}}*
** Verifies {{listLocatedStatus()}} with RemoteIterator
** Ensures lazy evaluation works correctly
# *{{testResolvedPathIsAbsolute()}}*
** Reproduces exact bug scenario (JobHistoryServer use case)
** Validates path resolution for {{/user/history/done/2021}}
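The kind of check these tests describe (scheme, authority, and path correctness) can be sketched without Hadoop dependencies using {{java.net.URI}}. The class {{ListedPathCheck}} and its method are hypothetical helpers for illustration. The actual tests in {{TestViewFileSystemLinkMergeSlash.java}} operate on Hadoop {{Path}} and {{FileStatus}} objects and may assert differently.

```java
import java.net.URI;

/** Hypothetical helper illustrating the assertions a listing test might make. */
final class ListedPathCheck {

    /** True if the URI has the viewfs scheme, the given authority, and exactly the expected absolute path. */
    static boolean isExpectedViewFsPath(URI listed, String authority, String expectedPath) {
        String path = listed.getPath();
        return "viewfs".equals(listed.getScheme())
            && authority.equals(listed.getAuthority())
            && path != null
            && path.startsWith("/")
            && path.equals(expectedPath);
    }

    public static void main(String[] args) {
        URI good = URI.create("viewfs://hadoop-cluster/user/history/done/2021");
        URI bad = URI.create(
            "viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done/2021");
        // The well-formed listing entry passes; the one with the duplicated prefix does not.
        System.out.println(isExpectedViewFsPath(good, "hadoop-cluster", "/user/history/done/2021"));
        System.out.println(isExpectedViewFsPath(bad, "hadoop-cluster", "/user/history/done/2021"));
    }
}
```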