[
https://issues.apache.org/jira/browse/HADOOP-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HADOOP-19815:
------------------------------------
Labels: pull-request-available (was: )
> Path normalizes away important trailing slash used for URI.resolve(other)
> -------------------------------------------------------------------------
>
> Key: HADOOP-19815
> URL: https://issues.apache.org/jira/browse/HADOOP-19815
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.4.2
> Reporter: Christopher Tubbs
> Priority: Major
> Labels: pull-request-available
>
> This issue appears to be a relatively long-standing bug with Hadoop's
> FileSystem and Path classes, but is nevertheless important.
> The core of the issue is that {{URI.resolve(...)}} relies on a trailing slash
> to determine how to resolve path components, but the trailing slash is often
> stripped out in common code paths for FileSystem and Path. This causes
> problems when trying to resolve new URIs/Paths from existing ones.
> Constructing a Path from a URI, rather than from a String or another Path,
> does preserve the original URI, so resolution works correctly in that case.
> The result, however, is highly inconsistent behavior that depends on how the
> Path was constructed and how the original URI was preserved internally.
> However, even if one argues that the String constructor for Path is supposed
> to normalize, and the URI constructor is supposed to preserve, the problem
> also exists with many of the {{FileSystem}} methods, such as
> {{fs.getUri()}}, {{fs.getHomeDirectory()}},
> {{fs.getWorkingDirectory()}}, etc. So, one must do convoluted string
> manipulation to resolve one Path from another.
> For example:
> {code:java}
> new Path("hdfs://localhost:8020/path/to/somewhere").toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path("hdfs://localhost:8020/path/to/somewhere/").toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path(new URI("hdfs://localhost:8020/path/to/somewhere")).toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path(new URI("hdfs://localhost:8020/path/to/somewhere/")).toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> var fs = FileSystem.get(new Configuration());
> fs.getUri();
> // expected ==> URI(hdfs://localhost:8020/)
> // actual ==> URI(hdfs://localhost:8020)
> // (probably matters more for LocalFileSystem or viewfs, etc.)
> fs.getWorkingDirectory().toUri();
> fs.getHomeDirectory().toUri();
> // expected ==> URI(hdfs://localhost:8020/user/me/)
> // actual ==> URI(hdfs://localhost:8020/user/me)
> // broken code
> URI relativeURI = new URI("mytempdir");
> fs.getWorkingDirectory().toUri().resolve(relativeURI);
> // expected ==> hdfs://localhost:8020/user/me/mytempdir
> // actual ==> hdfs://localhost:8020/user/mytempdir
> // convoluted workaround (assuming relative path in the suffix without any
> // other URI elements)
> URI relativeURI = new URI("mytempdir");
> fs.getWorkingDirectory().suffix("/" + relativeURI.toString()).toUri();
> // expected ==> hdfs://localhost:8020/user/me/mytempdir
> // actual ==> hdfs://localhost:8020/user/me/mytempdir
> {code}
> Some of this is workable so long as you stay with Path, but the moment you
> try to work with URIs/URLs, things get convoluted quickly: you need
> {{toString()}} calls, concatenation with slash {{/}} characters, and special
> handling for edge cases where the other path isn't relative or contains a
> different authority or scheme. These are things {{URI.resolve()}} would
> already handle, so code gets unnecessarily complex working around these API
> problems.
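
The trailing-slash behavior described above can be reproduced with java.net.URI alone, without any Hadoop classes. The following is a minimal sketch, not a proposed fix: the class name TrailingSlashResolve and the helper asDirectory are hypothetical and exist only for illustration. The helper re-adds the trailing slash before resolving, which is one possible way to avoid string concatenation while still letting URI.resolve() handle absolute paths and foreign schemes.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustration only: plain java.net.URI, no Hadoop classes involved. */
final class TrailingSlashResolve {

    // Hypothetical helper (not part of Hadoop or the JDK): returns a URI whose
    // path ends in "/", so that resolve() treats the last segment as a directory.
    static URI asDirectory(URI uri) {
        String path = uri.getPath();
        if (path == null || path.endsWith("/")) {
            return uri; // opaque URI, or already ends with a slash
        }
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), path + "/",
                    uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI noSlash = URI.create("hdfs://localhost:8020/user/me");
        URI withSlash = URI.create("hdfs://localhost:8020/user/me/");

        // Without the slash, the last segment ("me") is replaced.
        System.out.println(noSlash.resolve("mytempdir"));
        // With the slash, the relative reference is appended.
        System.out.println(withSlash.resolve("mytempdir"));
        // Re-adding the slash restores the directory-style resolution.
        System.out.println(asDirectory(noSlash).resolve("mytempdir"));
        // Absolute paths and other schemes are still handled by resolve().
        System.out.println(asDirectory(noSlash).resolve("/tmp/x"));
        System.out.println(asDirectory(noSlash).resolve("s3a://bucket/key"));
    }
}
```

A helper like this only helps callers who know the URI names a directory; it does not address the inconsistency between the Path constructors that the issue describes.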