[ 
https://issues.apache.org/jira/browse/HADOOP-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-19815:
------------------------------------
    Labels: pull-request-available  (was: )

> Path normalizes away important trailing slash used for URI.resolve(other)
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-19815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19815
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.4.2
>            Reporter: Christopher Tubbs
>            Priority: Major
>              Labels: pull-request-available
>
> This issue appears to be a relatively long-standing bug with Hadoop's 
> FileSystem and Path classes, but is nevertheless important.
> The core of the issue is that {{URI.resolve(...)}} relies on a trailing slash 
> to determine how to resolve path components, but the trailing slash is often 
> stripped out in common code paths for FileSystem and Path. This causes 
> problems when trying to resolve new URIs/Paths from existing ones. 
> Constructing a Path from a URI, rather than from a String or another Path, 
> does preserve the original URI, so resolution works correctly in that case. 
> The result, however, is inconsistent behavior that depends on how the Path 
> was constructed and how the original URI was preserved internally.
> However, even if one argues that the String constructor for Path is supposed 
> to normalize, and the URI constructor is supposed to preserve, the problem 
> also exists with many of the {{FileSystem}} methods, such as 
> {{fs.getUri()}}, {{fs.getHomeDirectory()}}, {{fs.getWorkingDirectory()}}, 
> etc. As a result, one must resort to convoluted string manipulation to 
> resolve one Path from another.
> For example:
> {code:java}
> new Path("hdfs://localhost:8020/path/to/somewhere").toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path("hdfs://localhost:8020/path/to/somewhere/").toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path(new URI("hdfs://localhost:8020/path/to/somewhere")).toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/other)
> new Path(new URI("hdfs://localhost:8020/path/to/somewhere/")).toUri().resolve("other");
> // expected ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> // actual ==> URI(hdfs://localhost:8020/path/to/somewhere/other)
> var fs = FileSystem.get(new Configuration());
> fs.getUri();
> // expected ==> URI(hdfs://localhost:8020/)
> // actual ==> URI(hdfs://localhost:8020) // probably matters more for LocalFileSystem or viewfs, etc.
> fs.getWorkingDirectory().toUri();
> fs.getHomeDirectory().toUri();
> // expected ==> URI(hdfs://localhost:8020/user/me/)
> // actual ==> URI(hdfs://localhost:8020/user/me)
> // broken code
> URI relativeURI = new URI("mytempdir");
> fs.getWorkingDirectory().toUri().resolve(relativeURI);
> // expected ==> hdfs://localhost:8020/user/me/mytempdir
> // actual ==> hdfs://localhost:8020/user/mytempdir
> // convoluted workaround (assuming relative path in the suffix without any other URI elements)
> // reuses relativeURI ("mytempdir") declared above
> fs.getWorkingDirectory().suffix("/" + relativeURI.toString()).toUri();
> // expected ==> hdfs://localhost:8020/user/me/mytempdir
> // actual ==> hdfs://localhost:8020/user/me/mytempdir
> {code}
> Some of this is workable as long as you stay with Path, but the moment you 
> work with URIs/URLs, things get convoluted quickly: you need {{toString()}} 
> calls, concatenation with slash {{/}} characters, and special handling for 
> edge cases where the other path isn't relative or contains a different 
> authority or scheme. {{URI.resolve()}} already handles all of these, so code 
> becomes unnecessarily complex just to work around these API problems.
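
To make the trailing-slash dependency concrete, here is a minimal sketch that uses only java.net.URI, with no Hadoop classes involved. The class name TrailingSlashResolve and the helper asDirectory are hypothetical, introduced only for illustration; they are not part of Hadoop or the JDK and are not the fix proposed in the linked pull request. The helper appends a slash to the base path when one is missing, after which URI.resolve() handles relative, absolute-path, and different-scheme arguments on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class TrailingSlashResolve {

    // Hypothetical helper: returns a URI whose path ends with "/", so that
    // resolve() appends to the last path segment instead of replacing it.
    static URI asDirectory(URI uri) {
        String path = uri.getPath();
        if (path == null || path.endsWith("/")) {
            return uri; // opaque URI, or already a directory-style path
        }
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), path + "/",
                    uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for what fs.getWorkingDirectory().toUri() returns today.
        URI noSlash = URI.create("hdfs://localhost:8020/user/me");

        // Without the trailing slash, the last segment "me" is replaced.
        System.out.println(noSlash.resolve("mytempdir"));
        // prints hdfs://localhost:8020/user/mytempdir

        URI dir = asDirectory(noSlash);

        // With the trailing slash, the relative reference is appended.
        System.out.println(dir.resolve("mytempdir"));
        // prints hdfs://localhost:8020/user/me/mytempdir

        // An absolute path replaces the base path but keeps the authority.
        System.out.println(dir.resolve("/tmp/scratch"));
        // prints hdfs://localhost:8020/tmp/scratch

        // A URI with its own scheme is returned unchanged.
        System.out.println(dir.resolve("s3a://bucket/key"));
        // prints s3a://bucket/key
    }
}
```

The last two cases are the edge cases that string concatenation with "/" gets wrong, which is why normalizing the base once and delegating to URI.resolve() tends to be simpler than the suffix() workaround.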



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

