[
https://issues.apache.org/jira/browse/HADOOP-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033986#comment-16033986
]
Mingliang Liu commented on HADOOP-14428:
----------------------------------------
Thanks [~fabbri] and [[email protected]] for review. This is tricky; I know. :)
The reason that existing test code did not catch this bug is because, as you
said, the trailing slash was removed before it's passed to S3AFileSystem. There
are several places that will remove trailing slashes, mostly
{{URI::normalize()}} and {{Path::normalizePath()}}. All Path constructors will
call {{URI::normalize()}} implicitly after which there will be _at most_ one
trailing slash. Meanwhile, some Path constructors call {{normalizePath()}} as
well. Last, {{Path::makeQualified()}} method _sometimes_ removes the trailing
slash. In our existing code, the {{Path::makeQualified()}} is called via
{{path()}} helper method and it does remove trailing slashes if any. So
overall, the existing test code will not get a chance to carry trailing slashes
before passing to S3AFileSystem.
You can add as many trailing slashes, and it should pass always.
{code:title=AbstractContractMkdirTest::testMkdirSlashHandling()}
// With trailing slash
assertTrue(fs.mkdirs(path("testmkdir/b////////////")));
assertPathExists("mkdir with trailing slash failed", path("testmkdir/b/"));
{code}
Comes back to our case,
# note that we say _some Path constructors_ call {{normalizePath()}} and
constructor {{Path(URI)}} does not.
{code}
public Path(URI aUri) {
uri = aUri.normalize();
}
{code}
So the uri is allowed to have one trailing slash. Unfortunately, fs.shell
package is using {{PathData}} class which creates a Path object from command
line arguments via {{Path(URI)}} constructor. In our case the trailing slash is
preserved in constructor.
# Actually {{PathData}} will also call {{Path::makeQualified()}} method after
constructing a Path. However, {{Path::makeQualified()}} _sometimes removes the
trailing slash_ by calling {{Path::normalizePath()}}, and sometimes === not
fully qualified.
{code}
/* ...
* @return this path if it contains a scheme and authority and is absolute, or
* a new path that includes a path and authority and is fully qualified
*/
public Path makeQualified(URI defaultUri, Path workingDir ) {
...
}
{code}
Here in the command line of this JIRA, we provide a fully qualified path
string. As a result, the trailing slash is preserved again. Eventually the
trailing slash gets passed to S3AFileSystem, where we got fooled to believe
{{/a/b/c/}} getParent returns {{/a/b}} while it actually returns {{/a/b/c}}.
As to the new test case in this patch, the {{new
Path(getContract().getTestPath() + "/testMkdirSlashHandling/e///")}} will fail
w/o this fix because:
# We have triple trailing slashes, while {{Path::normalizePath()}} is only able
to remove two of them (refer to its source code)
# After {{Path::normalizePath()}}, {{URI::normalize()}} allows one trailing
slash.
So in S3AFileSystem, the Path to mkdir has the trailing slash and we have to
remove before calling {{getParent}}.
If only you find this amusing. Part of my understanding maybe wrong; correct me.
The findbugs warnings are unrelated, and the patch still apply cleanly. I'll
hold on commit by tomorrow. Thanks,
> s3a: mkdir appears to be broken
> -------------------------------
>
> Key: HADOOP-14428
> URL: https://issues.apache.org/jira/browse/HADOOP-14428
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.0.0-alpha2, HADOOP-13345
> Reporter: Aaron Fabbri
> Assignee: Mingliang Liu
> Priority: Blocker
> Attachments: HADOOP-14428.000.patch, HADOOP-14428.001.patch
>
>
> Reproduction is:
> hadoop fs -mkdir s3a://my-bucket/dir/
> hadoop fs -ls s3a://my-bucket/dir/
> ls: `s3a://my-bucket/dir/': No such file or directory
> I believe this is a regression from HADOOP-14255.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]