[ 
https://issues.apache.org/jira/browse/HADOOP-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033986#comment-16033986
 ] 

Mingliang Liu commented on HADOOP-14428:
----------------------------------------

Thanks [~fabbri] and [[email protected]] for review. This is tricky; I know. :)

The reason that existing test code did not catch this bug is because, as you 
said, the trailing slash was removed before it's passed to S3AFileSystem. There 
are several places that will remove trailing slashes, mostly 
{{URI::normalize()}} and {{Path::normalizePath()}}. All Path constructors will 
call {{URI::normalize()}} implicitly after which there will be _at most_ one 
trailing slash. Meanwhile, some Path constructors call {{normalizePath()}} as 
well. Last, {{Path::makeQualified()}} method _sometimes_ removes the trailing 
slash. In our existing code, the {{Path::makeQualified()}} is called via 
{{path()}} helper method and it does remove trailing slashes if any. So 
overall, the existing test code will not get a chance to carry trailing slashes 
before passing to S3AFileSystem.

You can add as many trailing slashes, and it should pass always.
{code:title=AbstractContractMkdirTest::testMkdirSlashHandling()}
    // With trailing slash
    assertTrue(fs.mkdirs(path("testmkdir/b////////////")));
    assertPathExists("mkdir with trailing slash failed", path("testmkdir/b/"));
{code}

Comes back to our case,
# note that we say _some Path constructors_ call {{normalizePath()}} and 
constructor {{Path(URI)}} does not.
{code}
public Path(URI aUri) {
    uri = aUri.normalize();
}
{code}
So the uri is allowed to have one trailing slash. Unfortunately, fs.shell 
package is using {{PathData}} class which creates a Path object from command 
line arguments via {{Path(URI)}} constructor. In our case the trailing slash is 
preserved in constructor.
# Actually {{PathData}} will also call {{Path::makeQualified()}} method after 
constructing a Path. However,  {{Path::makeQualified()}} _sometimes removes the 
trailing slash_ by calling {{Path::normalizePath()}}, and sometimes === not 
fully qualified.
{code}
  /*  ...
   * @return this path if it contains a scheme and authority and is absolute, or
   * a new path that includes a path and authority and is fully qualified
   */
  public Path makeQualified(URI defaultUri, Path workingDir ) {
  ...
  }
{code}
Here in the command line of this JIRA, we provide a fully qualified path 
string. As a result, the trailing slash is preserved again. Eventually the 
trailing slash gets passed to S3AFileSystem, where we got fooled to believe 
{{/a/b/c/}} getParent returns {{/a/b}} while it actually returns {{/a/b/c}}.

As to the new test case in this patch, the {{new 
Path(getContract().getTestPath() + "/testMkdirSlashHandling/e///")}} will fail 
w/o this fix because:
# We have triple trailing slashes, while {{Path::normalizePath()}} is only able 
to remove two of them (refer to its source code)
# After {{Path::normalizePath()}}, {{URI::normalize()}} allows one trailing 
slash.
So in S3AFileSystem, the Path to mkdir has the trailing slash and we have to 
remove before calling {{getParent}}.

If only you find this amusing. Part of my understanding maybe wrong; correct me.

The findbugs warnings are unrelated, and the patch still apply cleanly. I'll 
hold on commit by tomorrow. Thanks,

> s3a: mkdir appears to be broken
> -------------------------------
>
>                 Key: HADOOP-14428
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14428
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2, HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Mingliang Liu
>            Priority: Blocker
>         Attachments: HADOOP-14428.000.patch, HADOOP-14428.001.patch
>
>
> Reproduction is:
> hadoop fs -mkdir s3a://my-bucket/dir/
> hadoop fs -ls s3a://my-bucket/dir/
> ls: `s3a://my-bucket/dir/': No such file or directory
> I believe this is a regression from HADOOP-14255.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to