[ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-9361:
-----------------------------------

    Attachment: HADOOP-9361-007.patch

Updated patch

# explicit option for an FS to not fail on a {{seek()}} on a closed file, only 
on the read that follows.
# add raw local tests that bypass the {{ChecksumFilesystem}} -mostly to see 
where quirks lay -and a couple that go down to Java's {{File}} class just to 
make sure.
# more tests on rename behaviour

h2. Outstanding "ambiguities"

h3. Major: rename file onto file

On OS/X, LocalFS (and RawLocal) let you rename a source file over a destination 
file; the source data becomes accessible at the source path. This is what 
{{mv}} does at the command line, so presumably it's legitimate, even if HDFS 
and the blobstore APIs all reject this.

This is a major difference in semantics from HDFS and Posix filesystems -code 
I've written to assume that renaming a file fails if the destination exists 
works well to implement some implicit concurrency control on HDFS, but would 
not work on a Posix FS.

We have the option of making the rename operation stricter by explicitly adding 
a check into rawlocal, but there's still a race condition between any check and 
the OS-level rename action. 

What does that leave? It leaves documenting this somewhere for end-users.


h3. Minor:

RawLocal returns true if you attempt to delete a nonexistent path; everything 
else (including {{File.delete())}} returns true. This behaviour comes from 
{{FileUtil.fullyDelete(f)}}, which does not check for a file existence first, 
and when it gets false back from {{File.delete()}} generates the return code 
{{!File.exists()}}. That is, the semantics of fully delete are "return true if 
there is no directory at the end of the operation"


More subtly, it there is a small a race condition where you could accidentally 
recursively delete a directory by attempting to delete a nonexistent file while 
another process is creating a directory of the same name

This is because the check for {{!isFile()}} and !recursive are false when the 
file does not exist, but by the time the delete operation starts the rename has 
created a directory tree.

{code}
    File f = pathToFile(p);
    if (f.isFile()) {
      return f.delete();
    } else if (!recursive && f.isDirectory() && 
        (FileUtil.listFiles(f).length != 0)) {
      throw new IOException("Directory " + f.toString() + " is not empty");
    }
    return FileUtil.fullyDelete(f);
  }
{code}

Adding an existence check at the start of the sequence produces a consistent 
return code and eliminates this aspect of the race condition -this patch does 
exactly that.

There is still a minor race: adding an entry to a directory after the empty 
check and before the fullyDelete call. 

the recursive flag logic should really be moved into a {{FileUtil}} method 
itself.

> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9361
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, test
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, 
> HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, 
> HADOOP-9361-006.patch, HADOOP-9361-007.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough -while 
> HDFS gets tested downstream, other filesystems, such as blobstore bindings, 
> don't.
> The only tests that are common are those of {{FileSystemContractTestBase}}, 
> which HADOOP-9258 shows is incomplete.
> I propose 
> # writing more tests which clarify expected behavior
> # testing operations in the interface being in their own JUnit4 test classes, 
> instead of one big test suite. 
> # Having each FS declare via a properties file what behaviors they offer, 
> such as atomic-rename, atomic-delete, umask, immediate-consistency -test 
> methods can downgrade to skipped test cases if a feature is missing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to