[ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701311#comment-13701311
 ] 

Steve Loughran commented on HADOOP-9361:
----------------------------------------

This patch

# specifies the a hadoop filesystem
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem 
modeled as a set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted 
from the HADOOP-8545 tests), and a test suite for {{FileSystem.concat()}}


h4. Tries to define the behaviour of the {{FileSystem}} class rigorously
# It does this with a model of a filesystem as a set of paths mapping to 
metadata and data, which lets us use set-theory to select parts of the 
filesystem, and in describing how a modified filesystem is derived from is 
predecessor as a result of an operation.
# specifying the preconditions and postconditions of the main operations -based 
on the behaviour of HDFS. 
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem 
modeled as a set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted 
from the HADOOP-8545 tests), and a test suite for {{FileSystem.concat()}}

Every filesystem  to be tested must  implement a subclass  of 
{{AbstractFileSystemContract}} which  contains a factory for  their 
{{FileSystem}}
instances (and could  easily do the same for {{FileContext}},  as well as a 
method {{boolean  supports(feature:string)}} that returns true/false
for features like {{"supports-append"}}.  For the local fs and HDFS, the  
definition of supported features is done from  XML config files, using
keys like {{fs.$filesystem.contract.supports-append}}. This allows the future 
option for  these declarations to move into the core- XML files.
The  {{LocalFSContract}} class  updates some  of  its supported  features  
({{is-case-sensitive}}, {{supports-unix-permissions}})  based on  the
features of underlying FS. Every contract  also has a couple of test-only 
features: {{supports-root-tests}} (can you  do things like {{rm /}} in
a test run?) and a method to see if the tests are enabled ({{boolean 
enabled()}}). This lets contract test suites disable themselves if they can't  
run (i.e when the logon details for
a blobstore aren't  in the test configurations).  The enabled() option is 
checked  in the test setups; the  other features can be  tested in the
specific tests -and all downgrade to a skipped test if not set. This makes it 
visible which tests have not actually run.

h4. Instantiates contract tests for the localFS and HDFS. Note that LocalFS 
fails the seek tests, as it downgrades negative seeks to a no-op.

Having written these tests, I can see some limitations of the process.

* the tests have to look for the loosest exception type coming back from a 
failure -e.g. {{IOException}} over {{EOFException}}. 
* needs a story for timeouts: we must have these to deal with blobstores &c, 
and either need a way to allow FS contracts to define these, let the FS 
specific tests define them, or just have some base timeouts large enough to 
deal with blobstores with operations like {{delete(dir)}} being 
{{O(children(dir))}}

h4. Specification Syntax

I'm not convinced the syntax for specifications is great, nor am I sure I'm 
even using it consistently. I've just had
to do something minimal that fits into unformatted text in a {{.apt}} file:

{code}
FileSystem.listStatus(Path P, PathFilter Filter) 

A PathFilter) is a predicate function that returns true iff the path P
meets the filter's requirements.

  Preconditions

-----------

  #path must exist
  exists(FS, P) || throw FileNotFoundException
  
-----------

  Postconditions
  
-----------

  isFile(FS, P) && Filter(P) => [FileStatus(FS, P)]
  
  isFile(FS, P) && !Filter(P) => []
  
  isDir(FS, P) => [all C in children(FS, P) where Filter(C) == true] 
  
-----------
{code}

This blurs C/Java symbols with bits of set theory, and trying to define what 
exceptions to throw
on failed preconditions is something that really needs improving. 

If we could use LaTeX we could do proper set syntax, though we'd still need a 
consistent language
for defining filesystem implementation behaviours.

{code} 
isFile(FS, P) \land \neg Filter(P) \Rightarrow \emptyset

isDir(FS,P) \Rightarrow  \left\{ \forall C \in children(FS, P) : Filter(C) 
\right} 
{code}
This is actually harder to read. 

One other tactic could be to move the specification entirely to javadocs, with 
something at the package level for
the core model & notation, then add precond/postcond specifics as preformatted 
areas in the docs.

This would keep the spec by the signature, increase likelihood of maintenance, 
and for those people who
understood the syntax, be able to understand what the methods do.
                
> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9361
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, test
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough -while 
> HDFS gets tested downstream, other filesystems, such as blobstore bindings, 
> don't.
> The only tests that are common are those of {{FileSystemContractTestBase}}, 
> which HADOOP-9258 shows is incomplete.
> I propose 
> # writing more tests which clarify expected behavior
> # testing operations in the interface being in their own JUnit4 test classes, 
> instead of one big test suite. 
> # Having each FS declare via a properties file what behaviors they offer, 
> such as atomic-rename, atomic-delete, umask, immediate-consistency -test 
> methods can downgrade to skipped test cases if a feature is missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to