[
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701311#comment-13701311
]
Steve Loughran commented on HADOOP-9361:
----------------------------------------
This patch
# specifies the a hadoop filesystem
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem
modeled as a set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted
from the HADOOP-8545 tests), and a test suite for {{FileSystem.concat()}}
h4. Tries to define the behaviour of the {{FileSystem}} class rigorously
# It does this with a model of a filesystem as a set of paths mapping to
metadata and data, which lets us use set-theory to select parts of the
filesystem, and in describing how a modified filesystem is derived from is
predecessor as a result of an operation.
# specifying the preconditions and postconditions of the main operations -based
on the behaviour of HDFS.
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem
modeled as a set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted
from the HADOOP-8545 tests), and a test suite for {{FileSystem.concat()}}
Every filesystem to be tested must implement a subclass of
{{AbstractFileSystemContract}} which contains a factory for their
{{FileSystem}}
instances (and could easily do the same for {{FileContext}}, as well as a
method {{boolean supports(feature:string)}} that returns true/false
for features like {{"supports-append"}}. For the local fs and HDFS, the
definition of supported features is done from XML config files, using
keys like {{fs.$filesystem.contract.supports-append}}. This allows the future
option for these declarations to move into the core- XML files.
The {{LocalFSContract}} class updates some of its supported features
({{is-case-sensitive}}, {{supports-unix-permissions}}) based on the
features of underlying FS. Every contract also has a couple of test-only
features: {{supports-root-tests}} (can you do things like {{rm /}} in
a test run?) and a method to see if the tests are enabled ({{boolean
enabled()}}). This lets contract test suites disable themselves if they can't
run (i.e when the logon details for
a blobstore aren't in the test configurations). The enabled() option is
checked in the test setups; the other features can be tested in the
specific tests -and all downgrade to a skipped test if not set. This makes it
visible which tests have not actually run.
h4. Instantiates contract tests for the localFS and HDFS. Note that LocalFS
fails the seek tests, as it downgrades negative seeks to a no-op.
Having written these tests, I can see some limitations of the process.
* the tests have to look for the loosest exception type coming back from a
failure -e.g. {{IOException}} over {{EOFException}}.
* needs a story for timeouts: we must have these to deal with blobstores &c,
and either need a way to allow FS contracts to define these, let the FS
specific tests define them, or just have some base timeouts large enough to
deal with blobstores with operations like {{delete(dir)}} being
{{O(children(dir))}}
h4. Specification Syntax
I'm not convinced the syntax for specifications is great, nor am I sure I'm
even using it consistently. I've just had
to do something minimal that fits into unformatted text in a {{.apt}} file:
{code}
FileSystem.listStatus(Path P, PathFilter Filter)
A PathFilter) is a predicate function that returns true iff the path P
meets the filter's requirements.
Preconditions
-----------
#path must exist
exists(FS, P) || throw FileNotFoundException
-----------
Postconditions
-----------
isFile(FS, P) && Filter(P) => [FileStatus(FS, P)]
isFile(FS, P) && !Filter(P) => []
isDir(FS, P) => [all C in children(FS, P) where Filter(C) == true]
-----------
{code}
This blurs C/Java symbols with bits of set theory, and trying to define what
exceptions to throw
on failed preconditions is something that really needs improving.
If we could use LaTeX we could do proper set syntax, though we'd still need a
consistent language
for defining filesystem implementation behaviours.
{code}
isFile(FS, P) \land \neg Filter(P) \Rightarrow \emptyset
isDir(FS,P) \Rightarrow \left\{ \forall C \in children(FS, P) : Filter(C)
\right}
{code}
This is actually harder to read.
One other tactic could be to move the specification entirely to javadocs, with
something at the package level for
the core model & notation, then add precond/postcond specifics as preformatted
areas in the docs.
This would keep the spec by the signature, increase likelihood of maintenance,
and for those people who
understood the syntax, be able to understand what the methods do.
> Strictly define the expected behavior of filesystem APIs and write tests to
> verify compliance
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-9361
> URL: https://issues.apache.org/jira/browse/HADOOP-9361
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, test
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough -while
> HDFS gets tested downstream, other filesystems, such as blobstore bindings,
> don't.
> The only tests that are common are those of {{FileSystemContractTestBase}},
> which HADOOP-9258 shows is incomplete.
> I propose
> # writing more tests which clarify expected behavior
> # testing operations in the interface being in their own JUnit4 test classes,
> instead of one big test suite.
> # Having each FS declare via a properties file what behaviors they offer,
> such as atomic-rename, atomic-delete, umask, immediate-consistency -test
> methods can downgrade to skipped test cases if a feature is missing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira