Steve, I like the idea of testing all filesystems for expected behavior. In HttpFS we are already doing something along these lines, testing HttpFS against HDFS and LocalFS, and also testing two WebHDFS clients.
Regarding where these 'extensions' would go, well, we could have something like share/hadoop/common/filesystem-ext/s3, and whoever wants to use s3 would have to symlink those JARs into common/lib. Or we could have a HADOOP_COMMON_FS_EXT env variable indicating which extension JARs to pick up. I guess the BigTop guys could help define this magic.

On Wed, Feb 13, 2013 at 1:44 AM, Steve Loughran <[email protected]> wrote:

> On 12 February 2013 22:09, Eli Collins <[email protected]> wrote:
>
> > I agree that the current place isn't a good one, for both the reasons
> > you mention on the jira (and because the people maintaining this code
> > don't primarily work on Hadoop). IMO the SwiftFS driver should live in
> > the swift source tree (as part of OpenStack).
>
> If they could be persuaded to move beyond .py, it'd be tempting, because
> the FileSystem API is nominally stable.
>
> However, one thing I have noticed during this work is how the behaviour
> of FileSystem is underspecified. That's not an issue for HDFS, which
> gets stressed rigorously during the hdfs and mapred test runs, but it
> does matter for the rest.
>
> There are a lot of assumptions ("files != directories", "mv / anything"
> fails) and things that aren't tested: "mv self self" returns true if
> self is a file, false if it is a directory; what exception to raise if
> readFully goes past the end of a file (and the answer is?).
>
> We even make an implicit assumption that file operations are consistent:
> you get back what you wrote, which turns out to be an assumption not
> guaranteed by any of the blobstores in all circumstances.
>
> HADOOP-9258 and HADOOP-9119 tighten the spec a bit, but if you look at
> what I've been doing for Swift testing, I've created a set of test
> suites, one per operation ("ls", "read", "rename"), with tests for
> scale and for directory depth and width on my todo list:
>
> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>
> Then I want to extract those into tests that can be applied to all
> filesystems (say in o.a.h.fs.contract), with some per-FS metadata file
> providing details on what the FS supports (rename, append, case
> sensitivity, MAX_PATH, ...), so that we've got better test coverage
> that can be applied to all the filesystems in the Hadoop codebase
> (and, being JUnit 4, you can skip tests in-code by throwing
> AssumptionViolatedException; these get reported as skips).
>
> It's this expanded test coverage that will be the tightest coupling to
> Hadoop.
>
> > I'm not -1 on it living in-tree, it's just not my 1st choice. If you
> > want to create a top-level directory for 3rd-party (read: non-local,
> > non-HDFS) file systems - go for it. It would be an improvement on the
> > current situation (o.a.h.fs.ftp also brings in dependencies that most
> > people don't need). I don't think we need to come up with a new
> > top-level "kitchen sink" directory to handle all Hadoop extensions;
> > there are a few well-defined extension points that can likely be
> > handled independently, so logically grouping them separately makes
> > sense to me (and perhaps we'll decide some extensions are better
> > in-tree and some not).
>
> Makes sense. That I will do in a JIRA
>

--
Alejandro
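
[Editor's sketch] For illustration, a minimal sketch of how the HADOOP_COMMON_FS_EXT idea proposed above could be wired up. The directory layout matches the proposal in the thread, but the FsExtensionLoader class and the comma-separated value format are hypothetical assumptions, not an agreed design:

import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader: picks up extension JARs named in the
// HADOOP_COMMON_FS_EXT env variable (assumed here to be a comma-separated
// list of extension names, e.g. "s3,swift") from
// share/hadoop/common/filesystem-ext/<name> under the Hadoop install dir.
public class FsExtensionLoader {

  public static ClassLoader loadExtensions(File hadoopHome) {
    List<URL> jars = new ArrayList<URL>();
    String ext = System.getenv("HADOOP_COMMON_FS_EXT");
    if (ext != null) {
      for (String name : ext.split(",")) {
        File dir = new File(hadoopHome,
            "share/hadoop/common/filesystem-ext/" + name.trim());
        File[] files = dir.listFiles();
        if (files == null) {
          continue;  // extension dir missing; skip it
        }
        for (File jar : files) {
          if (jar.getName().endsWith(".jar")) {
            try {
              jars.add(jar.toURI().toURL());
            } catch (MalformedURLException e) {
              // cannot happen for a URI derived from a File; ignore
            }
          }
        }
      }
    }
    // Child classloader exposing the extension JARs to code that
    // later instantiates the FileSystem implementations.
    return new URLClassLoader(jars.toArray(new URL[0]),
        FsExtensionLoader.class.getClassLoader());
  }
}

This avoids the symlink step: the JARs stay where they are shipped and are only put on the classpath when the env variable names them.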
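[Editor's sketch] And a minimal sketch of the per-operation contract test idea Steve describes, using JUnit 4's assumeTrue() (which throws AssumptionViolatedException, reported as a skip) to bypass tests that the per-FS metadata says don't apply. The abstract class, the Properties-file format, and the "supports-rename" key are all hypothetical:

import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

import java.util.Properties;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

// Sketch of one per-operation contract test. A per-FS metadata file
// (assumed here to be a java.util.Properties file with keys like
// "supports-rename") declares what the filesystem under test supports.
public abstract class AbstractRenameContractTest {

  // Subclasses bind this to HDFS, LocalFS, SwiftFS, ...
  protected abstract FileSystem getFileSystem() throws Exception;

  // Subclasses load the per-FS capability metadata.
  protected abstract Properties getContract();

  private boolean supports(String capability) {
    return Boolean.parseBoolean(
        getContract().getProperty(capability, "false"));
  }

  @Test
  public void testRenameFileOntoItself() throws Exception {
    // assumeTrue() throws AssumptionViolatedException when false,
    // so JUnit 4 reports the test as skipped rather than failed.
    assumeTrue(supports("supports-rename"));
    FileSystem fs = getFileSystem();
    Path file = new Path("/test/renameSelf");
    fs.create(file).close();
    // One of the underspecified cases called out above: renaming a
    // file onto itself is expected to return true.
    assertTrue(fs.rename(file, file));
  }
}

Each filesystem would then get a thin subclass plus a metadata file, and the same suite runs unchanged against HDFS, LocalFS, the blobstores, etc.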
