> We are already there with the S3 and Azure blobstores, as well as the FTP
> filesystem
I think this is not correct and we should plan on moving them out. This is
independent of the effort to straighten out the FS spec, which I think is
great.

Thx

On Fri, Mar 8, 2013 at 8:57 AM, Steve Loughran <[email protected]> wrote:

> On 8 March 2013 16:15, Alejandro Abdelnur <[email protected]> wrote:
>
> > jumping a bit late into the discussion.
>
> yes. I started it in common-dev first, in the "where does contrib stuff
> go now" thread, then moved to general, where the conclusion was "except
> for special cases like FS clients, it isn't".
>
> Now I'm trying to lay down the location for FS stuff, both for openstack
> and to handle the proposed dependency changes for s3n://
>
> > I'd argue that unless those filesystems are part of hadoop, their
> > clients should not be distributed/built by hadoop.
> >
> > an analogy to this is not wanting Yarn to be the home for AM
> > implementations.
> >
> > a key concern is testability and maintainability.
>
> We are already there with the S3 and Azure blobstores, as well as the FTP
> filesystem.
>
> The testability is straightforward for blobstores precisely because all
> you need is some credentials and cluster time; there's no requirement to
> have some specific filesystem to hand. Any of those are very much in the
> vendors' hands to test themselves, especially if the "it's a replacement
> for HDFS" assertion is made.
>
> If you look at HADOOP-9361 you can see that I've been defining our FS
> expectations more rigorously than before, with HADOOP-9371 spelling out
> what happens when you try to readFully() past the end of a file, or call
> getBlockLocations("/"). HDFS has defined behaviour here, and downstream
> code depends on some of it (e.g. getBlockLocations() behaviour on
> directories).
>
> https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf
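
A minimal sketch of a contract test for those two cases, assuming JUnit 4
and a FileSystem bound via fs.defaultFS to the store under test, might look
like this; the EOFException expectation matches what HDFS does today, and
whether every filesystem must match it is exactly the kind of question the
spec has to settle:

    import java.io.EOFException;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Before;
    import org.junit.Test;

    public class FSContractSketchTest {

      private FileSystem fs;
      private final Path testFile = new Path("/tmp/contract-sketch.dat");

      @Before
      public void setUp() throws IOException {
        // binds to whatever fs.defaultFS points at: hdfs://, s3n://, ...
        fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(testFile, true);
        out.write(new byte[1024]);    // a file of known length: 1 KB
        out.close();
      }

      @Test(expected = EOFException.class)
      public void testReadFullyPastEOF() throws IOException {
        FSDataInputStream in = fs.open(testFile);
        try {
          byte[] buffer = new byte[16];
          in.readFully(4096, buffer); // positioned read beyond the last byte
        } finally {
          in.close();
        }
      }

      @Test
      public void testGetBlockLocationsOnRoot() throws IOException {
        FileStatus root = fs.getFileStatus(new Path("/"));
        // What a compliant FS must return for a directory is one of the
        // open questions; this only checks that the call completes.
        fs.getFileBlockLocations(root, 0, 1);
      }
    }
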
> So far my initially blobstore-specific tests for the functional parts of
> the specification (not the consistency, concurrency, or atomicity parts)
> are in github:
>
> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>
> I've also added more tests to the existing FS contract test, and in doing
> so showed that s3 and s3n have some data-loss risks which need to be
> fixed -that's an argument in favour of having the (testable,
> low-maintenance-cost) filesystems somewhere where any of us is free to
> fix them.
>
> While we refine that spec, I want to take those per-operation tests from
> the SwiftFS support, make them retargetable at other filesystems, and
> slowly apply them to all the distributed filesystems. Your colleague
> Andrew Wang is helping there by abstracting FileSystem and FileContext
> away, so we can test both.
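
One way to structure that retargeting is an abstract base class with the
filesystem binding left to a thin per-FS subclass; the class and method
names below are made up for the sketch, and the FileContext half is left
out:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Before;
    import org.junit.Test;

    import static org.junit.Assert.assertTrue;

    public abstract class AbstractFSOperationTest {

      protected FileSystem fs;

      /** Each target filesystem supplies its own binding. */
      protected abstract FileSystem createFileSystem() throws IOException;

      @Before
      public void setUp() throws IOException {
        fs = createFileSystem();
      }

      @Test
      public void testMkdirsIsVisibleAndDeletable() throws IOException {
        Path dir = new Path("/tmp/fs-operation-test-dir");
        assertTrue(fs.mkdirs(dir));
        assertTrue(fs.getFileStatus(dir).isDirectory());
        assertTrue(fs.delete(dir, true));
      }
    }

    // In a separate file, one subclass per filesystem under test:
    public class LocalFSOperationTest extends AbstractFSOperationTest {
      @Override
      protected FileSystem createFileSystem() throws IOException {
        return FileSystem.getLocal(new Configuration());
      }
    }
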
> > still, I see bigtop as the integration point and the means of making
> > those jars available to a setup.
>
> I plan to put the integration tests there -the tests that try to run Pig
> with arbitrary source and dest filesystems, the same for Hive, plus some
> scale tests: can we upload an 8GB file, and what do you get back? Can I
> create more than 65536 entries in a single directory, and what happens
> to ls / performance?
>
> To summarise then:
>
> 1. Blobstores, the FTP filesystem &c could gradually move to a
> hadoop-common/hadoop-filesystem-clients module.
> 2. A stricter specification of compliance, for the benefit of everyone
> -us, other FS implementors, and users of the FS APIs.
> 3. Lots of new functional tests for compliance -abstract in
> hadoop-common; FS-specific in hadoop-filesystem-clients.
> 4. Integration & scale tests in bigtop.
> 5. Anyone writing a "hadoop compatible FS" can grab the functional and
> integration tests and see what breaks -fixing their code.
> 6. The combination of (Java API files, specification doc, functional
> tests, HDFS implementation) defines the expected behaviour of a
> filesystem.
>
> -Steve

--
Alejandro