I was chatting offline with Roman about this; his points are:

1. segregation of the FS impls into different modules makes sense
2. it should be OK if they have mock services for unit tests
3. bigtop could do the real integration testing
4. by doing this, the different FileSystem impls would be there out of the box

If we go down this path, I'm OK with it. Thoughts?
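As a rough illustration of point 2 (a sketch only, not existing Hadoop code: copyToStore() is a hypothetical routine under test, and LocalFileSystem is standing in for a proper mock of the remote store's service), a unit test for FS-facing code could run entirely against the local filesystem, with no credentials or network needed:

    // Sketch only: LocalFileSystem stands in for the remote store, so the test
    // needs no credentials or network. copyToStore() is a hypothetical routine
    // under test, not part of Hadoop.
    import static org.junit.Assert.assertTrue;

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    public class TestCopyToStore {

      @Test
      public void testCopyCreatesTarget() throws Exception {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path src = new Path(File.createTempFile("store-src", ".txt").toURI());
        Path dest = new Path(System.getProperty("java.io.tmpdir"), "store-dest.txt");
        fs.delete(dest, false);                      // clean up any earlier run

        copyToStore(fs, src, dest);                  // hypothetical code under test
        assertTrue("destination missing", fs.exists(dest));
      }

      // stand-in for the production routine that would talk to the blobstore
      private void copyToStore(FileSystem fs, Path src, Path dest) throws Exception {
        FileUtil.copy(fs, src, fs, dest, false, fs.getConf());
      }
    }

A real mock for a blobstore client would go further and stub the store's REST service, but the shape of the test stays the same.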
On Fri, Mar 8, 2013 at 9:07 AM, Alejandro Abdelnur <[email protected]> wrote:

> > We are already there with the S3 and Azure blobstores, as well as the FTP
> > filesystem
>
> I think this is not correct and we should plan moving them out.
>
> This is independent of the effort to straighten up the FS spec, which I
> think is great.
>
> Thx
>
> On Fri, Mar 8, 2013 at 8:57 AM, Steve Loughran <[email protected]> wrote:
>
>> On 8 March 2013 16:15, Alejandro Abdelnur <[email protected]> wrote:
>>
>> > jumping a bit late into the discussion.
>> >
>>
>> yes. I started it in common-dev first, in the "where does contrib stuff go
>> now" thread, then moved to general, where the conclusion was "except for
>> special cases like FS clients, it isn't".
>>
>> Now I'm trying to lay down the location for FS stuff, both for openstack
>> and to handle some proposed dependency changes for s3n://
>>
>> > I'd argue that unless those filesystems are part of hadoop, their
>> > clients should not be distributed/built by hadoop.
>> >
>> > an analogy to this is not wanting Yarn to be the home for AM
>> > implementations.
>> >
>> > a key concern is testability and maintainability.
>>
>> We are already there with the S3 and Azure blobstores, as well as the FTP
>> filesystem.
>>
>> The testability is straightforward for blobstores precisely because all
>> you need is some credentials and cluster time; there's no requirement to
>> have some specific filesystem to hand. Any of those are very much in the
>> vendors' hands to test themselves, especially if the "it's a replacement
>> for HDFS" assertion is made.
>>
>> If you look at HADOOP-9361 you can see that I've been defining more
>> rigorously than before what our FS expectations are, with HADOOP-9371
>> spelling out what happens when you try to readFully() past the end of a
>> file, or call getBlockLocations("/"). HDFS has defined actions here, and
>> downstream code depends on some of them (e.g. getBlockLocations()
>> behaviour on directories).
>>
>> https://issues.apache.org/jira/secure/attachment/12572328/HadoopFilesystemContract.pdf
>>
>> So far my initially blobstore-specific tests for the functional parts of
>> the specification (not the consistency, concurrency, or atomicity parts)
>> are in github:
>>
>> https://github.com/hortonworks/Hadoop-and-Swift-integration/tree/master/swift-file-system/src/test/java/org/apache/hadoop/fs/swift
>>
>> I've also added more tests to the existing FS contract test, and in doing
>> so showed that s3 and s3n have some data-loss risks which need to be
>> fixed - that's an argument in favour of having the (testable,
>> low-maintenance-cost) filesystems somewhere where any of us is free to
>> fix them.
>>
>> While we refine that spec, I want to take those per-operation tests from
>> the SwiftFS support, make them retargetable at other filesystems, and
>> slowly apply them to all the distributed filesystems. Your colleague
>> Andrew Wang is helping there by abstracting FileSystem and FileContext
>> away, so we can test both.
>>
>> > still, i see bigtop as the integration point and the means of making
>> > those jars available to a setup.
>>
>> I plan to put the integration tests there - the tests that try to run Pig
>> with arbitrary source and dest filesystems, the same for Hive, plus some
>> scale tests: can we upload an 8GB file? What do you get back? Can I
>> create > 65536 entries in a single directory, and what happens to ls /
>> performance?
>>
>> To summarise then:
>>
>>    1. blobstores, FTPFileSystem & co could gradually move to a
>>    hadoop-common/hadoop-filesystem-clients
>>    2. A stricter specification of compliance, for the benefit of everyone
>>    - us, other FS implementors, and users of FS APIs
>>    3. Lots of new functional tests for compliance - abstract ones in
>>    hadoop-common, FS-specific ones in hadoop-filesystem-clients
>>    4. Integration & scale tests in bigtop
>>    5. Anyone writing a "hadoop compatible FS" can grab the functional and
>>    integration tests and see what breaks, fixing their code
>>    6. The combination of (Java API files, specification doc, functional
>>    tests, HDFS implementation) defines the expected behaviour of a
>>    filesystem
>>
>> -Steve
>
>
> --
> Alejandro


--
Alejandro
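To make the retargetable per-operation compliance tests from points 3 and 5 concrete, a rough sketch of what one could look like follows. This is not the actual HADOOP-9361/9371 test code: the abstract createFileSystem() hook is an assumption, and the EOFException expectation is taken from HDFS's behaviour for readFully() past the end of a file.

    // Sketch only: each FS module would subclass this and supply its own target
    // filesystem (HDFS, swift, s3n, ...). The EOFException expectation mirrors
    // HDFS's readFully() behaviour; it is not a quote from the spec draft.
    import static org.junit.Assert.fail;

    import java.io.EOFException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    public abstract class AbstractFSContractTest {

      /** Hook for the FS-specific subclass: return the filesystem under test. */
      protected abstract FileSystem createFileSystem() throws Exception;

      @Test
      public void testReadFullyPastEOF() throws Exception {
        FileSystem fs = createFileSystem();
        Path file = new Path("/test/readfully-past-eof.txt");
        FSDataOutputStream out = fs.create(file, true);
        out.write(new byte[]{'a', 'b', 'c'});
        out.close();

        FSDataInputStream in = fs.open(file);
        try {
          // ask for more bytes than the file holds, starting inside the file
          in.readFully(1, new byte[16]);
          fail("expected EOFException reading past the end of " + file);
        } catch (EOFException expected) {
          // the behaviour HDFS exhibits; the contract test pins it down
        } finally {
          in.close();
        }
      }
    }

A swift or s3n subclass would then only need to implement createFileSystem() against its own test store and credentials to run the whole suite.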
