> 2. I can add a hadoop-2.6 profile that sets things up for s3a, azure and
> openstack swift.
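For illustration, such a profile could look roughly like the sketch below in Spark's parent pom.xml. This is only a sketch: the property name and the exact module list are assumptions, not Spark's actual build config (hadoop-aws ships the s3a client and hadoop-openstack the swift client in Hadoop 2.6; whether the hadoop-azure module is available at that Hadoop version would need checking):

```xml
<!-- Hypothetical hadoop-2.6 profile; names/versions illustrative only. -->
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>
  <dependencies>
    <!-- s3a filesystem client -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- swift:// client for OpenStack -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-openstack</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- wasb:// client for Azure storage, if present at this version -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-azure</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</profile>
```

Activating it would then be `mvn -Phadoop-2.6 ...`, with the object-store clients pulled onto the classpath transitively.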
Added: https://issues.apache.org/jira/browse/SPARK-7481

One thing to consider here is testing. The s3x clients themselves have some tests that individuals/orgs can run against different S3 installations & private versions; people publish their results so there's visibly good coverage of the different S3 installations, with their different consistency models & auth mechanisms. There are also some scale tests that take time & don't get run so often, but which throw up surprises (RAX UK throttling DELETE, intermittent ConnectionReset exceptions reading multi-GB S3 files).

Amazon have some public datasets that could be used to verify that Spark can read files off S3, and maybe even find some of the scale problems. In particular, http://datasets.elasticmapreduce.s3.amazonaws.com/ publishes ngrams as a set of .gz files, free for all to read.

Would there be a place in the code tree for some tests to run against things like this? They're cloud integration tests rather than unit tests, and nobody would want them to be on by default, but it could be good for regression testing Hadoop S3 support & Spark integration.
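To make the idea concrete, such an opt-in test might be gated behind a system property so it is skipped by default. The sketch below is purely illustrative: the suite name, the `spark.test.cloud` flag, and the exact dataset path are assumptions, and actually running it needs network access plus the Hadoop S3 client jars on the classpath:

```scala
// Sketch of an opt-in cloud integration test; everything here
// (suite name, -Dspark.test.cloud flag, dataset path) is illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

class S3IntegrationSuite extends FunSuite {

  // Off by default: only runs when -Dspark.test.cloud=true is set.
  private val cloudTestsEnabled =
    sys.props.get("spark.test.cloud").contains("true")

  test("read a public .gz dataset off S3") {
    assume(cloudTestsEnabled, "cloud integration tests are disabled")
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("s3-integration"))
    try {
      // A public EMR dataset with anonymous read access (path illustrative).
      val rdd = sc.textFile("s3n://datasets.elasticmapreduce/ngrams/...")
      // Just prove the client can read something; don't scan the dataset.
      assert(rdd.take(10).nonEmpty)
    } finally {
      sc.stop()
    }
  }
}
```

Because the suite cancels itself via `assume()` when the flag is unset, it can sit in the normal test tree without slowing down or breaking default builds.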