> 2. I can add a hadoop-2.6 profile that sets things up for s3a, azure and 
> openstack swift.


Added: 
https://issues.apache.org/jira/browse/SPARK-7481 


One thing to consider here is testing: the s3x clients themselves have some 
tests that individuals/orgs can run against different S3 installations & 
private versions; people publish their results so it's visible that the 
different S3 installations, with their different consistency models & auth 
mechanisms, have had good coverage. 

There are also some scale tests that take time & don't get run so often, but 
which throw up surprises (RAX UK throttling DELETE, intermittent 
ConnectionReset exceptions when reading multi-GB S3 files). 

Amazon have some public datasets that could be used to verify that Spark can 
read files off S3, and maybe even to surface some of the scale problems.

In particular, http://datasets.elasticmapreduce.s3.amazonaws.com/ publishes 
ngrams as a set of .gz files, free for all to read.
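
As a quick smoke test, something like this in spark-shell ought to work. 
This is an untested sketch: the key under the bucket is illustrative (the 
real layout is whatever the bucket index lists), and it assumes the s3a 
connector is on the classpath with credentials configured:

  // count the lines in one of the public ngram .gz files
  val ngrams = sc.textFile(
    "s3a://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data")
  println("lines read: " + ngrams.count())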

Would there be a place in the code tree for some tests to run against things 
like this? They're cloud integration tests rather than unit tests, and nobody 
would want them on by default, but they could be good for regression-testing 
Hadoop's S3 support & Spark's integration with it; a sketch of what an opt-in 
test might look like is below.
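
For illustration only, an opt-in suite could be gated on a system property 
so it cancels (rather than fails) when nobody has asked for it. Assumptions 
here: ScalaTest, a hypothetical spark.test.cloud.enabled switch, and the 
same illustrative ngrams path as above:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.scalatest.FunSuite

  class S3AIntegrationSuite extends FunSuite {
    // hypothetical switch: nothing touches the network unless
    // -Dspark.test.cloud.enabled=true is set on the test JVM
    private val cloudTestsEnabled =
      sys.props.getOrElse("spark.test.cloud.enabled", "false") == "true"

    test("read a public ngram file over s3a") {
      // assume() cancels the test instead of failing it
      assume(cloudTestsEnabled, "cloud integration tests are off by default")
      val sc = new SparkContext(
        new SparkConf().setMaster("local[2]").setAppName("s3a-it"))
      try {
        val lines = sc.textFile(
          "s3a://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data")
        assert(lines.count() > 0)
      } finally {
        sc.stop()
      }
    }
  }

That would keep the tests in the tree for regression runs while leaving them 
skipped for everyone who hasn't opted in.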
