Hi,

I was able to download the dataset this way (and just reconfirmed it by doing so again):

# Before starting Spark, set your AWS credentials
export AWS_ACCESS_KEY_ID=*key_id*
export AWS_SECRET_ACCESS_KEY=*access_key*

# Start the Spark shell
./spark-shell

// In the Spark shell
val dataset = sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/crawl")
dataset.saveAsTextFile("/home/tom/hadoop/bigDataBenchmark/test/crawl3.txt")
If you want to do this more often, or read the data directly from the cloud instead of from a local copy (which will be slower), you can add these keys to ./conf/spark-env.sh.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Retrieve-dataset-of-Big-Data-Benchmark-tp9821p15278.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
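For reference, a minimal sketch of what that could look like in conf/spark-env.sh (this just exports the same two environment variables so every spark-shell session picks them up; the placeholder values are the same ones used above, not real credentials):

# conf/spark-env.sh -- sourced when Spark starts
# Make the AWS credentials available to all Spark sessions,
# so s3n:// paths can be read without exporting them by hand each time.
export AWS_ACCESS_KEY_ID=*key_id*
export AWS_SECRET_ACCESS_KEY=*access_key*

After adding these lines, ./spark-shell should be able to read s3n:// URLs directly without any manual exports beforehand.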