Hi,

I was able to download the dataset this way (and just reconfirmed it by doing so again):

# Before starting Spark, set your AWS credentials
export AWS_ACCESS_KEY_ID=*key_id*
export AWS_SECRET_ACCESS_KEY=*access_key*

# Start the Spark shell
./spark-shell

// In the Spark shell
val dataset = sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/crawl")
dataset.saveAsTextFile("/home/tom/hadoop/bigDataBenchmark/test/crawl3.txt")
If you want to do this more often, or read the data directly from the cloud instead of from a local copy (which will be slower), you can add these keys to ./conf/spark-env.sh.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Retrieve-dataset-of-Big-Data-Benchmark-tp9821p15278.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
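For reference, a minimal sketch of what that could look like in conf/spark-env.sh (this just exports the same two environment variables so every spark-shell session picks them up; the placeholder values are the same ones used above, not real credentials):

# conf/spark-env.sh -- sourced when Spark starts
# Make the AWS credentials available to all Spark sessions,
# so s3n:// paths can be read without exporting them by hand each time.
export AWS_ACCESS_KEY_ID=*key_id*
export AWS_SECRET_ACCESS_KEY=*access_key*

After adding these lines, ./spark-shell should be able to read s3n:// URLs directly without any manual exports beforehand.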