Hi,

I would like to use the dataset from the Big Data Benchmark
<https://amplab.cs.berkeley.edu/benchmark/> on my own cluster to run some
tests comparing Hadoop and Spark. The dataset should be available on Amazon at
s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix].
Is there a way to download it without being an Amazon user? I tried

    bin/hadoop distcp s3n://123:456@big-data-benchmark/pavlo/text/tiny/* ./

but it asks for an AWS Access Key ID and Secret Access Key, which I do not
have.

Thanks in advance,

Tom

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Retrieve-dataset-of-Big-Data-Benchmark-tp9821.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
