Did you happen to have a look at this? https://github.com/abashev/vfs-s3
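To illustrate, here is a minimal sketch of reading an S3 object through Commons VFS once the vfs-s3 jar is on the executor classpath. The bucket and key are placeholders, not paths from this thread, and the credential setup assumes vfs-s3 picks up keys from the usual AWS environment variables; check the project's README for the exact scheme and auth options of the version you use.

```java
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemException;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class S3ReadSketch {
    public static void main(String[] args) throws FileSystemException {
        // vfs-s3 registers an "s3" scheme with Commons VFS through its
        // providers.xml, so the default manager can resolve s3:// URIs
        // as soon as the jar is on the classpath.
        FileSystemManager manager = VFS.getManager();

        // Hypothetical bucket/key used only for illustration.
        FileObject file = manager.resolveFile("s3://my-bucket/some/key");
        System.out.println("exists: " + file.exists());
    }
}
```

Note that vfs-s3 handles `s3://` URIs; an `s3n://` URI would still need either a rewrite of the scheme or support in your own `NeuronPath` resolver.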
Thanks

Best Regards

On Tue, May 12, 2015 at 11:33 PM, Stephen Carman <scar...@coldlight.com> wrote:

> We have a small Mesos cluster, and these slaves need to have a VFS set up on
> them so that the slaves can pull down the data they need from S3 when Spark
> runs.
>
> There doesn't seem to be any obvious way online on how to do this or how to
> accomplish it easily. Does anyone have best practices or ideas about how to
> accomplish this?
>
> An example stack trace when a job is run on the Mesos cluster is below.
>
> Any idea how to get this going? Perhaps by somehow bootstrapping Spark at
> run time?
>
> Thanks,
> Steve
>
> java.io.IOException: Unsupported scheme s3n for URI s3n://removed
>         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
>         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
>         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
>         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
>         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 15/05/12 13:57:51 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
> java.lang.RuntimeException: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
>         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:307)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
>         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
>         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
>         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
>         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
>         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
>         ... 8 more
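Since the exception is raised by the custom `NeuronPath` resolver rather than by Spark itself, another option worth mentioning: if the data could be read through Spark's built-in Hadoop `s3n` support instead of the VFS layer, the credentials only need to be set on the Hadoop configuration of the driver's context, and every executor inherits them. A sketch under those assumptions (the bucket and file name are placeholders, and the jets3t/Hadoop S3 classes must be on the classpath):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class S3nConfSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("s3n-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Credentials for Hadoop's s3n filesystem, taken here from the
        // environment; any secure distribution mechanism works.
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId",
                System.getenv("AWS_ACCESS_KEY_ID"));
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey",
                System.getenv("AWS_SECRET_ACCESS_KEY"));

        // Executors can now resolve s3n:// paths; bucket/key is hypothetical.
        long lines = sc.textFile("s3n://my-bucket/data.txt").count();
        System.out.println("lines: " + lines);

        sc.stop();
    }
}
```

This avoids bootstrapping anything on the Mesos slaves themselves, at the cost of bypassing the VFS abstraction the job currently uses.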