Hi, I've downloaded the same set of data files to every node in my cluster, at the same absolute path - say /home/xyzuser/data/*. I am now trying to perform an operation (say open(filename).read()) on these files in Spark by passing local file paths. My assumption was that as long as a worker can find the file at that path, the task will succeed. However, my Spark tasks fail with an error ("/home/xyzuser/data/* is not present") - and I'm sure the files are present on all my worker nodes.
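One thing worth checking (a hypothetical reproduction, not necessarily your exact code): Python's open() does not expand shell glob patterns, so passing a path that literally contains "*" to open() fails with "No such file or directory" even when matching files exist on the node. The pattern has to be expanded first, e.g. with glob.glob(), on the node where the task runs:

```python
import glob
import os
import tempfile

# Stand-in for /home/xyzuser/data, so the sketch is self-contained.
data_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(data_dir, name), "w") as f:
        f.write(name)

def read_all(pattern):
    # open() treats "*" literally; expand the glob first, then read each match.
    # Inside a Spark task this expansion happens on the worker that runs it.
    return {path: open(path).read() for path in sorted(glob.glob(pattern))}

contents = read_all(os.path.join(data_dir, "*.txt"))
print(len(contents))  # 2 files matched and read
```

If you are instead using Spark's own readers, note that sc.textFile("/home/xyzuser/data/*") without a scheme may be resolved against the configured default filesystem (e.g. HDFS on Dataproc) rather than the local disk; prefixing the path with "file://" forces local-filesystem resolution.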
If this experiment succeeds, I plan to set up an NFS share (actually more like a read-only cloud persistent disk attached to my cluster nodes in Dataproc) and use that instead. What exactly is going wrong here? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-with-Local-File-System-NFS-tp28781.html Sent from the Apache Spark User List mailing list archive at Nabble.com.