This may be a silly question, but are you deleting the file from HDFS on the
right node?
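
One way to rule this out is to do the delete from inside the job, against the same filesystem the job writes to. The following is only a minimal sketch: `OutputCleanup` and `deleteIfExists` are hypothetical names, and it assumes the Hadoop client libraries are on the classpath. A path without a scheme, such as /tmp/pavel/CassandraPipeTest.txt, is resolved against the configured default filesystem, which on the cluster appears to be hdfs://hadoop01:54310 judging by the error message, and may be something else on a single machine.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of Spark or Hadoop.
object OutputCleanup {
  /** Recursively deletes `output` if it exists; returns true if something was removed. */
  def deleteIfExists(output: String, hadoopConf: Configuration): Boolean = {
    val path = new Path(output)
    // Use the path's own scheme if it has one; otherwise fall back to the
    // default filesystem from the given configuration.
    val fs = path.getFileSystem(hadoopConf)
    // Print the fully qualified path so it is obvious which filesystem is touched.
    println(s"Output path resolves to: ${fs.makeQualified(path)}")
    if (fs.exists(path)) fs.delete(path, true) else false
  }
}
```

In the job this would be called just before the save, for example `OutputCleanup.deleteIfExists("/tmp/pavel/CassandraPipeTest.txt", sc.hadoopConfiguration)`. If the printed path differs from the one in the exception, the manual delete was probably run against a different filesystem.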
On Thu, Feb 19, 2015 at 11:31 AM Pavel Velikhov <pavel.velik...@gmail.com>
wrote:

> Yeah, I do manually delete the files, but it still fails with this error.
>
> On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya <ilya.gane...@capitalone.com>
> wrote:
>
> When writing to HDFS, Spark will not overwrite existing files or
> directories. You must either delete these manually or use Hadoop's
> FileSystem class to remove them.
>
> -----Original Message-----
> *From: *Pavel Velikhov [pavel.velik...@gmail.com]
> *Sent: *Thursday, February 19, 2015 11:32 AM Eastern Standard Time
> *To: *user@spark.apache.org
> *Subject: *Spark job fails on cluster but works fine on a single machine
>
> I have a simple Spark job that goes out to Cassandra, runs a pipe and
> stores results:
>
> val sc = new SparkContext(conf)
> val rdd = sc.cassandraTable("keyspace", "table")
>       .map(r => r.getInt("column") + "\t" + write(get_lemmas(r.getString("tags"))))
>       .pipe("python3 /tmp/scripts_and_models/scripts/run.py")
>       .map(r => convertStr(r))
>       .coalesce(1, true)
>       .saveAsTextFile("/tmp/pavel/CassandraPipeTest.txt")
>       //.saveToCassandra("keyspace", "table", SomeColumns("id", "data"))
>
> When run on a single machine, everything is fine if I save to an hdfs file
> or save to Cassandra.
> When run on the cluster, neither works:
>
>  - When saving to file, I get an exception: User class threw exception:
> Output directory hdfs://hadoop01:54310/tmp/pavel/CassandraPipeTest.txt
> already exists
>  - When saving to Cassandra, only 4 rows are updated with empty data (I
> test on a 4-machine Spark cluster)
>
> Any hints on how to debug this and where the problem could be?
>
> - I delete the hdfs file before running
> - Would really like the output to hdfs to work, so I can debug
> - Then it would be nice to save to Cassandra
>
