A possibly stupid question: are you deleting the file from HDFS on the right node?

On Thu, Feb 19, 2015 at 11:31 AM Pavel Velikhov <pavel.velik...@gmail.com> wrote:
> Yeah, I do manually delete the files, but it still fails with this error.
>
> On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>
> When writing to HDFS, Spark will not overwrite existing files or
> directories. You must either delete these manually or use Java's Hadoop
> FileSystem class to remove them.
>
> -----Original Message-----
> From: Pavel Velikhov [pavel.velik...@gmail.com]
> Sent: Thursday, February 19, 2015 11:32 AM Eastern Standard Time
> To: user@spark.apache.org
> Subject: Spark job fails on cluster but works fine on a single machine
>
> I have a simple Spark job that goes out to Cassandra, runs a pipe and
> stores results:
>
> val sc = new SparkContext(conf)
> val rdd = sc.cassandraTable("keyspace", "table")
>   .map(r => r.getInt("column") + "\t" + write(get_lemmas(r.getString("tags"))))
>   .pipe("python3 /tmp/scripts_and_models/scripts/run.py")
>   .map(r => convertStr(r))
>   .coalesce(1, true)
>   .saveAsTextFile("/tmp/pavel/CassandraPipeTest.txt")
>   //.saveToCassandra("keyspace", "table", SomeColumns("id", "data"))
>
> When run on a single machine, everything is fine whether I save to an
> HDFS file or save to Cassandra.
> When run on the cluster, neither works:
>
> - When saving to file, I get an exception: User class threw exception:
>   Output directory hdfs://hadoop01:54310/tmp/pavel/CassandraPipeTest.txt
>   already exists
> - When saving to Cassandra, only 4 rows are updated with empty data (I
>   test on a 4-machine Spark cluster)
>
> Any hints on how to debug this and where the problem could be?
>
> - I delete the HDFS file before running
> - Would really like the output to HDFS to work, so I can debug
> - Then it would be nice to save to Cassandra
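
For reference, the FileSystem route mentioned in the quoted reply could look roughly like the sketch below. It is only an illustration under assumptions, not something posted in this thread: the object OutputCleanup and the helper deleteIfExists are hypothetical names. The helper resolves the path through the Hadoop configuration and prints the fully qualified location before deleting it, which is one way to check that the directory being removed is the same one the job complains about (the exception in the quoted message reports it as hdfs://hadoop01:54310/tmp/pavel/CassandraPipeTest.txt).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Spark or of the original job.
object OutputCleanup {
  // Deletes `output` (recursively) if it exists and returns true only
  // when something was actually removed.
  def deleteIfExists(hadoopConf: Configuration, output: String): Boolean = {
    val path = new Path(output)
    // A path without a scheme is resolved against the default filesystem
    // of the given configuration, so the same string can point at the
    // local disk on one setup and at HDFS on another.
    val fs: FileSystem = path.getFileSystem(hadoopConf)
    println(s"Output path resolves to: ${fs.makeQualified(path)}")
    if (fs.exists(path)) fs.delete(path, true) else false
  }
}
```

Assuming the job's SparkContext is available as sc, the call would go just before the save, for example OutputCleanup.deleteIfExists(sc.hadoopConfiguration, "/tmp/pavel/CassandraPipeTest.txt"). If the printed location differs from the one in the exception, the manual delete and the job are probably looking at different filesystems.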