I am loading a CSV text file from S3 into Spark, filtering and mapping the
records, and writing the result back to S3.
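
Concretely, the job looks roughly like this (a minimal sketch; the bucket
names, filter predicate, and mapping function below are placeholders, not my
actual code):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3-filter-map"))

    // Hypothetical paths and transformations, for illustration only.
    sc.textFile("s3n://my-bucket/input/data.csv")
      .filter(line => line.split(",").length > 1)  // placeholder predicate
      .map(line => line.split(",")(0))             // placeholder mapping
      .saveAsTextFile("s3n://my-bucket/output")    // this is the step that hangs

    sc.stop()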

I have tried several input sizes: 100K rows, 1M rows, and 3.5M rows. The
first two finish successfully, but the 3.5M-row job hangs in a strange state:
the job stages monitoring web UI (the one on port 4040) stops responding,
and the command-line console gets stuck and does not even respond to Ctrl-C.
The master's web UI still responds and shows the application state as
FINISHED.

In S3, I see an empty output directory containing a single zero-sized entry,
_temporary_$folder$. The S3 URL is given using the s3n:// protocol.

I did not see any errors in the logs in the web console. I also tried
several cluster sizes (1 master + 1 worker, 1 master + 5 workers) and ended
up in the same state.

Has anyone encountered such an issue? Any idea what's going on?

I also posted this question to Stack Overflow: 
http://stackoverflow.com/questions/25226419/saveastextfile-to-s3-on-spark-does-not-work-just-hangs