increasing concurrency of saveAsNewAPIHadoopFile?

Sandeep Parikh Thu, 19 Jun 2014 12:39:26 -0700

I'm trying to write a JavaPairRDD to a downstream database using
saveAsNewAPIHadoopFile with a custom OutputFormat, and the process is pretty
slow.

Is there a way to boost the concurrency of the save process? For example,
could I split the RDD into multiple smaller RDDs and use Java threads to
write the data out? That seems foreign to the way Spark works, so I'm not
sure whether there's a better way.
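For reference: saveAsNewAPIHadoopFile runs one write task per partition of the RDD, and each task gets its own RecordWriter from the OutputFormat. The number of concurrent writes is therefore roughly min(number of partitions, task slots across the executors). That suggests calling repartition(n), or partitionBy(new HashPartitioner(n)), on the JavaPairRDD before saving, rather than splitting it into several RDDs and managing threads by hand. The sketch below is a stdlib-only Java model of that scheduling, not Spark code. It hash-partitions records using the same rule as HashPartitioner and runs one task per partition on a fixed pool of `cores` threads. The class PartitionedWriteSketch, the writePartition helper, and the simulated 1 ms per-record latency are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, stdlib-only model of partition-parallel writes; not Spark code.
class PartitionedWriteSketch {

    // Mirrors Spark's HashPartitioner rule for non-null keys:
    // non-negative (key.hashCode() mod numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Split key/value records into numPartitions buckets by key hash.
    static <K, V> List<List<Map.Entry<K, V>>> partition(List<Map.Entry<K, V>> records,
                                                       int numPartitions) {
        List<List<Map.Entry<K, V>>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> entry : records) {
            parts.get(partitionFor(entry.getKey(), numPartitions)).add(entry);
        }
        return parts;
    }

    // Hypothetical stand-in for one write task: open one "connection" for the
    // whole partition, then write each record through it.
    static <K, V> int writePartition(List<Map.Entry<K, V>> part, AtomicInteger connectionsOpened)
            throws InterruptedException {
        connectionsOpened.incrementAndGet();
        int written = 0;
        for (Map.Entry<K, V> entry : part) {
            Thread.sleep(1); // assumed per-record write latency
            written++;
        }
        return written;
    }

    // Run one task per partition on a pool of `cores` threads, so at most
    // min(parts.size(), cores) partitions are being written at any moment.
    static <K, V> int writeAll(List<List<Map.Entry<K, V>>> parts, int cores,
                               AtomicInteger connectionsOpened) {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Map.Entry<K, V>> part : parts) {
                Callable<Integer> task = () -> writePartition(part, connectionsOpened);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                try {
                    total += f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("partition write failed", e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 8 partitions and 4 threads, at most 4 partitions are written at once and one "connection" is opened per partition. In Spark the corresponding knobs would be the partition count passed to repartition and the executor cores available to the job. Raising the partition count far beyond the available cores mostly adds per-task overhead, and the downstream database may itself become the bottleneck.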
