Python Dependencies Issue on EMR

2018-09-13 Thread Jonas Shomorony
Hey everyone, I am currently trying to run a Python Spark job (using YARN client mode) that uses multiple libraries, on a Spark cluster on Amazon EMR. To do that, I create a dependencies.zip file that contains all of the dependencies/libraries (installed through pip) for the job to run
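A minimal sketch of how such a dependencies.zip could be built, assuming the libraries were first installed into a local folder (for example with `pip install -r requirements.txt -t deps/`). The helper name `build_dependencies_zip` and the folder name `deps` are hypothetical and do not come from the original post.

```python
import os
import zipfile


def build_dependencies_zip(packages_dir, zip_path):
    """Zip everything under packages_dir so package folders sit at the archive root.

    Hypothetical helper for illustration; zip_path should live outside packages_dir.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(packages_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to packages_dir so that "import mypkg"
                # resolves from the root of the zip.
                archive.write(full_path, os.path.relpath(full_path, packages_dir))
    return zip_path
```

The resulting archive can then be shipped with the job, either through `spark-submit --py-files dependencies.zip` or by calling `SparkContext.addPyFile("dependencies.zip")` in the driver. Note that Python cannot import compiled extension modules directly from a zip, so this approach is only likely to work for pure-Python libraries.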

Trying to improve performance of the driver.

2018-09-13 Thread Guillermo Ortiz Fernández
I have a process in Spark Streaming which lasts 2 seconds. When I check where the time is spent, I see about 0.8s-1s of processing time, although the total time is 2s. The remaining second is spent in the driver. I reviewed the code which is executed by the driver and I commented out some of this code with

Unsubscribe

2018-09-13 Thread Pekka Lehtonen