Re: Is there a limit on the number of tasks in one job?

2016-06-14 Thread Khaled Hammouda
Yes, I checked the Spark UI to follow what’s going on. It starts several tasks fine (8 tasks in my case) out of ~70k tasks, and then stalls. I was actually able to get things working by disabling dynamic allocation. Basically I set the number of executors manually, which disables dynamic
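For context, a minimal spark-submit sketch of the workaround described above (fixing the executor count instead of relying on dynamic allocation). The flag values and the jar name are illustrative placeholders, not values from the thread:

```shell
# Disable dynamic allocation and pin the executor count explicitly.
# Executor count, cores, and memory below are placeholders to tune
# for the actual cluster; my_spark_job.jar is a hypothetical name.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 50 \
  --executor-cores 4 \
  --executor-memory 8g \
  my_spark_job.jar
```

Setting `--num-executors` (or `spark.executor.instances`) implicitly turns dynamic allocation off, which is the behavior the poster relied on.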

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Takeshi Yamamuro
Hi, you can control the initial number of partitions (tasks) in v2.0. https://www.mail-archive.com/user@spark.apache.org/msg51603.html // maropu On Tue, Jun 14, 2016 at 7:24 AM, Mich Talebzadeh wrote: > Have you looked at spark GUI to see what it is waiting for. is
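As a sketch of what "controlling the initial number of partitions" looks like in Spark 2.0: file scans pack input into partitions governed by `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`. The property names are real Spark 2.0 settings, but the values shown are just the documented defaults, not a recommendation from the thread:

```shell
# Spark 2.0: a file scan packs small files into read partitions of up to
# maxPartitionBytes each, counting every extra file as openCostInBytes of
# padding. Raising maxPartitionBytes yields fewer, larger tasks.
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=134217728 \
  --conf spark.sql.files.openCostInBytes=4194304 \
  my_spark_job.jar
```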

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Mich Talebzadeh
Have you looked at the Spark GUI to see what it is waiting for? Is it available memory? What resource manager are you using? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Khaled Hammouda
Hi Michael, Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the preview and tried using it, but I’m running into the exact same issue. Khaled > On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote: > > You might try with the Spark 2.0 preview. We

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Michael Armbrust
You might try with the Spark 2.0 preview. We spent a bunch of time improving the handling of many small files. On Mon, Jun 13, 2016 at 11:19 AM, khaled.hammouda wrote: > I'm trying to use Spark SQL to load json data that are split across about > 70k > files across 24

Is there a limit on the number of tasks in one job?

2016-06-13 Thread khaled.hammouda
I'm trying to use Spark SQL to load JSON data split across about 70k files in 24 directories in HDFS, using sqlContext.read.json("hdfs:///user/hadoop/data/*/*"). This doesn't seem to work for some reason; I get timeout errors like the following: --- 6/06/13 15:46:31 ERROR
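To see why 70k small files is painful and how Spark 2.0's file packing (discussed later in the thread) helps, here is a back-of-the-envelope model in plain Python. The constants mirror the Spark 2.0 defaults for `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, but the packing formula is a simplification for illustration, not Spark's exact implementation:

```python
# Rough model: pre-2.0 behavior is ~one task per file (70k tasks);
# 2.0-style packing bins files into partitions of bounded size.

def estimate_partitions(num_files, avg_file_bytes, parallelism,
                        max_partition_bytes=128 * 1024 * 1024,  # 128 MB default
                        open_cost_bytes=4 * 1024 * 1024):       # 4 MB default
    # Each file is "padded" by the open cost to discourage tiny partitions.
    total_bytes = num_files * (avg_file_bytes + open_cost_bytes)
    bytes_per_core = total_bytes / parallelism
    # A partition holds at most max_split_bytes of padded input.
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_bytes, bytes_per_core))
    return -(-total_bytes // max_split_bytes)  # ceiling division

# Hypothetical numbers: ~70k files of ~1 MB each on a 96-core cluster.
print(estimate_partitions(70_000, 1 * 1024 * 1024, 96))  # → 2735
```

Under these (assumed) inputs the packing collapses 70k files into roughly 2.7k read tasks, which is the improvement the Spark 2.0 replies below allude to.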