Nishith, thanks, OK, this I did not know. Is it physically possible? If I have 4 node managers, can we spawn 72 executors from them? How is that possible? And how much memory should we give to those 72 executors since we have only 4 nodes? Please guide. I am sorry, I may be wrong, but I want to clear things up.
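For reference: on YARN an executor is just a container, so 72 single-core executors on 4 NodeManagers simply means 18 containers per node, and each executor can get roughly the node's YARN memory divided by 18, minus per-container overhead. A minimal Scala sketch of that sizing, assuming hypothetical node sizes (18 vCores and ~90 GB offered to YARN per node are illustrative numbers, not this cluster's real ones):

    import org.apache.spark.sql.SparkSession

    // Hypothetical cluster: 4 NodeManagers, each offering 18 vCores and ~90 GB to YARN.
    // 72 executors / 4 nodes = 18 one-core containers per node, so each executor gets
    // roughly 90 GB / 18 = 5 GB, split between the JVM heap and YARN's container overhead.
    val spark = SparkSession.builder()
      .appName("executor-sizing-sketch")
      .config("spark.executor.instances", "72")             // same effect as --num-executors 72
      .config("spark.executor.cores", "1")                  // same effect as --executor-cores 1
      .config("spark.executor.memory", "4g")                // heap per executor JVM
      .config("spark.yarn.executor.memoryOverhead", "1024") // off-heap overhead in MB (Spark 2.2 key)
      .getOrCreate()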
On Sat, Mar 9, 2019 at 3:08 AM nishith agarwal <n3.nas...@gmail.com> wrote:

> Umesh,
>
> Yes, I understand what you are trying to convey.
>
> Since you have YARN, you can just use the following Spark configuration:
> *--num-executors 72 --num-cores 1*
>
> An executor is just a JVM started by Spark on a YARN node/container. The
> above config should let you use all the cores.
>
> Thanks,
> Nishith
>
> On Fri, Mar 8, 2019 at 1:18 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
> > Hi Nishith, thanks. I get that 1 executor + 18 cores = 18 executors + 1
> > core, but what if I don't have that many executors? I use YARN and I
> > have 4 nodes, so 18 times 4 equals 72 cores. Now let's say we have 72
> > parquet files; as per you, I could use 4 executors with one core each,
> > processing 4 parquet files at a time and unnecessarily wasting the
> > parallel cores? Are you getting what I am trying to explain?
> >
> > On Sat, Mar 9, 2019, 2:33 AM nishith agarwal <n3.nas...@gmail.com> wrote:
> >
> > > Umesh,
> > >
> > > What kind of resource scheduler are you using? Is it Spark's
> > > standalone service? If yes, you can start 18 executors by changing
> > > spark-defaults.conf and restarting your Spark cluster (see configs here
> > > <https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts>)
> > > and information on how to do it here
> > > <https://spark.apache.org/docs/latest/spark-standalone.html#executors-scheduling>.
> > > Find details on how to do it for other resource schedulers under the
> > > Deployment tab in the Spark documentation.
> > >
> > > Now, 1 executor + 18 cores = 18 executors + 1 core. Hence, you can get
> > > the same parallelism either way.
> > > Unit of parallelism in Spark = Task = 1 core
> > >
> > > Thousands of parquet files will be spread over multiple tasks, with 18
> > > of them running in parallel in your case since you have 18 cores at
> > > your disposal.
> > > (PS: The OS might also do some pipelining and context switching for a
> > > single core, but that's not very relevant here.)
> > >
> > > Hope this helps.
> > >
> > > Thanks,
> > > Nishith
> > >
> > > On Fri, Mar 8, 2019 at 12:23 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > >
> > > > OK, that seems like moving away from distributed to single
> > > > processing. I have 18 cores per executor now; if I don't use all the
> > > > cores, what's the point of having a distributed system? Also, I am
> > > > just curious how Spark's unit of parallelism will work here if we
> > > > have just one core per executor. If I have thousands of parquet
> > > > files, it means a few executors, each with one core, so only a few
> > > > parquet files will be loaded into Spark partitions/tasks at a time.
> > > > Please correct me if I am wrong. Thanks.
> > > >
> > > > On Sat, Mar 9, 2019 at 1:46 AM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > >
> > > > > Umesh,
> > > > >
> > > > > This issue still persists. Could you please use num-cores = 1? You
> > > > > can scale out using num-executors.
> > > > >
> > > > > -Nishith
> > > > >
> > > > > On Fri, Mar 8, 2019 at 12:06 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > > > >
> > > > > > I think the issue is this: https://github.com/uber/hudi/issues/227
> > > > > > I get the same error; I tried to use 4 executor cores, and I am
> > > > > > using Spark 2.2.0. Is this issue fixed?
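To make the unit-of-parallelism point from the exchange above concrete, here is a rough Scala sketch (the input path is made up): the number of partitions in the parquet read decides how many tasks exist, while executors times cores-per-executor decides how many of them run at the same time.

    // Illustrative only; "/data/input" is a hypothetical path.
    // Each input split of the parquet files becomes one task. With 72 single-core
    // executors there are 72 task slots, so 72 partitions are processed at a time
    // and the remaining partitions queue up behind them until the stage finishes.
    val df = spark.read.parquet("/data/input")
    println(s"partitions (tasks) in the read stage: ${df.rdd.getNumPartitions}")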
> > > > > > On Fri, Mar 8, 2019 at 6:58 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > >
> > > > > > > Could you please share the entire stack trace?
> > > > > > >
> > > > > > > On Fri, Mar 8, 2019 at 1:56 AM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, I am using the Spark shell to save a Spark dataframe as a
> > > > > > > > Hoodie dataset using the bulk insert option of the Hoodie Spark
> > > > > > > > datasource. It seems to be working and tries to save, but in the
> > > > > > > > end it fails with the following exception:
> > > > > > > >
> > > > > > > > Failed to initialize HoodieStorageWriter for path
> > > > > > > > /tmp/hoodie-test/2019/blabla.parquet
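For context, a minimal sketch of the kind of bulk-insert write described in the original message, assuming hypothetical record key, partition path, precombine field, and table name (the actual schema and options were not shared); the option keys are the Hoodie datasource write options, and the base path mirrors the one in the error. The workaround discussed above is to run this with a single core per executor until https://github.com/uber/hudi/issues/227 is resolved.

    // Sketch only, not the poster's actual code; df is the dataframe to be written,
    // and the field names and table name below are hypothetical.
    // On uber/hudi releases of this era the datasource format is "com.uber.hoodie";
    // newer Apache releases register it as "org.apache.hudi".
    df.write
      .format("com.uber.hoodie")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.partitionpath.field", "date")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.table.name", "hoodie_test")
      .mode("overwrite")                 // first write into the base path
      .save("/tmp/hoodie-test")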