Nishith, thanks, OK, this I did not know. Is it physically possible? If I have 4 node managers, can we spawn 72 executors from them? How is that possible? And how much memory should we give to those 72 executors since we have only 4 nodes? Please guide. I am sorry, I may be wrong, but I want to clear things up.
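For reference: on YARN an executor is just a container, so 72 single-core executors on 4 NodeManagers simply means 18 containers per node, and each executor can get roughly the node's YARN memory divided by 18, minus per-container overhead. A minimal Scala sketch of that sizing, assuming hypothetical node sizes (18 vCores and ~90 GB offered to YARN per node are illustrative numbers, not this cluster's real ones):

    import org.apache.spark.sql.SparkSession

    // Hypothetical cluster: 4 NodeManagers, each offering 18 vCores and ~90 GB to YARN.
    // 72 executors / 4 nodes = 18 one-core containers per node, so each executor gets
    // roughly 90 GB / 18 = 5 GB, split between the JVM heap and YARN's container overhead.
    val spark = SparkSession.builder()
      .appName("executor-sizing-sketch")
      .config("spark.executor.instances", "72")             // same effect as --num-executors 72
      .config("spark.executor.cores", "1")                  // same effect as --executor-cores 1
      .config("spark.executor.memory", "4g")                // heap per executor JVM
      .config("spark.yarn.executor.memoryOverhead", "1024") // off-heap overhead in MB (Spark 2.2 key)
      .getOrCreate()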
On Sat, Mar 9, 2019 at 3:08 AM nishith agarwal <n3.nas...@gmail.com> wrote:

> Umesh,
>
> Yes, I understand what you are trying to convey.
>
> Since you have YARN, you can just use the following Spark configuration:
> *--num-executors 72 --num-cores 1*
>
> An executor is just a JVM started by Spark on a YARN node/container. The
> above config should let you use all the cores.
>
> Thanks,
> Nishith
>
> On Fri, Mar 8, 2019 at 1:18 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
> > Hi Nishith, thanks. I get that 1 executor + 18 cores = 18 executors + 1
> > core, but what if I don't have that many executors? I use YARN and I
> > have 4 nodes, so 18 times 4 equals 72 cores. Now let's say we have 72
> > parquet files; as per you, I could use 4 executors with one core each,
> > processing 4 parquet files at a time and unnecessarily wasting the
> > parallel cores? Are you getting what I am trying to explain?
> >
> > On Sat, Mar 9, 2019, 2:33 AM nishith agarwal <n3.nas...@gmail.com> wrote:
> >
> > > Umesh,
> > >
> > > What kind of resource scheduler are you using? Is it Spark's
> > > standalone service? If yes, you can start 18 executors by changing
> > > spark-defaults.conf and restarting your Spark cluster (see configs here
> > > <https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts>)
> > > and information on how to do it here
> > > <https://spark.apache.org/docs/latest/spark-standalone.html#executors-scheduling>.
> > > Find details on how to do it for other resource schedulers under the
> > > Deployment tab in the Spark documentation.
> > >
> > > Now, 1 executor + 18 cores = 18 executors + 1 core. Hence, you can get
> > > the same parallelism either way.
> > > Unit of parallelism in Spark = Task = 1 core
> > >
> > > Thousands of parquet files will be spread over multiple tasks, with 18
> > > of them running in parallel in your case since you have 18 cores at
> > > your disposal.
> > > (PS: The OS might also do some pipelining and context switching for a
> > > single core, but that's not very relevant here.)
> > >
> > > Hope this helps.
> > >
> > > Thanks,
> > > Nishith
> > >
> > > On Fri, Mar 8, 2019 at 12:23 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > >
> > > > OK, that seems like moving away from distributed to single
> > > > processing. I have 18 cores per executor now; if I don't use all the
> > > > cores, what's the point of having a distributed system? Also, I am
> > > > just curious how Spark's unit of parallelism will work here if we
> > > > have just one core per executor. If I have thousands of parquet
> > > > files, it means a few executors, each with one core, so only a few
> > > > parquet files will be loaded into Spark partitions/tasks at a time.
> > > > Please correct me if I am wrong. Thanks.
> > > >
> > > > On Sat, Mar 9, 2019 at 1:46 AM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > >
> > > > > Umesh,
> > > > >
> > > > > This issue still persists. Could you please use num-cores = 1? You
> > > > > can scale out using num-executors.
> > > > >
> > > > > -Nishith
> > > > >
> > > > > On Fri, Mar 8, 2019 at 12:06 PM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > > > >
> > > > > > I think the issue is this: https://github.com/uber/hudi/issues/227
> > > > > > I get the same error; I tried to use 4 executor cores, and I am
> > > > > > using Spark 2.2.0. Is this issue fixed?
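To make the unit-of-parallelism point from the exchange above concrete, here is a rough Scala sketch (the input path is made up): the number of partitions in the parquet read decides how many tasks exist, while executors times cores-per-executor decides how many of them run at the same time.

    // Illustrative only; "/data/input" is a hypothetical path.
    // Each input split of the parquet files becomes one task. With 72 single-core
    // executors there are 72 task slots, so 72 partitions are processed at a time
    // and the remaining partitions queue up behind them until the stage finishes.
    val df = spark.read.parquet("/data/input")
    println(s"partitions (tasks) in the read stage: ${df.rdd.getNumPartitions}")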
> > > > > > On Fri, Mar 8, 2019 at 6:58 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > >
> > > > > > > Could you please share the entire stack trace?
> > > > > > >
> > > > > > > On Fri, Mar 8, 2019 at 1:56 AM Umesh Kacha <umesh.ka...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, I am using the Spark shell to save a Spark dataframe as a
> > > > > > > > Hoodie dataset using the bulk insert option of the Hoodie Spark
> > > > > > > > datasource. It seems to be working and tries to save, but in the
> > > > > > > > end it fails with the following exception:
> > > > > > > >
> > > > > > > > Failed to initialize HoodieStorageWriter for path
> > > > > > > > /tmp/hoodie-test/2019/blabla.parquet
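For context, a minimal sketch of the kind of bulk-insert write described in the original message, assuming hypothetical record key, partition path, precombine field, and table name (the actual schema and options were not shared); the option keys are the Hoodie datasource write options, and the base path mirrors the one in the error. The workaround discussed above is to run this with a single core per executor until https://github.com/uber/hudi/issues/227 is resolved.

    // Sketch only, not the poster's actual code; df is the dataframe to be written,
    // and the field names and table name below are hypothetical.
    // On uber/hudi releases of this era the datasource format is "com.uber.hoodie";
    // newer Apache releases register it as "org.apache.hudi".
    df.write
      .format("com.uber.hoodie")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.partitionpath.field", "date")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.table.name", "hoodie_test")
      .mode("overwrite")                 // first write into the base path
      .save("/tmp/hoodie-test")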