You can't change the executor/driver cores/memory on the fly once you've
already started a SparkContext. For sizing them up front, see the rough
sketch below the quoted thread.

On Tue, Jul 3, 2018 at 4:30 AM Aakash Basu <aakash.spark....@gmail.com> wrote:
>
> We aren't using Oozie or similar; moreover, the end-to-end job will be
> exactly the same, but the data will differ greatly (number of continuous
> and categorical columns, vertical size, horizontal size, etc.). So if
> there were a way to calculate the parameters, such that we could simply
> take the data and derive the respective configuration, that would be
> great.
>
> On Tue, Jul 3, 2018 at 1:09 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> Don't do this inside one job. Create different jobs for the different
>> types of work and orchestrate them using Oozie or similar.
>>
>> On 3. Jul 2018, at 09:34, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>
>> Hi,
>>
>> Cluster: 5 nodes (1 driver and 4 workers)
>> Driver config: 16 cores, 32 GB RAM
>> Worker config: 8 cores, 16 GB RAM
>>
>> I'm using the parameters below; the first chunk is cluster dependent
>> and the second chunk is data/code dependent.
>>
>> --num-executors 4
>> --executor-cores 5
>> --executor-memory 10G
>> --driver-cores 5
>> --driver-memory 25G
>>
>> --conf spark.sql.shuffle.partitions=100
>> --conf spark.driver.maxResultSize=2G
>> --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC"
>> --conf spark.scheduler.listenerbus.eventqueue.capacity=20000
>>
>> I arrived at these values through my own R&D on the properties and the
>> issues I ran into along the way.
>>
>> My questions are:
>>
>> 1) How can I infer, using a formula or code, the values in the second
>> (data/code-dependent) chunk from the data itself?
>> 2) What other properties/configurations can I use to shorten my job
>> runtime?
>>
>> Thanks,
>> Aakash.
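
As a rough starting point, the usual community rules of thumb (reserve a
core and a gigabyte per node for the OS/daemons, ~5 cores per executor,
~7% of executor memory for overhead, ~128 MB of input per shuffle
partition) can be turned into a small script. This is only a sketch of
those heuristics, not an official formula, and the input_size_gb figure
below is made up for illustration:

# Rough sizing heuristics for spark-submit parameters. These are
# community rules of thumb, not an official formula; tune against
# your own workload.

def suggest_spark_conf(num_workers, cores_per_node, ram_gb_per_node,
                       input_size_gb, cores_per_executor=5):
    # Reserve 1 core and 1 GB per node for the OS / cluster daemons.
    usable_cores = cores_per_node - 1
    usable_ram_gb = ram_gb_per_node - 1

    # ~5 cores per executor is the usual HDFS-throughput sweet spot.
    executors_per_node = max(1, usable_cores // cores_per_executor)
    num_executors = executors_per_node * num_workers

    # Split node RAM across executors, leaving ~7% for memory overhead
    # (spark.executor.memoryOverhead on YARN).
    executor_memory_gb = int(usable_ram_gb / executors_per_node * 0.93)

    # Shuffle partitions: at least 3x total cores, or ~128 MB of input
    # per partition, whichever is larger.
    total_cores = num_executors * cores_per_executor
    shuffle_partitions = max(3 * total_cores,
                             int(input_size_gb * 1024 / 128))

    return {
        "--num-executors": num_executors,
        "--executor-cores": cores_per_executor,
        "--executor-memory": "%dG" % executor_memory_gb,
        "spark.sql.shuffle.partitions": shuffle_partitions,
    }

# The cluster from this thread (4 workers, 8 cores / 16 GB each); the
# 50 GB input size is a made-up example.
print(suggest_spark_conf(num_workers=4, cores_per_node=8,
                         ram_gb_per_node=16, input_size_gb=50))

For the other data-dependent knobs (maxResultSize, GC flags, listener-bus
capacity) there is no closed-form answer that I know of; people usually
start from figures like these and then iterate using the stage and GC
metrics in the Spark UI.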