Re: overcommit: cpus / vcores

Peter Liu Mon, 15 Oct 2018 18:06:47 -0700

 Hi Khaled,

I have attached the spark streaming config below in (a).
In case of the 100vcore run (see the initial email), I used 50 executors
where each executor has 2 vcores and 3g memory. For 70 vcore case, 35
executors, for 80 vcore case, 40 executors.
In the yarn config (yarn-site.xml, (b) below), the  available vcores set
over 80 (I call it "overcommit").


Not sure if there is a more proper way to do this (overcommit) and what
would be the best practice in this type of situation (say, light cpu
workload in a dedicated yarn cluster) to increase the cpu utilization for a
better performance.

Any help would be very much appreciated.

Thanks ...

Peter

(a)

   val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", kafkaCluster.kafkaNodesString)
      .option("startingOffsets", "latest")
      .option("subscribe", Variables.EVENTS_TOPIC)
      .option("kafkaConsumer.pollTimeoutMs", "5000")
      .load()
      .selectExpr("CAST(value AS STRING)", "CAST(timestamp AS
TIMESTAMP)").as[(String, Timestamp)]
      .select(from_json($"value", mySchema).as("data"), $"timestamp")
      .select("data.*", "timestamp")
      .where($"event_type" === "view")
      .select($"ad_id", $"event_time")
      .join(campaigns.toSeq.toDS().cache(), Seq("ad_id"))
      .groupBy(millisTime(window($"event_time", "10
seconds").getField("start")) as 'time_window, $"campaign_id")
      .agg(count("*") as 'count, max('event_time) as 'lastUpdate)
      .select(to_json(struct("*")) as 'value)
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", kafkaCluster.kafkaNodesString)
//original
      .option("topic", Variables.OUTPUT_TOPIC)
      .option("checkpointLocation", s"/tmp/${java.util.UUID.randomUUID()}")
//TBD: ram disk?
      .outputMode("update")
      .start()

(b)
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>110</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>110</value>
</property>

On Mon, Oct 15, 2018 at 4:26 PM Khaled Zaouk <khaledz...@gmail.com> wrote:

> Hi Peter,
>
> What parameters are you putting in your spark streaming configuration?
> What are you putting as number of executor instances and how many cores per
> executor are you setting in your Spark job?
>
> Best,
>
> Khaled
>
> On Mon, Oct 15, 2018 at 9:18 PM Peter Liu <peter.p...@gmail.com> wrote:
>
>> Hi there,
>>
>> I have a system with 80 vcores and a relatively light spark streaming
>> workload. Overcomming the vcore resource (i.e. > 80) in the config (see (a)
>> below) seems to help to improve the average spark batch time (see (b)
>> below).
>>
>> Is there any best practice guideline on resource overcommit with cpu /
>> vcores, such as yarn config options, candidate cases ideal for
>> overcommiting vcores etc.?
>>
>> the slide below (from 2016 though) seems to address the memory overcommit
>> topic and hint a "future" topic on cpu overcommit:
>>
>> https://www.slideshare.net/HadoopSummit/investing-the-effects-of-overcommitting-yarn-resources
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.slideshare.net_HadoopSummit_investing-2Dthe-2Deffects-2Dof-2Dovercommitting-2Dyarn-2Dresources&d=DwMFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=9YF85k6Q86ELSbXl40mkGw&m=ZCbfeVtFh_TC0b2e0fobq62qrBKhQPtyBNfMsVcVzmo&s=UXeomeHkGRlHg9Bxgb81T98oH7zj7T6OmF4dsfhK0Sg&e=>
>>
>> Would like to know if this is a reasonable config practice and why this
>> is not achievable without overcommit. Any help/hint would be very much
>> appreciated!
>>
>> Thanks!
>>
>> Peter
>>
>> (a) yarn-site.xml
>> <property>
>>     <name>yarn.nodemanager.resource.cpu-vcores</name>
>>     <value>110</value>
>> </property>
>>
>> <property>
>> <name>yarn.scheduler.maximum-allocation-vcores</name>
>> <value>110</value>
>> </property>
>>
>>
>> (b)
>> FYI:
>> I have a system with 80 vcores and a relatively light spark streaming
>> workload. overcomming the vocore resource (here 100) seems to help the
>> average spark batch time. need more understanding on this practice.
>> Skylake (1 x 900K msg/sec) total batch# (avg) avg batch time in ms (avg) avg
>> user cpu (%) nw read (mb/sec)
>> 70vocres 178.20 8154.69 n/a n/a
>> 80vocres 177.40 7865.44 27.85 222.31
>> 100vcores 177.00 7,209.37 30.02 220.86
>>
>>

Re: overcommit: cpus / vcores

Reply via email to