Hi there,

I have a system with 80 vcores and a relatively light Spark Streaming
workload. Overcommitting the vcore resource (i.e. setting it > 80) in the
config (see (a) below) seems to improve the average Spark batch time (see
(b) below).

Is there any best-practice guideline on resource overcommit with CPU /
vcores, such as YARN config options, candidate cases that are ideal for
overcommitting vcores, etc.?
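For reference, these are the only other CPU-related knobs I have come across
so far; I'm listing them in case one of them is the right lever (the values
shown are only illustrative, not recommendations, and I may well be missing
something):

capacity-scheduler.xml (so the scheduler actually accounts for vcores, not
only memory):
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

yarn-site.xml (cgroups-based CPU limiting on the NodeManager side):
<property>
    <!-- share of the node's physical CPU that all containers together may use -->
    <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
    <value>100</value>
</property>
<property>
    <!-- if true, each container is hard-capped at its allocated vcores -->
    <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
    <value>false</value>
</property>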

The slide deck below (from 2016, though) addresses the memory overcommit
topic and hints at CPU overcommit as a "future" topic:
https://www.slideshare.net/HadoopSummit/investing-the-effects-of-overcommitting-yarn-resources

I would like to know whether this is a reasonable config practice, and why
the same improvement is not achievable without overcommit. Any help/hint
would be very much appreciated!

Thanks!

Peter

(a) yarn-site.xml
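<!-- advertise 110 vcores on this 80-vcore node, i.e. ~1.4x overcommit -->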
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>110</value>
</property>

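<!-- cap on the vcores a single container may request -->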
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>110</value>
</property>


(b)
FYI:
I have a system with 80 vcores and a relatively light Spark Streaming
workload. Overcommitting the vcore resource (here to 100) seems to help the
average Spark batch time; I need a better understanding of this practice.
Skylake (1 x 900K msg/sec)

vcores   total batch # (avg)   avg batch time in ms (avg)   avg user cpu (%)   nw read (MB/sec)
70       178.20                8154.69                      n/a                n/a
80       177.40                7865.44                      27.85              222.31
100      177.00                7209.37                      30.02              220.86
