Hi there,

I have a system with 80 vcores and a relatively light Spark Streaming workload. Overcommitting the vcore resource (i.e. > 80) in the config (see (a) below) seems to improve the average Spark batch time (see (b) below).

Is there a best-practice guideline for overcommitting CPU / vcores, such as the relevant YARN config options and the kinds of workloads that are good candidates for vcore overcommit? The slides below (from 2016, though) address memory overcommit and hint at CPU overcommit as a "future" topic:

https://www.slideshare.net/HadoopSummit/investing-the-effects-of-overcommitting-yarn-resources

I would like to know whether this is a reasonable config practice, and why the same result is not achievable without overcommit.

Any help/hint would be very much appreciated!

Thanks!
Peter

(a) yarn-site.xml

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>110</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>110</value>
  </property>

(b) FYI: overcommitting the vcore resource (here 100) seems to help the average Spark batch time. I need to understand this practice better.

Skylake (1 x 900K msg/sec)

vcores   total batch# (avg)   avg batch time in ms (avg)   avg user cpu (%)   nw read (mb/sec)
70       178.20               8154.69                      n/a                n/a
80       177.40               7865.44                      27.85              222.31
100      177.00               7209.37                      30.02              220.86
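
To put a number on the difference, here is a minimal sketch that computes the relative change in average batch time against the 80-vcore run. It uses only the values from the table in (b); the helper name `pct_reduction` and the choice of the 80-vcore run as the baseline are just for illustration. By this calculation the 100-vcore setting cuts the average batch time by roughly 8%.

```python
# Average batch times in ms, copied from the table in (b).
batch_ms = {70: 8154.69, 80: 7865.44, 100: 7209.37}


def pct_reduction(baseline_ms, new_ms):
    """Percentage by which new_ms is lower than baseline_ms (negative if higher)."""
    return (baseline_ms - new_ms) / baseline_ms * 100.0


# Assumption for illustration: treat the physical vcore count as the baseline.
physical_vcores = 80
baseline = batch_ms[physical_vcores]

reductions = {v: pct_reduction(baseline, ms) for v, ms in batch_ms.items()}

for vcores, ms in sorted(batch_ms.items()):
    print(f"{vcores} vcores: {ms:.2f} ms, "
          f"{reductions[vcores]:+.2f}% vs {physical_vcores} vcores")
```

A positive value means the batch time is lower than at 80 vcores; the 70-vcore run comes out negative, i.e. slower than the baseline.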