[ https://issues.apache.org/jira/browse/SPARK-26104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Qin updated SPARK-26104:
-----------------------------
    Labels: Hydrogen  (was: )

> make pci devices visible to task scheduler
> ------------------------------------------
>
>                 Key: SPARK-26104
>                 URL: https://issues.apache.org/jira/browse/SPARK-26104
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Chen Qin
>            Priority: Major
>              Labels: Hydrogen
>
> Spark task scheduling has long considered CPU only: depending on how many
> vcores each executor has at a given moment, tasks are scheduled as long as
> enough vcores become available.
> Moving to deep learning use cases, the fundamental computation and
> processing unit shifts from CPU alone to GPU/FPGA + CPU, which moves data
> in and out of GPU memory.
> Deep learning frameworks built on top of GPU fleets require pinning a task
> to a fixed number of GPUs, which Spark does not support yet. E.g. a Horovod
> task requires 2 GPUs running uninterrupted until it finishes, regardless of
> CPU availability in the executor. In Uber's Peloton executor scheduler, the
> number of cores available could be more than what the user asked for, since
> the executor might get over-provisioned.
> Without exclusive occupancy of PCI devices (/gpu1, /gpu2), such workloads
> may run into unexpected states.
>
> Related JIRAs on allocating executor containers with GPU resources serve as
> bootstrap-phase usage:
> SPARK-19320 (Mesos), SPARK-24491 (K8s), SPARK-20327 (YARN)
>
> Existing SPIP: Accelerator-Aware Task Scheduling for Spark, SPARK-24615.
> This proposal is compatible with that design; the approach is a bit
> different, as it tracks utilization of PCI devices, where a customized task
> scheduler could either fall back to a "best to have" approach or implement
> the "must have" approach stated above.
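>
> As a rough sketch of the "must have" vs "best to have" semantics above
> (hypothetical names only; none of these types or methods exist in Spark
> 2.4, they are made up for illustration):
> {code:scala}
> // Hypothetical sketch: a task declares a fixed PCI device requirement and
> // the scheduler either grants exclusive devices or declines to launch it.
> case class PciDeviceRequest(deviceType: String, amount: Int)
>
> case class TaskResourceProfile(
>     cpus: Int,
>     devices: Seq[PciDeviceRequest],
>     strict: Boolean) // true = "must have", false = "best to have"
>
> object GpuSchedulingSketch {
>   // A Horovod-style task that needs 2 GPUs for its whole lifetime.
>   val horovodTask = TaskResourceProfile(
>     cpus = 1,
>     devices = Seq(PciDeviceRequest("gpu", 2)),
>     strict = true)
>
>   // Scheduler side: grant exclusive devices, or defer/degrade.
>   def tryAssign(
>       freeGpus: Set[String],
>       profile: TaskResourceProfile): Option[Set[String]] = {
>     val needed = profile.devices
>       .filter(_.deviceType == "gpu").map(_.amount).sum
>     if (freeGpus.size >= needed) {
>       Some(freeGpus.take(needed)) // e.g. Set("/gpu1", "/gpu2")
>     } else if (profile.strict) {
>       None                        // "must have": do not launch the task yet
>     } else {
>       Some(Set.empty)             // "best to have": fall back to CPU-only
>     }
>   }
> }
> {code}
> Under this sketch a Horovod-style task either holds /gpu1 and /gpu2
> exclusively until it finishes, or is not launched at all, avoiding the
> unexpected states described above.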