Luca Canali created SPARK-30306:
-----------------------------------
Summary: Instrument Python UDF execution time and metrics using
Spark Metrics system
Key: SPARK-30306
URL: https://issues.apache.org/jira/browse/SPARK-30306
Project: Spark
Issue Type: Improvement
Components: PySpark, Spark Core
Affects Versions: 3.0.0
Reporter: Luca Canali
This proposes extending Spark's instrumentation with metrics aimed at
understanding the performance of Python code called by Spark via UDFs, Pandas
UDFs, or mapPartitions. The relevant performance counters are exposed through
the Spark Metrics System (based on the Dropwizard library), which makes it easy
to consume the metrics produced by executors, for example in a performance
dashboard. See also the attached screenshot.
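
As a rough illustration of how executor metrics could be routed to a dashboard backend, the sketch below builds the Spark configuration entries for the built-in Graphite sink. This is an assumption for illustration only: the issue does not name a sink, and the host, port, and prefix values are hypothetical placeholders. The names of the proposed Python UDF metrics are not shown because the issue text does not list them.

```python
import shlex

# Sketch only: the issue does not say which sink is used; Graphite is assumed.
GRAPHITE_SINK_CLASS = "org.apache.spark.metrics.sink.GraphiteSink"


def graphite_sink_conf(host, port, prefix, period_seconds=10):
    """Return Spark configuration entries that enable the Graphite metrics sink.

    The "spark.metrics.conf." prefix lets metrics settings be passed as
    ordinary Spark configuration instead of a metrics.properties file.
    """
    base = "spark.metrics.conf.*.sink.graphite."
    return {
        base + "class": GRAPHITE_SINK_CLASS,
        base + "host": host,
        base + "port": str(port),
        base + "period": str(period_seconds),
        base + "unit": "seconds",
        base + "prefix": prefix,
    }


def as_submit_args(conf):
    """Render the entries as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical endpoint and prefix, chosen only for the example.
    conf = graphite_sink_conf("graphite.example.org", 2003, "pyspark_udf_test")
    # shlex.join quotes the "*" in the keys so the shell does not expand it.
    print(shlex.join(as_submit_args(conf)))
```

The printed arguments can be appended to a spark-submit or pyspark command line. Any other sink supported by the Spark Metrics System could be configured the same way.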
--
This message was sent by Atlassian Jira
(v8.3.4#803005)