Luca Canali created SPARK-30306:
-----------------------------------

             Summary: Instrument Python UDF execution time and metrics using 
Spark Metrics system
                 Key: SPARK-30306
                 URL: https://issues.apache.org/jira/browse/SPARK-30306
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, Spark Core
    Affects Versions: 3.0.0
            Reporter: Luca Canali


This proposes extending Spark instrumentation with metrics aimed at 
understanding the performance of Python code called by Spark via UDFs, Pandas 
UDFs, or mapPartitions. The relevant performance counters are exposed through 
the Spark Metrics System (based on the Dropwizard library), which makes it easy 
to consume the metrics produced by executors, for example in a performance 
dashboard. See also the attached screenshot.
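
As a rough illustration of how executor metrics might be routed to a dashboard, 
the sketch below builds the settings that enable Spark's Graphite sink through 
the "spark.metrics.conf." configuration prefix. The helper name 
graphite_sink_conf, the host graphite.example.org, and the prefix pyudf_demo 
are hypothetical placeholders. The names of the Python UDF metrics themselves 
are not listed in this issue, so none are shown.

```python
def graphite_sink_conf(host, port, period_s=10, prefix="pyudf_demo"):
    """Return Spark settings that enable the Graphite metrics sink.

    Hypothetical helper for illustration only: it just assembles
    key/value pairs and does not start or contact Spark.
    """
    # Settings under "spark.metrics.conf." are read by the Spark Metrics
    # System; "*" applies the sink to all instances (driver, executors).
    base = "spark.metrics.conf.*.sink.graphite."
    return {
        base + "class": "org.apache.spark.metrics.sink.GraphiteSink",
        base + "host": host,          # assumed Graphite endpoint
        base + "port": str(port),
        base + "period": str(period_s),
        base + "unit": "seconds",
        base + "prefix": prefix,      # assumed metric name prefix
    }


conf = graphite_sink_conf("graphite.example.org", 2003)

# Print the settings as spark-submit options; quoting protects the "*".
for key, value in sorted(conf.items()):
    print(f'--conf "{key}={value}"')
```

The same key/value pairs could instead be passed to SparkSession.builder.config 
or written, without the "spark.metrics.conf." prefix, to a metrics.properties 
file.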



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

