[ 
https://issues.apache.org/jira/browse/SPARK-34265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-34265:
--------------------------------
    Description: 
This proposes to add SQLMetrics instrumentation for Python UDFs. This is aimed 
at improving monitoring and performance troubleshooting of Python code called 
by Spark, via UDF, Pandas UDF or with MapPartitions.
The introduced metrics are exposed to end users via the WebUI, in the SQL tab, 
for execution steps related to Python UDF execution. 
The scope of this has been limited to Pandas UDFs and related operations, namely: 
ArrowEvalPython, AggregateInPandas, FlatMapGroupsInPandas, MapInPandas, 
FlatMapCoGroupsInPandas, PythonMapInArrow, WindowInPandas.
See also the attached screenshot.

  was:
This proposes to add SQLMetrics instrumentation for Python UDFs. This is aimed 
at improving monitoring and performance troubleshooting of Python code called 
by Spark, via UDF, Pandas UDF or with MapPartitions.
The introduced metrics are exposed to end users via the WebUI, in the SQL tab, 
for execution steps related to Python UDF execution, namely 
BatchEvalPython, ArrowEvalPython, AggregateInPandas, FlatMapGroupsInPandas, 
FlatMapCoGroupsInPandas, WindowInPandas.
See also the attached screenshot.


> Instrument Python UDF execution using SQL Metrics
> -------------------------------------------------
>
>                 Key: SPARK-34265
>                 URL: https://issues.apache.org/jira/browse/SPARK-34265
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.1.1
>            Reporter: Luca Canali
>            Priority: Minor
>         Attachments: PandasUDF_ArrowEvalPython_Metrics.png, 
> PythonSQLMetrics_Jira_Picture.png, proposed_Python_SQLmetrics_v20210128.png
>
>
> This proposes to add SQLMetrics instrumentation for Python UDFs. This is aimed 
> at improving monitoring and performance troubleshooting of Python code called 
> by Spark, via UDF, Pandas UDF or with MapPartitions.
> The introduced metrics are exposed to end users via the WebUI, in the SQL 
> tab, for execution steps related to Python UDF execution. 
> The scope of this has been limited to Pandas UDFs and related operations, 
> namely: ArrowEvalPython, AggregateInPandas, FlatMapGroupsInPandas, 
> MapInPandas, FlatMapCoGroupsInPandas, PythonMapInArrow, WindowInPandas.
> See also the attached screenshot.
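
To check whether a given query falls within the scope described above, one option is to look for the listed operator names in its physical plan. The following is only a sketch and not part of the proposed change: the helper `instrumented_operators_in` and the table `INSTRUMENTED_OPERATORS` are hypothetical names, and the pairing of each operator with a PySpark call reflects the editor's understanding of how these plan nodes usually arise, not something stated in the issue.

```python
import re

# Assumption: typical PySpark API calls that produce each in-scope operator.
# The operator names come from the issue; the right-hand descriptions do not.
INSTRUMENTED_OPERATORS = {
    "ArrowEvalPython": "scalar pandas_udf in select()/withColumn()",
    "AggregateInPandas": "grouped-aggregate pandas_udf in groupBy().agg()",
    "FlatMapGroupsInPandas": "groupBy().applyInPandas()",
    "MapInPandas": "DataFrame.mapInPandas()",
    "FlatMapCoGroupsInPandas": "groupBy().cogroup().applyInPandas()",
    "PythonMapInArrow": "DataFrame.mapInArrow()",
    "WindowInPandas": "pandas_udf evaluated over a Window",
}


def instrumented_operators_in(plan_text):
    """Return the in-scope operator names found in a plan string, in first-seen order."""
    found = []
    # Match whole identifiers so that one operator name is never
    # mistaken for a fragment of another.
    for name in re.findall(r"[A-Za-z]+", plan_text):
        if name in INSTRUMENTED_OPERATORS and name not in found:
            found.append(name)
    return found
```

The input would be the physical plan text, for example what `DataFrame.explain()` prints. A plan that only contains BatchEvalPython (a regular, non-Pandas Python UDF) yields an empty list, which matches the narrowed scope in the updated description.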



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
