[
https://issues.apache.org/jira/browse/SPARK-54223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nishanth updated SPARK-54223:
-----------------------------
Description:
Currently, log messages in {{PythonRunner}} do not include Spark Task Context
information such as the Task ID or Partition ID.
Additionally, the logs lack useful execution metrics (e.g., number of records
processed, data size), which makes it difficult to correlate Python process
behavior with specific Spark tasks.
When debugging UDF performance issues, hangs, or data skew, it’s challenging to
identify which task or dataset portion caused the issue.
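As a rough illustration of the proposed enrichment: inside a task, PySpark exposes the relevant identifiers via {{TaskContext.get()}} ({{partitionId()}}, {{taskAttemptId()}}), so a log prefix could carry them alongside execution metrics. The helper below is a hypothetical sketch of such a log format, not part of Spark's codebase; the field names ({{task}}, {{partition}}, {{records}}, {{bytes}}) are assumptions for illustration.

```python
def format_python_runner_log(msg, task_id=None, partition_id=None,
                             num_records=None, data_bytes=None):
    """Prefix a log message with task context and data metrics when available.

    Hypothetical helper illustrating the proposed log format; field names
    are placeholders, not an actual Spark API.
    """
    ctx = []
    if task_id is not None:
        ctx.append(f"task={task_id}")
    if partition_id is not None:
        ctx.append(f"partition={partition_id}")
    if num_records is not None:
        ctx.append(f"records={num_records}")
    if data_bytes is not None:
        ctx.append(f"bytes={data_bytes}")
    # Fall back to the plain message when no context is known
    # (e.g. driver-side logging, where TaskContext.get() returns None).
    return f"[{' '.join(ctx)}] {msg}" if ctx else msg
```

With such a prefix, a hang or skew investigation can grep executor logs for a specific task or partition instead of matching Python worker PIDs to tasks by hand.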
was:
Currently, log messages in {{PythonRunner}} (used to execute Python UDFs) do
not include Spark Task Context information such as Task ID or Partition ID.
When debugging UDF performance issues or hangs, it’s difficult to trace which
Spark task or partition corresponds to the specific Python process that logged
the message.
> Add task context and data metrics to Python runner logs
> -------------------------------------------------------
>
> Key: SPARK-54223
> URL: https://issues.apache.org/jira/browse/SPARK-54223
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.1, 4.0.1
> Reporter: Nishanth
> Priority: Major
>
> Currently, log messages in PythonRunner do not include Spark Task Context
> information such as the Task ID or Partition ID.
> Additionally, the logs lack useful execution metrics (e.g., number of records
> processed, data size), which makes it difficult to correlate Python process
> behavior with specific Spark tasks.
> When debugging UDF performance issues, hangs, or data skew, it’s challenging
> to identify which task or dataset portion caused the issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)