[ https://issues.apache.org/jira/browse/SPARK-56326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim reassigned SPARK-56326:
------------------------------------
Assignee: Brooks Walls
> Add Streaming query id and batch id to task scheduling logs
> -----------------------------------------------------------
>
> Key: SPARK-56326
> URL: https://issues.apache.org/jira/browse/SPARK-56326
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler, Structured Streaming
> Affects Versions: 4.2.0
> Reporter: Brooks Walls
> Assignee: Brooks Walls
> Priority: Minor
> Labels: pull-request-available
>
> Currently, logs about task scheduling do not include the query id or batch id
> of streaming queries. This makes streaming queries hard to debug, especially
> when several queries are running at once. Let's add the query id and batch id
> to every task scheduling log line emitted while processing a streaming query.
> The current logs involving task scheduling look like this:
> {code:java}
> 6/03/11 22:03:29 INFO FairSchedulableBuilder: Added task set TaskSet_486190.0 tasks to pool 1772179380933
> 6/03/11 22:03:29 INFO TaskSetManager: Starting task 0.0 in stage 486190.0 (TID 3075017) (10.68.141.175,executor 13, partition 0, PROCESS_LOCAL,
> 6/03/11 22:03:29 INFO TaskSetManager: Finished task 0.0 in stage 486190.0 (TID 3075017) in 52 ms on 10.68.141.175 (executor 13) (1/1){code}
> With query and batch information added, the same logs would look like this:
> {code:java}
> 6/03/11 22:03:29 INFO FairSchedulableBuilder: [queryId = 71c67] [batchId = 685] Added task set TaskSet_486190.0 tasks to pool 1772179380933
> 6/03/11 22:03:29 INFO TaskSetManager: [queryId = 71c67] [batchId = 685] Starting task 0.0 in stage 486190.0 (TID 3075017) (10.68.141.175,executor 13, partition 0, PROCESS_LOCAL,
> 6/03/11 22:03:29 INFO TaskSetManager: [queryId = 71c67] [batchId = 685] Finished task 0.0 in stage 486190.0 (TID 3075017) in 52 ms on 10.68.141.175 (executor 13) (1/1){code}
> *Impact:* This change is strictly additive to logs and does not change any
> public APIs or internal scheduling logic.
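A minimal sketch of how the `[queryId = ...] [batchId = ...]` prefix could be built, assuming the streaming query id and batch id reach the scheduler as properties attached to the task set (the way SparkContext local properties travel with a job). The property key names `sql.streaming.queryId` and `streaming.sql.batchId`, as well as the class `StreamingLogPrefix` and its `prefix` helper, are assumptions for illustration only, not the code from the pull request.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: builds a "[queryId = ...] [batchId = ...] " prefix for
 * scheduler log lines from the properties attached to a task set.
 * The key names below are ASSUMPTIONS modeled on Structured Streaming's local
 * properties; they are not a confirmed part of the SPARK-56326 change.
 */
class StreamingLogPrefix {
    // Assumed property key carrying the streaming query id (illustrative).
    static final String QUERY_ID_KEY = "sql.streaming.queryId";
    // Assumed property key carrying the micro-batch id (illustrative).
    static final String BATCH_ID_KEY = "streaming.sql.batchId";

    /** Returns the prefix, or an empty string when neither key is present. */
    static String prefix(Properties props) {
        if (props == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        String queryId = props.getProperty(QUERY_ID_KEY);
        String batchId = props.getProperty(BATCH_ID_KEY);
        if (queryId != null) {
            sb.append("[queryId = ").append(queryId).append("] ");
        }
        if (batchId != null) {
            sb.append("[batchId = ").append(batchId).append("] ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Properties as they might look for a micro-batch of a streaming query.
        Properties streaming = new Properties();
        streaming.setProperty(QUERY_ID_KEY, "71c67");
        streaming.setProperty(BATCH_ID_KEY, "685");
        System.out.println(prefix(streaming) + "Starting task 0.0 in stage 486190.0");

        // A non-streaming job carries neither key, so its log line is unchanged.
        System.out.println(prefix(new Properties()) + "Starting task 0.0 in stage 1.0");
    }
}
```

Returning an empty string when the keys are absent keeps log lines for batch (non-streaming) jobs exactly as they are today, which is consistent with the issue's point that the change is strictly additive.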
--
This message was sent by Atlassian Jira (v8.20.10#820010)