Huanli Wang created SPARK-42565:
-----------------------------------

             Summary: Error log improvement for RocksDB state store instance 
lock acquisition
                 Key: SPARK-42565
                 URL: https://issues.apache.org/jira/browse/SPARK-42565
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.5.0
            Reporter: Huanli Wang


23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default):
RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in
stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 in
stage 57, TID 342] after 60002 ms.

We are seeing the error messages above for a test query. In this case
`taskId != partitionId`, but the error log does not make that clear.

These logs are confusing to read: the second entry appears to refer to
`task 3.0` (it is actually partition 3, retry attempt 0), while `TID 363`
already belongs to `task 2.0 in stage 57.1` in the first entry.

 

It is also unclear during which stage retry attempt the lock was acquired (or
failed to be acquired).
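
As a rough illustration of what a less ambiguous message could look like, here is a minimal Python sketch that labels each identifier separately. It is not the Spark implementation: `AcquisitionInfo`, `describe`, and `lock_timeout_message` are hypothetical names, and the stage attempt numbers in the usage lines are assumed values (the holder's stage attempt does not appear in the original log).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionInfo:
    """Hypothetical holder for the identifiers that the current log conflates."""
    thread_id: int
    partition_id: int   # shown as the "3" in "task: 3.0" today
    task_attempt: int   # shown as the "0" in "task: 3.0" today
    stage_id: int
    stage_attempt: int  # missing from the current message
    task_id: int        # the TID

    def describe(self) -> str:
        # Label every field so partition ID and task ID cannot be confused.
        return (
            f"[ThreadId: {self.thread_id}, "
            f"partition: {self.partition_id}, task attempt: {self.task_attempt}, "
            f"stage: {self.stage_id}.{self.stage_attempt}, TID: {self.task_id}]"
        )


def lock_timeout_message(waiter: AcquisitionInfo, holder: AcquisitionInfo,
                         waited_ms: int) -> str:
    """Build the timeout message using the labeled form of both parties."""
    return (
        f"RocksDB instance could not be acquired by {waiter.describe()} "
        f"as it was not released by {holder.describe()} after {waited_ms} ms."
    )


# Illustrative values only; both stage attempt numbers are assumptions.
waiter = AcquisitionInfo(49, 3, 0, 57, 1, 363)
holder = AcquisitionInfo(51, 3, 1, 57, 0, 342)
print(lock_timeout_message(waiter, holder, 60002))
```

With labels like these, a reader can match `TID: 363` against the executor log line directly, without having to guess whether `3.0` means a task index or a partition.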



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

