cxzl25 commented on PR #1946:
URL: https://github.com/apache/auron/pull/1946#issuecomment-3788513889

   > Could you share the root cause and troubleshooting clues for this q14b issue
   
   Judging from the call stack, the `accumUpdates` of the `TaskResult` contain null values.
   
   ```java
   ERROR TaskResultGetter: Exception while getting task result
   java.lang.NullPointerException
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$3(TaskResultGetter.scala:109)
   ```
   
   The only place where `TaskMetrics#externalAccums` appears to be updated is `TaskMetrics#registerAccumulator`, which is invoked via `taskContext.registerAccumulator(this)` when `AccumulatorV2#readObject` runs during deserialization. Judging from the code alone, it is highly unlikely that a null value is written there.
   
   Since `externalAccums` is an `ArrayBuffer`, which is not thread-safe, I tried adding the `synchronized` keyword to `TaskMetrics#registerAccumulator`, and the NPE no longer occurred. I then added logging to record which threads were calling this method concurrently, and ultimately identified that Auron's deserialization of expressions may be the root cause of this problem.
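   
   To illustrate why `synchronized` helps, here is a minimal sketch in Java. It is not Spark's code: `SyncRegistryDemo` is a hypothetical stand-in for `TaskMetrics`, and `ArrayList` plays the role of `ArrayBuffer`. Without the lock, two threads appending at once can both advance the size while only one slot is written (or a resize can drop a write), which would leave a null element behind. With the lock, every append completes before the next one starts.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical stand-in for TaskMetrics; an illustration, not Spark's actual code.
   public class SyncRegistryDemo {
       // ArrayList, like Scala's ArrayBuffer, is not thread-safe.
       private final List<Object> externalAccums = new ArrayList<>();
   
       // synchronized serializes appends, so concurrent callers cannot
       // interleave inside the list's resize-and-store logic.
       public synchronized void registerAccumulator(Object acc) {
           externalAccums.add(acc);
       }
   
       // Returns a copy taken under the same lock.
       public synchronized List<Object> snapshot() {
           return new ArrayList<>(externalAccums);
       }
   
       // Registers perThread objects from each of `threads` threads,
       // waits for all of them, and returns the registered elements.
       public static List<Object> registerConcurrently(int threads, int perThread) {
           SyncRegistryDemo metrics = new SyncRegistryDemo();
           List<Thread> workers = new ArrayList<>();
           for (int t = 0; t < threads; t++) {
               Thread w = new Thread(() -> {
                   for (int i = 0; i < perThread; i++) {
                       metrics.registerAccumulator(new Object());
                   }
               });
               workers.add(w);
               w.start();
           }
           for (Thread w : workers) {
               try {
                   w.join();
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   throw new IllegalStateException(e);
               }
           }
           return metrics.snapshot();
       }
   
       public static void main(String[] args) {
           List<Object> accums = registerConcurrently(8, 10_000);
           // prints "size=80000 containsNull=false"
           System.out.println("size=" + accums.size()
                   + " containsNull=" + accums.contains(null));
       }
   }
   ```
   
   Removing `synchronized` from `registerAccumulator` in this sketch makes the result nondeterministic: depending on timing, the list may end up short, contain nulls, or throw during a resize.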
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
