[
https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=482036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482036
]
ASF GitHub Bot logged work on BEAM-9399:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Sep/20 11:52
Start Date: 11/Sep/20 11:52
Worklog Time Spent: 10m
Work Description: scwhittle commented on pull request #12825:
URL: https://github.com/apache/beam/pull/12825#issuecomment-691048364
@lukecwik Luke can you review as you have context from last time?
Changing to synchronizing on the buffer instead of the PrintStream changes the
precondition so that it only enforces the invariant on when publish is called
within our custom PrintStream. That means the original deadlock can still occur
if this happens:
T1: synchronizes on System.err (Throwable.printStackTrace, for example), then
publishes to the handler
T2: is synchronized within the handler and tries to report an error to
System.err using Throwable.printStackTrace, which synchronizes on the PrintStream
I removed that case by using a custom ErrorManager that prints to the
original stderr stream. An alternative would be to have the ErrorManager
still use our custom PrintStream but avoid Throwable.printStackTrace, so
as not to synchronize on the PrintStream.
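
For readers without the PR open, the idea of an ErrorManager that reports to the original stderr stream could look roughly like the sketch below. This is only an illustration, not the code in the pull request: the class name `OriginalStderrErrorManager` and its constructor are hypothetical, and it assumes the original `System.err` was captured before `System.setErr` replaced it.

```java
import java.io.PrintStream;
import java.util.logging.ErrorManager;

/**
 * Hypothetical sketch: an ErrorManager bound to the stream that was
 * System.err before it was overridden. It never touches the overriding
 * PrintStream, so it cannot take that stream's monitor while the calling
 * thread is synchronized inside the logging handler.
 */
final class OriginalStderrErrorManager extends ErrorManager {
  private final PrintStream originalErr;

  /** @param originalErr the stream captured before System.setErr was called (assumed). */
  OriginalStderrErrorManager(PrintStream originalErr) {
    this.originalErr = originalErr;
  }

  @Override
  public void error(String msg, Exception ex, int code) {
    // Only the original stream is locked here; the overridden System.err
    // (which writes back into the handler) is not involved.
    synchronized (originalErr) {
      originalErr.println("Logging handler error " + code + ": " + msg);
      if (ex != null) {
        // printStackTrace(PrintStream) synchronizes on the stream it is
        // given, which is the original stderr rather than the override.
        ex.printStackTrace(originalErr);
      }
    }
  }
}
```

A handler would install it with `handler.setErrorManager(new OriginalStderrErrorManager(savedErr))`, where `savedErr` is whatever reference to the original stream the worker kept around. With this in place, T2 in the scenario above no longer needs the overriding PrintStream's lock while it holds the handler's lock.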
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 482036)
Time Spent: 6h 20m (was: 6h 10m)
> Possible deadlock between DataflowWorkerLoggingHandler and overridden
> System.err PrintStream
> --------------------------------------------------------------------------------------------
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P3
> Fix For: 2.21.0
>
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler, the
> ErrorManager is used to log the exception. ErrorManager uses System.err,
> which is overridden to be a PrintStream that writes back into
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging to System.err has the inverse lock ordering
> PrintStream -> DataflowWorkerLoggingHandler, so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logging from
> inside DataflowWorkerLoggingHandler could cause the same issue.
> The proposed fix addresses the low-hanging fruit by having ErrorManager output
> to the original System.err. A full fix would be to improve our override of
> System.err to a PrintStream that can detect the locking inversion, or possibly
> to use the PrintStream mutex in both cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)