[
https://issues.apache.org/jira/browse/BEAM-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Whittle reopened BEAM-9399:
-------------------------------
Reopening to track the following exception, triggered when a user logs an
exception to System.err. It occurs because other classes synchronize on the
PrintStream itself; in this case, Throwable.printStackTrace locks the stream
before calling println on it.
Instead of using the stream itself to synchronize access to the buffer, we
should use the buffer or another lock so that this is not possible.
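
To illustrate the suggestion above, here is a minimal sketch of a PrintStream
that guards its buffer with a private lock rather than with the stream's own
monitor. The class PrivateLockPrintStream and its members (bufferLock,
publish, publishedLines) are hypothetical names chosen for this example; this
is not the actual JulHandlerPrintStream code, and the real fix may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not the Beam JulHandlerPrintStream. */
final class PrivateLockPrintStream extends PrintStream {
  // Private lock for the buffer. External code such as
  // Throwable.printStackTrace can lock the stream object, but never this.
  private final Object bufferLock = new Object();
  private final StringBuilder buffer = new StringBuilder();
  // Stand-in for the logging handler that would receive published lines.
  private final List<String> published = new ArrayList<>();

  PrivateLockPrintStream() {
    super(new ByteArrayOutputStream()); // underlying sink is unused here
  }

  @Override
  public void println(String line) {
    String message;
    synchronized (bufferLock) {
      buffer.append(line);
      message = buffer.toString();
      buffer.setLength(0);
    }
    // Publish after releasing bufferLock. The caller may still hold the
    // stream's monitor, but the check below no longer depends on it.
    publish(message);
  }

  @Override
  public void println(Object value) {
    println(String.valueOf(value));
  }

  private void publish(String message) {
    if (Thread.holdsLock(bufferLock)) {
      throw new IllegalStateException("publish called while holding bufferLock");
    }
    synchronized (published) {
      published.add(message);
    }
  }

  List<String> publishedLines() {
    synchronized (published) {
      return new ArrayList<>(published);
    }
  }

  public static void main(String[] args) {
    PrivateLockPrintStream stream = new PrivateLockPrintStream();
    // printStackTrace synchronizes on the stream while calling println.
    new RuntimeException("boom").printStackTrace(stream);
    System.out.println(stream.publishedLines().get(0));
  }
}
```

Because the lock-held check refers to bufferLock, a caller that holds the
stream's monitor (as printStackTrace does) no longer trips it.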
java.lang.IllegalStateException: BEAM-9399: publish should not be called with
the lock as it may cause deadlock
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.publishIfNonEmpty(JulHandlerPrintStreamAdapterFactory.java:380)
org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.println(JulHandlerPrintStreamAdapterFactory.java:332)
java.lang.Throwable$WrappedPrintStream.println(Throwable.java:748)
java.lang.Throwable.printStackTrace(Throwable.java:655)
java.lang.Throwable.printStackTrace(Throwable.java:643)
org.slf4j.simple.SimpleLogger.writeThrowable(SimpleLogger.java:254)
org.slf4j.simple.SimpleLogger.write(SimpleLogger.java:248)
org.slf4j.simple.SimpleLogger.innerHandleNormalizedLoggingCall(SimpleLogger.java:416)
org.slf4j.simple.SimpleLogger.handleNormalizedLoggingCall(SimpleLogger.java:359)
org.slf4j.helpers.AbstractLogger.handle_0ArgsCall(AbstractLogger.java:382)
org.slf4j.helpers.AbstractLogger.error(AbstractLogger.java:347)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1441)
> Possible deadlock between DataflowWorkerLoggingHandler and overridden
> System.err PrintStream
> --------------------------------------------------------------------------------------------
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P3
> Fix For: 2.21.0
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler the
> ErrorManager is used to log the exception. ErrorManager uses System.err
> which is overridden to be a PrintStream that writes back into
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging of System.err has the inverse lock ordering
> PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logs from
> inside DataflowWorkerLoggingHandler could cause the same issue.
> The proposed fix addresses the low-hanging fruit by having ErrorManager
> output to the original System.err. A full fix would be to improve our
> override of System.err to a PrintStream that can detect the locking
> inversion, or possibly to use the PrintStream mutex in both cases.
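
The low-hanging-fruit fix described in the issue can be sketched as an
ErrorManager that writes to a PrintStream captured before System.err is
overridden. The class name OriginalStderrErrorManager and the message format
are assumptions made for this illustration; the change that landed in Beam may
look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.ErrorManager;

/** Hypothetical sketch: reports handler errors to the original stderr. */
final class OriginalStderrErrorManager extends ErrorManager {
  // Captured before System.setErr installs the logging PrintStream, so
  // writing here never re-enters the logging handler.
  private final PrintStream originalErr;

  OriginalStderrErrorManager(PrintStream originalErr) {
    this.originalErr = originalErr;
  }

  @Override
  public void error(String msg, Exception ex, int code) {
    originalErr.println("Logging handler error (code " + code + "): " + msg);
    if (ex != null) {
      ex.printStackTrace(originalErr);
    }
  }

  public static void main(String[] args) {
    PrintStream original = System.err; // capture before any override
    ErrorManager manager = new OriginalStderrErrorManager(original);
    manager.error("write failed", new RuntimeException("boom"),
        ErrorManager.WRITE_FAILURE);
  }
}
```

A handler would install it with Handler.setErrorManager. Since errors bypass
the overridden stream, the DataflowWorkerLoggingHandler -> PrintStream lock
ordering is avoided on this path, although other System.err writes from inside
the handler could still invert the order.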