cweld510 opened a new pull request #224:
URL: https://github.com/apache/httpcomponents-core/pull/224
I’ve observed a deadlock when writing to a ContentOutputStream obtained
through AbstractClassicEntityProducer. We’ve looked through the layers at how
ContentOutputStream’s SharedOutputBuffer coordinates between threads and traced
this to a race condition in AbstractHttp1StreamDuplexer. The issue is that
AbstractHttp1StreamDuplexer.requestSessionOutput can be called, but the event
loop never calls onOutput, which then doesn’t call produceOutput.
When a thread calls requestSessionOutput in the duplexer, it increments an
AtomicInteger called outputRequests (used to track the current number of
requests for output), and sets an OP_WRITE event. An event thread notices the
OP_WRITE state, and responds to it by calling
AbstractHttp1StreamDuplexer.onOutput(). This method then calls
AbstractHttp1StreamDuplexer.produceOutput(). After the event thread returns, if
requestSessionOutput has not been called since the event thread returned from
produceOutput, the thread clears the event and sets outputRequests to zero.
Otherwise, it decrements outputRequests after it finishes.
How this is going wrong: Let’s say that
AbstractHttp1StreamDuplexer.outputRequests starts with a value of 1 and that
the event thread is in the middle of a call to onOutput and has just returned
from calling produceOutput. At that moment, a call to requestSessionOutput
increments the outputRequests counter to 2. Then line 350: “final int
pendingOutputRequests = outputRequests.get();” pendingOutputRequests is set to
2, and then we set pendingOutputRequeests back to zero and unset the OP_WRITE
state in lines 356 and 357:
if (!outputPending && !outbuf.hasData() &&
outputRequests.compareAndSet(pendingOutputRequests, 0)) {
ioSession.clearEvent(SelectionKey.OP_WRITE);
Because the OP_WRITE event was cleared, onOutput (and thus produceOutput)
will not be called again, even though a caller requested one.
The simplest way to fix this problem is to set pendingOutputRequests before
the call to produceOutput; this makes the condition on line 356 false whenever
requestSessionOutput is called directly after produceOutput. This change is
included in my pull request.
I wasn’t able to reproduce the issue in a unit test because
AbstractHttp1StreamDuplexer has several dependencies (e.g. the behavior of
event threads) that I couldn’t figure out how to mock well, and the class
doesn’t have any existing unit tests. A test in my company’s codebase that
reliably reproduced the race condition passes after the change, but failed
before, and I have verified that the change works as expected by using my
debugger. However, I’d be happy to create a unit test to verify the correctness
of this change, if there is a way to do so effectively.
Does this seem like the correct fix?
Thank you! --Colin
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]