My investigation has revealed a bug in HTTP/2 input window management. I think
the issue is in either AbstractHttp2StreamMultiplexer#updateInputCapacity or
Http2StreamChannelImpl#update. When the ReactiveDataConsumer sends N bytes
downstream to its Subscriber, it calls CapacityChannel#update and passes in an
increment of N bytes. For reference, this is the contract for that method:
/**
* Updates data capacity information through this channel. The total number
of
* bytes the consumer is capable of accepting is incremented
* by the given increment number.
*
* @param increment non-negative number of extra bytes the consumer
* can accept.
*/
void update(int increment) throws IOException;
However, Http2StreamChannelImpl#update implements that method by by way of
trivial delegation to #updateInputCapacity, so here's how the update is
actually processed:
private void updateInputCapacity(
final int streamId, final AtomicInteger inputWindow, final int
inputCapacity) throws IOException {
if (inputCapacity > 0) {
final int streamWinSize = inputWindow.get();
final int chunk = inputCapacity - streamWinSize;
if (chunk > 0) {
final RawFrame windowUpdateFrame =
frameFactory.createWindowUpdate(streamId, chunk);
commitFrame(windowUpdateFrame);
updateInputWindow(streamId, inputWindow, chunk);
}
}
}
In other words, the value that I'm passing as a capacity *increment* is being
interpreted as the total *capacity*, and the update gets dropped on the floor.
This causes the input window to permanently shrink over time, as more and more
increments are discarded. The protocol will either deadlock or degenerate into
a series of two-byte DATA frames. (The current implementation of
`updateInputCapacity` appears to work fine as long as each DATA and
WINDOW_UPDATE frame has a length equal to the full window size, typically
65535. ReactiveClientTest works fine with a block size of 65535, but it
completely breaks with a block size of 65542.)
I believe the fix is to modify Http2StreamChannelImpl#update in such a way that
its input (a capacity increment) will either be converted into a total capacity
or passed into a method
(AbstractHttp2StreamMultiplexer#incrementInputCapacity?) designed to work with
an increment. The problem is that there exist other classes, such as
AbstractCharDataConsumer, that depend on the current behavior:
/**
* Triggered to obtain the current capacity of the consumer.
*
* @return the number of bytes this consumer is prepared to process.
*/
protected abstract int capacity();
@Override
public final void updateCapacity(final CapacityChannel capacityChannel)
throws IOException {
capacityChannel.update(capacity());
}
The default constructor for StringAsyncEntityConsumer sets capacityIncrement to
Integer.MAX_VALUE; using that value as an increment seems to result in an
undetected overflow, and then a protocol deadlock.
My personal opinion is that there are deeper design issues here that need to be
worked out. CapacityChannel is sometimes given an increment, and sometimes
given a total capacity; capacity updates can be signaled asynchronously, but
are also sort of requested synchronously (by way of calls to the
AsyncDataConsumer#updateCapacity method, or the
Http2Stream#produceInputCapacityUpdate method, or a method like
AbstractCharDataConsumer#capacity). This inconsistency is resulting in
confusion and complexity.
On 10/12/18, 4:26 AM, "Oleg Kalnichevski" <[email protected]> wrote:
Hi Ryan
The ReactiveClientTest has been failing intermittently for some time
now. It only happens with HTTP/2 transport and it mostly happens with
Travis CI. Travis CI has been very effective in reproducing all sorts
of synchronization issues and race conditions in the past.
Given this is relatively new code and there are no other failing
integration tests I tend to suspect this is due to a race condition in
the Reactive Streams message handlers.
Would you have time to take a look?
Cheers
Oleg
[ERROR]
testLongRunningRequest[FORCE_HTTP_2](org.apache.hc.core5.testing.reactive.ReactiveClientTest)
Time elapsed: 30.032 s <<< ERROR!
java.lang.RuntimeException:
org.apache.hc.core5.http2.H2StreamResetException: Timeout due to inactivity (30
SECONDS)
at
org.apache.hc.core5.testing.reactive.ReactiveClientTest.testLongRunningRequest(ReactiveClientTest.java:302)
Caused by: org.apache.hc.core5.http2.H2StreamResetException: Timeout due to
inactivity (30 SECONDS)
https://travis-ci.org/apache/httpcomponents-core/jobs/440564812#L3351
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]