My investigation has revealed a bug in HTTP/2 input window management. I think 
the issue is in either AbstractHttp2StreamMultiplexer#updateInputCapacity or 
Http2StreamChannelImpl#update. When the ReactiveDataConsumer sends N bytes 
downstream to its Subscriber, it calls CapacityChannel#update and passes in an 
increment of N bytes. For reference, this is the contract for that method:

    /**
     * Updates data capacity information through this channel. The total number 
of
     * bytes the consumer is capable of accepting is incremented
     * by the given increment number.
     *
     * @param increment non-negative number of extra bytes the consumer
     * can accept.
     */
    void update(int increment) throws IOException;

However, Http2StreamChannelImpl#update implements that method by by way of 
trivial delegation to #updateInputCapacity, so here's how the update is 
actually processed:

    private void updateInputCapacity(
            final int streamId, final AtomicInteger inputWindow, final int 
inputCapacity) throws IOException {
        if (inputCapacity > 0) {
            final int streamWinSize = inputWindow.get();
            final int chunk = inputCapacity - streamWinSize;
            if (chunk > 0) {
                final RawFrame windowUpdateFrame = 
frameFactory.createWindowUpdate(streamId, chunk);
                commitFrame(windowUpdateFrame);
                updateInputWindow(streamId, inputWindow, chunk);
            }
        }
    }

In other words, the value that I'm passing as a capacity *increment* is being 
interpreted as the total *capacity*, and the update gets dropped on the floor. 
This causes the input window to permanently shrink over time, as more and more 
increments are discarded. The protocol will either deadlock or degenerate into 
a series of two-byte DATA frames. (The current implementation of 
`updateInputCapacity` appears to work fine as long as each DATA and 
WINDOW_UPDATE frame has a length equal to the full window size, typically 
65535. ReactiveClientTest works fine with a block size of 65535, but it 
completely breaks with a block size of 65542.)

I believe the fix is to modify Http2StreamChannelImpl#update in such a way that 
its input (a capacity increment) will either be converted into a total capacity 
or passed into a method 
(AbstractHttp2StreamMultiplexer#incrementInputCapacity?) designed to work with 
an increment. The problem is that there exist other classes, such as 
AbstractCharDataConsumer, that depend on the current behavior:

    /**
     * Triggered to obtain the current capacity of the consumer.
     *
     * @return the number of bytes this consumer is prepared to process.
     */
    protected abstract int capacity();

    @Override
    public final void updateCapacity(final CapacityChannel capacityChannel) 
throws IOException {
        capacityChannel.update(capacity());
    }

The default constructor for StringAsyncEntityConsumer sets capacityIncrement to 
Integer.MAX_VALUE; using that value as an increment seems to result in an 
undetected overflow, and then a protocol deadlock.

My personal opinion is that there are deeper design issues here that need to be 
worked out. CapacityChannel is sometimes given an increment, and sometimes 
given a total capacity; capacity updates can be signaled asynchronously, but 
are also sort of requested synchronously (by way of calls to the 
AsyncDataConsumer#updateCapacity method, or the 
Http2Stream#produceInputCapacityUpdate method, or a method like 
AbstractCharDataConsumer#capacity). This inconsistency is resulting in 
confusion and complexity.


On 10/12/18, 4:26 AM, "Oleg Kalnichevski" <[email protected]> wrote:

    Hi Ryan
    
    The ReactiveClientTest has been failing intermittently for some time
    now. It only happens with HTTP/2 transport and it mostly happens with
    Travis CI. Travis CI has been very effective in reproducing all sorts
    of synchronization issues and race conditions in the past. 
    
    Given this is relatively new code and there are no other failing
    integration tests I tend to suspect this is due to a race condition in
    the Reactive Streams message handlers. 
    
    Would you have time to take a look?
    
    Cheers
    
    Oleg
    
    [ERROR] 
testLongRunningRequest[FORCE_HTTP_2](org.apache.hc.core5.testing.reactive.ReactiveClientTest)
  Time elapsed: 30.032 s  <<< ERROR!
    java.lang.RuntimeException: 
org.apache.hc.core5.http2.H2StreamResetException: Timeout due to inactivity (30 
SECONDS)
        at 
org.apache.hc.core5.testing.reactive.ReactiveClientTest.testLongRunningRequest(ReactiveClientTest.java:302)
    Caused by: org.apache.hc.core5.http2.H2StreamResetException: Timeout due to 
inactivity (30 SECONDS)
    
    https://travis-ci.org/apache/httpcomponents-core/jobs/440564812#L3351
    
    
    
    
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]
    
    

Reply via email to