I think I figured it out. I added debug logging to ReactiveDataConsumer and ReactiveEntityConsumer, and then I injected randomized sleeps at various points in the code in order to try to induce races. Check out the following:
https://gist.githubusercontent.com/rschmitt/51ba439f5141e6cac8b6e148e29c61fd/raw/2f4e13012451e93c90ccda29ad9f1c37c051f1ed/reactive-race.txt Here, the two threads (an RxJava thread and a client thread) race in such a way that the call to #streamEnd never results in a call to Subscriber#onComplete from either thread: 1. The client calls #streamEnd 2. The client acquires the flush lock 3. The client sees that the subscriber is `null` and decides to return 4. RxJava subscribes 5. RxJava sees that the lock is already held and returns 6. The client releases the lock In the logs, (3) and (4) are reported in reverse order. The `subscriber` field is volatile, so we are dealing with a race condition, not a memory model issue. I'll need to give some thought to how to fix this. The obvious solution is for ReactiveDataConsumer#subscribe to acquire (spinwait on?) the flush lock before the store to the `subscriber` field takes place. On Sun, Dec 22, 2019 at 5:10 AM Oleg Kalnichevski <[email protected]> wrote: > On Sun, 2019-12-22 at 10:28 +0100, Oleg Kalnichevski wrote: > > On Sat, 2019-12-21 at 10:58 +0100, Oleg Kalnichevski wrote: > > > > > > > > > ... > > > > > > > > It appears that the defect only occurs with TLS transport and > > > entity > > > enclosing requests. It also looks to be a race condition of some > > > sort. > > > It is nearly impossible to reproduce with context / wire logging > > > on. > > > > > > I will continue looking into it tomorrow. > > > > > > Oleg > > > > > > > Hi Ryan > > > > I can now reliably reproduce the issue with context / wire logging > > on. > > > > So far it looks quite certain there is a race condition of a sort in > > the reactive binding layer. > > > > All failing tests get stuck here > > > > > > https://github.com/apache/httpcomponents-client/pull/181/commits/c1b060703ac57fc4806d1249b906a385648862b5#diff-4aa21c6723799de37acda71cbe858faaR254 > > > > However all message exchanges clearly look completed in the context > > log > > and there appear to be no issues in the protocol handling layer. > > > > It would be great if you could take a look. I will continue debugging > > reactive binding code in the meantime. > > > > Cheers > > > > Oleg > > > > > > Without #publisherToByteArray conversion all tests pass consistently > without a single failure after many repetitions. > > I will start looking into ReactiveDataConsumer. Any help would be > appreciated. > > Oleg > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
