I think I figured it out. I added debug logging to ReactiveDataConsumer and
ReactiveEntityConsumer, and then I injected randomized sleeps at various
points in the code in order to try to induce races. Check out the following:

https://gist.githubusercontent.com/rschmitt/51ba439f5141e6cac8b6e148e29c61fd/raw/2f4e13012451e93c90ccda29ad9f1c37c051f1ed/reactive-race.txt

Here, the two threads (an RxJava thread and a client thread) race in such a
way that the call to #streamEnd never results in a call to
Subscriber#onComplete from either thread:

1. The client calls #streamEnd
2. The client acquires the flush lock
3. The client sees that the subscriber is `null` and decides to return
4. RxJava subscribes
5. RxJava sees that the lock is already held and returns
6. The client releases the lock

In the logs, (3) and (4) are reported in reverse order. The `subscriber`
field is volatile, so we are dealing with a race condition, not a memory
model issue.

I'll need to give some thought to how to fix this. The obvious solution is
for ReactiveDataConsumer#subscribe to acquire (spinwait on?) the flush lock
before the store to the `subscriber` field takes place.

On Sun, Dec 22, 2019 at 5:10 AM Oleg Kalnichevski <[email protected]> wrote:

> On Sun, 2019-12-22 at 10:28 +0100, Oleg Kalnichevski wrote:
> > On Sat, 2019-12-21 at 10:58 +0100, Oleg Kalnichevski wrote:
> > >
> >
> >
> > ...
> >
> > >
> > > It appears that the defect only occurs with TLS transport and
> > > entity
> > > enclosing requests. It also looks to be a race condition of some
> > > sort.
> > > It is nearly impossible to reproduce with context / wire logging
> > > on.
> > >
> > > I will continue looking into it tomorrow.
> > >
> > > Oleg
> > >
> >
> > Hi Ryan
> >
> > I can now reliably reproduce the issue with context / wire logging
> > on.
> >
> > So far it looks quite certain there is a race condition of a sort in
> > the reactive binding layer.
> >
> > All failing tests get stuck here
> >
> >
>
> https://github.com/apache/httpcomponents-client/pull/181/commits/c1b060703ac57fc4806d1249b906a385648862b5#diff-4aa21c6723799de37acda71cbe858faaR254
> >
> > However all message exchanges clearly look completed in the context
> > log
> > and there appear to be no issues in the protocol handling layer.
> >
> > It would be great if you could take a look. I will continue debugging
> > reactive binding code in the meantime.
> >
> > Cheers
> >
> > Oleg
> >
> >
>
> Without #publisherToByteArray conversion all tests pass consistently
> without a single failure after many repetitions.
>
> I will start looking into ReactiveDataConsumer. Any help would be
> appreciated.
>
> Oleg
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to