On Thu, Apr 07, 2016 at 03:07:55PM +0200, Willy Tarreau wrote:
> It definitely is, as any piece of information we can find around this. I was
> wondering whether there was something in relation with the max TLS record
> size maybe. The other possibility would be that the fix uncovered another
> bug which was hidden by this fix. Unfortunately for now I failed to find
> which one by just reading the code, and I still couldn't find any way to
> reproduce the issue to try to analyze it deeper :-(

So I did a new series of tests, with servers slower than the client and
conversely, with SSL on either and both sides, but couldn't get anywhere
near the problem. However I'm having a few thoughts and now I need to
re-audit the 1.5 code regarding this. The buffer full semantics were
a bit different in 1.5 compared to 1.6. The channel_full() function as
it is now indicates whether or not the buffer is full with *incoming*
data more or less the reserve. The previous one would indicate whether
it was full, and only the data in transit were deducted from the
reserve. In 1.5 this is used to decide when to enable polling for reading,
and buffer_max_len() is used to determine how much can be read into a
buffer, either from an applet or from the network.

What I'm suspecting is that I didn't fully understand the extent of the
difference between 1.5 and 1.6 in this area and that I possibly broke
either buffer_max_len(), channel_full() or both in some corner cases,
resulting in the following scenario, which seems compatible with the
trace Janusz provided :

  - the buffer is in a certain state that is still left to be figured out
  - channel_full() returns zero because there are enough data in transit
    to compensate for the reserve, which is already full
  - stream_interface says "good, there's some room left, let's poll for read"
  - poll returns "read ready".
  - the stream interface data handler is called to perform the read, calls
    bi_avail() to determine how much it can read, finds zero indicating the
    buffer is full, then says "stop telling me to read, it's full".
  - stream interface then finishes a few adjustments, sees thanks to
    channel_full() that there's some room left and enables reading again.
  - goto $-3
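
To make the suspected loop concrete, here is a toy model in C. It is only
a sketch under assumptions: the struct and the toy_* functions are
hypothetical and are not the real channel_full(), bi_avail() or
buffer_max_len() code. It assumes one check credits data in transit
against the reserve while the other always keeps the whole reserve, which
is one possible way the two could disagree.

```c
#include <assert.h>

/* Toy model only: NOT the real HAProxy structures or functions. */
struct toy_buf {
    int size;     /* total buffer size */
    int reserve;  /* room normally kept free */
    int in;       /* incoming data, not yet forwarded */
    int out;      /* data in transit, scheduled for forwarding */
};

/* Hypothetical "is it full?" check (assumption): data in transit is
 * credited against the reserve, so the reserve may shrink to zero.
 */
static int toy_channel_full(const struct toy_buf *b)
{
    int reserve_left = b->reserve - b->out;

    if (reserve_left < 0)
        reserve_left = 0;
    return b->in + b->out >= b->size - reserve_left;
}

/* Hypothetical "how much may I read?" check (assumption): the whole
 * reserve is always kept, whatever is in transit.
 */
static int toy_avail(const struct toy_buf *b)
{
    int room = b->size - b->reserve - b->in - b->out;

    return room > 0 ? room : 0;
}

/* Replays the poll/read cycle without ever changing the buffer state.
 * Returns the number of wakeups that could not read anything, capped
 * at max_iter so that the model itself terminates.
 */
static int toy_count_spurious_wakeups(const struct toy_buf *b, int max_iter)
{
    int spurious = 0;

    assert(b->in + b->out <= b->size);
    while (spurious < max_iter) {
        if (toy_channel_full(b))
            break;      /* reported full: polling stays disabled */
        /* poll says "read ready", the data handler is called */
        if (toy_avail(b) > 0)
            break;      /* a real read would make progress here */
        spurious++;     /* nothing read, reading gets re-enabled */
    }
    return spurious;
}
```

With a made-up state such as { .size = 16384, .reserve = 1024, .in = 14000,
.out = 1500 }, toy_channel_full() returns 0 while toy_avail() returns 0,
so toy_count_spurious_wakeups() runs until the cap. With .in = 10000 and
.out = 0 both checks agree and it returns 0.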

If someone who can reliably reproduce the issue could check whether 1.6 has
the same issue, it would help me cut the problem in half. That obviously
excludes anyone running sensitive production systems.

Cheers,
Willy

