Re: [AOLSERVER] SSL data truncation

John Caruso Wed, 15 Jul 2009 17:45:25 -0700

On Wednesday 04:26 PM 7/15/2009, Tom Jackson wrote:

Your SF bug report says that you put in a 300 millisecond delay.
Where? Even if you think that such a fix is not good, it would be
helpful to at least know what works.

There's a massive amount of debugging I've done on this that's not included in the bug report, actually, for reasons of brevity. But I did state that the workaround is to "insert a delay before the data starts being read by ns_https{post,get}"--or in other words, immediately before the loops commented with "Read the content" in ns_httpspost/ns_httpsget:


----- 8< ----------------------------------------------------------
        #
        # Read the content.
        #

        while 1 {
            set buf [_ns_https_read $timeout $rfd $length]
            append page $buf
            [...]
----- 8< ----------------------------------------------------------

The "after X" statement would go immediately before this while loop.
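
For illustration, a minimal sketch of that placement, runnable in plain tclsh. The read_chunk proc and the canned chunk list are hypothetical stand-ins for _ns_https_read and the socket, and the empty-buffer exit condition is an assumption, since the real loop body is elided above. The 300 ms value is the one mentioned from the bug report.

```tcl
# Sketch only: read_chunk is a hypothetical stand-in for _ns_https_read,
# so this runs without AOLserver or a real SSL socket.
set chunks [list "abc" "def" ""]
proc read_chunk {} {
    global chunks
    set buf [lindex $chunks 0]
    set chunks [lrange $chunks 1 end]
    return $buf
}

# Delay in milliseconds (the value mentioned from the bug report).
set delay_ms 300

# Workaround: pause once, immediately before the read loop starts.
after $delay_ms

#
# Read the content.
#
set page ""
while 1 {
    set buf [read_chunk]
    # Assumed exit condition; the original loop's test is not shown.
    if {$buf eq ""} break
    append page $buf
}
puts "read [string length $page] bytes"
```

With a single numeric argument, "after" simply blocks the calling thread for that many milliseconds, so the delay is paid once per request rather than once per read.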

You also talk about truncation, but then the truncation stops if the
received data goes above 81000.

It might be a good idea to narrow down when the bug appears (what byte
value) and when it goes away again. This might suggest something.

I tried that, and it was suggestive but ultimately not much help in debugging the problem. For one thing, the byte values vary by platform, and aren't even consistent on the same platform (i.e., a given byte size might work or fail depending on the run). It's a timing issue, as I said in the bug report. However, if you're curious, this is an analysis of the errors at various byte values taken from our internal bug report for this issue:


----- 8< ----------------------------------------------------------

The error shows up consistently (99.9+% of the time) at 74000 through 81000 bytes (counting by 1000), so I've been using the range of 70000-83000 for testing. Also, some specific testing showed that the errors actually kick in reliably at 73729 bytes; note that 73728=8192*9. And in all the succeeding sizes until the errors stop again, the socket returns exactly 73728 bytes of data regardless of the request size. This particular run of consistent errors stops at 81884 bytes (though there are a few rare successes in that range), which doesn't have any suggestive powers of 2.

So it seems clear that the buffer size affects the reliability in at least two ways: 1) larger sizes are more likely to fail, and 2) certain multiples of 8192 are particularly significant in that they're the last working size before a long stretch of failing sizes (all of which return that last working size). In addition to 73728=8192*9, I verified that this happens at 90112=8192*11 and 106496=8192*13, and that it does NOT happen at 81920=8192*10 or 57344=8192*7. So it would appear that odd multiples of 8192 where the multiplier is >= 9 are the ones that typically start lengthy failure sequences.

----- 8< ----------------------------------------------------------
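
The odd-multiple pattern in that analysis can be checked with a few lines of Tcl. This is only an arithmetic sketch: the variable names are made up for illustration, and the rule it encodes (odd multiplier of 8192, multiplier >= 9) is just the hypothesis stated in the analysis, not a diagnosis of the underlying bug.

```tcl
# Test the sizes named in the analysis against the stated rule:
# an exact, odd multiple of 8192 with a multiplier of at least 9.
set block 8192
set results {}
foreach size {57344 73728 81920 90112 106496} {
    set mult   [expr {$size / $block}]
    set exact  [expr {$size % $block == 0}]
    set odd    [expr {$mult % 2 == 1}]
    set starts [expr {$exact && $odd && $mult >= 9}]
    lappend results $size $mult $starts
    puts [format "%6d = 8192*%-2d  rule predicts start of failing run: %d" \
        $size $mult $starts]
}
```

The rule flags 73728, 90112 and 106496 and leaves 57344 and 81920 alone, which agrees with the sizes verified in the analysis.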

Note that this analysis only applies to RHEL4 (the byte-size analysis for Mac OS X is similar, but the multipliers and trigger levels are different, though I didn't record the actual values). And even on RHEL4 these aren't the only values that fail--other smaller and larger buffer sizes will fail too, just not as consistently.


- John


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<[email protected]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.
