Thanks Jeffrey for testing. Committed to trunk in http://svn.apache.org/r1711657, will let it there for a few times (review) and then backport it to 1.6.x and 1.5.x branches.
Regards, Yann. On Wed, Oct 28, 2015 at 3:41 PM, Jeffrey Crowell <[email protected]> wrote: > That patch works, and users have confirmed that the error message added has > appeared in logs. > > (applied to our copy here) > > https://github.com/pagespeed/mod_pagespeed/commit/84a9deaf9c4df13ae707f44d06f577321de46e8c > > Thanks, > Jeff > > On Thu, Sep 24, 2015 at 1:28 PM, Jeffrey Crowell <[email protected]> > wrote: >> >> Hi Yann, >> >> This patch looks like it should fix the hang we've seen based on the >> instruction trace provided from a bug report ( >> http://i.imgur.com/RI3TKrU.png ) where essentially, queries_sent ( >> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1414 >> ) is never decremented. >> >> The only reports of the bug have had HAProxy along with memcached, so I'm >> not sure if something funky is going on there closing a connection in an odd >> way. >> >> One quick question on the parse_size function though. >> >> Based on the protocol.txt for memcached, it seems like it should also be >> valid if endptr[0] == ' ', endptr[1] == '\r', entptr[2] == '\n' >> or even if just endptr[0] == ' ', no? >> >> https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L227 >> >> I'm not really quite sure if the protol description is correct. >> >> Should it in fact read >> >> VALUE <key> <flags> <bytes>[ <cas unique>]\r\n >> >> instead of >> >> VALUE <key> <flags> <bytes> [<cas unique>]\r\n >> >> And if the optional cas_unique token is present, wont this fail? >> >> Jeff >> >> >> >> On Thu, Sep 24, 2015 at 10:33 AM, Yann Ylavic <[email protected]> >> wrote: >>> >>> Hi Jeffrey, >>> >>> On Wed, Sep 23, 2015 at 9:23 PM, Jeffrey Crowell <[email protected]> >>> wrote: >>> > >>> > We have some patches which were created in an attempt to fix some >>> > signed >>> > bugs in the original code that I think may be causing our issue. >>> > >>> > Namely here: >>> > >>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1453 >>> > vs here >>> > >>> > https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L192 >>> > The original code uses an atoi(), which has undefined behavior when >>> > called >>> > on non integer strings, leaving len to be 0. >>> > >>> > Second, the check here >>> > >>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1457 >>> > vs >>> > https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L199 >>> > >>> > In the original apr code, this check will never fail. len is an >>> > apr_size_t, >>> > and will never be < 0. >>> >>> The code you refer has changed with [1] (i.e. APR-1.4.3, the current >>> version is 1.5.2), but still does not seem to fix the invalid/unknown >>> length/value/type issue, which are both unexpected errors in >>> apr_memcache, and as such should probably terminate the connection... >>> >>> > >>> > The issue here now is that the check in the "server sent back a key >>> > that i >>> > didn't ask for" >>> > >>> > https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1507 >>> > (same in both forked and upstream), apr_pollset_remove, is never >>> > called, and >>> > the number of queries sent is never changed. >>> > >>> > Does it seem possible that this is causing the server to spin here >>> >>> Yes, in the unexpected cases mentioned above, the socket is not >>> "consumed" (i.e. apr_brigade_partition() is not called), which let >>> some immediatly poll()able data for the next loops. >>> >>> > would apr be interested in accepting a patch to fix the signedness >>> > issues? >>> >>> Of course! I included parse_size() in the attached patch which tries >>> to address the parsing/unexpected issues by aborting the connection >>> and returning an error. >>> Does it work for you (it is based on trunk, so you may need to adapt >>> it to your version, which preferably should be the latest...)? >>> >>> I'm not sure about the handling of empty values (the real len == 0 >>> case, not the parsing error). >>> The previous code did not expect the trailing CRLF, while this patch >>> does... >>> >>> Regards, >>> Yann. >>> >>> [1] http://svn.apache.org/r982408 >> >> >
