That patch works, and users have confirmed that the error message added has
appeared in logs.

(applied to our copy here)

https://github.com/pagespeed/mod_pagespeed/commit/84a9deaf9c4df13ae707f44d06f577321de46e8c

Thanks,
Jeff

On Thu, Sep 24, 2015 at 1:28 PM, Jeffrey Crowell <[email protected]>
wrote:

> Hi Yann,
>
> This patch looks like it should fix the hang we've seen based on the
> instruction trace provided from a bug report (
> http://i.imgur.com/RI3TKrU.png ) where essentially, queries_sent (
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1414
> ) is never decremented.
>
> The only reports of the bug have had HAProxy along with memcached, so I'm
> not sure if something funky is going on there closing a connection in an
> odd way.
>
> One quick question on the parse_size function though.
>
> Based on the protocol.txt for memcached, it seems like it should also be
> valid if endptr[0] == ' ', endptr[1] == '\r', entptr[2] == '\n'
> or even if just endptr[0] == ' ', no?
>
> https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L227
>
> I'm not really quite sure if the protol description is correct.
>
> Should it in fact read
>
> VALUE <key> <flags> <bytes>[ <cas unique>]\r\n
>
> instead of
>
> VALUE <key> <flags> <bytes> [<cas unique>]\r\n
>
> And if the optional cas_unique token is present, wont this fail?
>
> Jeff
>
>
>
> On Thu, Sep 24, 2015 at 10:33 AM, Yann Ylavic <[email protected]>
> wrote:
>
>> Hi Jeffrey,
>>
>> On Wed, Sep 23, 2015 at 9:23 PM, Jeffrey Crowell <[email protected]>
>> wrote:
>> >
>> > We have some patches which were created in an attempt to fix some signed
>> > bugs in the original code that I think may be causing our issue.
>> >
>> > Namely here:
>> >
>> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1453
>> > vs here
>> >
>> https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L192
>> > The original code uses an atoi(), which has undefined behavior when
>> called
>> > on non integer strings, leaving len to be 0.
>> >
>> > Second, the check here
>> >
>> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1457
>> > vs
>> https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L199
>> >
>> > In the original apr code, this check will never fail. len is an
>> apr_size_t,
>> > and will never be < 0.
>>
>> The code you refer has changed with [1] (i.e. APR-1.4.3, the current
>> version is 1.5.2), but still does not seem to fix the invalid/unknown
>> length/value/type issue, which are both unexpected errors in
>> apr_memcache, and as such should probably terminate the connection...
>>
>> >
>> > The issue here now is that the check in the "server sent back a key
>> that i
>> > didn't ask for"
>> >
>> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1507
>> > (same in both forked and upstream), apr_pollset_remove, is never
>> called, and
>> > the number of queries sent is never changed.
>> >
>> > Does it seem possible that this is causing the server to spin here
>>
>> Yes, in the unexpected cases mentioned above, the socket is not
>> "consumed" (i.e. apr_brigade_partition() is not called), which let
>> some immediatly poll()able data for the next loops.
>>
>> > would apr be interested in accepting a patch to fix the signedness
>> issues?
>>
>> Of course! I included parse_size() in the attached patch which tries
>> to address the parsing/unexpected issues by aborting the connection
>> and returning an error.
>> Does it work for you (it is based on trunk, so you may need to adapt
>> it to your version, which preferably should be the latest...)?
>>
>> I'm not sure about the handling of empty values (the real len == 0
>> case, not the parsing error).
>> The previous code did not expect the trailing CRLF, while this patch
>> does...
>>
>> Regards,
>> Yann.
>>
>> [1] http://svn.apache.org/r982408
>>
>
>

Reply via email to