Hi friends,

multi interface users of libcurl, this issue may concern you so please bare with my longish mail and I'll get to the point eventually...

I visited a company yesterday (who's name is left here since it isn't important to this dicussion) that is using libcurl quite heavily and I got to debug libcurl when used by an application simulating hundreds of clients, each doing tens of requests.

These guys of course use the multi_socket API but it turns out at times it seemed to "forget" connections (which caused a hang). It turned out to be an existing (7.19.7) bug in libcurl and it happened like this:

 app calls curl_multi_add_handle() to add a new easy handle, libcurl will then
 set it to timeout in 1 millisecond so libcurl will tell the app about it.

 The app's timeout fires off that there's a timeout, the app calls libcurl as
 we so often document it:

  do {
   res = curl_multi_socket_action(... TIMEOUT ...);
 } while(CURLM_CALL_MULTI_PERFORM == res);

 And this is the problem number one:

 When curl_multi_socket_action() is called with no specific handle, but only
 a timeout-action, it will *only* perform actions within libcurl that are
 marked to run at this time. In this case, the request would go from INIT to
 CONNECT and return CURLM_CALL_MULTI_PERFORM. When the app then calls libcurl
 again, there's no timer set for this handle so it remains in the CONNECT
 state. The CONNECT state is a transitional state in libcurl so it reports no
 sockets there, and thus libcurl never tells the app anything more about that
 easy handle/connection.

 libcurl _does_ set a 1ms timeout for the handle at the end of
 multi_runsingle() if it returns CURLM_CALL_MULTI_PERFORM, but since the loop
 is instant the new job is not ready to run at that point (and there's no
 code that makes libcurl call the app to update the timout for this new
 timeout). It will simply rely on that some other timeout will trigger later
 on or that something else will update the timeout callback. This makes the
 bug fairly hard to repeat.

We can fix this problem in two ways, both with side-effects attached:

 A) The high performance fix is to introduce a loop in lib/multi.c (line 1988
    in current CVS), around the call to multi_runsingle() for the timeout-
    actions and simply check for CURLM_CALL_MULTI_PERFORM internally. This
    has the added benefit that this goes in line with my long-term wishes to
    get rid of the CURLM_CALL_MULTI_PERFORM all together from the public API.

    The downside of this fix, is that the counter we return in
    'running_handles' in several of our public functions then gets a slightly
    new and possibly confusing behavior during times:

    If an app adds a handle that fails to connect (very quickly) it may just
    as well never appear as a 'running_handle' with this fix. Previously it
    would first bump the counter only to get it decreased again at next call.
    Even I have used that change in handle counter to signal "end of a
    transfer". The only *good* way to find the end of a individual transfer
    is calling curl_multi_info_read() to see if it returns one.

    Of course, if the app previously did the looping before it checked the
    counter, it really shouldn't be any new effect.

    Further, curl_multi_info_read() is not as "cheap" as we'd like it to be
    for being this single function to use for this purpose. It currently scans
    over all added easy handles and we should improve that to instead become
    a proper queue of messages that is instant to check if empty.

 B) A way to keep the existing running_handles behavior is to make sure the
    timeout callback gets called when the timeout has been changed for this
    case, but that will delay the processing 1ms and it makes the flow a lot
    uglier.

    But even with such a fix I hesitate to claim that running_handles always
    will get bumped when a new handle has been added. If the handle simply
    dies instantly, it will/should never count as running.

Hm, now when I've spent the time to write all this, I must say I lean towards method (A) rather strongly even though it may alter behavior slightly.

If you've read this far and have an idea of what I've tried to say, I'd say you're qualified to have an opinion on the matter so please tell me what you think or if I've missed some detail or whatever.

--

 / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Reply via email to