On 08/18/2009 01:05 PM, Daniel Stenberg wrote: > For example, he was using an older libssh2 than you. Are you also on > Windows?
I use Fedora Linux and RHEL. It is reproducible issue. > It does give me the feeling it might be a libssh2 bug, but without more > details we can't tell for sure. > >> Infinite loop (invoking of clock_gettime, poll, recv functions) loads >> CPU. > > But why does it loop? When the EAGAIN situation is returned from > libssh2, libcurl checks for what to wait for socket-wise and that > shouldn't happen immediately. Is libssh2 telling us to wait for the > wrong direction in this situation? Can you dig up more specific details > on what exactly is happening? Looks like it is not a libssh2 bug. I've compiled 7.19.5 version of curl with the latest version of libssh2 (1.2). The test is simple (to upload a file to a remote host by SFTP): src/curl -T ./big.iso -u anyuser:anypass sftp://my_host.com/~/ after that curl sends .iso file to the host. And gets 99% of CPU. libssh2 opens a socket for sending the file: socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4 file/socket descriptor is 4. After that: send(4, "D\320kE\33\10\253:\241~O\261\2407l\346\313\2275^\313X\305\303RY\264Y\201foH|"..., 16452, MSG_NOSIGNAL) = 16452 //sending data from the file recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily unavailable) //there is no socket data clock_gettime(CLOCK_MONOTONIC, {14965, 269959641}) = 0 clock_gettime(CLOCK_MONOTONIC, {14965, 270097791}) = 0 clock_gettime(CLOCK_MONOTONIC, {14965, 270215983}) = 0 poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT}]) poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily unavailable) //check again, but still nothing clock_gettime(CLOCK_MONOTONIC, {14965, 270749652}) = 0 clock_gettime(CLOCK_MONOTONIC, {14965, 270877781}) = 0 clock_gettime(CLOCK_MONOTONIC, {14965, 270972324}) = 0 poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT}]) poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily unavailable) //check again, nothing clock_gettime(CLOCK_MONOTONIC, {14965, 271424763}) = 0 . . . and many calls of "clock_gettime" and "poll" functions until: recv(4, "\207<{\...@\t\256\36|\34\33\24P{\0A\212q \346\34_\323\270V\322\225\222-\213\243\237\237"..., 16384, MSG_NOSIGNAL) = 68 //the data is received from the socket Then we read the file and send the next portion of data: read(3, "\1CD001\1\0LINUX "..., 16384) = 16384 //read file data send(4, "D\320kE\33\10\253:\241~O\261\2407l\346\313\2275^\313X\305\303RY\264Y\201foH|"..., 16452, MSG_NOSIGNAL) = 16452 //send invoke clock_gettime, poll & recv again and again until the data is received from the socket. Looks like these functions: clock_gettime, poll & recv are invoked in infinite loop, because the socket is non-blocking and “recv” syscall does not wait for data from the socket but exits immediately. Here is the result of the test "Who does use the most of CPU time": % time calls errors syscall ------ ----------- ----------- --------- --------- 39.82 70527 clock_gettime 34.33 46339 poll 20.92 23251 23199 recv 1.10 38 send clock_gettime, poll are invoked very often. We can see the following backtraces: clock_gettime is invoked: #0 0x00a2ecf4 in clock_gettime () from /lib/librt.so.1 #1 0x0074cf69 in curlx_tvnow () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #2 0x00777842 in Curl_socket_ready () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #3 0x0076e8cf in Curl_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #4 0x0076f58d in curl_easy_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #5 0x0804fc74 in main () clock_gettime is invoked: #0 0x00a2ecc6 in clock_gettime () from /lib/librt.so.1 #1 0x002f6f69 in curlx_tvnow () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #2 0x003176bd in Curl_readwrite () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #3 0x003188ea in Curl_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #4 0x0031958d in curl_easy_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #5 0x0804fc74 in main () poll is invoked: #0 0x00e25402 in __kernel_vsyscall () #1 0x003fc93b in poll () from /lib/libc.so.6 #2 0x003218ce in Curl_socket_ready () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #3 0x003188cf in Curl_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #4 0x0031958d in curl_easy_perform () from /home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4 #5 0x0804fc74 in main () So, poll and clock_gettime are invoked from the libcurl. Looks like these backtraces are performed by the following code in infinite loop (lib/transfer.c file, Transfer() function): switch (Curl_socket_ready(fd_read, fd_write, timeout_ms)) { case -1: /* select() error, stop reading */ #ifdef EINTR /* The EINTR is not serious, and it seems you might get this more often when using the lib in a multi-threaded environment! */ if(SOCKERRNO == EINTR) continue; #endif return CURLE_RECV_ERROR; /* indicate a network problem */ case 0: /* timeout */ default: /* readable descriptors */ result = Curl_readwrite(conn, &done); /* "done" signals to us if the transfer(s) are ready */ break; } > libcurl is supposed to do that blocking magic on its own > using the same mechanism that libssh2 uses internally when blocking is > selected. It would then rather indicate that libssh2 _is_ right and that > libcurl gets confused for some reason. Could you please provide info, how to enable this mechanism on libcurl level? "--disable-nonblock" option for the configure does not help and curl still loads CPU to 100%. >> Could you please tell me, can these changes cause any issues in >> libcurl work? > > Yes. It'll make libcurl work less like it is supposed to, as libcurl > wants to work non-blocking with the transfer-layer. It'll be most > evident if you use the multi interface. Blocking mode in the libssh2 is performed by non-blocked socket and the select syscall. So, libcurl just waits for return from a libssh2 call ("libssh2_sftp_write", probably). After that it will invoke the same code. Could you please tell me, what kind of issues can this cause? > I'd like us to fix all bugs, yes. But fixing bugs is a lot of hard work > and we're all busy people. I hope we can help you track this down when > you tell us more details from what you experience. I understand. You're doing very good job. Thanks a lot. Please, see details above and feel free to comment. Thanks, Pavel
