I spent more time today trying to isolate the root cause, hoping that it 
would be more efficient than trying to send you more / the right information. 
We are fairly confident that we have isolated it; let me try to explain as 
clearly as possible. The issue is with the hints that curl passes into 
getaddrinfo, combined with slightly different behavior of getaddrinfo on 
MacOS vs Linux.

The reason why the bug only happens when IPv6 is disabled is the following 
ifdef:
https://github.com/bagder/curl/blob/master/lib/asyn-thread.c#L212

As a result of this ifdef, with IPv6 support the following hints are passed 
into getaddrinfo:
$1 = {
  ai_flags = 0,
  ai_family = 0,
  ai_socktype = 1, // SOCK_STREAM
  ai_protocol = 0,
  ai_addrlen = 0,
  ai_canonname = 0x0,
  ai_addr = 0x0,
  ai_next = 0x0
}
and without IPv6 support:
$1 = {
  ai_flags = 0,
  ai_family = 0,
  ai_socktype = 0,
  ai_protocol = 0,
  ai_addrlen = 0,
  ai_canonname = 0x0,
  ai_addr = 0x0,
  ai_next = 0x0
}

As a result of the hints, on both MacOS and Linux, a build with IPv6 support 
explicitly requests SOCK_STREAM (TCP), which means that the list returned by 
getaddrinfo contains only elements that have ai_protocol == TCP. A build 
without IPv6 support, however, passes no ai_socktype hint, so the list is 
longer and contains both ai_protocol == TCP and ai_protocol == UDP entries.
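For anybody who wants to check this on their own platform without building curl, here is a minimal sketch of the difference. The helper name count_udp_entries, the numeric host and the service string "80" are made up for illustration and are not taken from curl; the hints are zeroed except for ai_socktype, like the two dumps above.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of curl): resolves `host` with all hints
 * zeroed except ai_socktype and prints every entry of the returned list.
 * Returns the number of entries with ai_protocol == IPPROTO_UDP, or -1 if
 * getaddrinfo() fails. */
int count_udp_entries(const char *host, int socktype)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  struct addrinfo *ai;
  int udp = 0;

  assert(host != NULL);

  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = socktype; /* SOCK_STREAM, or 0 for "no hint" */

  /* "80" is an arbitrary service chosen for this sketch */
  if(getaddrinfo(host, "80", &hints, &res) != 0)
    return -1;

  for(ai = res; ai != NULL; ai = ai->ai_next) {
    printf("socktype=%d protocol=%d\n", ai->ai_socktype, ai->ai_protocol);
    if(ai->ai_protocol == IPPROTO_UDP)
      udp++;
  }

  freeaddrinfo(res);
  return udp;
}
```

Calling count_udp_entries("127.0.0.1", SOCK_STREAM) should print no UDP entries, while count_udp_entries("127.0.0.1", 0) should print at least one. The order in which the entries are printed is the part that I would expect to differ between platforms.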

Now when the socket is created in
https://github.com/bagder/curl/blob/master/lib/connect.c#L1282
the protocol is passed into socket() as it was returned by getaddrinfo(). The 
reason why the bug only reproduces on MacOS, but not on Linux, is the order of 
the elements in the list returned by getaddrinfo. On MacOS, the first element 
in the list has ai_protocol == UDP. As a result, the call to socket looks as 
follows
socket(AF_INET, SOCK_STREAM, UDP); // UDP == 17
and fails (socket() returns -1). No further connect attempts are then made and 
curl times out. On Linux, however, since the first element in the list has 
ai_protocol == TCP, the following call is made
socket(AF_INET, SOCK_STREAM, TCP); // TCP == 6
which succeeds and the connection is successful.
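The two socket() calls above can be tried in isolation with a small sketch like the following. The helper name try_socket is made up for illustration; the exact errno reported for the mismatched call may differ between platforms, so the code only prints it rather than assuming a particular value.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of curl): tries to create an AF_INET socket
 * with the given socket type and protocol. Returns 0 on success, or the
 * errno value set by socket() on failure. */
int try_socket(int socktype, int protocol)
{
  int fd = socket(AF_INET, socktype, protocol);

  if(fd < 0) {
    int err = errno; /* save errno before printf can change it */
    printf("socket(AF_INET, %d, %d) failed: %s\n",
           socktype, protocol, strerror(err));
    return err;
  }

  close(fd);
  return 0;
}
```

try_socket(SOCK_STREAM, IPPROTO_TCP) should return 0, while try_socket(SOCK_STREAM, IPPROTO_UDP) should return a non-zero errno, which corresponds to the failing call in the MacOS case described above.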

I added printfs so that I could easily trace the problem on different 
machines, and have put the output of a failing and a successful run on 
pastebin:
http://pastebin.com/A01Jcwbf (I don’t think there is a lot of value in it, but 
the code that produced this output is on 
https://github.com/FabianFrank/curl/tree/bug/ if anybody wants to test this on 
yet another platform).

So it seems that commit
https://github.com/bagder/curl/commit/02fbc26d59c59170fd358034b04a43d8e9b7c78f
that we bisected earlier might not be the root cause, but merely surfaces the 
bug, due to the changed way in which the list returned by getaddrinfo is 
digested.

The following patch fixes the bug (meaning on MacOS it works with IPv6 enabled 
as well as disabled while using the threaded resolver), by always passing the 
hints:
diff --git a/lib/asyn-thread.c b/lib/asyn-thread.c
index 0adac40..4ae904c 100644
--- a/lib/asyn-thread.c
+++ b/lib/asyn-thread.c
@@ -209,12 +209,8 @@ int init_thread_sync_data(struct thread_sync_data * tsd,
   memset(tsd, 0, sizeof(*tsd));

   tsd->port = port;
-#ifdef CURLRES_IPV6
   DEBUGASSERT(hints);
   tsd->hints = *hints;
-#else
-  (void) hints;
-#endif

   tsd->mtx = malloc(sizeof(curl_mutex_t));
   if(tsd->mtx == NULL)

I don’t know enough about the curl codebase to judge the potential side effects 
of this change. What do you think? Is there a better fix? Let me know if I can 
help with more info.


Thanks a lot,
Fabian
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
