Nice report.

Benjamin M. Schwartz wrote:
> Dear Networking experts,
>
> I have been fighting for several months with the fact that invitations
> often seem not to work, when running on a serverless mesh. The symptoms
> are quite strange. If an invitation works once between two laptops, it
> continues to work between them reliably. If it fails once, it continues
> to fail between them consistently. Sometimes, in the same place,
> invitations will work on one mesh channel and not on another. The same
> two XOs may be reliably successful in a particular high-noise environment,
> and consistently fail in an area of virtual radio silence, as well as the
> reverse.
>
> Even when invitations fail, other presence information continues to flow
> correctly. Even activity sharing continues to work beautifully.
>
> With some help from Daf, we managed to get a tcpdump trace from two XOs
> exhibiting this behavior at 1CC. The dumps are attached to ticket #6463.
> What we saw is bizarre, but also consistent with the behavior in the UI.
> The invitations are unicast, implemented using TCP. When machine A sends
> an invitation to B, we see the following exchange:
>
> 1. A broadcasts an ARP request for B
> 2. B sees the ARP request and replies to A
> 3. A receives the ARP reply from B and sends a TCP SYN to B
> 4. B does not see the SYN packet (it does not appear in B's dump)
> 5. A retries a total of three times, but none of the SYN packets are seen
> by B.
> 3b. In parallel, A broadcasts a presence-info update with mDNS, indicating
> that it has shared the activity.
> 4b. B receives this broadcast, updates its presence-info cache, and even
> assigns B's XO icon a new location in the mesh view
>
> This behavior is fairly frightening. I have seen it occur in low-noise
> network environments with a total of 3 XOs, so I suspect a serious bug
> somewhere in the lowest levels of the network stack. Once this failure
> occurs, it is extremely reproducible. All subsequent invitations will
> continue to fail. I therefore suspect that the bug involves the driver or
> firmware reaching an invalid state and becoming stuck there.
>
Keep in mind that the driver/firmware may very well have bugs, but:

1) The driver does not differentiate between different kinds of TCP/IP
packets (though it may wrongly differentiate between unicast and
broadcast/multicast). Try establishing a separate TCP/IP connection when
invitations reproducibly fail.

2) The firmware does not differentiate between frames as far as whether a
route exists or not. Try pinging the other node when invitations
reproducibly fail.

> Given the variety of critical services that run over TCP, including the
> much-emphasized Read activity, I hope that people familiar with the driver
> and firmware will take a look at this bug.
>
> --Ben Schwartz
>
> P.S. All this info is present at ticket #6463. I am writing about it here
> in an attempt to increase awareness.
>
> _______________________________________________
> Networking mailing list
> http://lists.laptop.org/listinfo/networking

--
Polychronis Ypodimatopoulos
Graduate student
Viral Communications
MIT Media Lab
Tel: +1 (617) 459-6058
http://www.mit.edu/~ypod/

_______________________________________________
Devel mailing list
http://lists.laptop.org/listinfo/devel
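Below is a minimal Python sketch of the two checks suggested in the reply above. It is meant to be run on A while invitations to B reproducibly fail.

It makes the following assumptions:

- Python 3 is available on the XO.
- The system `ping` takes a Linux-style `-c` count option.
- The helper names `probe_tcp` and `ping_peer` are hypothetical, not part of Sugar or any OLPC tool.
- B's address and the port have to be filled in by hand.

Any port on B should do for the TCP probe. If nothing is listening there, B's stack answers the SYN with a RST, and that refusal is itself evidence that the SYN arrived.

```python
import socket
import subprocess


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one fresh TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration; not an existing OLPC/Sugar API.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # The full three-way handshake completed.
            return "connected"
    except ConnectionRefusedError:
        # The peer answered the SYN with a RST, so the SYN did reach it.
        return "refused"
    except socket.timeout:
        # Neither a SYN-ACK nor a RST arrived before the timeout; this is
        # consistent with the SYNs (or the replies) being lost in transit.
        return "timeout"
    except OSError as exc:
        # Other failures, e.g. "No route to host" if ARP resolution fails.
        return f"error: {exc}"


def ping_peer(host: str, count: int = 3) -> bool:
    """Return True if the system ping utility received at least one reply.

    Assumes a Linux-style ping where -c sets the number of echo requests
    and the exit status is 0 when at least one reply came back.
    Hypothetical helper for illustration.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, call `probe_tcp("192.0.2.10", 12345)` and `ping_peer("192.0.2.10")`, where 192.0.2.10 and 12345 are placeholders for B's address and an arbitrary port.

Following the reasoning in the reply, the results could be read this way:

- If both checks succeed while invitations still fail, the driver and firmware look less likely to be at fault.
- If both fail while mDNS broadcasts from A still reach B, the unicast path to B becomes the main suspect.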
