What worries me most about this is the revelation that we continue to rely on mDNS even when connected to internet infrastructure. When in the presence of a school server (or connected to a Jabber server), mDNS should be shut down. Otherwise we risk a network meltdown...
wad

On Dec 13, 2007, at 11:18 PM, Giannis Galanis wrote:

> I had several tests related to the xmas tree effect we see in the
> mesh view.
>
> The effect is that sometimes XOs disappear and reappear in the same
> or a different position, or simply disappear. More often it happens
> to many XOs simultaneously.
>
> The results I have clearly indicate that this is an issue in the
> Avahi daemon, which is used by the Salut telepathy service. The
> Sugar interface displays the information it receives from Salut
> very reliably. This means that when a host disappears from
> Avahi's host list, it vanishes instantly from the mesh view, and
> the same happens when a new host arrives.
>
> The Avahi daemon runs below Salut and keeps receiving information
> from other hosts in the network which also run the Avahi daemon.
> It keeps a local cache of the recent hosts.
> At regular intervals (of 1-2 mins, I think), it checks whether the
> hosts in the cache are alive. If not, they are recorded as "failed".
> The above check can be invoked continuously with "avahi-browse -t -r
> _presence._tcp" (instead of waiting for 1-2 mins).
> After a certain timeout, a failed entry (dead host) will disappear
> from the cache, and instantly it will disappear from the mesh view.
>
> This timeout is pretty long (several minutes), so a host (XO) has
> the chance to become alive again with no effect on the mesh view.
> This can occur when:
> a. the XO's Avahi packets don't get through due to high mesh
> traffic. In this case the other XOs might see it either as alive
> or as dead, depending on conditions.
> b. the XO deliberately moved to another channel, or was otherwise
> disconnected. In that case, all other XOs will see it as dead.
> From a client's point of view, the two cases are treated almost the
> same.
>
> THE TEST:
> 6 XOs connected to channel 11, with forwarding tables blinded only
> to themselves, so no other element in the mesh can interfere.
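A scanning loop of the kind used for these tests can be sketched in Python. This is a minimal sketch, not the script actually used in the tests: it assumes avahi-browse's `-p` (parsable) output mode, in which each resolved entry is a semicolon-separated line starting with `=`, and the function names `parse_parsable`, `scan` and `diff` are hypothetical.

```python
import subprocess
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Resolved(NamedTuple):
    """One resolved service entry (an '=' line) from avahi-browse -p."""
    interface: str
    protocol: str
    name: str
    host: str
    address: str
    port: int


def parse_parsable(lines: Iterable[str]) -> Dict[str, Resolved]:
    """Collect resolved entries keyed by service name.

    Assumed layout of a resolved line (semicolon-separated):
    '=;iface;proto;name;type;domain;host;address;port;txt'.
    '+' (new) and '-' (removed) lines carry no address and are skipped.
    """
    resolved: Dict[str, Resolved] = {}
    for line in lines:
        # maxsplit=9 keeps the TXT field in one piece
        fields = line.rstrip("\n").split(";", 9)
        if fields[0] != "=" or len(fields) < 9:
            continue
        _, iface, proto, name, _stype, _domain, host, addr, port = fields[:9]
        resolved[name] = Resolved(iface, proto, name, host, addr, int(port))
    return resolved


def scan(service_type: str = "_presence._tcp",
         timeout_s: float = 30.0) -> Optional[Dict[str, Resolved]]:
    """Run one terminating (-t), resolving (-r) browse and parse it.

    Returns None if avahi-browse does not exit within timeout_s, so a
    hang can be told apart from an empty network.
    """
    try:
        proc = subprocess.run(
            ["avahi-browse", "-t", "-r", "-p", service_type],
            capture_output=True, text=True, timeout=timeout_s, check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_parsable(proc.stdout.splitlines())


def diff(prev: Set[str], curr: Set[str]) -> Dict[str, Set[str]]:
    """Names that appeared or vanished between two consecutive scans."""
    return {"appeared": curr - prev, "vanished": prev - curr}
```

Calling `scan()` every few seconds on each XO and logging `diff()` of consecutive name sets records every appearance and disappearance; a `None` result marks a run that hung instead of terminating, which is how the avahi-browse freeze described in this message would show up.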
>
> The cache list was scanned continuously on all XOs using a script.
>
> If all XOs remained idle, they all showed reliably in each other's
> mesh view. Every 5-10 mins an XO showed as dead in some other XOs'
> scans, but it recovered shortly, and there was no visual
> effect in the mesh view.
>
> If you switched an XO manually to another channel, again it showed
> "dead" in all others. If you reconnected it to channel 11, there was
> again no effect in the mesh view.
> If you never reconnected, in about 10-15 minutes the entry was
> deleted, and the corresponding XO icon disappeared from the view.
>
> Therefore, it is common and expected for XOs to show as "dead" in
> the Avahi cache for some time.
>
> THE BUG:
> IF a new XO appears (a message is received through Avahi),
> WHILE there are 1 or more XOs in the cache that are reported as "dead"
> THEN Avahi "crashes" temporarily and the cache CLEARS.
>
> At this point ALL XOs that are listed as dead instantly disappear
> from the mesh view.
> But, of course, some of the "dead" XOs are expected to reappear
> shortly, especially those that are still in the same mesh channel
> but merely failed to transmit their Avahi packets due to traffic load.
>
> Note that if there is only 1 XO that looks dead, but returns,
> everything is normal.
> But if there are 2, 3, ... XOs that look dead, when 1 returns, then:
> a. all (the dead ones) disappear from the view
> b. the 1 that returned will reappear right after, probably in a
> different position, i.e. it will "jump"
>
> The avahi-browse command scans the network in real time (i.e. sends
> requests for all hosts in its cache list) and runs for several
> seconds. If the above situation occurs, it freezes (this is what I
> meant by "crashes"). When it is restarted, the cache is cleared of
> previously dead hosts.
>
> A typical situation in which the "xmas tree effect" occurs:
> 20 XOs are running Salut in channel 1.
> This included XOs connected to the medialab AP, the school server,
> and link-local.
> XOs leave the channel continuously.
> Concurrently, some connected XOs appear dead for 1 minute or so,
> and reappear after a short time.
>
> Assume that at some point 5 XOs have either really left, or "seem
> dead" anyway.
>
> At some point 2 of these XOs are reconnected at the same time to
> the mesh channel by someone in the office.
> The 2 XOs will "jump" to a different position, whereas the other 3
> will simply vanish.
>
> The way I see it, there is a very clear/narrow/specific bug in
> how the Avahi daemon handles the cache
> when new hosts and dead hosts coexist.
>
> I hope the tests have cleared up the picture a lot.
>
> yani
> _______________________________________________
> Devel mailing list
> [email protected]
> http://lists.laptop.org/listinfo/devel
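The trigger condition reported above can be written down as a toy Python model. It is a sketch of the behaviour described in this message, not Avahi's real cache code: the class `ToyCache`, the two timeouts and the `buggy` switch are hypothetical. With `buggy=True`, replaying the office scenario (five hosts marked failed, two of them returning) leaves the two returnees re-added as fresh entries (the "jump") and the other three gone; with `buggy=False` the failed hosts stay cached until they expire.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ALIVE, FAILED = "alive", "failed"


@dataclass
class ToyCache:
    """Toy per-host cache mimicking the behaviour reported in this thread.

    fail_after / expire_after are hypothetical stand-ins for the observed
    "1-2 min" check and "10-15 min" removal. buggy=True adds the reported
    misbehaviour: a packet arriving while some *other* entry is FAILED
    flushes every FAILED entry.
    """
    fail_after: float = 120.0
    expire_after: float = 600.0
    buggy: bool = True
    last_seen: Dict[str, float] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)

    def tick(self, now: float) -> None:
        """Periodic check: mark silent hosts FAILED, drop expired ones."""
        for host, seen in list(self.last_seen.items()):
            age = now - seen
            if age >= self.expire_after:
                del self.last_seen[host]
                del self.state[host]
            elif age >= self.fail_after:
                self.state[host] = FAILED

    def packet(self, host: str, now: float) -> List[str]:
        """Handle a presence packet from host; return hosts that vanished."""
        vanished: List[str] = []
        others_failed = [h for h, s in self.state.items()
                         if s == FAILED and h != host]
        if self.buggy and others_failed:
            # Reported bug: all FAILED entries are flushed, including host
            # itself if it was FAILED, so it is re-added as new ("jumps").
            for h in [h for h, s in self.state.items() if s == FAILED]:
                del self.last_seen[h]
                del self.state[h]
                if h != host:
                    vanished.append(h)
        self.last_seen[host] = now
        self.state[host] = ALIVE
        return vanished
```

A single failed host that returns does not trigger the flush here, matching the note that one dead-then-returning XO behaves normally. A real mDNS cache ages individual records by their TTL (RFC 6762), so the two fixed per-host timers are only a stand-in for the intervals observed in the tests.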
