I'd like to discuss what the considerations are for network topology. The particular topology I mentioned (which I've since been convinced isn't really a cube or torus after all) was designed with the idea that it's important to be able to reliably query the entire network without sending any node duplicate queries. I'm not sure how important these considerations really are, though. I got the impression that huge numbers of duplicate queries reach the same node by multiple paths in the current gnutella network, but this may be less of a problem than I think it is. Also, as the number of nodes in the network becomes large, it clearly becomes impossible for every query to reach every node, and besides, this isn't really desirable if you're getting lots of hits.
I get the impression that network design generally starts off with the assumption that you've got data that is intended to go to a particular place, and a "good" design is one where you can get your data there in a small number of hops while avoiding creating any bottlenecks. But the criteria for a p2p file sharing network are very different; you're not trying to query any particular node, you just want to query a sufficient number of nodes that you find what you're looking for (assuming it's out there). The implications I get from this are:
1) If you're looking for something pretty common, there's really very little point in querying much of the network. <aside> I think maybe a query packet could include a "hits so far" stat as well as a time to live (BTW, does anyone else think "time to live" should be "time to die"?), and nodes could stop forwarding the query when the hit count passes some threshold. Of course, you're only seeing hits along one particular branch at a time, so it may turn out that you very seldom see enough hits on one branch for this number to be meaningful. </aside>
2) If we accept the fact that most queries will only reach a small piece of the network, then if we want to find something relatively obscure we should either have some way of designating some queries as being "special" and needing/deserving wider distribution (vast potential for abuse here), or we'd like to ensure that if we requery we will hit a distinct (and ideally disjoint) subset of the network.
So... what are the other important considerations? What are the implications for network topology?
George
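
For what it's worth, the "hits so far" idea in the aside could be sketched roughly like this in Python. This is only a toy simulation under my own assumptions, not anything from the gnutella protocol: the names Node, run_query and hit_limit are made up for illustration, and the shared `reached` list stands in for whatever duplicate suppression real nodes would do on their own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    files: set
    neighbors: list = field(default_factory=list)


def run_query(start, wanted, ttl, hit_limit):
    """Flood a query from `start`; return the names of the nodes it reached, in order."""
    reached = []  # simulation-only bookkeeping; real nodes would not share this

    def handle(node, ttl, hits_so_far):
        if node.name in reached:  # drop duplicates arriving by another path
            return
        reached.append(node.name)
        if wanted in node.files:
            hits_so_far += 1  # counts hits on this branch only
        if ttl <= 1 or hits_so_far >= hit_limit:
            return  # out of hops, or this branch has seen enough hits
        for neighbor in node.neighbors:
            handle(neighbor, ttl - 1, hits_so_far)

    handle(start, ttl, 0)
    return reached


# Five nodes in a line: a -> b -> c -> d -> e. Everyone has the common file,
# only e has the rare one.
chain = [Node(name, {"common.mp3"}) for name in "abcde"]
chain[-1].files.add("rare.mp3")
for left, right in zip(chain, chain[1:]):
    left.neighbors.append(right)

print(run_query(chain[0], "common.mp3", ttl=7, hit_limit=2))  # ['a', 'b']
print(run_query(chain[0], "rare.mp3", ttl=7, hit_limit=2))    # ['a', 'b', 'c', 'd', 'e']
```

The common query dies after two nodes while the rare one keeps going until it runs out of neighbors or hops. Because the count is passed down each branch separately, a node with several neighbors can still produce more total hits than hit_limit, which is the caveat raised in the aside.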
