Hi All, I started messing around today with building a node crawler to try and map out the bitcoin network and hopefully provide some useful statistics. It's very basic so far using a mutilated bitcoinj to connect (due me being java developer and not having a clue with c/c++). If it's worthwhile I'll hack bitcoinj some more to run on top Netty to take advantage of it's NIO architecture (netty's been shown to handle 1/2 million concurrent connections so would be ideal for the purpose).
Hoping to a get a bit of input into what would be useful as well as strategy for getting max possible connections without distorted data. I seem to recall Gavin talking about the need for some kind of network health monitoring so I assume there's a need for something like this... Firstly at the moment basically I'm just storing version message and the results of getaddr for each node that I can connect to. Is there any other useful info that can be extracted from a node that's worth collecting? Second and main issue is how to connect. From my first very basic probing it seems the very vast majority of nodes don't accept incoming connections no doubt due to lack of upnp. So it seems the active crawl approach is not really ideal for the purpose. Even if it was used the resultant data would be hopelessly distorted. A honeypot approach would probably be better if there was some way to make a node 'attractive' to other nodes to connect to. That way it could capture non-listening nodes as well. If there is some way to influence other nodes to connect to the crawler node that solves the problem. If there isn't which I suspect is the case then perhaps another approach is to build an easy to deploy crawler node that many volunteers could run and that could then upload collected data to a central repository. While I'm asking questions I'll add one more regarding the getaddr message. It seems most nodes return about 1000 addresses in response to this message. Obviously most of these nodes haven't actually talked to all 1000 on the list so where does this list come from? Is it mixture of addresses obtained from other nodes somehow sorted by timestamp? Does it include some nodes discovered by IRC/DNS? Or are those only used to find the first nodes to connect to? Thanks for any input... Hopefully I can build something that's useful for the network... ------------------------------------------------------------------------------ Special Offer -- Download ArcSight Logger for FREE! Finally, a world-class log management solution at an even better price-free! And you'll get a free "Love Thy Logs" t-shirt when you download Logger. Secure your free ArcSight Logger TODAY! http://p.sf.net/sfu/arcsisghtdev2dev _______________________________________________ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development