On Fri, Nov 15, 2013 at 08:29:22PM +0100, Vladimir Panteleev wrote:
> On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:
[...]
> >I'm trying to implement an algorithm that traverses a very large
> >graph, and I need some kind of data structure to keep track of which
> >nodes have been visited, that (1) allows reasonably fast lookups
> >(preferably O(1)), and (2) doesn't require GBs of storage (i.e.,
> >some kind of compression would be nice).
>
> A while ago I set out to write a solver for a group of problems
> which can be described as breadth-first traversal of extremely large
> implicit graphs. Some code here (C++):
> https://github.com/CyberShadow/DDD. The project uses delayed
> duplicate detection to allow the number of nodes to exceed available
> RAM.
>
> What we found was that in certain cases, delayed duplicate detection
> beat hash tables even while filtering duplicates in memory, I think
> mainly because DDD is more parallelizable than hash tables.
>
> You mentioned compression - perhaps a Bloom filter would fit your
> needs, as an optimization?
Thanks!! I think delayed duplicate detection is what I'm looking for.
The Bloom filter idea is an interesting one, too; I'll definitely
consider it as a later optimization once I get DDD working. Sketches of
both, as I currently understand them, are below.
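To make sure I've got the DDD idea right, here's roughly the shape of
it in C++. Purely illustrative, not code from the DDD project: Node and
expand() are made-up stand-ins (a packed state encoding and the
problem's successor function), and everything is kept in memory,
whereas the real thing would stream the frontier to disk and sort/merge
it in chunks:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iterator>
#include <vector>

using Node = std::uint64_t;

// Toy successor function, standing in for the real problem's expand():
// a small implicit graph on the integers below 1000.
std::vector<Node> expand(Node n)
{
    std::vector<Node> s;
    if (n + 1 < 1000) s.push_back(n + 1);
    if (n * 2 < 1000) s.push_back(n * 2);
    return s;
}

// BFS with delayed duplicate detection: successors are generated with
// no per-node checks, then deduplicated in one sorted batch per level.
std::size_t bfs_ddd(Node start)
{
    std::vector<Node> visited  = {start};  // kept sorted at all times
    std::vector<Node> frontier = {start};

    while (!frontier.empty()) {
        // 1. Expand the whole level, duplicates and all.
        std::vector<Node> next;
        for (Node n : frontier)
            for (Node s : expand(n))
                next.push_back(s);

        // 2. The "delayed" part: one sort, then drop duplicates within
        //    the batch and anything already in the visited list.
        std::sort(next.begin(), next.end());
        next.erase(std::unique(next.begin(), next.end()), next.end());

        std::vector<Node> fresh;
        std::set_difference(next.begin(), next.end(),
                            visited.begin(), visited.end(),
                            std::back_inserter(fresh));

        // 3. Merge the genuinely new nodes into the sorted visited list.
        std::vector<Node> merged;
        std::merge(visited.begin(), visited.end(),
                   fresh.begin(), fresh.end(),
                   std::back_inserter(merged));
        visited.swap(merged);
        frontier.swap(fresh);
    }
    return visited.size();
}

int main()
{
    std::printf("visited %zu nodes\n", bfs_ddd(1));
}

The appeal, as I understand it, is that duplicate checks happen once
per level in a cache-friendly sorted pass instead of one random-access
hash probe per node, which is also what makes the sort/merge step easy
to parallelize and to push out to disk.

As for the Bloom filter, I'd use it as an in-memory pre-filter in front
of the exact visited set: a "no" answer is definite, while a "yes" may
be a false positive and falls through to the exact check. Again just a
sketch; the sizes and the double-hashing scheme are arbitrary example
choices, not anything from the thread:

#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

class BloomFilter {
    std::vector<std::uint64_t> bits;
    std::size_t nbits, k;

    // Derive the i-th probe position from two base hashes
    // (double hashing).
    std::size_t probe(std::uint64_t h1, std::uint64_t h2,
                      std::size_t i) const {
        return (h1 + i * h2) % nbits;
    }

public:
    explicit BloomFilter(std::size_t nbits_, std::size_t k_ = 4)
        : bits((nbits_ + 63) / 64), nbits(nbits_), k(k_) {}

    void insert(std::uint64_t key) {
        std::uint64_t h1 = std::hash<std::uint64_t>{}(key);
        std::uint64_t h2 = (h1 * 0x9E3779B97F4A7C15ULL) | 1;  // odd step
        for (std::size_t i = 0; i < k; ++i) {
            std::size_t b = probe(h1, h2, i);
            bits[b / 64] |= 1ULL << (b % 64);
        }
    }

    // False positives are possible; false negatives are not.
    bool maybeContains(std::uint64_t key) const {
        std::uint64_t h1 = std::hash<std::uint64_t>{}(key);
        std::uint64_t h2 = (h1 * 0x9E3779B97F4A7C15ULL) | 1;
        for (std::size_t i = 0; i < k; ++i) {
            std::size_t b = probe(h1, h2, i);
            if (!(bits[b / 64] & (1ULL << (b % 64))))
                return false;
        }
        return true;
    }
};

int main()
{
    BloomFilter seen(1u << 20);  // ~1 Mbit, arbitrary demo size
    seen.insert(42);
    // Prints "maybe" (1) for the inserted key and almost certainly
    // "no" (0) for the other one.
    std::printf("%d %d\n", seen.maybeContains(42), seen.maybeContains(43));
}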
T

--
It won't be covered in the book. The source code has to be useful for
something, after all. -- Larry Wall