Re: [OT] Best algorithm for extremely large hashtable?

qznc Fri, 15 Nov 2013 12:06:13 -0800

On Friday, 15 November 2013 at 18:43:12 UTC, H. S. Teoh wrote:

This isn't directly related to D (though the code will be in D), and I
thought this would be a good place to ask.
I'm trying to implement an algorithm that traverses a very large graph,
and I need some kind of data structure to keep track of which nodes have
been visited, that (1) allows reasonably fast lookups (preferably O(1)),
and (2) doesn't require GB's of storage (i.e., some kind of compression
would be nice).
The graph nodes can be represented in various ways, but possibly the
most convenient representation is as n-dimensional vectors of
(relatively small) integers. Furthermore, graph edges are always between
vectors that differ only by a single coordinate; so the edges of the
graph may be thought of as a subset of the edges of an n-dimensional
grid. The hashtable, therefore, needs to represent some connected subset
of this grid in a space-efficient manner, that still allows fast lookup
times.
The naïve approach of using an n-dimensional bit array is not feasible
because n can be quite large (up to 100 or so), and the size of the grid
itself can get up to about 10 in each direction, so we're looking at a
potential maximum size of 10^100, clearly impractical to store
explicitly.

So, -10 to 10 in discrete steps. This means 5 bits per dimension and 500 bits for a single coordinate vector (at n = 100). Is the graph distributed over a compute cluster, or does it fit into a single computer? With a few GB of RAM, this means your graph is quite sparse, yet nodes are connected ("differ only by a single coordinate")?
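
To make the arithmetic concrete, here is a rough sketch of packing one coordinate vector into a single integer key at 5 bits per dimension. It is in Python rather than D for brevity, and the names pack, unpack, BITS and OFFSET are made up for illustration. It assumes every coordinate lies in -10..10 and uses a plain set as the visited store, which is not compressed in any way.

```python
BITS = 5                  # 21 distinct values (-10..10) fit in 5 bits
OFFSET = 10               # shifts -10..10 to 0..20 so each field is non-negative
MASK = (1 << BITS) - 1

def pack(coords):
    """Pack a coordinate vector into one integer, 5 bits per dimension."""
    key = 0
    for c in coords:
        if not -OFFSET <= c <= OFFSET:
            raise ValueError("coordinate out of range: %d" % c)
        key = (key << BITS) | (c + OFFSET)
    return key

def unpack(key, n):
    """Inverse of pack for an n-dimensional vector."""
    coords = []
    for _ in range(n):
        coords.append((key & MASK) - OFFSET)
        key >>= BITS
    coords.reverse()
    return coords

# Hypothetical 100-dimensional node: at most 100 * 5 = 500 bits per key.
node = [0] * 100
node[3] = -10
node[99] = 10

visited = set()
visited.add(pack(node))

# A neighbour differs in exactly one coordinate.
neighbour = list(node)
neighbour[50] += 1
```

A lookup is then `pack(neighbour) in visited`. Each key still costs at least 500 bits plus the set's own overhead, so this only helps if the number of visited nodes is small enough to fit in RAM.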

Can you preprocess? I mean, walk all the nodes, O(n), to compute a good (perfect?) hash function.

In general, I think you should either store the flag right in the graph node or mirror the graph structure.
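
As a minimal sketch of the first option, the visited flag can live in the node object itself, so the traversal needs no separate lookup table. This is Python rather than D, and the Node class, link and traverse are hypothetical names. It assumes the nodes are already materialized in memory with explicit neighbour lists, which may not hold if the graph is generated on the fly.

```python
from collections import deque

class Node:
    def __init__(self, coords):
        self.coords = coords
        self.neighbors = []
        self.visited = False    # the flag is stored right in the node

def link(a, b):
    """Add an undirected edge between two nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)

def traverse(start):
    """Breadth-first traversal that marks nodes via their own flag."""
    order = []
    start.visited = True
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node.coords)
        for nb in node.neighbors:
            if not nb.visited:
                nb.visited = True
                queue.append(nb)
    return order

# Tiny example: three nodes, each edge changes a single coordinate.
a, b, c = Node((0, 0)), Node((1, 0)), Node((1, 1))
link(a, b)
link(b, c)
order = traverse(a)
```

The check is O(1) and costs one flag per node, but only for nodes that already exist as objects.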


I do not know any concrete algorithms.
