On Friday, 15 November 2013 at 20:07:24 UTC, H. S. Teoh wrote:
On Fri, Nov 15, 2013 at 08:48:19PM +0100, John Colvin wrote:
How is this the same as the edges of an n-dimensional grid?

Basically, adjacent nodes differ only in a single coordinate, and that difference can only be 1 or -1. So for the 2D case, if you graph the nodes on the plane, the edges would be horizontal/vertical line segments of unit length. If you consider the 2D grid's edges to be *all* possible such line segments, then in that sense you could think of the graph's edges as being a subset of the 2D grid's.
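
(For concreteness, that adjacency as a minimal D sketch, assuming nodes are stored as integer coordinate arrays; the function name is mine:)

import std.stdio;

// Enumerate the grid neighbors of a node: all points that differ
// from it by +1 or -1 in exactly one coordinate.
int[][] neighbors(const int[] node)
{
    int[][] result;
    foreach (i; 0 .. node.length)
    {
        foreach (delta; [-1, 1])
        {
            auto next = node.dup;
            next[i] += delta;
            result ~= next;
        }
    }
    return result;
}

void main()
{
    writeln(neighbors([0, 0])); // [[-1, 0], [1, 0], [0, -1], [0, 1]]
}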

I was hoping that this fact can be taken advantage of, to make a compact
representation of visited nodes.

How dense is the graph?

For example, if it contains every possible edge described (+-1 in any single coordinate), then for a breadth-first search we can get away with keeping track of just a single integer: the farthest distance traveled so far.
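
In code, the visited check for that extreme case reduces to a distance comparison. A minimal D sketch, again assuming integer coordinate arrays (the names are mine):

import std.math : abs;

// In the full grid graph, BFS visits exactly the nodes whose
// Manhattan distance from the start is at most the current depth,
// so one integer replaces the whole visited set.
bool visited(const int[] node, const int[] start, int depth)
{
    int dist = 0;
    foreach (i; 0 .. node.length)
        dist += abs(node[i] - start[i]);
    return dist <= depth;
}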

For very dense graphs, you can perhaps apply a similar idea: represent large visited areas as "diamonds" (balls in the Manhattan metric) with a center and a radius, then try to "factor" the visited areas into diamonds on the fly. Possibly they will be "diamonds with a [much shorter] list of holes".

For example, we say that all nodes at distance at most 3 from (0, 0, ..., 0) are visited, and store a single center (the origin) and a radius (3) instead of all of them, which number roughly dimensions to the power of distance (100^3 here). The lookup then costs at most O(number of diamonds) plus a hash table lookup for the individual nodes. Perhaps for certain graphs, we can find a good compromise between the number of diamonds and the number of individually stored nodes.
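
Here is a rough D sketch of such a structure, with invented names and without the on-the-fly factoring or the holes, just to make the lookup cost concrete:

import std.math : abs;

// "Diamond" = ball in the Manhattan metric: a center plus a radius.
struct Diamond
{
    int[] center;
    int radius;

    bool contains(const int[] node) const
    {
        int dist = 0;
        foreach (i; 0 .. node.length)
            dist += abs(node[i] - center[i]);
        return dist <= radius;
    }
}

// Visited set: a few large diamonds plus a hash table of leftovers.
struct VisitedSet
{
    Diamond[] diamonds;
    bool[immutable(int)[]] individual;

    // O(number of diamonds) scan, then one hash table lookup.
    bool contains(const int[] node) const
    {
        foreach (ref d; diamonds)
            if (d.contains(node))
                return true;
        return (node.idup in individual) !is null;
    }

    void add(const int[] node)
    {
        individual[node.idup] = true;
        // A real implementation would periodically try to "factor"
        // clusters of individual nodes into new diamonds here.
    }
}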

The above is just an idea; perhaps it won't be feasible by itself once you get into the details, but it may inspire some related optimization.

Also, the border of an n-dimensional area is an (n-1)-dimensional object, so for dense enough graphs you can hope that the number of elements on the border is an order of magnitude smaller than the total number of visited nodes. However, for too many dimensions (as in your case, 10^100 -> 10^99), it does not seem to help much.
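
To put rough numbers on that, here is a small D sketch (mine, not tested against your case) that counts the grid points at each exact Manhattan distance from the origin, adding one dimension at a time:

// counts[d] = number of points of Z^n at Manhattan distance exactly d
// from the origin; built coordinate by coordinate (a zero coordinate
// contributes 1 way, a coordinate of absolute value step contributes 2).
ulong[] layerSizes(int n, int rmax)
{
    auto counts = new ulong[](rmax + 1);
    counts[0] = 1;
    foreach (dim; 0 .. n)
    {
        auto next = new ulong[](rmax + 1);
        foreach (d; 0 .. rmax + 1)
            foreach (step; 0 .. rmax + 1 - d)
                next[d + step] += counts[d] * (step == 0 ? 1 : 2);
        counts = next;
    }
    return counts;
}

For instance, layerSizes(100, 3) puts the vast majority of the distance-3 ball on its outermost layer, so in high dimensions storing only the border saves little.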

Another question is when the graph traversal will end. For example, if you are certain that you won't need to visit more than, say, one million nodes, a simple hash table storing the node representations will suffice. On the other hand, if you plan to visit 10^12 nodes, and the graph is neither very sparse nor very dense (and not regular in any obvious way beyond what is described), perhaps you won't get the required compression level (1/1000) anyway.
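
For the million-node case, the hash table route really is that simple. A self-contained D sketch (the delegate signatures are my own invention):

// Plain BFS with a D associative array as the visited set; fine as
// long as the total number of visited nodes stays small (say 10^6).
void bfs(int[] start,
         int[][] delegate(const int[]) neighbors,
         bool delegate(const int[]) isTarget)
{
    bool[immutable(int)[]] seen;
    int[][] queue = [start]; // simple array-backed FIFO
    size_t head = 0;
    seen[start.idup] = true; // idup makes an immutable hash key
    while (head < queue.length)
    {
        auto node = queue[head++];
        if (isTarget(node))
            return;
        foreach (next; neighbors(node))
        {
            auto key = next.idup;
            if (key !in seen)
            {
                seen[key] = true;
                queue ~= next;
            }
        }
    }
}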

Ivan Kazmenko.
