Andrew Bromage wrote > On Mon, Jul 29, 2002 at 02:50:55PM -0700, Hal Daume III wrote: > > > I need a data structure which is a map from Ints to Doubles; the > > distribution of the Ints is in the range say 0-20000 and a map will > > contain somewhere around 100-200 elements. I need to be able to query > > *very* quickly, insert *quite* quickly, and iterate through the elements > > very quickly. I'm thinking hash tables, but I'm not sure. Does anyone > > have any thoughts on this? > > How about this? > > http://citeseer.nj.nec.com/briggs93efficient.html > > Another paper which might be useful: > > Aasa, A., Holstrom, S. and Nilsson, C. (1988), > "An Efficiency Comparison of Some Representations of Purely > Functional Arrays". BIT 28(3):490--503. You aren't doing sparse matrix computations are you? In that case there are specialised storage techniques around, which are probably better than any general solution. If you ask on sci.math.num-analysis you'll probably get more references than you know what to do with.
The method in the first paper effectively has one array of length 20000 allowing you to query *very* quickly, plus a smaller array listing the elements. I think the main problem with this would be the array of 20000 elements. Apart from the cost of memory (which is probably pretty insignificant), you have the problem that 20000 elements is 160kB which is quite a lot to all keep in cache at once. If the elements are uniformly distributed it seems to me you might get better performance by, say, using a closed hash table with size say 1024 and indexing just by some selection of 10 bits from the integers. (Of course if the integers are not uniformly distributed you will need a better hash function.) Then in 80-90% of cases you will get a positive/negative answer to the query in just one lookup; if it's a don't know (there is a different element at this position) you will only have to scan through the same cache line (almost certainly) until you get the right answer. You can also dispense with the first paper's auxiliary array this way by setting the integers in the array to out-of-range values (or by using signalling NaN's for the doubles); iterating through the array checking all the integers shouldn't take too long. _______________________________________________ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell