seeking array advice

Ketil Malde Wed, 02 Mar 2005 02:52:13 -0800

Hi,

I'm about to rework an old program, and I am pondering what data
structure to use.  Basically, I want to store an index of fixed-length
words (q-grams) with positions, i.e. given a word, I can quickly find
its position in the data.


The data set may be large (gigabytes would be nice), and I also want
the option of storing a sparse set of the words in the data
(i.e. every other word, every third, etc).

I *think* I want to use a Map from words to positions, words packed to
fit in Int(32|64|eger) (depending on word size), and positions as
Int(32|64).  I've used a similar approach with FiniteMaps previously,
and it worked nicely, but for smaller data sets.

However:

Using a 'Map key [pos]' will eat a lot of space for the lists.  
'Map key (UArray Int pos)' will be more efficient, but every insertion
and deletion will destroy and create a new array -- and I would like
the Map to live outside a monad (does there even exist an (IO|ST)Map?)

I've considered lists of small(ish -- cache line compatible, probably)
arrays.  Another possibility is a suffix array, and a Map from words
to regions in it. How will a hashtable work in this setting?

Any thoughts?

-kzm

PS: it seems from a recent thread in comp.lang.functional (see
<[EMAIL PROTECTED]>) that array
performance has improved in GHC6.4.  Impressively so.
-- 
If I haven't seen further, it is by standing in the footprints of giants

_______________________________________________
Glasgow-haskell-users mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

seeking array advice

Reply via email to