Kim, how are you? Out of curiosity, I would like to clarify something.

My naïve impression was that, if your keys are French words, the dataset
can't be very large.
Let's say that a highly educated English speaker has about 300,000 words in
his/her vocabulary.
Let's give a French speaker 500,000 words. Add two integers (8 bytes per
entry) as payload.
Now, I don't know what the histogram of French word lengths looks like.
Most words are probably shorter than 16-20 characters (with a long flat
tail).
So even at a generous 100 bytes per entry, 500,000 entries come to about
50 MB; allowing for hash table overhead, we'd still be looking at no more
than ~100 MB.
(Even on a smartphone that's not a terrible size.)
No matter how you organize the dataset(s) in HDF5 on disk, you are not going
to beat an in-memory hash table/dictionary lookup (even with a generic hash
function that doesn't speak French).
Why can't you hold that in memory?
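
To make the back-of-the-envelope estimate concrete, here is a rough Python
sketch. The 500,000-entry count, the 8-byte payload, and the 100-byte budget
come from the numbers above; the synthetic 16-character keys and the helper
names are my own assumptions, and the measured figure applies to CPython's
dict only, not to any particular hash table you might use.

```python
import sys

N_WORDS = 500_000       # vocabulary size assumed in the message
GENEROUS_ENTRY = 100    # generous per-entry budget in bytes


def estimate_table_bytes(n_entries, bytes_per_entry):
    """Flat estimate: every entry costs the same number of bytes."""
    return n_entries * bytes_per_entry


def measured_bytes_per_entry(sample_size=10_000):
    """Rough per-entry cost of a CPython dict with synthetic 16-char keys.

    Counts the dict itself, the key strings, and the payload tuples; the
    integer objects inside the tuples are ignored, so this is a lower bound.
    """
    table = {f"mot{i:013d}": (i, i + 1) for i in range(sample_size)}
    total = sys.getsizeof(table)
    for key, value in table.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total / sample_size


if __name__ == "__main__":
    flat = estimate_table_bytes(N_WORDS, GENEROUS_ENTRY)
    print(f"flat estimate: {flat / 1e6:.0f} MB")
    per_entry = measured_bytes_per_entry()
    print(f"measured: ~{per_entry:.0f} B/entry, "
          f"~{per_entry * N_WORDS / 1e6:.0f} MB for {N_WORDS} entries")
```

The measured number will vary with the Python version and platform, so treat
it as an order-of-magnitude check rather than a prediction.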

Is your concern that you don't want to regenerate that hash table every
time, and that's why you'd like to store it on disk in HDF5?
(Again, HDF5 has no DBMS-like query engine, and I don't see why you'd need
one.)
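
If avoiding regeneration is the goal, one possible approach is to store the
words and their payload as two parallel HDF5 datasets and rebuild the
dictionary once at load time. The sketch below uses h5py and NumPy; the
dataset names "words" and "payload", the int32 payload type, and the function
names are assumptions of mine, not anything from your setup.

```python
import h5py
import numpy as np


def save_table(path, table):
    """Write {word: (a, b)} as two parallel datasets (hypothetical layout)."""
    words = sorted(table)
    payload = np.array([table[w] for w in words], dtype=np.int32)
    with h5py.File(path, "w") as f:
        # Variable-length UTF-8 strings, so accented characters survive.
        f.create_dataset("words", data=words,
                         dtype=h5py.string_dtype(encoding="utf-8"))
        f.create_dataset("payload", data=payload)


def load_table(path):
    """Read both datasets in one go and rebuild the in-memory dict."""
    with h5py.File(path, "r") as f:
        words = f["words"].asstr()[:]   # decode to Python str
        payload = f["payload"][:]
    return {w: (int(a), int(b)) for w, (a, b) in zip(words, payload)}
```

All lookups then happen against the in-memory dict; HDF5 only serves as the
container that is read once at startup.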

Best, G.



_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
