most machines these days have 10x that much memory. it should
be speedy enough to use strstr(2) once you've loaded them into
memory. and even loading them into memory should take no
more than a couple of seconds at 80MB/s (150MB / 80MB/s ≈ 1.9s).
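a minimal sketch of the brute-force approach, written in standard C rather than plan 9 libc (an assumption): read each file into one NUL-terminated buffer, then scan it with strstr. slurp and countmatches are hypothetical helper names, and the sketch assumes the documents are plain text with no embedded NUL bytes, since strstr stops at the first NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* read a whole file into a NUL-terminated heap buffer; returns NULL on error.
   the caller frees the buffer. */
char *
slurp(const char *path, size_t *lenp)
{
	FILE *f;
	char *buf;
	long n = 0;

	assert(path != NULL);
	if ((f = fopen(path, "rb")) == NULL)
		return NULL;
	if (fseek(f, 0, SEEK_END) != 0 || (n = ftell(f)) < 0 || fseek(f, 0, SEEK_SET) != 0) {
		fclose(f);
		return NULL;
	}
	buf = malloc((size_t)n + 1);
	if (buf == NULL || fread(buf, 1, (size_t)n, f) != (size_t)n) {
		free(buf);
		fclose(f);
		return NULL;
	}
	fclose(f);
	buf[n] = '\0';
	if (lenp != NULL)
		*lenp = (size_t)n;
	return buf;
}

/* count non-overlapping occurrences of pat in text using repeated strstr calls */
size_t
countmatches(const char *text, const char *pat)
{
	size_t n = 0, plen;
	const char *p = text;

	assert(text != NULL && pat != NULL);
	plen = strlen(pat);
	if (plen == 0)
		return 0;
	while ((p = strstr(p, pat)) != NULL) {
		n++;
		p += plen;
	}
	return n;
}
```

slurp runs once per document; after that each query is a linear strstr scan over the in-memory text, with no index to build or keep up to date.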

a more elegant solution would be to reduce each document to
a set of stemmed words, enumerate the set of all stems across all
documents, and build a bit array per stem with one bit per
message #, set if that message contains the stem.
but that seems like too much work for only 150MB.
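for comparison, a rough sketch of what that stem-to-message bit array could look like in standard C. everything here is an assumption for illustration: the names (Index, addmsg, query, MAXDOCS, MAXSTEMS) are hypothetical, the "stemmer" is a toy that only lowercases and drops a trailing 's' (a real one, such as porter's, does far more), and stems are found by linear search where a real index would use a hash table. a multi-word query is answered by ANDing the per-stem bitmaps.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAXDOCS   256     /* hypothetical cap on message count (multiple of 64) */
#define MAXSTEMS  1024    /* hypothetical cap on distinct stems */
#define MAXWORD   64      /* longer words are truncated */
#define BITWORDS  (MAXDOCS / 64)

typedef struct {
	char stem[MAXWORD];
	uint64_t docs[BITWORDS];  /* bit m set => message m contains this stem */
} Entry;

typedef struct {
	Entry e[MAXSTEMS];
	int n;
} Index;

/* toy stand-in for a real stemmer: lowercase the word and drop one
   trailing 's' from words longer than three letters */
static void
stem(char *w)
{
	for (char *p = w; *p != '\0'; p++)
		*p = (char)tolower((unsigned char)*p);
	size_t n = strlen(w);
	if (n > 3 && w[n - 1] == 's')
		w[n - 1] = '\0';
}

/* linear search over known stems; if create is set, add s when missing */
static Entry *
lookup(Index *ix, const char *s, int create)
{
	for (int i = 0; i < ix->n; i++)
		if (strcmp(ix->e[i].stem, s) == 0)
			return &ix->e[i];
	if (!create || ix->n == MAXSTEMS)
		return NULL;
	Entry *e = &ix->e[ix->n++];
	memset(e, 0, sizeof *e);
	strcpy(e->stem, s);
	return e;
}

/* split text into alphabetic words, stem each, and set bit msg for its stem */
void
addmsg(Index *ix, int msg, const char *text)
{
	char w[MAXWORD];
	size_t n = 0;

	assert(msg >= 0 && msg < MAXDOCS);
	for (const char *p = text;; p++) {
		if (isalpha((unsigned char)*p)) {
			if (n < MAXWORD - 1)
				w[n++] = *p;
			continue;
		}
		if (n > 0) {
			w[n] = '\0';
			stem(w);
			Entry *e = lookup(ix, w, 1);
			if (e != NULL)
				e->docs[msg / 64] |= (uint64_t)1 << (msg % 64);
			n = 0;
		}
		if (*p == '\0')
			break;
	}
}

/* AND the bitmaps of all query words into out; return the number of matches */
int
query(Index *ix, const char *const *words, int nwords, uint64_t out[BITWORDS])
{
	char w[MAXWORD];
	int count = 0;

	assert(nwords > 0);
	for (int i = 0; i < BITWORDS; i++)
		out[i] = ~(uint64_t)0;
	for (int k = 0; k < nwords; k++) {
		strncpy(w, words[k], MAXWORD - 1);
		w[MAXWORD - 1] = '\0';
		stem(w);
		Entry *e = lookup(ix, w, 0);
		for (int i = 0; i < BITWORDS; i++)
			out[i] &= e != NULL ? e->docs[i] : 0;
	}
	for (int m = 0; m < MAXDOCS; m++)
		if ((out[m / 64] >> (m % 64)) & 1)
			count++;
	return count;
}
```

the extra machinery (stemming, the stem table, a bitmap per stem) buys fast multi-word lookups, which is exactly the work that looks unnecessary when the whole corpus fits in memory and strstr is already fast enough.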

- erik
