Has anyone thought of (or implemented) caching of term information?

Currently, Lucene stores an index entry for every nth term. It uses this
index to position the TermEnum near the requested term, and then scans
forward through the terms.
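For concreteness, the lookup behaves roughly like this (a minimal Java
sketch; termIndex, termEnum, pointer(), and termInfo() are hypothetical
stand-ins, not Lucene's exact internals):

  TermInfo get(Term t) throws IOException {
    int idx = termIndex.search(t);            // nearest indexed term <= t
    termEnum.seek(termIndex.pointer(idx));    // position just before that term
    while (termEnum.next()) {                 // scan up to indexInterval terms,
      int cmp = termEnum.term().compareTo(t); // decoding bytes/chars each step
      if (cmp == 0) return termEnum.termInfo();
      if (cmp > 0) break;                     // scanned past it; not present
    }
    return null;
  }

Every lookup that lands between the same two index entries repeats that
byte/char decoding work from scratch.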

Might it be better to read a "page" of term infos (based on the index) and
keep those pages in a SoftCache in the SegmentTermEnum?
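Something like the following is what I have in mind (a rough sketch only;
TermInfoPageCache, loadPage(), and the page size are made up for
illustration):

  import java.io.IOException;
  import java.lang.ref.SoftReference;
  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical placeholder for the per-term postings metadata.
  class TermInfo { long freqPointer, proxPointer; int docFreq; }

  class TermInfoPageCache {
    private final Map<Integer, SoftReference<TermInfo[]>> pages =
        new HashMap<Integer, SoftReference<TermInfo[]>>();

    TermInfo[] getPage(int indexOffset) throws IOException {
      SoftReference<TermInfo[]> ref = pages.get(indexOffset);
      TermInfo[] page = (ref == null) ? null : ref.get();
      if (page == null) {             // first access, or the GC cleared it
        page = loadPage(indexOffset); // decode the whole page once
        pages.put(indexOffset, new SoftReference<TermInfo[]>(page));
      }
      return page;
    }

    // Stand-in for the real work: seek to the index entry's file pointer
    // and decode the next indexInterval term infos in a single pass.
    private TermInfo[] loadPage(int indexOffset) throws IOException {
      return new TermInfo[128]; // placeholder page size
    }
  }

Since the references are soft, the VM can reclaim pages under memory
pressure, so the cache can't pin more heap than is available, while a
frequently-scanned term range stays decoded across lookups.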

It seems the byte/char-level processing is what consumes the most CPU when
performing searches. Caching the decoded term infos should reduce that
dramatically, especially for scans over common terms.

I realize there are better approaches for some common scans (range filters,
etc.) that would avoid this overhead completely.

Any thoughts?

Robert

