Has anyone thought of (or implemented) caching of term information? Currently, Lucene stores an index entry for every nth term; it uses that index to position the TermEnum and then scans the terms sequentially.
Might it be better to read a "page" of TermInfos (located via the index) and keep those pages in a SoftCache in the SegmentTermEnum? The byte/char-level decoding seems to be what consumes the most CPU during searches, and caching decoded pages should reduce that dramatically, especially for common term scans. Something like the sketch below. I realize there are better ways to handle some common scans (range filters, etc.) that would avoid the overhead completely.
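Roughly what I have in mind is the following (names like TermInfoPage and readPageFromDisk are made up for illustration, not existing Lucene API; the real thing would live alongside TermInfosReader/SegmentTermEnum):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    /** Per-segment cache of decoded TermInfo "pages", one page per index interval. */
    class TermInfoPageCache {
        // GC may reclaim pages under memory pressure, so worst case we fall
        // back to today's behavior (re-read and re-decode from disk).
        private final Map<Integer, SoftReference<TermInfoPage>> pages = new HashMap<>();
        private final int indexInterval; // e.g. 128: every 128th term is indexed

        TermInfoPageCache(int indexInterval) {
            this.indexInterval = indexInterval;
        }

        /** Returns the page of decoded TermInfos containing term number termOrd. */
        TermInfoPage getPage(long termOrd) {
            int pageNo = (int) (termOrd / indexInterval);
            SoftReference<TermInfoPage> ref = pages.get(pageNo);
            TermInfoPage page = (ref == null) ? null : ref.get();
            if (page == null) {
                // Miss (or reclaimed by GC): decode the whole interval once,
                // instead of byte/char-scanning on every lookup.
                page = readPageFromDisk(pageNo);
                pages.put(pageNo, new SoftReference<>(page));
            }
            return page;
        }

        private TermInfoPage readPageFromDisk(int pageNo) {
            // Seek to the indexed term at pageNo * indexInterval and decode the
            // next indexInterval TermInfos in one pass. Stubbed out here.
            return new TermInfoPage(new TermInfo[indexInterval]);
        }
    }

    /** A decoded run of term infos between two index entries. */
    class TermInfoPage {
        final TermInfo[] infos;
        TermInfoPage(TermInfo[] infos) { this.infos = infos; }
    }

    /** Minimal stand-in for Lucene's TermInfo (docFreq, file pointers, ...). */
    class TermInfo {
        int docFreq;
        long freqPointer, proxPointer;
    }

The soft references mean the cache only costs memory the JVM can spare; when memory is tight it degrades gracefully to the current scan-from-disk behavior.

Any thoughts?

Robert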