Erick Erickson wrote:
Arbitrary restrictions by IT on the space the indexes can take up.
Actually, I won't categorically I *can't* make this happen, but in order to
use this option, I need to be able to present a convincing case. And I
can't
do that until I've exhausted my options/creativity.
Disk space is a LOT cheaper than engineering time. Any manager worth his/her
salt should be able to evaluate that tradeoff in a millisecond, and any IT
professional unable to do so should be reprimanded. Maybe your boss can fix
it. If not, yours is probably not the only such situation in the world ...
If you can retrieve the pre-index content at search time, maybe this would work:
1. Create the "real" index in the form that lets you get the top N books by
relevance, on IT's disks.
2. Create a temporary index on those books in the form that gives you the
chapter counts in RAM, search it, then discard it.
If N is sufficiently small, #2 could be pretty darn fast.
If that wouldn't work, here's another idea. I'm not clear on how your
solution with getLastTermPosition() would work, but how about just counting
words in the pages as you document.add() them (instead of relying on
getLastTermPosition())? It would mean two passes of parsing, but you wouldn't
have to modify any Lucene code ...
--MDC
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]