Erick Erickson wrote:

Arbitrary restrictions by IT on the space the indexes can take up.

Actually, I won't categorically say I *can't* make this happen, but in order
to use this option, I need to be able to present a convincing case. And I
can't do that until I've exhausted my options/creativity.

Disk space is a LOT cheaper than engineering time. Any manager worth his/her salt should be able to evaluate that tradeoff in a millisecond, and any IT professional unable to do so should be reprimanded. Maybe your boss can fix it. If not, yours is probably not the only such situation in the world ...

If you can retrieve the pre-index content at search time, maybe this would work:

1. Create the "real" index in the form that lets you get the top N books by relevance, on IT's disks.

2. Create a temporary, in-RAM index on those books in the form that gives you the chapter counts; search it, then discard it.

If N is sufficiently small, #2 could be pretty darn fast.
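
In rough strokes, step 2 might look something like this. This is a sketch
only, assuming the Lucene 2.x API; fetchBookContent() is a made-up helper
standing in for however you retrieve the pre-index text, and the "id" /
"contents" field names are just for illustration:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.RAMDirectory;

  // 'hits' are the top N results from searching the "real" on-disk index.
  // fetchBookContent() is hypothetical: however you get the pre-index text.
  RAMDirectory ramDir = new RAMDirectory();
  IndexWriter tempWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
  int n = Math.min(topN, hits.length());   // topN = your relevance cutoff
  for (int i = 0; i < n; i++) {
      String id = hits.doc(i).get("id");
      Document tempDoc = new Document();
      tempDoc.add(new Field("contents", fetchBookContent(id),
                            Field.Store.NO, Field.Index.TOKENIZED));
      tempWriter.addDocument(tempDoc);
  }
  tempWriter.close();

  // Run the chapter-count query against the throwaway index, then drop it.
  IndexSearcher tempSearcher = new IndexSearcher(ramDir);
  // ... count chapters per book here ...
  tempSearcher.close();
  ramDir.close();

Since the whole thing lives in a RAMDirectory, nothing ever touches IT's
disks, and the garbage collector reclaims it as soon as you drop the
references.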


If that wouldn't work, here's another idea. I'm not clear on how your getLastTermPosition() solution would work, but how about just counting words in the pages as you document.add() them? It would mean a second parsing pass, but you wouldn't have to modify any Lucene code ...
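
Something along these lines, perhaps. Again just a sketch against the Lucene
2.x analysis API; "wordCount", 'doc', and 'totalWords' are illustrative
names, not anything from your code:

  import java.io.IOException;
  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.document.Field;

  // Count tokens with the same Analyzer you index with, so the stored
  // count matches what actually ends up in the index.
  int countWords(Analyzer analyzer, String pageText) throws IOException {
      TokenStream ts = analyzer.tokenStream("contents", new StringReader(pageText));
      int count = 0;
      while (ts.next() != null) {
          count++;
      }
      ts.close();
      return count;
  }

  // Then, as you build each Document, store the running total as a field:
  doc.add(new Field("wordCount", Integer.toString(totalWords),
                    Field.Store.YES, Field.Index.NO));

Counting with the same Analyzer matters: otherwise stopword removal and
tokenization differences will make your stored counts disagree with the
term positions in the index.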

--MDC
