Erick Erickson wrote:

Arbitrary restrictions by IT on the space the indexes can take up.

Actually, I won't categorically say I *can't* make this happen, but in order
to use this option, I need to be able to present a convincing case. And I
can't do that until I've exhausted my options/creativity.

Disk space is a LOT cheaper than engineering time. Any manager worth his/her salt should be able to evaluate that tradeoff in a millisecond, and any IT professional unable to do so should be reprimanded. Maybe your boss can fix it. If not, yours is probably not the only such situation in the world ...

If you can retrieve the pre-index content at search time, maybe this would work:

1. Create the "real" index in the form that lets you get the top N books by relevance, on IT's disks.

2. Create a temporary, in-RAM index on those books in the form that gives you the chapter counts; search it, then discard it.

If N is sufficiently small, #2 could be pretty darn fast.
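
In rough strokes, step 2 might look something like this. This is a sketch
only, assuming the Lucene 2.x API; fetchBookContent() is a made-up helper
standing in for however you retrieve the pre-index text, and the "id" /
"contents" field names are just for illustration:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.RAMDirectory;

  // 'hits' are the top N results from searching the "real" on-disk index.
  // fetchBookContent() is hypothetical: however you get the pre-index text.
  RAMDirectory ramDir = new RAMDirectory();
  IndexWriter tempWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
  int n = Math.min(topN, hits.length());   // topN = your relevance cutoff
  for (int i = 0; i < n; i++) {
      String id = hits.doc(i).get("id");
      Document tempDoc = new Document();
      tempDoc.add(new Field("contents", fetchBookContent(id),
                            Field.Store.NO, Field.Index.TOKENIZED));
      tempWriter.addDocument(tempDoc);
  }
  tempWriter.close();

  // Run the chapter-count query against the throwaway index, then drop it.
  IndexSearcher tempSearcher = new IndexSearcher(ramDir);
  // ... count chapters per book here ...
  tempSearcher.close();
  ramDir.close();

Since the whole thing lives in a RAMDirectory, nothing ever touches IT's
disks, and the garbage collector reclaims it as soon as you drop the
references.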


If that wouldn't work, here's another idea. I'm not clear on how your getLastTermPosition() solution would work, but how about just counting words in the pages as you document.add() them? It would mean a second parsing pass, but you wouldn't have to modify any Lucene code ...
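
Something along these lines, perhaps. Again just a sketch against the Lucene
2.x analysis API; "wordCount", 'doc', and 'totalWords' are illustrative
names, not anything from your code:

  import java.io.IOException;
  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.document.Field;

  // Count tokens with the same Analyzer you index with, so the stored
  // count matches what actually ends up in the index.
  int countWords(Analyzer analyzer, String pageText) throws IOException {
      TokenStream ts = analyzer.tokenStream("contents", new StringReader(pageText));
      int count = 0;
      while (ts.next() != null) {
          count++;
      }
      ts.close();
      return count;
  }

  // Then, as you build each Document, store the running total as a field:
  doc.add(new Field("wordCount", Integer.toString(totalWords),
                    Field.Store.YES, Field.Index.NO));

Counting with the same Analyzer matters: otherwise stopword removal and
tokenization differences will make your stored counts disagree with the
term positions in the index.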

--MDC
