Chris Hostetter wrote:
: I think though, that I will need a setter on the reader, rather than the
: writer. That is, I don't know what factor we want until I know how
: large the index is. And I don't know how large the index will be at the
: time of creating the writer, but I can just ask for maxDoc() at the time
: of opening the reader.
I believe if you really want to determine settings like this after
building the index, you'll need to do an initial build the index using
best guess values -- then if the calculations you do once the index is
built aren't close enough to your guesses to satisfy you, change the value
and optimize.
Okay, I've gone and revised how things are fitting together in our app.
It seems that we already call optimize() at the end of all the
processing, before which I could figure out what kind of value we should
be using and call this setter method which I'll patch into the version
we're running.
My logic will probably just say that each index is allowed to store X
terms, so if the number of terms is greater than some value, I'll double
the indexInterval until it comes to some amount which _should_ fit under
that size.
I'm still thinking of some way to recognise text which is clearly not
useful, which might just end up being based on the length of the word.
If I truncated all words to 30 characters in the analyser, that should
drop the average word length and get us some memory savings there. If I
can also remove smaller junk words, we'll save even more space due to
having less terms in total (I figure that tens of millions of terms is
unrealistic, yet that's the number of terms we're currently indexing.)
Daniel
--
Daniel Noll
NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax: (02) 9283 9020
This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]