> Maybe your fields are too long so that only part of it gets indexed (look
> at IndexWriter.maxFieldLength).

This is interesting. I've had a look at the JavaDoc and I think I
understand: the maximum field length describes the maximum number of unique
terms, not the maximum number of words/tokens. So even if I have a
4 GB field, I could set maxFieldLength to, say, 100k, which should
comfortably cover the number of unique words, rather than the
800 million or so that would be needed to cover every token.

Is this correct? 
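
To make the question concrete, here is a minimal sketch of the two readings
I am trying to tell apart. It is not Lucene code and says nothing about what
Lucene actually does; the class FieldLimitSketch and both method names are
made up purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FieldLimitSketch {

    // Reading A (hypothetical): the cap counts every token, so processing
    // stops after the first `limit` tokens and later terms are never seen.
    public static Set<String> termsUnderTokenCap(List<String> tokens, int limit) {
        Set<String> terms = new LinkedHashSet<>();
        int seen = 0;
        for (String token : tokens) {
            if (seen >= limit) {
                break;
            }
            terms.add(token);
            seen++;
        }
        return terms;
    }

    // Reading B (hypothetical): the cap counts unique terms only, so repeats
    // of an already accepted term do not use up any of the budget.
    public static Set<String> termsUnderUniqueCap(List<String> tokens, int limit) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokens) {
            if (terms.contains(token) || terms.size() < limit) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "a", "b", "a", "c");
        System.out.println(termsUnderTokenCap(tokens, 3));  // prints [a, b]
        System.out.println(termsUnderUniqueCap(tokens, 3)); // prints [a, b, c]
    }
}
```

With a cap of 3, reading A never reaches "c" because the budget is spent on
repeated tokens, while reading B keeps it. My assumption above is reading B.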

Is 100k a worrying maxFieldLength, in terms of how much memory this would
consume?

Does Lucene issue a warning if this limit is exceeded during indexing? It
would be quite worrying if it were silently discarding terms.

Thanks in advance,

Alex.

