Yonik Seeley wrote:
> ...as it seems to me really long tokens should be handled more
> gracefully.  It seems strange that the message says the terms were
> skipped (which the code does in fact do), but then a RuntimeException
> is thrown, which usually indicates to me that the issue is not
> recoverable.
>
> It does seem like the document shouldn't be added at all if it causes
> an exception.  Is that what happens if one of the analyzers throws an
> exception?
>
> The other option is to simply ignore tokens above 16K... I'm not sure
> what's right here.

Right now we are ignoring the too-long tokens and adding the rest.
Unfortunately, because DocumentsWriter directly updates the posting
lists in RAM, it's very difficult to "undo" those tokens we have
already successfully processed & added to the posting lists.
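
One rough application-level workaround, if the document really shouldn't
be visible at all after a failure, is to buffer a delete for its unique
key whenever addDocument throws.  This is only a sketch: it assumes every
document carries a unique, indexed "id" field (that field name is made up
here), and that the id field still made it into the index despite the
exception:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class SafeAdder {

      // Adds a document; if indexing fails partway through (e.g. the
      // too-long-token RuntimeException discussed here), buffer a delete
      // for the document's unique id so any postings that did make it
      // into RAM are marked deleted rather than left live.
      public static void addDocumentSafely(IndexWriter writer, Document doc,
                                           String id) throws IOException {
        try {
          writer.addDocument(doc);
        } catch (RuntimeException e) {
          writer.deleteDocuments(new Term("id", id));
          throw e;
        }
      }
    }

Note this doesn't physically undo the postings already written to RAM; it
just ensures the partially indexed document is never returned by searches
once the segment is flushed.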
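
And if "simply ignore tokens above 16K" turns out to be the right
behavior, an application can already get it by dropping over-long tokens
in the analysis chain, before DocumentsWriter ever sees them.  A minimal
sketch against the newer attribute-based TokenStream API (so it won't
compile against older releases; the class name and the 16383-character
limit are illustrative, and Lucene's LengthFilter takes a similar
approach):

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Silently drops tokens whose term text is longer than maxLength.
    public final class DropLongTokensFilter extends TokenFilter {

      private final int maxLength;
      private final CharTermAttribute termAtt =
          addAttribute(CharTermAttribute.class);

      public DropLongTokensFilter(TokenStream input, int maxLength) {
        super(input);
        this.maxLength = maxLength;
      }

      @Override
      public boolean incrementToken() throws IOException {
        // Pass tokens through, skipping any that exceed the limit.
        while (input.incrementToken()) {
          if (termAtt.length() <= maxLength) {
            return true;
          }
        }
        return false;
      }
    }

Wrapped around the tokenizer (e.g. new DropLongTokensFilter(tokenizer,
16383)), over-long tokens are simply dropped and the indexer never has to
deal with them.
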
Mike