On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I am getting the following exception when running against trunk:
> java.lang.IllegalArgumentException: at least one term (length 20079) exceeds max term length 16383; these terms were skipped
>     at org.apache.lucene.index.IndexWriter.checkMaxTermLength(IndexWriter.java:1545)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1451)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> ....
>
> I'm wondering if the IndexWriter should throw an explicit exception in
> this case as opposed to a RuntimeException, as it seems to me really
> long tokens should be handled more gracefully.  It seems strange that
> the message says the terms were skipped (which the code does in fact
> do), but then there is a RuntimeException thrown which usually
> indicates to me the issue is not recoverable.  I am using the
> StandardTokenizer, but I don't think that much matters.
>
> Any thoughts on this?

I think it's good to bring attention to it and not sweep it under the rug.
It indicates potential issues or problems with analysis or the data.
The user can use a LengthFilter to explicitly throw long tokens away.
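
To illustrate the idea only, here is a rough sketch in plain Java. It is not
Lucene's LengthFilter and does not use any Lucene API; the class and method
names (TokenLengthFilterSketch, keepIndexable) are made up for this example.
It just shows tokens outside a length range being dropped on purpose before
they would reach the index, using the 16383 limit from the exception quoted
above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: NOT Lucene's LengthFilter, just the same idea in plain Java.
class TokenLengthFilterSketch {

    // Limit taken from the exception message quoted above.
    static final int MAX_TERM_LENGTH = 16383;

    // Returns only the tokens whose length is within [min, max], preserving order.
    static List<String> keepIndexable(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a token as long as the one reported in the exception (20079 chars).
        char[] big = new char[20079];
        Arrays.fill(big, 'x');
        List<String> tokens = Arrays.asList("lucene", new String(big), "index");

        List<String> kept = keepIndexable(tokens, 1, MAX_TERM_LENGTH);
        System.out.println(kept); // prints [lucene, index]
    }
}
```

In a real analyzer the same decision would presumably be made inside the
token stream, so the oversized token never reaches IndexWriter at all.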

-Yonik

