On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I am getting the following exception when running against trunk:
> java.lang.IllegalArgumentException: at least one term (length 20079)
> exceeds max term length 16383; these terms were skipped
>     at org.apache.lucene.index.IndexWriter.checkMaxTermLength(IndexWriter.java:1545)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1451)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
>     ....
>
> I'm wondering if the IndexWriter should throw an explicit exception in
> this case as opposed to a RuntimeException, as it seems to me really
> long tokens should be handled more gracefully. It seems strange that
> the message says the terms were skipped (which the code does in fact
> do), but then there is a RuntimeException thrown which usually
> indicates to me the issue is not recoverable. I am using the
> StandardTokenizer, but I don't think that much matters.
>
> Any thoughts on this?
I think it's good to bring attention to it and not sweep it under the rug. It indicates potential problems with the analysis or with the data. The user can use a LengthFilter to explicitly throw long tokens away.

-Yonik
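To make the LengthFilter suggestion concrete, here is a minimal, dependency-free sketch of the idea: drop tokens whose length falls outside a [min, max] range before they ever reach the IndexWriter, so oversized terms are discarded deliberately instead of triggering the exception. This is NOT Lucene's LengthFilter and does not use any Lucene API; the class name `LengthFilterSketch` and its `filter` method are hypothetical, and the 16383 limit is simply the value quoted in the exception message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of length-based token filtering (not Lucene's LengthFilter).
 * Tokens whose length lies outside [min, max] are silently dropped.
 */
public class LengthFilterSketch {

    // Assumption: limit taken from the exception message quoted in the thread.
    static final int MAX_TERM_LENGTH = 16383;

    /** Returns, in their original order, only the tokens with min <= length <= max. */
    static List<String> filter(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build an oversized "term" like the 20079-char one in the report.
        char[] big = new char[20079];
        Arrays.fill(big, 'x');

        List<String> tokens = Arrays.asList("lucene", new String(big), "index");
        List<String> kept = filter(tokens, 1, MAX_TERM_LENGTH);
        System.out.println(kept); // prints [lucene, index]
    }
}
```

The design choice mirrors the point of the reply: filtering is an explicit, opt-in step in the analysis chain, so a user who accepts losing very long tokens says so up front, while the default behavior still surfaces the problem.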