[ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1072:
---------------------------------------

    Attachment: LUCENE-1072.take2.patch

OK, I added that as a test case (to TestIndexWriter), and then fixed it. Attached patch. I plan to commit in 1 or 2 days. Thanks Michael!

This was happening during DW.abort(), which was being called on an unhandled exception to clear all documents added since the last flush. It was incorrectly recycling a null Posting instance.

I've also tightened when abort() is called to only those places that actually require it. A failure in the tokenization of one document should not discard previously indexed but not-yet-flushed documents, so I added asserts to the test case to verify that.

> NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition
> -----------------------------------------------------------------------------------------
>
> Key: LUCENE-1072
> URL: https://issues.apache.org/jira/browse/LUCENE-1072
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using lucene-core-2007-11-29_02-49-31
> Reporter: Alexei Dets
> Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1072.patch, LUCENE-1072.take2.patch
>
>
> In my case, documents with unusually large "words" (in fact, text-encoded images) sometimes appear during indexing.
> Attempting to add a document that contains a field with such a token produces java.lang.IllegalArgumentException:
>
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
>
> This is expected; the exception is caught and ignored. The problem is that afterwards the IndexWriter becomes somewhat corrupted, and subsequent attempts to add documents to the index fail as well, this time with an NPE:
>
> java.lang.NullPointerException
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
>
> This is 100% reproducible.
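The two parts of the fix described in the comment (recycling only non-null postings, and rolling back only the failing document instead of aborting everything buffered since the last flush) can be illustrated with a small self-contained model. This is a hypothetical sketch, not Lucene's DocumentsWriter: `BufferedWriterModel`, `Posting`, `allocate` and `recycle` are invented names, whitespace splitting stands in for real tokenization, the 16383 limit is copied from the error message, and `String.repeat` assumes Java 11 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a buffering indexer; NOT Lucene's DocumentsWriter.
public class BufferedWriterModel {
    // Limit copied from the error message in the report.
    static final int MAX_TERM_LENGTH = 16383;

    // Hypothetical stand-in for a pooled Posting object.
    static final class Posting {
        String term;
    }

    private final Deque<Posting> freePostings = new ArrayDeque<>();
    // Documents added since the last (simulated) flush.
    private final List<List<String>> bufferedDocs = new ArrayList<>();

    private Posting allocate(String term) {
        Posting p = freePostings.poll(); // reuse a pooled instance if one exists
        if (p == null) {
            p = new Posting();
        }
        p.term = term;
        return p;
    }

    // Returns postings to the pool. Slots never filled (because tokenization
    // failed first) are null and must be skipped: ArrayDeque.push(null) throws
    // a NullPointerException, and an array-based pool would hand out null later.
    private void recycle(Posting[] postings) {
        for (int i = 0; i < postings.length; i++) {
            if (postings[i] != null) {
                postings[i].term = null;
                freePostings.push(postings[i]);
                postings[i] = null;
            }
        }
    }

    public void addDocument(String text) {
        String[] tokens = text.split("\\s+");
        Posting[] postings = new Posting[tokens.length];
        try {
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].length() > MAX_TERM_LENGTH) {
                    throw new IllegalArgumentException("term length " + tokens[i].length()
                            + " exceeds max term length " + MAX_TERM_LENGTH);
                }
                postings[i] = allocate(tokens[i]);
            }
        } catch (IllegalArgumentException e) {
            // Roll back only this document's partial state; bufferedDocs is untouched.
            recycle(postings);
            throw e;
        }
        List<String> doc = new ArrayList<>();
        for (Posting p : postings) {
            doc.add(p.term);
        }
        recycle(postings);
        bufferedDocs.add(doc);
    }

    public int bufferedDocCount() { return bufferedDocs.size(); }

    public int freePostingCount() { return freePostings.size(); }

    public static void main(String[] args) {
        BufferedWriterModel w = new BufferedWriterModel();
        w.addDocument("first document");
        try {
            w.addDocument("ok " + "x".repeat(37944)); // oversized token, as in the report
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage()); // caller ignores it and continues
        }
        w.addDocument("third document"); // must still succeed
        System.out.println("buffered docs: " + w.bufferedDocCount());
    }
}
```

Running `main` prints `caught: term length 37944 exceeds max term length 16383` followed by `buffered docs: 2`: the oversized document is dropped, while the document buffered before it survives and the next add succeeds, which is the behavior the added asserts are described as checking.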