[ 
https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1072:
---------------------------------------

    Attachment: LUCENE-1072.take2.patch

OK, I added that as a test case (to TestIndexWriter), and then fixed
it.  Attached patch.  I plan to commit in 1 or 2 days.  Thanks
Michael!

This was happening during DocumentsWriter.abort(), which is called on
an unhandled exception to clear all documents added since the last
flush.  It was incorrectly recycling a null Posting instance.
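
To illustrate the kind of guard involved, here is a minimal sketch of a
free-list that skips empty slots when recycling.  It is not the code
from the patch: PostingPool, its Posting class, and all method names
are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy free-list for reusable objects. Hypothetical; not Lucene's DocumentsWriter. */
final class PostingPool {
    /** Stand-in for a per-term posting record. */
    static final class Posting {
        int termFreq;
    }

    private final Deque<Posting> free = new ArrayDeque<>();

    /**
     * Returns every filled slot of the table to the free list and clears
     * the table. Slots that were never filled are skipped, so a null can
     * never end up on the free list.
     */
    int recycle(Posting[] table) {
        int recycled = 0;
        for (int i = 0; i < table.length; i++) {
            Posting p = table[i];
            if (p == null) {
                continue; // nothing to recycle in this slot
            }
            p.termFreq = 0; // reset state before reuse
            free.push(p);
            table[i] = null;
            recycled++;
        }
        return recycled;
    }

    /** Hands out a recycled instance if one is available, else a new one. */
    Posting allocate() {
        Posting p = free.poll();
        return p != null ? p : new Posting();
    }

    int freeCount() {
        return free.size();
    }
}
```

Without the null check, an empty slot would be treated as a recyclable
instance, and the failure would surface only when that entry is used.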

I've also tightened when abort() is called to only those places that
actually require it.  A failure in the tokenization of one document
should not discard documents that were previously indexed but not yet
flushed.  So I added asserts to the test case to verify that.
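
The intended behavior can be sketched with a toy model.  This is only
an illustration under stated assumptions, not the patch or the
TestIndexWriter test: BufferingWriter and its methods are hypothetical,
and only the 16383 term-length limit is taken from the report below.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a writer that buffers documents until a flush. Hypothetical; not Lucene. */
final class BufferingWriter {
    /** Maximum term length quoted in the issue report. */
    static final int MAX_TERM_LENGTH = 16383;

    private final List<List<String>> buffered = new ArrayList<>();

    /**
     * Stages the tokens of one document and adds it to the buffer only if
     * every token is acceptable. A bad token rejects this document alone;
     * documents buffered earlier are left untouched.
     */
    void addDocument(List<String> tokens) {
        List<String> staged = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() > MAX_TERM_LENGTH) {
                throw new IllegalArgumentException("term length " + token.length()
                        + " exceeds max term length " + MAX_TERM_LENGTH);
            }
            staged.add(token);
        }
        buffered.add(staged);
    }

    /** Discards everything buffered since the last flush; reserved for unrecoverable errors. */
    void abort() {
        buffered.clear();
    }

    int bufferedDocCount() {
        return buffered.size();
    }
}
```

A caller that catches the IllegalArgumentException for an oversized
token should still see the earlier documents in the buffer and be able
to keep adding new ones.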


> NullPointerException during indexing in 
> DocumentsWriter$ThreadState$FieldData.addPosition
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1072
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1072
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>         Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java 
> HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using 
> lucene-core-2007-11-29_02-49-31
>            Reporter: Alexei Dets
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1072.patch, LUCENE-1072.take2.patch
>
>
> In my case, documents with unusually large "words" (in fact, text-encoded
> images) sometimes appear during indexing.
> An attempt to add a document that contains a field with such a token
> produces java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that
> after this the IndexWriter becomes somewhat corrupted, and subsequent attempts
> to add documents to the index fail as well, this time with an NPE:
> java.lang.NullPointerException
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.


