[ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch reopened LUCENE-1072:
-----------------------------------
I'm seeing a similar issue when TokenStream.next() throws an
IOException (or a RuntimeException). After that, the DocumentsWriter
is no longer usable, i.e. subsequent calls to addDocument() fail
with a NullPointerException.
I added this test to TestIndexWriter, which shows the problem:
{code:java}
public void testExceptionFromTokenStream() throws IOException {
  RAMDirectory dir = new RAMDirectory();
  IndexWriter writer = new IndexWriter(dir, new Analyzer() {

    public TokenStream tokenStream(String fieldName, Reader reader) {
      return new TokenFilter(new StandardTokenizer(reader)) {
        private int count = 0;

        public Token next() throws IOException {
          // Fail on the sixth call to simulate an error in the middle of a document
          if (count++ == 5) {
            throw new IOException();
          }
          return input.next();
        }
      };
    }

  }, true);

  Document doc = new Document();
  String contents = "aa bb cc dd ee ff gg hh ii jj kk";
  doc.add(new Field("content", contents, Field.Store.NO,
                    Field.Index.TOKENIZED));
  try {
    writer.addDocument(doc);
    fail("did not hit expected exception");
  } catch (Exception e) {
    // expected: the token stream threw while this document was being inverted
  }

  // Make sure we can add another normal document
  doc = new Document();
  doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
                    Field.Index.TOKENIZED));
  writer.addDocument(doc);

  // Make sure we can add yet another normal document
  doc = new Document();
  doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
                    Field.Index.TOKENIZED));
  writer.addDocument(doc);

  writer.close();
}
{code}
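For anyone who wants to see the failure pattern without a Lucene checkout, here is a minimal stand-alone sketch. It is a hypothetical toy model, not the DocumentsWriter code: the class ToyDocumentsWriter, its fields and the resetOnFailure flag are all invented for illustration, and I'm only assuming that the real problem has this general shape (per-document state left half-updated when an exception escapes). The one value taken from the issue is the 16383 term-length limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of the failure pattern; NOT Lucene's DocumentsWriter. */
class ToyDocumentsWriter {
    // Limit taken from the exception message quoted in the issue.
    static final int MAX_TERM_LENGTH = 16383;

    private final boolean resetOnFailure;
    private final List<String> indexed = new ArrayList<>(); // terms of completed documents
    private List<String> pending = new ArrayList<>();       // terms of the document in progress
    private int docCount = 0;

    ToyDocumentsWriter(boolean resetOnFailure) {
        this.resetOnFailure = resetOnFailure;
    }

    void addDocument(String... terms) {
        boolean success = false;
        try {
            for (String term : terms) {
                addPosition(term);
            }
            indexed.addAll(pending);
            pending.clear();
            docCount++;
            success = true;
        } finally {
            if (!success && resetOnFailure) {
                // Drop the partial document and restore a usable buffer.
                pending = new ArrayList<>();
            }
        }
    }

    private void addPosition(String term) {
        List<String> buffer = pending;
        pending = null; // buffer is "checked out" while it is being updated
        if (term.length() > MAX_TERM_LENGTH) {
            // The exception escapes before the buffer is checked back in.
            throw new IllegalArgumentException("term length " + term.length()
                    + " exceeds max term length " + MAX_TERM_LENGTH);
        }
        buffer.add(term);
        pending = buffer; // checked back in only on success
    }

    int docCount() {
        return docCount;
    }

    int indexedTermCount() {
        return indexed.size();
    }

    public static void main(String[] args) {
        String huge = new String(new char[MAX_TERM_LENGTH + 1]).replace('\0', 'x');
        for (boolean reset : new boolean[] {false, true}) {
            ToyDocumentsWriter w = new ToyDocumentsWriter(reset);
            try {
                w.addDocument("aa", huge);
            } catch (IllegalArgumentException expected) {
                // the oversized term is rejected in both variants
            }
            try {
                w.addDocument("aa", "bb");
                System.out.println("reset=" + reset + ": next document added");
            } catch (NullPointerException e) {
                System.out.println("reset=" + reset + ": next document failed with NPE");
            }
        }
    }
}
```

With resetOnFailure set to false, the first rejected document leaves the pending buffer null and the next addDocument() call hits a NullPointerException, which mirrors the symptom reported here. With it set to true, the finally block discards the partial document so later documents index normally. Whether the actual fix should reset state this way is a separate question; the sketch only shows why a cleanup step on the failure path matters.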
> NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition
> -----------------------------------------------------------------------------------------
>
> Key: LUCENE-1072
> URL: https://issues.apache.org/jira/browse/LUCENE-1072
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java
> HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using
> lucene-core-2007-11-29_02-49-31
> Reporter: Alexei Dets
> Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1072.patch
>
>
> In my case, documents with unusually large "words" (in fact, text-encoded
> images) sometimes appear during indexing.
> An attempt to add a document that contains a field with such a token produces
> a java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that
> afterwards the IndexWriter becomes somewhat corrupted, and subsequent attempts
> to add documents to the index fail as well, this time with an NPE:
> java.lang.NullPointerException
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.