[ 
https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch reopened LUCENE-1072:
-----------------------------------


I'm seeing a similar issue when TokenStream.next() throws an
IOException (or a RuntimeException). After that, the DocumentsWriter
is no longer usable, i.e. subsequent calls to addDocument() fail
with a NullPointerException.

I added this test to TestIndexWriter, which shows the problem:
{code:java}
  public void testExceptionFromTokenStream() throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new Analyzer() {

      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new TokenFilter(new StandardTokenizer(reader)) {
          private int count = 0;

          public Token next() throws IOException {
            if (count++ == 5) {
              throw new IOException();
            }
            return input.next();
          }
        };
      }

    }, true);

    Document doc = new Document();
    String contents = "aa bb cc dd ee ff gg hh ii jj kk";
    doc.add(new Field("content", contents, Field.Store.NO,
        Field.Index.TOKENIZED));
    try {
      writer.addDocument(doc);
      fail("did not hit expected exception");
    } catch (Exception e) {
      // expected: the IOException thrown by the TokenFilter above
    }

    // Make sure we can add another normal document
    doc = new Document();
    doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
        Field.Index.TOKENIZED));
    writer.addDocument(doc);

    // Make sure we can add another normal document
    doc = new Document();
    doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
        Field.Index.TOKENIZED));
    writer.addDocument(doc);

    writer.close();
  }

{code}
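
As an illustration only, the failure mode this test exercises can be sketched without Lucene. ToyWriter, failingAfter and secondAdd are hypothetical names, and the "checked-out buffer" is an assumption made for the sketch; it is not how DocumentsWriter is actually implemented, nor the fix applied for this issue. The sketch only shows how an exception thrown in the middle of a document can leave shared state half-updated, and how cleaning up in a finally block keeps the writer usable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AbortSketch {

  // Hypothetical toy writer; it is NOT Lucene's DocumentsWriter.
  static class ToyWriter {
    private final boolean cleanUpOnAbort;
    private List<String> buffer = new ArrayList<String>();

    ToyWriter(boolean cleanUpOnAbort) {
      this.cleanUpOnAbort = cleanUpOnAbort;
    }

    void addDocument(Iterator<String> tokens) {
      List<String> local = buffer;
      buffer = null;                  // buffer is "checked out" while a document is processed
      int sizeBefore = local.size();  // NPE here if an earlier abort left buffer == null
      boolean success = false;
      try {
        while (tokens.hasNext()) {
          local.add(tokens.next());   // may throw, like TokenStream.next() in the test
        }
        success = true;
      } finally {
        if (success || cleanUpOnAbort) {
          if (!success) {
            // Discard the tokens of the partially processed document.
            local.subList(sizeBefore, local.size()).clear();
          }
          buffer = local;             // hand the buffer back so the next call can use it
        }
      }
    }

    int size() {
      return buffer.size();
    }
  }

  // Wraps an iterator so that the (n+1)-th call to next() throws.
  static Iterator<String> failingAfter(final int n, final Iterator<String> in) {
    return new Iterator<String>() {
      private int count = 0;

      public boolean hasNext() {
        return in.hasNext();
      }

      public String next() {
        if (count++ == n) {
          throw new IllegalStateException("simulated tokenizer failure");
        }
        return in.next();
      }

      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  // Adds a failing document, then a normal one, and reports what the second add did.
  static String secondAdd(boolean cleanUpOnAbort) {
    ToyWriter writer = new ToyWriter(cleanUpOnAbort);
    List<String> first = Arrays.asList("aa bb cc dd ee ff gg".split(" "));
    try {
      writer.addDocument(failingAfter(5, first.iterator()));
    } catch (IllegalStateException expected) {
      // the simulated tokenizer failure
    }
    try {
      writer.addDocument(Arrays.asList("aa", "bb").iterator());
      return "ok, buffered tokens: " + writer.size();
    } catch (NullPointerException e) {
      return "NullPointerException";
    }
  }

  public static void main(String[] args) {
    System.out.println("without cleanup: " + secondAdd(false));
    System.out.println("with cleanup:    " + secondAdd(true));
  }
}
```

Without the cleanup the second add fails with a NullPointerException, which mirrors the symptom above; with the cleanup the partial document is dropped and the second add succeeds.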

> NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1072
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1072
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>         Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java 
> HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using 
> lucene-core-2007-11-29_02-49-31
>            Reporter: Alexei Dets
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1072.patch
>
>
> In my case, documents with unusually large "words" (in fact, text-encoded
> images) sometimes appear during indexing.
> An attempt to add a document that contains a field with such a token produces
> a java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that
> afterwards the IndexWriter is left in a somewhat corrupted state, and subsequent
> attempts to add documents to the index fail as well, this time with an NPE:
> java.lang.NullPointerException
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.


