Re: Bug In IndexWriter.addDocument?

Ajay Lakhani Mon, 07 Jul 2008 23:07:07 -0700

Dear Digy,
As of Lucene 2.3, there are new setValue(...) methods that allow you to
change the value of a Field. However, there seems to be an issue in
org.apache.lucene.index.FieldsWriter.writeField(...), which writes the
field's string value to the stored fields; that string value is null when
the Field holds a TokenStream, hence the NullPointerException in
IndexOutput.writeString.



org.apache.lucene.index.FieldsWriter.writeField(...) needs to be changed to
check whether the Field data is a String, a Reader, or a TokenStream, and
then retrieve the corresponding value. I shall submit a patch for this
soon.
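
To make the proposed check concrete, here is a minimal, self-contained sketch of the type dispatch. It is not the actual patch and does not use Lucene classes: TokenSource, StoredValueSketch and toStoredString are hypothetical names, and turning a Reader or token stream into a string by draining it is only one possible reading of "retrieve the respective values".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a token stream; this is NOT Lucene's TokenStream.
interface TokenSource {
    String nextToken(); // returns null once the stream is exhausted
}

final class StoredValueSketch {
    // Returns a non-null string form of the field data, dispatching on its runtime type.
    static String toStoredString(Object fieldData) {
        if (fieldData instanceof String) {
            return (String) fieldData;
        }
        if (fieldData instanceof Reader) {
            return readAll((Reader) fieldData);
        }
        if (fieldData instanceof TokenSource) {
            return joinTokens((TokenSource) fieldData);
        }
        throw new IllegalArgumentException("unsupported field data: " + fieldData);
    }

    // Drains the reader into a string; I/O failures surface as unchecked exceptions.
    private static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Joins all remaining tokens with single spaces.
    private static String joinTokens(TokenSource tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t = tokens.nextToken(); t != null; t = tokens.nextToken()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }
}
```

One caveat with this reading: draining a Reader or a token stream to build the stored string consumes it, so a real fix would have to make sure the indexing side still sees the tokens, or simply skip storing when no string value is available.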

Is there a particular reason you are using a TokenStream? I suggest you set
the text value directly on the Field: Field1.setValue("xxx");

Moreover, it is best to create a single Document instance, add the Field
instances to it once, and hold on to those Field instances so you can reuse
them by changing their values for each document. After a document has been
added, change the Field values directly (idField.setValue(...), etc.) and
then re-add the same Document instance. Note that you cannot add the same
Field instance to a Document more than once, and you should not change a
Field's value until the Document containing that Field has been added to
the index.
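
The reuse pattern can be sketched with a small toy model. MiniField, MiniDocument, MiniIndex and ReuseSketch below are hypothetical stand-ins, not Lucene classes; they only mirror the roles of Field.setValue(String) and IndexWriter.addDocument(Document) so that the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a field whose value can be replaced between documents.
final class MiniField {
    final String name;
    private String value;

    MiniField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Plays the role of Field.setValue(String).
    void setValue(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }
}

// Hypothetical stand-in for a document: just an ordered list of fields.
final class MiniDocument {
    final List<MiniField> fields = new ArrayList<>();

    void add(MiniField field) {
        fields.add(field);
    }
}

// Hypothetical stand-in for the writer: addDocument copies the current field values.
final class MiniIndex {
    final List<String> stored = new ArrayList<>();

    void addDocument(MiniDocument doc) {
        StringBuilder sb = new StringBuilder();
        for (MiniField f : doc.fields) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.name).append('=').append(f.value());
        }
        stored.add(sb.toString());
    }
}

final class ReuseSketch {
    // Builds one document and two fields once, then reuses them for every record.
    static List<String> indexAll(String[] ids, String[] bodies) {
        MiniDocument doc = new MiniDocument();
        MiniField idField = new MiniField("id", "");
        MiniField bodyField = new MiniField("body", "");
        doc.add(idField);   // each field instance is added exactly once
        doc.add(bodyField);

        MiniIndex index = new MiniIndex();
        for (int i = 0; i < ids.length; i++) {
            // Change values only after the previous addDocument call has returned.
            idField.setValue(ids[i]);
            bodyField.setValue(bodies[i]);
            index.addDocument(doc);
        }
        return index.stored;
    }
}
```

For example, ReuseSketch.indexAll(new String[] {"1", "2"}, new String[] {"a", "b"}) yields the two entries "id=1 body=a" and "id=2 body=b". In the real API the loop body would call setValue(...) on the held Field instances and then pass the same Document to the writer's addDocument.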

2008/7/8 Digy <[EMAIL PROTECTED]>:

>  Hi all,
>
>
>
> I am a Lucene.Net user. Since I need fast indexing in my current project,
> I am trying to use Lucene 2.3.2, which I converted to .NET with IKVM (since
> Lucene.Net is currently at v2.1), and I reuse the same Document and Field
> instances to gain some speed.
>
>
>
> I use TokenStreams to set the value of fields.
>
>
>
> My problem is that I get a NullPointerException in "addDocument".
>
>
>
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.lucene.store.IndexOutput.writeString(IndexOutput.java:99)
>         at org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:127)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1418)
>         at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1121)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2442)
>         at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2424)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1442)
>         at MainClass.Test(MainClass.java:39)
>         at MainClass.main(MainClass.java:10)
>
>
>
> To show the same bug in Java, I prepared a sample application (oh, that was
> hard, since this is my second app in Java; the first one was a "Hello World"
> app).
>
>
>
> Is something wrong with my application or is it a bug in Lucene?
>
>
>
> Thanks,
>
> DIGY
>
> SampleCode:
>
>     public class MainClass
>     {
>         DummyTokenStream DummyTokenStream1 = new DummyTokenStream();
>         DummyTokenStream DummyTokenStream2 = new DummyTokenStream();
>
>         //use the same document&field instances for Indexing
>         org.apache.lucene.document.Document Doc = new org.apache.lucene.document.Document();
>
>         org.apache.lucene.document.Field Field1 = new org.apache.lucene.document.Field("Field1", "",
>                 org.apache.lucene.document.Field.Store.YES,
>                 org.apache.lucene.document.Field.Index.TOKENIZED);
>         org.apache.lucene.document.Field Field2 = new org.apache.lucene.document.Field("Field2", "",
>                 org.apache.lucene.document.Field.Store.YES,
>                 org.apache.lucene.document.Field.Index.TOKENIZED);
>
>         public MainClass()
>         {
>             Doc.add(Field1);
>             Doc.add(Field2);
>         }
>
>         public void Index() throws
>                 org.apache.lucene.index.CorruptIndexException,
>                 org.apache.lucene.store.LockObtainFailedException,
>                 java.io.IOException
>         {
>             System.out.println("Index Started");
>             org.apache.lucene.index.IndexWriter wr = new org.apache.lucene.index.IndexWriter("testindex",
>                     new org.apache.lucene.analysis.WhitespaceAnalyzer(), true);
>
>             for (int i = 0; i < 100; i++)
>             {
>                 PrepDoc();
>                 wr.addDocument(Doc);
>             }
>             wr.close();
>             System.out.println("Index Completed");
>         }
>
>         void PrepDoc()
>         {
>             DummyTokenStream1.SetText("test1"); //Set a new Text to Token Stream
>             Field1.setValue(DummyTokenStream1); //Set TokenStream to Field Value
>
>             DummyTokenStream2.SetText("test2"); //Set a new Text to Token Stream
>             Field2.setValue(DummyTokenStream2); //Set TokenStream to Field Value
>         }
>
>         public static void main(String[] args) throws
>                 org.apache.lucene.index.CorruptIndexException,
>                 org.apache.lucene.store.LockObtainFailedException,
>                 java.io.IOException
>         {
>             MainClass m = new MainClass();
>             m.Index();
>         }
>
>         public class DummyTokenStream extends org.apache.lucene.analysis.TokenStream
>         {
>             String Text = "";
>             boolean EndOfStream = false;
>             org.apache.lucene.analysis.Token Token = new org.apache.lucene.analysis.Token();
>
>             //return "Text" as the first token and null as the second
>             public org.apache.lucene.analysis.Token next()
>             {
>                 if (EndOfStream == false)
>                 {
>                     EndOfStream = true;
>
>                     Token.setTermText(Text);
>                     Token.setStartOffset(0);
>                     Token.setEndOffset(Text.length() - 1);
>                     Token.setTermLength(Text.length());
>                     return Token;
>                 }
>                 return null;
>             }
>
>             public void SetText(String Text)
>             {
>                 EndOfStream = false;
>                 this.Text = Text;
>             }
>         }
>     }
