Christoph,
Thanks for the patch and the test.
I refactored your test a bit, converted it to a JUnit-based unit test
and will commit it shortly, following it with your patch.
Thank you,
Otis
--- Christoph Goller <[EMAIL PROTECTED]> wrote:
> Hi Lucene Developers,
>
> first let me thank you all for this excellent piece of software
> that you created. I am using Lucene in several projects and I
> am currently also building more advanced text mining applications
> on top of it. Because of that I have spent a lot of time studying
> the Lucene sources, and I will come up with a couple of proposals
> for bug fixes in the coming days. Here is the first one:
>
> I think I can fix a bug in SegmentsTermEnum.
> One can create a TermEnum from an IndexReader in two ways:
>
> indexReader.terms()
> indexReader.terms(t)
>
> If one gets a TermEnum starting at a specified term t, one does not
> have to call enum.next() before using it. The enum is valid from the
> beginning. Calling enum.next() switches to the next term. However,
> this behaviour only holds if the index consists of a single segment.
> If the index consists of several segments, term t is delivered twice:
> the first time by enum.term() right after indexReader.terms(t), the
> second time after calling enum.next(). Furthermore, the initial
> document frequency might be wrong (if t occurs in more than one
> segment). The problem can be fixed by calling next() in the
> constructor of SegmentsTermEnum. I attach a test that demonstrates
> the problem and a patch that fixes it.
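
For readers following the thread, the effect can be reproduced without
Lucene. The following is a minimal sketch, not the SegmentsReader source:
MergedTermsSketch, SegmentCursor and walk are hypothetical names, the two
segments and their per-segment frequencies (100 each) are assumed for
illustration, and the mergeInConstructor flag simply switches between
peeking at the top of the queue (the behaviour the patch removes) and
calling next() in the constructor (the behaviour the patch introduces).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified stand-in for a multi-segment term enumeration (not Lucene code). */
public class MergedTermsSketch {

    /** Cursor over one segment's sorted terms, each with a per-segment docFreq. */
    static final class SegmentCursor implements Comparable<SegmentCursor> {
        final String[] terms;
        final int[] docFreqs;
        int pos;

        SegmentCursor(String[] terms, int[] docFreqs, String startTerm) {
            this.terms = terms;
            this.docFreqs = docFreqs;
            // Position on the first term >= startTerm (null means "from the start").
            while (startTerm != null && pos < terms.length
                    && terms[pos].compareTo(startTerm) < 0) {
                pos++;
            }
        }

        boolean exhausted() { return pos >= terms.length; }
        String term() { return terms[pos]; }
        int docFreq() { return docFreqs[pos]; }

        @Override
        public int compareTo(SegmentCursor other) {
            return term().compareTo(other.term());
        }
    }

    private final PriorityQueue<SegmentCursor> queue = new PriorityQueue<>();
    String term;
    int docFreq;

    MergedTermsSketch(List<SegmentCursor> cursors, boolean mergeInConstructor) {
        for (SegmentCursor c : cursors) {
            if (!c.exhausted()) queue.add(c);
        }
        if (queue.isEmpty()) return;
        if (mergeInConstructor) {
            next(); // patched behaviour: merge all segments positioned on the term
        } else {
            SegmentCursor top = queue.peek(); // old behaviour: look at one segment only
            term = top.term();
            docFreq = top.docFreq();
        }
    }

    /** Advances to the smallest pending term and sums its docFreq over all segments. */
    boolean next() {
        if (queue.isEmpty()) {
            term = null;
            docFreq = 0;
            return false;
        }
        term = queue.peek().term();
        docFreq = 0;
        while (!queue.isEmpty() && queue.peek().term().equals(term)) {
            SegmentCursor c = queue.poll();
            docFreq += c.docFreq();
            c.pos++;
            if (!c.exhausted()) queue.add(c);
        }
        return true;
    }

    /** Walks the enumeration the way a caller of terms(t) would: use first, then next(). */
    public static List<String> walk(boolean mergeInConstructor) {
        List<SegmentCursor> cursors = new ArrayList<>();
        // Assumed layout: segment 1 holds only "aaa", segment 2 holds "aaa" and "bbb".
        cursors.add(new SegmentCursor(new String[] {"aaa"}, new int[] {100}, "aaa"));
        cursors.add(new SegmentCursor(new String[] {"aaa", "bbb"}, new int[] {100, 100}, "aaa"));
        MergedTermsSketch e = new MergedTermsSketch(cursors, mergeInConstructor);
        List<String> seen = new ArrayList<>();
        do {
            seen.add(e.term + " " + e.docFreq);
        } while (e.next());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("peek only:             " + walk(false));
        System.out.println("next() in constructor: " + walk(true));
    }
}
```

Under these assumptions the peek-only variant yields aaa 100, aaa 200,
bbb 100, i.e. the start term twice and with a partial frequency the first
time, while the variant that calls next() in the constructor yields
aaa 200, bbb 100.
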
>
> kind regards,
> Christoph
>
> --
> *****************************************************************
> * Dr. Christoph Goller Tel.: +49 89 203 45734 *
> * Detego Software GmbH Mobile: +49 179 1128469 *
> * Keuslinstr. 13 Fax.: +49 721 151516176 *
> * 80798 München, Germany Email: [EMAIL PROTECTED] *
> *****************************************************************
> import java.io.IOException;
>
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.index.TermEnum;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
>
> /*
> * Created on 23.04.2003
> *
> * To change the template for this generated file go to
> * Window>Preferences>Java>Code Generation>Code and Comments
> */
>
> /**
> * @author goller
> *
> * To change the template for this generated type comment go to
> * Window>Preferences>Java>Code Generation>Code and Comments
> */
> public class SegmentsTermEnumTest {
>
>     int docCount = 0;
>
>     void addDoc1(IndexWriter writer)
>     {
>         Document doc = new Document();
>
>         doc.add(Field.Keyword("id", "id" + docCount));
>         doc.add(Field.UnStored("content", "aaa"));
>
>         try {
>             writer.addDocument(doc);
>         }
>         catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>         docCount++;
>     }
>
>     void addDoc2(IndexWriter writer)
>     {
>         Document doc = new Document();
>
>         doc.add(Field.Keyword("id", "id" + docCount));
>         doc.add(Field.UnStored("content", "aaa bbb"));
>
>         try {
>             writer.addDocument(doc);
>         }
>         catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>         docCount++;
>     }
>
>
>
>     public static void main(String[] args)
>     {
>         //System.out.println(System.getProperty("java.version"));
>
>         Directory dir = new RAMDirectory();
>         SegmentsTermEnumTest test = new SegmentsTermEnumTest();
>
>         IndexWriter writer = null;
>         IndexReader reader = null;
>         TermEnum enum = null;
>         int i;
>
>         // Index 100 documents containing "aaa" and 100 containing "aaa bbb".
>         try {
>             writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>
>             for (i = 0; i < 100; i++)
>                 test.addDoc1(writer);
>
>             for (i = 0; i < 100; i++)
>                 test.addDoc2(writer);
>
>             writer.close();
>         }
>         catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>
>         try {
>             // Before optimize(): the index may consist of several segments.
>             reader = IndexReader.open(dir);
>
>             System.out.println("terms():");
>             enum = reader.terms();
>             for (i = 0; i < 5 && enum.next(); i++)
>                 System.out.println(enum.term() + " " + enum.docFreq());
>
>             enum.close();
>
>             // terms(t) is usable immediately, so print once before next().
>             System.out.println();
>             System.out.println("terms(\"aaa\")");
>             enum = reader.terms(new Term("content", "aaa"));
>             System.out.println(enum.term() + " " + enum.docFreq());
>             for (i = 0; i < 5 && enum.next(); i++)
>                 System.out.println(enum.term() + " " + enum.docFreq());
>
>             enum.close();
>             reader.close();
>
>             // Merge everything into a single segment and repeat both runs.
>             writer = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>             writer.optimize();
>             writer.close();
>
>             System.out.println();
>             System.out.println("optimize");
>
>             reader = IndexReader.open(dir);
>
>             System.out.println();
>             System.out.println("terms():");
>             enum = reader.terms();
>             for (i = 0; i < 5 && enum.next(); i++)
>                 System.out.println(enum.term() + " " + enum.docFreq());
>
>             enum.close();
>
>             System.out.println();
>             System.out.println("terms(\"aaa\")");
>             enum = reader.terms(new Term("content", "aaa"));
>             System.out.println(enum.term() + " " + enum.docFreq());
>             for (i = 0; i < 5 && enum.next(); i++)
>                 System.out.println(enum.term() + " " + enum.docFreq());
>
>             enum.close();
>             reader.close();
>         }
>         catch (IOException e2) {
>             // TODO Auto-generated catch block
>             e2.printStackTrace();
>         }
>     }
> }
> Index: SegmentsReader.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/SegmentsReader.java,v
> retrieving revision 1.11
> diff -u -r1.11 SegmentsReader.java
> --- SegmentsReader.java 1 May 2003 01:09:15 -0000 1.11
> +++ SegmentsReader.java 3 Sep 2003 13:03:27 -0000
> @@ -238,9 +238,7 @@
> }
>
> if (t != null && queue.size() > 0) {
> - SegmentMergeInfo top = (SegmentMergeInfo)queue.top();
> - term = top.termEnum.term();
> - docFreq = top.termEnum.docFreq();
> + next();
> }
> }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]