Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/index MultiReader.java FilterIndexReader.java IndexReader.java SegmentReader.java

Christoph Goller Tue, 20 Apr 2004 01:26:36 -0700

Incze Lajos wrote:

I'm putting my findings here, as they seem related to me. In a mid-size
corpus I've found the following mystery:

1) +SZIDO:"jan 1"                                    -- 92 hits
2) +SZIDO:"jan 1" +TYPE:ER-CIKK                      -- 433 hits
3) +SZIDO:"jan 1" +TYPE:ER-CIKK NONSENSE:nonsense    -- 92 hits

2) is obviously nonsense: adding a required clause cannot increase the
hit count. The NONSENSE field in the 3rd query does not exist. Although
I do not understand what's happening, and couldn't produce a revealing
test case, I've found that if I switch off the ConjunctionScorer
optimization (the same way as the 3rd query switches it off) by inserting

///////////////////////////////////////////////////////////////////
      allRequired = false;
///////////////////////////////////////////////////////////////////
      if (allRequired && noneBoolean) {           // ConjunctionScorer is okay

this bug disappears. Also, I found that (at least for me) only the
PhraseQuery produces this result. If I replace the 2nd query with

2A) +SZIDO:(+jan +1) +TYPE:ER-CIKK

I get the (correct) 92 hits. I'm almost sure that there is something
wrong with the document order and skipTo that is specific to the
PhraseQuery.

incze

Hi Incze,

This looks like the bug in PhraseScorer that I fixed last week (discovered by Daniel). Could you verify whether the strange behavior still shows up with the current CVS version of Lucene? You can use your old index; reindexing is not necessary.
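
For reference, here is a minimal sketch of the invariant that query 2) violates. It is not Lucene code: DocIterator and conjunction are hypothetical stand-ins, assuming only that each required clause delivers its matching document numbers in ascending order and that skipTo never moves backward. Under those assumptions a conjunction can only return documents that both clauses match, so it can never produce more hits than either clause alone.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctionSketch {

    /** Hypothetical stand-in for a scorer: walks a sorted list of doc ids. */
    static final class DocIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs; // assumed sorted ascending, no duplicates
        private int pos = 0;

        DocIterator(int... docs) {
            this.docs = docs;
        }

        /** Moves forward to the first doc id >= target; never moves backward. */
        int skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Intersects two required clauses by leapfrogging with skipTo. */
    static List<Integer> conjunction(DocIterator a, DocIterator b) {
        List<Integer> hits = new ArrayList<>();
        int da = a.skipTo(0);
        int db = b.skipTo(0);
        while (da != DocIterator.NO_MORE_DOCS && db != DocIterator.NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);              // both clauses match this document
                da = a.skipTo(da + 1);
                db = b.skipTo(db + 1);
            } else if (da < db) {
                da = a.skipTo(db);         // a is behind, catch up to b
            } else {
                db = b.skipTo(da);         // b is behind, catch up to a
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DocIterator phrase = new DocIterator(2, 5, 9, 14);            // made-up phrase matches
        DocIterator type = new DocIterator(1, 2, 3, 8, 9, 13, 21);    // made-up type matches
        System.out.println(conjunction(phrase, type));                // prints [2, 9]
    }
}
```

If one of the iterators returned documents out of order, or skipTo landed on the wrong position, the leapfrog loop above would report wrong hits, which would be consistent with the symptom described.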

Thanks,
Christoph

