I'm posting my findings here, as they seem related to me. In a mid-size corpus I've found the following mystery:
1) +SZIDO:"jan 1" -- 92 hits
2) +SZIDO:"jan 1" +TYPE:ER-CIKK -- 433 hits
3) +SZIDO:"jan 1" +TYPE:ER-CIKK NONSENSE:nonsense -- 92 hits
2) is obviously nonsense: adding a required clause can only narrow the result set, so it cannot return more hits than 1). The NONSENSE field in the 3rd query does not exist. Although I do not understand what is happening and couldn't produce a revealing test case, I've found that if I switch off the ConjunctionScorer optimization (the same way the 3rd query switches it off) by inserting
    ///////////////////////////////////////////////////////////////////
    allRequired = false;
    ///////////////////////////////////////////////////////////////////
    if (allRequired && noneBoolean) { // ConjunctionScorer is okay
the bug disappears. I also found that (at least for me) only PhraseQuery produces this result. If I replace the 2nd query with
2A) +SZIDO:(+jan +1) +TYPE:ER-CIKK
I get the (correct) 92 hits. I'm almost sure that something is wrong with the document order and skipTo, and that it is specific to PhraseQuery.
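
To illustrate why the hit count in 2) cannot be right, here is a minimal sketch of a conjunction over two sorted lists of document IDs, driven by a skipTo-style call. It is not Lucene's code: ConjunctionSketch, DocIterator and intersect are hypothetical names made up for this example, and the doc IDs are invented. It assumes each iterator returns non-negative doc IDs in strictly increasing order.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {

    /** Hypothetical stand-in for a scorer: walks a sorted array of doc IDs. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = 0;

        DocIterator(int[] docs) {
            this.docs = docs;
        }

        /** Moves to the first doc ID >= target; returns false when exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length;
        }

        /** Current doc ID; only valid after skipTo returned true. */
        int doc() {
            return docs[pos];
        }
    }

    /** Returns the doc IDs present in both (sorted, non-negative) inputs. */
    static int[] intersect(int[] a, int[] b) {
        DocIterator x = new DocIterator(a);
        DocIterator y = new DocIterator(b);
        List<Integer> hits = new ArrayList<>();
        int target = 0;
        while (x.skipTo(target) && y.skipTo(target)) {
            if (x.doc() == y.doc()) {
                hits.add(x.doc());          // both required clauses match
                target = x.doc() + 1;       // continue after this hit
            } else {
                // Leapfrog: the smaller side must catch up with the larger.
                target = Math.max(x.doc(), y.doc());
            }
        }
        int[] result = new int[hits.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] phraseDocs = {1, 4, 7, 9, 12};     // invented doc IDs
        int[] typeDocs = {0, 4, 5, 9, 12, 20};   // invented doc IDs
        int[] both = intersect(phraseDocs, typeDocs);
        System.out.println(both.length + " hits, never more than "
                + Math.min(phraseDocs.length, typeDocs.length));
    }
}
```

With well-ordered inputs the conjunction can never return more documents than either clause alone. That guarantee rests entirely on the ordering assumption, which is why a sub-scorer that misbehaves in skipTo is a plausible suspect; the sketch does not show that this is what actually happens in PhraseQuery.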
incze
Hi Incze,
This looks like the bug in PhraseScorer that I fixed last week (discovered by Daniel). Could you verify whether the strange behavior still shows up with the
current CVS version of Lucene? You may use your old index; reindexing is not
necessary.
Thanks, Christoph