I wonder if the analysis chain could be involved. If stop words such as
"is" are somehow removed without leaving a hole (a position gap) in the
token stream, that could explain the slop=2 match?
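
To make the arithmetic concrete, here is a rough sketch of how I understand
the positional distance described in the getSlop javadocs. It is not Lucene
code: the class and method names (SlopSketch, requiredSlop) are made up for
illustration, and it assumes every query term occurs exactly once in the
document. For each query term it takes (position in document - position in
query) and reports the spread between the largest and smallest value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's implementation.
class SlopSketch {

    /**
     * Smallest slop at which the query would match under a simplified model:
     * shift each term's document position by its query position, then take
     * max - min of the shifted values.
     * Assumes each query term occurs exactly once in docTerms.
     */
    static int requiredSlop(List<String> queryTerms, List<String> docTerms, int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int q = 0; q < queryTerms.size(); q++) {
            int i = docTerms.indexOf(queryTerms.get(q));
            if (i < 0) {
                throw new IllegalArgumentException("term not in document: " + queryTerms.get(q));
            }
            int shifted = docPositions[i] - q;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("quick", "fox");

        // All four tokens indexed: the=0, fox=1, is=2, quick=3
        int full = requiredSlop(query,
                Arrays.asList("the", "fox", "is", "quick"), new int[] {0, 1, 2, 3});

        // Stop words removed but positions preserved (holes): fox=1, quick=3
        int holes = requiredSlop(query,
                Arrays.asList("fox", "quick"), new int[] {1, 3});

        // Stop words removed with no holes: fox=0, quick=1
        int noHoles = requiredSlop(query,
                Arrays.asList("fox", "quick"), new int[] {0, 1});

        System.out.println(full + " " + holes + " " + noHoles); // prints "3 3 2"
    }
}
```

If this model is right, the javadocs' value of 3 holds whenever positions are
preserved, and a hit at slop=2 would be what you get when "is" is dropped
without a position gap, so that "fox" and "quick" end up adjacent.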

On Mon, Dec 13, 2021 at 9:35 AM Michael McCandless
<luc...@mikemccandless.com> wrote:
>
> Hello Claude,
>
> Hmm, that is interesting that you see slop=2 matching query "quick fox"
> against document "the fox is quick".
>
> Edit distance (Levenshtein) is a bit tricky because it might include a
> transposition (just swapping the two words) as edit distance 1 OR 2.
>
> So maybe Lucene's PhraseQuery is counting transposition as edit distance 1,
> in which case, your test makes sense, and the javadocs are wrong?
>
> I am far from an expert on PhraseQuery :)  Does anyone know if we changed
> the behavior?  In any case, we must at least fix the javadocs.  Claude,
> maybe open a Jira issue (
> https://issues.apache.org/jira/projects/LUCENE/summary) and we can
> discuss there?
>
> Thank you for catching this!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Dec 10, 2021 at 8:47 AM Claude Lepere <claudelep...@gmail.com>
> wrote:
>
> > Hello.
> >
> >
> > The explanation of
> >
> > https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop
> > <
> > https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--
> > >
> > states
> > that "the fox is quick" would be at an edit distance of 3 from the query
> > "quick fox";
> > this seems inaccurate to me.
> >
> > I don't know if the edit distance used by Lucene is the Levenshtein
> > distance (insertion, deletion, substitution, all of weight 1) - a standard
> > in information retrieval - but a test of "quick fox" PhraseQuery with a
> > slop of 2 hits the text "the fox is quick" (1 deletion + 1 insertion); the
> > slop does not have to be 3.
> >
> > I wonder if I'm right.
> >
> >
> > Claude Lepère, Belgium
> >
> > claudelep...@gmail.com
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
