I wonder if the analysis chain could be involved. If stop words such as "is" are somehow removed without leaving a hole in the token positions, that could explain the slop=2 match?
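
Here is a rough sketch of that hypothesis in plain Java, with no Lucene dependency. It assumes a simplified model of sloppy phrase matching: shift each matched document position by the term's offset in the query, then take the spread (max minus min) of the shifted positions as the slop required. The class SlopSketch, its methods, and the two-word stop list are made up for illustration. This is not Lucene's actual code, and it ignores repeated terms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Lucene code: a simplified model of sloppy phrase distance.
public class SlopSketch {

    // Assign a position to every token that survives stop-word removal.
    // keepHoles = true:  a removed stop word still advances the position counter.
    // keepHoles = false: surviving tokens are packed together with no gaps.
    static Map<String, Integer> positions(List<String> tokens, Set<String> stopWords, boolean keepHoles) {
        Map<String, Integer> result = new HashMap<>();
        int pos = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                if (keepHoles) {
                    pos++;
                }
                continue;
            }
            result.put(token, pos); // assumes each term occurs only once in the document
            pos++;
        }
        return result;
    }

    // Smallest slop that matches under the assumed model, or -1 if a query term is absent.
    static int requiredSlop(List<String> queryTerms, Map<String, Integer> docPositions) {
        if (queryTerms.isEmpty()) {
            return -1;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < queryTerms.size(); offset++) {
            Integer docPos = docPositions.get(queryTerms.get(offset));
            if (docPos == null) {
                return -1;
            }
            int shifted = docPos - offset; // document position minus offset within the query
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("the", "fox", "is", "quick");
        List<String> query = List.of("quick", "fox");
        Set<String> stop = Set.of("the", "is"); // assumed stop list, for illustration only

        System.out.println("holes kept:    " + requiredSlop(query, positions(doc, stop, true)));  // 3
        System.out.println("holes dropped: " + requiredSlop(query, positions(doc, stop, false))); // 2
    }
}
```

Under this model, the javadoc's figure of 3 holds as long as positions are preserved, and slop=2 would only match if the removed "is" left no hole.
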

On Mon, Dec 13, 2021 at 9:35 AM Michael McCandless <luc...@mikemccandless.com> wrote:

> Hello Claude,
>
> Hmm, that is interesting that you see slop=2 matching query "quick fox"
> against document "the fox is quick".
>
> Edit distance (Levenshtein) is a bit tricky because it might include a
> transposition (just swapping the two words) as edit distance 1 OR 2.
>
> So maybe Lucene's PhraseQuery is counting transposition as edit distance 1,
> in which case, your test makes sense, and the javadocs are wrong?
>
> I am far from an expert on PhraseQuery :) Does anyone know if we change
> the behavior? In any case, we must at least fix the javadocs. Claude,
> maybe open a Jira issue
> (https://issues.apache.org/jira/projects/LUCENE/summary) and we can
> discuss there?
>
> Thank you for catching this!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Dec 10, 2021 at 8:47 AM Claude Lepere <claudelep...@gmail.com>
> wrote:
>
> > Hello.
> >
> > The explanation of
> > https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--
> > writes
> > that the edit distance between "quick fox" and "the fox is quick" would be
> > at an edit distance of 3;
> > this seems inaccurate to me.
> >
> > I don't know if the edit distance used by Lucene is the Levenshtein
> > distance (insertion, deletion, substitution, all of weight 1) - a standard
> > in information retrieval - but a test of "quick fox" PhraseQuery with a
> > slop of 2 hits the text "the fox is quick" (1 deletion + 1 insertion); the
> > slop does not have to be 3.
> >
> > I wonder if I'm right.
> >
> >
> > Claude Lepère, Belgium
> >
> > claudelep...@gmail.com