[ https://issues.apache.org/jira/browse/LUCENE-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460252#comment-17460252 ]

Claude Lepère commented on LUCENE-10317:
----------------------------------------

Thank you for your quick answer.
In "the fox is quick", "fox" is at position 1, "is" at position 2 and "quick"
at position 3. To match "quick fox", a first move places "fox" at position 2
(atop "is"), a second move places it at position 3 (atop "quick") and a third
move places it after "quick": 3 moves in total. If "is" is removed, one move
fewer is needed.
Is this the correct way to calculate the minimum slop needed?
A second test with more terms: query = "wordD wordB wordA", document = "wordA
wordB wordC wordD"; here the minimum slop is 5.
How does Lucene arrive at this result?
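Both numbers can be reproduced with one simple rule, sketched below in Java. This is a minimal sketch under an assumption, not Lucene code: it assumes that sloppy phrase matching subtracts each query term's offset in the phrase from that term's position in the document, and accepts the match when the spread (maximum minus minimum) of these shifted positions does not exceed the slop. `MinSlop.minSlop` is a hypothetical helper; it also assumes 0-based positions and that each query term occurs exactly once in the document.

```java
import java.util.List;
import java.util.Map;

public class MinSlop {
    /**
     * Hypothetical helper (not a Lucene API): minimum slop for a phrase to
     * match, assuming each query term occurs exactly once in the document.
     * Each term's offset in the query is subtracted from its document
     * position; the slop needed is the spread (max - min) of those values.
     */
    static int minSlop(List<String> queryTerms, Map<String, Integer> docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < queryTerms.size(); offset++) {
            Integer pos = docPositions.get(queryTerms.get(offset));
            if (pos == null) {
                throw new IllegalArgumentException("term not in document: " + queryTerms.get(offset));
            }
            int shifted = pos - offset; // document position minus query offset
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // "the fox is quick": the=0, fox=1, is=2, quick=3
        System.out.println(minSlop(List.of("quick", "fox"), Map.of("fox", 1, "quick", 3))); // prints 3
        // "is" removed and assumed to leave no position gap: the=0, fox=1, quick=2
        System.out.println(minSlop(List.of("quick", "fox"), Map.of("fox", 1, "quick", 2))); // prints 2
        // "wordA wordB wordC wordD": positions 0..3
        System.out.println(minSlop(List.of("wordD", "wordB", "wordA"),
                Map.of("wordA", 0, "wordB", 1, "wordC", 2, "wordD", 3))); // prints 5
    }
}
```

For "quick fox" the shifted positions are 3 ("quick") and 0 ("fox"), a spread of 3, which agrees with the three moves counted by hand; for "wordD wordB wordA" they are 3, 0 and -2, a spread of 5. Whether a removed stop word leaves a position gap depends on how the analyzer handles position increments, which this sketch does not model.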




> In PhraseQuery API, the explanation of getSlop is not inexact but could be 
> more clear
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10317
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10317
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 5.2.1
>            Reporter: Claude Lepère
>            Priority: Trivial
>
> The explanation says that searching for "quick fox" will match the document
> "the fox is quick" with a slop of 3.
> That is true if the stop word "is" is not removed by the analyzer at indexing
> time; but with Lucene's standard stop word list, which includes "is", a slop
> of 2 is enough.
> As I understand the comment in the PhraseQuery source, switching the order of
> two words requires two moves (the first places the words atop one another),
> so the slop is 2; but if "is" is not removed, a third "move" is needed to
> account for "is" itself, so the slop is 3. I am not sure of this explanation.
> I would be happy to have it confirmed ... or not.
> I tested both cases in Lucene 5.2.1, but the text is the same in the
> PhraseQuery API 8_0_0.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)