[ 
https://issues.apache.org/jira/browse/LUCENE-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888747#comment-15888747
 ] 

Jigar Shah commented on LUCENE-7619:
------------------------------------

Hello [~mikemccand]

+1 

Many thanks for fixing this!

I am using WordDelimiterFilter (which often breaks phrase queries on words with 
punctuation). I am currently using Lucene 6.4.1 in production. Can you please 
suggest which classes I should patch on Lucene 6.4.1 to use this feature? 
Would it be enough to patch just WordDelimiterGraphFilter and use it in the 
token stream instead of WordDelimiterFilter, or are there other dependent 
classes I have to patch (if so, please list them)? 

Once Lucene 6.5 is released I will upgrade to it to get the better-tested fix, 
but for now I would like to patch Lucene 6.4.1 if the patch is compatible and 
simple.

> Add WordDelimiterGraphFilter
> ----------------------------
>
>                 Key: LUCENE-7619
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7619
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.5
>
>         Attachments: after.png, before.png, LUCENE-7619.patch, 
> LUCENE-7619.patch, LUCENE-7619.patch
>
>
> Currently, {{WordDelimiterFilter}} doesn't try to set the {{posLen}} 
> attribute and so it creates graphs like this:
> !before.png!
> but with this patch (still a work in progress) it creates this graph instead:
> !after.png!
> This means (today) positional queries when using WDF at search time are 
> buggy, but since we fixed LUCENE-7603, with this change here you should be 
> able to use positional queries with WDGF.
> I'm also trying to produce holes properly (this removes logic from the current 
> WDF that swallows a hole when the whole token is just delimiters).
> Surprisingly, it's actually quite easy to tweak WDF to create a graph (unlike 
> e.g. {{SynonymGraphFilter}}) because it's already creating the necessary new 
> positions, and its output graph never has side paths, except for single 
> tokens that skip nodes because they have {{posLen > 1}}.  I.e. the only fix 
> to make, I think, is to set {{posLen}} properly.  And it really helps that it 
> does its own "new token buffering + sorting" already.
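The description says the fix is essentially setting {{posLen}} correctly, helped by the buffering and sorting of generated tokens that WDF already does. Below is a toy model of that idea in plain Java, not Lucene code: the names {{WdgfToy}}, {{Tok}}, {{Span}}, {{emit}} and {{paths}} are hypothetical, and the "wi-fi" token set is illustrative. It assumes each generated token records the range of word-part positions it covers, sorts the buffered tokens, derives {{posInc}} and {{posLen}} from those ranges, and then enumerates every path through the resulting token graph.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy token-graph model; none of these names are Lucene API. */
public class WdgfToy {

    /** Hypothetical token: term plus position increment and position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;
        Tok(String term, int posInc, int posLen) {
            this.term = term; this.posInc = posInc; this.posLen = posLen;
        }
    }

    /** Hypothetical buffered sub-token covering word-part positions [start, end). */
    static final class Span {
        final String term;
        final int start;
        final int end;
        Span(String term, int start, int end) {
            this.term = term; this.start = start; this.end = end;
        }
    }

    /**
     * Sorts buffered spans by start position (longer span first on ties, an
     * arbitrary choice for this toy) and derives posInc and posLen. If setPosLen
     * is false, every token keeps posLen = 1, mimicking a filter that never
     * sets the attribute.
     */
    static List<Tok> emit(List<Span> buffered, boolean setPosLen) {
        List<Span> sorted = new ArrayList<>(buffered);
        sorted.sort((a, b) -> a.start != b.start ? Integer.compare(a.start, b.start)
                                                 : Integer.compare(b.end, a.end));
        List<Tok> out = new ArrayList<>();
        int prevStart = -1; // so the first token gets posInc = 1
        for (Span s : sorted) {
            int posLen = setPosLen ? s.end - s.start : 1;
            out.add(new Tok(s.term, s.start - prevStart, posLen));
            prevStart = s.start;
        }
        return out;
    }

    /** Enumerates every term sequence from node 0 to the last node of the graph. */
    static List<String> paths(List<Tok> tokens) {
        List<int[]> arcs = new ArrayList<>(); // each arc is {tokenIndex, from, to}
        int pos = -1, end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;
            int to = pos + tokens.get(i).posLen; // arc spans posLen positions
            arcs.add(new int[] {i, pos, to});
            end = Math.max(end, to);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, arcs, 0, end, "", out);
        return out;
    }

    private static void walk(List<Tok> tokens, List<int[]> arcs, int node, int end,
                             String prefix, List<String> out) {
        if (node == end) { out.add(prefix); return; }
        for (int[] arc : arcs) {
            if (arc[1] == node) { // posLen >= 1 here, so every arc moves forward
                String term = tokens.get(arc[0]).term;
                walk(tokens, arcs, arc[2], end, prefix.isEmpty() ? term : prefix + " " + term, out);
            }
        }
    }

    public static void main(String[] args) {
        // "wi-fi": two word parts plus a catenated token (illustrative only).
        List<Span> buffered = List.of(
                new Span("wi", 0, 1), new Span("fi", 1, 2), new Span("wifi", 0, 2));
        System.out.println(paths(emit(buffered, false))); // [wifi fi, wi fi]
        System.out.println(paths(emit(buffered, true)));  // [wifi, wi fi]
    }
}
```

With {{posLen}} left at 1, the catenated token "wifi" ends at the same node as "wi", so the graph admits the spurious path "wifi fi". With {{posLen}} = 2 it spans both word parts, and the only paths are "wifi" and "wi fi".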



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
