[jira] [Commented] (LUCENE-6582) SynonymFilter should generate a correct (or, at least, better) graph

Ian Ribas (JIRA) Mon, 22 Jun 2015 15:30:22 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596772#comment-14596772
 ]


Ian Ribas commented on LUCENE-6582:
-----------------------------------

Ok, I think I have an idea on how to do this, but it will definitely need some 
thinking, so it will probably take a while.

bq. I think that's fine, I think better correctness trumps the added buffering 
cost.

I completely agree.

bq. I think if we fix syn filter here to produce the correct graph, we should 
also insert a "sausagizer" phase that turns this graph back into a sausage for 
indexing?

I think Robert also said something along these lines in his answer to my 
email. I think I understand the general idea of what that means, but I would 
certainly appreciate some guidance when the time comes. I'll focus on 
producing a correct graph first.
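
To make the "sausagizer" idea a bit more concrete, here is a minimal sketch of 
one naive reading of it: keep each token's position increment but force its 
position length to 1, which, as far as I know, is effectively what indexing 
sees anyway, since position length is not recorded in the index. The classes 
{{SketchToken}} and {{Sausagizer}} and the rule "dns" -> "domain name service" 
are hypothetical illustrations, not Lucene APIs and not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token: term, position increment, position length. */
final class SketchToken {
    final String term;
    final int posInc; // distance from the previous token's position
    final int posLen; // number of positions this token spans

    SketchToken(String term, int posInc, int posLen) {
        this.term = term;
        this.posInc = posInc;
        this.posLen = posLen;
    }
}

final class Sausagizer {
    /** Naive flattening: keep term and position increment, force position length to 1. */
    static List<SketchToken> flatten(List<SketchToken> graph) {
        List<SketchToken> out = new ArrayList<>(graph.size());
        for (SketchToken t : graph) {
            out.add(new SketchToken(t.term, t.posInc, 1));
        }
        return out;
    }

    /** Renders each token as term@position, counting the first position as 0. */
    static List<String> positions(List<SketchToken> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        int pos = -1;
        for (SketchToken t : tokens) {
            pos += t.posInc;
            out.add(t.term + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed graph for input "dns is down" with the hypothetical rule
        // "dns" -> "domain name service", keeping the original token.
        List<SketchToken> graph = new ArrayList<>();
        graph.add(new SketchToken("domain", 1, 1));
        graph.add(new SketchToken("dns", 0, 3)); // spans the three synonym tokens
        graph.add(new SketchToken("name", 1, 1));
        graph.add(new SketchToken("service", 1, 1));
        graph.add(new SketchToken("is", 1, 1));
        graph.add(new SketchToken("down", 1, 1));
        System.out.println(positions(flatten(graph)));
    }
}
```

This flattening is lossy: once "dns" no longer spans three positions, it sits 
at position 0 while "is" sits at position 3, so the two are no longer 
adjacent. A real sausagizer would probably need to be smarter than this, which 
is part of why I'd appreciate guidance.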

This also means that I may need to change the test validations, since we 
might run into conditions that are currently considered wrong, especially 
regarding offsets (start and end) and their relation to position lengths. But 
I'll see what I can do about that too.

bq. However, if you apply syn filter at search time, we could fix query parsers 
to possibly "do the right thing" here

I was planning to take a shot at that too, once this part is finished, to make 
the solution more complete. And, again, I'll certainly appreciate ideas when 
the time comes.

> SynonymFilter should generate a correct (or, at least, better) graph
> --------------------------------------------------------------------
>
>                 Key: LUCENE-6582
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6582
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Ian Ribas
>         Attachments: LUCENE-6582.patch, LUCENE-6582.patch, after.png, 
> after2.png, after3.png, before.png
>
>
> Some time ago, I had a problem with synonyms and phrase type queries 
> (actually, it was elasticsearch and I was using a match query with multiple 
> terms and the "and" operator, as better explained here: 
> https://github.com/elastic/elasticsearch/issues/10394).
> That issue led to some work on Lucene: LUCENE-6400 (where I helped a little 
> with tests) and  LUCENE-6401. This issue is also related to LUCENE-3843.
> Starting from the discussion on LUCENE-6400, I'm attempting to implement a 
> solution. Here is a patch with a first step - the implementation to fix 
> "SynFilter to be able to 'make positions'" (as was mentioned on the 
> [issue|https://issues.apache.org/jira/browse/LUCENE-6400?focusedCommentId=14498554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14498554]).
>  In this way, the synonym filter generates a correct (or, at least, better) 
> graph.
> As the synonym matching is greedy, I only had to worry about fixing the 
> position length of the rules of the current match; no future or past synonyms 
> would "span" over this match (please correct me if I'm wrong!). It did 
> require more buffering: twice as much.
> The new behavior I added is not active by default; a new parameter has to be 
> passed in a new constructor for {{SynonymFilter}}. The changes I made do 
> change the token stream generated by the synonym filter, and I thought it 
> would be better to let that be a voluntary decision for now.
> I did some refactoring on the code, but mostly on what I had to change for 
> my implementation, so that the patch was not too hard to read. I created 
> specific unit tests for the new implementation 
> ({{TestMultiWordSynonymFilter}}) that should show how things will be with the 
> new behavior.
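
As a rough illustration of why position length matters in the description 
above, the following self-contained sketch enumerates every path through a 
token stream treated as a graph. It does not use Lucene; {{GraphToken}}, 
{{TokenGraphPaths}} and the rule "dns" -> "domain name service" are 
hypothetical names chosen for the example, and the "sausage" stream is only my 
understanding of what stacking synonym tokens with position length 1 looks 
like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a token: term, position increment, position length. */
final class GraphToken {
    final String term;
    final int posInc; // distance from the previous token's start position
    final int posLen; // number of positions this token spans

    GraphToken(String term, int posInc, int posLen) {
        this.term = term;
        this.posInc = posInc;
        this.posLen = posLen;
    }
}

final class TokenGraphPaths {
    /** Returns every path from position 0 to the final position, space-joined. */
    static List<String> paths(List<GraphToken> tokens) {
        int[] start = new int[tokens.size()];
        int pos = -1; // the first token (posInc = 1) lands on position 0
        int end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;
            start[i] = pos;
            end = Math.max(end, pos + tokens.get(i).posLen);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, start, 0, end, "", out);
        return out;
    }

    private static void walk(List<GraphToken> tokens, int[] start, int node,
                             int end, String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (start[i] == node) {
                GraphToken t = tokens.get(i);
                walk(tokens, start, node + t.posLen, end, prefix + " " + t.term, out);
            }
        }
    }

    public static void main(String[] args) {
        // Graph form: "dns" spans the three positions of its synonym.
        List<GraphToken> graph = new ArrayList<>();
        graph.add(new GraphToken("domain", 1, 1));
        graph.add(new GraphToken("dns", 0, 3));
        graph.add(new GraphToken("name", 1, 1));
        graph.add(new GraphToken("service", 1, 1));
        graph.add(new GraphToken("is", 1, 1));
        graph.add(new GraphToken("down", 1, 1));
        System.out.println(paths(graph));

        // Sausage form: synonym tokens stacked on later input tokens, all posLen = 1.
        List<GraphToken> sausage = new ArrayList<>();
        sausage.add(new GraphToken("dns", 1, 1));
        sausage.add(new GraphToken("domain", 0, 1));
        sausage.add(new GraphToken("is", 1, 1));
        sausage.add(new GraphToken("name", 0, 1));
        sausage.add(new GraphToken("down", 1, 1));
        sausage.add(new GraphToken("service", 0, 1));
        System.out.println(paths(sausage));
    }
}
```

In this model the graph form admits only the two intended readings ("domain 
name service is down" and "dns is down"), while the sausage form also admits 
mixed paths such as "dns name down", which is the kind of incorrect match the 
issue is about.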



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

