[jira] [Commented] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

Paul Elschot (JIRA) Sat, 03 Oct 2015 01:48:40 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942187#comment-14942187
 ]


Paul Elschot commented on LUCENE-6664:
--------------------------------------

From the SausageGraphFilter: Lucene cannot yet index an arbitrary token graph.

Perhaps positional joins (LUCENE-5627) can help here. This indexes joins between 
non-decreasing positions of any field in the same document, and allows the 
joins to be queried. However, I have the impression that these positional joins 
bring more complexity than is needed here.

One basic mechanism for the positional joins is a non-decreasing series of 
positions. (Currently these are in payloads; I'm considering docvalues.) These 
are accessed by both index and value, and used at query time to jump, for 
example, from one field to another.
Another basic mechanism there is a hierarchy between the positions of a single 
field, for example for nested XML element names. This hierarchy is probably too 
restrictive here.
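
To make "accessed by both index and value" concrete, here is a minimal sketch 
in plain Java. It is not the LUCENE-5627 code: the class name PositionSeries 
and its methods are hypothetical, and an in-memory int[] stands in for whatever 
payload or docvalues storage is actually used. It only illustrates that a 
non-decreasing series supports direct lookup by index and binary search by 
value.

```java
public class PositionSeries {
    // Hypothetical in-memory stand-in for the stored series; must be non-decreasing.
    private final int[] positions;

    public PositionSeries(int[] positions) {
        for (int i = 1; i < positions.length; i++) {
            if (positions[i] < positions[i - 1]) {
                throw new IllegalArgumentException("positions must be non-decreasing");
            }
        }
        this.positions = positions.clone();
    }

    public int size() {
        return positions.length;
    }

    /** Access by index: the position stored at the given index. */
    public int positionAt(int index) {
        return positions[index];
    }

    /**
     * Access by value: the first index whose position is >= target,
     * or size() if every stored position is smaller than target.
     */
    public int firstIndexAtOrAfter(int target) {
        int lo = 0;
        int hi = positions.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (positions[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Because the series never decreases, the by-value lookup can use binary search 
even when positions repeat; it then returns the first matching index.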

How arbitrary are the token graphs here?



> Replace SynonymFilter with SynonymGraphFilter
> ---------------------------------------------
>
>                 Key: LUCENE-6664
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6664
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, 
> LUCENE-6664.patch, usa.png, usa_flat.png
>
>
> Spinoff from LUCENE-6582.
> I created a new SynonymGraphFilter (to replace the current buggy
> SynonymFilter), that produces correct graphs (does no "graph
> flattening" itself).  I think this makes it simpler.
> This means you must add the FlattenGraphFilter yourself, if you are
> applying synonyms during indexing.
> Index-time syn expansion is a necessarily "lossy" graph transformation
> when multi-token (input or output) synonyms are applied, because the
> index does not store {{posLength}}, so there will always be phrase
> queries that should match but do not, and then phrase queries that
> should not match but do.
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> goes into detail about this.
> However, with this new SynonymGraphFilter, if instead you do synonym
> expansion at query time (and don't do the flattening), and you use
> TermAutomatonQuery (future: somehow integrated into a query parser),
> or maybe just "enumerate all paths and make union of PhraseQuery", you
> should get 100% correct matches (not sure about "proper" scoring
> though...).
> This new syn filter still cannot consume an arbitrary graph.
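
The "enumerate all paths" idea in the quoted description can be sketched 
without any Lucene classes. The following is an illustration only, not code 
from the patch: the Token class, its fields, and the "usa" example graph are 
assumptions made here. Each token is treated as an arc from its start position 
to start + posLength, and each complete path would then become one phrase in 
the union of phrase queries.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraphPaths {
    /** Hypothetical token: an arc from node `from` to node `from + posLength`. */
    public static final class Token {
        final String term;
        final int from;
        final int posLength;

        public Token(String term, int from, int posLength) {
            if (posLength < 1) {
                throw new IllegalArgumentException("posLength must be >= 1");
            }
            this.term = term;
            this.from = from;
            this.posLength = posLength;
        }
    }

    /** Returns the term sequence of every path from node `start` to node `end`. */
    public static List<List<String>> allPaths(List<Token> tokens, int start, int end) {
        List<List<String>> out = new ArrayList<>();
        walk(tokens, start, end, new ArrayList<>(), out);
        return out;
    }

    private static void walk(List<Token> tokens, int node, int end,
                             List<String> prefix, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(prefix)); // complete path: copy and record it
            return;
        }
        if (node > end) {
            return; // overshot the end node: not a valid path
        }
        for (Token t : tokens) {
            if (t.from == node) {
                prefix.add(t.term);
                walk(tokens, node + t.posLength, end, prefix, out);
                prefix.remove(prefix.size() - 1); // backtrack
            }
        }
    }
}
```

For a graph where "usa" spans four positions alongside "united states of 
america", this yields two paths: the single term and the four-term phrase. The 
number of paths can grow exponentially with the number of overlapping synonyms, 
which is one reason an automaton-based query may be preferable.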



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

