Multi-word synonym filter (synonym expansion at indexing time).
---------------------------------------------------------------

                 Key: LUCENE-1622
                 URL: https://issues.apache.org/jira/browse/LUCENE-1622
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/*
            Reporter: Dawid Weiss
            Priority: Minor
         Attachments: synonyms.patch

It would be useful to have a filter that supports indexing-time synonym 
expansion, especially for multi-word synonyms (where the original token 
sequence being matched may itself span several words).

The problem is not trivial, as observed on the mailing list. The issues I was 
able to identify (also covered in the unit tests) are:

- if multi-word synonyms are indexed together with the original token stream 
(at overlapping positions), then a query for part of a synonym sequence (e.g., 
"big" from the synonym "big apple" for "new york city") causes the document to 
match;

- highlighting the original document is problematic when a synonym is matched 
(see the unit tests for an example);

- if the synonym differs in length from the original token sequence being 
matched, then phrase queries spanning the boundary between the synonym and the 
original sequence will not match. For example, with "big apple" as a synonym 
for "new york city", the phrase query "big apple restaurants" won't match a 
document containing "new york city restaurants".
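
The first and third issues can be reproduced with a toy positional index. The 
sketch below is plain Java and does not use Lucene or the attached patch; the 
class SynonymOverlapSketch and all of its methods are hypothetical names made 
up for illustration. It assumes the synonym tokens are stacked on the original 
tokens starting at the same position, and that a phrase matches only when its 
terms occupy consecutive positions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy positional index (hypothetical, not Lucene code): term -> token positions.
class SynonymOverlapSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Record the given terms at consecutive positions, beginning at 'start'.
    void addSequence(int start, String... terms) {
        for (int i = 0; i < terms.length; i++) {
            postings.computeIfAbsent(terms[i], t -> new TreeSet<>()).add(start + i);
        }
    }

    // A single-term query matches if the term occurs at any position.
    boolean matchesTerm(String term) {
        return postings.containsKey(term);
    }

    // An exact phrase query matches if the terms occur at consecutive positions.
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        for (int start : postings.getOrDefault(terms[0], Collections.emptySet())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], Collections.emptySet())
                             .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Document "new york city restaurants" with "big apple" stacked on top.
    static SynonymOverlapSketch example() {
        SynonymOverlapSketch index = new SynonymOverlapSketch();
        index.addSequence(0, "new", "york", "city", "restaurants"); // positions 0..3
        index.addSequence(0, "big", "apple");                       // positions 0..1
        return index;
    }

    public static void main(String[] args) {
        SynonymOverlapSketch index = example();
        // Partial synonym match: "big" alone hits the document.
        System.out.println(index.matchesTerm("big"));
        // Length mismatch: "restaurants" is at position 3, not 2.
        System.out.println(index.matchesPhrase("big", "apple", "restaurants"));
        // The original phrase is unaffected.
        System.out.println(index.matchesPhrase("new", "york", "city", "restaurants"));
    }
}
```

In this model "apple" sits at position 1 and "restaurants" at position 3, so 
the phrase "big apple restaurants" has a gap at position 2 and fails, while the 
single term "big" matches on its own.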

I am attaching a patch that implements phrase synonyms as a token filter. It 
is not necessarily intended for immediate inclusion, but it may give people a 
basis to experiment with and adapt to their own scenarios.

