I would do the query parser part first, without the graph part. That alone would let two unquoted words match a two-word synonym, which would be a great improvement over the current behavior. Suggested behavior:
  one two three        - "one two", "two three" and "one two three" will be checked against synonyms
  one two "three"      - "one two" can be a synonym
  one two OR three     - "one two" can be a synonym
  one OR two OR three  - no multi-word synonyms

This would be a clear, intuitive behavior. I'm sure there are other use cases where it may not make sense, but these are the common ones.

On Fri, Aug 10, 2012 at 2:21 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> I just noticed this in SynonymFilter in trunk:
>
> // TODO: we should set PositionLengthAttr too...
>
> It looks like the code does in fact set the PositionLengthAttribute, so
> maybe it is just a dead TODO.
>
> And, I see the following comment (which I had seen before and was the basis
> for my assertion that arbitrary graphs were not supported:
>
> * <p><b>NOTE</b>: when a match occurs, the output tokens
> * associated with the matching rule are "stacked" on top of
> * the input stream (if the rule had
> * <code>keepOrig=true</code>) and also on top of another
> * matched rule's output tokens. This is not a correct
> * solution, as really the output should be an arbitrary
> * graph/lattice. For example, with the above match, you
> * would expect an exact <code>PhraseQuery</code> <code>"y b
> * c"</code> to match the parsed tokens, but it will fail to
> * do so. This limitation is necessary because Lucene's
> * TokenStream (and index) cannot yet represent an arbitrary
> * graph.</p>
>
> Granted, some of that is specific to index-time support for synonyms, which
> I am avoiding, but it is a source for some confusion. If full graphs are
> somehow supported at query time (or in the TokenStream in general), that
> should be stated more clearly.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Robert Muir
> Sent: Friday, August 10, 2012 1:44 PM
> To: dev@lucene.apache.org
> Subject: Re: Proposal: Full support for multi-word synonyms at query time
>
>
> On Fri, Aug 10, 2012 at 1:36 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>>
>> One of the ongoing potholes of Solr and Lucene is lack of full support for
>> multi-word synonyms at query time. The root of the problem is twofold:
>> individual terms are presented for analysis which precludes recognition of
>> multi-term synonyms, and the output stream from the analyis process is a
>> single, linear stream without regard to any graph/lattice structure for
>> multiple synonyms.
>
>
> But this is not true. PositionLengthAttribute was already added, which
> makes it a graph.
>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

--
Lance Norskog
goks...@gmail.com
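For illustration, here is a minimal Java sketch of the query-parser behavior suggested at the top of this message. The class SynonymCandidates and its candidates method are hypothetical, not part of the Lucene or Solr API. It assumes whitespace-separated terms, treats only an uppercase OR and double-quoted phrases as boundaries, and ignores all other query syntax. Each run of adjacent unquoted terms yields every contiguous sub-phrase of two or more words as a candidate to look up in the synonym list.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Lucene/Solr API): collects the multi-word phrases
 * that the suggested query-parser rule would check against the synonym list.
 * A run of adjacent unquoted terms is broken by a quoted phrase or by "OR".
 */
final class SynonymCandidates {

    /** Returns every multi-word phrase that would be looked up as a synonym. */
    static List<String> candidates(String query) {
        List<String> result = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of adjacent unquoted terms
        int n = query.length();
        int i = 0;
        while (i < n) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '"') {
                // Quoted phrase: skip to the closing quote (or end of input).
                // It never joins a synonym candidate and it ends the current run.
                int close = query.indexOf('"', i + 1);
                i = (close < 0) ? n : close + 1;
                flush(run, result);
            } else {
                // Plain term: read up to the next whitespace or quote.
                int start = i;
                while (i < n && !Character.isWhitespace(query.charAt(i)) && query.charAt(i) != '"') {
                    i++;
                }
                String term = query.substring(start, i);
                if (term.equals("OR")) {
                    flush(run, result); // the boolean operator ends the current run
                } else {
                    run.add(term);
                }
            }
        }
        flush(run, result);
        return result;
    }

    /** Emits each contiguous sub-phrase of length >= 2 (shortest first), then clears the run. */
    private static void flush(List<String> run, List<String> out) {
        for (int len = 2; len <= run.size(); len++) {
            for (int start = 0; start + len <= run.size(); start++) {
                out.add(String.join(" ", run.subList(start, start + len)));
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        System.out.println(candidates("one two three"));
        System.out.println(candidates("one two \"three\""));
        System.out.println(candidates("one two OR three"));
        System.out.println(candidates("one OR two OR three"));
    }
}
```

Running main prints [one two, two three, one two three], [one two], [one two] and [], one line per case in the suggested behavior. Emitting the shortest phrases first is an arbitrary choice here; a real parser would still have to decide which synonym wins when candidate matches overlap.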