I thought that if I have a synonym map that says “synonym alias” is an
alias for “canonical phrase”, and I run “canonical phrase” through the
query parser with setAutoGenerateMultiTermSynonymsPhraseQuery(true), I'd
get a phrase query for the multi-term synonym. But that doesn't seem to
be the case.

The only way to get such a phrase query seems to be when the synonym map
says that “shortsyn” is an alias for “another phrase”, and I then run
“shortsyn” through the query parser. Then I get foo:"another phrase"~1
(plus foo:shortsyn, because the original term is kept), which is what I
expected.

My use case is as follows: I have some multi-word strings, and I need to
create queries from them. If one of the synonym phrases appears in the
multi-word string, I would like to generate a phrase query for that
part. For example, given the synonyms mentioned above, if the multi-word
string is, say, “my synonym alias is nice”, then I'd like to generate a
query that searches for the word “my”, the phrase “canonical phrase”,
and the words “is” and “nice”. Maybe I would like to /also/ search for
the words “synonym” and “alias”, or the words “canonical” and “phrase”,
or all four of them; I'm not sure yet.

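To make the desired output concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It only builds the query string I would like to end up with, by greedily replacing each alias with a phrase clause for its canonical form. The names SynonymPhraseSketch, buildQuery, longestAliasAt and exampleRules are made up for this illustration, and the sketch ignores stopwords and the optional "also search for the single words" part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: shows the query string I am after. */
final class SynonymPhraseSketch {

    /** The two rules from this post, as alias words -> canonical words. */
    static Map<List<String>, List<String>> exampleRules() {
        Map<List<String>, List<String>> rules = new LinkedHashMap<>();
        rules.put(List.of("synonym", "alias"), List.of("canonical", "phrase"));
        rules.put(List.of("shortsyn"), List.of("another", "phrase"));
        return rules;
    }

    /**
     * Walks the words left to right. Where an alias starts, one clause for the
     * canonical form is emitted (a sloppy phrase if it has several words);
     * every other word becomes a plain term clause.
     */
    static String buildQuery(String field, String text,
                             Map<List<String>, List<String>> rules, int slop) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        if (normalized.isEmpty()) {
            return "";
        }
        List<String> words = Arrays.asList(normalized.split("\\s+"));
        List<String> clauses = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            List<String> alias = longestAliasAt(words, i, rules);
            if (alias == null) {
                clauses.add(field + ":" + words.get(i));
                i++;
            } else {
                List<String> canonical = rules.get(alias);
                clauses.add(canonical.size() == 1
                        ? field + ":" + canonical.get(0)
                        : field + ":\"" + String.join(" ", canonical) + "\"~" + slop);
                i += alias.size();
            }
        }
        return String.join(" ", clauses);
    }

    /** Returns the longest alias that matches the words at position start, or null. */
    static List<String> longestAliasAt(List<String> words, int start,
                                       Map<List<String>, List<String>> rules) {
        List<String> best = null;
        for (List<String> alias : rules.keySet()) {
            int end = start + alias.size();
            if (end <= words.size()
                    && words.subList(start, end).equals(alias)
                    && (best == null || alias.size() > best.size())) {
                best = alias;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("foo", "my synonym alias is nice", exampleRules(), 1));
    }
}
```

For “my synonym alias is nice” this yields foo:my foo:"canonical phrase"~1 foo:is foo:nice, which is the shape of query I was hoping the query parser would give me.
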
This description leaves out quite a bit of information, so I'll paste
some code below to clarify.

Kai

// Imports assume the Lucene 8/9 package layout and JUnit 5.
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.FlattenGraphFilter;
import org.apache.lucene.analysis.pattern.SimplePatternTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.synonym.SynonymMap.Builder;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.CharsRefBuilder;
import org.junit.jupiter.api.Test;

/**
 * This tests the behavior of the Lucene query
 * builder with synonyms
 */
public class SynonymGraphQueryBuilderTest {

    private static class MyAnalyzer extends Analyzer {
        private final CharArraySet stopwords;
        private final SynonymMap synonyms;

        public MyAnalyzer(Set<String> stopwords, SynonymMap synonyms) {
            this.stopwords = new CharArraySet(stopwords, true);
            this.synonyms = synonyms;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            final Tokenizer src = new SimplePatternTokenizer("[a-z0-9]+");
            TokenStream tok = new LowerCaseFilter(src);
            tok = new SynonymGraphFilter(tok, synonyms, true);
            tok = new FlattenGraphFilter(tok);
            tok = new StopFilter(tok, stopwords);
            return new TokenStreamComponents(src::setReader, tok);
        }
    }

    @Test
    void testSynonymPhrases() throws Exception {
        Builder builder = new Builder();

        // canonical phrase <- synonym alias
        CharsRef canonical = Builder.join(
                new String[] { "canonical", "phrase" }, new CharsRefBuilder());
        CharsRef synonym = Builder.join(
                new String[] { "synonym", "alias" }, new CharsRefBuilder());
        builder.add(synonym, canonical, true);

        // another phrase <- shortsyn
        canonical = Builder.join(
                new String[] { "another", "phrase" }, new CharsRefBuilder());
        synonym = Builder.join(
                new String[] { "shortsyn" }, new CharsRefBuilder());
        builder.add(synonym, canonical, true);

        SynonymMap synonyms = builder.build();
        Set<String> stopwords = Set.of("the");
        MyAnalyzer analyzer = new MyAnalyzer(stopwords, synonyms);

        QueryParser queryParser = new QueryParser("foo", analyzer);
        queryParser.setSplitOnWhitespace(true);
        queryParser.setAutoGeneratePhraseQueries(true);
        queryParser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
        queryParser.setPhraseSlop(1);

        // Each assertion records the observed output; the message says what I expected.
        Query q = queryParser.parse("canonical phrase");
        assertEquals("foo:canonical foo:phrase", q.toString(),
                "I was expecting a phrase query here: foo:\"canonical phrase\"~1");

        q = queryParser.parse("synonym alias");
        assertEquals("foo:synonym foo:alias", q.toString(),
                "I was expecting a phrase query here: foo:\"canonical phrase\"~1");

        q = queryParser.parse("shortsyn");
        assertEquals("foo:\"another phrase\"~1 foo:shortsyn", q.toString(),
                "This is what I expected.");

        q = queryParser.parse("another phrase");
        assertEquals("foo:another foo:phrase", q.toString(),
                "I was expecting a phrase query here: foo:\"another phrase\"~1");
    }
}