Hi Kai,

Please remind me: do you use SynonymGraphFilter or SynonymFilter, and which Lucene version are you on?

Just a quick answer. If the parser sees just two words, it is reasonable for it to yield a boolean query. With useOrig=false those two words are replaced by other ones, so presumably it is OK to yield a boolean query there too. I think that case goes through somewhere around
https://github.com/apache/lucene/blob/c47ccd83da7692d2e7fa207eaca14975a614065f/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L555

However, with useOrig=true the two words have to be extended with another pair that overlaps them, so it might make sense to use phrases to prohibit mixing words between the two pairs. That probably goes through here:
https://github.com/apache/lucene/blob/c47ccd83da7692d2e7fa207eaca14975a614065f/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L539

I'm definitely missing something.

As for summing the scores of the matching phrases, I suppose there's no option besides implementing a custom parser that produces a DisjunctionMaxQuery (dismax).

On Fri, Mar 6, 2026 at 6:05 PM Kai Grossjohann wrote:

> Cycling back on this one... I'm in a bit of a bind now.
>
> Using a SynonymMap with useOrig=true, the phrase recognition works:
>
>     CharsRef canonical = createCharsRef("canonical phrase");
>     CharsRef alias = createCharsRef("alias phrase");
>     builder.add(canonical, canonical, true);
>     builder.add(alias, canonical, true);
>
> However, if I parse the string "alias phrase", then I get as query:
> foo:"canonical phrase" foo:"alias phrase"
>
> This results in skewed scores, as another document that contains both of
> them scores higher. The score is better with useOrig=false (third
> parameter of builder.add), but then phrase recognition no longer works:
> The string "alias phrase" now results in the query: foo:"canonical"
> foo:"phrase"
>
> It feels to me that this is a bug, and phrase recognition should also
> work with useOrig=false.
>
> What do people think?
>
> Thanks,
> Kai
>
> On 2025-11-26 14:43, Kai Grossjohann wrote:
> >
> > Thank you Mikhail, very interesting. It has taken me a long time to
> > reply because I had other priorities...
> >
> > With “enable position increments” it works much better. “Split on
> > whitespace” has to be false (as you say) and “auto-generate phrase
> > queries” also has to be false. But interestingly enough,
> > “auto-generate multi-term synonyms phrase query” can be true, and
> > setting it to true helps.
> >
> > This is now good enough for my actual application code. I do still
> > see some oddities. One of them is hopefully more cosmetic, and the
> > other can be worked around.
> >
> > I will work around the following behavior:
> >
> > * If a phrase appears as the /output/, but not as the /input/, of a
> >   SynonymMap entry, then it is /not/ automatically recognized.
> > * A phrase that appears as the input of a SynonymMap entry is
> >   automatically recognized.
> >
> > “My” synonyms are structured in such a way that there is a canonical
> > term and multiple possible alias terms. My understanding was that I
> > should have one SynonymMap entry per alias term, each of them
> > specifying the alias term as input and the canonical term as output.
> > I will work around the problem by adding another SynonymMap entry,
> > specifying the canonical term as both input and output.
> >
> > * If I map a phrase to itself (i.e. both input and output) then it's
> >   doubled in the resulting query.
> >
> > The workaround above means that the canonical terms are doubled in the
> > query, but I'm just going to live with that. I hope it doesn't skew
> > the weights too badly.
> >
> > Kai
> >
> >
> > On 2025-11-03 21:38, Mikhail Khludnev wrote:
> >> Hello Kai
> >>
> >> Pardon the vibe coding, but this sample
> >> https://github.com/mkhludnev/mutlyword-phrase-query-test/blob/3e3f1cce6b2b6790970e4a042ddb2967e49d0077/src/test/java/org/example/phrases/MultiWordTests.java#L88
> >> parses the plain biword "power grid" without quotes as a bool/should of
> >> phrases
> >>
> >> org.example.phrases.MultiWordTests#testPhraseQueryGeneratedFromPlainMultiWordSynonym
> >>
> >> Parsed Query for 'power grid': ("electrical grid" "power grid")
> >> Does it look closer to what you are looking for?
> >>
> >>
> >> On Mon, Nov 3, 2025 at 1:50 PM Kai Grossjohann wrote:
> >>
> >> Hi Mikhail,
> >>
> >> I tried to change this to false, and this was the result:
> >>
> >> java.lang.IllegalArgumentException:
> >> setAutoGeneratePhraseQueries(true) is disallowed when
> >> getSplitOnWhitespace() == false
> >>
> >> I experimented with other combinations of setSplitOnWhitespace,
> >> setAutoGeneratePhraseQueries, and
> >> setAutoGenerateMultiTermSynonymsPhraseQuery. None of them got me
> >> the phrase queries I'm looking for. Though some of them searched
> >> for more synonyms.
> >>
> >> In particular, false/false/true resulted in “synonym alias” being
> >> parsed as Synonym(foo:canonical foo:synonym) Synonym(foo:alias
> >> foo:phrase) which still doesn't produce the foo:"canonical
> >> phrase"~1 that I was looking for.
> >>
> >> Kai
> >>
> >> On 2025-10-30 18:01, Mikhail Khludnev wrote:
> >>> Hello Kai
> >>>
> >>> Briefly skimming through the letter
> >>>
> >>>     queryParser.setSplitOnWhitespace(true); // shouldn't false be here?
> >>>     queryParser.setAutoGeneratePhraseQueries(true);
> >>>     queryParser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
> >>>     queryParser.setPhraseSlop(1);
> >>>
> >>>     Query q = queryParser.parse("canonical phrase");
> >>>     assertEquals("foo:canonical foo:phrase", q.toString(),
> >>>         "I was expecting a phrase query here: foo:\"canonical phrase\"~1");
> >>>
> >>>
> >>>
> >>> On Thu, Oct 30, 2025 at 4:49 PM Kai Grossjohann wrote:
> >>>
> >>>> I thought if I have a synonym map that says “synonym alias” is an alias
> >>>> for “canonical phrase”, and I noodle “canonical phrase” through the
> >>>> query parser, telling it to auto generate multi term queries, I'd get a
> >>>> multi term query. But that doesn't seem to be the case.
> >>>>
> >>>> The only way to generate multi term queries seems to be when the synonym
> >>>> says that “shortsyn” is an alias for “another phrase”, and then noodle
> >>>> “shortsyn” through the query parser. Then I get foo:"another phrase"~1
> >>>> which is what I expected.
> >>>>
> >>>> My use case is as follows: I have some multi-word strings, and I need to
> >>>> create queries from them. And if one of the synonym phrases appears in
> >>>> the multi-word string, then I would like to generate a phrase query for
> >>>> that part. For example, given the synonyms mentioned above, if the
> >>>> multi-word string is, say, “my synonym alias is nice”, then I'd like to
> >>>> generate a query that searches for the word “my”, the phrase “canonical
> >>>> phrase”, and the words “is” and “nice”. Maybe I would like to
> >>>> /also/ search for the words “synonym” and “alias”, or the words
> >>>> “canonical” and “phrase”, or all four of them, I'm not sure.
> >>>>
> >>>> This description left out quite a bit of information, I'll paste some
> >>>> code below to clarify.
> >>>>
> >>>> Kai
> >>>>
> >>>> /**
> >>>>  * This tests the behavior of the Lucene query
> >>>>  * builder with synonyms
> >>>>  */
> >>>> public class SynonymGraphQueryBuilderTest {
> >>>>
> >>>>     private static class MyAnalyzer extends Analyzer {
> >>>>         private final CharArraySet stopwords;
> >>>>         private final SynonymMap synonyms;
> >>>>
> >>>>         public MyAnalyzer(Set<String> stopwords, SynonymMap synonyms) {
> >>>>             this.stopwords = new CharArraySet(stopwords, true);
> >>>>             this.synonyms = synonyms;
> >>>>         }
> >>>>
> >>>>         @Override
> >>>>         protected TokenStreamComponents createComponents(String fieldName) {
> >>>>             final Tokenizer src = new SimplePatternTokenizer("[a-z0-9]+");
> >>>>             TokenStream tok = new LowerCaseFilter(src);
> >>>>             tok = new SynonymGraphFilter(tok, synonyms, true);
> >>>>             tok = new FlattenGraphFilter(tok);
> >>>>             tok = new StopFilter(tok, stopwords);
> >>>>             return new TokenStreamComponents(
> >>>>                 src::setReader,
> >>>>                 tok);
> >>>>         }
> >>>>     }
> >>>>
> >>>>     @Test
> >>>>     void testSynonymPhrases() throws Exception {
> >>>>         Builder builder = new Builder();
> >>>>
> >>>>         // canonical phrase <- synonym alias
> >>>>         CharsRef canonical = Builder.join(new String[] { "canonical", "phrase" }, new CharsRefBuilder());
> >>>>         CharsRef synonym = Builder.join(new String[] { "synonym", "alias" }, new CharsRefBuilder());
> >>>>         builder.add(synonym, canonical, true);
> >>>>
> >>>>         // another phrase <- shortsyn
> >>>>         canonical = Builder.join(new String[] { "another", "phrase" }, new CharsRefBuilder());
> >>>>         synonym = Builder.join(new String[] { "shortsyn" }, new CharsRefBuilder());
> >>>>         builder.add(synonym, canonical, true);
> >>>>
> >>>>         SynonymMap synonyms = builder.build();
> >>>>
> >>>>         Set<String> stopwords = Set.of("the");
> >>>>
> >>>>         MyAnalyzer analyzer = new MyAnalyzer(stopwords, synonyms);
> >>>>
> >>>>         QueryParser queryParser = new QueryParser("foo", analyzer);
> >>>>         queryParser.setSplitOnWhitespace(true);
> >>>>         queryParser.setAutoGeneratePhraseQueries(true);
> >>>>         queryParser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
> >>>>         queryParser.setPhraseSlop(1);
> >>>>
> >>>>         Query q = queryParser.parse("canonical phrase");
> >>>>         assertEquals("foo:canonical foo:phrase", q.toString(),
> >>>>             "I was expecting a phrase query here: foo:\"canonical phrase\"~1");
> >>>>
> >>>>         q = queryParser.parse("synonym alias");
> >>>>         assertEquals("foo:synonym foo:alias", q.toString(),
> >>>>             "I was expecting a phrase query here: foo:\"canonical phrase\"~1");
> >>>>
> >>>>         q = queryParser.parse("shortsyn");
> >>>>         assertEquals("foo:\"another phrase\"~1 foo:shortsyn", q.toString(),
> >>>>             "This is what I expected.");
> >>>>
> >>>>         q = queryParser.parse("another phrase");
> >>>>         assertEquals("foo:another foo:phrase", q.toString(),
> >>>>             "I was expecting a phrase query here: foo:\"another phrase\"~1");
> >>>>     }
> >>>> }
> >>>>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
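
P.S. On the score-skew point in this thread (a document containing both "canonical phrase" and "alias phrase" outscoring one that contains only one of them), here is a rough sketch of why a dismax-style combination would help. It is a toy model in plain Java, not Lucene code: the class name ScoreCombination, its methods, and the per-phrase scores are all made up for illustration. It only mirrors the two combination rules: a boolean query of SHOULD clauses sums the scores of its matching clauses, while DisjunctionMaxQuery takes the best clause score plus a tie-breaker multiple of the remaining ones.

```java
import java.util.Arrays;

// Toy model only: none of this is a Lucene class. The scores passed in are
// hypothetical per-clause scores for the clauses that matched a document.
final class ScoreCombination {

    /** Models a boolean query of SHOULD clauses: matching clause scores are summed. */
    static double sumScore(double... matchingClauseScores) {
        return Arrays.stream(matchingClauseScores).sum();
    }

    /** Models dismax: best clause score plus tieBreaker times the sum of the others. */
    static double disMaxScore(double tieBreaker, double... matchingClauseScores) {
        if (matchingClauseScores.length == 0) {
            return 0.0; // nothing matched
        }
        double max = Arrays.stream(matchingClauseScores).max().getAsDouble();
        double sum = Arrays.stream(matchingClauseScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double phraseScore = 2.0; // hypothetical score of one matching phrase clause

        // Document A contains only "alias phrase"; document B contains both phrases.
        System.out.println("sum    A=" + sumScore(phraseScore)
                + " B=" + sumScore(phraseScore, phraseScore));
        System.out.println("dismax A=" + disMaxScore(0.0, phraseScore)
                + " B=" + disMaxScore(0.0, phraseScore, phraseScore));
    }
}
```

With a tie-breaker of 0 the two documents score the same in this model, whereas the summed variant favors the document that happens to contain both phrasings. In real code the equivalent would presumably be a custom parser that wraps the synonym phrase alternatives in a DisjunctionMaxQuery instead of SHOULD clauses; I have not tried that against QueryBuilder.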
