Thank you Mikhail, very interesting.  It has taken me a long time to reply because other priorities came up...

With “enable position increments” it works much better.  “Split on whitespace” has to be false (as you say) and “auto-generate phrase queries” also has to be false.  But interestingly enough, “auto-generate multi-term synonyms phrase query” can be true, and setting it to true helps.
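To make that combination concrete, here is a minimal sketch of those settings on Lucene's classic QueryParser. It assumes Lucene 9.x. The helper name `ParserSettings.configure` is hypothetical, the field `foo` and slop 1 are borrowed from the test further down, and the analyzer is assumed to contain a SynonymGraphFilter like `MyAnalyzer` there.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.QueryParser;

// Hypothetical helper applying the combination reported to work:
// position increments on, whitespace splitting off, auto phrase queries off,
// multi-term synonym phrase queries on.
class ParserSettings {
    static QueryParser configure(Analyzer analyzer) {
        QueryParser parser = new QueryParser("foo", analyzer);
        parser.setEnablePositionIncrements(true);
        // The parser rejects autoGeneratePhraseQueries(true) while
        // splitOnWhitespace is false, so switch the former off first.
        parser.setAutoGeneratePhraseQueries(false);
        parser.setSplitOnWhitespace(false);
        parser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
        parser.setPhraseSlop(1);
        return parser;
    }
}
```

Setting auto phrase queries to false before turning off whitespace splitting keeps the parser in an allowed state at every step.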

This is now good enough for my actual application code.  I do still see some oddities.  One of them is hopefully more cosmetic, and the other can be worked around.

I will work around the following behavior:

 * If a phrase appears as the /output/, but not as the /input/, of a
   SynonymMap entry, then it is /not/ automatically recognized.
 * A phrase that appears as the input of a SynonymMap entry is
   automatically recognized.

“My” synonyms are structured in such a way that there is a canonical term and multiple possible alias terms.  My understanding was that I should have one SynonymMap entry per alias term, each of them specifying the alias term as input and the canonical term as output.  I will work around the problem by adding another SynonymMap entry, specifying the canonical term as both input and output.
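A sketch of that entry layout with SynonymMap.Builder, assuming Lucene 9.x. The names `SynonymEntries`, `phrase` and `build` are hypothetical. Each alias maps to the canonical phrase, and one extra entry maps the canonical phrase to itself. `includeOrig` is `true` throughout, as in the test further down.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.CharsRefBuilder;

// Hypothetical helper: alias -> canonical for every alias, plus the
// workaround entry canonical -> canonical.
class SynonymEntries {
    // Joins space-separated words using SynonymMap's word separator.
    static CharsRef phrase(String text) {
        return SynonymMap.Builder.join(text.split(" "), new CharsRefBuilder());
    }

    static SynonymMap build(String canonical, List<String> aliases) {
        SynonymMap.Builder builder = new SynonymMap.Builder();
        CharsRef canonicalRef = phrase(canonical);
        for (String alias : aliases) {
            builder.add(phrase(alias), canonicalRef, true); // keep original tokens
        }
        // Workaround: the canonical phrase as both input and output.
        builder.add(canonicalRef, canonicalRef, true);
        try {
            return builder.build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be something like `SynonymEntries.build("canonical phrase", List.of("synonym alias"))`, with the resulting map handed to the analyzer.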

 * If I map a phrase to itself (i.e. use it as both input and output),
   then it's doubled in the resulting query.

The workaround above means that the canonical terms are doubled in the query, but I'm just going to live with that.  I hope it doesn't skew the weights too badly.

Kai


On 2025-11-03 21:38, Mikhail Khludnev wrote:
Hello Kai

Pardon the vibe coding, but this sample https://github.com/mkhludnev/mutlyword-phrase-query-test/blob/3e3f1cce6b2b6790970e4a042ddb2967e49d0077/src/test/java/org/example/phrases/MultiWordTests.java#L88

parses the plain biword "power grid" (without quotes) as a Boolean query of SHOULD phrase clauses:

org.example.phrases.MultiWordTests#testPhraseQueryGeneratedFromPlainMultiWordSynonym

Parsed Query for 'power grid': ("electrical grid" "power grid")
Does it look closer to what you are looking for?


On Mon, Nov 3, 2025 at 1:50 PM Kai Grossjohann wrote:

    Hi Mikhail,

    I tried to change this to false, and this was the result:

    java.lang.IllegalArgumentException:
    setAutoGeneratePhraseQueries(true) is disallowed when
    getSplitOnWhitespace() == false
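The conflict can be reproduced in isolation. A minimal sketch, assuming Lucene 9.x with `StandardAnalyzer` standing in for the real analyzer; the class name `SplitOnWhitespaceConflict` is hypothetical:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;

// With whitespace splitting disabled, QueryParser refuses to turn
// auto-generated phrase queries on.
class SplitOnWhitespaceConflict {
    static String tryEnableAutoPhrases() {
        QueryParser parser = new QueryParser("foo", new StandardAnalyzer());
        parser.setSplitOnWhitespace(false);
        try {
            parser.setAutoGeneratePhraseQueries(true);
            return "accepted";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }
}
```

It returns the exception message quoted above rather than `"accepted"`.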

    I experimented with other combinations of setSplitOnWhitespace,
    setAutoGeneratePhraseQueries, and
    setAutoGenerateMultiTermSynonymsPhraseQuery.  None of them got me
    the phrase queries I'm looking for, though some of them did search
    for more synonyms.

    In particular, false/false/true (in the order listed above) resulted
    in “synonym alias” being parsed as
    Synonym(foo:canonical foo:synonym) Synonym(foo:alias foo:phrase),
    which still doesn't produce the foo:"canonical phrase"~1 that I was
    looking for.

    Kai

    On 2025-10-30 18:01, Mikhail Khludnev wrote:
    Hello Kai

    Briefly skimming through your message:

              queryParser.setSplitOnWhitespace(true); // shouldn't this be false?
              queryParser.setAutoGeneratePhraseQueries(true);
              queryParser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
              queryParser.setPhraseSlop(1);

              Query q = queryParser.parse("canonical phrase");
              assertEquals("foo:canonical foo:phrase", q.toString(),
                      "I was expecting a phrase query here: foo:\"canonical phrase\"~1");



    On Thu, Oct 30, 2025 at 4:49 PM Kai Grossjohann wrote:

    I thought that if I have a synonym map saying “synonym alias” is an
    alias for “canonical phrase”, and I pass “canonical phrase” through
    the query parser, telling it to auto-generate multi-term synonym
    phrase queries, I'd get a phrase query.  But that doesn't seem to be
    the case.

    The only way to get phrase queries seems to be when the synonym map
    says that “shortsyn” is an alias for “another phrase”, and I then pass
    “shortsyn” through the query parser.  Then I get foo:"another phrase"~1,
    which is what I expected.

    My use case is as follows: I have some multi-word strings, and I need to
    create queries from them.  And if one of the synonym phrases appears in
    the multi-word string, then I would like to generate a phrase query for
    that part.  For example, given the synonyms mentioned above, if the
    multi-word string is, say, “my synonym alias is nice”, then I'd like to
    generate a query that searches for the word “my”, the phrase “canonical
    phrase”, and the words “is” and “nice”.  Maybe I would like to
    /also/ search for the words “synonym” and “alias”, or the words
    “canonical” and “phrase”, or all four of them, I'm not sure.
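For comparison, the target query for “my synonym alias is nice” can be built by hand with Lucene's Query API. A minimal sketch, assuming Lucene 9.x; `TargetQuery` is a hypothetical name, and the field `foo` and slop 1 are taken from the test below. The optional plain-word clauses (“synonym”, “alias”, “canonical”, “phrase”) are left out.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Plain terms for the ordinary words, plus a sloppy phrase for the
// canonical form of the recognized synonym "synonym alias".
class TargetQuery {
    static Query build() {
        BooleanQuery.Builder b = new BooleanQuery.Builder();
        b.add(new TermQuery(new Term("foo", "my")), BooleanClause.Occur.SHOULD);
        b.add(new PhraseQuery(1, "foo", "canonical", "phrase"), BooleanClause.Occur.SHOULD);
        b.add(new TermQuery(new Term("foo", "is")), BooleanClause.Occur.SHOULD);
        b.add(new TermQuery(new Term("foo", "nice")), BooleanClause.Occur.SHOULD);
        return b.build();
    }
}
```

Its `toString()` is `foo:my foo:"canonical phrase"~1 foo:is foo:nice`, the same notation as the parser output in the test, so it can serve as an expected value in an assertion.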

    This description leaves out quite a bit of information, so I'll paste
    some code below to clarify.

    Kai

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.Set;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.FlattenGraphFilter;
    import org.apache.lucene.analysis.pattern.SimplePatternTokenizer;
    import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.analysis.synonym.SynonymMap.Builder;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.CharsRef;
    import org.apache.lucene.util.CharsRefBuilder;
    import org.junit.jupiter.api.Test;

    /**
     * This tests the behavior of the Lucene query
     * builder with synonyms.
     */
    public class SynonymGraphQueryBuilderTest {

        private static class MyAnalyzer extends Analyzer {
            private final CharArraySet stopwords;
            private final SynonymMap synonyms;

            public MyAnalyzer(Set<String> stopwords, SynonymMap synonyms) {
                this.stopwords = new CharArraySet(stopwords, true);
                this.synonyms = synonyms;
            }

            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                final Tokenizer src = new SimplePatternTokenizer("[a-z0-9]+");
                TokenStream tok = new LowerCaseFilter(src);
                tok = new SynonymGraphFilter(tok, synonyms, true); // ignoreCase = true
                tok = new FlattenGraphFilter(tok);
                tok = new StopFilter(tok, stopwords);
                return new TokenStreamComponents(src::setReader, tok);
            }
        }

        @Test
        void testSynonymPhrases() throws Exception {
            Builder builder = new Builder();

            // canonical phrase <- synonym alias
            CharsRef canonical = Builder.join(new String[] { "canonical", "phrase" }, new CharsRefBuilder());
            CharsRef synonym = Builder.join(new String[] { "synonym", "alias" }, new CharsRefBuilder());
            builder.add(synonym, canonical, true);

            // another phrase <- shortsyn
            canonical = Builder.join(new String[] { "another", "phrase" }, new CharsRefBuilder());
            synonym = Builder.join(new String[] { "shortsyn" }, new CharsRefBuilder());
            builder.add(synonym, canonical, true);

            SynonymMap synonyms = builder.build();

            Set<String> stopwords = Set.of("the");

            MyAnalyzer analyzer = new MyAnalyzer(stopwords, synonyms);

            QueryParser queryParser = new QueryParser("foo", analyzer);
            queryParser.setSplitOnWhitespace(true);
            queryParser.setAutoGeneratePhraseQueries(true);
            queryParser.setAutoGenerateMultiTermSynonymsPhraseQuery(true);
            queryParser.setPhraseSlop(1);

            Query q = queryParser.parse("canonical phrase");
            assertEquals("foo:canonical foo:phrase", q.toString(),
                    "I was expecting a phrase query here: foo:\"canonical phrase\"~1");

            q = queryParser.parse("synonym alias");
            assertEquals("foo:synonym foo:alias", q.toString(),
                    "I was expecting a phrase query here: foo:\"canonical phrase\"~1");

            q = queryParser.parse("shortsyn");
            assertEquals("foo:\"another phrase\"~1 foo:shortsyn", q.toString(),
                    "This is what I expected.");

            q = queryParser.parse("another phrase");
            assertEquals("foo:another foo:phrase", q.toString(),
                    "I was expecting a phrase query here: foo:\"another phrase\"~1");
        }
    }



--
Sincerely yours
Mikhail Khludnev