This relates to the "position increment gap" for your analyzer and is configurable.
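To make the effect of that setting concrete, here is a small plain-Java model of how term positions could be assigned across multiple values of one field. It is only a sketch and does not use Lucene at all: the class PositionGapSketch, its methods, and the whitespace tokenization are made up for illustration, and the real indexing code is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of position assignment; not part of Lucene.
final class PositionGapSketch {

    /** A term together with the position it was assigned. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return position + " " + term;
        }
    }

    /**
     * Assigns positions to the whitespace-separated terms of each value.
     * Every term advances the position by 1; before each value after the
     * first one that produced terms, the gap is added on top of that.
     */
    static List<Posting> assignPositions(List<String> values, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1; // the first term then lands on position 0
        for (String value : values) {
            if (!postings.isEmpty()) {
                position += gap; // only once terms were already added
            }
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                position += 1; // assumed default token increment of 1
                postings.add(new Posting(term, position));
            }
        }
        return postings;
    }

    /** True if 'second' occurs exactly one position after 'first'. */
    static boolean adjacent(List<Posting> postings, String first, String second) {
        for (Posting a : postings) {
            for (Posting b : postings) {
                if (a.term.equals(first) && b.term.equals(second)
                        && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> twoValues = Arrays.asList("a b", "c d");
        System.out.println(assignPositions(twoValues, 0));
        System.out.println(assignPositions(twoValues, 10));
        System.out.println(adjacent(assignPositions(twoValues, 0), "b", "c"));
        System.out.println(adjacent(assignPositions(twoValues, 10), "b", "c"));
    }
}
```

In this model, a gap of 0 leaves "b" and "c" on neighbouring positions even though they come from different values, while any gap greater than 0 separates them.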

If you check the JavaDoc for Analyzer#getPositionIncrementGap, it says:

   * Invoked before indexing a IndexableField instance if terms have already been added to that
   * field. This allows custom analyzers to place an automatic position increment gap between
   * IndexbleField instances using the same field name. The default value position increment gap is
   * 0. With a 0 position increment gap and the typical default token position increment of 1, all
   * terms in a field, including across IndexableField instances, are in successive positions,
   * allowing exact PhraseQuery matches, for instance, across IndexableField instance boundaries.

So, if you want "a b", "c d" to behave the same as "a b c d", you would use the default gap of 0. If you want them to behave differently, you can add a gap between successive values to prevent matching across them.

Essentially, the position increment gap adds some number of "holes" (empty positions) between values. So, if you add a gap of 10, then the terms for "a b", "c d" would be in the following positions, I believe (the gap of 10 comes on top of the usual increment of 1):

0 a
1 b
12 c
13 d

Phrase matching works by checking whether the term positions differ by the appropriate amount. If you have stop word removal, the above example might match the phrase "b the the the the the the the the the the c", because the "thes" (I write, as I'm currently wearing a t-shirt from the band "The The") would also map to empty positions.

Hope that helps,
Froh

On Tue, Mar 25, 2025 at 9:47 AM Kai Grossjohann <grossjoh...@semedy.com.invalid> wrote:

> Hi,
>
> I'd like to understand more about how multiple values of a field are
> handled. Consider a Lucene document with a field foo that has a single
> value “a b c d” versus another Lucene document where the field foo has
> two values, namely “a b” and “c d”.
>
> When using Synonym Graph (so that synonym phrases are supported), and
> supposing I have a synonym phrase “b c”...
>
>  * I suppose the Lucene document with the single value “a b c d”
>    matches this synonym phrase, but
>  * does the other document match this phrase, as well?
>
> In a similar vein, how do phrase queries behave? If I query for the
> phrase “b c” will the two-value document match?
>
> Thanks,
> Kai