This relates to the "position increment gap" for your analyzer and is
configurable.

If you check the JavaDoc for Analyzer#getPositionIncrementGap, it says:

   * Invoked before indexing a IndexableField instance if terms have already
   * been added to that field. This allows custom analyzers to place an
   * automatic position increment gap between IndexableField instances using
   * the same field name. The default value position increment gap is 0.
   * With a 0 position increment gap and the typical default token position
   * increment of 1, all terms in a field, including across IndexableField
   * instances, are in successive positions, allowing exact PhraseQuery
   * matches, for instance, across IndexableField instance boundaries.

So, if you want "a b", "c d" to behave the same as "a b c d", you would use
the default gap of 0. If you want them to behave differently, you can add a
gap between successive values to prevent phrase matches across them.
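
If it helps, here is a rough Java sketch of that behavior. It is a
simplified model, not Lucene's indexing code (in Lucene the gap comes from
overriding Analyzer#getPositionIncrementGap): the class PositionGapDemo and
its methods index and phraseMatches are hypothetical, and it assumes
whitespace tokenization, a position increment of 1 per token, and the gap
being added once before each value after the first. It checks the exact
phrase "b c" against the single value "a b c d" and against the two values
"a b", "c d" with gaps of 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Rough model (not Lucene code) of a multi-valued field: positions run
 * across all values, with `gap` empty positions between values, and an
 * exact phrase must occupy consecutive positions.
 */
public class PositionGapDemo {

    /** A term and the position it was indexed at. */
    record Posting(String term, int position) {}

    /** Assigns positions to every value of one field, adding `gap` between values. */
    static List<Posting> index(List<String> values, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1; // so the very first token lands on position 0
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += gap; // empty positions ("holes") between values
            }
            for (String term : values.get(v).split("\\s+")) {
                position += 1; // default token position increment
                postings.add(new Posting(term, position));
            }
        }
        return postings;
    }

    /** True if the phrase terms appear at consecutive positions (no slop). */
    static boolean phraseMatches(List<Posting> postings, String phrase) {
        String[] terms = phrase.split("\\s+");
        for (Posting start : postings) {
            if (!start.term().equals(terms[0])) {
                continue;
            }
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                all = postings.contains(new Posting(terms[i], start.position() + i));
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> single = List.of("a b c d");
        List<String> twoValues = List.of("a b", "c d");
        System.out.println(phraseMatches(index(single, 0), "b c"));    // true
        System.out.println(phraseMatches(index(twoValues, 0), "b c")); // true
        System.out.println(phraseMatches(index(twoValues, 1), "b c")); // false
    }
}
```

With a gap of 0 the two-value document matches "b c" just like the
single-value one; a gap of 1 or more breaks the adjacency.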

Essentially, the position increment gap adds some number of "holes" (empty
positions) between values. So, if you add a gap of 10, then the terms for
"a b", "c d" would be in the following positions, I believe:
0   a
1   b
12  c
13  d
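
Spelled out, under a simplified model (an assumption, not a quote of
Lucene's indexing code) where the position starts just before 0, each token
adds its position increment of 1, and the gap g = 10 is added once before
the second value:

```latex
\begin{aligned}
p(\mathrm{a}) &= -1 + 1 = 0, & p(\mathrm{b}) &= 0 + 1 = 1,\\
p(\mathrm{c}) &= p(\mathrm{b}) + g + 1 = 1 + 10 + 1 = 12, & p(\mathrm{d}) &= p(\mathrm{c}) + 1 = 13.
\end{aligned}
```

Positions 2 through 11 are the ten holes.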

Phrase matching works by checking whether the terms' positions differ by
the same amounts as in the query. If you have stop word removal, the above
example might match the phrase "b the the the the the the the the the the
c", because the ten "thes" (I write, as I'm currently wearing a t-shirt
from the band "The The") would also map to empty positions: "b" and "c"
are then 11 positions apart in the query, just as "b" (1) and "c" (12) are
in the document.

Hope that helps,
Froh


On Tue, Mar 25, 2025 at 9:47 AM Kai Grossjohann
<grossjoh...@semedy.com.invalid> wrote:

> Hi,
>
> I'd like to understand more about how multiple values of a field are
> handled.  Consider a Lucene document with a field foo that has a single
> value “a b c d” versus another Lucene document where the field foo has
> two values, namely “a b” and “c d”.
>
> When using Synonym Graph (so that synonym phrases are supported), and
> supposing I have a synonym phrase “b c”...
>
>   * I suppose the Lucene document with the single value “a b c d”
>     matches this synonym phrase, but
>   * does the other document match this phrase, as well?
>
> In a similar vein, how do phrase queries behave?  If I query for the
> phrase “b c”, will the two-value document match?
>
> Thanks,
> Kai
>
