Lucene won’t ignore these unless you tell it to via stopwords. This is a problem no matter how you look at it. If you do put in stopwords, the original word _positions_ are retained. In your example:

word   position
a      1
de     2
la     3
b      4
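To make the position bookkeeping concrete, here is a toy model in plain Java. It is not Lucene code: it assumes simple whitespace tokenization, a hypothetical stopword set containing only "de" and "la", and 1-based positions to match the table above. It mimics the position-preserving behavior described here (a removed stopword still consumes a position) and a simplified slop check for two-term phrases.

```java
import java.util.*;

public class PositionDemo {
    // Hypothetical stopword set, for illustration only.
    static final Set<String> STOPWORDS = Set.of("de", "la");

    // Assign 1-based positions to whitespace-separated tokens. Stopwords are dropped,
    // but they still advance the position counter, so a gap remains where they were.
    static Map<String, List<Integer>> positions(String text, Set<String> stopwords) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            pos++;
            if (stopwords.contains(token)) continue; // removed, but its position was consumed
            result.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
        }
        return result;
    }

    // Simplified two-term phrase check: "first second"~slop matches if some occurrence
    // of `second` is at most `slop` position moves away from directly following `first`.
    static boolean phraseMatches(Map<String, List<Integer>> doc, String first, String second, int slop) {
        for (int p : doc.getOrDefault(first, List.of())) {
            for (int q : doc.getOrDefault(second, List.of())) {
                if (Math.abs((q - p) - 1) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> withStops = positions("a de la b", STOPWORDS);
        Map<String, List<Integer>> plain = positions("a b", STOPWORDS);
        System.out.println(withStops);                             // {a=[1], b=[4]}
        System.out.println(phraseMatches(withStops, "a", "b", 0)); // false
        System.out.println(phraseMatches(withStops, "a", "b", 2)); // true
        System.out.println(phraseMatches(plain, "a", "b", 0));     // true
    }
}
```

The gap left by the two removed words is exactly why an exact phrase fails and a slop of 2 is needed. Lucene itself counts positions from 0 and its sloppy matching handles longer phrases more generally; this sketch only illustrates the two-term case.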
If you remove “de” and “la” via stopwords, the positions are still:

word   position
a      1
b      4

So searching for the phrase “a b” would fail in this case unless you included “slop”, as in “a b”~2.

But let’s say you _do not_ have input with these stopwords, just “a b”. The positions will be 1 and 2 respectively. Here the user would expect “a b” to match this doc, but not a doc with “a de la b” (unless they knew a lot about search!). So maybe the right thing to do is to let phrases have slop as a matter of course.

Best,
Erick

> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
>
> Thanks Erick, there is a pattern I can't catch in my results, such as:
> a de la b
> I do catch “a b” though.
> I thought Lucene might ignore those automatically while creating the index.
>
>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Use stopwords, although it's becoming less of a concern. Why do you think
>> you need to?
>>
>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
>>>
>>> Hi,
>>> What is the (most efficient) way to
>>> ignore “de la” kinda connectors
>>> in a string at index or search time?
>>> Thanks
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org