In this case the search string is "c b", and the search query has 8 combinations, including two cases with "c b"~ (two separate queries using ~): one finds everything containing c AND b, the other everything containing c OR b. With these I can find "a b", but not "a de la b" without French stopwords.

Thanks
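For reference, the position behavior Erick describes in the quoted message below can be sketched in a few lines of Python. This is a toy model and not Lucene code: `analyze`, `phrase_matches` and the two-word `FRENCH_STOPWORDS` set are hypothetical names made up for illustration, and the slop rule only handles terms that appear in phrase order (it ignores the reordering that real sloppy phrase matching also permits).

```python
# Toy model of position-preserving stopword removal and sloppy phrase matching.
# NOT Lucene code: all names here are illustrative assumptions.

FRENCH_STOPWORDS = frozenset({"de", "la"})  # tiny illustrative subset


def analyze(text, stopwords=frozenset()):
    """Return (term, position) pairs; removed stopwords leave position gaps."""
    tokens = []
    for position, term in enumerate(text.lower().split(), start=1):
        if term not in stopwords:
            tokens.append((term, position))
    return tokens


def phrase_matches(doc_tokens, phrase_terms, slop=0):
    """Match phrase terms in order, allowing `slop` skipped positions in total."""
    positions = {}
    for term, pos in doc_tokens:
        positions.setdefault(term, []).append(pos)

    def search(index, previous_pos, used_slop):
        if index == len(phrase_terms):
            return True
        for pos in positions.get(phrase_terms[index], []):
            if previous_pos is None:
                gap = 0  # the first term may start anywhere
            elif pos > previous_pos:
                gap = pos - previous_pos - 1  # positions skipped between terms
            else:
                continue  # simplified model: in-order matches only
            total = used_slop + gap
            if total <= slop and search(index + 1, pos, total):
                return True
        return False

    return search(0, None, 0)


if __name__ == "__main__":
    doc = analyze("a de la b", FRENCH_STOPWORDS)
    print(doc)                                       # [('a', 1), ('b', 4)]
    print(phrase_matches(doc, ["a", "b"]))           # False
    print(phrase_matches(doc, ["a", "b"], slop=2))   # True
    print(phrase_matches(analyze("a b"), ["a", "b"]))  # True
```

In this model the exact phrase fails on "a de la b" even after the stopwords are dropped, because "b" still sits at position 4, while a slop of 2 covers the two-position gap.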
> On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Lucene won’t ignore these unless you tell it to via stopwords.
>
> This is a problem no matter how you look at it. If you do put in stopwords,
> the word _positions_ are retained. In your example,
>
> word  position
> a     1
> de    2
> la    3
> b     4
>
> If you remove “de” and “la” via stopwords, the positions are still:
>
> word  position
> a     1
> b     4
>
> So searching for “a b” would fail in the second case unless you included
> “slop” as
> “a b”~2
>
> But let’s say you _do not_ have input with these stopwords, just “a b”. The
> positions will be 1 and 2 respectively. Here the user would expect “a b” to
> match this doc, but not a doc with “a de la b” (unless they knew a lot about
> search!).
>
> So maybe the right thing to do is let phrases have slop as a matter of course.
>
> Best,
> Erick
>
>> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
>>
>> Thanks Erick, there is a pattern I can't catch in my results, such as:
>> a de la b
>> I catch “a b” though.
>> I thought Lucene might ignore those automatically while creating the index.
>>
>>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Use stopwords, although it's becoming less of a concern, why do you think
>>> you need to?
>>>
>>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
>>>>
>>>> Hi,-
>>>> What is the (most efficient) way to
>>>> ignore “de la” kinda connectors
>>>> in a string at index or search time?
>>>> Thanks
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org