I guess so. What is a phrase search? If “c b” is searched, do you expect to find “a de la b”?
Thanks
> On Feb 24, 2019, at 10:49 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Not sure we’re talking about the same thing. I was talking specifically about
> _phrase_ searches. If all you want is the clause you just said, phrases are
> not involved at all and the presence or absence of intervening words is
> irrelevant. This assumes your field type tokenizes the input similarly
> to the text_general field in the examples. Specifically _not_ “string” fields
> or fields that use KeywordTokenizer.
>
> q=name:(a AND b) OR name:b
>
> for instance. With a query like that it doesn’t matter in the least whether
> there are, or are not, any words between “a” and “b”.
>
> All that may be obvious to you, but when I read your latest e-mail it
> occurred to me that we might not be talking about the same thing.
>
> Best,
> Erick
>
>> On Feb 23, 2019, at 7:33 PM, baris.kazar <baris.ka...@oracle.com> wrote:
>>
>> In this case the search string is “c b”,
>> and the search query has 8 combinations,
>> including two cases with c b ~, which means find all documents containing c AND b, and c
>> OR b (two separate queries using ~).
>> With those I can find “a b” but not “a de la b” without French stopwords.
>> Thanks
>>
>>> On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Lucene won’t ignore these unless you tell it to via stopwords.
>>>
>>> This is a problem no matter how you look at it. If you do put in stopwords,
>>> the word _positions_ are retained. In your example:
>>>
>>> word  position
>>> a     1
>>> de    2
>>> la    3
>>> b     4
>>>
>>> If you remove “de” and “la” via stopwords, the positions are still:
>>>
>>> word  position
>>> a     1
>>> b     4
>>>
>>> So searching for “a b” would fail in the second case unless you included
>>> “slop”, as in
>>> "a b"~2
>>>
>>> But let’s say you _do not_ have input with these stopwords, just “a b”. The
>>> positions will be 1 and 2 respectively.
>>> Here the user would expect “a b” to match this doc, but
>>> not a doc with “a de la b” (unless they knew a lot about search!).
>>>
>>> So maybe the right thing to do is let phrases have slop as a matter of
>>> course.
>>>
>>> Best,
>>> Erick
>>>
>>>
>>>> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
>>>>
>>>> Thanks, Erick. There is a pattern I can't catch in my results, such as:
>>>> a de la b
>>>> I do catch “a b”, though.
>>>> I thought Lucene might ignore those automatically while creating the index.
>>>>
>>>>
>>>>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Use stopwords, although it's becoming less of a concern. Why do you think
>>>>> you need to?
>>>>>
>>>>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> What is the (most efficient) way to
>>>>>> ignore connectors like “de la”
>>>>>> in a string at index or search time?
>>>>>> Thanks
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
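To make the position and slop behavior described in this thread concrete, here is a minimal sketch in plain Java (standard library only, not Lucene). It models a field as a map from term to positions, drops an assumed stopword set {“de”, “la”} while still advancing the position counter (the gap shown in Erick's tables), and checks phrase matches with slop. The class and method names (PhraseSlopDemo, index, phraseMatch, booleanMatch) are hypothetical; in Lucene itself this would be an analyzer with a StopFilter plus a PhraseQuery with slop. The phrase check is simplified: it only considers terms in query order, whereas Lucene's sloppy phrase matching can also match reordered terms at a higher slop cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stdlib-only model of the thread's examples; this is NOT Lucene's API.
public class PhraseSlopDemo {

    // Assumed stopword list: the French connectors from the thread.
    static final Set<String> STOPWORDS = Set.of("de", "la");

    // Maps each kept term to its positions (1-based, like the tables in the thread).
    // A removed stopword still consumes a position, leaving a gap.
    static Map<String, List<Integer>> index(String text, boolean removeStopwords) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int position = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            position++;
            if (removeStopwords && STOPWORDS.contains(token)) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    // Simplified in-order phrase match: the phrase terms must occur left to right,
    // and the extra positions between the first and last term must not exceed slop.
    static boolean phraseMatch(Map<String, List<Integer>> postings, List<String> phrase, int slop) {
        List<Integer> starts = postings.getOrDefault(phrase.get(0), List.of());
        for (int start : starts) {
            int previous = start;
            boolean complete = true;
            for (int i = 1; i < phrase.size() && complete; i++) {
                int next = -1;
                // Take the earliest occurrence after the previous term (lists are ascending).
                for (int p : postings.getOrDefault(phrase.get(i), List.of())) {
                    if (p > previous) {
                        next = p;
                        break;
                    }
                }
                if (next < 0) {
                    complete = false;
                } else {
                    previous = next;
                }
            }
            int extraPositions = (previous - start) - (phrase.size() - 1);
            if (complete && extraPositions <= slop) {
                return true;
            }
        }
        return false;
    }

    // Term-level query name:(a AND b) OR name:b: positions play no role.
    static boolean booleanMatch(Map<String, List<Integer>> postings) {
        boolean hasA = postings.containsKey("a");
        boolean hasB = postings.containsKey("b");
        return (hasA && hasB) || hasB;
    }

    public static void main(String[] args) {
        List<String> ab = List.of("a", "b");
        Map<String, List<Integer>> plain = index("a b", true);
        Map<String, List<Integer>> withConnectors = index("a de la b", true);

        System.out.println(withConnectors);                     // {a=[1], b=[4]}
        System.out.println(phraseMatch(plain, ab, 0));          // "a b" on "a b": true
        System.out.println(phraseMatch(withConnectors, ab, 0)); // "a b" on "a de la b": false
        System.out.println(phraseMatch(withConnectors, ab, 1)); // "a b"~1: false
        System.out.println(phraseMatch(withConnectors, ab, 2)); // "a b"~2: true
        System.out.println(booleanMatch(withConnectors));       // true
    }
}
```

In this model the exact phrase “a b” fails on “a de la b” because removing the stopwords leaves a two-position gap, "a b"~2 absorbs that gap, and the term-level query matches whether or not there are words in between.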