There is PhraseQuery, too, but let's consider two cases:

case 1: PhraseQuery is not being used:
then should I add the French stopwords to the standard filter’s stopwords at both index and search time? Can I just add them at search time and keep the old index as it is?

case 2: PhraseQuery is being used:
I guess I need to play with the “slop”, and stopwords will not help in this case, right?

Thanks

> On Feb 24, 2019, at 8:02 PM, baris.kazar <baris.ka...@oracle.com> wrote:
>
> There is PhraseQuery, too, but lets consider two cases:
>
> case1: that PhraseQuery is not being used:
> then should i add to standard filter’s stopwords also the french stopwords
> both at index & search times? can i just add them at search time and keep old
> index as it is?
>
> case2: that PhraseQuery being used:
> i guess i need to play with the “slops” and stopwords in this case will not
> help, right?
>
> Thanks
>
>> On Feb 24, 2019, at 2:25 PM, baris.kazar <baris.ka...@oracle.com> wrote:
>>
>> That is not what i am looking for. Thanks.
>>
>> c b search string finds
>> a b
>> but how cant find
>> a de la b
>> so i will try french stopwords.
>> Doing that i am using 8 queries like the ones i mentioned.
>> Best
>>
>>> On Feb 24, 2019, at 1:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Phrase search is looking for words next to each other. A phrase search on
>>> the text “my dog has fleas” would succeed for “my dog” or “has fleas” but
>>> not “my fleas” since the words are not right next to each other. “my
>>> fleas”~3 would succeed because the “~3” indicates that the words can have
>>> intervening terms.
>>>
>>> Searching (dog AND fleas) would match no matter how many words were between
>>> the two.
>>>
>>> If you’re unclear about what phrase search vs. non-phrase search means,
>>> some background research/self-education is strongly recommended; such
>>> basic understanding of search is pretty much assumed.
>>>
>>> Best,
>>> Erick
>>>
>>>> On Feb 24, 2019, at 9:25 AM, baris.kazar <baris.ka...@oracle.com> wrote:
>>>>
>>>> i guess so
>>>> what is phrase search?
>>>> c b is searched do you expect a de la b?
>>>> Thanks
>>>>
>>>>> On Feb 24, 2019, at 10:49 AM, Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Not sure we’re talking about the same thing. I was talking specifically
>>>>> about _phrase_ searches. If all you want is the clause you just said,
>>>>> phrases are not involved at all and the presence or absence of
>>>>> intervening words is totally unnecessary. This assumes your field type
>>>>> tokenizes the input similar to the text_general field in the examples.
>>>>> Specifically _not_ “string” fields or fields that use KeywordTokenizer.
>>>>>
>>>>> q=name:(a AND b) OR name:b
>>>>>
>>>>> for instance. With a query like that it doesn’t matter in the least
>>>>> whether there are, or are not any words between “a” and “b”.
>>>>>
>>>>> All that may be obvious to you, but when I read your latest e-mail it
>>>>> occurred to me that we might not be talking about the same thing.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>>> On Feb 23, 2019, at 7:33 PM, baris.kazar <baris.ka...@oracle.com> wrote:
>>>>>>
>>>>>> In this case search string is c b
>>>>>> and then search query has 8 combos
>>>>>> including two cases with c b ~ which means find all containing c And b
>>>>>> and c Or b ( two separate queries having ~ )
>>>>>> and then i can find a b but not a de la b without French stopwords.
>>>>>> Thanks
>>>>>>
>>>>>>> On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerick...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Lucene won’t ignore these unless you tell it to via stopwords.
>>>>>>>
>>>>>>> This is a problem no matter how you look at it. If you do put in
>>>>>>> stopwords, the word _positions_ are retained.
>>>>>>> In your example,
>>>>>>>
>>>>>>> word   position
>>>>>>> a      1
>>>>>>> de     2
>>>>>>> la     3
>>>>>>> b      4
>>>>>>>
>>>>>>> If you remove “de” and “la” via stopwords, the positions are still:
>>>>>>>
>>>>>>> word   position
>>>>>>> a      1
>>>>>>> b      4
>>>>>>>
>>>>>>> So searching for “a b” would fail in the second case unless you
>>>>>>> included “slop” as
>>>>>>> “a b”~2
>>>>>>>
>>>>>>> But let’s say you _do not_ have input with these stopwords, just “a b”.
>>>>>>> The positions will be 1 and 2 respectively. Here the user would expect
>>>>>>> “a b” to match this doc, but not a doc with “a de la b” (unless they
>>>>>>> knew a lot about search!).
>>>>>>>
>>>>>>> So maybe the right thing to do is let phrases have slop as a matter of
>>>>>>> course.
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>>> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Thanks Erick there is a pattern i cant catch in my results such as:
>>>>>>>> a de la b
>>>>>>>> i catch “a b” though.
>>>>>>>> I thought Lucene might ignore those automatically while creating index.
>>>>>>>>
>>>>>>>>> On Feb 23, 2019, at 12:29 PM, Erick Erickson
>>>>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Use stopwords, although it's becoming less of a concern, why do you
>>>>>>>>> think you need to?
>>>>>>>>>
>>>>>>>>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,-
>>>>>>>>>> What is the (most efficient) way to
>>>>>>>>>> ignore “de la” kinda connectors
>>>>>>>>>> in a string at index or search time?
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
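
For anyone following the position argument in this thread, here is a toy model in plain Python. It is only a sketch and does not call Lucene: `analyze`, `min_slop`, `phrase_matches` and the `STOPWORDS` set are hypothetical names made up for this illustration, and the slop calculation only handles in-order matches, which is a simplification of what a real sloppy phrase match does. It shows that removing "de" and "la" while keeping positions leaves a gap, so the phrase "a b" still needs a slop of 2 against "a de la b".

```python
from itertools import product

# Hypothetical stopword set for this illustration only.
STOPWORDS = frozenset({"de", "la"})


def analyze(text, stopwords=frozenset()):
    """Split on whitespace and return (term, position) pairs.

    Positions start at 1, as in the tables in this thread. A removed
    stopword still consumes its position, so gaps are preserved.
    """
    tokens = []
    for position, term in enumerate(text.lower().split(), start=1):
        if term not in stopwords:
            tokens.append((term, position))
    return tokens


def min_slop(doc_tokens, phrase_terms):
    """Smallest slop that lets phrase_terms match in order, or None.

    Simplified model: only in-order matches are considered, and the
    slop is the number of extra positions between first and last term.
    """
    if not phrase_terms:
        return None
    candidates = [
        [pos for term, pos in doc_tokens if term == wanted]
        for wanted in phrase_terms
    ]
    best = None
    for combo in product(*candidates):
        # Skip combinations where the terms are not strictly in order.
        if any(b <= a for a, b in zip(combo, combo[1:])):
            continue
        extra = (combo[-1] - combo[0]) - (len(combo) - 1)
        if best is None or extra < best:
            best = extra
    return best


def phrase_matches(doc_tokens, phrase, slop=0):
    """True if the phrase matches the document within the given slop."""
    needed = min_slop(doc_tokens, phrase.lower().split())
    return needed is not None and needed <= slop


if __name__ == "__main__":
    doc = analyze("a de la b", STOPWORDS)
    print(doc)
    print(phrase_matches(doc, "a b"))
    print(phrase_matches(doc, "a b", slop=2))
```

In this model the required slop is the same whether or not the stopwords are removed, because the positions of "a" and "b" do not change. That matches the point made above that stopwords alone do not make the exact phrase "a b" match "a de la b".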