I'm not sure we’re talking about the same thing. I was talking specifically about 
_phrase_ searches. If all you want is the clause you just described, phrases are not 
involved at all, and the presence or absence of intervening words is 
irrelevant. This assumes your field type tokenizes the input similarly to the 
text_general field in the examples, specifically _not_ “string” fields or 
fields that use KeywordTokenizer. 

q=name:(a AND b) OR name:b

for instance. With a query like that, it doesn’t matter in the least whether 
there are any words between “a” and “b”.
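
To make that concrete, here is a rough sketch in plain Java of why positions play no part in a boolean query like the one above. It is a simplified model, not Lucene's implementation: the class BooleanMatchSketch and its methods are hypothetical names, and the whitespace split is only an assumed stand-in for a tokenizing field type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified model of boolean term matching. Not Lucene code; all names are hypothetical.
class BooleanMatchSketch {

    // Assumed stand-in for a tokenizing field type: lowercase, then split on whitespace.
    static Set<String> tokens(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return new HashSet<>(Arrays.asList(words));
    }

    // Evaluates name:(a AND b) OR name:b against a single document's text.
    // Only term presence is checked; positions are never consulted.
    static boolean matches(String text, String a, String b) {
        Set<String> t = tokens(text);
        return (t.contains(a) && t.contains(b)) || t.contains(b);
    }

    public static void main(String[] args) {
        System.out.println(matches("a b", "a", "b"));
        System.out.println(matches("a de la b", "a", "b"));
    }
}
```

Both documents are treated the same way because the check only asks whether each term is present somewhere in the field.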

All that may be obvious to you, but when I read your latest e-mail it occurred 
to me that we might not be talking about the same thing.
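
For contrast, the phrase case from my earlier mail (quoted below) can be sketched the same way. Again this is only a simplified model in plain Java, not Lucene code: PhraseSlopSketch, its hard-coded stopword set, and the two-term in-order matching are assumptions for illustration, and only the first occurrence of each term is considered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model of position-preserving stopword removal and two-term phrase slop.
// Not Lucene code; all names are hypothetical.
class PhraseSlopSketch {

    // Assumed stopword list for the example.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("de", "la"));

    // Returns the 1-based position of the first occurrence of `term`, or -1 if the
    // term is absent or is a stopword. Removed stopwords still consume a position,
    // so the surviving terms keep their original positions.
    static int position(String text, String term) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (!STOPWORDS.contains(words[i]) && words[i].equals(term)) {
                return i + 1;
            }
        }
        return -1;
    }

    // In-order phrase "first second"~slop: at most `slop` positions may lie between the terms.
    static boolean phraseMatches(String text, String first, String second, int slop) {
        int p1 = position(text, first);
        int p2 = position(text, second);
        if (p1 < 0 || p2 < 0 || p2 <= p1) {
            return false;
        }
        return (p2 - p1 - 1) <= slop;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("a de la b", "a", "b", 0));
        System.out.println(phraseMatches("a de la b", "a", "b", 2));
    }
}
```

With “a de la b”, the terms sit at positions 1 and 4, so the exact phrase fails and a slop of 2 is needed, which is the same arithmetic as in the position table.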

Best,
Erick

> On Feb 23, 2019, at 7:33 PM, baris.kazar <baris.ka...@oracle.com> wrote:
> 
> In this case the search string is c b,
> and the search query has 8 combinations,
> including two cases with c b ~, which means find everything containing c AND b, and c 
> OR b (two separate queries using ~).
> With that I can find a b, but not a de la b, without French stopwords.
> Thanks
> 
>> On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> Lucene won’t ignore these unless you tell it to via stopwords.
>> 
>> This is a problem no matter how you look at it. If you do put in stopwords, 
>> the word _positions_ are retained. In your example,
>> word  position
>> a     1
>> de    2
>> la    3
>> b     4
>> 
>> If you remove “de” and “la” via stopwords, the positions are still:
>> 
>> word  position
>> a     1
>> b     4
>> 
>> So searching for “a b” would fail in the second case unless you included 
>> “slop” as
>> “a b”~2
>> 
>> But let’s say you _do not_ have input with these stopwords, just “a b”. The 
>> positions will be 1 and 2 respectively. Here the user would expect “a b” to 
>> match this doc, but not a doc with “a de la b” (unless they knew a lot about 
>> search!).
>> 
>> So maybe the right thing to do is let phrases have slop as a matter of 
>> course.
>> 
>> Best,
>> Erick
>> 
>> 
>>> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
>>> 
>>> Thanks Erick. There is a pattern I can't catch in my results, such as:
>>> a de la b
>>> I do catch “a b”, though.
>>> I thought Lucene might ignore those automatically while creating the index.
>>> 
>>> 
>>>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Use stopwords, although it's becoming less of a concern. Why do you think
>>>> you need to?
>>>> 
>>>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> What is the (most efficient) way to
>>>>> ignore connectors like “de la”
>>>>> in a string at index or search time?
>>>>> Thanks
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org