Lucene won’t ignore these unless you tell it to via stopwords.

This is a problem no matter how you look at it. If you do put in stopwords, the
word _positions_ are retained. In your example:

word     position
a        1
de       2
la       3
b        4

If you remove “de” and “la” via stopwords, the positions are still:

word     position
a        1
b        4
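
To make the position gap concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class `PositionSketch` and its `positions` method are hypothetical names made up for illustration, and it assumes each word occurs only once in the input. It only models the behavior described above, where every token advances the position counter but stopwords are not kept.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not part of Lucene.
public class PositionSketch {

    // Assigns 1-based positions to all tokens, then keeps only the
    // non-stopwords WITHOUT renumbering, so removed words leave gaps.
    // Assumes tokens are distinct (a repeated token would overwrite its entry).
    public static Map<String, Integer> positions(List<String> tokens, Set<String> stopwords) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        int position = 0;
        for (String token : tokens) {
            position++;                      // every token advances the position
            if (!stopwords.contains(token)) {
                kept.put(token, position);   // only non-stopwords are retained
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("de", "la");
        // "a de la b": b keeps position 4 even though "de" and "la" are dropped
        System.out.println(positions(List.of("a", "de", "la", "b"), stopwords)); // {a=1, b=4}
        // "a b": no gap, so b sits at position 2
        System.out.println(positions(List.of("a", "b"), stopwords));             // {a=1, b=2}
    }
}
```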

So searching for “a b” would fail in the second case, because “b” is not at the
position right after “a”, unless you included “slop”, as in
“a b”~2
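
As a rough sketch of why a slop of 2 is enough here, the plain-Java model below checks a two-word phrase against stored positions. It is a deliberate simplification and not Lucene's actual matching code: the class `SlopSketch` and method `phraseMatches` are hypothetical, and the model only covers two terms appearing in query order (real sloppy phrase matching also deals with reordering and longer phrases).

```java
// Hypothetical illustration, not part of Lucene.
public class SlopSketch {

    // Simplified model: for an in-order two-term phrase, the second term
    // is expected at firstPos + 1. The number of extra positions between
    // the terms must not exceed the allowed slop.
    public static boolean phraseMatches(int firstPos, int secondPos, int slop) {
        int gap = secondPos - firstPos - 1;  // positions skipped between the terms
        return gap >= 0 && gap <= slop;
    }

    public static void main(String[] args) {
        // "a de la b" with stopwords removed: a=1, b=4
        System.out.println(phraseMatches(1, 4, 0)); // false: exact phrase "a b" fails
        System.out.println(phraseMatches(1, 4, 2)); // true: "a b"~2 matches
        // plain "a b": a=1, b=2
        System.out.println(phraseMatches(1, 2, 0)); // true: exact phrase matches
    }
}
```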

But let’s say you _do not_ have input with these stopwords, just “a b”. The
positions will be 1 and 2 respectively. Here the user would expect “a b” to
match this doc, but not a doc with “a de la b” (unless they knew a lot about
search!).

So maybe the right thing to do is let phrases have slop as a matter of course.

Best,
Erick


> On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
> 
> Thanks Erick, there is a pattern I can't catch in my results, such as:
> a de la b
> I do catch “a b”, though.
> I thought Lucene might ignore those automatically while creating the index.
> 
> 
>> On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> Use stopwords, although it's becoming less of a concern. Why do you think
>> you need to?
>> 
>>> On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
>>> 
>>> Hi,-
>>> What is the (most efficient) way to
>>> ignore “de la” kinda connectors
>>> in a string at index or search time?
>>> Thanks
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> 
>>> 
> 
> 
> 


