There is PhraseQuery, too, but let's consider two cases:
Case 1: PhraseQuery is not being used.
Should I then add the French stopwords to the standard filter’s
stopwords at both index and search time? Or can I just add them at
search time and keep the old friends index as it is?
Case 2: PhraseQuery is being used.
I guess I need to play with the “slop”, and stopwords will not help in
this case, right?
Thanks
On Feb 24, 2019, at 2:25 PM, baris.kazar <baris.ka...@oracle.com> wrote:
That is not what I am looking for. Thanks.
The search string “c b” finds
a b
but somehow cannot find
a de la b
so I will try French stopwords.
In doing that, I am using 8 queries like the ones I mentioned.
Best
On Feb 24, 2019, at 1:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:
Phrase search is looking for words next to each other. A phrase search
on the text “my dog has fleas” would succeed for “my dog” or “has fleas”
but not “my fleas”, since the words are not right next to each other.
“my fleas”~3 would succeed because the “~3” indicates that the words can
have intervening terms.
Searching (dog AND fleas) would match no matter how many words were
between the two.
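
As a rough illustration of the difference, here is a minimal Python sketch. It is a toy model, not Lucene's implementation or API: the helper names (`tokenize`, `phrase_matches`, `and_matches`) are made up for this example, and slop is approximated as the spread of the term positions after subtracting each term's offset within the phrase.

```python
from itertools import product

def tokenize(text):
    """Lowercase whitespace tokenizer; returns (term, position) pairs from 0."""
    return [(term, pos) for pos, term in enumerate(text.lower().split())]

def positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for term, pos in tokens:
        index.setdefault(term, []).append(pos)
    return index

def phrase_matches(tokens, phrase, slop=0):
    """Toy sloppy-phrase check (an approximation, not Lucene's scorer)."""
    index = positions(tokens)
    terms = phrase.lower().split()
    if any(t not in index for t in terms):
        return False
    for combo in product(*(index[t] for t in terms)):
        # Subtract each term's offset in the phrase; an exact phrase
        # collapses to a single value, so the spread is the slop needed.
        adjusted = [p - i for i, p in enumerate(combo)]
        if max(adjusted) - min(adjusted) <= slop:
            return True
    return False

def and_matches(tokens, terms):
    """Boolean AND: every term must occur somewhere; positions are ignored."""
    index = positions(tokens)
    return all(t.lower() in index for t in terms)

doc = tokenize("my dog has fleas")
print(phrase_matches(doc, "my dog"))            # True
print(phrase_matches(doc, "has fleas"))         # True
print(phrase_matches(doc, "my fleas"))          # False
print(phrase_matches(doc, "my fleas", slop=3))  # True
print(and_matches(doc, ["dog", "fleas"]))       # True
```

The point of the sketch is only that the phrase check consults positions while the AND check does not.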
If you’re unclear about what phrase search vs. non-phrase search means,
some background research and self-education is strongly recommended;
such basic understanding of search is pretty much assumed.
Best,
Erick
On Feb 24, 2019, at 9:25 AM, baris.kazar <baris.ka...@oracle.com> wrote:
I guess so.
What is phrase search?
If “c b” is searched, do you expect “a de la b”?
Thanks
On Feb 24, 2019, at 10:49 AM, Erick Erickson <erickerick...@gmail.com> wrote:
Not sure we’re talking about the same thing. I was talking specifically
about _phrase_ searches. If all you want is the clause you just said,
phrases are not involved at all and the presence or absence of
intervening words is irrelevant. This assumes your field type tokenizes
the input similarly to the text_general field in the examples.
Specifically _not_ “string” fields or fields that use KeywordTokenizer.
q=name:(a AND b) OR name:b
for instance. With a query like that it doesn’t matter in the least
whether there are, or are not, any words between “a” and “b”.
All that may be obvious to you, but when I read your latest e-mail it
occurred to me that we might not be talking about the same thing.
Best,
Erick
On Feb 23, 2019, at 7:33 PM, baris.kazar <baris.ka...@oracle.com> wrote:
In this case the search string is “c b”, and the search query has 8
combinations, including two cases with “c b”~, which means find
everything containing c AND b, and c OR b (two separate queries using ~).
With those I can find “a b” but not “a de la b” without French
stopwords.
Thanks
On Feb 23, 2019, at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
Lucene won’t ignore these unless you tell it to via stopwords.
This is a problem no matter how you look at it. If you do put in
stopwords, the word _positions_ are retained. In your example:
word  position
a     1
de    2
la    3
b     4
If you remove “de” and “la” via stopwords, the positions are still:
word  position
a     1
b     4
So searching for “a b” would fail in the second case unless you
included “slop”, as in
“a b”~2
But let’s say you _do not_ have input with these stopwords, just “a b”.
The positions will be 1 and 2 respectively. Here the user would expect
“a b” to match this doc, but not a doc with “a de la b” (unless they
knew a lot about search!).
So maybe the right thing to do is let phrases have slop as a matter
of course.
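
The position bookkeeping described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lucene code: `analyze`, `required_slop`, and the two-word `FRENCH_STOPWORDS` set are hypothetical names for this illustration, and positions start at 1 only to mirror the table above.

```python
# Tiny illustrative stopword set (an assumption, not a real French list).
FRENCH_STOPWORDS = {"de", "la"}

def analyze(text, stopwords=frozenset()):
    """Return {term: [positions]}; a removed stopword still uses up a position."""
    index = {}
    for pos, term in enumerate(text.lower().split(), start=1):
        if term not in stopwords:
            index.setdefault(term, []).append(pos)
    return index

def required_slop(index, first, second):
    """Smallest slop for the in-order phrase "first second", or None if absent."""
    gaps = [q - p - 1
            for p in index.get(first, [])
            for q in index.get(second, [])
            if q > p]
    return min(gaps) if gaps else None

with_stopwords = analyze("a de la b", FRENCH_STOPWORDS)
print(with_stopwords)                           # {'a': [1], 'b': [4]}
print(required_slop(with_stopwords, "a", "b"))  # 2
print(required_slop(analyze("a b"), "a", "b"))  # 0
```

In this model, removing “de” and “la” leaves a gap of two positions, which is why the exact phrase needs ~2 on the first document and no slop on the second.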
Best,
Erick
On Feb 23, 2019, at 11:07 AM, baris.kazar <baris.ka...@oracle.com> wrote:
Thanks Erick, there is a pattern I can't catch in my results, such as:
a de la b
I catch “a b” though.
I thought Lucene might ignore those automatically while creating the
index.
On Feb 23, 2019, at 12:29 PM, Erick Erickson <erickerick...@gmail.com> wrote:
Use stopwords, although it's becoming less of a concern. Why do you
think you need to?
On Sat, Feb 23, 2019, 08:42 baris.kazar <baris.ka...@oracle.com> wrote:
Hi,
What is the (most efficient) way to ignore connectors such as “de la”
in a string at index or search time?
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail:
java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:
java-user-h...@lucene.apache.org