OK. I went through the Javadoc of PhraseQuery and tried using the
position argument to PhraseQuery.add.
Problem encountered:
My text contains: "Still it is not happening and generally i will be
able to complete it at the earliest."
The user enters one of two search strings: 1. "still happening" and
2. "still it is not happening".
Based on what I understood, for the first input I add "still" at
position 0 and "happening" at position 1 of the PhraseQuery. This does
not give me any hit.
For the second input, do I need to add "still" at 0 and "happening" at 4
in the PhraseQuery? That would mean storing the stopwords locally and
parsing every user input to discard the stopwords and extract the
remaining terms before searching. That does not look like a feasible
solution.
It does give me a hit, but parsing user input this way is not something
I would recommend implementing.
On 7/26/2013 6:49 PM, Michael McCandless wrote:
Have a look at the position argument to PhraseQuery.add: it lets you
control where this new term is in the phrase.
So to search for "wizard of oz" when "of" is a stopword you would add
"wizard" at position 0 and "oz" at position 2.
This is different from slop, which allows for "fuzzy" matching of the
phrase, e.g. if you pass slop of 4 (I think) then your search for
"wizard of oz" could match a document containing "oz of wizard".
Yes, ShingleFilter bloats the index, but CommonGramsFilter lets you
only pair up a specific subset of tokens, so the bloat is much less.
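To make the position argument concrete, here is a minimal plain-Java
sketch of exact (slop 0) positional phrase matching. It does not call
Lucene at all: the class PositionalPhraseSketch, its methods, the
regex tokenizer and the two-word stop list are all assumptions made up
for illustration. It only shows why "wizard" at 0 and "oz" at 2 lines up
with a document whose stop word left a hole, while "oz" at 1 does not.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not how Lucene is implemented.
final class PositionalPhraseSketch {

    /** Maps each kept term to its positions. Stop words are dropped but still consume a position. */
    static Map<String, Set<Integer>> index(String text, Set<String> stopWords) {
        Map<String, Set<Integer>> positions = new HashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            if (!stopWords.contains(token)) {
                positions.computeIfAbsent(token, t -> new TreeSet<>()).add(position);
            }
            position++; // a removed stop word leaves a hole at this position
        }
        return positions;
    }

    /** True if, for some start offset, every term occurs at start + its relative position. */
    static boolean matches(Map<String, Set<Integer>> index, String[] terms, int[] relativePositions) {
        if (terms.length == 0) {
            return false;
        }
        for (int docPosition : index.getOrDefault(terms[0], Set.of())) {
            int start = docPosition - relativePositions[0];
            boolean allFound = true;
            for (int i = 0; i < terms.length && allFound; i++) {
                allFound = index.getOrDefault(terms[i], Set.of())
                        .contains(start + relativePositions[i]);
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "of"); // assumed stop list
        Map<String, Set<Integer>> doc = index("The Wizard of Oz", stopWords);
        String[] terms = {"wizard", "oz"};
        System.out.println(matches(doc, terms, new int[] {0, 2})); // true: the hole for "of" is kept
        System.out.println(matches(doc, terms, new int[] {0, 1})); // false: "oz" is two positions after "wizard"
    }
}
```

Only the relative positions matter in this sketch, which is why the
query can start at 0 even though "wizard" sits at position 1 in the
document.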
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jul 26, 2013 at 7:34 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
Hello, can you elaborate more on this? I seem to be lost here.
Since I am new to Lucene, yesterday I was going through ShingleFilter and
its application. It seems to be a kind of n-gram approach, and it bloats the
index as Mike has mentioned.
As of now I am only concerned with the appropriate way to solve this problem.
With PhraseQuery, if I specify terms, do you also want me to specify
slop? If I don't supply slop it defaults to an exact match. However,
due to stopwords this PhraseQuery was not giving me any hits, and hence I
raised this question.
I still don't know where to start with this problem or how to solve it.
I am sure this is supported by Lucene, but perhaps a bit more
explanation and guidance will do the trick for me.
On 7/24/2013 6:06 PM, Michael McCandless wrote:
With PhraseQuery you can specify where each term must occur in the phrase.
So X must occur in position 0, David in position 1, and then manager
in position 4 (skipping 2 holes).
QueryParser does this for you: when it analyzes the user's phrase, if
the resulting tokens have holes, then it sets the positions
accordingly.
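As a rough picture of what "sets the positions accordingly" means, the
following plain-Java sketch drops stop words from a phrase while keeping
each surviving term's original position. It is only an illustration and
not QueryParser's actual code: the class PhrasePositionsSketch, the
regex tokenizer and the four-word stop list are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; the real work is done by the analyzer and QueryParser.
final class PhrasePositionsSketch {

    /** Returns (term, position) pairs; stop words are removed but their positions are not reused. */
    static List<Map.Entry<String, Integer>> termsWithPositions(String phrase, Set<String> stopWords) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        int position = 0;
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            if (!stopWords.contains(token)) {
                result.add(Map.entry(token, position));
            }
            position++; // advance even for stop words, so holes are preserved
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("is", "a", "of", "the"); // assumed stop list
        // prints [x=0, david=1, manager=4, company=7]
        System.out.println(termsWithPositions("X David is a manager of the company.", stopWords));
    }
}
```

Each pair is what would be handed to PhraseQuery.add as a term and its
position. Because QueryParser already derives these positions from the
analyzer, the application should not need its own stop-word parsing.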
And I agree: shingles are a good solution here too, but they make your
index larger. CommonGramsFilter lets you shingle only specific words,
e.g. you could pass your stop words to it.
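The idea of shingling only specific words can be sketched in plain Java
as below. This is a simplified illustration and not CommonGramsFilter
itself: the class CommonGramsSketch, the "_" separator and the order of
the emitted tokens are assumptions of the sketch. A two-word token is
emitted only when at least one word of the pair is in the common-word
set, so ordinary word pairs add nothing to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; it does not use any Lucene classes.
final class CommonGramsSketch {

    /** Emits every token, plus a joined pair wherever either neighbour is a common word. */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next); // pair only around common words
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> commonWords = Set.of("of"); // assumed: the stop words are passed as common words
        // prints [wizard, wizard_of, of, of_oz, oz]
        System.out.println(commonGrams(List.of("wizard", "of", "oz"), commonWords));
        // prints [senior, most, manager] because no pair contains a common word
        System.out.println(commonGrams(List.of("senior", "most", "manager"), commonWords));
    }
}
```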
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 24, 2013 at 7:34 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
I tried using PhraseQuery with slop. Since I am specifying the slop, I
also need to specify the second term. In my case the second term is not
present: the whole string to be searched is still one single term.
How do I skip the holes created by stopwords? I do not know beforehand
how many stop words are skipped or what string the user is going to enter.
Is there a definite way to skip the holes created by stopwords?
I was now looking at MultiPhraseQuery, splitting the user-provided string
on spaces and providing each word as a term to MultiPhraseQuery.
Will it help? Is there any alternative?
On 7/24/2013 4:48 PM, Michael McCandless wrote:
PhraseQuery?
You can skip the holes created by stopwords ... e.g. QueryParser does
this. I.e., the PhraseQuery becomes "X David _ _ manager _ _ company"
if is/a/of/the are stop words, which isn't perfect (could return false
matches) but should work well in practice ...
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 24, 2013 at 4:31 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
Dear All,
Suppose I have 3 documents. The sample text is:
File 1:
Mr X David is a manager of the company. He is the senior most manager. I
also want to become manager of the company.
File 2:
Mr X David manager of the company is also very senior. He happens to be the
senior most manager. I wish even I could reach that place.
File 3:
Mr X David is working for a company. He happens to be the manager of the
company. In fact he is the senior most manager. I don't want to become like
him.
String I wish to search: "X David is a manager of the company."
Ideally I should get only File 1 in the hit result.
I have no clue how to achieve this. Basically I am trying to match part
of a sentence or a complete sentence. What would be the best approach?
I presume "is" and "a" are stop words and will be skipped during indexing
by the StandardAnalyzer.
What I wonder is how I can then search for part of a sentence or a
complete sentence if the sentence contains some or many stopwords.
I am using StandardAnalyzer. Please guide.
--
Regards
Ankit
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Regards
Ankit Murarka
"Peace is found not in what surrounds us, but in what we hold within."