Have a look at the position argument to PhraseQuery.add: it lets you control where this new term is in the phrase.
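To make the positions concrete, here is a minimal plain-Java sketch of the bookkeeping involved. It only models the idea and does not call Lucene: the class and the helper name keptPositions are hypothetical, and it assumes the phrase has already been lowercased and split into tokens. Each (position, term) pair it returns is what you would then hand to PhraseQuery.add, with the term wrapped in a Term for your field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model only: no Lucene classes are used here.
final class PhrasePositions {

    // Hypothetical helper, not part of Lucene. Maps each surviving token's
    // original position to the token, so dropped stopwords leave holes
    // in the position numbering instead of shifting later terms left.
    static Map<Integer, String> keptPositions(List<String> tokens, Set<String> stopwords) {
        Map<Integer, String> kept = new LinkedHashMap<>();
        for (int position = 0; position < tokens.size(); position++) {
            String token = tokens.get(position);
            if (!stopwords.contains(token)) {
                kept.put(position, token);
            }
        }
        return kept;
    }
}
```

For the tokens wizard, of, oz with "of" as the only stopword, the helper keeps wizard at position 0 and oz at position 2, leaving position 1 as a hole.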
So to search for "wizard of oz" when "of" is a stopword, you would add
"wizard" at position 0 and "oz" at position 2.

This is different from slop, which allows for "fuzzy" matching of the
phrase, e.g. if you pass a slop of 4 (I think) then your search for "wizard
of oz" could match a document containing "oz of wizard".

Yes, ShingleFilter bloats the index, but CommonGramsFilter lets you
only pair up a specific subset of tokens, so the bloat is much less.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 26, 2013 at 7:34 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
> Hello can you elaborate more on this.. I seem to be lost over here..
>
> Since I am new to lucene, so yesterday I was going through ShingleFilter and
> its application. Seems like its a kind of a N-Gram thing and it bloats the
> index as Mike have mentioned.
>
> As of now I am only concerned with the appropiate way to solve this problem.
>
> With PhraseQuery if I specify terms, then do you also want me to specify
> slop ? If I dont supply slop it default to specific search match. However
> due to stopwords this phraseQuery was not giving me any hits and hence I
> raised this question.
>
> I still dont know from where to approach this problem and how to solve this.
>
> I am sure this is definitely supported by Lucene but Perhaps a bit more
> explanation and guidance will do the trick for me.
>
>
> On 7/24/2013 6:06 PM, Michael McCandless wrote:
>>
>> With PhraseQuery you can specify where each term must occur in the phrase.
>>
>> So X must occur in position 0, David in position 1, and then manager
>> in position 4 (skipping 2 holes).
>>
>> QueryParser does this for you: when it analyzes the users phrase, if
>> the resulting tokens have holes, then it sets the positions
>> accordingly.
>>
>> And I agree: shingles are a good solution here too, but they make your
>> index larger. CommonGramsFilter lets you shingle only specific words,
>> e.g. you could pass your stop words to it.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Jul 24, 2013 at 7:34 AM, Ankit Murarka
>> <ankit.mura...@rancoretech.com> wrote:
>>
>>>
>>> I tried using Phrase Query with slops. Now since I am specifying the slop I
>>> also need to specify the 2nd term.
>>>
>>> In my case the 2nd term is not present. The whole string to be searched is
>>> still 1 single term.
>>>
>>> How do I skip the holes created by stopwords. I do not know before hand how
>>> many stop words are skipped and what string user is going to enter.
>>>
>>> Is there a definite way to skip the holes created by stopwords.
>>>
>>> I was now looking for MultiphraseQuery splitting the user provided string on
>>> space and providing each word as a term to multiphrasequery.
>>>
>>> Will it help..?? Is there any alternative. ??
>>>
>>>
>>> On 7/24/2013 4:48 PM, Michael McCandless wrote:
>>>
>>>>
>>>> PhraseQuery?
>>>>
>>>> You can skip the holes created by stopwords ... e.g. QueryParser does
>>>> this. Ie, the PhraseQuery becomes "X David _ _ manager _ _ company"
>>>> if is/a/of/the are stop words, which isn't perfect (could return false
>>>> matches) but should work well in practice ...
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 4:31 AM, Ankit Murarka
>>>> <ankit.mura...@rancoretech.com> wrote:
>>>>
>>>>
>>>>>
>>>>> Dear All,
>>>>>
>>>>> Say suppose I have 3 documents. The sample text is
>>>>>
>>>>> /*File 1 : */
>>>>>
>>>>> Mr X David is a manager of the company. He is the senior most manager. I
>>>>> also want to become manager of the company.
>>>>>
>>>>> /*File 2 :*/
>>>>>
>>>>> Mr X David manager of the company is also very senior. He happens to be the
>>>>> senior most manager. I wish even I could reach that place.
>>>>>
>>>>> /*File 3:*/
>>>>>
>>>>> Mr X David is working for a company. He happens to be the manager of the
>>>>> company.Infact he is the senior most manager.
>>>>> I dont want to become like
>>>>> him.
>>>>>
>>>>> /*String I wish to search :* X David is a manager of the company./
>>>>>
>>>>> Ideally I should get only file1 in the hit result.
>>>>>
>>>>> I have no clue how to achieve this. Basically I am trying to match the part
>>>>> of the sentence or a complete sentence. What can be the best methodology.
>>>>> I presume is a are the stop words and will be skipped during indexing by the
>>>>> StandardAnalyzer.
>>>>>
>>>>> What wonders me how do I then search for a part of the sentence or complete
>>>>> sentence if sentence contains some/many stopwords.
>>>>>
>>>>> I am using StandardAnalyzer. Please guide.
>>>>>
>>>>> --
>>>>> Regards
>>>>>
>>>>> Ankit
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>
>>> --
>>> Regards
>>>
>>> Ankit Murarka
>>>
>>> "Peace is found not in what surrounds us, but in what we hold within."
>>>
>
> --
> Regards
>
> Ankit Murarka
>
> "Peace is found not in what surrounds us, but in what we hold within."