Tomoko,- Yes, I noticed that last night while I was researching it, and thanks for confirming. StandardAnalyzer does not do stemming, so the MAINS case must have some other cause. Best regards
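For anyone reading the archive: the quoted thread below describes "mains" as 1 edit away from the indexed term "main", and the parsed query shows up as mains~2 (up to 2 edits allowed). The sketch below only checks that arithmetic with a plain Levenshtein distance. It is an illustration written for this thread, not Lucene's own implementation, and the class and method names (EditDistanceSketch, distance) are made up for the example. It does not use Lucene at all, so it cannot reproduce or explain the zero-result case; it only supports the point that edit distance by itself is not what excludes MAIN.

```java
import java.util.List;

// Illustrative only: a textbook Levenshtein distance, NOT Lucene's implementation.
// The class and method names are hypothetical, chosen for this example.
class EditDistanceSketch {

    // Minimum number of single-character inserts, deletes, or substitutions
    // needed to turn string a into string b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // "" -> first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexed = "main"; // lowercased form of the indexed term MAIN
        for (String query : List.of("mains", "maink", "maint")) {
            System.out.println(query + " -> " + indexed + ": " + distance(query, indexed));
        }
    }
}
```

By this measure "mains" is no further from "main" than "maink" or "maint" are, and those are reported as working in the thread, which is consistent with the MAINS case having some other cause.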
----- Original Message ----- From: tomoko.uchida.1...@gmail.com To: java-user@lucene.apache.org Sent: Sunday, June 16, 2019 4:39:29 AM GMT -05:00 US/Canada Eastern Subject: Re: FuzzyQuery- why is it ignored? Hi, you said you are using standard analyzer. If so, you are not using any stemmer at all (please see the analyzer's Javadocs). 2019年6月16日(日) 11:43 Baris Kazar <baris.ka...@oracle.com>: > > Hello,- > Erick explained how to disable stemming in Solr but i am using Lucene purely. > i am also researching how to disable it in Lucene but if You have > instructions how to do so already > i appreciate if You could share here. > Best regards > > ----- Original Message ----- > From: baris.ka...@oracle.com > To: java-user@lucene.apache.org, tomoko.uchida.1...@gmail.com, > erickerick...@gmail.com, a...@linux.com, baris.ka...@oracle.com, > luc...@mikemccandless.com > Sent: Thursday, June 13, 2019 10:48:47 AM GMT -05:00 US/Canada Eastern > Subject: Re: FuzzyQuery- why is it ignored? > > i see, i am using an older version 6.6 and we should switch to Your 8.1 > version of at least 7.X. > > Tomoko i think i understood You meant MAIN NASHUA .... for the string :) > > Again i really appreciate all answers. > > How do we disable or enable stemming while indexing? :) another question. > > Best regards > > > On 6/13/19 10:40 AM, Tomoko Uchida wrote: > > Sorry, I made a mistake when copypasting. Let me just correct my previous > > mail. > > > >> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED > >> STATES". > > 1. Indexed this text: "MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW > > HAMPSHIRE UNITED STATES" > > > > ---- > > As far as I can say, this query correctly find the indexed document > > (so I have no idea about what is wrong with fuzzy query). > > +contentDFLT:mains~2 +contentDFLT:"nashua" > > +contentDFLT:"new-hampshire" +contentDFLT:"united states" > > > > I am > > - using lucene 8.1. > > - using standard analyzer for both of indexing and searching. 
> > - using classic query parser for parsing. > > > > > > > > 2019年6月13日(木) 23:18 <baris.ka...@oracle.com>: > >> However, the index does not have MAINS but MAIN for the expected entry. > >> > >> Best regards > >> > >> > >> > >> On 6/13/19 10:33 AM, baris.ka...@oracle.com wrote: > >>> does it consider it as like plural word? :) :) :) > >>> That makes sense. > >>> > >>> Best regards > >>> > >>> > >>> On 6/13/19 10:31 AM, baris.ka...@oracle.com wrote: > >>>> Erick, > >>>> > >>>> Cool, could You give a simple example with my example please? > >>>> > >>>> Best regards > >>>> > >>>> > >>>> > >>>> On 6/13/19 10:12 AM, Erick Erickson wrote: > >>>>> Shot in the dark: stemming. Whenever I see a problem with something > >>>>> ending in “s” (or “er” or “ing” or….) my first suspect is that > >>>>> stemming is turned on. In that case the token in the index that’s > >>>>> actually searched on is somewhat different than you expect. > >>>>> > >>>>> The test is easy, just insure your fieldType contains no stemmers. > >>>>> PorterStemmer is particularly aggressive, but for this case to test > >>>>> I’d just remove all stemming, re-index and see if the results differ. > >>>>> > >>>>> Best, > >>>>> Erick > >>>>> > >>>>>> On Jun 13, 2019, at 7:26 AM, baris.ka...@oracle.com wrote: > >>>>>> > >>>>>> Tomoko,- > >>>>>> > >>>>>> That is strange indeed. > >>>>>> > >>>>>> Something is wrong when i use mains but maink, mainl, mainr,mainq, > >>>>>> maint all work ok any consonant at the end except s works in this > >>>>>> case. > >>>>>> > >>>>>> Case #3 had +contentDFLT:mains~2 but not +contentDFLT:"mains~2". > >>>>>> > >>>>>> i am using fuzzy query with ~ from Query.builder and that is not > >>>>>> PhraseQuery. > >>>>>> > >>>>>> Similarly FuzzyQuery with input "mains" (it has to be lowercase > >>>>>> since it does not go through StandardAnalyzer) is also not > >>>>>> PhraseQuery. > >>>>>> > >>>>>> can there be a clearer sample case for ComplexPhraseQuery please in > >>>>>> the docs? 
> >>>>>> > >>>>>> did You also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED > >>>>>> STATES" the expected output in this case? > >>>>>> > >>>>>> Thanks for spending time on this, i would like to thank everyone. > >>>>>> > >>>>>> Best regards > >>>>>> > >>>>>> > >>>>>> On 6/13/19 12:13 AM, Tomoko Uchida wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>>> Ok, i think only this very specific only "mains" has an issue. > >>>>>>> It looks strange to me. I did some test locally. > >>>>>>> > >>>>>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>> UNITED STATES". > >>>>>>> > >>>>>>> 2a. This query string (just copied from your Case #3) worked > >>>>>>> correctly > >>>>>>> for me as far as I can see. > >>>>>>> +contentDFLT:mains~2 +contentDFLT:"nashua", > >>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united state" > >>>>>>> > >>>>>>> 2b. However this query string got no results. > >>>>>>> +contentDFLT:"mains~2", +contentDFLT:"nashua", > >>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states" > >>>>>>> It is an expected behaviour because the classic query parser does not > >>>>>>> support fuzzy query inside phrase query (as far as I know). > >>>>>>> > >>>>>>> I suspect you use fuzzy query operator (~) inside phrase query > >>>>>>> ("), as > >>>>>>> the 2b case. > >>>>>>> > >>>>>>> FYI: there is a special parser for such complex phrase query. > >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_complexPhrase_ComplexPhraseQueryParser.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=ZcXpaSlwS5DegX76mHTb_6DH3P7noan1eeMXc-Vh5M8&s=FoIMlcjDO2b7Gut9XRx-NIBWiBQWItsj8IlylJC7Wkc&e= > >>>>>>> > >>>>>>> > >>>>>>> Tomoko > >>>>>>> > >>>>>>> 2019年6月13日(木) 6:16 <baris.ka...@oracle.com>: > >>>>>>>> Ok, i think only this very specific only "mains" has an issue. 
> >>>>>>>> > >>>>>>>> all i knew about Lucene was fine :) Great... > >>>>>>>> > >>>>>>>> i have one more question: > >>>>>>>> > >>>>>>>> which one is advised to use: FuzzyQuery or the Query.parser with > >>>>>>>> search string~ appended? > >>>>>>>> > >>>>>>>> The second one will go through analyzer and make search string > >>>>>>>> lowercase. > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/12/19 1:03 PM, baris.ka...@oracle.com wrote: > >>>>>>>> > >>>>>>>> Hi again,- > >>>>>>>> > >>>>>>>> this is really interesting and i hope i am missing something. > >>>>>>>> Index small cases all entries so case sensitivity is not an issue > >>>>>>>> i think. > >>>>>>>> > >>>>>>>> Case #1: > >>>>>>>> > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field, > >>>>>>>> phraseAnalyzer) ; > >>>>>>>> Query q1 = null; > >>>>>>>> try { > >>>>>>>> q1 = parser.parse("Main"); > >>>>>>>> } catch (ParseException e) { > >>>>>>>> e.printStackTrace(); > >>>>>>>> } > >>>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NASHUA"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> > >>>>>>>> This brings with this: > >>>>>>>> > >>>>>>>> query plan: > >>>>>>>> > >>>>>>>> [+contentDFLT:main, +contentDFLT:"nashua", > >>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"] > >>>>>>>> > >>>>>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after > >>>>>>>> exec finished) > >>>>>>>> > >>>>>>>> Number of results: 12 > >>>>>>>> Name: Main Dunstable Rd > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12677400 > >>>>>>>> Country Code: 
US > >>>>>>>> Coordinates: 42.72631, -71.50269 > >>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>>> UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681980 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76416, -71.46681 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681973 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75045, -71.4607 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681974 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76019, -71.465 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main Dunstable Rd > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12677399 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.74641, -71.48943 > >>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>>> UNITED STATES > >>>>>>>> > >>>>>>>> Name: S Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 11893215 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73412, -71.44797 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681978 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73492, -71.44951 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: S Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 11893214 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73958, -71.45895 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681979 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76416, -71.46681 > >>>>>>>> 
Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.204945 > >>>>>>>> ID: 12681977 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.747, -71.45957 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Case #2 > >>>>>>>> > >>>>>>>> When i did this it also worked by adding ~ to make it Fuzzy query > >>>>>>>> to Main word: > >>>>>>>> > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field, > >>>>>>>> phraseAnalyzer) ; > >>>>>>>> Query q1 = null; > >>>>>>>> try { > >>>>>>>> q1 = parser.parse("Main~"); > >>>>>>>> } catch (ParseException e) { > >>>>>>>> e.printStackTrace(); > >>>>>>>> } > >>>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NASHUA"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> > >>>>>>>> query plan: > >>>>>>>> > >>>>>>>> [+contentDFLT:main~2, +contentDFLT:"nashua", > >>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"] > >>>>>>>> > >>>>>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging > >>>>>>>> stops) > >>>>>>>> Number of results: 12 > >>>>>>>> Name: Main Dunstable Rd > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12677400 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.72631, -71.50269 > >>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>>> UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681980 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76416, -71.46681 > 
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681973 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75045, -71.4607 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681974 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76019, -71.465 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main Dunstable Rd > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12677399 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.74641, -71.48943 > >>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>>> UNITED STATES > >>>>>>>> > >>>>>>>> Name: S Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 11893215 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73412, -71.44797 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681978 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73492, -71.44951 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: S Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 11893214 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73958, -71.45895 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681979 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76416, -71.46681 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 41.06405 > >>>>>>>> ID: 12681977 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.747, -71.45957 > >>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > 
>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Case #3 > >>>>>>>> > >>>>>>>> But why does this not work with fuzzy mode and i misspelled a bit > >>>>>>>> (1 edit away) and as You saw the data is there with Main spelling: > >>>>>>>> > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field, > >>>>>>>> phraseAnalyzer) ; > >>>>>>>> > >>>>>>>> Query q1 = null; > >>>>>>>> try { > >>>>>>>> q1 = parser.parse("Mains~"); // 1 edit away > >>>>>>>> } catch (ParseException e) { > >>>>>>>> e.printStackTrace(); > >>>>>>>> } > >>>>>>>> booleanQuery.add(q1, BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NASHUA"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> query plan: > >>>>>>>> > >>>>>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua", > >>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"] > >>>>>>>> > >>>>>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging > >>>>>>>> stops) > >>>>>>>> > >>>>>>>> Number of results: 0 > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Case #4 > >>>>>>>> > >>>>>>>> Then i changed q1 to SHOULD from MUST above: and i think fuzzy > >>>>>>>> query is ignored here since there is no MAIN in the first 468 > >>>>>>>> resuls: > >>>>>>>> > >>>>>>>> there is no boost for Mains term here. 
> >>>>>>>> > >>>>>>>> query plan: > >>>>>>>> > >>>>>>>> [contentDFLT:mains~2, +contentDFLT:"nashua", > >>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"] > >>>>>>>> > >>>>>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging > >>>>>>>> stops) > >>>>>>>> Number of results: 1794 > >>>>>>>> Name: Nashua Dr > >>>>>>>> Score: 34.186226 > >>>>>>>> ID: 4974936 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.7636, -71.46063 > >>>>>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Nashua River Rail Trl > >>>>>>>> Score: 34.186226 > >>>>>>>> ID: 4975508 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.7062, -71.53962 > >>>>>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE > >>>>>>>> UNITED STATES > >>>>>>>> > >>>>>>>> Name: Nashua Rd > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 4975388 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.78746, -71.92823 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: NASHUA > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 21014865 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75873, -71.46438 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: NASHUA > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 21014865 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75873, -71.46438 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: NASHUA > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 21014865 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75873, -71.46438 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: NASHUA > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 21014865 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75873, -71.46438 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: NASHUA > 
>>>>>>>> Score: 33.84896 > >>>>>>>> ID: 21014865 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.75873, -71.46438 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Nashua St > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 4975671 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.88471, -70.81687 > >>>>>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> Name: Nashua Rd > >>>>>>>> Score: 33.84896 > >>>>>>>> ID: 4975400 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.79014, -71.92364 > >>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES > >>>>>>>> > >>>>>>>> > >>>>>>>> Why is the fuzzy query ignored? > >>>>>>>> Even if i have separate fields for street, city,region, country, > >>>>>>>> this fuzzy query issue will come into place for words with > >>>>>>>> multiple parts like main dunstable etc., right? > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> On 6/12/19 11:36 AM, baris.ka...@oracle.com wrote: > >>>>>>>> > >>>>>>>> Tomoko,- > >>>>>>>> > >>>>>>>> Thank You for Your suggestions. i am trying to understand it > >>>>>>>> and i thought i did :) > >>>>>>>> > >>>>>>>> but it does not work with FuzzyQuery when i used with a *single* > >>>>>>>> large TextField like street=...value... city=...value... > >>>>>>>> region=...value... country=...value... (with or without quotes > >>>>>>>> for the values) > >>>>>>>> > >>>>>>>> What i knew about Lucene fuzzy queries are not holding now with > >>>>>>>> this Textfield form. That is why i suspected of a bug. > >>>>>>>> > >>>>>>>> 1. Yes, i saw and have a solid proof on that now. > >>>>>>>> > >>>>>>>> 2. yes but FuzzyQuery takes quotes as they are as they are > >>>>>>>> escaped and it is not analyzed. > >>>>>>>> > >>>>>>>> Stuffing into one textfield vs having separate fields should only > >>>>>>>> affect probably the performance but not the outcome in my case. 
> >>>>>>>> But, i have been thinking about this and maybe it is the way to > >>>>>>>> go in this case. > >>>>>>>> > >>>>>>>> mY CONTENT field has street names in mixed case and city, region > >>>>>>>> country names in UPPERCASE. Can this be a problem? > >>>>>>>> i thought index stored them in lowercase since i am using > >>>>>>>> StandardAnalyzer. > >>>>>>>> > >>>>>>>> CONTENT field also has full textfield string with street=... > >>>>>>>> city=... region=... country=... (here all values are UPPERCASE). > >>>>>>>> > >>>>>>>> Why cant the index find the names via FuzzyQuery? i tried both > >>>>>>>> FuzzyQuery and Query builder as i showed before. > >>>>>>>> > >>>>>>>> The last advice in Your previous email would nicely go outside > >>>>>>>> the parantheses since it might be very critical :) :) :) > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote: > >>>>>>>> > >>>>>>>> I'd suggest to correctly understand the way a software works before > >>>>>>>> suspecting its bug :-) > >>>>>>>> > >>>>>>>> I guess you may miss two points: > >>>>>>>> > >>>>>>>> 1. the standard analyzer (standard tokenizer) breaks words by double > >>>>>>>> quote (U+0022) so quotes are not indexed or searched at all if > >>>>>>>> you are > >>>>>>>> using standard analyzer. (That is the reason you have same results > >>>>>>>> with or without quotes.) 
> >>>>>>>> See: > >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_8-5F1-5F0_core_org_apache_lucene_analysis_standard_StandardTokenizer.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=8E2lp1YIGM-3v3FspeieGl8z8rEBs6qioTudtFNzh8c&e= > >>>>>>>> and > >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__unicode.org_reports_tr29_&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=riCZ_f25XW869CKbHPUqfbLiDU-AukE6la0xTLMw6u8&e= > >>>>>>>> > >>>>>>>> 2. double quote has special meaning (it's interpreted as phrase > >>>>>>>> query) > >>>>>>>> with the built-in query parser so you need to escape it if you > >>>>>>>> want to > >>>>>>>> search double quotes itself. > >>>>>>>> See: > >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_classic_package-2Dsummary.html-23Terms&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=1L6ZQKxmWmYxDX4uJHxzY5SAR_UCl6UUXCo916wzXCo&s=t8OYTgidvcwNpAVFuTsqGhDJK5BwUZVCxc0mPHzqCYU&e= > >>>>>>>> > >>>>>>>> (My advice would be to create separate fields for each key value > >>>>>>>> pairs > >>>>>>>> instead of stuffing all pairs into one text field, if you need to > >>>>>>>> search them separately.) > >>>>>>>> > >>>>>>>> 2019年6月12日(水) 2:39 <baris.ka...@oracle.com>: > >>>>>>>> > >>>>>>>> i can say that quotes is not the issue with index as it still > >>>>>>>> results in > >>>>>>>> same results with quotes or without quotes. > >>>>>>>> > >>>>>>>> i am starting to feel that this might be a bug maybe?? 
> >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 2:46 PM, baris.ka...@oracle.com wrote: > >>>>>>>> > >>>>>>>> Somehow " is causing an issue as this should return street with > >>>>>>>> MAIN: > >>>>>>>> > >>>>>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua", > >>>>>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united > >>>>>>>> states"] -> this was with fuzzyquery on MAINS > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 2:24 PM, baris.ka...@oracle.com wrote: > >>>>>>>> > >>>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire", > >>>>>>>> +contentDFLT:"country united states", contentDFLT:street > >>>>>>>> contentDFLT:mains] > >>>>>>>> > >>>>>>>> QueeryParser chops it into two pieces from > >>>>>>>> parser.parser("street=\"MAINS\""); > >>>>>>>> > >>>>>>>> Index has a TextField named contentDFLT the following data : > >>>>>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW > >>>>>>>> HAMPSHIRE" country="UNITED STATES" > >>>>>>>> > >>>>>>>> > >>>>>>>> When i set street=\"MAINS~\" with parser: > >>>>>>>> i get the following > >>>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire", > >>>>>>>> +contentDFLT:"country united states", contentDFLT:street > >>>>>>>> contentDFLT:mains] > >>>>>>>> > >>>>>>>> probably " quotations are messing this up as You were saying... > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote: > >>>>>>>> > >>>>>>>> Or, " (double quotation) in your query string may affect query > >>>>>>>> parsing. > >>>>>>>> > >>>>>>>> When I parse this string by classic query parser (lucene 8.1), > >>>>>>>> street="MAINS~" > >>>>>>>> parsed (raw) query is > >>>>>>>> text:street text:mains > >>>>>>>> (I set the default search field to "text", so text:xxxx is appeared > >>>>>>>> here.) 
> >>>>>>>> > >>>>>>>> Query parsing is a complex process, so it would be good to check > >>>>>>>> parsed raw query string especially when you have (reserved) special > >>>>>>>> characters in your query... > >>>>>>>> > >>>>>>>> 2019年6月11日(火) 1:10 Tomoko Uchida <tomoko.uchida.1...@gmail.com>: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I noticed one small thing in your previous mail. > >>>>>>>> > >>>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results > >>>>>>>> > >>>>>>>> which is good. > >>>>>>>> > >>>>>>>> To specify a search field, ":" (colon) should be used instead of > >>>>>>>> "=". > >>>>>>>> See the query parser documentation: > >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.apache.org_core_8-5F1-5F0_queryparser_org_apache_lucene_queryparser_classic_package-2Dsummary.html-23Fields&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=u4SeJqH4lePhOazCLwxLEr3WqcMkODtYLv4njiKZ4PM&s=WrNfUXO9gz1PqpczTJw1vD9sWqvr76WRv2Aeo9uWqa4&e= > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> I'm not sure this is related to your problem. 
> >>>>>>>> > >>>>>>>> 2019年6月11日(火) 0:51 <baris.ka...@oracle.com>: > >>>>>>>> > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new > >>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field, > >>>>>>>> phraseAnalyzer) ; > >>>>>>>> Query q1 = null; > >>>>>>>> try { > >>>>>>>> q1 = parser.parse("MAIN"); > >>>>>>>> } catch (ParseException e) { > >>>>>>>> > >>>>>>>> e.printStackTrace(); > >>>>>>>> } > >>>>>>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD); > >>>>>>>> > >>>>>>>> testQuerySearch2 Time to compute: 0 seconds > >>>>>>>> Number of results: 1775 > >>>>>>>> Name: Main St > >>>>>>>> Score: 37.20959 > >>>>>>>> ID: 12681979 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.76416, -71.46681 > >>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" > >>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES" > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 37.20959 > >>>>>>>> ID: 12681977 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.747, -71.45957 > >>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" > >>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES" > >>>>>>>> > >>>>>>>> Name: Main St > >>>>>>>> Score: 37.20959 > >>>>>>>> ID: 12681978 > >>>>>>>> Country Code: US > >>>>>>>> Coordinates: 42.73492, -71.44951 > >>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" > >>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES" > >>>>>>>> > >>>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same > >>>>>>>> results > 
>>>>>>>> which is good. > >>>>>>>> > >>>>>>>> But when i switch to MAINS~ then fuzzy query does not work. > >>>>>>>> > >>>>>>>> > >>>>>>>> i need to say something with the q1 only in the booleanquery: > >>>>>>>> it tries to match the MAIN in street, city, region and country > >>>>>>>> which are > >>>>>>>> in a single TextField field. > >>>>>>>> But i dont want this. that is why i need to street="..." etc when > >>>>>>>> searching. > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> just for the basic verification, can you find the document without > >>>>>>>> fuzzy query? I mean, does this query work for you? > >>>>>>>> > >>>>>>>> Query query = parser.parse("MAIN"); > >>>>>>>> > >>>>>>>> Tomoko > >>>>>>>> > >>>>>>>> 2019年6月11日(火) 0:22 <baris.ka...@oracle.com>: > >>>>>>>> > >>>>>>>> why cant the second set not work at all? > >>>>>>>> > >>>>>>>> it is indexed as Textfield like street="..." city="..." etc. > >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 11:23 AM, baris.ka...@oracle.com wrote: > >>>>>>>> > >>>>>>>> i dont know how to use Fuzzyquery with queryparser but probably > >>>>>>>> You > >>>>>>>> are suggesting > >>>>>>>> > >>>>>>>> QueryParser parser = new QueryParser(field, analyzer) ; > >>>>>>>> Query query = parser.parse("MAINS~2"); > >>>>>>>> > >>>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD); > >>>>>>>> > >>>>>>>> am i right? > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote: > >>>>>>>> > >>>>>>>> I would suggest using a QueryParser for your fuzzy query before > >>>>>>>> adding it to the Boolean query. This should weed out any case > >>>>>>>> issues. 
> >>>>>>>> > >>>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.ka...@oracle.com > >>>>>>>> <mailto:baris.ka...@oracle.com>> wrote: > >>>>>>>> > >>>>>>>> BooleanQuery.Builder booleanQuery = new > >>>>>>>> BooleanQuery.Builder(); > >>>>>>>> > >>>>>>>> //First set > >>>>>>>> > >>>>>>>> booleanQuery.add(new FuzzyQuery(new > >>>>>>>> org.apache.lucene.index.Term(field, "MAINS")), > >>>>>>>> BooleanClause.Occur.SHOULD); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NASHUA"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST); > >>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, > >>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> // Second set > >>>>>>>> //booleanQuery.add(new FuzzyQuery(new > >>>>>>>> org.apache.lucene.index.Term(field, "street=\"MAINS\"")), > >>>>>>>> BooleanClause.Occur.SHOULD); > >>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, > >>>>>>>> > >>>>>>>> field, "city=\"NASHUA\""), BooleanClause.Occur.MUST); > >>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, > >>>>>>>> > >>>>>>>> field, "region=\"NEW HAMPSHIRE\""), > >>>>>>>> BooleanClause.Occur.MUST); > >>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, > >>>>>>>> > >>>>>>>> field, "country=\"UNITED STATES\""), > >>>>>>>> BooleanClause.Occur.MUST); > >>>>>>>> > >>>>>>>> The first set brings also street with Nashua name. > >>>>>>>> (NASHUA). > >>>>>>>> > >>>>>>>> so, to prevent that and since i also indexed with > >>>>>>>> street="..." > >>>>>>>> city="..." i did the second set but it does not bring > >>>>>>>> anything. > >>>>>>>> > >>>>>>>> createPhraseQuery builds a Phrasequery with one term > >>>>>>>> equal to the > >>>>>>>> string > >>>>>>>> in the call. 
> >>>>>>>> > >>>>>>>> Best regards > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/10/19 10:47 AM, baris.ka...@oracle.com > >>>>>>>> <mailto:baris.ka...@oracle.com> wrote: > >>>>>>>> > How do i check how it is indexed? lowercase or uppercase? > >>>>>>>> > > >>>>>>>> > the only way now is by testing. > >>>>>>>> > > >>>>>>>> > i am using StandardAnalyzer. > >>>>>>>> > > >>>>>>>> > Best regards > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > On 6/9/19 11:57 AM, Atri Sharma wrote: > >>>>>>>> >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida > >>>>>>>> >> <tomoko.uchida.1...@gmail.com > >>>>>>>> <mailto:tomoko.uchida.1...@gmail.com>> wrote: > >>>>>>>> >>> Hi, > >>>>>>>> >>> > >>>>>>>> >>> What analyzer do you use for the text field? Is the > >>>>>>>> term "Main" > >>>>>>>> >>> correctly indexed? > >>>>>>>> >> Agreed. Also, it would be good if you could post your > >>>>>>>> actual > >>>>>>>> code. > >>>>>>>> >> > >>>>>>>> >> What analyzer are you using? If you are using > >>>>>>>> StandardAnalyzer, > >>>>>>>> then > >>>>>>>> >> all of your terms while indexing will be lowercased, > >>>>>>>> AFAIK, but > >>>>>>>> your > >>>>>>>> >> query will not be analyzed until you run a > >>>>>>>> QueryParser on it. 
> >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> Atri > >>>>>>>> >> > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> > >>>>>>>> > >>>>>>>> > To unsubscribe, e-mail: > >>>>>>>> java-user-unsubscr...@lucene.apache.org > >>>>>>>> <mailto:java-user-unsubscr...@lucene.apache.org> > >>>>>>>> > For additional commands, e-mail: > >>>>>>>> java-user-h...@lucene.apache.org > >>>>>>>> <mailto:java-user-h...@lucene.apache.org> > >>>>>>>> > > >>>>>>>> > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> > >>>>>>>> > >>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>>>> > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> > >>>>>>>> > >>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>>>> > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> > >>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>>>> > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> > >>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> --------------------------------------------------------------------- > >>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>>> > >>>>>> --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >>>>>> For 
additional commands, e-mail: java-user-h...@lucene.apache.org > >>>>>> --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org