There are useful tips in the FAQ, http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.
I still think you should come up with small self-contained example code.

--
Ian.


On Wed, Apr 25, 2012 at 4:02 PM, Elmer van Chastelet
<evanchaste...@gmail.com> wrote:
> Thanks for your suggestion Ian, but I just found out that if I replace the
> KeywordTokenizer with a WhitespaceTokenizer, all seems to work fine.
>
> Just to test what happens, I created another field 'orig', using this
> analyzer:
> analyzer KeywordLowered{
>   tokenizer = KeywordTokenizer
>   tokenfilter = LowerCaseFilter
> }
>
> Guess what.. exactly the same problem, also in Luke.
> It finds no documents for the query:
> orig:strange
> while the term 'strange' is in the index for the field 'orig'.
>
> Does anybody have a clue why documents are not matched when using the
> KeywordTokenizer? Remember that none of the queries or terms contain
> whitespace.
>
>
> Thanks again.
> -Elmer
>
>
> On 04/25/2012 02:53 PM, Ian Lea wrote:
>>
>> You seem to be quietly going round in circles, by yourself! I suggest
>> a small self-contained program/test case with a RAM index created from
>> scratch. You can then experiment with inject on or off and if you
>> still can't figure it out, post the code and hopefully someone will be
>> able to help you make sense of it.
>>
>> Make sure you tell us what version of Lucene you are using. If not
>> the latest, wouldn't hurt to try with the latest.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
>> <evanchaste...@gmail.com> wrote:
>>>
>>> I keep replying to myself, it all gets a bit confusing.
>>> The problem still exists and I don't understand why, and why it worked
>>> once.
>>>
>>> I have the same behavior again as posted in my first mail:
>>> - Inject parameter is set to true.
>>> - The index has _no deleted documents_ and is optimized.
>>> - The term 'compete' is in there.
>>> - If I ask Luke to show all docs for term 'compete' it shows me the one
>>>   and only document that represents this word. But...
>>> - If I perform the query 'value:compete' in Luke again, it says there
>>>   are no results.
>>>
>>> Here is the index I'm currently using. It contains various fields for the
>>> available phonetic filter encoders:
>>> https://www.box.com/s/34212e82227e102f6734
>>>
>>> Can somebody explain this behavior? What's the real use of the inject
>>> parameter of the PhoneticFilterFactory?
>>>
>>> Thanks in advance.
>>>
>>> -Elmer
>>>
>>>
>>> On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
>>>>
>>>> Problem solved. Long story short: for some reason I had deleted
>>>> documents in the index and the non-deleted documents used the phonetic
>>>> filter with inject set to false.
>>>>
>>>> Works fine now :)
>>>>
>>>> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> (scroll to bottom for question)
>>>>>
>>>>> I was setting up a simple web app to play around with phonetic filters.
>>>>> The idea is simple: I just create a document for each word in the
>>>>> English dictionary, each document containing a single search field
>>>>> holding the value after it is preprocessed using the following analyzer
>>>>> def (in our own DSL syntax, which gets transformed to Java):
>>>>>
>>>>> analyzer soundslike{
>>>>>   tokenizer = KeywordTokenizer
>>>>>   tokenfilter = LowerCaseFilter
>>>>>   tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>>>>> }
>>>>>
>>>>> I can run the web app and I get results that indeed (in some way) sound
>>>>> like the original query term.
>>>>>
>>>>> But what confuses me is the ranking of the results, knowing that I set
>>>>> the inject param to true. If I search for the query term 'compete', the
>>>>> parsed query becomes '(value:KMPT value:compete)', and therefore I
>>>>> expect the word 'compete' to be ranked higher in the list than any other
>>>>> word.... but this wasn't the case.
>>>>>
>>>>> Looking further at the explanation of results, I saw that the term
>>>>> 'compete' in the parsed query is totally absent, and only the phonetic
>>>>> encoding seems to affect the ranking:
>>>>>
>>>>> * COMPETITOR
>>>>>   o 4.368826 = (MATCH) sum of:
>>>>>     + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>>>>       # 0.52838135 = queryWeight(value:KMPT), product of:
>>>>>         * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>>         * 0.063904315 = queryNorm
>>>>>       # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
>>>>>         * 1.0 = tf(termFreq(value:KMPT)=1)
>>>>>         * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>>         * 1.0 = fieldNorm(field=value, doc=3174)
>>>>>
>>>>> The next thing I did was running our friend Luke. In Luke, I opened the
>>>>> documents tab, and started iterating over some terms for the field
>>>>> 'value' until I found 'compete'. When I hit 'Show All Docs', the search
>>>>> tab opens and it displays the one and only document holding this value
>>>>> (i.e. the document representing the word 'compete'). It shows the query:
>>>>> 'value:compete '. Then, when I hit the search button again (query is
>>>>> still 'value:compete '), it says that there are no results !?
>>>>>
>>>>> Probably, the 'Show All Docs' button does something different than
>>>>> performing a query using the search tab in Luke.
>>>>>
>>>>> Q: Can somebody explain why the injected original terms seem to get
>>>>> ignored at query time? Or may it be related to the name of the search
>>>>> field ('value'), or something else?
>>>>>
>>>>> We use Lucene 3.1 with SOLR analyzers (by Hibernate Search 3.4.2).
>>>>>
>>>>> -Elmer
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
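
A side note on the explain output quoted in the original mail: the numbers can be checked by hand, which makes it easier to see that only the value:KMPT clause contributes to the score. The sketch below assumes the Lucene 3.x DefaultSimilarity formulas (idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(termFreq)) and uses no Lucene classes at all. The class and method names are made up for illustration, and queryNorm and fieldNorm are simply copied from the explanation rather than derived.

```java
// Sketch only: recomputes the figures from the quoted explain output,
// assuming Lucene 3.x DefaultSimilarity formulas. Names are hypothetical.
class ExplainArithmetic {

    // Assumed DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Assumed DefaultSimilarity tf: square root of the term frequency.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Single-term score as laid out in the explain tree:
    // queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm).
    static double score(int termFreq, int docFreq, int numDocs,
                        double queryNorm, double fieldNorm) {
        double idfValue = idf(docFreq, numDocs);
        double queryWeight = idfValue * queryNorm;
        double fieldWeight = tf(termFreq) * idfValue * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values copied from the explanation for COMPETITOR.
        int docFreq = 150;
        int numDocs = 216555;
        int termFreq = 1;
        double queryNorm = 0.063904315;
        double fieldNorm = 1.0;

        System.out.printf("idf   = %.5f%n", idf(docFreq, numDocs));
        System.out.printf("score = %.5f%n",
                score(termFreq, docFreq, numDocs, queryNorm, fieldNorm));
    }
}
```

Under those assumptions the result agrees with the 8.26832 and 4.368826 reported by explain to within rounding, so the whole score is accounted for by the phonetic term. If the injected term value:compete had matched a document, one would expect a second weight(...) entry under "sum of" for that document.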