Okay, I figured out my issue (well, actually a coworker spotted it - I was just too close). A word of warning:
Token "termBuffer" character arrays are fixed size, not sized to the number
of characters! Yep, I was dropping the term buffer into a String without
start and length, thereby adding unseen characters to the end of the String
that just monkey up everything :) This is of course mentioned in the
documentation - I just overlooked that little detail...


On 5/28/08 5:10 PM, "Casey Dement" <[EMAIL PROTECTED]> wrote:

> LOL - I sure wish it were! :)
>
> Sadly, that was a typo (Luke, for all its beauties, does not seem to grasp
> the concept of a clipboard, so the sample was a manual transcription).
>
> A few more details - I don't know if this will help or not.
>
> Same query as before: when I do a rewrite of the query in Luke, I get back
> a set of 44 matching tokens for the given "Austell", ranging from a boost
> of 0.1429 for "turrel" up to 1.0 for "austell". That the token "austell"
> gets a 1.0 is rather obvious, since it's a perfect match...
>
> BUT - when I rewrite the query in my local code, I get 2 matches: austell
> (at 0.14285707) and austerlitz (at 0.20000005). This is disturbing on two
> fronts - first of all, where are the other 42? And secondly - why is the
> exact match evaluating to such a low boost?
>
> Does that point at all to where I'm going astray?
>
> Thanks!
>
> Casey
>
>
> On 5/23/08 5:00 AM, "Ian Lea" <[EMAIL PROTECTED]> wrote:
>
>>> ...
>>> And expect to match document 156297 (search_text=="Austell GA", type==1).
>>> ...
>>> System.out.println(searcher.explain(query, 156296));
>>
>> 156297 != 156296
>>
>> Could that be it?
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, May 22, 2008 at 11:21 PM, Casey Dement <[EMAIL PROTECTED]> wrote:
>>> Hi - trying to execute a search in Lucene and getting results I don't
>>> understand :(
>>>
>>> The index contains fields search_text and type - both indexed and
>>> tokenized.
>>>
>>> I'm attempting to execute the query:
>>>
>>>   +(search_text:austell~0.9 search_text:ga~0.9) +(type:1 type:4)
>>>
>>> And expect to match document 156297 (search_text=="Austell GA", type==1).
>>>
>>> I am executing this query both directly in code and via the tool Luke -
>>> but getting WILDLY different answers. In Luke, the expected document is
>>> found no problem, but in my own code I find no results. Obviously I
>>> suspect my code of being crap ;)
>>>
>>> Oh, FYI, in both my local code and Luke I am using a StandardAnalyzer,
>>> and the default field is "search_text".
>>>
>>> Here's what I'm doing:
>>>
>>> /******************************************************************/
>>> File location = new File("/the/correct/path");
>>> IndexReader index = IndexReader.open(location);
>>> Searcher searcher = new IndexSearcher(index);
>>> QueryParser parser = new QueryParser("search_text",
>>>     new StandardAnalyzer());
>>> Query query = parser.parse(
>>>     "+(search_text:austell~0.9 search_text:ga~0.9) +(type:1 type:4)");
>>> System.out.println(searcher.explain(query, 156296));
>>> /******************************************************************/
>>>
>>> When I run this, I get:
>>>
>>> | 0.0000 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>> |   0.0000 = no match on required clause (() ())
>>> |     0.0000 = (NON-MATCH) product of:
>>> |       0.0000 = (NON-MATCH) sum of:
>>> |       0.0000 = coord(0/2)
>>> |   0.2133 = (MATCH) product of:
>>> |     0.4267 = (MATCH) sum of:
>>> |       0.4267 = (MATCH) weight(type:1 in 156296), product of:
>>> |         0.3672 = queryWeight(type:1), product of:
>>> |           1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |           0.3161 = queryNorm
>>> |         1.1618 = (MATCH) fieldWeight(type:1 in 156296), product of:
>>> |           1.0000 = tf(termFreq(type:1)=1)
>>> |           1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |           1.0000 = fieldNorm(field=type, doc=156296)
>>> |     0.5000 = coord(1/2)
>>>
>>> So obviously I'm loading the index (since it did match the "type") - but
>>> it seems to be COMPLETELY ignoring the criteria on "search_text".
>>>
>>> When I run this exact same string in Luke, I get:
>>>
>>> | 8.0079 = (MATCH) sum of:
>>> |   7.9578 = (MATCH) sum of:
>>> |     5.4904 = (MATCH) weight(search_text:austell in 156297), product of:
>>> |       0.8074 = queryWeight(search_text:austell), product of:
>>> |         10.8800 = idf(docFreq=18, numDocs=371197)
>>> |         0.0742 = queryNorm
>>> |       6.8000 = (MATCH) fieldWeight(search_text:austell in 156297), product of:
>>> |         1.0000 = tf(termFreq(search_text:austell)=1)
>>> |         10.8800 = idf(docFreq=18, numDocs=371197)
>>> |         0.6250 = fieldNorm(field=search_text, doc=156297)
>>> |     2.4673 = (MATCH) weight(search_text:ga in 156297), product of:
>>> |       0.5413 = queryWeight(search_text:ga), product of:
>>> |         7.2936 = idf(docFreq=685, numDocs=371197)
>>> |         0.0742 = queryNorm
>>> |       4.5585 = (MATCH) fieldWeight(search_text:ga in 156297), product of:
>>> |         1.0000 = tf(termFreq(search_text:ga)=1)
>>> |         7.2936 = idf(docFreq=685, numDocs=371197)
>>> |         0.6250 = fieldNorm(field=search_text, doc=156297)
>>> |   0.0501 = (MATCH) product of:
>>> |     0.1002 = (MATCH) sum of:
>>> |       0.1002 = (MATCH) weight(type:1 in 156296), product of:
>>> |         0.0862 = queryWeight(type:1), product of:
>>> |           1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |           0.0742 = queryNorm
>>> |         1.1618 = (MATCH) fieldWeight(type:1 in 156296), product of:
>>> |           1.0000 = tf(termFreq(type:1)=1)
>>> |           1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |           1.0000 = fieldNorm(field=type, doc=156296)
>>> |     0.5000 = coord(1/2)
>>>
>>> Which, while clearly looking at the same document ID in the same index,
>>> is conversely working perfectly!
>>>
>>> Does anybody have any idea where I am screwing up? Thanks!
>>>
>>> Casey
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
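[Editor's note] The pitfall Casey describes - Lucene's `Token.termBuffer()` of this era returns a fixed-size scratch array that is usually larger than the term, so it must always be paired with `Token.termLength()` when building a String - can be reproduced without Lucene at all. A minimal sketch using a plain char array (the buffer size and term here are illustrative, not taken from the thread's index):

```java
public class TermBufferPitfall {
    public static void main(String[] args) {
        // A fixed-size scratch buffer, like Token.termBuffer(): larger
        // than the term, with leftover characters (here '\0') past the end.
        char[] termBuffer = new char[16];
        "austell".getChars(0, 7, termBuffer, 0);
        int termLength = 7; // what Token.termLength() would report

        // WRONG: converts the whole buffer, trailing junk included.
        String bad = new String(termBuffer);

        // RIGHT: convert only the valid region of the buffer.
        String good = new String(termBuffer, 0, termLength);

        System.out.println(good.equals("austell")); // true
        System.out.println(bad.equals("austell"));  // false: 9 hidden chars
        System.out.println(bad.length());           // 16
    }
}
```

A term built the wrong way never equals the clean term in the index, which would explain why the exact match behaved like a distant fuzzy neighbour in Casey's local rewrite.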