Andrzej, Hah!
I tried using Luke as you suggested, and I found at least part of my problem. Luke was defaulting to KeywordAnalyzer. I changed that to StandardAnalyzer and ran these queries:

path:xxxxxxxxxxxxxxxxxxxxx
path:xxxxxxxxxxxxxxxxxxxxxx.dat

For the first, the Rewritten query was path:xxxxxxxxxxxxxxxxxxxxx, and it found 1 document. For the second, the Rewritten query was path:"xxxxxxxxxxxxxxxxxxxxxx.dat", and it also found 1 document. So at least the Luke search results now match what I'm seeing from the luceneweb web query.

For the second query, I did "Explain structure" and it shows:

Term 0: field='path' text='xxxxxxxxxxxxxxxxxx'
Term 1: field='path' text='dat'

So, going back to Phil Whelan's explanation in his email yesterday:

====================================
This query will also pass through the same (hopefully) Analyzer and will be broken into terms. So the query will actually be for "file-1-2" and "dat" where "file-1-2" is followed immediately by "dat".

In indexing the terms position is stored, so "C:\dir1\dir2\file-1-1.dat" becomes...

[0] c
[1] dir1
[2] dir2
[3] file-1-1
[4] dat

"file-1-1" is followed by "dat", so there is a match.
========================================

I think the above explains things. The bottom line is that Luke was using KeywordAnalyzer. When I switched Luke to StandardAnalyzer, the Luke query results matched my web query results.

THANKS!! I feel better now :)...

Later,
Jim

---- Andrzej Bialecki <[email protected]> wrote:
> [email protected] wrote:
> > Hi Phil,
> >
> > Well, kind of... but...
> >
> > Then, why, when I do the search in Luke, do I get the results I cited:
> >
> > xxxx ==> succeeds
> >
> > xxxx.yyy ==> fails (no results)
> >
> > I guess that I've been assuming that the search in Luke is "correct" and
> > I've been using that to "test my understanding", but maybe that's an
> > invalid assumption?
>
> Luke has some bugs, that's for sure, but not as many as one would think
> ;) I recommend the following exercise:
>
> * first, check what the "Rewritten" query looks like, in both cases.
> This could be enlightening, because depending on the choice of default
> field and query analyzer results could differ dramatically.
>
> * then, if a query succeeds in matching one or more documents, open this
> document and view its fields using "Reconstruct & edit", especially the
> "Tokenized" version of the field. At this point any potential mismatch
> in query terms vs. analyzed tokens in the field should become apparent.
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
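For anyone finding this thread later: the mechanism Phil described (the analyzer splits the path into terms, the index stores each term's position, and a query like "file-1-1.dat" becomes a phrase that matches only where "dat" sits right after "file-1-1") can be illustrated with a small, self-contained Java sketch. It does not use Lucene at all. The names simpleAnalyze, keywordAnalyze, index and phraseMatches are hypothetical stand-ins that only roughly imitate StandardAnalyzer, KeywordAnalyzer and positional phrase matching; the real StandardAnalyzer tokenization rules are more involved. The sample path is the one from Phil's email.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of analysis + positional phrase matching; NOT Lucene's API.
class PhraseMatchSketch {

    // Hypothetical stand-in for KeywordAnalyzer: the whole input becomes one term.
    static List<String> keywordAnalyze(String text) {
        return List.of(text);
    }

    // Hypothetical, much simplified stand-in for StandardAnalyzer: lowercase the
    // text, then split on every run of characters that are not ASCII letters,
    // digits or '-'. The real StandardAnalyzer rules are more involved.
    static List<String> simpleAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}\\-]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // "Indexing" one field: remember every position at which each term occurs.
    static Map<String, List<Integer>> index(List<String> terms) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < terms.size(); pos++) {
            postings.computeIfAbsent(terms.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Phrase match: there must be a start position p such that the i-th query
    // term occurs at position p + i. A one-term query is just a term lookup.
    static boolean phraseMatches(Map<String, List<Integer>> postings, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        List<Integer> starts = postings.get(queryTerms.get(0));
        if (starts == null) {
            return false; // first term was never indexed, e.g. "file-1-1.dat" as one term
        }
        for (int p : starts) {
            boolean allFollow = true;
            for (int i = 1; i < queryTerms.size(); i++) {
                List<Integer> positions = postings.get(queryTerms.get(i));
                if (positions == null || !positions.contains(p + i)) {
                    allFollow = false;
                    break;
                }
            }
            if (allFollow) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> docTerms = simpleAnalyze("C:\\dir1\\dir2\\file-1-1.dat");
        Map<String, List<Integer>> postings = index(docTerms);
        System.out.println(docTerms); // [c, dir1, dir2, file-1-1, dat]

        System.out.println(phraseMatches(postings, keywordAnalyze("file-1-1")));     // true
        System.out.println(phraseMatches(postings, keywordAnalyze("file-1-1.dat"))); // false
        System.out.println(phraseMatches(postings, simpleAnalyze("file-1-1.dat")));  // true
    }
}
```

With the keyword-style query, "file-1-1.dat" is looked up as a single term that was never indexed, so it finds nothing. Analyzed the same way as the document, it becomes the two-term phrase "file-1-1" "dat" at consecutive positions and matches. That mirrors the xxxx / xxxx.yyy behaviour seen in Luke before switching analyzers.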
