Most often, from what I've seen on this e-mail list, unexpected results are because you're not indexing on the tokens you *think* you're indexing. Or not searching on them. By that I mean that the analyzers you're using are behaving in ways you don't expect.
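To make that concrete, here is a rough sketch in plain Java. It is a toy stand-in, not Lucene code: the class TokenDemo, its methods and the doc1/doc2 ids are all made up for illustration, and the two strings are the TERM values from the question. It only shows how the choice of tokenizer decides which exact terms end up in an index, and so which lookups can match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy stand-in for an analyzer plus an inverted index. None of this is Lucene API;
// every name here is hypothetical.
final class TokenDemo {

    // Splits on whitespace only, so "R-eye=6/12" survives as one token.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Also splits on '-', '=', '(' and ')' and lower-cases, but keeps "6/12" whole.
    static List<String> finerTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s\\-=()]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Builds token -> sorted set of doc ids, using whichever tokenizer is passed in.
    static Map<String, Set<String>> index(Map<String, String> docs,
                                          Function<String, List<String>> tokenizer) {
        Map<String, Set<String>> postings = new HashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            for (String tok : tokenizer.apply(e.getValue())) {
                postings.computeIfAbsent(tok, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return postings;
    }

    // Exact-term lookup: a term matches only if that exact token was indexed.
    static Set<String> lookup(Map<String, Set<String>> postings, String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // The two TERM values from the question, under made-up doc ids.
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "6/12 (finding)");
        docs.put("doc2", "R-eye=6/12 (finding)");
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> docs = sampleDocs();
        // Whitespace-only tokens: "6/12" was indexed for doc1 alone.
        System.out.println(lookup(index(docs, TokenDemo::whitespaceTokens), "6/12"));
        // Finer tokens: "6/12" is now a term of both documents.
        System.out.println(lookup(index(docs, TokenDemo::finerTokens), "6/12"));
    }
}
```

With the whitespace-only tokenizer the lookup for "6/12" finds doc1 alone; with the finer one it finds both documents. A real analyzer may do more than either (lower-casing, stop words and so on), which is why it pays to look at the actual index.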
That said, I think you're getting exactly what you should. I suspect you're indexing tokens as follows:

doc1: "6/12" and "(finding)"
doc2: "R-eye=6/12" and "(finding)"

So it makes perfect sense that searching on 6/12 returns doc1 and searching on R-eye=6/12 returns doc2.

So, first question: Have you actually used something like Luke (google luke lucene) to examine your index and see if what you've put in there is what you expect? What analyzer is your custom analyzer built upon, and is it doing anything you're unaware of (for instance, lower-casing the 'R' in your second example)?

Here's what I'd do.
1> get Luke and see what's actually in your index.
2> print query.toString() to see the query you're actually emitting (searcher.explain will show how a given document was scored).
3> if you make no headway, post the smallest code snippets you can that illustrate the problem. Folks would need the indexing AND searching code.

As far as queries like "contains" in Java.... Well, sure. Write a filter that filters on regular expressions or wildcards (you'll need WildcardTermEnum and RegexTermEnum). Or index things differently (e.g. index "6/12" and "finding" on doc1 and "r", "eye", "6/12" and "finding" on doc2). Now your searches for "6/12" will work. Or index "6", "/", "12" and "finding" on doc1, index similarly for doc2, and use a SpanNearQuery with an appropriate span value. Or....

This is all gobbledygook if you haven't gotten a copy of "Lucene In Action", which you should read in order to get the most out of Lucene. It's for the 1.4 code base, but the 2.0 Lucene code base isn't that much different. More importantly, it ties lots of stuff together.

Also, the junit tests that come along with the Lucene code can be invaluable to show you how to do something.

Hope this helps
Erick

On 10/1/06, Rahil <[EMAIL PROTECTED]> wrote:
Hi

I have a custom-built Analyzer where I tokenize all non-whitespace characters as well available in the field "TERM" (which is the only field being tokenised).

If I now query my index file for a term "6/12" for instance, I get back only ONE result

SCORE DESCRIPTIONSTATUS CONCEPTID TERM
1.0 0 260278007 6/12 (finding)

instead of TWO. There is another token in the index file of the form

2561280012 0 163939000 R-eye=6/12 (finding) 0 3 en

At first it wasn't quite obvious to me why this was happening. But after playing around a bit I realised that if I pass a query "R-eye=6/12" instead, I will get this second result (but not the first one now).

Is there a way to tweak the Query query = parser.parse(searchString) method so that I can get both the records if I query for "6/12". Something like a 'contains' query in Java.

Will appreciate all help. Thanks a lot

Regards
Rahil