This appears to be a string distance problem. Have a look at a library such as SimMetrics (https://secure.wikimedia.org/wikipedia/en/wiki/SimMetrics). It offers several string distance metrics; choose the one that fits your needs.
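
As a rough illustration of the idea (this is not the SimMetrics API and it is not tied to Lucene.Net), here is a minimal Python sketch that re-sorts the terms matched by a wildcard query by their Levenshtein distance to the query prefix. The function names levenshtein and rank_by_distance are made up for this example, and the sample terms are taken from the mails quoted below.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def rank_by_distance(prefix: str, terms: List[str]) -> List[str]:
    """Hypothetical helper: closest term to the prefix first, ties by name."""
    key = prefix.lower()
    return sorted(terms, key=lambda t: (levenshtein(key, t.lower()), t))


# Terms in the order the wildcard query returned them in the thread.
hits = ["TEST001B", "TEST001C", "TEST001A", "TEST001"]
ranked = rank_by_distance("TEST00", hits)
print(ranked)
```

For pure prefix matches like these, the distance is simply the number of extra characters, so sorting by term length would give the same order. A real distance metric only starts to matter once the matches are fuzzier than a plain prefix.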
On Aug 17, 2011 12:10 PM, "Ben West" <bwsithspaw...@yahoo.com> wrote:
> Hey Bjorn,
>
> The search "aut*" is equivalent to "auto OR autotur". Lucene has no concept of which is a "better" match (both auto and autotur have "aut" followed by something, so they are both perfect matches).
>
> The spellcheck contrib can do something vaguely like what you want, but my guess is you'd have to write your own query to accomplish what you want. (This isn't a problem that Lucene focuses on solving really.) You can just modify the wildcard query but add a boost proportional to the length of the term.
>
> Hope this helps,
> -Ben
>
>
> ----- Original Message -----
> From: Björn Kremer <b...@patorg.de>
> To: lucene-net-user@lucene.apache.org
> Cc:
> Sent: Wednesday, August 17, 2011 10:07 AM
> Subject: Re: [Lucene.Net] Lucene search results sort order
>
> Hello,
>
> OK, I see this wasn't a good sample. Here is another:
>
> I'd like to index German text. So the field "content" can e.g. contain the words "Auto" and "Autotür".
> If I search for "Aut*" the order is still:
> 1.) Autotür
> 2.) Auto
>
> The field "content" can contain a complete text. What analyzer should I use?
>
> Thank You
> Björn
>
>
> On 17.08.2011 16:45, Robert Jordan wrote:
>> Hi,
>>
>> You're either using the wrong analyzer (e.g. StandardAnalyzer is not suitable for this kind of values) or you're not using the same analyzer during indexing and search.
>>
>> Your data looks like it should be stored unanalyzed (Field.Index.NOT_ANALYZED).
>>
>> Robert
>>
>>
>> On 17.08.2011 16:17, Björn Kremer wrote:
>>> Hello Anders,
>>>
>>> thank you for the answer.
>>>
>>> Here is a little sample that shows my problem:
>>>
>>> I have 4 documents with the field "test" (and some more fields):
>>>
>>> Doc-Number -> Value in field "test"
>>> 1 -> TEST001
>>> 2 -> TEST001A
>>> 3 -> TEST001B
>>> 4 -> TEST001C
>>>
>>> Now I send a wildcard query like this:
>>> test:TEST00*
>>>
>>> This returns a list with this order:
>>> TEST001B
>>> TEST001C
>>> TEST001A
>>> TEST001
>>>
>>> But I think "TEST001" should be the first match because "TEST001" has the best fit to "TEST00".
>>>
>>> Thank you
>>> Björn
>>>
>>>
>>> Björn Kremer
>>> Brügmann Software GmbH
>>>
>>>
>>> On 17.08.2011 15:03, Anders Lybecker wrote:
>>>> Hej Björn,
>>>>
>>>> The default search order (Relevance) will return the best match as the first result.
>>>>
>>>> :-)
>>>> Anders Lybecker
>>>>
>>>> 2011/8/17 Björn Kremer <b...@patorg.de>
>>>>
>>>>> Hi,
>>>>>
>>>>> what is the default sort order for Lucene search results? I'm using Lucene 2.9 and send a query that looks like this:
>>>>>
>>>>> topdocs = IndexSearcherInstance.Search(query, null, limit, Lucene.Net.Search.Sort.RELEVANCE)
>>>>>
>>>>> Is the first document in the topdocs collection (index 0) the document with the best score or the worst score?
>>>>>
>>>>> Sometimes the last match in the collection seems to be the best match.
>>>>>
>>>>> Thank you
>>>>> Björn