This appears to be a string distance problem. Look up a library such as
SimMetrics (https://secure.wikimedia.org/wikipedia/en/wiki/SimMetrics). It
offers several string distance metrics; choose one that fits your needs.
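
As a rough illustration of the idea, here is a minimal sketch in plain Python (not Lucene.Net or SimMetrics code). It assumes Levenshtein edit distance as the metric, which is only one of several possible choices, and it assumes the hits are re-sorted after the search has returned them. The function names `levenshtein` and `rank_by_distance` are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query: str, hits: list[str]) -> list[str]:
    """Sort hits so the ones closest to the query text come first.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    return sorted(hits, key=lambda term: (levenshtein(query, term), term))


if __name__ == "__main__":
    hits = ["TEST001B", "TEST001C", "TEST001A", "TEST001"]
    print(rank_by_distance("TEST00", hits))
```

With the values from the thread, "TEST001" is one edit away from "TEST00" and the other three are two edits away, so "TEST001" sorts first.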

On Aug 17, 2011 12:10 PM, "Ben West" <bwsithspaw...@yahoo.com> wrote:
> Hey Bjorn,
>
> The search "aut*" is equivalent to "auto OR autotur". Lucene has no
> concept of which is a "better" match (both auto and autotur have "aut"
> followed by something, so they are both perfect matches).
>
> The spellcheck contrib can do something vaguely like what you want, but my
> guess is you'd have to write your own query to accomplish it. (This isn't
> really a problem that Lucene focuses on solving.) You could modify the
> wildcard query to add a boost based on the length of each matching term,
> for example a larger boost for shorter terms.
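
The length-based boost can be sketched outside of Lucene as follows. This is plain Python, not the Lucene.Net API. It assumes the wildcard has already been expanded to the matching indexed terms, and that the boost should favor shorter terms so that "auto" ranks above "autotür". The names `expand_prefix`, `length_boost` and `ranked_matches`, and the formula `len(prefix) / len(term)`, are hypothetical choices for illustration.

```python
def expand_prefix(prefix: str, terms: list[str]) -> list[str]:
    """Return the non-empty indexed terms that start with the prefix."""
    return [t for t in terms if t and t.startswith(prefix)]


def length_boost(prefix: str, term: str) -> float:
    """Hypothetical boost: 1.0 when the term equals the prefix, smaller for longer terms."""
    return len(prefix) / len(term)


def ranked_matches(prefix: str, terms: list[str]) -> list[str]:
    """Order matching terms by descending boost, then alphabetically on ties."""
    matches = expand_prefix(prefix, terms)
    return sorted(matches, key=lambda t: (-length_boost(prefix, t), t))


if __name__ == "__main__":
    # Lowercased terms, assuming the analyzer lowercases at index time.
    print(ranked_matches("aut", ["autotür", "auto", "haus"]))
```

In a real index the same idea would have to be applied per term inside a custom query, which is the part this sketch leaves out.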
>
> Hope this helps,
> -Ben
>
>
> ----- Original Message -----
> From: Björn Kremer <b...@patorg.de>
> To: lucene-net-user@lucene.apache.org
> Cc:
> Sent: Wednesday, August 17, 2011 10:07 AM
> Subject: Re: [Lucene.Net] Lucene search results sort order
>
> Hello,
>
> ok I see this wasn't a good sample. Here is another:
>
> I'd like to index German text, so the field "content" can, e.g., contain
> the words "Auto" and "Autotür".
> If I search for "Aut*" the order is still:
> 1.) Autotür
> 2.) Auto
>
> The field "content" can contain a complete text. What analyzer should I use?
>
> Thank You
> Björn
>
>
> On 17.08.2011 16:45, Robert Jordan wrote:
>> Hi,
>>
>> You're either using the wrong analyzer (e.g. StandardAnalyzer
>> is not suitable for this kind of value) or you're not using
>> the same analyzer during indexing and search.
>>
>> Your data looks like it should be stored unanalyzed
>> (Field.Index.NOT_ANALYZED).
>>
>> Robert
>>
>>
>> On 17.08.2011 16:17, Björn Kremer wrote:
>>> Hello Anders,
>>>
>>> thank you for the answer.
>>>
>>> Here is a little sample that shows my problem:
>>>
>>> I have 4 documents with the field "test" (and some more fields):
>>>
>>> Doc-Number -> Value in field "test"
>>> 1 -> TEST001
>>> 2 -> TEST001A
>>> 3 -> TEST001B
>>> 4 -> TEST001C
>>>
>>> Now I send a wildcard query like this:
>>> test:TEST00*
>>>
>>> This returns a list with this order:
>>> TEST001B
>>> TEST001C
>>> TEST001A
>>> TEST001
>>>
>>> But I think "TEST001" should be the first match because "TEST001" has
>>> the best fit to "TEST00".
>>>
>>> Thank you
>>> Björn
>>>
>>>
>>> Björn Kremer
>>> Brügmann Software GmbH
>>> Bokeler Straße 18
>>> 26871 Papenburg
>>>
>>> Phone: +49(0)4962-9119-43
>>> Fax: +49(0)4962-9119-33
>>> mailto: b...@patorg.de
>>> Internet: www.patorg.de
>>>
>>> Registered office: Papenburg
>>> Commercial register: Amtsgericht Osnabrück, HRB 202707
>>> VAT ID No.: DE 262 943 559
>>> Managing directors: Dipl.-Ing. Jochen Brügmann, Dipl.-Kfr. Julia Brügmann,
>>> Dipl.-Inf. Sören Brügmann
>>>
>>>
>>> On 17.08.2011 15:03, Anders Lybecker wrote:
>>>> Hej Björn,
>>>>
>>>> The default search order (Relevance) will return the best match as the
>>>> first result.
>>>>
>>>> :-)
>>>> Anders Lybecker
>>>>
>>>> 2011/8/17 Björn Kremer<b...@patorg.de>
>>>>
>>>>> Hi,
>>>>>
>>>>> what is the default sort order for Lucene search results? I'm using
>>>>> Lucene 2.9 and send a query that looks like this:
>>>>>
>>>>> topdocs = IndexSearcherInstance.Search(query, null, limit,
>>>>> Lucene.Net.Search.Sort.RELEVANCE)
>>>>>
>>>>> Is the first document in the topdocs collection (index 0) the document
>>>>> with the best score or the worst score?
>>>>>
>>>>> Sometimes the last match in the collection seems to be the best match.
>>>>>
>>>>> Thank you
>>>>> Björn
>>>>>
>>
>>
>
