Paul, I think in general scoring should take care of this too; it's all about
your dictionary, same as in the previous example.
This is because überwachungsgesetz matches 3 tokens (überwachungsgesetz,
überwachung, gesetz), but überwachung gesetz only matches 2.
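To see this for yourself, here is a quick (untested) sketch against the 2.9
contrib API that prints what the filter emits; the three-word dictionary and
the analyzer chain are just one possible setup, and maxSubwordSize is raised
to 30 so the 18-character dictionary entry can also match as a subword:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class DecompoundDemo {
  public static void main(String[] args) throws Exception {
    // dictionary containing the parts *and* the full compound
    String[] dict = { "gesetz", "überwachung", "überwachungsgesetz" };

    // minWordSize=5, minSubwordSize=2, maxSubwordSize=30, onlyLongestMatch=false
    TokenStream ts = new DictionaryCompoundWordTokenFilter(
        new WhitespaceTokenizer(new StringReader("überwachungsgesetz")),
        dict, 5, 2, 30, false);
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      // prints the original token plus every dictionary entry found inside it:
      // überwachungsgesetz (twice: original + dictionary match), überwachung, gesetz
      System.out.println(term.term());
    }
    ts.close();
  }
}

Run the same chain over "überwachung gesetz" and you only get the two terms
überwachung and gesetz, which is why it scores lower. Here are the explanations
for both queries against a single indexed überwachungsgesetz document: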
überwachungsgesetz
0.37040412 = (MATCH) sum of:
  0.10848885 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.2169777 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
      1.4142135 = tf(termFreq(field:überwachungsgesetz)=2)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
    0.5 = queryWeight(field:überwachung), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
      1.0 = tf(termFreq(field:überwachung)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.10848885 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.2169777 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
      1.4142135 = tf(termFreq(field:überwachungsgesetz)=2)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
    0.5 = queryWeight(field:gesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
      1.0 = tf(termFreq(field:gesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
überwachung gesetz
0.30685282 = (MATCH) sum of:
  0.15342641 = (MATCH) sum of:
    0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
      0.5 = queryWeight(field:überwachung), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
        1.0 = tf(termFreq(field:überwachung)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
    0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
      0.5 = queryWeight(field:überwachung), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
        1.0 = tf(termFreq(field:überwachung)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
  0.15342641 = (MATCH) sum of:
    0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
      0.5 = queryWeight(field:gesetz), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
        1.0 = tf(termFreq(field:gesetz)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
    0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
      0.5 = queryWeight(field:gesetz), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
        1.0 = tf(termFreq(field:gesetz)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
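For reference, explanations like the above come from IndexSearcher.explain().
Roughly (again an untested 2.9-era sketch; the field really is called "field",
and the toy dictionary and analyzer chain are the same assumptions as above):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class ExplainDemo {
  static final String[] DICT = { "gesetz", "überwachung", "überwachungsgesetz" };

  // compound-aware analyzer: whitespace -> lowercase -> decompound
  static final Analyzer ANALYZER = new Analyzer() {
    public TokenStream tokenStream(String fieldName, Reader reader) {
      TokenStream ts = new LowerCaseFilter(new WhitespaceTokenizer(reader));
      return new DictionaryCompoundWordTokenFilter(ts, DICT, 5, 2, 30, false);
    }
  };

  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, ANALYZER, IndexWriter.MaxFieldLength.UNLIMITED);
    Document doc = new Document();
    doc.add(new Field("field", "überwachungsgesetz", Field.Store.NO, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();

    IndexSearcher searcher = new IndexSearcher(dir, true);
    QueryParser parser = new QueryParser("field", ANALYZER);
    for (String q : new String[] { "überwachungsgesetz", "überwachung gesetz" }) {
      Query query = parser.parse(q);
      System.out.println(q);
      System.out.println(searcher.explain(query, 0)); // doc 0 is the only document
    }
    searcher.close();
  }
}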
On Wed, Oct 21, 2009 at 3:16 PM, Paul Libbrecht <[email protected]> wrote:
> Can the dictionary have weights?
>
> überwachungsgesetz alone probably needs a higher rank than überwachung and
> gesetz, right?
>
> paul
>
>
> On Oct 21, 2009, at 9:09 PM, Benjamin Douglas wrote:
>
>
>> OK, that makes sense. So I just need to add all of the sub-compounds that
>> are real words at posIncr=0, even if they are combinations of other
>> sub-compounds.
>>
>> Thanks!
>>
>> -----Original Message-----
>> From: Robert Muir [mailto:[email protected]]
>> Sent: Wednesday, October 21, 2009 11:49 AM
>> To: [email protected]
>> Subject: Re: Using org.apache.lucene.analysis.compound
>>
>> yes, your dictionary :)
>>
>> if überwachungsgesetz is a real word, add it to your dictionary.
>>
>> for example, if your dictionary is { "Rind", "Fleisch", "Draht", "Schere",
>> "Gesetz", "Aufgabe", "Überwachung" }, and you index
>> Rindfleischüberwachungsgesetz, then all 3 queries will have the same score.
>> but if you expand the dictionary to { "Rind", "Fleisch", "Draht", "Schere",
>> "Gesetz", "Aufgabe", "Überwachung", "Überwachungsgesetz" }, then this makes
>> a big difference.
>>
>> all 3 queries will still match, but überwachungsgesetz will have a higher
>> score. this is because now things are analyzed differently:
>> Rindfleischüberwachungsgesetz will be decompounded as before, but with an
>> additional token: Überwachungsgesetz.
>> so, back to your original question about 'concatenations' of multiple
>> components: yes, the compound filter will produce those if they are real
>> words, but it won't just make them up.
>>
>> "überwachungsgesetz"
>> 0.23013961 = (MATCH) sum of:
>>   0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
>>     0.5 = queryWeight(field:überwachungsgesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
>>       1.0 = tf(termFreq(field:überwachungsgesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:überwachung in 0), product of:
>>     0.5 = queryWeight(field:überwachung), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:
>>       1.0 = tf(termFreq(field:überwachung)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
>>     0.5 = queryWeight(field:überwachungsgesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
>>       1.0 = tf(termFreq(field:überwachungsgesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.5 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>> "gesetzüberwachung"
>> 0.064782135 = (MATCH) sum of:
>>   0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.2814906 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.032391068 = (MATCH) weight(field:überwachung in 0), product of:
>>     0.2814906 = queryWeight(field:überwachung), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:
>>       1.0 = tf(termFreq(field:überwachung)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>> "fleischgesetz"
>> 0.064782135 = (MATCH) sum of:
>>   0.032391068 = (MATCH) weight(field:fleisch in 0), product of:
>>     0.2814906 = queryWeight(field:fleisch), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:fleisch in 0), product of:
>>       1.0 = tf(termFreq(field:fleisch)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.2814906 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>>
>>
>>
>> On Wed, Oct 21, 2009 at 1:40 PM, Benjamin Douglas
>> <[email protected]>wrote:
>>
>>> Thanks for all of the answers so far!
>>>
>>> Paul's question is similar to another aspect I am curious about:
>>>
>>> Given the way the sample word is analyzed, is there anything in the scoring
>>> mechanism that would rank "überwachungsgesetz" higher than
>>> "gesetzüberwachung" or "fleischgesetz"?
>>>
>>>
>>>
>> --
>> Robert Muir
>> [email protected]
>>
>
>
--
Robert Muir
[email protected]