Paul, I think scoring should generally take care of this too. It's all about your dictionary, same as in the previous example. überwachungsgesetz scores higher because it matches 3 tokens (überwachungsgesetz, überwachung, gesetz), while überwachung gesetz only matches 2.

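To make the token counts concrete, here is a rough Python sketch (plain Python, not Lucene code) of dictionary-based decompounding plus classic TF-IDF scoring. It rests on several assumptions that the thread does not state: the indexed document is the single word überwachungsgesetz (consistent with fieldNorm=0.5 and termFreq=2 in the explain output below), the dictionary is the expanded one quoted further down, every query token becomes its own clause, coord is 1 because all clauses match, and the lossy byte encoding of norms is ignored. The names `decompose`, `analyze`, `score`, and `min_subword` are hypothetical.

```python
import math

# Assumed dictionary (lowercased), taken from the expanded example quoted later in this thread.
DICTIONARY = {"rind", "fleisch", "draht", "schere", "gesetz",
              "aufgabe", "überwachung", "überwachungsgesetz"}


def decompose(word, dictionary, min_subword=2):
    """Return the original token followed by every dictionary word found inside it.

    Simplified stand-in for a dictionary-based compound filter; `min_subword`
    is a hypothetical parameter, not a Lucene setting.
    """
    word = word.lower()
    tokens = [word]  # the original token is kept
    for start in range(len(word)):
        for end in range(start + min_subword, len(word) + 1):
            part = word[start:end]
            if part in dictionary:
                tokens.append(part)
    return tokens


def analyze(text, dictionary):
    """Split on whitespace and decompose each word."""
    tokens = []
    for word in text.split():
        tokens.extend(decompose(word, dictionary))
    return tokens


def score(query, doc, dictionary):
    """Classic TF-IDF score against a one-document index (docFreq=1, maxDocs=1)."""
    doc_tokens = analyze(doc, dictionary)
    query_tokens = analyze(query, dictionary)
    idf = 1.0 + math.log(1 / (1 + 1))              # 1 + ln(maxDocs / (docFreq + 1))
    field_norm = 1.0 / math.sqrt(len(doc_tokens))  # length normalization, unencoded
    query_norm = 1.0 / math.sqrt(len(query_tokens) * idf ** 2)
    total = 0.0
    for term in query_tokens:                      # one clause per token, duplicates included
        freq = doc_tokens.count(term)
        if freq:
            query_weight = idf * query_norm
            field_weight = math.sqrt(freq) * idf * field_norm
            total += query_weight * field_weight
    return total


if __name__ == "__main__":
    for q in ("überwachungsgesetz", "überwachung gesetz"):
        print(q, sorted(set(analyze(q, DICTIONARY))),
              round(score(q, "überwachungsgesetz", DICTIONARY), 8))
```

Under these assumptions the first query yields 3 distinct tokens and the second yields 2, and the two scores come out at roughly 0.3704 and 0.3069, in line with the totals in the explain output below. The gap comes from überwachungsgesetz occurring twice among the document's tokens, so its tf is sqrt(2) instead of 1.
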
überwachungsgesetz
0.37040412 = (MATCH) sum of:
  0.10848885 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.2169777 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
      1.4142135 = tf(termFreq(field:überwachungsgesetz)=2)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
    0.5 = queryWeight(field:überwachung), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
      1.0 = tf(termFreq(field:überwachung)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.10848885 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.2169777 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
      1.4142135 = tf(termFreq(field:überwachungsgesetz)=2)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)
  0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
    0.5 = queryWeight(field:gesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
      1.0 = tf(termFreq(field:gesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.5 = fieldNorm(field=field, doc=0)

überwachung gesetz
0.30685282 = (MATCH) sum of:
  0.15342641 = (MATCH) sum of:
    0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
      0.5 = queryWeight(field:überwachung), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
        1.0 = tf(termFreq(field:überwachung)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
    0.076713204 = (MATCH) weight(field:überwachung in 0), product of:
      0.5 = queryWeight(field:überwachung), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:überwachung in 0), product of:
        1.0 = tf(termFreq(field:überwachung)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
  0.15342641 = (MATCH) sum of:
    0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
      0.5 = queryWeight(field:gesetz), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
        1.0 = tf(termFreq(field:gesetz)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)
    0.076713204 = (MATCH) weight(field:gesetz in 0), product of:
      0.5 = queryWeight(field:gesetz), product of:
        0.30685282 = idf(docFreq=1, maxDocs=1)
        1.6294457 = queryNorm
      0.15342641 = (MATCH) fieldWeight(field:gesetz in 0), product of:
        1.0 = tf(termFreq(field:gesetz)=1)
        0.30685282 = idf(docFreq=1, maxDocs=1)
        0.5 = fieldNorm(field=field, doc=0)

On Wed, Oct 21, 2009 at 3:16 PM, Paul Libbrecht <p...@activemath.org> wrote:

> Can the dictionary have weights?
>
> überwachungsgesetz alone probably needs a higher rank than überwachung and
> gesetz, right?
>
> paul
>
>
> On 21-Oct-09 at 21:09, Benjamin Douglas wrote:
>
>> OK, that makes sense. So I just need to add all of the sub-compounds that
>> are real words at posIncr=0, even if they are combinations of other
>> sub-compounds.
>>
>> Thanks!
>>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Wednesday, October 21, 2009 11:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Using org.apache.lucene.analysis.compound
>>
>> yes, your dictionary :)
>>
>> if überwachungsgesetz is a real word, add it to your dictionary.
>>
>> for example, if your dictionary is { "Rind", "Fleisch", "Draht", "Schere",
>> "Gesetz", "Aufgabe", "Überwachung" }, and you index
>> Rindfleischüberwachungsgesetz, then all 3 queries will have the same
>> score.
>> but if you expand the dictionary to { "Rind", "Fleisch", "Draht", "Schere",
>> "Gesetz", "Aufgabe", "Überwachung", "Überwachungsgesetz" }, then this makes
>> a big difference.
>>
>> all 3 queries will still match, but überwachungsgesetz will have a higher
>> score. this is because now things are analyzed differently:
>> Rindfleischüberwachungsgesetz will be decompounded as before, but with an
>> additional token: Überwachungsgesetz.
>> so back to your original question, these 'concatenations' of multiple
>> components, yes compounds will do that, if they are real words. but it won't
>> just make them up.
>>
>> "überwachungsgesetz"
>> 0.23013961 = (MATCH) sum of:
>>   0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
>>     0.5 = queryWeight(field:überwachungsgesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
>>       1.0 = tf(termFreq(field:überwachungsgesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:überwachung in 0), product of:
>>     0.5 = queryWeight(field:überwachung), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:
>>       1.0 = tf(termFreq(field:überwachung)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:
>>     0.5 = queryWeight(field:überwachungsgesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), product of:
>>       1.0 = tf(termFreq(field:überwachungsgesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.057534903 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.5 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       1.6294457 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>> "gesetzüberwachung"
>> 0.064782135 = (MATCH) sum of:
>>   0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.2814906 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.032391068 = (MATCH) weight(field:überwachung in 0), product of:
>>     0.2814906 = queryWeight(field:überwachung), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:
>>       1.0 = tf(termFreq(field:überwachung)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>> "fleischgesetz"
>> 0.064782135 = (MATCH) sum of:
>>   0.032391068 = (MATCH) weight(field:fleisch in 0), product of:
>>     0.2814906 = queryWeight(field:fleisch), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:fleisch in 0), product of:
>>       1.0 = tf(termFreq(field:fleisch)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>   0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
>>     0.2814906 = queryWeight(field:gesetz), product of:
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.9173473 = queryNorm
>>     0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
>>       1.0 = tf(termFreq(field:gesetz)=1)
>>       0.30685282 = idf(docFreq=1, maxDocs=1)
>>       0.375 = fieldNorm(field=field, doc=0)
>>
>>
>> On Wed, Oct 21, 2009 at 1:40 PM, Benjamin Douglas
>> <bbdoug...@basistech.com> wrote:
>>
>>> Thanks for all of the answers so far!
>>>
>>> Paul's question is similar to another aspect I am curious about:
>>>
>>> Given the way the sample word is analyzed, is there anything in the scoring
>>> mechanism that would rank "überwachungsgesetz" higher than
>>> "gesetzüberwachung" or "fleischgesetz"?
>>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>

--
Robert Muir
rcm...@gmail.com