Can the dictionary have weights?überwachungsgesetz alone probably needs a higher rank than überwachung and gesetzt or?
paul Le 21-oct.-09 à 21:09, Benjamin Douglas a écrit :
OK, that makes sense. So I just need to add all of the sub-compounds that are real words at posIncr=0, even if they are combinations of other sub-compounds.Thanks! -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, October 21, 2009 11:49 AM To: java-user@lucene.apache.org Subject: Re: Using org.apache.lucene.analysis.compound yes, your dictionary :) if überwachungsgesetz is a real word, add it to your dictionary.for example, if your dictionary is { "Rind", "Fleisch", "Draht", "Schere","Gesetz", "Aufgabe", "Überwachung" }, and you indexRindfleischüberwachungsgesetz, then all 3 queries will have the same score. but if you expand the dictionary to { "Rind", "Fleisch", "Draht", "Schere", "Gesetz", "Aufgabe", "Überwachung", "Überwachungsgesetz" }, then this makesa big difference.all 3 queries will still match, but überwachungsgesetz will have a higherscore. this is because now things are analyzed differently:Rindfleischüberwachungsgesetz will be decompounded as before, but with anadditional token: Überwachungsgesetz. so back to your original question, these 'concatenations' of multiplecomponents, yes compounds will do that, if they are real words. but it won'tjust make them up. "überwachungsgesetz" 0.23013961 = (MATCH) sum of:0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:0.5 = queryWeight(field:überwachungsgesetz), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.6294457 = queryNorm0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), productof: 1.0 = tf(termFreq(field:überwachungsgesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) 0.057534903 = (MATCH) weight(field:überwachung in 0), product of: 0.5 = queryWeight(field:überwachung), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.6294457 = queryNorm0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:1.0 = tf(termFreq(field:überwachung)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0)0.057534903 = (MATCH) weight(field:überwachungsgesetz in 0), product of:0.5 = queryWeight(field:überwachungsgesetz), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.6294457 = queryNorm0.11506981 = (MATCH) fieldWeight(field:überwachungsgesetz in 0), productof: 1.0 = tf(termFreq(field:überwachungsgesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) 0.057534903 = (MATCH) weight(field:gesetz in 0), product of: 0.5 = queryWeight(field:gesetz), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 1.6294457 = queryNorm 0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of: 1.0 = tf(termFreq(field:gesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) "gesetzüberwachung" 0.064782135 = (MATCH) sum of: 0.032391068 = (MATCH) weight(field:gesetz in 0), product of: 0.2814906 = queryWeight(field:gesetz), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 0.9173473 = queryNorm 0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of: 1.0 = tf(termFreq(field:gesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) 0.032391068 = (MATCH) weight(field:überwachung in 0), product of: 0.2814906 = queryWeight(field:überwachung), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 0.9173473 = queryNorm0.11506981 = (MATCH) fieldWeight(field:überwachung in 0), product of:1.0 = tf(termFreq(field:überwachung)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) "fleischgesetz" 0.064782135 = (MATCH) sum of: 0.032391068 = (MATCH) weight(field:fleisch in 0), product of: 0.2814906 = queryWeight(field:fleisch), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 0.9173473 = queryNorm 0.11506981 = (MATCH) fieldWeight(field:fleisch in 0), product of: 1.0 = tf(termFreq(field:fleisch)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) 0.032391068 = (MATCH) weight(field:gesetz in 0), product of: 0.2814906 = queryWeight(field:gesetz), product of: 0.30685282 = idf(docFreq=1, maxDocs=1) 0.9173473 = queryNorm 0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of: 1.0 = tf(termFreq(field:gesetz)=1) 0.30685282 = idf(docFreq=1, maxDocs=1) 0.375 = fieldNorm(field=field, doc=0) On Wed, Oct 21, 2009 at 1:40 PM, Benjamin Douglas <bbdoug...@basistech.com>wrote:Thanks for all of the answers so far! Paul's question is similar to another aspect I am curious about:Given the way the sample word is analyzed, is there anything in the scoringmechanism that would rank "überwachungsgesetz" higher than "gesetzüberwachung" or "fleischgesetz"?-- Robert Muir rcm...@gmail.com
smime.p7s
Description: S/MIME cryptographic signature