This is a follow-up to my previous query about search weightings. The problem is a simple search for some text in the opp:body field. If the text also appears in the dc:title element, the scores of those results should be boosted. Naively, I entered the following query:

cts:search(
 /doc,
 cts:or-query((
   cts:element-query(xs:QName("dc:title"),cts:word-query("bach",(),16)),
   cts:element-query(xs:QName("opp:body"),cts:word-query("bach"))
 ))
)


What is happening, however, does not make any sense. Let me step you through my investigation. First, I get a list of the first 13 entries that have "bach" in opp:body:

<results>{
for $x at $i in (cts:search(
/doc, cts:element-query(xs:QName("opp:body"),cts:word-query("bach"))
))[1 to 13]
return <result id="{$i}">
 { base-uri($x) } :
 { cts:score($x) } :
 { $x/opp:meta/dc:title/text() }
 </result>
}</results>

<opp:results>
<opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
<opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
<opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
<opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
<opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
<opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
<opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
<opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
<opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
<opp:result id="10">/opr/t76/e3128 : 403 : Estro armonico, L’</opp:result>
<opp:result id="11">/grove/music/30356 : 403 : Williams, Peter (Frederic)</opp:result>
<opp:result id="12">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
<opp:result id="13">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>

Then, just to make sure, I searched for "bach" in dc:title only:

<results>{
for $x at $i in (cts:search(
/doc, cts:element-query(xs:QName("dc:title"),cts:word-query("bach"))
))[1 to 13]
return <result id="{$i}">
 { base-uri($x) } :
 { cts:score($x) } :
 { $x/opp:meta/dc:title/text() }
 </result>
}</results>

<opp:results>
<opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
<opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
<opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
<opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
<opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
<opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
<opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
<opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
<opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
<opp:result id="10">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
<opp:result id="11">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
<opp:result id="12">/grove/music/01693 : 403 : Bach-Abel Concerts.</opp:result>
<opp:result id="13">/grove/music/O006539 : 403 : English Bach Festival.</opp:result>
</opp:results>

Now I combined the two searches with a cts:or-query and no weightings:

<results>{
for $x at $i in (cts:search(
/doc, cts:or-query((
 cts:element-query(xs:QName("opp:body"),cts:word-query("bach")),
 cts:element-query(xs:QName("dc:title"),cts:word-query("bach"))
  ))
))[1 to 13]
return <result id="{$i}">
 { base-uri($x) } :
 { cts:score($x) } :
 { $x/opp:meta/dc:title/text() }</result>
}</results>

<opp:results>
<opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
<opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
<opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
<opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
<opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
<opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
<opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
<opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
<opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
<opp:result id="10">/opr/t76/e3128 : 403 : Estro armonico, L’</opp:result>
<opp:result id="11">/grove/music/30356 : 403 : Williams, Peter (Frederic)</opp:result>
<opp:result id="12">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
<opp:result id="13">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>
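To make the comparison between hits explicit, a minimal sketch: the same unweighted or-query, with each hit also flagged by whether its dc:title itself matches "bach", using cts:contains. It assumes the opp: and dc: prefixes are declared exactly as for the queries above (those declarations are not shown in this post), and the title-match attribute name is purely illustrative.

```xquery
(: Sketch: the unweighted or-query from above, with each hit flagged by
   whether its dc:title itself contains the word "bach".
   Assumes the opp: and dc: prefixes are declared as in the queries above;
   the title-match attribute name is illustrative only. :)
<results>{
for $x at $i in (cts:search(
/doc, cts:or-query((
 cts:element-query(xs:QName("opp:body"),cts:word-query("bach")),
 cts:element-query(xs:QName("dc:title"),cts:word-query("bach"))
  ))
))[1 to 13]
return <result id="{$i}"
  title-match="{cts:contains($x/opp:meta/dc:title, cts:word-query('bach'))}">
 { base-uri($x) } :
 { cts:score($x) } :
 { $x/opp:meta/dc:title/text() }</result>
}</results>
```

With this, hits that score alike but differ in whether their title matches show up side by side with title-match="true" or "false".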

The results to note are 10 and 11: these documents do not contain "bach" in the dc:title element, yet they have exactly the same scores as documents that do (results 12 and 13). So now I add some weighting to the query for the dc:title element:

<results>{
for $x at $i in (cts:search(
/doc, cts:or-query((
 cts:element-query(xs:QName("dc:title"),cts:word-query("bach",(),16)),
 cts:element-query(xs:QName("opp:body"),cts:word-query("bach"))
  ))
))[1 to 13]
return <result id="{$i}">
 { base-uri($x) } :
 { cts:score($x) } :
 { $x/opp:meta/dc:title/text() }</result>
}</results>

<opp:results>
<opp:result id="1">/grove/music/19768 : 474 : Neue Bach-Gesellschaft.</opp:result>
<opp:result id="2">/grove/music/01690 : 443 : Bach, Cecilia.</opp:result>
<opp:result id="3">/grove/music/01696 : 443 : Bach Choir.</opp:result>
<opp:result id="4">/grove/music/52274 : 443 : Bach, P.D.Q.</opp:result>
<opp:result id="5">/grove/music/O007770 : 443 : Bach, P. D. Q.</opp:result>
<opp:result id="6">/grove/music/52912 : 443 : Bach Guild.</opp:result>
<opp:result id="7">/grove/music/01710 : 443 : Bach Society.</opp:result>
<opp:result id="8">/opr/t76/e649 : 443 : Bach Gesellschaft</opp:result>
<opp:result id="9">/opr/t114/e526 : 443 : Bach Revival</opp:result>
<opp:result id="10">/opr/t76/e3128 : 411 : Estro armonico, L’</opp:result>
<opp:result id="11">/grove/music/30356 : 411 : Williams, Peter (Frederic)</opp:result>
<opp:result id="12">/grove/music/01689 : 411 : Bach, August Wilhelm</opp:result>
<opp:result id="13">/grove/music/01692 : 411 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>

Result .:   1   2   3   4   5   6   7   8   9  10  11  12  13
Before .: 465 434 434 434 434 434 434 434 434 403 403 403 403
After ..: 474 443 443 443 443 443 443 443 443 411 411 411 411

As you can see, the scores for all the results have changed, including those for results 10 and 11, which received the same minuscule boost as 12 and 13. Remember that 10 and 11 do not have "bach" in the dc:title element, so I would have expected them not to receive a boost at all. So the net effect is that everything has changed and everything has stayed the same (probably sounds better in French).
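Working the differences out from the Before/After table makes the size of the shift concrete: every row moves by 8 or 9 points, roughly 2% of its score, whether or not its dc:title contains "bach".

$$
\begin{aligned}
\Delta_{1} &= 474 - 465 = 9, & \tfrac{9}{465} &\approx 1.9\% \\
\Delta_{2\ldots 9} &= 443 - 434 = 9, & \tfrac{9}{434} &\approx 2.1\% \\
\Delta_{10\ldots 13} &= 411 - 403 = 8, & \tfrac{8}{403} &\approx 2.0\%
\end{aligned}
$$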

Whatever I do, the ordering remains the same. I have tried some completely insane values (only to discover that the maximum weight appears to be 16), and the only outcome is that all the results change by the same amount while the ordering remains unaltered.
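For comparing orderings across weights side by side, a sketch along these lines runs the weighted or-query once per title weight and lists the top 13 URIs with their scores. The weights (1, 4, 16) and the runs/run/hit element names are illustrative assumptions, and the opp: and dc: prefixes are assumed to be declared as in the queries above.

```xquery
(: Sketch: run the weighted or-query once per title weight and list the
   top 13 hits for each, so the orderings can be compared side by side.
   Assumes the opp: and dc: prefixes are declared as in the queries above;
   the weights and the runs/run/hit element names are illustrative only. :)
<runs>{
  for $w in (1, 4, 16)
  return
    <run weight="{$w}">{
      for $x in (cts:search(/doc, cts:or-query((
        cts:element-query(xs:QName("dc:title"), cts:word-query("bach", (), $w)),
        cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
      ))))[1 to 13]
      return <hit score="{cts:score($x)}">{ base-uri($x) }</hit>
    }</run>
}</runs>
```

If the weighting has no effect on ranking, the sequence of URIs inside each run element will be identical and only the score attributes will differ.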

I am beginning to suspect that the whole query weighting song and dance is just plain broken.

Can someone please tell me what I am doing wrong or what else I might try?

--
Peter Hickman.

Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 722222
f: 01273 723232
w: www.semantico.com

_______________________________________________
General mailing list
http://xqzone.com/mailman/listinfo/general
