This is a follow-up to my previous query about search weightings. The
problem is simple: search for some text in the opp:body element and, if
the text also appears in the dc:title element, boost the score of those
results. Naively, I entered the following query:
cts:search(
  /doc,
  cts:or-query((
    cts:element-query(xs:QName("dc:title"), cts:word-query("bach", (), 16)),
    cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
  ))
)
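For comparison, here is a minimal sketch of the same intent written with cts:element-word-query, which takes the element QName, the text, options and a weight in a single call. The weight of 16 is then attached directly to the leaf query on dc:title instead of to a cts:word-query nested inside cts:element-query. I have not verified that it scores identically, so treat it as something to try rather than a fix. The namespace declarations for the opp and dc prefixes are assumed to be whatever the queries in this post already use; they are not shown.

```xquery
(: declare namespace opp = "..."; declare namespace dc = "...";
   omitted: the namespace URIs are not shown in this post. :)

(: Same intent as the query above, but each branch of the or-query is a
   single element-word leaf query; the title branch carries weight 16. :)
cts:search(
  /doc,
  cts:or-query((
    cts:element-word-query(xs:QName("dc:title"), "bach", (), 16),
    cts:element-word-query(xs:QName("opp:body"), "bach")
  ))
)
```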
What happens, however, does not make any sense. Let me step you
through my investigation. First, I list the first 13 entries that have
"bach" in opp:body:
<results>{
  for $x at $i in (cts:search(
    /doc,
    cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
  ))[1 to 13]
  return <result id="{$i}">
    { base-uri($x) } :
    { cts:score($x) } :
    { $x/opp:meta/dc:title/text() }
  </result>
}</results>
<opp:results>
  <opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
  <opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
  <opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
  <opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
  <opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
  <opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
  <opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
  <opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
  <opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
  <opp:result id="10">/opr/t76/e3128 : 403 : Estro armonico, L’</opp:result>
  <opp:result id="11">/grove/music/30356 : 403 : Williams, Peter (Frederic)</opp:result>
  <opp:result id="12">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
  <opp:result id="13">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>
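Since the same listing FLWOR is repeated for every query in this post, it could be factored into one function that takes the cts:query to run. This is only a sketch: local:top is a hypothetical name, not something from the original queries, and the opp and dc namespace declarations are assumed to be in the prolog, as they must have been for the queries here.

```xquery
(: declare namespace opp = "..."; declare namespace dc = "...";
   omitted: the namespace URIs are not shown in this post. :)

(: Hypothetical helper: runs $q against /doc and lists the first $n hits
   as "URI : score : title", the same shape as the listings in this post. :)
declare function local:top($q as cts:query, $n as xs:integer) as element()
{
  <results>{
    for $x at $i in (cts:search(/doc, $q))[1 to $n]
    return <result id="{$i}">
      { base-uri($x) } :
      { cts:score($x) } :
      { $x/opp:meta/dc:title/text() }
    </result>
  }</results>
};

(: Usage: the body-only search from the listing above. :)
local:top(cts:element-query(xs:QName("opp:body"), cts:word-query("bach")), 13)
```

Each later experiment then only needs a different first argument, which makes it harder for the listings to drift apart.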
Then, just to make sure, I searched for "bach" only in dc:title:
<results>{
  for $x at $i in (cts:search(
    /doc,
    cts:element-query(xs:QName("dc:title"), cts:word-query("bach"))
  ))[1 to 13]
  return <result id="{$i}">
    { base-uri($x) } :
    { cts:score($x) } :
    { $x/opp:meta/dc:title/text() }
  </result>
}</results>
<opp:results>
  <opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
  <opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
  <opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
  <opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
  <opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
  <opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
  <opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
  <opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
  <opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
  <opp:result id="10">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
  <opp:result id="11">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
  <opp:result id="12">/grove/music/01693 : 403 : Bach-Abel Concerts.</opp:result>
  <opp:result id="13">/grove/music/O006539 : 403 : English Bach Festival.</opp:result>
</opp:results>
Now I combined the two searches with a cts:or-query and no weightings:
<results>{
  for $x at $i in (cts:search(
    /doc,
    cts:or-query((
      cts:element-query(xs:QName("opp:body"), cts:word-query("bach")),
      cts:element-query(xs:QName("dc:title"), cts:word-query("bach"))
    ))
  ))[1 to 13]
  return <result id="{$i}">
    { base-uri($x) } :
    { cts:score($x) } :
    { $x/opp:meta/dc:title/text() }</result>
}</results>
<opp:results>
  <opp:result id="1">/grove/music/19768 : 465 : Neue Bach-Gesellschaft.</opp:result>
  <opp:result id="2">/grove/music/01690 : 434 : Bach, Cecilia.</opp:result>
  <opp:result id="3">/grove/music/01696 : 434 : Bach Choir.</opp:result>
  <opp:result id="4">/grove/music/52274 : 434 : Bach, P.D.Q.</opp:result>
  <opp:result id="5">/grove/music/O007770 : 434 : Bach, P. D. Q.</opp:result>
  <opp:result id="6">/grove/music/52912 : 434 : Bach Guild.</opp:result>
  <opp:result id="7">/grove/music/01710 : 434 : Bach Society.</opp:result>
  <opp:result id="8">/opr/t76/e649 : 434 : Bach Gesellschaft</opp:result>
  <opp:result id="9">/opr/t114/e526 : 434 : Bach Revival</opp:result>
  <opp:result id="10">/opr/t76/e3128 : 403 : Estro armonico, L’</opp:result>
  <opp:result id="11">/grove/music/30356 : 403 : Williams, Peter (Frederic)</opp:result>
  <opp:result id="12">/grove/music/01689 : 403 : Bach, August Wilhelm</opp:result>
  <opp:result id="13">/grove/music/01692 : 403 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>
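One thing worth checking is where the dc:title match for each hit actually comes from. cts:element-query(xs:QName("dc:title"), ...) matches a dc:title element anywhere under /doc, while the listings print only $x/opp:meta/dc:title. A sketch that adds that information to each row of the unweighted combined query; the attribute names are made up for this listing, and the namespace prolog is assumed as in the other queries.

```xquery
(: declare namespace opp = "..."; declare namespace dc = "...";
   omitted: the namespace URIs are not shown in this post. :)
<results>{
  for $x at $i in (cts:search(
    /doc,
    cts:or-query((
      cts:element-query(xs:QName("opp:body"), cts:word-query("bach")),
      cts:element-query(xs:QName("dc:title"), cts:word-query("bach"))
    ))
  ))[1 to 13]
  (: meta-title-has-bach: does the title shown in the listings match?
     any-title-has-bach: does any dc:title in the document match?
     title-count: how many dc:title elements the document contains. :)
  return <result id="{$i}"
      meta-title-has-bach="{cts:contains($x/opp:meta/dc:title, cts:word-query('bach'))}"
      any-title-has-bach="{cts:contains($x//dc:title, cts:word-query('bach'))}"
      title-count="{count($x//dc:title)}">
    { base-uri($x) } : { cts:score($x) }
  </result>
}</results>
```

If results 10 and 11 came back with any-title-has-bach="true" but meta-title-has-bach="false", the title query would be matching a second dc:title that the listings never print.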
The results to note are 10 and 11: these are documents that do not
contain "bach" in the dc:title element, yet they have the same scores as
documents that do (results 12 and 13). So now I add some weighting to
the query for the dc:title element:
<results>{
  for $x at $i in (cts:search(
    /doc,
    cts:or-query((
      cts:element-query(xs:QName("dc:title"), cts:word-query("bach", (), 16)),
      cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
    ))
  ))[1 to 13]
  return <result id="{$i}">
    { base-uri($x) } :
    { cts:score($x) } :
    { $x/opp:meta/dc:title/text() }</result>
}</results>
<opp:results>
  <opp:result id="1">/grove/music/19768 : 474 : Neue Bach-Gesellschaft.</opp:result>
  <opp:result id="2">/grove/music/01690 : 443 : Bach, Cecilia.</opp:result>
  <opp:result id="3">/grove/music/01696 : 443 : Bach Choir.</opp:result>
  <opp:result id="4">/grove/music/52274 : 443 : Bach, P.D.Q.</opp:result>
  <opp:result id="5">/grove/music/O007770 : 443 : Bach, P. D. Q.</opp:result>
  <opp:result id="6">/grove/music/52912 : 443 : Bach Guild.</opp:result>
  <opp:result id="7">/grove/music/01710 : 443 : Bach Society.</opp:result>
  <opp:result id="8">/opr/t76/e649 : 443 : Bach Gesellschaft</opp:result>
  <opp:result id="9">/opr/t114/e526 : 443 : Bach Revival</opp:result>
  <opp:result id="10">/opr/t76/e3128 : 411 : Estro armonico, L’</opp:result>
  <opp:result id="11">/grove/music/30356 : 411 : Williams, Peter (Frederic)</opp:result>
  <opp:result id="12">/grove/music/01689 : 411 : Bach, August Wilhelm</opp:result>
  <opp:result id="13">/grove/music/01692 : 411 : Bach, Vincent [Schrottenbach, Vinzenz]</opp:result>
</opp:results>
Result:  1    2    3    4    5    6    7    8    9    10   11   12   13
Before:  465  434  434  434  434  434  434  434  434  403  403  403  403
After:   474  443  443  443  443  443  443  443  443  411  411  411  411
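Worked through, the table shows an almost uniform increase of about 2% on every row rather than a boost concentrated on the rows whose title contains "bach":

```latex
\begin{aligned}
\text{row } 1\text{:} \quad & 474 - 465 = 9, & \quad 474/465 &\approx 1.019 \\
\text{rows } 2\text{--}9\text{:} \quad & 443 - 434 = 9, & \quad 443/434 &\approx 1.021 \\
\text{rows } 10\text{--}13\text{:} \quad & 411 - 403 = 8, & \quad 411/403 &\approx 1.020
\end{aligned}
```

Because every row is scaled by roughly the same factor, rows that were tied stay tied and no row can overtake another, which is consistent with the unchanged ordering.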
As you can see, the scores for all the results have changed, including
those for results 10 and 11, which received the same minuscule boost as
12 and 13. Remember that 10 and 11 do not have "bach" in the dc:title
element, so I would have expected them not to receive a boost. The net
effect is that everything has changed and everything has stayed the
same (it probably sounds better in French). Whatever I do, the ordering
remains the same. I have tried some completely insane values (only to
discover that the maximum weight appears to be 16), and the only outcome
is that all the results change by roughly the same amount while the
ordering stays unaltered.
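To see what the weight actually contributes to each score, newer MarkLogic releases can return a per-hit score explanation through the "relevance-trace" option to cts:search together with cts:relevance-info. I am not sure which release introduced them, so treat their availability as an assumption to check against the server version in use. A sketch for results 10 to 13 of the weighted query, with the namespace prolog assumed as in the other queries:

```xquery
(: declare namespace opp = "..."; declare namespace dc = "...";
   omitted: the namespace URIs are not shown in this post. :)

(: "relevance-trace" asks cts:search to record how each score was built;
   cts:relevance-info returns that record for a single hit. :)
for $x in (cts:search(
  /doc,
  cts:or-query((
    cts:element-query(xs:QName("dc:title"), cts:word-query("bach", (), 16)),
    cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
  )),
  "relevance-trace"
))[10 to 13]
return cts:relevance-info($x)
```

Comparing the trace for result 10 (no "bach" in the displayed title) with result 12 (title match) should show whether the dc:title term contributes to both.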
I am beginning to suspect that the whole query weighting song and dance
is just plain broken.
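If the weights cannot be made to do the job, a blunter workaround is to make the title test part of the ordering instead of the score. This is a different technique from query weighting, sketched under the assumption that sorting the full set of body matches is affordable: it gives up the cheap pagination that cts:search(...)[1 to 13] normally gets. The namespace prolog is assumed as in the other queries.

```xquery
(: declare namespace opp = "..."; declare namespace dc = "...";
   omitted: the namespace URIs are not shown in this post. :)
<results>{
  (: All body matches, re-ordered so that hits whose displayed title
     contains "bach" come first, then by relevance score. :)
  let $hits := (
    for $x in cts:search(
      /doc,
      cts:element-query(xs:QName("opp:body"), cts:word-query("bach"))
    )
    let $in-title := cts:contains($x/opp:meta/dc:title, cts:word-query("bach"))
    (: xs:boolean orders false before true, so descending puts true first :)
    order by $in-title descending, cts:score($x) descending
    return $x
  )
  for $x at $i in subsequence($hits, 1, 13)
  return <result id="{$i}">
    { base-uri($x) } :
    { cts:score($x) } :
    { $x/opp:meta/dc:title/text() }
  </result>
}</results>
```

The ordering is then guaranteed by construction rather than by how large the title weight is relative to the body score.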
Can someone please tell me what I am doing wrong or what else I might try?
--
Peter Hickman.
Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 722222
f: 01273 723232
w: www.semantico.com
_______________________________________________
General mailing list
http://xqzone.com/mailman/listinfo/general