Hmmm, the way your mails/questions are structured makes this hard, but
I'll do my best to recapitulate each question, take the relevant dasl
snippet, and try to answer it, or indicate whether I need to sort it
out. I would appreciate it if in the future you could adhere a little
more to community mail standards. Anyway, here we go:

Question 1: 'In the where clause of the dasl I only check for h:type. I
would expect the score for every hit to be the same, but it isn't.
Why?'

Answer 1: The relevant dasl part is

<d:and>
    <d:or>
        <d:eq>
            <d:prop><h:type/></d:prop>
            <d:literal>product</d:literal>
        </d:eq>
        <d:eq>
            <d:prop><h:type/></d:prop>
            <d:literal>subsidie</d:literal>
        </d:eq>
    </d:or>
</d:and>

So, a document is either of type 'product', *or* of type 'subsidie', or
of some other type, but never of two or more types. You would therefore
expect documents of type 'product' and documents of type 'subsidie' to
get an equal score. If that is what you need, you could easily write
your own query implementation doing this, or use something like
ConstantScoreQuery, which is already present in lucene. I agree with
your conceptual intuition about equal scores, but I might be able to
persuade you to think differently about scoring and searching (though if
you want to know more, I suggest reading a book about it :-) ).

Anyway, suppose you search the whole world for *all* persons who are
either 1.80m *or* 2.26m tall, and rank the result by score. You get
about 500,000,000 hits for 1.80m and about 50 hits for 2.26m.

Now my question to you: should all hits score equally, or do you think
the hits for 2.26m are much more specific and interesting and should
rank higher? In other words, should we rank all your hits equally, or
are the documents of the more frequent type of less importance? You
might want to take a look at the lucene scoring algorithm at [1], but
the level is 'expert' and it is far from trivial.
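
To make the analogy a bit more concrete, here is a minimal sketch of the
inverse document frequency (idf) factor, written in the style of the
classic lucene formula 1 + ln(numDocs / (docFreq + 1)). It is only one
of several factors in the real score, and the total collection size used
below is a made-up assumption; only the two hit counts come from the
example above.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Idf in the style of lucene's classic similarity (simplified sketch)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical total number of "documents" (persons); an assumption for
# illustration only. The two hit counts are taken from the example.
NUM_DOCS = 6_000_000_000
HITS_180 = 500_000_000
HITS_226 = 50

idf_common = idf(HITS_180, NUM_DOCS)
idf_rare = idf(HITS_226, NUM_DOCS)

print(f"idf for 1.80m: {idf_common:.2f}")
print(f"idf for 2.26m: {idf_rare:.2f}")
```

The rare value gets the larger idf, which is why hits of the less
frequent type can end up with a higher score even though the query only
tests for equality.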


Question 2: 'When I change the ordering of sorting, my absolute scoring
value changes for the same hits'

Answer 2: Relevant dasl part is:

<d:orderby>
    <d:order><d:prop><d:score/></d:prop></d:order>
    <d:order><d:prop><h:sortTitle/></d:prop><d:ascending/></d:order>
</d:orderby>

When you swap h:sortTitle and d:score in the orderby, the absolute value
of the returned score is different, while the relative values stay the
same.

Dasl searches *should* return scores in the range [0,1000]. Lucene
scoring is normalized to this range. Your numbers indicate that
normalisation currently does not work correctly, since values way higher
than 1000 are returned. This is because normalisation is done with
respect to the first hit, which is assumed to have the highest score,
but this clearly is not true when you sort on anything other than
d:score. So, this can be considered a bug. I hope it is easy to solve
(from a search result in lucene I need to be able to get the document
with the highest score, otherwise solving the issue would impose a very
large performance penalty, but this is too much detail). Bottom line at
the moment is that the ordering is correct, as are the relative score
values; only the absolute scores are not normalized correctly.
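
A small sketch of what goes wrong, with made-up raw scores (the numbers
and the helper name are assumptions for illustration, not the actual
implementation): dividing by the first hit only maps onto [0,1000] when
the first hit happens to be the best one.

```python
def normalize(scores, reference):
    """Scale raw scores so that `reference` maps to 1000."""
    return [round(1000 * s / reference) for s in scores]


# Hypothetical raw scores, in the order the hits come back when the
# result is sorted on title instead of on score.
raw_by_title = [0.2, 0.8, 0.4]

# Behaviour as described above: the first hit is assumed to score highest.
by_first = normalize(raw_by_title, raw_by_title[0])

# Intended behaviour: normalize against the real maximum score.
by_max = normalize(raw_by_title, max(raw_by_title))

print("normalized by first hit:", by_first)
print("normalized by max score:", by_max)
```

In both cases the ratios between the hits are identical; only the
absolute values differ, and only the second variant stays within
[0,1000].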

Hopefully these are the answers to your questions,

-Ard

[1] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html


> Dear Ard,
>  
> See the attachment, hopefully it is more readable :).
>  
> Regards,
> Amon
> 
> 
> 
> > Subject: RE: [HippoCMS-dev] WEBDAV search and score
> > Date: Tue, 22 Jan 2008 18:03:44 +0100
> > From: [EMAIL PROTECTED]
> > To: [email protected]
> >
> > Hello Amon,
> >
> > is it possible to attach your dasls as xml or have a mail with proper
> > indenting: I cannot read it like this :-)
> >
> > Pls also note that this is a public list, where non-native Dutch
> > speakers are subscribed as well. I recently added the d:score
> > support, so there might be an issue with it. Also do note that the
> > scoring algorithm in lucene is non-trivial, so explaining why a
> > certain doc scores higher than some other is quite hard. I'll try to
> > find the default scoring algorithm, and find the most important parts
> > of it.
> >
> > I do admit that when only searching for a doc's type you would expect
> > all scores to be similar. If this is a bug I'll try to reproduce it
> > and find time to fix it, otherwise I'll try to explain why it works
> > like it works. Also note that you can write your own scoring
> > algorithm, but it might take you some time if you're not familiar
> > with it (but of course, if it is good looking code, you'll make it to
> > the contributors list :-) )
> >
> > So, pls resend your used dasls,
> >
> > Regards Ard
> >
> > > Dear Ard,
> > >
> > > A question about the score attribute. I use the following webdav
> > > search:
> > >
> > > <?xml version="1.0" encoding="utf-8"?>
> > > <webdav:request
> > >     xmlns:webdav="http://hippo.nl/webdav/1.0"
> > >     xmlns:S="http://jakarta.apache.org/slide/"
> > >     xmlns:jx="http://apache.org/cocoon/templates/jx/1.0"
> > >     xmlns:h="http://hippo.nl/cms/1.0"
> > >     xmlns:d="DAV:"
> > >     method="SEARCH"
> > >     target="webdav://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">
> > >   <webdav:header value="Infinity" name="Depth"/>
> > >   <webdav:body>
> > >     <d:searchrequest>
> > >       <d:basicsearch>
> > >         <d:select>
> > >           <d:prop>
> > >             <S:hitPosition/><S:nrHits/><h:title/>
> > >             <h:productType/><d:score/>
> > >           </d:prop>
> > >         </d:select>
> > >         <d:from>
> > >           <d:scope><d:href/><d:depth>Infinity</d:depth></d:scope>
> > >         </d:from>
> > >         <d:where>
> > >           <d:and>
> > >             <d:or>
> > >               <d:eq>
> > >                 <d:prop><h:type/></d:prop>
> > >                 <d:literal>product</d:literal>
> > >               </d:eq>
> > >               <d:eq>
> > >                 <d:prop><h:type/></d:prop>
> > >                 <d:literal>subsidie</d:literal>
> > >               </d:eq>
> > >             </d:or>
> > >           </d:and>
> > >         </d:where>
> > >         <d:orderby>
> > >           <d:order><d:prop><d:score/></d:prop></d:order>
> > >           <d:order><d:prop><h:sortTitle/></d:prop><d:ascending/></d:order>
> > >         </d:orderby>
> > >         <d:limit>
> > >           <d:nresults>15</d:nresults><S:offset>0</S:offset>
> > >         </d:limit>
> > >       </d:basicsearch>
> > >     </d:searchrequest>
> > >   </webdav:body>
> > > </webdav:request>
> > >
> > > When I run a search, I expect the same score for every document
> > > found. However, this is not the case. What could cause this? The
> > > where clause of the webdav search only checks on type.
> > >
> > > When I use the following webdav search:
> > >
> > > <?xml version="1.0" encoding="utf-8"?>
> > > <webdav:request
> > >     xmlns:webdav="http://hippo.nl/webdav/1.0"
> > >     xmlns:S="http://jakarta.apache.org/slide/"
> > >     xmlns:jx="http://apache.org/cocoon/templates/jx/1.0"
> > >     xmlns:h="http://hippo.nl/cms/1.0"
> > >     xmlns:d="DAV:"
> > >     method="SEARCH"
> > >     target="webdav://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">
> > >   <webdav:header value="Infinity" name="Depth"/>
> > >   <webdav:body>
> > >     <d:searchrequest>
> > >       <d:basicsearch>
> > >         <d:select>
> > >           <d:prop>
> > >             <S:hitPosition/><S:nrHits/><h:title/>
> > >             <h:productType/><d:score/>
> > >           </d:prop>
> > >         </d:select>
> > >         <d:from>
> > >           <d:scope><d:href/><d:depth>Infinity</d:depth></d:scope>
> > >         </d:from>
> > >         <d:where>
> > >           <d:and>
> > >             <d:or>
> > >               <d:eq>
> > >                 <d:prop><h:type/></d:prop>
> > >                 <d:literal>product</d:literal>
> > >               </d:eq>
> > >               <d:eq>
> > >                 <d:prop><h:type/></d:prop>
> > >                 <d:literal>subsidie</d:literal>
> > >               </d:eq>
> > >             </d:or>
> > >             <d:or>
> > >               <S:strict-property-contains>
> > >                 <d:prop><h:title/></d:prop>
> > >                 <d:literal>bouwvergunning</d:literal>
> > >               </S:strict-property-contains>
> > >               <S:property-contains>
> > >                 <d:prop><h:title/></d:prop>
> > >                 <d:literal>bouwvergunning</d:literal>
> > >               </S:property-contains>
> > >               <S:propsearch>
> > >                 <d:prop><h:sortTitle/></d:prop>
> > >                 <d:literal>bouwvergunning*</d:literal>
> > >               </S:propsearch>
> > >               <S:propsearch>
> > >                 <d:prop><h:subtitle/></d:prop>
> > >                 <d:literal>bouwvergunning*</d:literal>
> > >               </S:propsearch>
> > >               <d:contains>bouwvergunning</d:contains>
> > >             </d:or>
> > >           </d:and>
> > >         </d:where>
> > >         <d:orderby>
> > >           <d:order><d:prop><d:score/></d:prop></d:order>
> > >           <d:order><d:prop><h:sortTitle/></d:prop><d:ascending/></d:order>
> > >         </d:orderby>
> > >         <d:limit>
> > >           <d:nresults>15</d:nresults><S:offset>0</S:offset>
> > >         </d:limit>
> > >       </d:basicsearch>
> > >     </d:searchrequest>
> > >   </webdav:body>
> > > </webdav:request>
> > >
> > > I get the following result in the frontend, see image
> > > scorealfa.jpg.
> > >
> > > When I reverse the "order by":
> > >
> > > <d:orderby>
> > >   <d:order><d:prop><h:sortTitle/></d:prop></d:order>
> > >   <d:order><d:prop><d:score/></d:prop><d:ascending/></d:order>
> > > </d:orderby>
> > >
> > > I get completely different scores back. See image alfascore.jpg.
> > >
> > > Why is this score so different?
> > >
> > > Kind regards,
> > > Amon
> 
********************************************
Hippocms-dev: Hippo CMS development public mailinglist
