Suppose each document has exactly one <publisher> element in its metadata
section, and there are three publishers: PUB_IMPORTANT, PUB_STANDARD and
PUB_UNIMPORTANT. In the boosting query, element-value-queries boost the
documents of PUB_IMPORTANT with a weight of 5 and the documents of
PUB_STANDARD with a weight of 4. The documents of PUB_UNIMPORTANT are not
boosted.
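
As a rough sketch, a query of this shape might look like the following.
The search phrase, the cts:or-query that combines the two boosts and the
assumption that <publisher> is in no namespace are placeholders for
illustration, not the actual production query:

```xquery
xquery version "1.0-ml";

(: Sketch only: the word query stands in for the "real" query. :)
let $matching := cts:word-query("some search phrase")

(: Boost by publisher; the fourth argument is the query weight.
   PUB_UNIMPORTANT is deliberately absent, so it gets no boost. :)
let $boosting := cts:or-query((
  cts:element-value-query(xs:QName("publisher"), "PUB_IMPORTANT", (), 5),
  cts:element-value-query(xs:QName("publisher"), "PUB_STANDARD", (), 4)
))

return
  cts:search(
    fn:doc(),
    cts:boost-query($matching, $boosting),
    "score-logtfidf"
  )[1 to 10]
```

The scoring option applies to the whole cts:search call, so the matching
query and the boosting query are both scored with logtfidf here.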

Now assume that 90% of your documents belong to PUB_IMPORTANT and only 1%
to PUB_STANDARD. The document frequency of PUB_STANDARD is then very low
compared to the document frequency of PUB_IMPORTANT. Because logtfidf
scores rare terms higher than common ones, my conclusion is that it boosts
the PUB_IMPORTANT documents less than the PUB_STANDARD documents, even
though the boosting weights prefer PUB_IMPORTANT documents.
This is why it would be interesting to use simple scoring for the boosting
query only.

My solution to the ordering problem is to insert a new field
<publisher-sort> into the documents, which holds the numeric relevance value
for the document's publisher (e.g. PUB_IMPORTANT=3, PUB_STANDARD=2,
PUB_UNIMPORTANT=1). Then I sort by this field using cts:index-order as the
primary sort key and cts:score-order only as the secondary sort key.
The disadvantage of this approach is that the relevance score can no longer
outweigh the publisher, no matter how high the relevance score is.
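
A minimal sketch of that ordering, assuming an element range index of type
int exists on <publisher-sort> (in no namespace) and again using a
placeholder word query:

```xquery
xquery version "1.0-ml";

(: Sketch only: requires an int element range index on publisher-sort. :)
let $query := cts:word-query("some search phrase")

return
  cts:search(
    fn:doc(),
    $query,
    (
      "score-logtfidf",
      (: Primary sort key: publisher rank, highest first. :)
      cts:index-order(
        cts:element-reference(xs:QName("publisher-sort"), "type=int"),
        "descending"),
      (: Secondary sort key: relevance, only within the same rank. :)
      cts:score-order("descending")
    )
  )[1 to 10]
```

Because the score is only consulted when two documents have the same
<publisher-sort> value, a PUB_STANDARD document can never rank above a
PUB_IMPORTANT one, which is exactly the disadvantage described above.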



2015-06-12 20:01 GMT+02:00 Damon Feldman <[email protected]>:

>  What makes the logtfidf score inappropriate for metadata?
>
>
>
> It occurs to me that if you put the metadata in every doc one time (though
> not in the metadata section if it is not really “there” for the doc) you’ll
> fool the system into thinking that all metadata has the same tf and idf, so
> more like score-simple. It’s a hack and I’m not quite sure it will work, so
> please share the underlying result ordering problem so we know what to work
> around.
>
>
>
> Damon
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Andreas Hubmer
> *Sent:* Tuesday, June 09, 2015 6:59 AM
> *To:* MarkLogic Developer Discussion
> *Subject:* [MarkLogic Dev General] Boosting with simple scoring
>
>
>
> Hi,
>
>
>
> I am using cts:search with the option "score-logtfidf" and a
> cts:boost-query for my search. I am boosting on some meta-elements that are
> enumerations where the meaning does not matter. But what one value is "on"
> and due to logtfidf scoring (I guess) the boost on that value does not
> matter much.
>
>
>
> logtfidf is perfect for my word queries, but not for the
> element-value-queries that I use in the boosting query. Is it possible to
> use "score-simple" for the boosting query only and "score-logtfidf" for the
> "real" query?
>
>
>
> Best regards,
>
> Andreas
>
>
>
>
> --
>
> Andreas Hubmer
>
> IT Consultant
>
>
>
> EBCONT enterprise technologies GmbH
>
> Millennium Tower
>
> Handelskai 94-96
>
> A-1200 Vienna
>
>
>
> OUR TEAM IS YOUR SUCCESS
>
>
>
> UID-Nr. ATU68135644
>
> HG St.Pölten - FN 399978 d
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>


-- 
Andreas Hubmer
IT Consultant

EBCONT enterprise technologies GmbH
Millennium Tower
Handelskai 94-96
A-1200 Vienna

Mobile: +43 664 60651861
Fax: +43 2772 512 69-9
Email: [email protected]
Web: http://www.ebcont.com

OUR TEAM IS YOUR SUCCESS

UID-Nr. ATU68135644
HG St.Pölten - FN 399978 d