Why couldn't you do an exponential decay? You control the formula, right? It 
could be (weeks-since-1970)^2, couldn't it?
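For illustration, the squared-weeks idea could be sketched like this (a minimal Python sketch, not from the thread; `quality_at_ingest` is a hypothetical helper):

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def weeks_since_epoch(d):
    """Whole weeks elapsed since 1970-01-01."""
    return (d - EPOCH).days // 7

def quality_at_ingest(d):
    """Hypothetical non-linear quality: squared weeks since 1970."""
    return weeks_since_epoch(d) ** 2

# Squaring keeps widening the gap between successive weeks, so newer
# documents pull ahead of older ones faster than with a linear formula,
# while still fitting a 32-bit int for a very long time (weeks ** 2
# only exceeds 2 ** 31 - 1 after 46,341 weeks, i.e. roughly 890 years).
print(quality_at_ingest(date(2013, 8, 20)))  # 2276 ** 2 = 5180176
```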

Sent from my Windows Phone
________________________________
From: Ron Hitchens <[email protected]>
Sent: 8/20/2013 12:46 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Higher relevance for newer documents?


   Thanks Mike.  I'd looked at a similar idea involving the
copyright year (that was too coarse).  The number of weeks
since some distant date is a pretty good idea.

   I suppose the biggest weakness of this algorithm is that it
is necessarily linear.  You can't do an exponential decay where
the quality of recent documents drops off quickly and then levels
off as they get older.  Though linear is better than nothing.
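For what it's worth, an exponential shape is reachable without ever rewriting documents if the ingest-time formula itself grows exponentially. A Python sketch of that idea (the 2010 start date and 26-week half-life are assumptions for illustration, not from the thread):

```python
from datetime import date

START = date(2010, 1, 1)   # assumed start date, not from the thread
HALF_LIFE_WEEKS = 26       # assumed rate: the boost halves every 26 weeks

def weeks_since_start(d):
    return (d - START).days // 7

def quality_at_ingest(d):
    # Quality is assigned once, at ingest, and never rewritten.
    # Because it doubles every HALF_LIFE_WEEKS, a document that is
    # 26 weeks older than another always carries about half its boost.
    return round(2 ** (weeks_since_start(d) / HALF_LIFE_WEEKS))

today = date(2013, 8, 20)
for age_weeks in (0, 26, 52, 104):
    ingested = date.fromordinal(today.toordinal() - 7 * age_weeks)
    print(age_weeks, quality_at_ingest(ingested))
```

The catch is the 32-bit quality range: 2 ** 31 is reached about 806 weeks (roughly 15 years) after START, so the start date and half-life would have to be chosen with that limit in mind.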

   Is there any downside to constantly increasing quality values?
The quality argument of xdmp:set-document-quality is a 32-bit xs:int.
Is the relevance boost from quality applied evenly across the range
of possible values?

   Thanks.

---
Ron Hitchens {[email protected]}  +44 7879 358212


On Aug 20, 2013, at 7:20 PM, Michael Blakeley <[email protected]> wrote:

> What about using a naturally-increasing number for quality?
>
> For example the number of weeks since 1970:
>
>    xs:integer(
>      (current-date() - xs:date('1970-01-01'))
>       div xs:dayTimeDuration("P7D"))
>    => 2276
>
> You can reduce the magnitude of the quality boost by increasing the bucket 
> size: 14D, 30D, etc. Changing the start date might also be useful.
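To make the bucket-size effect concrete, a small sketch (in Python, names illustrative) comparing 7-, 14- and 30-day buckets as of the date of this thread:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def buckets_since_epoch(d, bucket_days):
    # Bigger buckets mean smaller quality values and a gentler boost.
    return (d - EPOCH).days // bucket_days

today = date(2013, 8, 20)
for days in (7, 14, 30):
    print(days, buckets_since_epoch(today, days))
# 7  -> 2276
# 14 -> 1138
# 30 -> 531
```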
>
> No crawl is necessary, unless you change your mind about the boost algorithm.
>
> -- Mike
>
> On 20 Aug 2013, at 11:10 , Ron Hitchens <[email protected]> wrote:
>
>>
>>  What are the techniques out there for giving newer documents
>> higher relevance?  My target is MarkLogic 5.x, but 6.x may be in
>> play before long.
>>
>>  There are two schemes that I am aware of, neither of which feels
>> very elegant:
>>
>> 1) Give documents a high quality value when ingested.  Periodically
>> crawl the content and for any document with positive quality, reduce
>> its quality according to some algorithm until the quality reaches zero.
>>
>>  This gives the best control over "freshness", but has the disadvantage
>> of causing potentially large numbers of updates on each pass with the
>> attendant merges and disk I/O & CPU load.
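One pass of scheme 1 could be sketched like this (plain Python over a hypothetical uri-to-quality map; the real thing would update each document with xdmp:set-document-quality, which is exactly what incurs the merge and I/O cost described above):

```python
def decay_pass(qualities, step=10):
    """One periodic crawl pass: reduce every positive quality by `step`,
    stopping at zero.  Every still-positive document is touched on every
    pass, which is what drives the update/merge load."""
    return {uri: max(0, q - step) for uri, q in qualities.items()}

qualities = {"/a.xml": 25, "/b.xml": 5, "/c.xml": 0}
print(decay_pass(qualities))  # {'/a.xml': 15, '/b.xml': 0, '/c.xml': 0}
```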
>>
>> 2) Replicate the "real" query n times, each and-ed with a time-based
>> query against the insertion date.  All of these are or-ed together
>> with descending weights for older dates.
>>
>>  This doesn't require changing documents to tweak their freshness.  But
>> it also means you have a stair-step function of n-steps, which may not
>> be very precise - and which wouldn't scale very well for large values
>> of n.  And unfortunately, since the queries would be time-based, you
>> can't pre-register them ahead of time.
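The stair-step structure of scheme 2 could be sketched as follows (Python, with hypothetical names; each (cutoff, weight) pair stands for one copy of the real query AND-ed with a date-range query, the n copies then OR-ed together):

```python
from datetime import date, timedelta

def stair_steps(n, today, step_days=30, top_weight=8.0):
    """Hypothetical helper for scheme 2: n cutoff dates with descending
    weights.  Precision is limited to n steps, and the cutoffs shift
    with `today`, so the queries can't be pre-registered."""
    return [
        (today - timedelta(days=(i + 1) * step_days), top_weight / (i + 1))
        for i in range(n)
    ]

for cutoff, weight in stair_steps(3, date(2013, 8, 20)):
    print(cutoff.isoformat(), weight)
```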
>>
>>  Any other clever techniques that you've used?
>>
>> ---
>> Ron Hitchens {[email protected]}  +44 7879 358212
>>
>>
>>
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>
