Because your quality value would have to increase exponentially
over time.  As you move up the increasingly vertical curve, you'll
soon shoot past the largest value a 32-bit integer can hold.
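To put rough numbers on that, here is a small Python sketch (not MarkLogic code). It assumes a quality value that starts at 1 and doubles every `doubling_weeks` weeks from some epoch, and finds the first week at which that value no longer fits in a signed 32-bit xs:int. The doubling rates are illustrative assumptions, not anything proposed in the thread.

```python
# Largest value a signed 32-bit integer (xs:int) can hold.
XS_INT_MAX = 2**31 - 1


def weeks_until_overflow(doubling_weeks: int) -> int:
    """Return the first week (counted from the epoch) at which a quality
    value of 2 ** (week / doubling_weeks) exceeds XS_INT_MAX.

    Assumption for illustration: quality starts at 1 and doubles every
    `doubling_weeks` weeks, i.e. it grows exponentially with ingestion time.
    """
    week = 0
    while 2.0 ** (week / doubling_weeks) <= XS_INT_MAX:
        week += 1
    return week


if __name__ == "__main__":
    for d in (1, 4, 52):
        w = weeks_until_overflow(d)
        print(f"doubling every {d:>2} week(s): overflow after {w} weeks (~{w / 52:.1f} years)")
```

Even a slow, once-a-year doubling runs out of room after 31 doublings, so an epoch of 1970 with that rate would already have overflowed around 2001.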

   What you really want is a curve that drops quickly as documents
get older than "now", then levels off the further back you go.  You'd
want that curve to be computed relative to the query time, not the
ingestion time.
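A minimal Python sketch of that curve shape, assuming a hypothetical half-life and a floor value (neither comes from the thread). The boost is computed from the document's ingestion date relative to the query date, which is exactly what a constant quality value stored on the document cannot do.

```python
from datetime import date


def freshness_boost(ingested: date, query_day: date,
                    half_life_days: float = 30.0, floor: float = 0.1) -> float:
    """Hypothetical recency boost computed at query time.

    Starts at 1.0 for a document ingested on the query day, closes half of
    the remaining gap to `floor` every `half_life_days`, and levels off at
    `floor` for very old documents. Both parameters are illustrative.
    """
    # Documents dated after the query day are treated as brand new.
    age_days = max((query_day - ingested).days, 0)
    decay = 0.5 ** (age_days / half_life_days)
    return floor + (1.0 - floor) * decay


if __name__ == "__main__":
    today = date(2013, 8, 20)
    for ingested in (date(2013, 8, 20), date(2013, 7, 21),
                     date(2012, 8, 20), date(2003, 8, 20)):
        print(ingested, round(freshness_boost(ingested, today), 4))
```

A document 30 days old gets 0.55 with these defaults, while anything older than a year or so sits essentially at the 0.1 floor.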

   You could do the exponential thing if you constantly crawl the
content and re-adjust the quality values.  But not if you stick a
constant number on the document that doesn't change over its lifetime.

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown

On Aug 20, 2013, at 8:48 PM, David Gorbet <[email protected]> wrote:

> Why couldn't you do an exponential decay? You control the formula, right? It 
> could be (weeks-since-1970)^2, couldn't it?
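One way to check what the squared formula does is to compare a document of a given age against a brand-new one when both qualities are fixed at ingestion. The Python sketch below assumes the current week number is 2276 (the number of whole weeks from 1970-01-01 to 2013-08-20); the function names are hypothetical.

```python
def squared_quality(week: int) -> int:
    """The proposed formula: quality = (weeks since 1970) ** 2."""
    return week * week


def penalty_vs_newest(now_week: int, age_weeks: int) -> int:
    """How far below a brand-new document a document `age_weeks` old ends up
    when each document's quality was fixed at ingestion by squared_quality."""
    return squared_quality(now_week) - squared_quality(now_week - age_weeks)


if __name__ == "__main__":
    now_week = 2276  # whole weeks from 1970-01-01 to 2013-08-20
    for age in (1, 52, 104):
        print(age, penalty_vs_newest(now_week, age))
```

Because now^2 - (now - a)^2 = a(2*now - a), the penalty grows almost linearly with age a (4551 for one week, 234000 for 52 weeks), so squaring gives a polynomial rather than a drop-then-level-off shape, although the values (5180176 at week 2276) still fit comfortably in an xs:int.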
> 
> From: Ron Hitchens
> Sent: 8/20/2013 12:46 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Higher relevance for newer documents?
> 
> 
>    Thanks Mike.  I'd looked at a similar idea involving the
> copyright year (that was too coarse).  The number of weeks
> since some distant date is a pretty good idea.
> 
>    I suppose the biggest weakness of this algorithm is that it
> is necessarily linear.  You can't do an exponential decay where
> the quality of recent documents drops off quickly and then levels
> off as they get older.  Though linear is better than nothing.
> 
>    Is there any downside to constantly increasing quality values?
> The quality argument of xdmp:document-set-quality is a 32-bit xs:int.
> Is the relevance boost from quality applied evenly across the range
> of possible values?
> 
>    Thanks.
> 
> ---
> Ron Hitchens {[email protected]}  +44 7879 358212
> 
> 
> On Aug 20, 2013, at 7:20 PM, Michael Blakeley <[email protected]> wrote:
> 
> > What about using a naturally-increasing number for quality?
> > 
> > For example the number of weeks since 1970:
> > 
> >    xs:integer(
> >      (current-date() - xs:date('1970-01-01'))
> >       div xs:dayTimeDuration("P7D"))
> >    => 2276  (on 2013-08-20)
> > 
> > You can reduce the magnitude of the quality boost by increasing the bucket 
> > size: 14D, 30D, etc. Or changing the start-date might also be useful.
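For readers without a MarkLogic instance handy, here is a Python sketch of the same bucketed calculation, with the start date and bucket size as parameters. The function name is hypothetical; it only mirrors the arithmetic of the XQuery above and does not touch any MarkLogic API.

```python
from datetime import date, timedelta


def bucket_quality(day: date, start: date = date(1970, 1, 1),
                   bucket: timedelta = timedelta(days=7)) -> int:
    """Number of whole `bucket` periods between `start` and `day`.

    Mirrors (day - start) div bucket cast to an integer. Assumes
    day >= start: Python's // floors while an XQuery cast to xs:integer
    truncates, and the two only agree for non-negative values.
    """
    return (day - start) // bucket


if __name__ == "__main__":
    d = date(2013, 8, 20)
    for days in (7, 14, 30):
        print(f"P{days}D:", bucket_quality(d, bucket=timedelta(days=days)))
```

On 2013-08-20 this gives 2276 for 7-day buckets, 1138 for 14-day buckets and 531 for 30-day buckets, which shows how the bucket size scales down the boost.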
> > 
> > No crawl is necessary, unless you change your mind about the boost 
> > algorithm.
> > 
> > -- Mike
> > 
> > On 20 Aug 2013, at 11:10 , Ron Hitchens <[email protected]> wrote:
> > 
> >> 
> >>  What are the techniques out there for giving newer documents 
> >> higher relevance?  My target is MarkLogic 5.x, but 6.x may be in
> >> play before long.
> >> 
> >>  There are two schemes that I am aware of, neither of which feels
> >> very elegant:
> >> 
> >> 1) Give documents a high quality value when ingested.  Periodically
> >> crawl the content and for any document with positive quality, reduce
> >> its quality according to some algorithm until the quality reaches zero.
> >> 
> >>  This gives the best control over "freshness", but has the disadvantage
> >> of causing potentially large numbers of updates on each pass with the
> >> attendant merges and disk I/O & CPU load.
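Scheme 1 can be sketched as a single crawl pass. This Python model assumes a hypothetical halving rule (quality multiplied by `factor` and rounded down so it eventually reaches zero) and returns the documents whose quality changed. In MarkLogic each of those would cost one quality update (for example via xdmp:document-set-quality), which is the write load described above.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    uri: str
    quality: int


def decay_pass(docs: list[Doc], factor: float = 0.5) -> list[Doc]:
    """One hypothetical crawl pass for scheme 1.

    Every document with positive quality has it multiplied by `factor`,
    rounded down so it eventually reaches zero. Returns only the documents
    whose quality actually changed, i.e. the ones that would need a
    database update on this pass.
    """
    changed = []
    for doc in docs:
        if doc.quality > 0:
            new_quality = int(doc.quality * factor)
            if new_quality != doc.quality:
                doc.quality = new_quality
                changed.append(doc)
    return changed


if __name__ == "__main__":
    docs = [Doc("/a.xml", 100), Doc("/b.xml", 3), Doc("/c.xml", 0)]
    for n in range(1, 4):
        updated = decay_pass(docs)
        print(f"pass {n}: {len(updated)} update(s)", [d.quality for d in docs])
```

With a factor below 1, every document that still has positive quality is rewritten on every pass, so the update count tracks the number of "fresh" documents rather than the number of new ones.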
> >> 
> >> 2) Replicate the "real" query n times, each and-ed with a time-based
> >> query against the insertion date.  All of these are or-ed together
> >> with descending weights for older dates.
> >> 
> >>  This doesn't require changing documents to tweak their freshness.  But
> >> it also means you have a stair-step function of n-steps, which may not
> >> be very precise - and which wouldn't scale very well for large values
> >> of n.  And unfortunately, since the queries would be time-based, you
> >> can't pre-register them ahead of time.
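The stair-step shape of scheme 2 can be modelled without any MarkLogic calls. In this simplified Python sketch the window width and weights are hypothetical. Each copy of the real query is assumed to cover one disjoint date window counting back from the query day, so a document only matches the window its ingestion date falls in. It ignores how MarkLogic actually combines the underlying term scores.

```python
from datetime import date, timedelta


def stair_step_boost(ingested: date, query_day: date,
                     step: timedelta = timedelta(days=30),
                     weights: tuple[float, ...] = (8.0, 4.0, 2.0, 1.0)) -> float:
    """Simplified model of the boost one document receives under scheme 2.

    The real query is OR-ed n = len(weights) times, each copy AND-ed with a
    date window of width `step` counting back from the query day; the copy
    for window i carries weights[i]. Documents older than n windows match
    none of the time-based copies and get no boost.
    """
    age = query_day - ingested
    if age < timedelta(0):
        return weights[0]  # treat future-dated documents as newest
    index = age // step
    return weights[index] if index < len(weights) else 0.0


if __name__ == "__main__":
    today = date(2013, 8, 20)
    for ingested in (date(2013, 8, 20), date(2013, 7, 22), date(2013, 7, 21),
                     date(2013, 5, 22), date(2013, 1, 1)):
        print(ingested, stair_step_boost(ingested, today))
```

Documents 29 and 30 days old land on different steps (8.0 vs 4.0) even though they are a day apart; smoothing that out means more, narrower windows, i.e. a larger n and a bigger query.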
> >> 
> >>  Any other clever techniques that you've used?
> >> 
> >> ---
> >> Ron Hitchens {[email protected]}  +44 7879 358212
> >> 
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> General mailing list
> >> [email protected]
> >> http://developer.marklogic.com/mailman/listinfo/general
> >> 
> > 
> 
