When you say deviation, do you mean deviation in rate of something that is
nominally Poisson distributed (after accounting for diurnal and weekly
variations)?
If you can do the computation on week boundaries, then these variations have
less effect.
Alternatively, if you base the computation on rank, it is also much easier.
For the rank-based computation, if you assume Zipfian distribution, then the
cumulative distribution is close to log r (r = rank). This is typically
accurate for common web applications except for the top 1% or so of items.
This implies that the quantity
score = log (r_old / r_new)
is a good measure of the significance of the change for an item. This is
close to what we used at Veoh with very good results. We tried a number of
more complicated approaches and this was one of the best. Certainly, it was
the best simple measure. You could equivalently use log(k_new / k_old),
but using ranks self corrects for most frequency variations which is really
nice.
If all this is good enough for you, then you don't need anything fancy at
all. Just count occurrences using any technique you like, rank the top
umpteen of these (10,000 to 100,000 is plenty for most purposes). Then
compare the scores from period to period.
One gotcha is how to handle items that are brand new. The easiest answer is
to give them a rank just beyond anything you retain counts for. Thus, if
you keep 10,000 counts, any item that doesn't appear is given a rank of
10,001.
Was this even what you wanted?
On Wed, Sep 23, 2009 at 9:19 PM, Palleti, Pallavi <
[email protected]> wrote:
> Hi Ted,
>
> I am especially looking for efficient algorithms for computing trends
> (Similar to Google's hot-trends). The simple approach to start with it is to
> compute the deviation of item(URL/query) from its current score. I'm
> exploring if I can do the same more efficiently and effectively.
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Wednesday, September 23, 2009 10:29 PM
> To: [email protected]
> Subject: Re: Trending and prediction Analysis
>
> Can you say just a bit more about what you want?
>
> On Wed, Sep 23, 2009 at 9:45 AM, Palleti, Pallavi <
> [email protected]> wrote:
>
> > Hi all,
> >
> > I am currently exploring trending and prediction analysis algorithms.
> > Kindly refer any good work/paper that you have come across.
> >
> > Thanks
> > Pallavi
> >
> >
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
--
Ted Dunning, CTO
DeepDyve