You might want to leverage some of the research for good heuristics on
session reconstruction. For example, the following paper by Spiliopoulou et
al. is a good starting point:

http://maya.cs.depaul.edu/~mobasher/papers/SMBN03.pdf

They describe a couple of different heuristics, and you might want to
experiment with each of them to see how it affects your results.
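
As a rough sketch of one simple time-based heuristic (not necessarily the
exact rules from the paper), you could start a new session whenever a user
has been idle longer than some threshold, then treat the unique queries in
each session as one transaction. The `sessionize` name, the 30 minute default
gap, and the (user_id, timestamp, query) tuple layout are all assumptions made
up for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(events, max_gap=timedelta(minutes=30)):
    """Split each user's queries into sessions at idle gaps longer than max_gap.

    events: iterable of (user_id, timestamp, query) tuples, in any order
            (hypothetical layout, adapt to your log format).
    Returns a list of transactions; each is the list of unique queries
    in one session, in first-seen order.
    """
    # Group by user id.
    by_user = defaultdict(list)
    for user_id, ts, query in events:
        by_user[user_id].append((ts, query))

    transactions = []
    for user_id in sorted(by_user):
        current, last_ts = [], None
        # Sort this user's queries by time and walk through them.
        for ts, query in sorted(by_user[user_id]):
            # An idle gap longer than max_gap closes the current session.
            if last_ts is not None and ts - last_ts > max_gap:
                transactions.append(current)
                current = []
            # Keep only one copy of a repeated query within a session.
            if query not in current:
                current.append(query)
            last_ts = ts
        if current:
            transactions.append(current)
    return transactions
```

Changing `max_gap` (5 minutes, 2 hours, and so on) is an easy way to see how
sensitive the downstream results are to the session boundary.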

Scott

On Sun, Aug 2, 2009 at 8:14 AM, Robin Anil <[email protected]> wrote:

> As I see from the dataset, most of the queries that follow a query don't
> look related if they differ by, say, a day. I will try a 2 hour window and
> see what happens. If you have any tag-tag dataset, then I believe the
> results will look very cool for a demo.
> Robin
>
> On Sun, Aug 2, 2009 at 8:23 PM, Ted Dunning <[email protected]> wrote:
>
> > Another, more traditional approach is to group by user id and sort by
> > time. Then you can slide through a single user's transactions, emitting
> > pairs of items that occur in the same window.  Windowed co-occurrence is
> > a bit of a strange beast because it isn't transitive (A can co-occur
> > with B and B with C without A co-occurring with C).
> >
> > The problem with what you propose is that users are likely to come in
> > for about 5 minutes at a time.  Using 5 minute windows that don't slide
> > will substantially decrease the number of co-occurrences.  It should also
> > work well if you use a very large window such as 2 hours and slide using
> > that or, in the extreme, just group on user and ignore time.  The defect
> > in the extreme solutions is that the downstream algorithms have to be
> > better at handling more data (potentially roughly quadratic in window
> > size if all users are active all the time) and better at handling noise
> > due to attention span issues.
> >
> > On Sun, Aug 2, 2009 at 3:51 AM, Robin Anil <[email protected]> wrote:
> >
> > > What I am thinking is: given a 5 minute window in time for a given
> > > user, group all the queries (keeping only one copy of any duplicates)
> > > and call that a transaction for PFPGrowth.
> > >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>
