From what I see in the dataset, most queries that follow another query don't
look related if they differ by, say, a day. I will try a 2 hour window and
see what happens. If you have any tag-tag dataset, then I believe the results
will look very cool for a demo.
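
Here is a rough sketch of the group-by-user, sort-by-time sliding window
described in the quoted message below. The function name, the
(user_id, timestamp_seconds, item) tuple layout, and the 2 hour default are
assumptions for illustration only; nothing here comes from Mahout or from the
actual dataset.

```python
from itertools import groupby
from operator import itemgetter


def windowed_pairs(events, window_seconds=2 * 60 * 60):
    """Yield (user, earlier_item, later_item) for items within one window.

    events: iterable of (user_id, timestamp_seconds, item) tuples
    (a hypothetical layout, chosen for this sketch).
    """
    # Group by user id, then sort each user's events by time.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, group in groupby(ordered, key=itemgetter(0)):
        history = list(group)
        start = 0  # left edge of the sliding window
        for i, (_, t, item) in enumerate(history):
            # Drop events that are too old to share a window with event i.
            while t - history[start][1] > window_seconds:
                start += 1
            # Pair the current item with everything still in the window.
            for _, _, earlier in history[start:i]:
                if earlier != item:
                    yield user, earlier, item
```

With events A at 0s, B at 3600s and C at 9000s for one user, this emits
(A, B) and (B, C) but not (A, C), which is the non-transitivity mentioned
below. Setting window_seconds very large amounts to grouping on user and
ignoring time.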
Robin

On Sun, Aug 2, 2009 at 8:23 PM, Ted Dunning <[email protected]> wrote:

> Another, more traditional approach is to group by user id and sort by time.
> Then you can slide through a single user's transactions, emitting pairs of
> items that occur in the same window.  Windowed co-occurrence is a bit of a
> strange beast because it isn't transitive (A can cooccur with B and B with
> C while not having A with C).
>
> The problem with what you propose is that users are likely to often come in
> for about 5 minutes.  Using 5 minute windows that don't slide will
> substantially decrease the number of cooccurrences.  It should also work
> well if you use a very large window such as 2 hours and slide using that or,
> in the extreme, just group on user and ignore time.  The defect of the
> extreme solutions is that the downstream algorithms have to be better at
> handling more data (potentially roughly quadratic in window size if all
> users are active all the time) and better at handling noise due to attention
> span issues.
>
>
>
> On Sun, Aug 2, 2009 at 3:51 AM, Robin Anil <[email protected]> wrote:
>
> > What I am thinking is, given a 5 minute window in time for a given user,
> > group all the queries (keeping only one copy of any repeated query) and
> > call that a transaction for PFPGrowth.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
