Another, more traditional approach is to group by user id and sort by time.
Then you can slide through a single user's transactions, emitting pairs of
items that occur in the same window.  Windowed co-occurrence is a bit of a
strange beast because it isn't transitive (A can co-occur with B, and B with
C, without A co-occurring with C).
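
As a rough sketch of the group/sort/slide idea, assuming events arrive as
(user_id, timestamp_in_seconds, item) tuples (the function name and the
tuple layout are made up for illustration, not anything from Mahout):

```python
from collections import defaultdict


def windowed_pairs(events, window_seconds):
    """Emit (user, earlier_item, later_item) for items of the same user
    whose timestamps are at most window_seconds apart."""
    # Group by user id.
    by_user = defaultdict(list)
    for user, ts, item in events:
        by_user[user].append((ts, item))

    pairs = []
    for user, rows in by_user.items():
        rows.sort()  # sort this user's transactions by time
        start = 0    # index of the oldest row still inside the window
        for i, (ts, item) in enumerate(rows):
            # Slide the left edge forward until it is within the window.
            while ts - rows[start][0] > window_seconds:
                start += 1
            # Pair the current item with everything still in the window.
            for j in range(start, i):
                other = rows[j][1]
                if other != item:
                    pairs.append((user, other, item))
    return pairs


# A and B are 200s apart, B and C are 200s apart, A and C are 400s apart.
events = [("u1", 0, "A"), ("u1", 200, "B"), ("u1", 400, "C")]
print(windowed_pairs(events, 300))
```

With a 300 second window this emits A with B and B with C but not A with C,
which is the non-transitivity mentioned above.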

The problem with what you propose is that users are likely to come in for
only about 5 minutes at a time.  Using 5 minute windows that don't slide
will split many of those visits across a window boundary and substantially
decrease the number of co-occurrences.  It should also work well if you use
a very large window, such as 2 hours, and slide using that or, in the
extreme, just group on user and ignore time.  The drawback of these extreme
solutions is that the downstream algorithms have to be better at handling
more data (potentially roughly quadratic in window size if all users are
active all the time) and better at handling noise, since a long window can
outlast a user's attention span.
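
To make the trade-off concrete, here is a small sketch of the non-sliding
variant, again assuming (user_id, timestamp_in_seconds, item) tuples and
hypothetical function names.  It buckets each user's items into fixed
windows and pairs up whatever lands in the same bucket:

```python
from collections import defaultdict
from itertools import combinations


def tumbling_pairs(events, window_seconds):
    """Pair items of the same user that fall in the same fixed window."""
    buckets = defaultdict(set)
    for user, ts, item in events:
        # Integer division assigns each event to one non-overlapping window.
        buckets[(user, ts // window_seconds)].add(item)

    pairs = set()
    for (user, _), items in buckets.items():
        for a, b in combinations(sorted(items), 2):
            pairs.add((user, a, b))
    return pairs


def max_pairs(n_items):
    """Upper bound on pairs from one window holding n_items distinct items."""
    return n_items * (n_items - 1) // 2


# Two queries 20 seconds apart that straddle the 300 second boundary.
events = [("u1", 290, "A"), ("u1", 310, "B")]
print(tumbling_pairs(events, 300))   # 5 minute windows
print(tumbling_pairs(events, 7200))  # 2 hour windows
print(max_pairs(10), max_pairs(20))
```

The 5 minute buckets miss the pair entirely even though the two queries are
close together, while the 2 hour bucket catches it.  The max_pairs numbers
show the quadratic growth: doubling the items in a window roughly
quadruples the pairs handed to the downstream algorithm.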



On Sun, Aug 2, 2009 at 3:51 AM, Robin Anil <[email protected]> wrote:

> What I am thinking is given a 5 minute window in time for a given user,
> group all the queries (if they are unique choose only one) and call that as
> a transaction for PFPGrowth.
>



-- 
Ted Dunning, CTO
DeepDyve
