From what I see in the dataset, most queries that follow a given query don't look related when the two are separated by, say, a day. I will try a 2-hour window and see what happens. If you have a tag-tag dataset, I believe the results would look very cool for a demo.

Robin
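The sliding-window approach Ted describes in the quoted message (group by user id, sort by time, emit pairs of items that fall in the same window) can be sketched roughly as below, using the 2-hour window mentioned above. This is a minimal Python sketch under stated assumptions, not Mahout code. The function name `windowed_cooccurrence`, the `(user_id, timestamp_seconds, query)` event format, and the choice to count every in-window pair while skipping identical queries are all hypothetical.

```python
from collections import Counter, defaultdict

def windowed_cooccurrence(events, window_seconds):
    """Count pairs of distinct queries issued by the same user within window_seconds.

    events: iterable of (user_id, timestamp_seconds, query) tuples (assumed format).
    """
    # Group by user id.
    by_user = defaultdict(list)
    for user, ts, query in events:
        by_user[user].append((ts, query))

    counts = Counter()
    for items in by_user.values():
        items.sort()  # sort each user's events by time
        start = 0
        for end, (ts_end, q_end) in enumerate(items):
            # Slide the left edge forward until every earlier event is inside the window.
            while ts_end - items[start][0] > window_seconds:
                start += 1
            # Emit one pair per earlier in-window event with a different query.
            for _, q in items[start:end]:
                if q != q_end:
                    counts[tuple(sorted((q, q_end)))] += 1
    return counts

# Queries at 0, 90 and 180 minutes, with a 2-hour window.
events = [("u1", 0, "a"), ("u1", 90 * 60, "b"), ("u1", 180 * 60, "c")]
print(windowed_cooccurrence(events, window_seconds=2 * 60 * 60))
```

With these events the counts contain (a, b) and (b, c) but not (a, c), which is the non-transitivity Ted points out: each adjacent pair is 90 minutes apart, but a and c are 3 hours apart.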
On Sun, Aug 2, 2009 at 8:23 PM, Ted Dunning wrote:
> Another, more traditional approach is to group by user id and sort by time.
> Then you can slide through a single user's transactions, emitting pairs of
> items that occur in the same window. Windowed co-occurrence is a bit of a
> strange beast because it isn't transitive (A can co-occur with B, and B
> with C, while A does not co-occur with C).
>
> The problem with what you propose is that users are likely to often come in
> for about 5 minutes. Using 5-minute windows that don't slide will
> substantially decrease the number of co-occurrences. It should also work
> well if you use a very large window, such as 2 hours, and slide using that,
> or, in the extreme, just group on user and ignore time. The defect of these
> extreme solutions is that the downstream algorithms have to be better at
> handling more data (potentially roughly quadratic in window size if all
> users are active all the time) and better at handling noise due to
> attention-span issues.
>
> On Sun, Aug 2, 2009 at 3:51 AM, Robin Anil wrote:
>
> > What I am thinking is: given a 5-minute window in time for a given user,
> > group all the queries (choosing only one of any duplicates) and call that
> > a transaction for PFPGrowth.
>
> --
> Ted Dunning, CTO
> DeepDyve
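For contrast, the non-sliding 5-minute proposal quoted at the end of the thread (one transaction per user per window, duplicate queries collapsed) might look roughly like this. Again, this is a hypothetical Python sketch: `bucketed_transactions` and the event format are assumptions, and it only builds the transaction lists. Feeding them to Mahout's PFPGrowth is not shown.

```python
from collections import defaultdict

def bucketed_transactions(events, bucket_seconds=300):
    """Build one transaction per (user, fixed time bucket), with duplicate queries collapsed.

    events: iterable of (user_id, timestamp_seconds, query) tuples (assumed format).
    Buckets are non-overlapping and do not slide.
    """
    buckets = defaultdict(set)
    for user, ts, query in events:
        buckets[(user, ts // bucket_seconds)].add(query)  # a set keeps each query once
    # Sorted for deterministic output: by user, then bucket index; items sorted within each.
    return [sorted(buckets[key]) for key in sorted(buckets)]

events = [
    ("u1", 240, "a"), ("u1", 290, "a"),  # same 5-minute bucket, duplicate collapsed
    ("u1", 330, "b"),                     # 40 s after "a", but in the next bucket
    ("u2", 10, "c"), ("u2", 20, "d"),
]
print(bucketed_transactions(events))
```

Here "a" and "b" are only 40 seconds apart yet land in different transactions because they straddle a bucket boundary, which illustrates why Ted expects non-sliding windows to lose co-occurrences.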
