You might want to leverage some of the research on good heuristics for session reconstruction. For example, the following paper by Spiliopoulou et al. is a good starting point:

http://maya.cs.depaul.edu/~mobasher/papers/SMBN03.pdf

They give a couple of different heuristics you can try, and you might want to experiment with each of them to see how it affects your results.

Scott

On Sun, Aug 2, 2009 at 8:14 AM, Robin Anil <[email protected]> wrote:
> As I see from the dataset, most of the queries that follow a query don't
> look related if they are, say, a day apart. I will try a 2-hour window
> and see what happens. If you have any tag-tag dataset, then I believe the
> results will look very cool for a demo.
> Robin
>
> On Sun, Aug 2, 2009 at 8:23 PM, Ted Dunning <[email protected]> wrote:
>
> > Another, more traditional approach is to group by user id and sort by time.
> > Then you can slide through a single user's transactions, emitting pairs of
> > items that occur in the same window. Windowed co-occurrence is a bit of a
> > strange beast because it isn't transitive (A can co-occur with B, and B
> > with C, while A does not co-occur with C).
> >
> > The problem with what you propose is that users often come in for only
> > about 5 minutes. Using 5-minute windows that don't slide will
> > substantially decrease the number of co-occurrences. It should also work
> > well if you use a very large window, such as 2 hours, and slide using that,
> > or, in the extreme, just group on user and ignore time. The defect of the
> > extreme solutions is that the downstream algorithms have to be better at
> > handling more data (potentially roughly quadratic in window size if all
> > users are active all the time) and better at handling noise due to
> > attention-span issues.
> >
> > On Sun, Aug 2, 2009 at 3:51 AM, Robin Anil <[email protected]> wrote:
> >
> > > What I am thinking is: given a 5-minute window in time for a given user,
> > > group all the queries (keeping only one copy of any repeated query) and
> > > call that a transaction for PFPGrowth.
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
>
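A minimal Python sketch of the windowed co-occurrence Ted describes: group events by user id, sort each user's events by time, slide a window through them, and count every pair of distinct items that fall inside the same window. The event tuple layout (user_id, timestamp in seconds, item) and the function name windowed_cooccurrence are assumptions for illustration only, not Mahout APIs.

```python
from collections import Counter, defaultdict


def windowed_cooccurrence(events, window_seconds):
    """Count item pairs that co-occur within a sliding time window per user.

    events: iterable of (user_id, timestamp_seconds, item) tuples
            (hypothetical layout, chosen for this sketch).
    Returns a Counter keyed by alphabetically ordered (item_a, item_b) pairs.
    """
    # Group by user id.
    by_user = defaultdict(list)
    for user, ts, item in events:
        by_user[user].append((ts, item))

    counts = Counter()
    for evs in by_user.values():
        evs.sort()  # sort this user's events by time
        start = 0
        for end in range(len(evs)):
            ts_end, item_end = evs[end]
            # Advance the left edge until the window spans <= window_seconds.
            while ts_end - evs[start][0] > window_seconds:
                start += 1
            # Pair the newest event with every earlier event still in the window.
            for _ts, item in evs[start:end]:
                if item != item_end:
                    counts[tuple(sorted((item, item_end)))] += 1
    return counts
```

With one user issuing A at t=0, B at t=60 and C at t=200 and a 150-second window, the counter holds (A, B) and (B, C) but not (A, C), which is the non-transitivity Ted points out. Widening the window adds more pairs per event, which is where the roughly quadratic growth in window size comes from.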

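A minimal sketch of one time-oriented session heuristic in the spirit of the paper Scott cites: start a new session whenever the gap between a user's consecutive queries exceeds a threshold, then collapse repeated queries so that each session becomes one transaction, as Robin proposes for PFPGrowth. The 30-minute default, the function name sessionize and the tuple layout are assumptions for illustration; this is not claimed to be one of the paper's exact heuristics.

```python
from collections import defaultdict


def sessionize(events, max_gap_seconds=30 * 60):
    """Split each user's time-sorted queries into sessions by inactivity gap.

    events: iterable of (user_id, timestamp_seconds, query) tuples
            (hypothetical layout, chosen for this sketch).
    max_gap_seconds: assumed threshold; a larger gap starts a new session.
    Returns a list of transactions, each a sorted list of the unique
    queries in one session (repeated queries are kept only once).
    """
    by_user = defaultdict(list)
    for user, ts, query in events:
        by_user[user].append((ts, query))

    transactions = []
    for user in sorted(by_user):
        session = set()
        prev_ts = None
        for ts, query in sorted(by_user[user]):
            # Close the current session if the user was idle too long.
            if prev_ts is not None and ts - prev_ts > max_gap_seconds:
                transactions.append(sorted(session))
                session = set()
            session.add(query)
            prev_ts = ts
        if session:
            transactions.append(sorted(session))
    return transactions
```

To try Robin's 2-hour setting, pass max_gap_seconds=2 * 60 * 60. Unlike fixed 5-minute buckets, a gap-based split does not cut a short burst of activity in two merely because it straddles a clock boundary.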