Hi Dmitri,

Yes, that's an interesting challenge. We had the same issue in the Forbes dataset. I think on average a person only visited 6 pages, and we didn't have previous browsing history for these users.
We ended up treating each new visit to the website as a separate sequence, with a reset in between. If you train the TM on a large set of sequences from previous users, it generalizes to new users. Essentially it will try to find other users with a similar pattern of activity and make predictions based on those previous sequences. The TM can make multiple predictions simultaneously, so you will get a probability distribution over next clicks.

We also grouped pages into fewer than 200 categories, rather than treating each page individually (there were thousands of different pages). Grouping into topics helped reduce the complexity of the problem.

We didn't get a chance to experiment with spatial aspects. I agree it could help, for example by including demographic information, OS type, etc. A new product adds another layer of challenge, since you don't have history for that product either.

--Subutai

On Tue, Feb 2, 2016 at 3:33 AM, dokondr <[email protected]> wrote:

> Hi Subutai,
>
> Thanks for the presentation. The problem I am trying to solve is different
> but related to predicting where a person browsing a website is likely to
> click next. It is the so-called "cold start" problem for a recommendation
> system that actively learns from a new user's implicit feedback. The goal is
> to determine product ratings for a new user who just came and started
> browsing products on a web site for the first time. I think both spatial and
> temporal aspects are important here and HTM may help to classify user
> preferences. The main problem is that very little data is available in both
> the new-user and the new-product cold-start scenarios.
>
> On *Mon Feb 1 15:54:37 EST 2016*, *Subutai Ahmad* wrote:
>
>> Hi Dmitri,
>>
>> That paragraph refers to work I did way back in 2009/2010. We got a huge
>> amount of real web traffic data from the news site forbes.com and used it
>> to debug the very first versions of the Temporal Memory algorithms. I've
>> attached the presentation I gave at a workshop a while ago. (Note that it
>> uses our old product name Grok - you can ignore that and substitute "HTM".)
>>
>> Unfortunately we are not allowed to release the data. It would be great if
>> someone could find similar data from another company that we could actually
>> release as a dataset.
>>
>> --Subutai
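
For anyone who wants to try the data setup described in the reply above (one sequence per visit with a reset in between, pages grouped into categories, a distribution over next clicks), here is a minimal sketch. It is not HTM and does not use the Temporal Memory: a plain first-order transition-count model is swapped in as a stand-in, so it conditions only on the current category rather than on the longer sequence context. The page paths, the category mapping, and the function names are all made up for illustration; none of them come from the Forbes work.

```python
from collections import Counter, defaultdict

# Hypothetical page -> category mapping (the real grouping is not given in the thread).
PAGE_TO_CATEGORY = {
    "/markets/stocks-today": "markets",
    "/markets/bond-yields": "markets",
    "/tech/new-phone": "tech",
    "/tech/chip-shortage": "tech",
    "/lists/billionaires": "lists",
}


def to_categories(visit):
    """Map one visit (a list of page paths) to a list of category labels."""
    return [PAGE_TO_CATEGORY[page] for page in visit]


def train(visits):
    """Count category-to-category transitions within each visit.

    Each visit is processed on its own, so no transition is counted across
    the boundary between two visits (the analogue of a sequence reset).
    """
    counts = defaultdict(Counter)
    for visit in visits:
        cats = to_categories(visit)
        for current, nxt in zip(cats, cats[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current_category):
    """Return a probability distribution over the next category."""
    following = counts.get(current_category)
    if not following:
        return {}  # context never seen in training: no prediction
    total = sum(following.values())
    return {cat: n / total for cat, n in following.items()}


# Made-up visits from "previous users", for illustration only.
visits = [
    ["/markets/stocks-today", "/markets/bond-yields", "/tech/new-phone"],
    ["/markets/bond-yields", "/tech/chip-shortage"],
    ["/tech/new-phone", "/lists/billionaires"],
    ["/markets/stocks-today", "/lists/billionaires"],
]

model = train(visits)
print(predict_next(model, "markets"))
```

A new user's first click can be fed to `predict_next` straight away, because the counts come entirely from earlier users' visits. Working at the category level keeps the transition table small even when there are thousands of distinct pages.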
