Re: Feeding the HTM network streams of web click data?

Subutai Ahmad Tue, 02 Feb 2016 07:48:17 -0800

Hi Dmitri,

Yes, that's an interesting challenge. We had the same issue in the Forbes
dataset. I think on average a person only visited 6 pages, and we didn't
have previous browsing history for these users.

We ended up treating each new visit to the website as a separate sequence,
with a reset in between.  If you train the TM on a large set of sequences
from previous users, it will generalize to new users. Essentially it will try to
find other users with a similar pattern of activity and make predictions
based on those previous sequences. The TM can make multiple predictions
simultaneously, so you will get a probability distribution of next clicks.
We also grouped pages into fewer than 200 categories, rather than treating each
page individually (there were thousands of different pages). Grouping into
topics helped reduce the complexity of the problem.
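
A minimal sketch of the data setup described above, in Python. It is not the
code we used, and it does not call any HTM library: the page-to-category
mapping, the sample sessions, and the NextCategoryCounter class are all
hypothetical, and the class is a plain count-based stand-in for the TM. It only
illustrates the three ideas: map pages to categories, treat each visit as its
own sequence so that no context carries across visits (the "reset"), and
return a distribution over next categories instead of a single guess.

```python
from collections import Counter, defaultdict

# Hypothetical page -> category mapping (the real grouping is not given here).
PAGE_TO_CATEGORY = {
    "/markets/stocks-today": "markets",
    "/markets/bonds": "markets",
    "/tech/new-phone": "tech",
    "/tech/ai-chips": "tech",
    "/lifestyle/travel": "lifestyle",
}


def to_category_sequence(pages):
    """Replace each page with its category; unknown pages become 'other'."""
    return [PAGE_TO_CATEGORY.get(page, "other") for page in pages]


class NextCategoryCounter:
    """Count-based stand-in for a sequence learner. NOT an HTM Temporal Memory."""

    def __init__(self, max_context=2):
        self.max_context = max_context
        self.counts = defaultdict(Counter)  # context tuple -> next-category counts

    def learn_session(self, categories):
        # Contexts are built only from within this session, so nothing leaks
        # from one visit into the next. This plays the role of the reset.
        for i in range(1, len(categories)):
            for k in range(1, self.max_context + 1):
                if i - k < 0:
                    break
                context = tuple(categories[i - k:i])
                self.counts[context][categories[i]] += 1

    def predict(self, categories_so_far):
        # Back off from the longest matching context to the shortest.
        longest = min(self.max_context, len(categories_so_far))
        for k in range(longest, 0, -1):
            seen = self.counts.get(tuple(categories_so_far[-k:]))
            if seen:
                total = sum(seen.values())
                return {cat: n / total for cat, n in seen.items()}
        return {}  # nothing similar seen in earlier sessions


# Hypothetical sessions from previous users, one list of pages per visit.
sessions = [
    ["/markets/stocks-today", "/markets/bonds", "/tech/new-phone"],
    ["/markets/bonds", "/tech/ai-chips"],
    ["/markets/stocks-today", "/lifestyle/travel"],
]

model = NextCategoryCounter(max_context=2)
for pages in sessions:
    model.learn_session(to_category_sequence(pages))

# A new user with no history has just opened one markets page.
dist = model.predict(to_category_sequence(["/markets/bonds"]))
print(dist)
```

With a real TM you would feed encoded categories one step at a time and call
reset at the end of each visit; the per-session loop above is only the
simplest analogue of that.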

We didn't get a chance to experiment with spatial aspects. I agree it could
help, for example including demographic information, OS type, etc.

A new product adds another layer of challenge, since you don't have history
for that product either.

--Subutai

On Tue, Feb 2, 2016 at 3:33 AM, dokondr <[email protected]> wrote:

> Hi Subutai,
>
> Thanks for the presentation. The problem I am trying to solve is different
> from, but related to, predicting where a person browsing a website is likely
> to click next. It is the so-called "cold start" problem for a recommendation
> system that actively learns from a new user's implicit feedback. The goal is
> to determine product ratings for a new user who has just arrived and started
> browsing products on a web site for the first time. I think both spatial and
> temporal aspects are important here, and HTM may help to classify user
> preferences. The main problem is that very little data is available in both
> the new-user and new-product cold-start scenarios.
>
>
> On Mon Feb 1 15:54:37 EST 2016, Subutai Ahmad wrote:
>
>> Hi Dmitri,
>>
>> That paragraph refers to work I did way back in 2009/2010. We got a huge
>> amount of real web traffic data from the news site forbes.com and used it
>> to debug the very first versions of the Temporal Memory algorithms.  I've
>> attached the presentation I gave at a workshop a while ago.  (Note that it
>> uses our old product name Grok - you can ignore that and substitute "HTM".)
>>
>> Unfortunately we are not allowed to release the data.  It would be great if
>> someone could find similar data from another company that we could actually
>> release as a dataset.
>>
>> --Subutai
>>
>>
>>
