Hi,

I want to be able to discover the 10 most popular routes through our  web site 
that lead a visitor to register with us.

I am already logging page view data but don't seem to be able to  find the best 
solution to query it. (Each Visitor has an ID, each Visitor makes multiple 
Visits, each with an ID, and each page request has an ID. I can join across all 
of these fields).

The site is quite high traffic, > 3 million page views per month,  so the 
solution needs to scale. That said, I envisage that for the next year or so the 
dataset would fit on a single machine.

So far my research on Hadoop seems to suggest that it's generally only 
beneficial to use it if the dataset is large enough to warrant running a couple 
of Hadoop nodes.

Is this problem even suited to Hadoop? How might I go about solving this 
problem? 



My initial thought was to use a graph database, and traverse from the 
registration page node outwards, selecting the next node as that which has the 
most links between itself and the registration node. I'm running into 
difficulties here, and wondered whether Hadoop might offer an alternative 
approach.


Any pointers would be greatly appreciated.

Thanks,
Tim



      

Reply via email to