Otis Gospodnetic wrote:
Heh, it sounds like we are going through similar steps. I first wrote a simple
"beacon servlet" for tracking purposes. Then I opted for a simpler (and more
static) pixel tracker, with a web server (nginx) doing the logging and a log
parser that is supposed to process that log and store it in _____ (not sure
where yet, didn't get there), and then get it from there to Taste. This, of
course, means more batch-oriented processing. Going with the beacon servlet
approach could *presumably* get something closer to real-time
recommendations....
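The pixel-tracker half of that pipeline boils down to pulling tracking hits out of the nginx access log. A minimal sketch, assuming the default "combined" log format and a hypothetical `/pixel.gif` tracking path (both are assumptions; the real format depends on the `log_format` directive in use):

```python
import re

# nginx "combined" log format (an assumption -- the actual layout depends
# on the log_format directive configured for the server).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_pixel_hit(line):
    """Return (ip, path, referer) for a tracking-pixel request, else None.

    /pixel.gif is a hypothetical tracker path used for illustration.
    """
    m = LOG_RE.match(line)
    if m is None or not m.group("path").startswith("/pixel.gif"):
        return None
    return m.group("ip"), m.group("path"), m.group("referer")
```

Whatever store ends up behind the parser, each tuple it yields is one user/page event ready to be batched up for Taste.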
right.. we have put our 'real time' portion on the sidelines for the
moment, and have Hadoop jobs running every X minutes to process the
incoming data.
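Running jobs "every X minutes" implies slicing the incoming events into fixed time windows, one batch per job run. A minimal sketch of that bucketing (the 5-minute default is a stand-in for the unspecified X, and this is just the windowing logic, not the Hadoop job itself):

```python
from collections import defaultdict

def bucket_by_window(events, window_minutes=5):
    """Group (epoch_seconds, payload) events into fixed time windows.

    window_minutes stands in for the unspecified "X minutes"; each batch
    is keyed by its window start time in epoch seconds.
    """
    window = window_minutes * 60
    batches = defaultdict(list)
    for ts, payload in events:
        batches[(ts // window) * window].append(payload)
    return dict(batches)
```

Each returned batch maps onto one job run over that window's slice of the log data.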
We are planning on using something like Spread or possibly Jabber to
handle pushing the data between the log collectors and the various
receivers of the data.
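Either way, the shape is the same: each record a collector publishes has to fan out to every receiver that wants it. An in-process sketch of that topology using plain queues (an illustration of the pattern only; Spread and Jabber would carry this across hosts):

```python
import queue

class FanOut:
    """Push each record from a log collector to every subscribed receiver.

    An in-process stand-in for what a messaging layer like Spread or
    Jabber would do between machines.
    """
    def __init__(self):
        self.receivers = []

    def subscribe(self):
        # Each receiver gets its own queue, so slow consumers
        # don't steal records from fast ones.
        q = queue.Queue()
        self.receivers.append(q)
        return q

    def publish(self, record):
        for q in self.receivers:
            q.put(record)
```

A summary job and a clustering job could each `subscribe()` and consume the same stream independently.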
Our scale also limits us; we have a lot of page views to count ;-)
Ian, can you elaborate on the "feed data into HDFS" part? Do you simply store it
in HDFS? Why HDFS? Why not some other FS, or an RDBMS? What happens to your data
after you store it in HDFS?
We put the log files onto HDFS so that other things can read and
process them.
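A common convention for making the logs readable by several downstream jobs is a dated directory layout, so each job can grab exactly one day's (or hour's) slice. A sketch of such a layout (the `/logs` root and per-day structure are assumptions for illustration, not necessarily Ian's setup; the files themselves would be copied up with `hadoop fs -put`):

```python
from datetime import datetime, timezone
import posixpath

def hdfs_log_path(host, epoch_seconds, root="/logs"):
    """Build a dated HDFS target path like /logs/2008/10/10/web01.log.

    The /logs root and year/month/day layout are hypothetical; the point
    is that a dated path lets each batch job select one time slice.
    """
    d = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return posixpath.join(
        root, f"{d.year:04d}/{d.month:02d}/{d.day:02d}", f"{host}.log"
    )
```

With this layout, a job that only needs yesterday's data lists a single directory instead of scanning everything.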
We have several CF applications that use subsets of the data, ranging from a
very basic one that shows summaries of popular pages on a site to ones that
use the Fuzzy K-Means algorithm that Pallavi has contributed.
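The "summaries of popular pages" end of that range reduces to counting hits per page over the processed logs. A minimal sketch (the `top_n=10` cutoff is arbitrary; in practice this counting would run inside the Hadoop jobs rather than in one process):

```python
from collections import Counter

def popular_pages(paths, top_n=10):
    """Count page hits and return the top_n (path, count) pairs.

    `paths` would come from the parsed log records; top_n is an
    arbitrary cutoff for the summary.
    """
    return Counter(paths).most_common(top_n)
```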
Several of those scripts write summary info into a set of MySQL
servers that are accessed by various web sites, since our web site
developers are familiar with MySQL.
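The summary-write step is a simple upsert of (path, hits) rows into a table the web sites can query. A self-contained sketch using sqlite3 in place of the MySQL servers (the `page_summary` table name is hypothetical, and MySQL would use `ON DUPLICATE KEY UPDATE` rather than sqlite's `INSERT OR REPLACE`, but the DB-API calls have the same shape):

```python
import sqlite3

def write_summary(conn, rows):
    """Upsert (path, hits) summary rows into a summary table.

    sqlite3 stands in for the MySQL servers so the sketch is runnable;
    INSERT OR REPLACE is sqlite syntax -- MySQL would use
    ON DUPLICATE KEY UPDATE instead.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_summary "
        "(path TEXT PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_summary (path, hits) VALUES (?, ?)", rows
    )
    conn.commit()
```

Keeping the web-facing store as plain SQL rows is what lets the site developers consume the CF output with tools they already know.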
Regards
Ian
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch