Otis Gospodnetic wrote:
Heh, it sounds like we are going through similar steps.  I first wrote a simple 
"beacon servlet" for tracking purposes.  Then opted for a simpler (and more 
static) pixel tracker, with a web server (nginx) doing the logging and a log parser 
that is supposed to process that log and store it in _____ (not sure where yet, 
didn't get there) and then from there get it to Taste.  This, of course, means more 
batch-oriented processes.  Going with the beacon servlet approach could *presumably* 
get something closer to real-time recommendations....
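The log-parser step in that pipeline boils down to pulling fields out of each nginx access-log line. A rough sketch, assuming nginx's default "combined" log format (the class and field names here are illustrative, not from Otis's actual code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the log-parsing step for a pixel tracker, assuming the
// nginx "combined" log format. This is a toy, not the real parser.
public class PixelLogParser {
    // ip - - [time] "METHOD path HTTP/x.x" status bytes "referrer" "agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) \\S+\" " +
        "(\\d{3}) \\S+ \"([^\"]*)\" \"([^\"]*)\"$");

    // Returns {ip, timestamp, method, path, status, referrer, agent},
    // or null if the line does not match the expected format.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3),
                              m.group(4), m.group(5), m.group(6), m.group(7) };
    }
}
```

From there, the extracted (user, item) pairs would be whatever you encode in the pixel URL's query string before handing them to Taste.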

right.. we have put our 'real time' portion on the sidelines for the moment, and have Hadoop jobs running every X minutes to process the data coming in.
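That periodic batch step is, at heart, a group-and-count over the incoming records. A toy in-memory version of the reduce (the real thing is a Hadoop map/reduce job over the log files; names here are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the periodic batch job: group hits by URL and
// count them -- the same shape as a Hadoop map (emit <url, 1>)
// followed by a reduce (sum the 1s per url).
public class PageViewCounter {
    public static Map<String, Integer> count(List<String> urls) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String url : urls) {
            Integer c = counts.get(url);
            counts.put(url, c == null ? 1 : c + 1);
        }
        return counts;
    }
}
```

In the real setup each batch run would read the last X minutes of logs off HDFS instead of an in-memory list.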

We are planning on using something like Spread or possibly Jabber to handle pushing the data between the log collectors and the various receivers of the data.
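The collector-to-receiver push has a simple shape regardless of transport: producers put records on a channel, receivers block on it. An in-process sketch using a `BlockingQueue` as a stand-in (Spread or an XMPP/Jabber server would play the queue's role across machines):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for the log-collector -> receiver push.
// A real deployment would replace the queue with a Spread group
// or a Jabber/XMPP channel spanning machines.
public class LogPush {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> channel = new LinkedBlockingQueue<String>();

        // "log collector" thread pushes records as they arrive
        Thread collector = new Thread(new Runnable() {
            public void run() {
                try {
                    channel.put("1.2.3.4 GET /pixel.gif?item=42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        collector.start();

        // a "receiver" blocks until a record shows up, then processes it
        String record = channel.take();
        System.out.println(record);
        collector.join();
    }
}
```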
Our scale also limits us, we have a lot of page views to count ;-)
Ian, can you elaborate on the "feed data into HDFS" part?  You simply store it 
in HDFS?  Why HDFS?  Why not some other FS, or why not an RDBMS?  What happens to 
your data after you store it in HDFS?


We put the log files onto HDFS so that other things can read and process them. We have several CF applications that use subsets of the data, from a very basic one that shows summaries of popular pages on a site, to ones that use the Fuzzy K-Means algorithm that Pallavi has contributed. Several of those scripts write summary info into a set of MySQL servers that are accessed by various web sites, as our web site developers are familiar with that.
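That "popular pages" summary is essentially a sort of the per-URL counts before the rows go into MySQL. A minimal sketch of the ranking step (JDBC insert into MySQL omitted; this is not the actual script):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Minimal sketch of a "popular pages" summary: take per-URL view
// counts and keep the N most viewed. The real scripts would then
// write these rows into MySQL for the web site developers.
public class PopularPages {
    public static List<Map.Entry<String, Integer>> topN(
            Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a,
                               Map.Entry<String, Integer> b) {
                return b.getValue() - a.getValue(); // descending by views
            }
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```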

Regards
Ian
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


