Otis Gospodnetic wrote:
Heh, it sounds like we are going through similar steps. I first wrote a simple
"beacon servlet" for tracking purposes. Then I opted for a simpler (and more
static) pixel tracker, with a web server (nginx) doing the logging and a log
parser that is supposed to process that log and store it in _____ (not sure
where yet, didn't get there), and then get it from there to Taste. This, of
course, means more batch-oriented processing. Going with the beacon servlet
approach could *presumably* get something closer to real-time
recommendations....
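The pixel-tracker half of that pipeline boils down to pulling tracking hits out of the nginx access log. A minimal sketch, assuming the default "combined" log format and a hypothetical `/pixel.gif` tracking path (both are assumptions; the real format depends on the `log_format` directive in use):

```python
import re

# nginx "combined" log format (an assumption -- the actual layout depends
# on the log_format directive configured for the server).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_pixel_hit(line):
    """Return (ip, path, referer) for a tracking-pixel request, else None.

    /pixel.gif is a hypothetical tracker path used for illustration.
    """
    m = LOG_RE.match(line)
    if m is None or not m.group("path").startswith("/pixel.gif"):
        return None
    return m.group("ip"), m.group("path"), m.group("referer")
```

Whatever store ends up behind the parser, each tuple it yields is one user/page event ready to be batched up for Taste.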
right.. we have put our 'real time' portion on the sidelines for the
moment, and have Hadoop jobs running every X minutes to process the
incoming data.
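Running jobs "every X minutes" implies slicing the incoming events into fixed time windows, one batch per job run. A minimal sketch of that bucketing (the 5-minute default is a stand-in for the unspecified X, and this is just the windowing logic, not the Hadoop job itself):

```python
from collections import defaultdict

def bucket_by_window(events, window_minutes=5):
    """Group (epoch_seconds, payload) events into fixed time windows.

    window_minutes stands in for the unspecified "X minutes"; each batch
    is keyed by its window start time in epoch seconds.
    """
    window = window_minutes * 60
    batches = defaultdict(list)
    for ts, payload in events:
        batches[(ts // window) * window].append(payload)
    return dict(batches)
```

Each returned batch maps onto one job run over that window's slice of the log data.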
We are planning on using something like Spread or possibly Jabber to
handle pushing the data between the log collectors and the various
receivers of the data.
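Either way, the shape is the same: each record a collector publishes has to fan out to every receiver that wants it. An in-process sketch of that topology using plain queues (an illustration of the pattern only; Spread and Jabber would carry this across hosts):

```python
import queue

class FanOut:
    """Push each record from a log collector to every subscribed receiver.

    An in-process stand-in for what a messaging layer like Spread or
    Jabber would do between machines.
    """
    def __init__(self):
        self.receivers = []

    def subscribe(self):
        # Each receiver gets its own queue, so slow consumers
        # don't steal records from fast ones.
        q = queue.Queue()
        self.receivers.append(q)
        return q

    def publish(self, record):
        for q in self.receivers:
            q.put(record)
```

A summary job and a clustering job could each `subscribe()` and consume the same stream independently.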
Our scale also limits us; we have a lot of page views to count ;-)
Ian, can you elaborate on the "feed data into HDFS" part? Do you simply store it
in HDFS? Why HDFS? Why not some other FS, or an RDBMS? What happens to your data
after you store it in HDFS?
We put the log files onto HDFS so that other things can read and
process them.
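A common convention for making the logs readable by several downstream jobs is a dated directory layout, so each job can grab exactly one day's (or hour's) slice. A sketch of such a layout (the `/logs` root and per-day structure are assumptions for illustration, not necessarily Ian's setup; the files themselves would be copied up with `hadoop fs -put`):

```python
from datetime import datetime, timezone
import posixpath

def hdfs_log_path(host, epoch_seconds, root="/logs"):
    """Build a dated HDFS target path like /logs/2008/10/10/web01.log.

    The /logs root and year/month/day layout are hypothetical; the point
    is that a dated path lets each batch job select one time slice.
    """
    d = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return posixpath.join(
        root, f"{d.year:04d}/{d.month:02d}/{d.day:02d}", f"{host}.log"
    )
```

With this layout, a job that only needs yesterday's data lists a single directory instead of scanning everything.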
We have several CF applications that use subsets of the data, ranging from a
very basic one that shows summaries of popular pages on a site to ones that
use the Fuzzy K-Means algorithm that Pallavi has contributed.
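The "summaries of popular pages" end of that range reduces to counting hits per page over the processed logs. A minimal sketch (the `top_n=10` cutoff is arbitrary; in practice this counting would run inside the Hadoop jobs rather than in one process):

```python
from collections import Counter

def popular_pages(paths, top_n=10):
    """Count page hits and return the top_n (path, count) pairs.

    `paths` would come from the parsed log records; top_n is an
    arbitrary cutoff for the summary.
    """
    return Counter(paths).most_common(top_n)
```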
Several of those scripts write summary info into a set of MySQL
servers that are accessed by various web sites, since our web site
developers are familiar with MySQL.
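The summary-write step is a simple upsert of (path, hits) rows into a table the web sites can query. A self-contained sketch using sqlite3 in place of the MySQL servers (the `page_summary` table name is hypothetical, and MySQL would use `ON DUPLICATE KEY UPDATE` rather than sqlite's `INSERT OR REPLACE`, but the DB-API calls have the same shape):

```python
import sqlite3

def write_summary(conn, rows):
    """Upsert (path, hits) summary rows into a summary table.

    sqlite3 stands in for the MySQL servers so the sketch is runnable;
    INSERT OR REPLACE is sqlite syntax -- MySQL would use
    ON DUPLICATE KEY UPDATE instead.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_summary "
        "(path TEXT PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_summary (path, hits) VALUES (?, ?)", rows
    )
    conn.commit()
```

Keeping the web-facing store as plain SQL rows is what lets the site developers consume the CF output with tools they already know.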
Regards
Ian
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch