*nod* Syslog itself is a UDP-based protocol, if I'm not mistaken, and there's no reason the collector couldn't simply listen for incoming packets, and then turn them into Chunks.
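
To make the idea concrete, here is a rough sketch of what I mean, written in Python rather than against the real collector code. Everything in it is an assumption for illustration: `Chunk` is a hypothetical stand-in and not Chukwa's actual Chunk class, and `open_listener`, `receive_chunks` and `demo` are made-up names. It only shows the shape of "listen for datagrams, wrap each one as a chunk".

```python
import socket
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk; not Chukwa's real class."""
    source: str   # address of the host that sent the datagram
    seq_id: int   # running byte offset after this payload
    data: bytes   # raw syslog message as received


def open_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket. Port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_chunks(sock, count, bufsize=65535, timeout=5.0):
    """Read `count` datagrams and wrap each one in a Chunk."""
    sock.settimeout(timeout)
    chunks = []
    offset = 0
    for _ in range(count):
        payload, (addr, _port) = sock.recvfrom(bufsize)
        offset += len(payload)
        chunks.append(Chunk(source=addr, seq_id=offset, data=payload))
    return chunks


def demo():
    """Send one syslog-style message to ourselves over loopback."""
    listener = open_listener()
    port = listener.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"<13>Feb 18 10:00:00 host app: hello",
                      ("127.0.0.1", port))
        return receive_chunks(listener, 1)
    finally:
        sender.close()
        listener.close()
```

A real version would loop forever and hand each chunk to whatever writes to HDFS. Since UDP gives no delivery guarantee, anything dropped on the wire would simply never show up as a chunk.
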
Is that a plausible way to do this?

2010/2/18 Guillermo Pérez <bi...@tuenti.com>:
> On Wed, Feb 17, 2010 at 20:14, Ariel Rabkin <asrab...@gmail.com> wrote:
>> Hmm.
>>
>> What you propose would certainly work. However, the whole
>> agent-collector pipeline isn't really necessary if your data already
>> comes in biggish files. You might try out the Backfilling loader
>> instead. (bin/backfill.sh) The code is in
>> org.apache.hadoop.chukwa.tools.backfilling.BackfillingLoader, if you
>> need to read it. The idea is that you point it at a file, and it
>> copies that file directly into HDFS, appropriately formatted for
>> Chukwa.
>>
>> One warning. The backfilling loader was developed to meet the needs of
>> a particular site. Since it's a sort of special-purpose thing, we
>> haven't yet written the documentation. You might be the second person
>> to use it in production.
>
> For now I will start with chukwa agent, so I can have minutes of
> latency before analyzing data in hadoop.
>
> It will be nice if the collector could directly grab from syslog-ng at
> some point, by udp, tcp or a pipe. That will make things lot faster.
>
> --
> Guille -ℬḭṩḩø- <bi...@tuenti.com>
> :wq
>

--
Ari Rabkin asrab...@gmail.com
UC Berkeley
Computer Science Department