On Wed, Feb 17, 2010 at 20:14, Ariel Rabkin <asrab...@gmail.com> wrote:
> Hmm.
>
> What you propose would certainly work. However, the whole
> agent-collector pipeline isn't really necessary if your data already
> comes in biggish files. You might try out the backfilling loader
> instead (bin/backfill.sh). The code is in
> org.apache.hadoop.chukwa.tools.backfilling.BackfillingLoader, if you
> need to read it. The idea is that you point it at a file, and it
> copies that file directly into HDFS, appropriately formatted for
> Chukwa.
>
> One warning: the backfilling loader was developed to meet the needs of
> a particular site. Since it's a sort of special-purpose thing, we
> haven't yet written the documentation. You might be the second person
> to use it in production.
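Since the tool is undocumented, here is a sketch of what the invocation might look like. The argument names and order below are guesses (placeholders, not confirmed against the source), so check BackfillingLoader's main() before relying on them:

```
# Hypothetical usage -- verify the expected arguments in
# org.apache.hadoop.chukwa.tools.backfilling.BackfillingLoader first.
bin/backfill.sh <cluster> <machine> <adaptorName> <recordType> <logFile>
```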
For now I will start with the Chukwa agent, so I can have minutes of latency before analyzing data in Hadoop. It would be nice if the collector could grab data directly from syslog-ng at some point, over UDP, TCP, or a pipe. That would make things a lot faster.

--
Guille -ℬḭṩḩø- <bi...@tuenti.com>
:wq
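To illustrate the idea: a minimal sketch (plain Python, not Chukwa's actual collector API) of a UDP listener that receives syslog-ng datagrams directly, the way a collector could ingest logs without an agent tailing files in between:

```python
import socket

def open_syslog_udp_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket for incoming syslog datagrams.

    port=0 lets the OS pick a free port; a real deployment would use
    a fixed port that syslog-ng is configured to forward to.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

# Demo: simulate syslog-ng sending one message over UDP loopback.
server = open_syslog_udp_listener()
addr = server.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"<13>Feb 17 20:14:00 web1 app: user login ok", addr)

# The raw syslog line arrives ready to be wrapped and written to HDFS.
data, _ = server.recvfrom(65535)
print(data.decode())

sender.close()
server.close()
```

In syslog-ng terms, this corresponds to a `udp()` destination pointed at the collector's host and port; a TCP variant would use a stream socket instead.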