Hmm. What you propose would certainly work. However, the whole agent-collector pipeline isn't really necessary if your data already arrives in fairly large files. You might try the backfilling loader instead (bin/backfill.sh). The code is in org.apache.hadoop.chukwa.tools.backfilling.BackfillingLoader, if you need to read it. The idea is that you point it at a file, and it copies that file directly into HDFS, formatted appropriately for Chukwa.
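For reference, an invocation might look something like the sketch below. The argument names and their order are my assumptions, not confirmed usage; read bin/backfill.sh and the BackfillingLoader main() before relying on them:

```shell
# Hypothetical usage sketch -- argument names and order are assumptions;
# verify against bin/backfill.sh in your Chukwa install before use.
bin/backfill.sh <clusterName> <machine> <adaptorName> <dataType> <fileName>

# Illustrative example (values are made up), loading one day's syslog
# file written by syslog-ng into the /var/log/cluster/... layout:
bin/backfill.sh mycluster web01 \
    org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailingAdaptor \
    SysLog \
    /var/log/cluster/web01/2010/02/15/daemon
```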
One warning: the backfilling loader was developed to meet the needs of a particular site. Since it's a special-purpose tool, we haven't yet written documentation for it. You might be the second person to use it in production.

--Ari

2010/2/16 Guillermo Pérez <bi...@tuenti.com>:
> People here probably already have experience integrating syslog-ng and
> Chukwa, so I want to let you know what we are planning and discuss
> possible improvements.
>
> We don't want to deploy Chukwa agents to all the monitored servers
> (little disk and no Java there). Instead we use syslog-ng, which sends
> messages over UDP to a central syslog server. I have set it up so that
> it creates files with the pattern
> /var/log/cluster/$HOST/$YEAR/$MONTH/$DAY/$FACILITY, and I'm planning
> to import these into Chukwa with an agent and a DirTailingAdaptor on
> /var/log/cluster. Each day we will clean out old temporary files based
> on their paths, to avoid rotations and problems with the adaptor.
>
> Is there a better option for doing this? Perhaps piping directly from
> syslog to the Chukwa agent? But I'm concerned about what will happen
> if the pipe doesn't work or the agent is not ready...
>
> Ideas and suggestions are welcome.
>
> Thanks a lot in advance!

--
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department
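As an aside on the daily cleanup mentioned in the quoted mail: since the date is encoded in the path, the pruning can be done purely by parsing directory names rather than trusting file mtimes. A minimal sketch (the function name, the 7-day threshold, and the base directory are my own illustrative choices, not anything Chukwa or syslog-ng provides):

```python
import os
from datetime import date, timedelta

def dirs_older_than(base, days=7, today=None):
    """Yield $HOST/$YEAR/$MONTH/$DAY directories under `base` whose
    encoded date is more than `days` days old.

    Assumes the /var/log/cluster/$HOST/$YEAR/$MONTH/$DAY/$FACILITY
    layout described in the mail above. Directories whose components
    are not date-shaped are silently skipped.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    for host in sorted(os.listdir(base)):
        host_dir = os.path.join(base, host)
        if not os.path.isdir(host_dir):
            continue
        for year in sorted(os.listdir(host_dir)):
            year_dir = os.path.join(host_dir, year)
            for month in sorted(os.listdir(year_dir)):
                month_dir = os.path.join(year_dir, month)
                for day in sorted(os.listdir(month_dir)):
                    try:
                        d = date(int(year), int(month), int(day))
                    except ValueError:
                        continue  # not a YEAR/MONTH/DAY path; leave it alone
                    if d < cutoff:
                        yield os.path.join(month_dir, day)
```

A cron job could then walk the yielded directories and remove them, which keeps the DirTailingAdaptor from ever re-discovering rotated files.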