It should actually be straightforward to do this with Chukwa.  Chukwa
has a bunch of other pieces, but at its core, it does basically what
you describe.
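
For what it's worth, the core of what you describe is only a few lines
against the HDFS Java API. Here's a minimal sketch; the local log path,
destination layout, and hourly interval are all invented, and real code
would need error handling and should remember how far it has read
rather than re-copying whole files every interval:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class SimpleLogCollector {
        public static void main(String[] args) throws Exception {
            // Picks up the namenode address from core-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());
            String host = java.net.InetAddress.getLocalHost().getHostName();

            while (true) {
                // Hypothetical local log file and destination layout.
                InputStream in = new FileInputStream("/var/log/hadoop/datanode.log");
                Path dest = new Path("/logs/" + host + "/datanode-"
                        + System.currentTimeMillis() + ".log");
                FSDataOutputStream out = fs.create(dest);
                IOUtils.copyBytes(in, out, 4096, true);  // true closes both streams
                Thread.sleep(60 * 60 * 1000L);           // collect hourly
            }
        }
    }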

The one complexity is that instead of storing each file separately,
Chukwa packs them together into larger sequence files. That turns out
to matter if you want good filesystem performance, have large data
volumes, or want to keep metadata telling you which machine each file
came from.
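
To give a flavor of what that looks like, here's a rough sketch of
packing chunks from many logs into one SequenceFile, keyed by where
each chunk came from. This is just the general pattern, not Chukwa's
actual writer or on-disk format, and the host and file names are
invented:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackedLogWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // One big container file instead of thousands of small ones.
            Path sink = new Path("/logs/packed/2010-04-23.seq");
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, sink, Text.class, BytesWritable.class);

            // The key records which machine and file each chunk came from,
            // so provenance survives the merge.
            Text key = new Text("node07.example.com:/var/log/hadoop/datanode.log");
            byte[] chunk = "2010-04-23 05:38:00 INFO ...".getBytes("UTF-8");
            writer.append(key, new BytesWritable(chunk));

            writer.close();
        }
    }

A MapReduce job can then scan one big packed file and filter on the
key, which is much cheaper than opening thousands of tiny files.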

--Ari

On Fri, Apr 23, 2010 at 5:38 AM, Patrick Datko <[email protected]> wrote:
> Hey everyone,
>
> I've been working with Hadoop for a few weeks, building up a cluster
> with HDFS. I looked at several monitoring tools to observe my cluster
> and found a good solution with ganglia+nagios. To complete the
> monitoring of the cluster, I am looking for a log collection tool
> which stores the log files from the nodes in a central place. I have
> tested Chukwa and Facebook's Scribe, but neither is the kind of
> simple log-file store I have in mind; in my opinion they are too
> heavyweight for such a job.
>
> So I've been thinking about writing my own LogCollector. I don't
> want anything special. My idea is to build a daemon, which could be
> installed on every node in the cluster, plus an XML file which
> describes which log files have to be collected. The daemon would
> collect all the needed log files at a configured time interval and
> store them in HDFS using the Java API.
>
> This was just an idea for a simple LogCollector, and it would be
> cool if you could give me some opinions on it, or tell me whether
> such a LogCollector already exists.
>
> Kind regards,
> Patrick
>
>



-- 
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
