I have just sent this mail to Ari, but it is probably wise to share it with all of you:
Hello Ari,

I'm Oded Rosen, from the Legolas Media R&D team. We would like to use Chukwa to move data from our real-time servers into our Hadoop cluster. The data flow already reaches several GB/day, and we are about to grow it in the near future. Our main aim is to process the raw data (lines of the form fieldname1=value1<tab>fieldname2=value2...\n) into a format that fits straight into Hive for later processing.

We are already running a DirTailingAdaptor on our input directory and receive the collected data in the chukwa/logs directory. Now we would like to write our own Demux processor to process the sink data: extract only the fields we need, format them, and write the result to an output directory that will serve as the input directory of a Hive table. We have already written mapper/reducer classes that extract the wanted fields from the raw data and apply the needed formats.

We want to set up a Demux processor with these classes as its map/reduce classes, but we could not find any documentation on how to do that. So far, all we have managed is to run the default demux, which simply copies the data into the output directory.

We would appreciate any help you can offer us.

-- Oded
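P.S. To make the question more concrete, here is a rough sketch of the mapper-side processor we have in mind, modeled on the bundled DefaultProcessor/TsProcessor. The class name and the filtered field names are placeholders, and we have only guessed the parse() signature from the source, so this is untested:

  // Placed in the same package as the bundled processors, so that the
  // inherited chunk/archiveKey/key fields of AbstractProcessor are visible.
  package org.apache.hadoop.chukwa.extraction.demux.processor.mapper;

  import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
  import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class LegolasFieldsProcessor extends AbstractProcessor {

    @Override
    protected void parse(String recordEntry,
                         OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                         Reporter reporter) throws Throwable {
      ChukwaRecord record = new ChukwaRecord();
      // Fills in the standard record metadata and sets the inherited 'key'.
      buildGenericRecord(record, recordEntry,
                         archiveKey.getTimePartition(), chunk.getDataType());

      // Our raw lines look like: fieldname1=value1<tab>fieldname2=value2...
      for (String pair : recordEntry.split("\t")) {
        int eq = pair.indexOf('=');
        if (eq <= 0) {
          continue; // skip malformed pairs
        }
        String field = pair.substring(0, eq);
        String value = pair.substring(eq + 1);
        // "fieldname1"/"fieldname2" stand in for the fields we actually keep.
        if (field.equals("fieldname1") || field.equals("fieldname2")) {
          record.add(field, value);
        }
      }
      output.collect(key, record);
    }
  }

If we read the Demux code correctly, the processor is then picked per data type through chukwa-demux-conf.xml (a property whose name is our data type and whose value is the processor class name), and the reduce side seems to be chosen by the ReduceType set on the ChukwaRecordKey. Please correct us if we got any of this wrong.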