Hi Oded,

If you are using the code from TRUNK, the instructions are:
- Package your mapper and reducer classes and put them in a jar file.
- Upload the parser jar file to hdfs://host:port/chukwa/demux
- Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type that references your class names in the Demux aliases section.

If you are using Chukwa 0.3.0, the instructions are:

- Package your mapper and reducer classes into chukwa-core-0.3.0.jar
- Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type that references your class names in the Demux aliases section.

I have put a rough example of the alias entry and of a parser class below the quoted message.

Hope this helps.

Regards,
Eric

On 2/22/10 7:28 AM, "Oded Rosen" <o...@legolas-media.com> wrote:

> I have just sent this mail to Ari, but it is probably wise to share it with
> all of you:
>
> Hello Ari,
> I'm Oded Rosen, with the Legolas Media R&D team.
> We would like to use Chukwa to pass data from our real-time servers into our
> hadoop cluster. The dataflow already reaches several GB/day, and we are about
> to extend this in the near future.
> Our main aim is to process raw data (in the form of
> fieldname1=value1<tab>fieldname2=value2....\n) into a format that fits
> straight into Hive, for later processing.
>
> We are already running a DirTailingAdaptor on our input directory, and receive
> the collected data in the chukwa/logs dir.
> Now we would like to write our own Demux processor, in order to process the
> sink data, get only the fields we need from it, format the data, and write it
> to the output directory, which will be defined as the input directory of a
> Hive table.
>
> We have already written mapper/reducer classes that know how to extract the
> wanted fields from the raw data and apply the needed formats.
> We want to set up a Demux processor with these classes as the map/reduce
> classes, but we could not find any documentation about how to do it.
> All we could do until now is run the default demux, which just copies the
> data into the output directory.
> We will appreciate any help you can offer us.
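
For reference, the alias entry in CHUKWA_CONF_DIR/chukwa-demux-conf.xml maps the record/data type name to the fully qualified parser class, roughly like this (the "LegolasEvent" type name and the package are only placeholders; compare against the entries already present in your conf file):

  <!-- Demux aliases section of CHUKWA_CONF_DIR/chukwa-demux-conf.xml -->
  <property>
    <name>LegolasEvent</name>
    <value>com.legolas.chukwa.demux.TabKeyValueProcessor</value>
    <description>Demux parser for tab-separated key=value records</description>
  </property>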
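
And here is a rough sketch of what the mapper side could look like for the fieldname1=value1<tab>fieldname2=value2 lines. It is modelled on the parsers shipped under org.apache.hadoop.chukwa.extraction.demux.processor.mapper, so check the AbstractProcessor signature in the version you build against; the package, class, and record type names are placeholders:

  // Rough sketch of a custom demux mapper processor, modelled on the parsers
  // that ship with Chukwa. Verify the AbstractProcessor API against your
  // checkout before packaging the jar.
  package com.legolas.chukwa.demux;

  import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
  import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
  import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class TabKeyValueProcessor extends AbstractProcessor {

    @Override
    protected void parse(String recordEntry,
                         OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                         Reporter reporter) throws Throwable {
      long now = System.currentTimeMillis();

      ChukwaRecord record = new ChukwaRecord();
      record.setTime(now);

      // The raw line looks like: fieldname1=value1<tab>fieldname2=value2...
      // Keep only the fields you need by filtering here.
      for (String pair : recordEntry.split("\t")) {
        int eq = pair.indexOf('=');
        if (eq > 0) {
          record.add(pair.substring(0, eq), pair.substring(eq + 1));
        }
      }

      // The reduce type ties the record to the alias name in
      // chukwa-demux-conf.xml and decides which output sub-directory it lands in.
      ChukwaRecordKey key = new ChukwaRecordKey();
      key.setReduceType("LegolasEvent");
      key.setKey("" + now);

      output.collect(key, record);
    }
  }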