Thanks Eric, I have managed to write my own processor and to get the output as ChukwaRecords with our own customized fields in them. Now I get to the part where I try to load this output into Hive (or actually, use the output dir, /repos, as the data directory of a Hive table). At this stage I need to let Hive recognize the ChukwaRecordKey + ChukwaRecord SerDes, so I need your help with that.
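To make the question concrete, here is the rough shape of the read-only SerDe I imagine we would need if nothing like it exists yet. This is only an untested sketch: I am assuming Hive's current org.apache.hadoop.hive.serde2.Deserializer interface, that Hive hands the SerDe the value half of the demux SequenceFile pairs, and that exposing every ChukwaRecord field through a single map<string,string> column is good enough for a first cut; the package and class names are our placeholders.

package com.example.hive; // placeholder package

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Writable;

public class ChukwaRecordSerDe implements Deserializer {

  private ObjectInspector rowOI;
  // Reused row objects; Hive consumes them before the next deserialize() call.
  private final Map<String, String> fields = new HashMap<String, String>();
  private final List<Object> row = new ArrayList<Object>(1);

  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // One column, "fields", holding every ChukwaRecord field as map<string,string>.
    ObjectInspector mapOI = ObjectInspectorFactory.getStandardMapObjectInspector(
        PrimitiveObjectInspectorFactory.javaStringObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("fields"), Arrays.<ObjectInspector>asList(mapOI));
  }

  // Demux writes SequenceFiles of <ChukwaRecordKey, ChukwaRecord>; Hive
  // passes the value side of each pair to the SerDe, so copy the record's
  // fields into the map column.
  public Object deserialize(Writable blob) throws SerDeException {
    ChukwaRecord record = (ChukwaRecord) blob;
    fields.clear();
    for (String field : record.getFields()) {
      fields.put(field, record.getValue(field));
    }
    row.clear();
    row.add(fields);
    return row;
  }

  public ObjectInspector getObjectInspector() throws SerDeException {
    return rowOI;
  }
}

If I understand the Hive side correctly, the table would then be a CREATE EXTERNAL TABLE with ROW FORMAT SERDE naming that class, STORED AS SEQUENCEFILE, and LOCATION pointing at the /repos dir, with our jar added to Hive's classpath (and a full SerDe with a serialize() side if we ever want to write back). Does something along these lines sound right?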
I've seen that integration with Pig is pretty straightforward for Chukwa (using Chukwa-Pig.jar), but our idea is to automate the whole process straight into a table, and with Hive you can just define a directory as a Hive table's input. If we could get the data in a form that Hive can recognize, we would not need another stage after the Demux. Can you think of a way to do this?

Thanks,

On Mon, Feb 22, 2010 at 7:31 PM, Eric Yang <ey...@yahoo-inc.com> wrote:

> Hi Oded,
>
> If you are using the code from TRUNK, the instructions are:
>
> - Package your mapper and reducer classes, and put them in a jar file.
> - Upload the parser jar file to hdfs://host:port/chukwa/demux
> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
>   referencing your class names in the Demux aliases section.
>
> If you are using Chukwa 0.3.0, the instructions are:
>
> - Package your mapper and reducer classes into chukwa-core-0.3.0.jar
> - Configure CHUKWA_CONF_DIR/chukwa-demux-conf.xml: add a new record type
>   referencing your class names in the Demux aliases section.
>
> Hope this helps.
>
> Regards,
> Eric
>
> On 2/22/10 7:28 AM, "Oded Rosen" <o...@legolas-media.com> wrote:
>
> > I have just sent this mail to Ari, but it is probably wise to share it
> > with all of you:
> >
> > Hello Ari,
> > I'm Oded Rosen, with the Legolas Media R&D team.
> > We would like to use Chukwa to pass data from our real-time servers
> > into our Hadoop cluster. The dataflow already reaches several GB/day,
> > and we are about to extend this in the near future.
> > Our main aim is to process raw data (in the form of
> > fieldname1=value1<tab>fieldname2=value2....\n) into a format that fits
> > straight into Hive, for later processing.
> >
> > We are already running a DirTailingAdaptor on our input directory, and
> > receive the collected data in the chukwa/logs dir.
> > Now, we would like to write our own Demux processor, in order to
> > process the sink data, keep only the fields we need, format the data,
> > and write it to the output directory, which will be defined as the
> > input directory of a Hive table.
> >
> > We have already written mapper/reducer classes that know how to
> > extract the wanted fields from the raw data and apply the needed
> > formats.
> > We want to set up a Demux processor with these classes as the
> > map/reduce classes, but we could not find any documentation on how to
> > do it.
> > All we could do until now is run the default demux, which just copies
> > the data into the output directory.
> > We will appreciate any help you can offer us.
> >
> > -- Oded
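P.S. In case it helps to see where we stand, our mapper currently follows the pattern below. This is just a sketch of our approach against the TRUNK AbstractProcessor API that Eric describes (I am assuming buildGenericRecord() fills in the inherited "key" field, as the bundled processors seem to do); the package, class, and "LegolasRecord" data-type names are our placeholders, and the real code parses the timestamp out of the entry instead of taking it from the clock.

package com.example.chukwa; // placeholder package

import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LegolasRecordProcessor extends AbstractProcessor {

  @Override
  protected void parse(String recordEntry,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
      Reporter reporter) throws Throwable {
    ChukwaRecord record = new ChukwaRecord();

    // Fills in the inherited ChukwaRecordKey ("key") plus the standard
    // record metadata; System.currentTimeMillis() is a placeholder for
    // the timestamp parsed from the entry.
    buildGenericRecord(record, recordEntry, System.currentTimeMillis(),
        "LegolasRecord");

    // Raw lines look like fieldname1=value1<tab>fieldname2=value2...;
    // keep each pair as a ChukwaRecord field (the real processor filters
    // down to the wanted fields and applies formatting here).
    for (String pair : recordEntry.split("\t")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        record.add(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }

    output.collect(key, record);
  }
}

The "LegolasRecord" type would then get an entry pointing at this class in the Demux aliases section of chukwa-demux-conf.xml, per your instructions.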