Or Demux could be the right place to do it ...
If you need to parse/format your data, then demux is the right place to do
that, and with my new demux class (to be published) you can parse/format
your data and then use any output format to get the one that best matches
your needs; in my case the Demux output is a Hive SeqFile.
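To give an idea of what that looks like: a map-side demux processor
extends Chukwa's AbstractProcessor and emits ChukwaRecords. Here is a
minimal sketch, assuming the AbstractProcessor API from Chukwa 0.3/0.4
(check the exact signature against the version you run); the class name,
reduce type, and input layout are all illustrative:

import org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical demux processor: one record entry in, one ChukwaRecord out.
public class MyLogProcessor extends AbstractProcessor {
  @Override
  protected void parse(String recordEntry,
                       OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                       Reporter reporter) throws Throwable {
    // Illustrative input layout: "timestamp<TAB>metric<TAB>value"
    String[] parts = recordEntry.split("\t");
    long time = Long.parseLong(parts[0]);

    ChukwaRecord record = new ChukwaRecord();
    record.setTime(time);
    record.add("metric", parts[1]);
    record.add("value", parts[2]);

    ChukwaRecordKey key = new ChukwaRecordKey();
    key.setReduceType("MyLogType");   // routes the record to the matching reducer
    key.setKey(time + "/" + parts[1]);
    output.collect(key, record);
  }
}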

On the other hand, if you don't need to parse the data and you just want to
store it without any modification, then having an HbaseWriter like Ari
mentions could be another option.
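To sketch that route: a collector-side writer implements Chukwa's
ChukwaWriter interface and pushes each Chunk into HBase, batching puts
client-side so you don't pay one round trip per row (which is also what
makes the per-row table.put approach in the original question so slow).
This is a sketch only, assuming the 0.20-era HBase client and the Chukwa
0.4 writer interface (both vary across versions); the "chukwa" table,
"data" family, and row-key scheme are placeholders for your own schema:

import java.util.List;
import org.apache.hadoop.chukwa.Chunk;
import org.apache.hadoop.chukwa.datacollection.writer.ChukwaWriter;
import org.apache.hadoop.chukwa.datacollection.writer.WriterException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical collector writer: stores raw chunk bytes, no parsing.
public class HBaseWriter implements ChukwaWriter {
  private HTable table;

  public void init(Configuration conf) throws WriterException {
    try {
      table = new HTable(new HBaseConfiguration(), "chukwa");
      // Buffer puts client-side instead of issuing one RPC per row.
      table.setAutoFlush(false);
      table.setWriteBufferSize(2 * 1024 * 1024);
    } catch (Exception e) {
      throw new WriterException(e);
    }
  }

  public CommitStatus add(List<Chunk> chunks) throws WriterException {
    try {
      for (Chunk chunk : chunks) {
        // Illustrative row key: source + sequence id.
        Put put = new Put(Bytes.toBytes(chunk.getSource() + "/" + chunk.getSeqID()));
        put.add(Bytes.toBytes("data"), Bytes.toBytes("raw"), chunk.getData());
        table.put(put); // buffered until the write buffer fills
      }
      table.flushCommits(); // one batched round trip per add() call
      return COMMIT_OK;
    } catch (Exception e) {
      throw new WriterException(e);
    }
  }

  public void close() throws WriterException {
    try {
      table.close();
    } catch (Exception e) {
      throw new WriterException(e);
    }
  }
}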

/Jerome.

BTW, I will give a short presentation tonight at the Facebook/Hive user group
on how we are using Honu (Chukwa-Streaming) and Hive to compute stats and
metrics information at Netflix.

On 3/18/10 11:49 AM, "Ariel Rabkin" <asrab...@gmail.com> wrote:

> Hrm.
> 
> Demux might not be the right place in the processing pipeline to
> attack your problem.  The Chukwa collector supports pluggable writers,
> and you could think about having data pushed directly from collectors
> to HBase.  Data shows up at the collector in variable-length Chunks,
> so you'd have to parse 'em and figure out how to map them into your
> particular table schema.
> 
> --Ari
> 
> On Wed, Mar 17, 2010 at 10:00 AM, Oded Rosen <o...@legolas-media.com> wrote:
>> I work with a Hadoop cluster that receives tons of new data each day.
>> The data flows into Hadoop from outside servers, using Chukwa.
>> Chukwa has a tool called demux, a built-in mapred job.
>> Chukwa users may write their own map & reduce classes for this demux, the
>> only limitation being that the input & output types are Chukwa records, so
>> I cannot use HBase's TableMap, TableReduce.
>> In order to write data to HBase during this mapred job, I can only use
>> table.put & table.commit, which work on one HBase row at a time (don't they?).
>> This raises serious latency issues, as writing thousands of records to HBase
>> this way every 5 minutes is not efficient and really s-l-o-w.
>> Even if I move the HBase writing from the map phase to the reduce phase,
>> the same rows would still be updated one at a time, so moving the ".put" to
>> the reducer does not seem likely to change anything.
>> I would like to write straight to HBase from the Chukwa demuxer, and not
>> have another job that reads the Chukwa output and writes it to HBase.
>> The goal is to get this data into HBase as fast as possible.
>> Is there a way to write efficiently to HBase without TableReduce? Have I got
>> something wrong?
>> Is there someone using Chukwa who has managed to do this?
>> 
>> 
>> Thanks in advance for any kind of help,
>> --
>> Oded
>> 
> 
> 
