Hi, I have a new Demux that uses something similar to MultipleOutputFormat, and one of my outputs is a Hive SeqFile (written directly from Demux). So I guess it should not be difficult to get a specific OutputFormat for HBase. Do you have any special requirements other than being able to output to HBase?
/Jerome.

On 3/17/10 10:00 AM, "Oded Rosen" <o...@legolas-media.com> wrote:

> I work with a Hadoop cluster with tons of new data each day.
> The data is flowing into Hadoop from outside servers, using Chukwa.
>
> Chukwa has a tool called demux, a built-in mapred job.
> Chukwa users may write their own map & reduce classes for this demux, with the
> only limitation that the input & output types are Chukwa records - I cannot
> use HBase's TableMap, TableReduce.
> In order to write data to HBase during this mapred job, I can only use
> table.put & table.commit, which work on one HBase row only (don't they?).
> This raised serious latency issues, as writing thousands of records to HBase
> this way every 5 minutes is not efficient and really s-l-o-w.
> Even if I move the HBase writing from the map phase to the reduce phase,
> the same rows would be updated, so moving the ".put" to the reducer does not
> seem to change anything.
>
> I would like to write straight to HBase from the Chukwa demuxer, and not
> have another job that reads the Chukwa output and writes it to HBase.
> The goal is to have this data in HBase as fast as I can.
>
> Is there a way to write efficiently to HBase without TableReduce? Have I got
> something wrong?
> Is there someone using Chukwa who managed to do this?
>
>
> Thanks in advance for any kind of help,