Thanks Hari - that does help. I was envisioning something akin to the
RegexSerde in Hive, where you can just write a regular expression to
extract fields from the event data and put them into separate columns (within a
CF). Sounds like a custom serializer is exactly what I want here.
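
Roughly what I have in mind for the parsing step, as a sketch only. This is
plain Java with no Flume or HBase dependencies; the class name, the regex and
the column names are all made up for illustration. In a real serializer each
entry of the returned map would presumably become one column in the Put that
the serializer hands back to the sink.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Flume): splits an event body into
 * column -> value pairs, using one regex capture group per column.
 */
class RegexColumnExtractor {
    private final Pattern pattern;
    private final List<String> columns;

    public RegexColumnExtractor(String regex, List<String> columns) {
        this.pattern = Pattern.compile(regex);
        this.columns = columns;
    }

    /**
     * Returns the extracted fields in column order, or an empty map when the
     * body does not match the regex or the group count differs from the
     * number of configured columns.
     */
    public Map<String, String> extract(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = pattern.matcher(body);
        if (!m.matches() || m.groupCount() != columns.size()) {
            return fields;
        }
        for (int i = 0; i < columns.size(); i++) {
            // Capture groups are 1-indexed; group 0 is the whole match.
            fields.put(columns.get(i), m.group(i + 1));
        }
        return fields;
    }
}
```

With the regex "(\\S+) (\\S+) (\\d{3})" and the columns host, path and status,
a body of "10.0.0.1 /index.html 200" would yield host=10.0.0.1,
path=/index.html, status=200, and a body that does not match yields nothing.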

- Patrick

On Sat, Jun 9, 2012 at 11:01 PM, Hari Shreedharan <[email protected]> wrote:

> Hi Patrick,
>
> The HbaseSink has 2 components - one being the sink itself and the other
> being the serializer. When the sink picks up an event from the channel, it
> is handed over to the serializer which can process the event and return
> Puts and/or Increments. So if you plan to write to different columns within
> the same column family, all you need to do is to write your own serializer
> that implements HbaseEventSerializer, and set that as the serializer for
> the HbaseSink.
>
> If you need to write to more than one column family, the way to do it is
> to add a header to the event based on the column family/column, use the
> multiplexing channel selector to divert the event to different flows and
> then use multiple Hbase sinks. As of now, the HbaseSink writes only to one
> table and one column family. This was done to simplify configuration and
> the serializer interface.
>
> Basically - write an HbaseEventSerializer and plug it into the HbaseSink,
> which will write to HBase.
>
>
> I hope this helps.
>
>
> Thanks
> Hari
>
>
> --
> Hari Shreedharan
>
>
> On Saturday, June 9, 2012 at 11:27 PM, Patrick Wendell wrote:
>
> > Hi There,
> >
> > For certain types of event data, such as log files, it would be nice to
> > have a way to write to HBase such that fields from the original file can
> > be parsed into distinct columns.
> >
> > I want to implement this for a one-off project (and maybe for
> > contribution back to flume if this makes sense).
> >
> > What is the best way to go about it? Based on skimming the code my sense
> > is that writing a custom HBase sink makes the most sense. Is that heading
> > down the right path, or is there some other component I should be
> > modifying or extending?
> >
> > - Patrick
>
>
