[jira] Commented: (CHUKWA-564) HBase output collector uses incorrect column family

Bill Graham (JIRA) Fri, 10 Dec 2010 15:59:27 -0800

    [ 
https://issues.apache.org/jira/browse/CHUKWA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970345#action_12970345
 ]


Bill Graham commented on CHUKWA-564:
------------------------------------

I agree that there are limitations in using annotations on the processors. I 
think that where the data is written should be decoupled from the processors. A 
processor knows how to process data, but it shouldn't also state where the data 
should be written. Generic processors like TsProcessors could be used 
repeatedly for different data types, all of which should be written to 
different table/column-families. Coupling the two with annotations makes this 
difficult. You end up with empty subclasses used only to configure different 
data types to table/cfs via overridden annotations.
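
To illustrate the coupling problem, here is a minimal, self-contained sketch. The HBaseTarget annotation and both processor classes are hypothetical stand-ins written for this example, not Chukwa's actual annotation or classes. It shows how reusing one generic processor for a second destination forces an empty subclass whose only job is to carry a different annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation, standing in for a table/cf annotation on processors.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface HBaseTarget {
    String table();
    String columnFamily();
}

// A generic processor: the processing logic is reusable across data types.
@HBaseTarget(table = "logs", columnFamily = "ts")
class GenericTsProcessor {
    String process(String line) {
        return line.trim();
    }
}

// Empty subclass that exists only to point the same logic at another table/cf.
@HBaseTarget(table = "metrics", columnFamily = "cpu")
class CpuTsProcessor extends GenericTsProcessor {
}

final class AnnotationLookup {
    // Reads the destination from the annotation declared on the processor class.
    static String columnFamilyOf(Class<?> processorClass) {
        HBaseTarget target = processorClass.getAnnotation(HBaseTarget.class);
        if (target == null) {
            throw new IllegalArgumentException(
                "No HBaseTarget annotation on " + processorClass.getName());
        }
        return target.columnFamily();
    }
}
```

Every additional table/cf pairing for the same logic would need another subclass like CpuTsProcessor.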

I suggest we externalize the table/cf mappings from the processors. Instead we 
could have something like an HBaseRouterFactory (or something perhaps named 
better) that the OutputCollector and the HBaseWriter interact with. 
HBaseRouterFactory has a method that takes in a dataType and probably also a 
ChukwaRecord and knows how to return the Table and ColumnFamily that the data 
should be written to. 

We could then configure that dataType 'foo' should use BarProcessor and write 
to table 'bat', column family 'biz'.
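
A rough sketch of what such an externalized mapping might look like follows. Everything here is an assumption made for illustration: the Route and HBaseRouter names, the register and route methods, and the choice to key only on dataType (the ChukwaRecord argument mentioned above is left out to keep the example self-contained). It is not an existing Chukwa API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type: where a data type is processed and written.
final class Route {
    final String processor;
    final String table;
    final String columnFamily;

    Route(String processor, String table, String columnFamily) {
        this.processor = processor;
        this.table = table;
        this.columnFamily = columnFamily;
    }
}

// Hypothetical router: maps a dataType to its processor, table and column family.
final class HBaseRouter {
    private final Map<String, Route> routes = new HashMap<>();

    // Registers the destination for one data type (could be loaded from config).
    void register(String dataType, String processor, String table, String columnFamily) {
        routes.put(dataType, new Route(processor, table, columnFamily));
    }

    // Returns the configured route, failing loudly for unmapped data types.
    Route route(String dataType) {
        Route route = routes.get(dataType);
        if (route == null) {
            throw new IllegalArgumentException(
                "No route configured for data type: " + dataType);
        }
        return route;
    }
}
```

With this shape, router.register("foo", "BarProcessor", "bat", "biz") expresses the mapping described above, and the writer would call router.route("foo") to find the table and column family without the processor class knowing either.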

I don't know how we'd configure 'foo's payload to be written to multiple cfs 
though. What's the use case for why we'd want to write the same data to two 
locations?

There's still an unresolved separate problem of how to handle ORM-ish 
functionality as well, since reducing the many parameters in the record body 
back to a single 'body' field can be sub-optimal.

> HBase output collector uses incorrect column family
> ---------------------------------------------------
>
>                 Key: CHUKWA-564
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-564
>             Project: Chukwa
>          Issue Type: Bug
>            Reporter: Bill Graham
>             Fix For: 0.5.0
>
>
> The HBase {{OutputCollector}} does this to obtain the column family from the 
> data type:
> {noformat}
> cf = key.getReduceType().getBytes();
> {noformat}
> The column family should instead be taken from the {[email protected]}} 
> annotation on the processor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
