[
https://issues.apache.org/jira/browse/CHUKWA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970345#action_12970345
]
Bill Graham commented on CHUKWA-564:
------------------------------------
I agree that there are limitations in using annotations on the processors. I
think that where the data is written should be decoupled from the processors. A
processor knows how to process data, but it shouldn't also state where the data
should be written. Generic processors like TsProcessors could be used
repeatedly for different data types, all of which should be written to
different table/column-families. Coupling the two with annotations makes this
difficult. You end up with empty subclasses used only to configure different
data types to table/cfs via overridden annotations.
I suggest we externalize the table/cf mappings from the processors. Instead, we
could have something like an HBaseRouterFactory (or something better named)
that the OutputCollector and the HBaseWriter interact with.
HBaseRouterFactory has a method that takes in a dataType and probably also a
ChukwaRecord and knows how to return the Table and ColumnFamily that the data
should be written to.
We could then configure that dataType 'foo' should use BarProcessor and write
to table 'bat', column family 'biz'.
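A minimal sketch of that idea, assuming a plain in-memory lookup keyed by
dataType. The names HBaseRouter, Destination, register and route are
hypothetical and not part of Chukwa, and the ChukwaRecord argument is left out
to keep the example self-contained:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical value type: the table and column family a record is written to. */
final class Destination {
    final String table;
    final String columnFamily;

    Destination(String table, String columnFamily) {
        this.table = table;
        this.columnFamily = columnFamily;
    }

    /** Column family as bytes, the form an HBase writer would typically need. */
    byte[] columnFamilyBytes() {
        return columnFamily.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical router: maps a dataType to its destination, independent of any processor. */
final class HBaseRouter {
    private final Map<String, Destination> routes = new ConcurrentHashMap<>();

    /** Registers where records of the given dataType should be written. */
    void register(String dataType, String table, String columnFamily) {
        routes.put(dataType, new Destination(table, columnFamily));
    }

    /** Returns the configured destination, or empty if the dataType is unmapped. */
    Optional<Destination> route(String dataType) {
        return Optional.ofNullable(routes.get(dataType));
    }
}
```

With this shape, the 'foo' example becomes register("foo", "bat", "biz"), and
the same generic processor can be reused for other data types by registering
additional routes rather than adding subclasses.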
I don't know how we'd configure 'foo's payload to be written to multiple cfs
though. What's the use case for why we'd want to write the same data to two
locations?
There's still a separate, unresolved problem of how to handle ORM-ish
functionality as well, since reducing the many parameters in the record body
back to a single 'body' field can be sub-optimal.
> HBase output collector uses incorrect column family
> ---------------------------------------------------
>
> Key: CHUKWA-564
> URL: https://issues.apache.org/jira/browse/CHUKWA-564
> Project: Chukwa
> Issue Type: Bug
> Reporter: Bill Graham
> Fix For: 0.5.0
>
>
> The HBase {{OutputCollector}} does this to obtain the column family from the
> data type:
> {noformat}
> cf = key.getReduceType().getBytes();
> {noformat}
> The column family should instead be taken from the {[email protected]}}
> annotation on the processor.