[ https://issues.apache.org/jira/browse/CHUKWA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843861#action_12843861 ]
Eric Yang commented on CHUKWA-462:
----------------------------------
Test cases failed after applying this patch:
{noformat}
[junit] Running org.apache.hadoop.chukwa.analysis.salsa.fsm.TestFSMBuilder
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 123.183 sec
[junit] Running org.apache.hadoop.chukwa.tools.backfilling.TestBackfillingLoader
[junit] Tests run: 4, Failures: 3, Errors: 0, Time elapsed: 73.1 sec
[junit] Test org.apache.hadoop.chukwa.tools.backfilling.TestBackfillingLoader FAILED
[junit] Running org.apache.hadoop.chukwa.util.TestCreateRecordFile
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.792 sec
[junit] Test org.apache.hadoop.chukwa.util.TestCreateRecordFile FAILED
[junit] Running org.apache.hadoop.chukwa.util.TestFilter
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.131 sec
{noformat}
> Store the cluster in the key for performance and easier customization on
> mappers
> --------------------------------------------------------------------------------
>
> Key: CHUKWA-462
> URL: https://issues.apache.org/jira/browse/CHUKWA-462
> Project: Hadoop Chukwa
> Issue Type: Improvement
> Components: Data Processors
> Reporter: Guille -bisho-
> Attachments: cluster_in_ChukwaRecordKey.v3.diff
>
>
> Right now the Chukwa framework stores the destination cluster as a tag in
> the Chunk. The tags are then copied to each ChukwaRecord, and before a record
> is stored, the cluster is parsed out of its tags with a regular expression.
> - It's slow to apply a regular expression to each record.
> - It's harder to modify the destination cluster from the mapper; you have to
> tweak the tags field.
> - It wastes space, since the cluster is stored on every record.
> The proposed patch:
> - Extracts the cluster from chunk tags just once per chunk, much faster.
> - Stores the cluster in the key, so it's easy to recover.
> - It's easy to tweak from the mapper. Just alter it with
> key.setClusterName(String clusterName)
> - Strips the cluster from the tags field of the resulting chukwa records. If
> the tags field is empty, completely skips setting the tags field in the
> record.
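
The per-chunk extraction described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached diff: the Key class is a hypothetical stand-in for ChukwaRecordKey, the helper names extractCluster and stripCluster are invented for the example, and the tag format cluster="name" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterTagSketch {
    // Assumption: the cluster appears in the tags string as cluster="name".
    private static final Pattern CLUSTER_TAG =
        Pattern.compile("\\s*\\bcluster=\"(.*?)\"");

    /** Hypothetical stand-in for ChukwaRecordKey; only the cluster field is modelled. */
    static final class Key {
        private String clusterName;
        void setClusterName(String clusterName) { this.clusterName = clusterName; }
        String getClusterName() { return clusterName; }
    }

    /** Returns the cluster name found in the tags, or null if there is none. */
    static String extractCluster(String tags) {
        if (tags == null) {
            return null;
        }
        Matcher m = CLUSTER_TAG.matcher(tags);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the tags with the cluster tag removed; empty if nothing else remains. */
    static String stripCluster(String tags) {
        if (tags == null) {
            return "";
        }
        return CLUSTER_TAG.matcher(tags).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        String chunkTags = "cluster=\"demo\" rack=\"r1\"";

        // Done once per chunk, not once per record.
        Key key = new Key();
        key.setClusterName(extractCluster(chunkTags));
        String recordTags = stripCluster(chunkTags);

        // A mapper could later override the destination with key.setClusterName(...).
        System.out.println(key.getClusterName());
        // Only set the tags field on the record when something is left.
        if (!recordTags.isEmpty()) {
            System.out.println(recordTags);
        }
    }
}
```

With this shape the regular expression runs once per chunk, and every record produced from that chunk reuses the already-extracted cluster name and the stripped tags string.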