[ 
https://issues.apache.org/jira/browse/CHUKWA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843849#action_12843849
 ] 

Eric Yang commented on CHUKWA-462:
----------------------------------

+1 Looks good, and it'll speed up demux.  The original record design 
aimed for generality rather than speed.  In real use cases, it's better to 
group data by cluster, and the cluster concept is already set in stone in 
Chukwa.  Hence, this performance improvement is a reasonable trade-off for 
"clusterName" becoming a reserved keyword in Chukwa.

> Store the cluster in the key for performance and easier customization on 
> mappers
> --------------------------------------------------------------------------------
>
>                 Key: CHUKWA-462
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-462
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: Data Processors
>            Reporter: Guille -bisho-
>         Attachments: cluster_in_ChukwaRecordKey.v3.diff
>
>
> Right now the Chukwa framework stores the destination cluster as a tag in 
> the Chunk. The tags are then copied to each ChukwaRecord, and before a 
> record is stored, the cluster is parsed out of its tags with a regular 
> expression.
> - It's slow to apply a regular expression to each record.
> - It's harder to modify the destination cluster from the mapper; you have to 
> tweak the tags field.
> - Storing the cluster on every record takes unneeded space.
> The proposed patch:
> - Extracts the cluster from chunk tags just once per chunk, much faster.
> - Stores the cluster in the key, so it's easy to recover.
> - It's easy to tweak from the mapper. Just alter it with 
> key.setClusterName(String clusterName)
> - Strips the cluster from the tags field of the resulting Chukwa records. If 
> the tags field is then empty, skips setting the tags field in the record 
> entirely.
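
As a rough illustration of the approach described in the issue, the steps 
could be sketched as below. This is not the attached patch: the RecordKey 
class, the extractCluster and stripCluster helpers, and the tag syntax 
(space-separated name="value" pairs such as cluster="demo") are all 
assumptions made for the example. Only the setClusterName(String) name comes 
from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterKeySketch {
    // Assumed tag syntax: space-separated name="value" pairs, e.g. cluster="demo".
    private static final Pattern CLUSTER_TAG =
            Pattern.compile("\\s*\\bcluster=\"([^\"]*)\"");

    /** Hypothetical stand-in for the record key; only the cluster field is modelled. */
    static final class RecordKey {
        private String clusterName;
        void setClusterName(String clusterName) { this.clusterName = clusterName; }
        String getClusterName() { return clusterName; }
    }

    /** Returns the cluster named in the tags, or null if no cluster tag is present. */
    public static String extractCluster(String tags) {
        Matcher m = CLUSTER_TAG.matcher(tags);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the tags with the cluster entry removed and outer whitespace trimmed. */
    public static String stripCluster(String tags) {
        return CLUSTER_TAG.matcher(tags).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        String chunkTags = "cluster=\"demo\" rack=\"r1\"";

        // The regular expression runs once per chunk, not once per record.
        String cluster = extractCluster(chunkTags);
        String remainingTags = stripCluster(chunkTags);

        for (String line : new String[] {"record one", "record two"}) {
            RecordKey key = new RecordKey();
            key.setClusterName(cluster);      // the cluster lives in the key

            Map<String, String> record = new HashMap<>();
            record.put("body", line);
            if (!remainingTags.isEmpty()) {   // skip the tags field when nothing is left
                record.put("tags", remainingTags);
            }
            System.out.println(key.getClusterName() + " " + record);
        }
    }
}
```

A mapper that wants to redirect a record to another cluster would then only 
need to call key.setClusterName(...) with the new name, without touching the 
tags field.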

