Store the cluster in the key for performance and easier customization on mappers
--------------------------------------------------------------------------------

                 Key: CHUKWA-462
                 URL: https://issues.apache.org/jira/browse/CHUKWA-462
             Project: Hadoop Chukwa
          Issue Type: Improvement
          Components: Data Processors
            Reporter: Guille -bisho-


Right now the Chukwa framework stores the destination cluster as a tag in 
the Chunk. The tags are then copied to each ChukwaRecord, and before a record 
is stored, the cluster is parsed back out of its tags with a regular expression.

- It's slow to apply a regular expression to each record.
- It's harder to modify the destination cluster from the mapper; you have to 
tweak the tags field.
- It takes unneeded space, since the cluster is stored on every record.

The proposed patch:

- Extracts the cluster from the chunk tags just once per chunk, which is much 
faster.
- Stores the cluster in the key, so it's easy to recover.
- Makes the cluster easy to tweak from the mapper: just alter it with 
key.setClusterName(String clusterName).
- Strips the cluster from the tags field of the resulting Chukwa records. If 
the tags field is then empty, skips setting it in the record altogether.
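
As a rough illustration of the points above, here is a minimal, self-contained 
sketch. It is not the actual patch: the Key and Record classes are hypothetical 
stand-ins for the Chukwa key and ChukwaRecord, the tag syntax cluster="name" is 
an assumption, and only setClusterName(String clusterName) is taken from this 
issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterKeySketch {
    // Assumed tag syntax: cluster="name". Compiled once and reused.
    private static final Pattern CLUSTER_TAG =
            Pattern.compile("\\s*cluster=\"(.*?)\"");

    /** Hypothetical stand-in for the record key. */
    static class Key {
        private String clusterName;
        void setClusterName(String clusterName) { this.clusterName = clusterName; }
        String getClusterName() { return clusterName; }
    }

    /** Hypothetical stand-in for a ChukwaRecord; tags stays null if unset. */
    static class Record {
        String tags;
        final String body;
        Record(String body) { this.body = body; }
    }

    /** Returns the cluster named in the chunk tags, or null if none is present. */
    static String extractCluster(String chunkTags) {
        Matcher m = CLUSTER_TAG.matcher(chunkTags);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the chunk tags with the cluster tag removed. */
    static String stripCluster(String chunkTags) {
        return CLUSTER_TAG.matcher(chunkTags).replaceFirst("").trim();
    }

    /** Runs the regular expression once per chunk instead of once per record. */
    static List<Record> processChunk(String chunkTags, List<String> lines, Key key) {
        String cluster = extractCluster(chunkTags);
        String remainingTags = stripCluster(chunkTags);
        key.setClusterName(cluster); // the cluster now travels in the key

        List<Record> records = new ArrayList<>();
        for (String line : lines) {
            Record record = new Record(line);
            if (!remainingTags.isEmpty()) {
                record.tags = remainingTags; // skipped entirely when nothing is left
            }
            records.add(record);
        }
        return records;
    }
}
```

With this shape, a mapper that wants to redirect output would only need to call 
key.setClusterName(...) on the key, without touching the tags field.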

