[jira] [Commented] (CHUKWA-647) Spread out intermediate data with the same ReduceType into different Reduce Tasks

Jie Huang (JIRA) Sun, 15 Jul 2012 23:36:39 -0700

    [ https://issues.apache.org/jira/browse/CHUKWA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414878#comment-13414878 ]


Jie Huang commented on CHUKWA-647:
----------------------------------

The current ChukwaRecordPartitioner dispatches the records to different Reduce 
Tasks based on ReduceType. 
{noformat}
return (key.getReduceType().hashCode() & Integer.MAX_VALUE)
{noformat}
I wonder whether the partitioner could also take the key, or part of the key 
content, into account, so that map output data is spread across different 
Reduce Tasks even when it shares the same ReduceType.

                
> Spread out intermediate data with the same ReduceType into different Reduce 
> Tasks
> ---------------------------------------------------------------------------------
>
>                 Key: CHUKWA-647
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-647
>             Project: Chukwa
>          Issue Type: Improvement
>          Components: Data Processors
>    Affects Versions: 0.4.0, 0.6.0
>            Reporter: Jie Huang
>            Priority: Minor
>
> We have found that partitioning the map output data by ReduceType alone 
> causes data skew in some HiTune cases. One or two Reduce Tasks then slow 
> down the whole Demux job, since those tasks have to process more input 
> data than the others.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
