[
https://issues.apache.org/jira/browse/IGNITE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734749#comment-15734749
]
ASF GitHub Bot commented on IGNITE-4270:
----------------------------------------
Github user devozerov closed the pull request at:
https://github.com/apache/ignite/pull/1334
> Hadoop: optionally stripe mapper output for every partition.
> ------------------------------------------------------------
>
> Key: IGNITE-4270
> URL: https://issues.apache.org/jira/browse/IGNITE-4270
> Project: Ignite
> Issue Type: Sub-task
> Components: hadoop
> Affects Versions: 1.8
> Reporter: Vladimir Ozerov
> Assignee: Vladimir Ozerov
> Labels: performance
> Fix For: 2.0
>
>
> Currently we have R maps for M mappers, where R is the number of reducers. As
> a result, many mappers write to the same concurrent offheap data structure,
> losing time to concurrency overhead.
> Let's add an option to create R * M maps, so that every mapper has a dedicated
> map for every reducer. This will eliminate almost all concurrency overhead.
> Design:
> 1) Every mapper works with its own set of "remote" output maps;
> 2) These maps are essentially not "maps", but IO messages, which we fill up
> to a certain threshold;
> 3) Once filled, a message is sent to the remote node;
> 4) The async shuffle thread is no longer needed in this architecture.
> As a result we reduce contention, remove the slowdown caused by a single
> shuffle thread that cannot send messages fast enough, and remove
> unnecessary intermediate sorting.
> NB! Be careful with "combiner" case and with "external" execution.
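The design in the description above can be illustrated with a minimal sketch. It is not the Ignite implementation: the class name StripedMapperOutput, the string-based records, the size accounting, and the sender callback are all hypothetical, chosen only to show one mapper owning one buffer per reducer and flushing a buffer as a message once it reaches a threshold. The combiner and external execution cases mentioned in the note are not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: one instance per mapper, one buffer per reducer.
 * Because a mapper owns its buffers, no synchronization is needed on write.
 */
final class StripedMapperOutput {
    /** Pending records, indexed by reducer (partition) number. */
    private final List<List<String>> buffers = new ArrayList<>();

    /** Approximate size of each pending buffer (assumed metric: characters). */
    private final int[] sizes;

    /** Buffer size at which a message is sent. */
    private final int threshold;

    /** Stand-in for the network layer: receives (partition, batch). */
    private final BiConsumer<Integer, List<String>> sender;

    StripedMapperOutput(int reducers, int threshold,
        BiConsumer<Integer, List<String>> sender) {
        for (int i = 0; i < reducers; i++)
            buffers.add(new ArrayList<>());

        this.sizes = new int[reducers];
        this.threshold = threshold;
        this.sender = sender;
    }

    /** Appends a record to the buffer of the reducer that owns the key. */
    void write(String key, String val) {
        int part = Math.floorMod(key.hashCode(), buffers.size());

        buffers.get(part).add(key + "=" + val);
        sizes[part] += key.length() + val.length();

        // Send directly from the mapper thread; no separate shuffle thread.
        if (sizes[part] >= threshold)
            flush(part);
    }

    /** Sends the pending batch for one partition, if any. */
    void flush(int part) {
        List<String> buf = buffers.get(part);

        if (buf.isEmpty())
            return;

        sender.accept(part, new ArrayList<>(buf));

        buf.clear();
        sizes[part] = 0;
    }

    /** Flushes whatever is left when the mapper finishes. */
    void close() {
        for (int part = 0; part < buffers.size(); part++)
            flush(part);
    }
}
```

With M mappers each holding such an object, there are R * M buffers in total, and no buffer is ever shared between mapper threads.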
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)