[ https://issues.apache.org/jira/browse/TEZ-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Eagles updated TEZ-3452:
---------------------------------
Description:
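Overflow can occur when numTasks is high (say 45000), outputSize is high
(say 311TB), and slow start is set to 1.0. With those values the product
numTasks * outputSize is roughly 1.5e19, which is larger than Long.MAX_VALUE
(about 9.2e18).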
{code:title=ShuffleVertexManager}
for (Map.Entry<String, SourceVertexInfo> vInfo : getBipartiteInfo()) {
  SourceVertexInfo srcInfo = vInfo.getValue();
  if (srcInfo.numTasks > 0 && srcInfo.numVMEventsReceived > 0) {
    // this assumes that 1 vmEvent is received per completed task - TEZ-2961
    expectedTotalSourceTasksOutputSize +=
        (srcInfo.numTasks * srcInfo.outputSize) /
        srcInfo.numVMEventsReceived;
  }
}
{code}
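To make the failure concrete, here is a minimal standalone sketch of the same arithmetic. It is not the ShuffleVertexManager code and not necessarily the fix that will be applied for this issue. The class and method names are hypothetical, and it assumes numTasks and numVMEventsReceived are ints, outputSize is a long measured in bytes, 311TB means 311 * 2^40 bytes, and one VM event arrives per completed task (so with slow start at 1.0 the divisor equals numTasks).

```java
import java.math.BigInteger;

// Hypothetical standalone sketch; names and types are assumptions, not Tez code.
class OverflowSketch {

  // Same shape as the expression in the description: the multiplication is
  // done in long arithmetic and can silently wrap around.
  static long naiveExpectedSize(int numTasks, long outputSize, int numVMEventsReceived) {
    return (numTasks * outputSize) / numVMEventsReceived;
  }

  // One possible alternative: multiply and divide in arbitrary precision,
  // then clamp the result to Long.MAX_VALUE before converting back to long.
  static long safeExpectedSize(int numTasks, long outputSize, int numVMEventsReceived) {
    BigInteger scaled = BigInteger.valueOf(numTasks)
        .multiply(BigInteger.valueOf(outputSize))
        .divide(BigInteger.valueOf(numVMEventsReceived));
    return scaled.min(BigInteger.valueOf(Long.MAX_VALUE)).longValue();
  }

  public static void main(String[] args) {
    int numTasks = 45000;
    long outputSize = 311L << 40;    // 311 TB in bytes, assuming binary terabytes
    int numVMEventsReceived = 45000; // assumed: one event per completed task

    System.out.println("naive: " + naiveExpectedSize(numTasks, outputSize, numVMEventsReceived));
    System.out.println("safe:  " + safeExpectedSize(numTasks, outputSize, numVMEventsReceived));
  }
}
```

With these inputs the naive version returns a negative number because the intermediate product wraps, while the BigInteger version returns the original outputSize. Doing the division first or computing in double would also avoid the wrap, at the cost of some precision.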
> Auto-reduce parallelism calculation can overflow with large inputs
> ------------------------------------------------------------------
>
> Key: TEZ-3452
> URL: https://issues.apache.org/jira/browse/TEZ-3452
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
>
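> Overflow can occur when numTasks is high (say 45000), outputSize is high
> (say 311TB), and slow start is set to 1.0. With those values the product
> numTasks * outputSize is roughly 1.5e19, which is larger than Long.MAX_VALUE
> (about 9.2e18).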
> {code:title=ShuffleVertexManager}
> for (Map.Entry<String, SourceVertexInfo> vInfo : getBipartiteInfo()) {
>   SourceVertexInfo srcInfo = vInfo.getValue();
>   if (srcInfo.numTasks > 0 && srcInfo.numVMEventsReceived > 0) {
>     // this assumes that 1 vmEvent is received per completed task - TEZ-2961
>     expectedTotalSourceTasksOutputSize +=
>         (srcInfo.numTasks * srcInfo.outputSize) /
>         srcInfo.numVMEventsReceived;
>   }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)