[
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated TEZ-2565:
----------------------------------
Attachment: mem_usage_without_patch.png
mem_usage_with_patch.png
cpu_usage_without_patch.png
cpu_usage_with_patch.png
TEZ-2565.3.patch
Attaching revised patch.
- Statistics are now computed only from succeeded tasks. Stats for running
tasks can be added after the TEZ-2496 experiments.
- Removed clear()/setupIOStats() etc.
- A cache entry is added via TaskCompletedTransition.
- constructStats() just iterates over the succeeded tasks and merges their
stats.
- Ran a 10K x 10K scatter-gather with TestMemoryWithEvents. For testing
purposes, added a getVertexStatistics() call in
ShuffleVertexManager->onSourceTaskCompleted. This was done just to simulate
load.
- Attaching the YourKit profiles with and without the patch. As mentioned
earlier, time spent in GC is reduced from 31 seconds to 24 seconds with the
patch. There is not much difference in CPU usage. With TEZ-2496, there would be
a good difference in CPU usage, as every task would end up using its own BitSet
that needs to be merged.
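
For readers who have not opened the patch, the approach in the list above can
be sketched roughly as follows. This is an illustration only, not the code in
TEZ-2565.3.patch: the class names TaskStatsSketch and VertexStatsCacheSketch,
the bytesWritten counter, and the meaning given to the BitSet are all
hypothetical stand-ins, and the sketch assumes single-threaded access.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-task statistics holder; not the Tez statistics class. */
final class TaskStatsSketch {
    final long bytesWritten;   // assumed counter, for illustration only
    final BitSet flags;        // assumed per-task BitSet, as expected with TEZ-2496

    TaskStatsSketch(long bytesWritten, BitSet flags) {
        this.bytesWritten = bytesWritten;
        this.flags = flags;
    }
}

/** Hypothetical vertex-side cache that only ever holds succeeded tasks. */
final class VertexStatsCacheSketch {
    private final Map<Integer, TaskStatsSketch> succeeded = new LinkedHashMap<>();

    /** Called once per task, from the task-completed transition in this sketch. */
    void onTaskSucceeded(int taskIndex, TaskStatsSketch stats) {
        succeeded.put(taskIndex, stats);
    }

    /** Merges cached entries only; running or failed tasks are never scanned. */
    TaskStatsSketch constructStats() {
        long totalBytes = 0L;
        BitSet mergedFlags = new BitSet();
        for (TaskStatsSketch s : succeeded.values()) {
            totalBytes += s.bytesWritten;
            mergedFlags.or(s.flags);   // the per-task BitSet merge
        }
        return new TaskStatsSketch(totalBytes, mergedFlags);
    }
}
```

In this sketch each constructStats() call still costs one pass over the
succeeded tasks, but it skips every task that has not finished and needs no
clear()/setup step between calls.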
> Consider scanning unfinished tasks in VertexImpl::constructStatistics to
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-2565
> URL: https://issues.apache.org/jira/browse/TEZ-2565
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch, TEZ-2565.3.patch,
> cpu_usage_with_patch.png, cpu_usage_without_patch.png,
> mem_usage_with_patch.png, mem_usage_without_patch.png
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging
> stats) depending on the number of invocations to Vertex::getStatistics().
> Consider scanning only unfinished tasks.