[
https://issues.apache.org/jira/browse/TEZ-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561283#comment-14561283
]
Rohini Palaniswamy commented on TEZ-2491:
-----------------------------------------
[~jlowe] is working on storing Tez events to disk and parsing them, as the
MapReduce JHS does, because ATS does not scale for us with direct publishing from AMs.
Capturing his thoughts below on some of the possible fixes, from the discussion
we had on the size analysis he did after logging the events to a file.
Analysis:
- A job with 40K+ tasks produced a file of 517.8MB. We suspected the configuration
was taking up a lot of space, but it was only 6MB. Task events took up
499MB.
- 426MB of the 499MB is finished events, half of which is attempt
finished events. So counters being logged twice accounts for most of it.
Possible fixes:
- There are some odd counters. "WRONG_REDUCE", "WRONG_MAP", etc. seem like
counters that should never be non-zero in practice, so it is a waste to emit
them over and over. They _could_ occur, but that seems too rare to
bother dedicating a counter just for those cases. It would be nice to omit zero
counters. Looking up a non-existent counter returns zero, so why bother
storing it explicitly? This could break if people iterate over counter
groups instead of doing direct counter lookups. For example, Pig iterates over counter
groups for its RANK implementation in MapReduce, but applications should already handle
missing counters, since empty maps and reducers do not produce counters in
MapReduce. So that should not be an issue. Alternatively, we can omit sending counters
while running and send them only on completion (succeeded, failed, killed), in case
they are still required for something like counter drill-down in the UI or
analytics of the jobs themselves later.
- Counter display names take a lot of space
{code}
{'counterDisplayName': 'BAD_ID',
'counterName': 'BAD_ID',
'counterValue': 0},
{code}
The display name can be omitted if it is the same as the name. That will require UI changes.
Better would be to store the display names only once for all counters of the app.
Reducing the counter size will also reduce memory usage on the AM and allow it to
process task events faster.
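To make the two ideas above concrete, here is a rough sketch (not the Tez implementation, just an illustration in Python on the same dict layout as the dump above). It drops zero-valued counters and keeps display names in a single per-app table, and only when they differ from the counter name. The function names, the `EXAMPLE_RECORDS` counter, and its value are made up for the example.

```python
def compact_counters(counters, display_names):
    """Return {counterName: value} for non-zero counters only.

    display_names is an app-level dict (hypothetical) that is updated in
    place, so each display name is stored once rather than in every event.
    """
    values = {}
    for c in counters:
        name = c["counterName"]
        display = c.get("counterDisplayName", name)
        # Keep a display name only when it adds information beyond the name.
        if display != name:
            display_names.setdefault(name, display)
        # Zero counters are omitted; readers treat a missing counter as zero.
        if c["counterValue"] != 0:
            values[name] = c["counterValue"]
    return values


def counter_value(values, name):
    """Direct lookup: a counter that was not stored reads as zero."""
    return values.get(name, 0)


def display_name(display_names, name):
    """Fall back to the counter name when no display name was stored."""
    return display_names.get(name, name)


# One event's counters. BAD_ID is taken from the dump above; the other two
# entries are invented for illustration.
sample_event = [
    {"counterDisplayName": "BAD_ID", "counterName": "BAD_ID", "counterValue": 0},
    {"counterDisplayName": "WRONG_MAP", "counterName": "WRONG_MAP", "counterValue": 0},
    {"counterDisplayName": "Example records", "counterName": "EXAMPLE_RECORDS", "counterValue": 42},
]

app_display_names = {}
compact = compact_counters(sample_event, app_display_names)
```

Note that code iterating over `compact` will no longer see the zero counters, which is the group-iteration caveat mentioned above; direct lookups through something like `counter_value` are unaffected.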
> Optimize storage and exchange of Counters for better scaling
> ------------------------------------------------------------
>
> Key: TEZ-2491
> URL: https://issues.apache.org/jira/browse/TEZ-2491
> Project: Apache Tez
> Issue Type: Task
> Reporter: Rohini Palaniswamy
>
> Counters take up a lot of space in the task events generated and are a
> major bottleneck for scaling ATS. [~jlowe] found a lot of potential
> optimizations. Using this as an umbrella JIRA for documentation. Can create
> sub-tasks later after discussions.