[ https://issues.apache.org/jira/browse/TEZ-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561283#comment-14561283 ]

Rohini Palaniswamy commented on TEZ-2491:
-----------------------------------------

[~jlowe] is working on storing Tez events to disk and parsing them, the way the
MapReduce JHS does, because ATS does not scale for us when AMs publish to it
directly. Capturing his thoughts below on some possible fixes, from our
discussion of the size analysis he did after logging the events to a file.

Analysis:
     - A job with 40K+ tasks produced a 517.8 MB event file. We suspected the
configuration was taking up a lot of space, but it was only 6 MB. Task events
took up 499 MB.
     - 426 MB of those 499 MB are finished events, half of which are attempt
finished events. So counters being logged twice (in both the task and the
attempt finished events) account for most of it.
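A rough per-task breakdown of those numbers helps show the scale. This is a
back-of-envelope sketch under assumptions that are not part of the analysis:
1 MB = 10^6 bytes, exactly 40,000 tasks (the job had 40K+, so per-task figures
are upper bounds), and the "half" split taken literally.

```python
# Back-of-envelope breakdown of the event-log sizes from the analysis.
# Assumptions (not from the analysis): 1 MB = 10**6 bytes and exactly
# 40,000 tasks; with 40K+ tasks the per-task numbers are upper bounds.
MB = 10**6

total_mb = 517.8          # whole event file
config_mb = 6.0           # configuration
task_events_mb = 499.0    # all task-related events
finished_mb = 426.0       # task + attempt finished events
attempt_finished_mb = finished_mb / 2  # "half of which" are attempt finished
tasks = 40_000


def share(part, whole):
    """Percentage of `whole` taken up by `part`."""
    return 100.0 * part / whole


def bytes_per_task(size_mb, n_tasks=tasks):
    """Average bytes contributed per task for one event category."""
    return size_mb * MB / n_tasks


print(f"config share of file:         {share(config_mb, total_mb):.1f}%")
print(f"finished share of task events: {share(finished_mb, task_events_mb):.1f}%")
print(f"finished bytes per task:       ~{bytes_per_task(finished_mb):.0f}")
print(f"attempt-finished per task:     ~{bytes_per_task(attempt_finished_mb):.0f}")
```

On those assumptions, finished events are about 85% of the task-event bytes,
roughly 10.65 KB per task, while configuration is only about 1% of the file.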

Possible fixes:
   - There are some odd counters, such as "WRONG_REDUCE" and "WRONG_MAP", that
should never be non-zero in practice, so it is a waste to emit them over and
over again. We realize they _could_ occur, but such cases seem too rare to be
worth dedicating a counter to. It would be nice to omit zero counters: looking
up a non-existent counter returns zero, so why bother storing it explicitly?
This could break applications that iterate over group counters instead of
looking up counters directly. For example, Pig iterates over counter groups
for its RANK implementation in MapReduce, but applications should already
handle missing counters, since empty maps and reducers do not produce counters
in MapReduce either. So that should not be an issue. Alternatively, counters
could be omitted while tasks are running and sent only on completion
(succeeded, failed, killed), in case they are still needed for something like
counter drill-down in the UI or later analytics of the jobs themselves.
  - Counter display names take up a lot of space:
{code}
{'counterDisplayName': 'BAD_ID',
 'counterName': 'BAD_ID',
 'counterValue': 0},
{code}
     The display name could be omitted when it is the same as the name, though
that will require UI changes. Better still would be to store the display names
only once per application, for all counters.

  Reducing the counter size will also reduce memory usage on the AM and allow it
to process task events faster.
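The proposed fixes can be sketched together in Python, using the same dict
layout as the dump excerpt. This is a hypothetical illustration, not Tez's
history-event schema or API: the function names, the per-application
display-name table, and the terminal-state set are assumptions. It drops
zero-valued counters, records a display name only when it differs from the
counter name (once per application), publishes counters only for terminal task
states, and shows that looking up an omitted counter still yields zero.

```python
# Hypothetical sketch of the counter compaction ideas in this comment.
# The counter dict layout mirrors the dump ('counterName',
# 'counterDisplayName', 'counterValue'); everything else is assumed.

# Assumed terminal states after which counters would be published.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED"}


def should_emit_counters(task_state):
    """Only publish counters once a task reaches a terminal state."""
    return task_state in TERMINAL_STATES


def compact_counters(counters, display_names):
    """Drop zero counters and redundant display names.

    `counters` is a list of dicts as in the dump; `display_names` is a
    per-application dict (name -> display name) written once, so events
    no longer repeat display names.
    """
    compact = {}
    for c in counters:
        name = c["counterName"]
        display = c.get("counterDisplayName", name)
        if display != name:
            # Keep a differing display name once for the whole application.
            display_names.setdefault(name, display)
        if c["counterValue"] != 0:
            compact[name] = c["counterValue"]
    return compact


def lookup(compact, name):
    """An omitted counter reads as zero, just like a direct lookup."""
    return compact.get(name, 0)


def display_name(display_names, name):
    """Fall back to the counter name when no display name was stored."""
    return display_names.get(name, name)


if __name__ == "__main__":
    app_display_names = {}
    # Illustrative values only.
    task_counters = [
        {"counterDisplayName": "BAD_ID", "counterName": "BAD_ID", "counterValue": 0},
        {"counterDisplayName": "WRONG_MAP", "counterName": "WRONG_MAP", "counterValue": 0},
        {"counterDisplayName": "Map input records",
         "counterName": "MAP_INPUT_RECORDS", "counterValue": 1234},
    ]
    if should_emit_counters("SUCCEEDED"):
        event = compact_counters(task_counters, app_display_names)
        print(event)                      # {'MAP_INPUT_RECORDS': 1234}
        print(lookup(event, "WRONG_MAP"))  # 0
        print(display_name(app_display_names, "MAP_INPUT_RECORDS"))
```

As noted above, consumers that iterate over counter groups rather than looking
counters up by name would need to tolerate the missing zero entries.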

> Optimize storage and exchange of Counters for better scaling
> ------------------------------------------------------------
>
>                 Key: TEZ-2491
>                 URL: https://issues.apache.org/jira/browse/TEZ-2491
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Rohini Palaniswamy
>
>      Counters take up a lot of space in the generated task events and are a
> major bottleneck for scaling ATS. [~jlowe] found a lot of potential
> optimizations. Using this as an umbrella JIRA for documentation; sub-tasks
> can be created later after discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
