[ 
https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739337#comment-14739337
 ] 

Siddharth Seth commented on TEZ-2775:
-------------------------------------

Some statistics on log sizes, measured on a 20-node cluster:
h6. JoinValidate example with 100, 50, 100 (lhsScan, rhsScan, SortMergeJoin) tasks
||Type||TotalLogSize||AM LogSize||SortMergeLogSize per task||
|Current|38MB|4MB|~300KB|
|Reduced|11MB|2.1MB|~65KB|

h6. HashJoin example with 100, 100, 200 (lhsScan, rhsScan, HashJoin) tasks
||Type||TotalLogSize||AM LogSize||HashJoinLogSize per task||
|Current|316MB|7.2MB|~1.6MB|
|Reduced|65MB|3.3MB|~330KB|
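
As a rough cross-check of the two tables above, the relative reductions can be 
computed directly. The sketch below is illustrative only: it hard-codes the 
figures from the tables, the helper name reduction_pct is made up for this 
example, and it assumes decimal units (~1.6MB treated as 1600KB) for the 
per-task HashJoin size.

```python
def reduction_pct(before, after):
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# (current, reduced) pairs copied from the tables; units match within each pair.
measurements = {
    "JoinValidate total (MB)": (38, 11),
    "JoinValidate AM (MB)": (4, 2.1),
    "JoinValidate SortMergeJoin per task (KB)": (300, 65),
    "HashJoin total (MB)": (316, 65),
    "HashJoin AM (MB)": (7.2, 3.3),
    # Assumption: ~1.6MB is treated as 1600KB (decimal units).
    "HashJoin per task (KB)": (1600, 330),
}

for name, (current, reduced) in measurements.items():
    print(f"{name}: {reduction_pct(current, reduced):.1f}% smaller")
```

With these inputs the total log size drops by roughly 71% for JoinValidate and 
roughly 79% for HashJoin, while the AM logs shrink by about half in both cases.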

Those are some pretty large log files that we generate at the moment, which 
makes the logs harder to read and also hurts performance. Clearly we need 
adequate information in the logs to debug issues when they occur. Since this 
affects everyone who debugs via log files, please go ahead and modify the patch 
to add back or change whatever is required. While doing this, though, it will 
help to run a couple of jobs, and please check whether the information is 
already available from some other source, so that we can keep the size of the 
logs small. 

> Reduce logging in runtime components
> ------------------------------------
>
>                 Key: TEZ-2775
>                 URL: https://issues.apache.org/jira/browse/TEZ-2775
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-2775.1.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
