[
https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739338#comment-14739338
]
Siddharth Seth commented on TEZ-2774:
-------------------------------------
Some statistics on log sizes, measured on a 20-node cluster:
h6. JoinValidate example with 100, 50, 100 (lhsScan, rhsScan, SortMergeJoin) tasks
||Type||TotalLogSize||AM LogSize||SortMergeLogSize per task||
|Current|38MB|4MB|~300KB|
|Reduced|11MB|2.1MB|~65KB|
h6. HashJoin example with 100, 100, 200 (lhsScan, rhsScan, HashJoin) tasks
||Type||TotalLogSize||AM LogSize||HashJoinLogSize per task||
|Current|316MB|7.2MB|~1.6MB|
|Reduced|65MB|3.3MB|~330KB|
Those are some pretty large log files that we generate at the moment, which
makes the logs harder to read and also hurts performance. Clearly we need
adequate information in the logs to debug issues. Since this affects everyone
who debugs via log files, please go ahead and modify the patch to add back or
change whatever is required. While doing this, though, running a couple of
jobs will help, and please check whether the information is already available
from some other source, so that we can keep the logs small.
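To put the two tables in relative terms, the percentage reductions can be worked out with a short script. This is only a rough sketch: the numbers are copied from the tables above, the per-task figures are approximate and taken as KB, and the names `SIZES`, `reduction_pct` and `summarize` are made up for illustration.

```python
# Sizes copied from the tables above, as (current, reduced) pairs.
# Totals and AM sizes are in MB. Per-task sizes are approximate and taken as KB;
# 1.6MB is written as 1600KB (assumes 1MB = 1000KB, fine for a rough estimate).
SIZES = {
    "JoinValidate": {
        "total_mb": (38, 11),
        "am_mb": (4, 2.1),
        "per_task_kb": (300, 65),
    },
    "HashJoin": {
        "total_mb": (316, 65),
        "am_mb": (7.2, 3.3),
        "per_task_kb": (1600, 330),
    },
}


def reduction_pct(current, reduced):
    """Percentage by which `reduced` is smaller than `current`."""
    return 100.0 * (current - reduced) / current


def summarize(sizes):
    """Map each job and metric to its percentage reduction, to one decimal."""
    return {
        job: {metric: round(reduction_pct(*pair), 1) for metric, pair in metrics.items()}
        for job, metrics in sizes.items()
    }


if __name__ == "__main__":
    for job, metrics in summarize(SIZES).items():
        for metric, pct in metrics.items():
            print(f"{job:12s} {metric:12s} {pct:5.1f}% smaller")
```

With these figures, the total log size drops by roughly 71% for JoinValidate and 79% for HashJoin, while the AM log shrinks by about half in both cases.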
> Reduce logging in the AM, and parts of the runtime
> --------------------------------------------------
>
> Key: TEZ-2774
> URL: https://issues.apache.org/jira/browse/TEZ-2774
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt
>
>