[ https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361379#comment-14361379 ]
Rajesh Balamohan commented on TEZ-2021:
---------------------------------------

This has been tested on a small cluster with 20 nodes. It would be really helpful if you could try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** Make sure "yarn.log-aggregation-enable" is set to true in the cluster.
*** Note down the "yarn.nodemanager.remote-app-log-dir" and "yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and set YARN_APP_LOGS_LOCATION in env.sh accordingly.
* This requires "gnuplot" on the machine where you plan to run the tool.
* Run "sh gnuplot.sh <application_id>" (To parse another user's job, set "export APP_USER=appUserWhoRanTheJob" before running this.)

> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-2021
>                 URL: https://issues.apache.org/jira/browse/TEZ-2021
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch,
> avg_time_Taken_after_fix.png, avg_time_taken_to_download.png,
> no_of_times_contacted.png, total_data_transferred.png
>
>
> Tez tool to analyze shuffle performance in large clusters by mining task
> logs. Provide an easier way to visualize (heat chart) and identify bad nodes
> in a large cluster.
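
For anyone unsure how the two log-aggregation parameters in the comment relate to a path on HDFS, here is a rough sketch. It assumes the default YARN layout for aggregated logs, <remote-app-log-dir>/<user>/<suffix>/<application_id>. The function name aggregated_log_dir and the sample values are hypothetical and not part of the patch. The comment does not say whether YARN_APP_LOGS_LOCATION should be the base directory or a longer path, so please check env.sh before copying a value from this.

```shell
#!/bin/sh
# Hypothetical helper, not part of the TEZ-2021 patch.
# Assumes the default aggregated-log layout:
#   <remote-app-log-dir>/<user>/<suffix>/<application_id>
aggregated_log_dir() {
    remote_dir=${1%/}   # yarn.nodemanager.remote-app-log-dir, one trailing slash stripped
    app_user=$2         # user who ran the job (compare APP_USER in the comment above)
    suffix=$3           # yarn.nodemanager.remote-app-log-dir-suffix
    app_id=$4           # e.g. application_<clusterTimestamp>_<sequence>
    printf '%s/%s/%s/%s\n' "$remote_dir" "$app_user" "$suffix" "$app_id"
}

# Example call with made-up values; substitute the ones from your cluster.
aggregated_log_dir /app-logs/ someuser logs application_1400000000000_0001
```

On a cluster node, the two parameter values can usually be read with "hdfs getconf -confKey yarn.nodemanager.remote-app-log-dir" (and likewise for the suffix) rather than by digging through yarn-site.xml.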