[ https://issues.apache.org/jira/browse/HADOOP-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724076#action_12724076 ]
Steve Loughran commented on HADOOP-6107:
----------------------------------------

As examples of the problem, some client-side logs:

{code}
 [java] 09/06/25 13:41:07 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:41:07 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:41:10 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_r_000001_0, Status : FAILED
 [java] Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 [java] 09/06/25 13:41:10 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:41:10 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:44:07 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_m_000004_0, Status : FAILED
 [java] Too many fetch-failures
 [java] 09/06/25 13:44:07 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:44:07 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:44:11 INFO mapred.JobClient: map 83% reduce 0%
 [java] 09/06/25 13:44:14 INFO mapred.JobClient: map 100% reduce 0%
 [java] 09/06/25 13:49:23 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_m_000005_0, Status : FAILED
 [java] Too many fetch-failures
 [java] 09/06/25 13:49:23 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:49:23 WARN mapred.JobClient: Error reading task outputConnection refused
 [java] 09/06/25 13:49:27 INFO mapred.JobClient: map 83% reduce 0%
{code}

# bad spacing in the "Error reading task outputConnection refused" message
# not enough context as to why the connection was being refused: the message needs to include the (hostname, port) details, which would change the message and break Chukwa
# no stack trace in the connection-refused message
# not enough context in the JobClient messages; if more than one job is running simultaneously, you can't determine which job the map and reduce percentages refer to
# the shuffle error doesn't actually say what the MAX_FAILED_UNIQUE_FETCHES value is

> Have some log messages designed for machine parsing, either real-time or post-mortem
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-6107
> URL: https://issues.apache.org/jira/browse/HADOOP-6107
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.21.0
> Reporter: Steve Loughran
>
> Many programs take the log output of bits of Hadoop and try to parse it. Some may also put their own back end behind commons-logging, to capture the input without going via Log4J, so as to keep the output more machine-readable.
>
> These programs need log messages that
> # are easy to parse by a regexp or other simple string parse (consider quoting values, etc.)
> # push out the full exception chain rather than stringify() bits of it
> # stay stable across versions
> # log the things the tools need to analyse: events, data volumes, errors
>
> For these logging tools, ease of parsing, retention of data and stability over time take the edge over readability. In HADOOP-5073, Jiaqi Tan proposed marking some of the existing log events as evolving towards stability. As someone who regularly patches log messages to improve diagnostics, this creates a conflict of interest. For me, good logs are ones that help people debug their problems without anyone else helping, and if that means improving the text, so be it. Tools like Chukwa have a different need.
>
> What to do?
> Some options
> # Have some messages that are designed purely for other programs to handle
> # Have some logs specifically for machines, to which we log alongside the human-centric messages
> # Fix many of the common messages, then leave them alone.
> # Mark log messages to be left alone (somehow)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
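As a rough sketch of what a machine-oriented message could look like (this is not Hadoop code; the class and all field names are hypothetical, just illustrating quoted values plus the full exception chain, per points 1-3 of the issue description):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MachineLog {
    // Build one regexp-friendly line: event name followed by key="value" pairs,
    // then one exception="..." pair per link in the cause chain.
    static String format(String event, Map<String, String> fields, Throwable t) {
        StringBuilder sb = new StringBuilder(event);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(e.getValue().replace("\"", "\\\"")).append('"');
        }
        // Walk the whole cause chain instead of stringifying only the top exception.
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            sb.append(" exception=\"").append(cause.getClass().getName())
              .append(": ").append(cause.getMessage()).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("host", "worker3.example.org");  // hypothetical hostname
        fields.put("port", "50060");
        fields.put("attempt", "attempt_200906251314_0002_r_000001_0");
        Throwable t = new RuntimeException("Error reading task output",
                new java.net.ConnectException("Connection refused"));
        // prints one line:
        // TASK_OUTPUT_READ_FAILED host="worker3.example.org" port="50060" attempt="attempt_200906251314_0002_r_000001_0" exception="java.lang.RuntimeException: Error reading task output" exception="java.net.ConnectException: Connection refused"
        System.out.println(format("TASK_OUTPUT_READ_FAILED", fields, t));
    }
}
```

A line like this carries the (hostname, port) context, keeps the cause chain, and parses with a single key="value" regexp; the trade-off is that once tools depend on the keys, they become part of the compatibility surface.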