[
https://issues.apache.org/jira/browse/HADOOP-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724076#action_12724076
]
Steve Loughran edited comment on HADOOP-6107 at 6/25/09 5:59 AM:
-----------------------------------------------------------------
As examples of the problem, here are some client-side logs:
{code}
[java] 09/06/25 13:41:07 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:41:07 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:41:10 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_r_000001_0, Status : FAILED
[java] Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
[java] 09/06/25 13:41:10 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:41:10 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:44:07 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_m_000004_0, Status : FAILED
[java] Too many fetch-failures
[java] 09/06/25 13:44:07 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:44:07 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:44:11 INFO mapred.JobClient: map 83% reduce 0%
[java] 09/06/25 13:44:14 INFO mapred.JobClient: map 100% reduce 0%
[java] 09/06/25 13:49:23 INFO mapred.JobClient: Task Id : attempt_200906251314_0002_m_000005_0, Status : FAILED
[java] Too many fetch-failures
[java] 09/06/25 13:49:23 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:49:23 WARN mapred.JobClient: Error reading task outputConnection refused
[java] 09/06/25 13:49:27 INFO mapred.JobClient: map 83% reduce 0%
{code}
# Bad spacing in the "Error reading task outputConnection refused" message.
# Not enough context as to why the connection was refused: the message needs to include the (hostname, port) details, which would change the message text and break Chukwa (see the sketch below).
# No stack trace in the connection-refused message.
# Not enough context in the JobClient messages; if more than one job is running simultaneously, you can't tell which job the map and reduce percentages refer to.
# The shuffle error doesn't actually say what the MAX_FAILED_UNIQUE_FETCHES value is.
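To make points 2 to 4 concrete, here is a minimal sketch of what a more informative connection-refused warning could look like. This is not the actual JobClient code; the class, the helper method and the message layout are illustrative assumptions, chosen only to show the (hostname, port) details, the job/attempt context and the full exception chain being logged.
{code}
// Hypothetical sketch only - not existing Hadoop code. Shows a warning that
// carries host/port, job context and the full exception chain.
import java.io.IOException;
import java.net.ConnectException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TaskOutputFetchLogging {
  private static final Log LOG = LogFactory.getLog(TaskOutputFetchLogging.class);

  // The method name, parameters and key=value layout are illustrative assumptions.
  static void logReadFailure(String jobId, String attemptId,
                             String host, int port, IOException e) {
    // Quote the values so the line stays regexp-friendly, and pass the exception
    // itself so the whole chain (with stack traces) is logged, not just toString().
    LOG.warn("Error reading task output: job=\"" + jobId
        + "\" attempt=\"" + attemptId
        + "\" host=\"" + host + "\" port=" + port, e);
  }

  public static void main(String[] args) {
    logReadFailure("job_200906251314_0002",
        "attempt_200906251314_0002_r_000001_0",
        "worker03.example.org", 50060,
        new ConnectException("Connection refused"));
  }
}
{code}
A message like this would, of course, change the current text, which is exactly the compatibility concern for tools such as Chukwa that the issue below describes.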
> Have some log messages designed for machine parsing, either real-time or
> post-mortem
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-6107
> URL: https://issues.apache.org/jira/browse/HADOOP-6107
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.21.0
> Reporter: Steve Loughran
>
> Many programs take the log output of various parts of Hadoop and try to parse it.
> Some may also put their own back end behind commons-logging to capture the
> messages without going via Log4J, so as to keep the output more machine-readable.
> These programs need log messages that
> # are easy to parse with a regexp or other simple string parsing (consider
> quoting values, etc.)
> # push out the full exception chain rather than stringify() bits of it
> # stay stable across versions
> # log the things the tools need to analyse: events, data volumes, errors
> For these logging tools, ease of parsing, retention of data and stability
> over time take precedence over readability. In HADOOP-5073, Jiaqi Tan proposed
> marking some of the existing log events as evolving towards stability. For
> someone who regularly patches log messages to improve diagnostics, this
> creates a conflict of interest. For me, good logs are ones that help people
> debug their problems without anyone else's help, and if that means improving
> the text, so be it. Tools like Chukwa have a different need.
> What to do? Some options:
> # Have some messages that are designed purely for other programs to handle.
> # Have some logs specifically for machines, to which we log alongside the
> human-centric messages (sketched below).
> # Fix many of the common messages, then leave them alone.
> # Mark log messages to be left alone (somehow).
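As one way to picture option 2 and the "easy to parse by a regexp" requirement, here is a minimal, hypothetical sketch. None of this is existing Hadoop code: the "hadoop.machinelog.shuffle" logger name, the key=value layout and the field names are assumptions. The same event is written once as a human-readable message and once as a quoted key=value line on a separate machine-oriented logger, together with the kind of regular expression a downstream tool could use to parse the machine line.
{code}
// Hypothetical sketch of option 2: log each event twice, once for humans and
// once on a separate, machine-oriented logger whose format is kept stable.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DualLogSketch {
  private static final Log HUMAN = LogFactory.getLog(DualLogSketch.class);
  // Assumed logger category; it only exists in this sketch.
  private static final Log MACHINE = LogFactory.getLog("hadoop.machinelog.shuffle");

  // Quoting every value keeps the line easy to pick apart with a regexp,
  // even if hostnames or ids ever contain spaces.
  private static final Pattern MACHINE_LINE = Pattern.compile(
      "event=\"([^\"]*)\" attempt=\"([^\"]*)\" failedFetches=\"(\\d+)\" max=\"(\\d+)\"");

  static void shuffleFailure(String attemptId, int failedFetches, int maxFailed) {
    HUMAN.warn("Shuffle error: exceeded " + maxFailed
        + " failed unique fetches for " + attemptId + "; bailing out");
    MACHINE.warn("event=\"shuffle.failure\" attempt=\"" + attemptId
        + "\" failedFetches=\"" + failedFetches + "\" max=\"" + maxFailed + "\"");
  }

  public static void main(String[] args) {
    shuffleFailure("attempt_200906251314_0002_r_000001_0", 5, 5);
    // What a downstream consumer of the machine log might do with such a line:
    Matcher m = MACHINE_LINE.matcher(
        "event=\"shuffle.failure\" attempt=\"attempt_200906251314_0002_r_000001_0\""
        + " failedFetches=\"5\" max=\"5\"");
    if (m.find()) {
      System.out.println("parsed event " + m.group(1) + " for " + m.group(2));
    }
  }
}
{code}
With this split, the human-facing text can keep improving without breaking parsers, since only the machine line needs to stay stable across versions.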