[ https://issues.apache.org/jira/browse/HADOOP-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724078#action_12724078 ]

Steve Loughran commented on HADOOP-6107:
----------------------------------------

And on the server side, there are similar issues:

{code}
09/06/25 13:54:26 INFO datanode.DataNode: BlockReport of 8 blocks got processed in 1 msecs
09/06/25 13:54:26 INFO mapred.TaskTracker: attempt_200906251314_0002_r_000001_1 0.0% reduce > copy > 
09/06/25 13:54:29 INFO mapred.TaskTracker: attempt_200906251314_0002_r_000001_1 0.0% reduce > copy > 
09/06/25 13:54:29 INFO mapred.TaskTracker: attempt_200906251314_0002_r_000000_1 0.0% reduce > copy > 
09/06/25 13:54:32 INFO mapred.TaskTracker: attempt_200906251314_0002_r_000000_1 0.0% reduce > copy > 
09/06/25 13:54:35 INFO mapred.TaskTracker: attempt_200906251314_0002_r_000001_1 0.0% reduce > copy > 
09/06/25 13:54:36 INFO datanode.DataNode: BlockReport of 8 blocks got processed in 2 msecs
09/06/25 13:54:38 INFO mapred.JobInProgress: Failed fetch notification #3 for task attempt_200906251314_0002_m_000000_1
09/06/25 13:54:38 INFO mapred.JobInProgress: Too many fetch-failures for output of task: attempt_200906251314_0002_m_000000_1 ... killing it
09/06/25 13:54:38 INFO mapred.TaskInProgress: Error from attempt_200906251314_0002_m_000000_1: Too many fetch-failures
{code}
The task tracker lists attempt IDs, but none of the messages carry any datanode 
or tasktracker identity. If you are aggregating logs from many machines you need 
that, which means the aggregator has to attach extra data to every log event it 
receives and keep track of those values itself. Aggregation is simpler if the 
machine-level events carry more information about their sender.
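
One way to get that sender information onto every event without editing each 
log call is a diagnostic-context stamp. The following is a minimal, hypothetical 
sketch (SenderIdentity is not an existing Hadoop class), assuming the 
commons-logging back end is Log4J 1.2 and using its MDC:

{code}
// Hypothetical sketch, not existing Hadoop code: stamp sender identity onto
// every Log4J event via the mapped diagnostic context (MDC), so aggregated
// logs carry host and daemon information without editing each log call.
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.MDC;
import org.apache.log4j.PatternLayout;

public class SenderIdentity {

  /** Call once at daemon startup, before the first log message. */
  public static void stamp(String daemonName) {
    String host;
    try {
      host = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      host = "unknown-host";
    }
    // MDC values ride along with every subsequent event on this thread
    // (child threads inherit a copy); the layout below prints them.
    MDC.put("host", host);
    MDC.put("daemon", daemonName);
  }

  public static void main(String[] args) {
    BasicConfigurator.configure(new ConsoleAppender(
        new PatternLayout("%d %p %c [%X{host} %X{daemon}] %m%n")));
    stamp("tasktracker");
    Logger.getLogger(SenderIdentity.class).info(
        "BlockReport of 8 blocks got processed in 1 msecs");
  }
}
{code}

With a pattern such as %X{host} %X{daemon} in the layout, every line then 
identifies its sender, and the aggregator no longer has to reconstruct that 
from context.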


> Have some log messages designed for machine parsing, either real-time or post-mortem
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6107
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>
> Many programs take the log output of bits of Hadoop and try to parse it. 
> Some may also put their own back end behind commons-logging, to capture the 
> events without going via Log4J, so as to keep the output more machine-readable.
> These programs need log messages that
> # are easy to parse by a regexp or other simple string parser (consider 
> quoting values, etc)
> # push out the full exception chain rather than stringify() bits of it
> # stay stable across versions
> # log the things the tools need to analyse: events, data volumes, errors
> For these logging tools, ease of parsing, retention of data and stability 
> over time matter more than readability. In HADOOP-5073, Jiaqi Tan proposed 
> marking some of the existing log events as evolving towards stability. As 
> someone who regularly patches log messages to improve diagnostics, I see a 
> conflict of interest here. For me, good logs are ones that help people debug 
> their problems without anyone else's help, and if that means improving the 
> text, so be it. Tools like Chukwa have a different need. 
> What to do? Some options:
>  # Have some messages that are designed purely for other programs to handle
>  # Have some logs specifically for machines, to which we log alongside the 
> human-centric messages
>  # Fix many of the common messages, then leave them alone.
>  # Mark log messages to be left alone (somehow)
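
To make the first two requirements above concrete, here is a minimal 
hypothetical sketch (MachineLog and its methods are illustrative names, not a 
Hadoop API): events go out as quoted key="value" pairs that a regexp can split, 
and failures hand the whole Throwable to the logging back end so the complete 
cause chain is preserved rather than a stringified fragment of it:

{code}
// Hypothetical sketch of a machine-oriented log helper, not an existing
// Hadoop API: quoted key="value" pairs plus the full exception chain.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class MachineLog {
  private static final Log LOG = LogFactory.getLog(MachineLog.class);

  /** Emit an event as quoted key="value" pairs, easy to split with a regexp. */
  public static void event(String name, String... keyValues) {
    // Note: values containing quote characters would need escaping in a
    // real implementation; omitted here to keep the sketch short.
    StringBuilder b = new StringBuilder("event=\"").append(name).append('"');
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      b.append(' ').append(keyValues[i]).append("=\"")
       .append(keyValues[i + 1]).append('"');
    }
    LOG.info(b.toString());
  }

  /** Log the whole exception chain, not just toString() of the top one. */
  public static void failure(String name, Throwable t) {
    // Passing the Throwable lets the back end print every cause in the chain.
    LOG.error("event=\"" + name + "\"", t);
  }

  public static void main(String[] args) {
    event("fetch-failure", "attempt", "attempt_200906251314_0002_m_000000_1",
          "failures", "3");
    failure("fetch-failure", new RuntimeException("fetch failed",
        new java.net.ConnectException("connection refused")));
  }
}
{code}

Whether the field names and quoting rules of such messages stay fixed across 
versions is exactly the stability contract this issue is asking for.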

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
