+1 for Michel's points.
With these points in mind, solution (3) does not look attractive -- at
least as long as the tools for immediate access to such logs are
imperfect.
I like solution (2). Concatenation is a separate issue -- the important
thing is immediate availability. Once the logs are written, it will
not be too hard to provide a web UI to access them.
Maybe solution (2) could be modified so that the messages from all
tasks go to a single DFS file, with each line of the log prefixed with
the task ID and a time stamp?
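
To make the suggestion concrete, here is a rough sketch of what such a
line format might look like and how a consumer could pull one task's
messages back out of the shared file. The field order, the space
separator, the function names and the example task IDs are all
assumptions for illustration, not an existing Hadoop format or API.

```python
from datetime import datetime, timezone


def format_log_line(task_id, message, when=None):
    """Prefix one user log message with a time stamp and a task ID.

    The layout "<ISO time stamp> <task id> <message>" is hypothetical.
    """
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} {task_id} {message}"


def lines_for_task(lines, task_id):
    """Yield the original messages that belong to a single task."""
    for line in lines:
        # Split off only the two prefix fields; the message itself may
        # contain spaces and is left intact.
        _stamp, tid, message = line.split(" ", 2)
        if tid == task_id:
            yield message
```

With a prefix like this, a reader (or a web UI) can filter the shared
file per task without any separate concatenation step.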
On Aug 29, 2006, at 12:19 PM, Michel Tourn (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-489?page=comments#action_12431327 ]
Michel Tourn commented on HADOOP-489:
-------------------------------------
I don't know what is the best implementation.
But the normal requirements for this info are:
1. Task log data should be made available in real-time. The main point
is to debug execution problems at startup. The point is not to gather
after-the-fact throughput metrics on successful jobs.
Example messages we want to see immediately (so we can kill the job):
/usr/custombin/perl : program not found
/usr/bin/perl: myscript.pl: file not found
2. Leave any log aggregation to the consumers. Aggregation gets in the
way of timeliness: after the FIRST map task fails as in 1., we want to
access its log. Waiting for failure-everywhere is not an acceptable
alternative. (due to time delays, partial success and 3x reexecution)
3. Nice to have: access to a single Task log as a stream, not as a
file or blob,
i.e. we do not need to wait for Task termination to access its log.
However this is more constraining on the implementation, and for the
above use case of failure at launch it is not necessary.
Separating user logs from system logs in map reduce
---------------------------------------------------
Key: HADOOP-489
URL: http://issues.apache.org/jira/browse/HADOOP-489
Project: Hadoop
Issue Type: Improvement
Components: mapred
Reporter: Mahadev konar
Assigned To: Mahadev konar
Priority: Minor
Currently the user logs are a part of system logs in mapreduce.
Anything logged by the user is logged into the tasktracker log files.
This creates two issues:
1) The system log files get cluttered with user output. If the user
outputs a large amount of logs, the system logs need to be cleaned up
pretty often.
2) For the user, it is difficult to get to each of the machines and
look for the logs his/her job might have generated.
I am proposing three solutions to the problem. Each of them has its
own issues:
Solution 1.
Output the user logs on the user screen as part of the job submission
process.
Merits-
This will prevent users from printing large amounts of logs, and the
user can get runtime feedback on what is wrong with his/her job.
Issues -
This proposal will use framework bandwidth while running jobs for
the user. The user logs will need to pass from the tasks to the
tasktrackers, from the tasktrackers to the jobtracker, and then from
the jobtracker to the jobclient. That consumes a lot of framework
bandwidth if the user is printing out too much data.
Solution 2.
Output the user logs onto a dfs directory and then concatenate these
files. Each task can create a file for its output in the log
directory for a given user and jobid.
Issues -
This will create a huge number of small files in DFS, which can later
be concatenated into a single file. There is also the question of who
would concatenate these files. This could be done by the framework
(jobtracker) as part of the cleanup for the jobs, but that might
stress the jobtracker.
Solution 3.
Put the user logs into a separate user log file in the log directory
on each tasktracker. We can provide some tools to query these local
log files. We could have commands like "for jobid j and taskid t,
get me the user log output". These tools could run as a separate map
reduce program, with each map grepping the user log files and a single
reduce aggregating these logs into a single dfs file.
Issues-
This does sound like more work for the user. Also, the output might
not be complete, since a tasktracker might have gone down after it ran
the job.
Any thoughts?