This might be a reasonable short-term solution, although it adds a lot of complexity.

I was assuming master aggregation. This clearly could be a burden (good point), although if we simply log launch, completion and failure events, that should be ok. Maybe we should stick to that? Or only record failure logs by default?
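For the event-only case, the record each task reports back could stay tiny. A rough sketch (the class and field names are made up):

    // Hypothetical event record a task would send up for master aggregation:
    // just enough to reconstruct launch/completion/failure history.
    public class TaskEvent {
      public enum Type { LAUNCH, COMPLETION, FAILURE }

      public final Type type;
      public final String jobId;    // e.g. "job_0042"
      public final String taskId;   // e.g. "task_0042_m_000017"
      public final long timestamp;  // millis at the task when the event happened
      public final String message;  // diagnostic/stack trace on FAILURE, else ""

      public TaskEvent(Type type, String jobId, String taskId,
                       long timestamp, String message) {
        this.type = type;
        this.jobId = jobId;
        this.taskId = taskId;
        this.timestamp = timestamp;
        this.message = message;
      }
    }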

Another approach would be to launch a job that reaps log files whenever a large number of them has accumulated, and just concatenates them together. Say, concatenate the smallest 100 together whenever we get to 200?
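Roughly, that reaper could look like the sketch below, written against the FileSystem interface (the thresholds, path layout and class name are only illustrative):

    // Sketch of a log "reaper": once more than MAX_FILES small log files have
    // accumulated under logDir, concatenate the smallest BATCH of them into one
    // new file and delete the originals.
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class LogReaper {
      static final int MAX_FILES = 200;  // trigger threshold
      static final int BATCH = 100;      // how many of the smallest files to merge

      public static void reap(Path logDir) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus[] files = fs.listStatus(logDir);
        if (files == null || files.length < MAX_FILES) {
          return;                        // not enough entries yet
        }
        // smallest files first
        Arrays.sort(files, new Comparator<FileStatus>() {
          public int compare(FileStatus a, FileStatus b) {
            return a.getLen() < b.getLen() ? -1 : (a.getLen() > b.getLen() ? 1 : 0);
          }
        });
        Path merged = new Path(logDir, "merged-" + System.currentTimeMillis() + ".log");
        FSDataOutputStream out = fs.create(merged);
        try {
          for (int i = 0; i < BATCH && i < files.length; i++) {
            FSDataInputStream in = fs.open(files[i].getPath());
            try {
              IOUtils.copyBytes(in, out, 4096, false);  // append; keep out open
            } finally {
              in.close();
            }
            fs.delete(files[i].getPath(), false);       // drop the merged-in original
          }
        } finally {
          out.close();
        }
      }
    }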

On Mar 22, 2006, at 9:21 AM, Stefan Groschupf wrote:

Hi,

If we were able to query the jobtracker for the host that runs a specific MapRunnable, we could run a single logging server as a map task, and the tasktrackers could send their log messages to it. From my point of view this would be easier to implement than multiple writers to one DFS file.
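A bare-bones version of that logging server could be nothing more than a TCP listener hosted by one long-running map task, with tasktrackers opening a socket and writing one log line per message. Everything below (port, file name, class name) is hypothetical, just to show the shape:

    // Minimal log collector a map task could host. Tasktrackers connect and
    // write lines; the server appends them to one local file, which the task
    // would later copy into DFS.
    import java.io.BufferedReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class LogCollectorServer {
      public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(9099);  // arbitrary port
        final PrintWriter sink =
            new PrintWriter(new FileWriter("collected.log", true), true);
        while (true) {
          final Socket client = server.accept();
          new Thread(new Runnable() {
            public void run() {
              try {
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {
                  synchronized (sink) {   // one writer, many senders
                    sink.println(line);
                  }
                }
              } catch (IOException e) {
                // a dropped sender just ends its connection
              }
            }
          }).start();
        }
      }
    }

The sender side is just a Socket plus a PrintWriter; the open question is how a tasktracker finds the host, which is why the jobtracker query would be needed.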

Just my 2 cents.
Greetings,
Stefan


Am 22.03.2006 um 18:10 schrieb Yoram Arnon:

DFS files can only be written once, and by a single writer.
Until that changes, our hands are tied, as long as we require the output to
reside in the output directory.

Unless... we create a protocol whereby the task masters report up to the job
master, and it's only the job master that does the logging.
That might introduce unwanted overhead and some load on the job master.
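In code that protocol could be as small as one extra method on the job master's interface (purely hypothetical; nothing like this exists yet):

    // Hypothetical reporting hook: task masters push log lines up, and only the
    // job master appends to the single DFS log file it owns.
    public interface TaskLogReporter {
      // The job master prefixes jobId/taskId and writes the line; callers could
      // batch lines to keep the load on the job master down.
      void reportLog(String jobId, String taskId, String line);
    }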


-----Original Message-----
From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 21, 2006 8:54 PM
To: [email protected]
Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Will it really make sense to have 300,000 subdirectories with several log files each? Seems like a real losing proposition. I'd just go for a single log file with reasonable per-line prefixes (time, job, ...).

Then you can grep out what you want.
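Concretely, the prefix could be built like this (format purely illustrative):

    // Illustrative only: prefix each line with time, job and task so a single
    // shared log stays greppable, e.g. "grep job_0042 tasks.log".
    static String prefixed(String jobId, String taskId, String message) {
      // yields e.g. "2006-03-21 20:54:03 job_0042 task_0042_m_000017 <message>"
      return String.format("%1$tF %1$tT %2$s %3$s %4$s",
          System.currentTimeMillis(), jobId, taskId, message);
    }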




---------------------------------------------------------------
company:  http://www.media-style.com
forum:    http://www.text-mining.org
blog:     http://www.find23.net


