This might be a reasonable short-term solution, although it adds a lot of
complexity.
I was assuming master aggregation. This clearly could be a burden
(good point), although if we simply log launch, completion and
failure events, that should be ok. Maybe we should stick to that?
Or only record failure logs by default?
Another approach would be to launch a job to reap entries whenever a large
number of them have accumulated, and just concatenate them together. Say,
concatenate the smallest 100 together whenever we get to 200?
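To make the reaping concrete, here's a rough sketch of what such a reaper
could do. The 200/100 thresholds are the numbers above; the class name, the
merged-file naming, and the use of plain java.nio.file instead of the DFS
client API are just illustration:

import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class LogReaper {
    static final int TRIGGER = 200;  // reap once this many log files exist
    static final int BATCH = 100;    // merge the smallest BATCH of them

    public static void reap(Path logDir) throws IOException {
        List<Path> logs;
        try (Stream<Path> s = Files.list(logDir)) {
            logs = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        if (logs.size() < TRIGGER) {
            return;  // not enough files to bother yet
        }
        // Sort ascending by size so we merge the smallest files first.
        logs.sort(Comparator.comparingLong(p -> p.toFile().length()));
        List<Path> victims = logs.subList(0, BATCH);

        Path merged = logDir.resolve("merged-" + System.currentTimeMillis() + ".log");
        try (OutputStream out = Files.newOutputStream(merged, StandardOpenOption.CREATE_NEW)) {
            for (Path p : victims) {
                Files.copy(p, out);  // append each small log to the merged file
            }
        }
        for (Path p : victims) {
            Files.delete(p);         // drop the originals once merged
        }
    }
}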
On Mar 22, 2006, at 9:21 AM, Stefan Groschupf wrote:
Hi,
If we could query the jobtracker for the host that is running a specific
MapRunnable, we could run a single logging server as a map task, and the
tasktrackers could send their log messages to that logging server.
From my point of view this would be easier to implement than multiple
writers to one DFS file.
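As a very rough sketch of what that map task could look like (the class name,
port and log file name here are invented; the real question is how the
tasktrackers would find out which host the task landed on):

import java.io.*;
import java.net.*;

public class LogServerTask {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(9099);  // invented port
        PrintWriter log = new PrintWriter(new FileWriter("job.log", true), true);
        while (true) {
            Socket client = server.accept();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                log.println(line);  // one writer, so no multi-writer DFS problem
            }
            client.close();
        }
    }
}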
Just my 2 cents.
Greetings,
Stefan
On 22.03.2006, at 18:10, Yoram Arnon wrote:
DFS files can only be written once, and by a single writer.
Until that changes, our hands are tied, as long as we require the output
to reside in the output directory.
Unless... we create a protocol whereby the task masters report up to the
job master, and it's only the job master that does the logging.
That might introduce unwanted overhead and some load on the job master.
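Sketching what that protocol could look like, assuming we only forward coarse
events (launch, completion, failure) rather than full log streams; none of
these names exist in the code today:

public interface TaskLogProtocol {
    enum Event { LAUNCHED, COMPLETED, FAILED }

    // Called by a task (via its tasktracker) to report an event upward.
    void report(String jobId, String taskId, Event event, String detail);
}

class JobMasterLogger implements TaskLogProtocol {
    private final java.io.PrintWriter out;

    JobMasterLogger(java.io.Writer sink) {
        out = new java.io.PrintWriter(sink, true);
    }

    public void report(String jobId, String taskId, Event event, String detail) {
        // The job master is the single writer, appending one prefixed line per event.
        out.printf("%tF %<tT %s %s %s %s%n",
                new java.util.Date(), jobId, taskId, event, detail);
    }
}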
-----Original Message-----
From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 21, 2006 8:54 PM
To: [email protected]
Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce
Will it really make sense to have 300,000 subdirectories with several log
files each? Seems like a real losing proposition. I'd just go for a single
log file with reasonable per-line prefixes (time, job, ...).
Then you can grep out what you want.
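For what it's worth, one prefixed line per event could look something like
this (the job and task ids are made up):

2006-03-21 20:54:03 job_0042 task_000017 LAUNCHED
2006-03-21 20:55:12 job_0042 task_000017 FAILED java.io.IOException

and a plain grep for job_0042 on that file pulls out everything for one job.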
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net