Nigel Daley wrote:
On May 9, 2008, at 2:03 AM, Steve Loughran wrote:
Owen O'Malley wrote:
On May 8, 2008, at 10:21 AM, Doug Cutting wrote:
I'd go with something closer to an accidental policy. My suspicion
is that the logging framework didn't print nested exceptions well. Owen
is the father of stringifyException and may have more insight.
Yeah, it was my fault. Log4j was misconfigured, so we didn't get
the exception traces out with the messages. I didn't realize it was a
misconfiguration until much later, by which point it had become standard
practice within Hadoop. *sigh* I've fixed a couple of them, but there
are a lot more.
OK, that means when I encounter them I can change them.
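As a side-by-side sketch (using commons-logging and Hadoop's existing
StringUtils), the difference between the accidental policy and the fix
looks like this:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.StringUtils;

public class LoggingStyles {
  private static final Log LOG = LogFactory.getLog(LoggingStyles.class);

  void demo(Exception e) {
    // accidental-policy style: the exception is flattened to a String
    // up front, so the logger only ever sees message text
    LOG.warn("Operation failed: " + StringUtils.stringifyException(e));

    // preferred style: hand the raw Throwable to the logger, which can
    // then render the full nested trace (or emit it machine-readable)
    LOG.warn("Operation failed", e);
  }
}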
Somewhere on my todo list is much better JUnit reporting, including logs
from multiple machines and stack traces in machine-readable
form...getting the exceptions out raw is one of the requirements for
this to work:
http://wiki.apache.org/ant/Proposals/EnhancedTestReports
-steve
Hey Steve,
Hope Hudson is on your list of CI servers to test wrt the Ant JUnit
report changes :-)
I hope so too. One of the big problems is backwards compatibility...the
original JUnit report saves a summary in the attributes of the root
node, so you can't stream the results out; you have to buffer them, and
then, if the JVM crashes, you get a zero-byte file.
I'm not sure what the ideal solution would be here. What I may do is
generate the new format alongside the old, or add a backup format that
can be used to post-mortem a JVM crash when it happens.
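To make that concrete, here's a rough sketch of what a crash-tolerant
formatter could do: write each testcase element as it finishes and put
the counts in a trailing element instead of root attributes, so a
partial file still records every completed test. The names are
illustrative, not the Ant formatter's real API, and XML escaping is
elided:

import java.io.IOException;
import java.io.Writer;

public class StreamingReportWriter {
  private final Writer out;
  private int tests, failures;

  public StreamingReportWriter(Writer out) throws IOException {
    this.out = out;
    out.write("<testsuite>\n");
    out.flush();                 // header hits the disk immediately
  }

  public void testFinished(String name, long millis) throws IOException {
    tests++;
    out.write("  <testcase name='" + name + "' time='" + millis + "'/>\n");
    out.flush();                 // each result survives a JVM crash
  }

  public void testFailed(String name, Throwable t) throws IOException {
    tests++;
    failures++;
    out.write("  <testcase name='" + name + "'><failure>"
        + t + "</failure></testcase>\n");
    out.flush();
  }

  // only reached on a clean run; a missing summary flags a crashed JVM
  public void close() throws IOException {
    out.write("  <summary tests='" + tests
        + "' failures='" + failures + "'/>\n</testsuite>\n");
    out.close();
  }
}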
On another note, have you used SmartFrog to config/deploy Hadoop? If
so, I'm interested in your experiences.
I'm working on it in the slice of my time I get to spend on interesting
stuff; you can track the status here:
http://jira.smartfrog.org/jira/browse/SFOS-780
-I have the ability for Hadoop to get its state from our configuration
files, rather than the existing XML files
-I can submit work to an existing cluster
-we can poll a cluster for being in a working condition (job tracker
live, etc.) and block actions until that state is reached; see the
sketch after this list
-I'm just bringing up the namenode and datanodes
-I'm also adding components to do HDFS maintenance: formatting,
balancing, etc.
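The polling mentioned above boils down to something like this (a
hypothetical helper, not SmartFrog's actual API; it leans on
JobClient.getClusterStatus() as a cheap liveness RPC):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ClusterProbe {
  // block until the JobTracker answers an RPC, or give up after timeoutMs
  public static void waitForJobTracker(JobConf conf, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        JobClient client = new JobClient(conf); // connects to the JobTracker
        client.getClusterStatus();              // throws if the tracker is down
        client.close();
        return;                                 // tracker answered: cluster is up
      } catch (IOException e) {
        if (System.currentTimeMillis() > deadline) {
          throw e;                              // out of patience; propagate
        }
        Thread.sleep(1000);                     // back off and retry
      }
    }
  }
}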
What is nice so far is that it works with our testing components, so
I can run a test that brings up a cluster with some parameters (such as
JVM, replication options) and try something, then tear that down and try
a different set. That should be good for probing interesting values,
like what happens with different replication options, JVM tuning, etc.
And I can get the logs back into one place. You don't necessarily want
that in production (one logger = one point of failure), but it's good in
small test runs.
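If you want a feel for that bring-up/try-something/tear-down cycle
without SmartFrog in the picture, Hadoop's own MiniDFSCluster (from the
test source tree) does the same dance in-process; here replication is
the parameter under test:

import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.MiniDFSCluster;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestReplicationOptions extends TestCase {
  public void testVariousReplicationFactors() throws Exception {
    for (int replication : new int[] {1, 2, 3}) {
      Configuration conf = new Configuration();
      conf.setInt("dfs.replication", replication);
      // 3 datanodes, freshly formatted, default rack layout
      MiniDFSCluster cluster = new MiniDFSCluster(conf, 3, true, null);
      try {
        FileSystem fs = cluster.getFileSystem();
        Path p = new Path("/probe-" + replication);
        fs.create(p).close();                    // try something
        assertEquals(replication,
            fs.getFileStatus(p).getReplication());
      } finally {
        cluster.shutdown();                      // always tear down
      }
    }
  }
}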
By the end of the month I should have something that others can play
with; we're going to talk about it at the UK Hadoop users meeting in
August. I'm also taking notes of where I've had to do ugly things, where
some changes to Hadoop core would make things a lot better. So far:
-make it easier to get configuration information from some form of
external factory
-have the services (namenode, datanode, etc.) all implement a lifecycle
interface, with a base class that provides stable, thread-safe
startup/ping/shutdown; see the sketch after this list
-make package-scoped stuff in NameNode/DataNode private, maybe with
protected accessors
-find where exceptions are being stringified before nesting/logging
and retain them in their raw form
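For the lifecycle point, what I have in mind is roughly the following
base class; the names (Service, ping, etc.) are illustrative, not an
existing Hadoop API:

import java.io.IOException;

public abstract class Service {
  public enum State { CREATED, STARTED, FAILED, TERMINATED }

  private volatile State state = State.CREATED;

  // bring the service up; a no-op if it is already started
  public synchronized void start() throws IOException {
    if (state != State.STARTED) {
      innerStart();
      state = State.STARTED;
    }
  }

  // health check: throws if the service is not live
  public void ping() throws IOException {
    if (state != State.STARTED) {
      throw new IOException("Service is not started: " + state);
    }
    innerPing();
  }

  // shut down; safe to call repeatedly and from any state
  public synchronized void terminate() {
    if (state != State.TERMINATED) {
      innerTerminate();
      state = State.TERMINATED;
    }
  }

  public State getState() { return state; }

  protected abstract void innerStart() throws IOException;
  protected abstract void innerPing() throws IOException;
  protected void innerTerminate() {}
}

A NameNode or DataNode subclass would then put its existing startup and
shutdown code behind innerStart()/innerTerminate(), and anything
managing it (SmartFrog included) gets the same stable contract.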
These changes aren't SmartFrog-specific; they're just the things you
need to do to manage Hadoop better from inside the JVM.
Currently my code is here:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/
with subclassed things in the Apache packages to get at package-scoped
content:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/apache/hadoop/dfs/
-steve