On Mon, Sep 24, 2012 at 11:09 AM, Matthias Friedrich wrote:
> On Sunday, 2012-09-23, Josh Wills wrote:
>> On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich wrote:
>>> On Saturday, 2012-09-22, Josh Wills wrote:
>>> [...]
>
>> Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
>> a Hadoop job fails on the cluster, we have those logs available on the
>> JobTracker webpage (at least, I do in CDH, I assume it works the same
>> way in Hadoop 1.0.3), so enableDebug doesn't do anything for us
>> (besides altering the Configuration to force Crunch to put try-catch
>> blocks around the DoNode tasks, which I assume still works fine). I
>> use enableDebug to force the logging of Hadoop WARN statements on my
>> machine when I'm testing out pipelines, so in that case, it's only
>> affecting LocalJobRunner.
>
> Yep. I think we could remove log4j.properties, the log4j setup code in
> enableDebug(), and the log4j dependency from Crunch and the behavior
> on the cluster should still be the same. The same holds for
> LocalJobRunner when running via "hadoop jar".
>
> Running the LocalJobRunner from the IDE is the problem because then
> we need a logging backend on the classpath. If we don't have log4j,
> then java.util.logging is used, which logs everything on INFO level.
> As soon as log4j is on the classpath, however, the user really needs
> a log4j.properties or log4j will complain that it doesn't have a
> configuration (and logs nothing).
>
>> Given that, what's the best approach here? Javadoc statement on the
>> function indicating its intended use, or is there a better option?
>
> I'd say let's remove log4j.properties from Crunch, because users
> can't defend themselves against it. We have local applications at
> work that run some parts locally, without anything Hadoop-specific;
> shipping a log4j.properties with Crunch would cause problems for us.
>
> We could then add a log4j.properties to src/main/resources in the
> archetype with an explanation of when exactly this configuration is
> used (only when running from the IDE). We would keep enableDebug()
> with its setting of "crunch.debug", but remove the log4j code, and add
> a "provided" log4j dependency to the archetype (because log4j is
> missing from hadoop-core).
>
> Does this make sense? Will this give you the logging/debugging output
> that you need?
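For readers who haven't seen what the "crunch.debug" setting does on the cluster: the thread only says that it makes Crunch put try-catch blocks around the DoNode tasks. Below is a minimal plain-Java sketch of that idea, not Crunch's actual code. The class name DebugWrapper, the process method, and the use of java.util.Properties in place of Hadoop's Configuration are hypothetical stand-ins. Logging the failing input and then rethrowing is an assumed behavior.

```java
import java.util.Properties;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: Properties stands in for Hadoop's Configuration,
// and Function stands in for the work done by a DoNode.
class DebugWrapper {
    static final String DEBUG_KEY = "crunch.debug"; // key name taken from the thread
    private static final Logger LOG = Logger.getLogger(DebugWrapper.class.getName());

    private final boolean debug;
    private final Function<String, String> fn;

    DebugWrapper(Properties conf, Function<String, String> fn) {
        this.debug = Boolean.parseBoolean(conf.getProperty(DEBUG_KEY, "false"));
        this.fn = fn;
    }

    String process(String input) {
        if (!debug) {
            return fn.apply(input); // normal path: no extra try-catch
        }
        try {
            return fn.apply(input);
        } catch (RuntimeException e) {
            // Assumed debug behavior: record which input failed, then rethrow
            // so the task still fails as it would without debugging.
            LOG.log(Level.WARNING, "Exception while processing input: " + input, e);
            throw e;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DEBUG_KEY, "true");
        DebugWrapper node = new DebugWrapper(conf, s -> s.toUpperCase());
        System.out.println(node.process("crunch"));
    }
}
```

Keeping the flag check outside the try block means the non-debug path pays nothing extra, which matches the idea that the debug setting only changes behavior when it is explicitly enabled.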
I'm on board with that plan. My one tweak would be to add support to the
crunch-test functionality for hacking log4j to turn on Hadoop's WARN logs,
which I think will serve my needs from within the IDE and won't interfere
with any production or client log settings. Does that meet your goals as
well?

>
> [...]
>>> Ah, that reminds me: We haven't decided yet if we want an archetype in
>>> Crunch.
>
>> I want one. I thought you created it? I remember seeing an email; if
>> I didn't reply, it was b/c I was in the midst of that crazy travel
>> week and my sleep schedule was off (honestly, I'm just now
>> recovering).
>
> No worries, I'm a bit sleep-deprived myself so I can relate. With
> Gabriel we're +3 pro archetype, so I'll make a patch this weekend.
>
> Regards,
> Matthias

--
Director of Data Science, Cloudera
Twitter: @josh_wills
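The proposal to raise Hadoop's log level only inside crunch-test, without touching production or client settings, can be sketched as a scoped override that restores the previous level when the test finishes. This sketch uses java.util.logging as a stand-in for log4j, which the thread actually discusses. The class name ScopedLogLevel and the logger name "org.apache.hadoop" are illustrative assumptions, not part of crunch-test.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical test helper; java.util.logging stands in for log4j here.
final class ScopedLogLevel implements AutoCloseable {
    private final Logger logger;   // strong reference keeps the same Logger instance alive
    private final Level previous;  // may be null, meaning "inherit from parent"

    ScopedLogLevel(String loggerName, Level level) {
        this.logger = Logger.getLogger(loggerName);
        this.previous = logger.getLevel();
        logger.setLevel(level);
    }

    @Override
    public void close() {
        // Restore the earlier level so settings outside the test are untouched.
        logger.setLevel(previous);
    }

    public static void main(String[] args) {
        Logger hadoop = Logger.getLogger("org.apache.hadoop");
        try (ScopedLogLevel scope = new ScopedLogLevel("org.apache.hadoop", Level.WARNING)) {
            hadoop.warning("visible while the scope is open");
        }
        // After the block, the logger's level is whatever it was before the scope opened.
        System.out.println("level after scope: " + hadoop.getLevel());
    }
}
```

Using try-with-resources ties the override to the test's lifetime, so an exception in the test body still restores the original level.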
