Thanks Paul - it isn't meant to be a "full solution" but just a fix for the 0.9 branch - for the full solution there is another PR by Sean Owen.
On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown <p...@mult.ifario.us> wrote:

> Hi, Patrick --
>
> I gave that a go locally, and it works as desired.
>
> Best.
> -- Paul
>
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
> On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>
>> Ah okay sounds good. This is what I meant earlier by "You have some other application that directly calls log4j."... i.e. you have for historical reasons installed the log4j-over-slf4j.
>>
>> Would you mind trying out this fix and seeing if it works? This is designed to be a hotfix for 0.9, not a general solution where we rip out log4j from our published dependencies:
>>
>> https://github.com/apache/incubator-spark/pull/560/files
>>
>> - Patrick
>>
>> On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>
>>> Hi, Patrick --
>>>
>>> I forget which other component is responsible, but we're using the log4j-over-slf4j as part of an overall requirement to centralize logging, i.e., *someone* else is logging over log4j and we're pulling that in. (There's also some jul logging from Jersey, etc.)
>>>
>>> Goals:
>>>
>>> - Fully control/capture all possible logging. (God forbid we have to grab System.out/err, but we'd do it if needed.)
>>> - Use the backend we like best at the moment. (Happens to be logback.)
>>>
>>> Possible cases:
>>>
>>> - If Spark used Log4j at all, we would pull in that logging via log4j-over-slf4j.
>>> - If Spark used only slf4j and referenced no backend, we would use it as-is, although we'd still have the log4j-over-slf4j because of other libraries.
>>> - If Spark used only slf4j and referenced the slf4j-log4j12 backend, we would exclude that one dependency (via our POM).
>>>
>>> Best.
>>> -- Paul
>>>
>>> On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>>> Hey Paul,
>>>>
>>>> So if your goal is ultimately to output to logback, then why don't you just use slf4j and logback-classic.jar as described here [1]? Why involve log4j-over-slf4j at all?
>>>>
>>>> Let's say we refactored the Spark build so it didn't advertise slf4j-log4j12 as a dependency. Would you still be using log4j-over-slf4j... or is this just a "fix" to deal with the fact that Spark is somewhat log4j-dependent at this point?
>>>>
>>>> [1] http://www.slf4j.org/manual.html
>>>>
>>>> - Patrick
>>>>
>>>> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>>>
>>>>> Hi, Patrick --
>>>>>
>>>>> That's close but not quite it.
>>>>>
>>>>> The issue that occurs is not the delegation loop mentioned in the slf4j documentation. The stack overflow is entirely within the code in the Spark trait:
>>>>>
>>>>> at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
>>>>> at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>>>>> at org.apache.spark.Logging$class.log(Logging.scala:36)
>>>>> at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
>>>>>
>>>>> And then that repeats.
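For context, the loop in that trace has roughly the following shape. This is an abbreviated, hypothetical rendering of the 0.9 Logging trait, not the exact Spark source:

    import org.slf4j.{Logger, LoggerFactory}
    import org.apache.log4j.LogManager

    trait Logging {
      private var log_ : Logger = null
      private var initialized = false

      // Logging.scala:36 -- first use of `log` triggers initialization.
      protected def log: Logger = {
        if (log_ == null) {
          initializeIfNecessary()
          log_ = LoggerFactory.getLogger(this.getClass.getName)
        }
        log_
      }

      // Logging.scala:97
      private def initializeIfNecessary() {
        if (!initialized) initializeLogging()
      }

      // Logging.scala:112 -- with log4j-over-slf4j on the classpath,
      // getAllAppenders always returns an empty enumeration, so this code
      // concludes log4j is unconfigured and logs a message about installing
      // defaults. That call to `log` re-enters initializeIfNecessary before
      // `initialized` is ever set: log -> initializeIfNecessary ->
      // initializeLogging -> log -> ... until the stack overflows.
      private def initializeLogging() {
        val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
        if (!log4jInitialized) {
          log.info("Using Spark's default log4j profile")
        }
        initialized = true
      }
    }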
>>>>> As for our situation, we exclude the slf4j-log4j12 dependency when we import the Spark library (because we don't want to use log4j) and have log4j-over-slf4j already in place to ensure that all of the logging in the overall application runs through slf4j and then out through logback. (We also, as another poster already mentioned, force jcl and jul through slf4j.)
>>>>>
>>>>> The zen of slf4j for libraries is that the library uses the slf4j API and then the enclosing application can route logging as it sees fit. The Spark master CLI would log via slf4j and include the slf4j-log4j12 backend; same for the Spark worker CLI. Spark as a library (versus as a container) would not include any backend to the slf4j API and would leave this up to the application. (FWIW, this would also avoid your log4j warning message.)
>>>>>
>>>>> But as I was saying before, I'd be happy with a situation where I can avoid log4j being enabled or configured, and I think you'll find an existing choice of logging framework to be a common scenario for those embedding Spark in other systems.
>>>>>
>>>>> Best.
>>>>> -- Paul
>>>>>
>>>>> On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>
>>>>>> Paul,
>>>>>>
>>>>>> Looking back at your problem, I think it's the one here: http://www.slf4j.org/codes.html#log4jDelegationLoop
>>>>>>
>>>>>> So let me just be clear about what you are doing so I understand. You have some other application that directly calls log4j, so you have to include log4j-over-slf4j to route those logs through slf4j to logback.
>>>>>>
>>>>>> At the same time you embed Spark in this application. In the past it was fine, but now that Spark programmatically initializes log4j, it screws up your application because log4j-over-slf4j doesn't work with applications that do this explicitly, as discussed here: http://www.slf4j.org/legacy.html
>>>>>>
>>>>>> Correct?
>>>>>>
>>>>>> - Patrick
>>>>>>
>>>>>> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> got it. that sounds reasonable
>>>>>>>
>>>>>>> On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Koert - my suggestion was this. We let users use any slf4j backend they want. If we detect that they are using the log4j backend and *also* they didn't configure any log4j appenders, we set up some nice defaults for them. If they are using another backend, Spark doesn't try to modify the configuration at all.
>>>>>>>>
>>>>>>>> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>
>>>>>>>>> well, "static binding" is probably the wrong terminology, but you get the idea. multiple backends are not allowed and cause an even uglier warning...
>>>>>>>>>
>>>>>>>>> see also here: https://github.com/twitter/scalding/pull/636
>>>>>>>>> and here: https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>>>>>>>>> all me being annoying and complaining about slf4j-log4j12 dependencies (which did get removed).
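For concreteness, the exclusion Paul describes upthread (dropping slf4j-log4j12 when depending on Spark) might look like this in sbt; the coordinates and version numbers here are illustrative, and the same exclusion can be written in a Maven POM:

    // build.sbt -- drop Spark's slf4j-log4j12 binding so the embedding
    // application can pick its own slf4j backend (logback here), while
    // log4j-over-slf4j redirects any direct log4j calls into slf4j.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
        exclude("org.slf4j", "slf4j-log4j12"),
      "org.slf4j" % "log4j-over-slf4j" % "1.7.5",
      "ch.qos.logback" % "logback-classic" % "1.0.13"
    )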
>>>>>>>>> On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>>
>>>>>>>>>> the issue is that slf4j uses static binding. you can put only one slf4j backend on the classpath, and that's what it uses. more than one is not allowed.
>>>>>>>>>>
>>>>>>>>>> so you either keep the slf4j-log4j12 dependency for spark, and then you take away people's choice of slf4j backend, which is considered bad form for a library, or you do not include it and then people will always get the big fat ugly warning and slf4j logging will not flow to log4j.
>>>>>>>>>>
>>>>>>>>>> including log4j itself is not necessarily a problem i think?
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This also seems relevant - but not my area of expertise (whether this is a valid way to check this):
>>>>>>>>>>>
>>>>>>>>>>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Guys,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for explaining. Ya this is a problem - we didn't really know that people were using other slf4j backends. slf4j is in there for historical reasons, but I think we may assume in a few places that log4j is being used, and we should minimize those.
>>>>>>>>>>>>
>>>>>>>>>>>> We should patch this and get a fix into 0.9.1. So some solutions I see are:
>>>>>>>>>>>>
>>>>>>>>>>>> (a) Add a SparkConf option to disable this. I'm fine with this one.
>>>>>>>>>>>>
>>>>>>>>>>>> (b) Ask slf4j which backend is active and only try to enforce this default if we know slf4j is using log4j. Do either of you know if this is possible? Not sure if slf4j exposes this.
>>>>>>>>>>>>
>>>>>>>>>>>> (c) Just remove this default stuff. We'd rather not do this. The goal of this thing is to provide good usability for people who have linked against Spark and haven't done anything to configure logging. For beginners we try to minimize the assumptions about what else they know about, and I've found log4j configuration is a huge mental barrier for people who are getting started.
>>>>>>>>>>>>
>>>>>>>>>>>> Paul, if you submit a patch doing (a) we can merge it in. If you have any idea whether (b) is possible, I prefer that one, but it may not be possible or might be brittle.
>>>>>>>>>>>>
>>>>>>>>>>>> - Patrick
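On (b): slf4j does expose enough to make this check, along the lines of the StackOverflow link above. A minimal sketch -- peeking at the class of the bound ILoggerFactory is an slf4j implementation detail, so it may be brittle across versions:

    import org.slf4j.LoggerFactory

    object Slf4jBackendCheck {
      // With slf4j-log4j12 on the classpath, the factory slf4j binds to is
      // org.slf4j.impl.Log4jLoggerFactory; with logback-classic it is
      // ch.qos.logback.classic.LoggerContext, and so on.
      def usingLog4j12: Boolean =
        LoggerFactory.getILoggerFactory.getClass.getName ==
          "org.slf4j.impl.Log4jLoggerFactory"
    }

Spark could then install its default appenders only when usingLog4j12 is true and leave every other backend untouched.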
>>>>>>>>>>>> On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Totally agree with Paul: a library should not pick the slf4j backend. It defeats the purpose of slf4j. That big ugly warning is there to alert people that it's their responsibility to pick the backend...
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 7, 2014 3:55 AM, "Paul Brown" <p...@mult.ifario.us> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, Patrick --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From slf4j, you can either backend it into log4j (which is the way that Spark is shipped) or you can route log4j through slf4j and then on to a different backend (e.g., logback). We're doing the latter and manipulating the dependencies in the build because that's the way the enclosing application is set up.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The issue with the current situation is that there's no way for an end user to choose to *not* use the log4j backend. (My short-term solution was to use the Maven shade plugin to swap in a version of the Logging trait with the body of that method commented out.) In addition to the situation with log4j-over-slf4j and the empty enumeration of ROOT appenders, you might also run afoul of someone who intentionally configured log4j with an empty set of appenders at the time that Spark is initializing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd be happy with any implementation that lets me choose my logging backend: override default behavior via system property, plug-in architecture, etc. I do think it's reasonable to expect someone digesting a substantial JDK-based system like Spark to understand how to initialize logging -- surely they're using logging of some kind elsewhere in their application -- but if you want the default behavior there as a courtesy, it might be worth putting an INFO (versus the glaring log4j WARN) message on the output that says something like "Initialized default logging via Log4J; pass -Dspark.logging.loadDefaultLogger=false to disable this behavior." so that it's both convenient and explicit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers.
>>>>>>>>>>>>>> -- Paul
>>>>>>>>>>>>>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A config option e.g. could just be to add:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> spark.logging.loadDefaultLogger (default true)
>>>>>>>>>>>>>>> If set to true, Spark will try to initialize a log4j logger if none is detected. Otherwise Spark will not modify logging behavior.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then users could just set this to false if they have a logging set-up that conflicts with this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe there is a nicer fix...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Paul,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for digging this up. I worked on this feature, and the intent was to give users good default behavior if they didn't include any logging configuration on the classpath.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The problem with assuming that CL tooling is going to do the job is that many people link against Spark as a library and run their application using their own scripts. In this case the first thing people see when they run an application that links against Spark is a big ugly logging warning.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not super familiar with log4j-over-slf4j, but this behavior of returning an empty enumeration for the appenders seems a little weird. What is the use case for using this and not just directly using slf4j-log4j12 like Spark itself does?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Did you have a more general fix for this in mind? Or was your plan to just revert the existing behavior... We might be able to add a configuration option to disable this logging default stuff. Or we could just rip it out - but I'd like to avoid that if possible.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Patrick
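A sketch of how that option could gate the default setup. The property name is only the proposal from this thread, not a released Spark setting, and the defaults resource name is hypothetical:

    import org.apache.log4j.{LogManager, PropertyConfigurator}

    object DefaultLogInit {
      def maybeInitializeLog4j() {
        // Opt-out flag as proposed above; defaults to the current behavior.
        val loadDefault =
          System.getProperty("spark.logging.loadDefaultLogger", "true").toBoolean
        val noRootAppenders =
          !LogManager.getRootLogger.getAllAppenders.hasMoreElements
        if (loadDefault && noRootAppenders) {
          // Hypothetical bundled defaults file.
          val url = getClass.getClassLoader.getResource("log4j-defaults.properties")
          if (url != null) PropertyConfigurator.configure(url)
        }
      }
    }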
>>>>>>>>>>>>>>>> On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have a few applications that embed Spark, and in 0.8.0 and 0.8.1 we were able to use slf4j, but 0.9.0 broke that and unintentionally forces direct use of log4j as the logging backend.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The issue is here in the org.apache.spark.Logging trait:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> log4j-over-slf4j *always* returns an empty enumeration for appenders to the ROOT logger:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And this causes an infinite loop and an eventual stack overflow.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm happy to submit a Jira and a patch, but it would be a significant enough reversal of recent changes that it's probably worth discussing before I sink a half hour into it. My suggestion would be that initialization (or not) should be left to the user, with reasonable default behavior supplied by the Spark command-line tooling and not forced on applications that incorporate Spark.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thoughts/opinions?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -- Paul
>>>>>>>>>>>>>>>>> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/