[ 
https://issues.apache.org/jira/browse/SPARK-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247424#comment-15247424
 ] 

Matthew Byng-Maddick commented on SPARK-14703:
----------------------------------------------

Leaving log4j in the build and then routing the calls at runtime gives the 
stacktrace above when trying to construct the SparkContext (which means you 
never actually get a SparkContext, leaving the shell, at least, unusable). As 
best I can tell, it's because the static org.apache.log4j.Logger.getLogger() 
call doesn't actually give you back an org.apache.log4j.Logger in that 
situation, but a ch.qos.logback.classic.Logger instead, which means there's no 
setLevel() overload taking an org.apache.log4j.Level as an argument.

I have to admit to being a little unclear myself; all I can say is that doing 
the class matching and adding logback (which seems to be popular) does appear 
to fix the problems and allows the SparkContext to be constructed both 
standalone and under YARN (in particular within the spark-shell, but 
presumably in general).
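The class matching mentioned above could be sketched roughly as follows. This 
is a hedged illustration, not the actual patch attached to this issue: the 
object and method names are made up, and the check simply inspects the runtime 
class name of whatever getLogger() returned before anything attempts a 
log4j-specific call like setLevel().

```scala
// Sketch only: guard a log4j-specific call by checking which class the
// logger factory really handed back at runtime. Names here are assumptions,
// not Spark's actual Logging.scala code.
object LoggerBridgeCheck {
  // Real log4j loggers live under org.apache.log4j; with log4j calls routed
  // to logback (e.g. via log4j-over-slf4j), the returned object can be a
  // ch.qos.logback.classic.Logger instead, which has no
  // setLevel(org.apache.log4j.Level) method.
  def isRealLog4j(logger: AnyRef): Boolean =
    logger.getClass.getName.startsWith("org.apache.log4j.")

  def main(args: Array[String]): Unit = {
    // Stand-in for what a bridged getLogger() might return: anything that
    // is not an org.apache.log4j.Logger.
    val bridged = new AnyRef {}
    println(isRealLog4j(bridged)) // prints: false
  }
}
```

With a guard like this, the setLevel() call can be skipped (or routed through 
SLF4J instead) when the backend turns out not to be real log4j, rather than 
failing during SparkContext construction.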

As to "why logback" - I think I mentioned this, but there are really two 
reasons for us:
1) online-updatable logging configs (meaning we can change logging config 
without restarting services)
2) we have a bunch of logstash/elasticsearch infrastructure, and logback (with 
a connector class) can natively write directly to logstash instead of writing 
locally and then having an agent pick up the data. This allows us to collate 
and correlate our hadoop and hbase logs across the cluster.

Thanks for engaging, and I hope that even if we don't solve the problem this 
way, we can at least get to a point where we're not reliant on log4j at 
runtime just to be able to use Spark (even if not all the logger config 
features are enabled).

> Spark uses SLF4J, but actually relies quite heavily on Log4J
> ------------------------------------------------------------
>
>                 Key: SPARK-14703
>                 URL: https://issues.apache.org/jira/browse/SPARK-14703
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.6.0
>         Environment: 1.6.0-cdh5.7.0, logback 1.1.3, yarn
>            Reporter: Matthew Byng-Maddick
>            Priority: Minor
>              Labels: log4j, logback, logging, slf4j
>         Attachments: spark-logback.patch
>
>
> We've built a version of Hadoop CDH-5.7.0 in house with logback as the SLF4J 
> provider, in order to send hadoop logs straight to logstash (to handle with 
> logstash/elasticsearch), on top of our existing use of the logback backend.
> In trying to start spark-shell I discovered several points where the fact 
> that we weren't quite using a real L4J caused the sc not to be created or the 
> YARN module not to exist. There are many more places where we should probably 
> be wrapping the logging more sensibly, but I have a basic patch that fixes 
> some of the worst offenders (at least the ones that stop the sparkContext 
> being created properly).
> I'm prepared to accept that this is not a good solution and there probably 
> needs to be some sort of better wrapper, perhaps in the Logging.scala class 
> which handles this properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
