[ https://issues.apache.org/jira/browse/SPARK-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247424#comment-15247424 ]
Matthew Byng-Maddick commented on SPARK-14703:
----------------------------------------------
Leaving log4j in the build and then routing the calls at runtime gives the
stacktrace above when trying to construct the SparkContext (which means you
never actually get a SparkContext, leaving the shell, at least, unusable). As
best I can tell, this is because the class method for
((org.apache.log4j.Logger)Logger).getLogger() doesn't actually give you back an
org.apache.log4j.Logger in that situation, but a ch.qos.logback.classic.Logger
instead, which means that there's no polymorphic setLevel() that takes an
org.apache.log4j.Level as its argument.
I have to admit to being a little unclear on this myself; I only know that
doing the class matching and adding logback (which seems to be a popular
choice) does appear to fix the problems, allowing the SparkContext to be
constructed both in standalone mode and under YARN (in particular within
spark-shell, but presumably in general).
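To make the failure mode and the "class matching" workaround concrete, here is a
minimal Java sketch. It does not use the real slf4j, log4j or logback classes:
Slf4jLogger, Log4jStyleLogger, LogbackStyleLogger and LoggingGuard are
hypothetical stand-ins, and levels are plain strings rather than
org.apache.log4j.Level. It only assumes, as described above, that the object
handed back at runtime is a logback logger rather than a log4j one. In this
model the unconditional cast fails with a ClassCastException; the exact error in
the real stack trace may differ.

```java
// Hypothetical stand-ins so the sketch runs without the slf4j, log4j or
// logback jars on the classpath.
interface Slf4jLogger {
    String getName();
}

// Plays the role of org.apache.log4j.Logger: the only type here with a
// log4j-style setLevel().
class Log4jStyleLogger implements Slf4jLogger {
    private final String name;
    String level = "INFO";

    Log4jStyleLogger(String name) { this.name = name; }
    public String getName() { return name; }
    void setLevel(String level) { this.level = level; }
}

// Plays the role of ch.qos.logback.classic.Logger: no log4j-style setLevel().
class LogbackStyleLogger implements Slf4jLogger {
    private final String name;

    LogbackStyleLogger(String name) { this.name = name; }
    public String getName() { return name; }
}

class LoggingGuard {
    // Unconditional cast, mirroring the failing code path: throws
    // ClassCastException whenever the runtime logger is not the log4j type.
    static void setLevelUnchecked(Slf4jLogger logger, String level) {
        ((Log4jStyleLogger) logger).setLevel(level);
    }

    // "Class matching": only change the level when the runtime type really is
    // the log4j logger; otherwise leave it to the backend's own configuration.
    static boolean setLevelIfLog4j(Slf4jLogger logger, String level) {
        if (logger instanceof Log4jStyleLogger) {
            ((Log4jStyleLogger) logger).setLevel(level);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Slf4jLogger log4j = new Log4jStyleLogger("root");
        Slf4jLogger logback = new LogbackStyleLogger("root");

        System.out.println(setLevelIfLog4j(log4j, "WARN"));   // prints true
        System.out.println(setLevelIfLog4j(logback, "WARN")); // prints false
        try {
            setLevelUnchecked(logback, "WARN");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

The guarded version simply skips the level change when the backend isn't log4j,
so the context can still be constructed; a central wrapper (as suggested for
Logging.scala) could make the same check once instead of at every call site.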
As to "why logback": I think I mentioned this, but there are really two reasons
for us:
1) online-updatable logging configs, which mean that we can change the logging
config without restarting services;
2) we have a bunch of logstash/elasticsearch infrastructure, and logback (with
a connector class) can write directly to logstash instead of writing locally
and then needing an agent to pick up the data. This allows us to collate and
correlate our Hadoop and HBase logs across the cluster.
Thanks for engaging. Even if we don't solve the problem this way, I hope we can
at least get to a point where Spark is usable without log4j at runtime (even if
not all of the logger config features are then available).
> Spark uses SLF4J, but actually relies quite heavily on Log4J
> ------------------------------------------------------------
>
> Key: SPARK-14703
> URL: https://issues.apache.org/jira/browse/SPARK-14703
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, YARN
> Affects Versions: 1.6.0
> Environment: 1.6.0-cdh5.7.0, logback 1.1.3, yarn
> Reporter: Matthew Byng-Maddick
> Priority: Minor
> Labels: log4j, logback, logging, slf4j
> Attachments: spark-logback.patch
>
>
> We've built a version of Hadoop CDH-5.7.0 in house with logback as the SLF4J
> provider, in order to send Hadoop logs straight to logstash (to be handled by
> logstash/elasticsearch), on top of our existing use of the logback backend.
> In trying to start spark-shell I discovered several points where the fact
> that we weren't using a real Log4J caused the SparkContext (sc) not to be
> created or the YARN module not to exist. There are many more places where we
> should probably be wrapping the logging more sensibly, but I have a basic
> patch that fixes some of the worst offenders (at least the ones that stop the
> SparkContext from being created properly).
> I'm prepared to accept that this is not a good solution and that there
> probably needs to be some sort of better wrapper that handles this properly,
> perhaps in Logging.scala.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)