[
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289676#comment-15289676
]
Charles Allen commented on SPARK-6305:
--------------------------------------
Shading is often used as a kind of artificial ClassLoader, with the exception that you
can't swap classes by replacing jars. So if it is used to say "we don't want
you to replace stock classes," that's fine, but if it is used because
"ClassLoader isolation and dependency tracking is hard," that is ultimately
bad. (Spark has lots of fun classloaders to deal with; I know it is not trivial.)
For a usage example from my side, the spark druid indexer at
https://github.com/metamx/druid-spark-batch uses good-ol'-fashioned jars
(without shading or assembly) with some primitive classloader isolation through
https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/common/task/HadoopTask.java#L128
This means the following jars sit in a directory which is loaded into a
classloader for the driver:
activation-1.1.1.jar
akka-actor_2.10-2.3.11.jar
akka-remote_2.10-2.3.11.jar
akka-slf4j_2.10-2.3.11.jar
aopalliance-1.0.jar
asm-3.1.jar
avro-1.7.7.jar
avro-ipc-1.7.7.jar
avro-ipc-1.7.7-tests.jar
avro-mapred-1.7.7-hadoop2.jar
base64-2.3.8.jar
bcprov-jdk15on-1.51.jar
chill_2.10-0.5.0.jar
chill-java-0.5.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.3.2.jar
commons-math3-3.4.1.jar
commons-net-2.2.jar
compress-lzf-1.0.3.jar
config-1.2.1.jar
curator-client-2.4.0.jar
curator-framework-2.4.0.jar
curator-recipes-2.4.0.jar
gmbal-api-only-3.0.0-b023.jar
grizzly-framework-2.1.2.jar
grizzly-http-2.1.2.jar
grizzly-http-server-2.1.2.jar
grizzly-http-servlet-2.1.2.jar
grizzly-rcm-2.1.2.jar
guice-3.0.jar
hadoop-annotations-2.4.0-mmx6.jar
hadoop-auth-2.4.0-mmx6.jar
hadoop-client-2.4.0-mmx6.jar
hadoop-common-2.4.0-mmx6.jar
hadoop-hdfs-2.4.0-mmx6.jar
hadoop-mapreduce-client-app-2.4.0-mmx6.jar
hadoop-mapreduce-client-common-2.4.0-mmx6.jar
hadoop-mapreduce-client-core-2.4.0-mmx6.jar
hadoop-mapreduce-client-jobclient-2.4.0-mmx6.jar
hadoop-mapreduce-client-shuffle-2.4.0-mmx6.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.4.0-mmx6.jar
httpclient-4.3.6.jar
httpcore-4.3.3.jar
ivy-2.4.0.jar
jackson-annotations-2.4.0.jar
jackson-core-2.4.4.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.4.4.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-scala_2.10-2.4.4.jar
jackson-xc-1.9.13.jar
javax.inject-1.jar
java-xmlbuilder-1.0.jar
javax.servlet-3.0.0.v201112011016.jar
javax.servlet-3.1.jar
javax.servlet-api-3.0.1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jcl-over-slf4j-1.7.10.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jersey-grizzly2-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jersey-test-framework-core-1.9.jar
jersey-test-framework-grizzly2-1.9.jar
jets3t-0.9.3.jar
jettison-1.1.jar
jetty-util-6.1.26.jar
jline-0.9.94.jar
json4s-ast_2.10-3.2.10.jar
json4s-core_2.10-3.2.10.jar
json4s-jackson_2.10-3.2.10.jar
jsr305-1.3.9.jar
jul-to-slf4j-1.7.10.jar
kryo-2.21.jar
log4j-1.2.17.jar
lz4-1.3.0.jar
mail-1.4.7.jar
management-api-3.0.0-b012.jar
mesos-0.21.1-shaded-protobuf.jar
metrics-core-3.1.2.jar
metrics-graphite-3.1.2.jar
metrics-json-3.1.2.jar
metrics-jvm-3.1.2.jar
minlog-1.2.jar
mx4j-3.0.2.jar
netty-3.8.0.Final.jar
netty-all-4.0.29.Final.jar
objenesis-1.2.jar
oro-2.0.8.jar
paranamer-2.6.jar
protobuf-java-2.5.0.jar
py4j-0.8.2.1.jar
pyrolite-4.4.jar
reflectasm-1.07-shaded.jar
RoaringBitmap-0.4.5.jar
scala-compiler-2.10.4.jar
scala-library-2.10.4.jar
scalap-2.10.4.jar
scala-reflect-2.10.4.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.1.1.7.jar
spark-core_2.10-1.5.2-mmx4.jar
spark-launcher_2.10-1.5.2-mmx4.jar
spark-network-common_2.10-1.5.2-mmx4.jar
spark-network-shuffle_2.10-1.5.2-mmx4.jar
spark-unsafe_2.10-1.5.2-mmx4.jar
stream-2.7.0.jar
tachyon-client-0.7.1.jar
tachyon-underfs-hdfs-0.7.1.jar
tachyon-underfs-local-0.7.1.jar
uncommons-maths-1.2.2a.jar
unused-1.0.0.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.5.jar
So that's the list of jars Spark thinks it needs to get a driver to connect and
launch a task.
I haven't bothered to go through and clean out the unwanted jars because the
classloader isolation is smart (lucky?) enough that they don't interfere.
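The directory-of-jars approach can be sketched with a plain URLClassLoader. This is a simplified illustration of the general pattern only, not what HadoopTask actually does (that code does more careful parent/child wiring); the class and method names here are made up:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class IsolatedLoader {
    // Build a classloader over every jar in a directory, so swapping a jar
    // on disk swaps the class versions the driver sees on its next launch.
    static ClassLoader forJarDirectory(File dir, ClassLoader parent) throws Exception {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

Replacing mesos-0.21.1-shaded-protobuf.jar in that directory with a newer mesos jar changes what the isolated classloader resolves, without rebuilding anything.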
The point is, I can go replace specific jars to control what versions of things
are used. For example, I can update the mesos version for the driver
independently of the Spark version, or change the version of hadoop utilized by
Spark. There is an argument to be made that running the Spark test suite
against these libs is a good thing (especially the ones with not-so-strong API
guarantees), in which case being able to change classes (jars) after
compile/test time is probably bad.
If this approach doesn't fit in with how Spark is intended to work, I'd love to
hear better suggestions for Java-based, automated, Spark-driver-launching
frameworks.
Long story short (too late!), class and dependency problems are hard to solve
reliably, and shading is one mode of isolation that works better in some ways
and worse in others.
I think that's getting off track from the log4j question though.
Ultimately, if I'm building an application which incorporates Spark in the
solution, I want to be able to control the logging of the application in some
consistent manner. Operationally this usually means "have one logging config
that will work in most places," or at least one that works with minimal
modification.
So really there are two things that need to be propagated around:
1. A logging config
2. A logging impl that can understand and enforce the config
I *think* the OP for this request (and certainly the ask from my side) is that
*at least* the executor bundle be able to accommodate these two things so that
I can package them as part of my deployment.
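For concreteness, here's a sketch of those two pieces expressed as spark-submit-style properties (spark.files, spark.executor.extraClassPath, and spark.executor.extraJavaOptions are real Spark config keys, but the file paths and the logback jar/version are made up for illustration; substitute whatever slf4j impl you standardize on):

```java
import java.util.Properties;

public class ExecutorLoggingConf {
    // Hypothetical deployment sketch: ship a logging config plus an slf4j
    // impl to the executors alongside the job.
    static Properties customLogging() {
        Properties p = new Properties();
        // (1) the logging config file, distributed with the job
        p.setProperty("spark.files", "/opt/myapp/conf/logback.xml");
        // (2) the slf4j impl jar, prepended to the executor classpath
        p.setProperty("spark.executor.extraClassPath", "logback-classic-1.1.3.jar");
        // point the impl at the shipped config on each executor
        p.setProperty("spark.executor.extraJavaOptions",
                      "-Dlogback.configurationFile=logback.xml");
        return p;
    }
}
```

This only works if the executor classpath doesn't already contain a competing slf4j binding, which is exactly where the shaded assembly gets in the way.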
In Spark 1.6.1 there seems to be at least some awareness in spark.Logging of
non-log4j-1.x bindings.
I'm curious if you've ever run into the case where the user does something that
loads jul pretty early (like screwing around with remote jmx), which can cause
the jul bindings to not be quite as expected.
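To illustrate the early-jul problem in plain java.util.logging: if something touched jul before your logging setup ran, the root logger may already carry handlers that bypass slf4j. Before installing jul-to-slf4j's SLF4JBridgeHandler, the usual first step is clearing those stale handlers; the helper below is a hypothetical sketch of just that step:

```java
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class JulReset {
    // Remove any handlers already installed on the jul root logger
    // (e.g. the default ConsoleHandler, or one added by an agent),
    // returning how many were removed.
    static int clearRootHandlers() {
        Logger root = LogManager.getLogManager().getLogger("");
        Handler[] handlers = root.getHandlers();
        for (Handler h : handlers) {
            root.removeHandler(h);
        }
        return handlers.length;
    }
}
```

After this, SLF4JBridgeHandler.install() (from the jul-to-slf4j jar already in the list above) can route jul records into whatever slf4j impl is on the classpath.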
Anyways, it seems to me the options for allowing different logging methods at
the executor level (assuming they are slf4j implementations) are:
1. Make it easy to exclude the default log4j bindings in the assembly AND make
it easy to load extra files (config files and impl jars) in the executor
classloader.
2. Have a way to package the executor that doesn't use a shaded assembly
(maybe just for the slf4j impl?) and allows replacing the jars in the executor
distribution package.
3. Have an easy way to change the impl via a maven profile or similar (but I
don't think this solves the configuration side of the problem).
Anyways, just thinking out loud. I haven't been able to screw around with
executor-level logging much.
> Add support for log4j 2.x to Spark
> ----------------------------------
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Tal Sliwowicz
> Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars in the
> classpath. Since there are shaded jars, it must be done during the build.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]