[
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289676#comment-15289676
]
Charles Allen commented on SPARK-6305:
--------------------------------------
Shading is often used as a kind of artificial ClassLoader, with the exception that you
can't swap classes by replacing jars. So if it is used to say "we don't want
you to replace stock classes," that's fine, but if it is used because
"ClassLoader isolation and dependency tracking is hard," that is ultimately
bad. (Spark has lots of fun classloaders to deal with; I know it is not trivial.)
For a usage example from my side, the spark druid indexer at
https://github.com/metamx/druid-spark-batch uses good-ol'-fashioned jars
(without shading or assembly) with some primitive classloader isolation through
https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/common/task/HadoopTask.java#L128
This means the following jars sit in a directory which is loaded into a
classloader for the driver:
activation-1.1.1.jar
akka-actor_2.10-2.3.11.jar
akka-remote_2.10-2.3.11.jar
akka-slf4j_2.10-2.3.11.jar
aopalliance-1.0.jar
asm-3.1.jar
avro-1.7.7.jar
avro-ipc-1.7.7.jar
avro-ipc-1.7.7-tests.jar
avro-mapred-1.7.7-hadoop2.jar
base64-2.3.8.jar
bcprov-jdk15on-1.51.jar
chill_2.10-0.5.0.jar
chill-java-0.5.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.3.2.jar
commons-math3-3.4.1.jar
commons-net-2.2.jar
compress-lzf-1.0.3.jar
config-1.2.1.jar
curator-client-2.4.0.jar
curator-framework-2.4.0.jar
curator-recipes-2.4.0.jar
gmbal-api-only-3.0.0-b023.jar
grizzly-framework-2.1.2.jar
grizzly-http-2.1.2.jar
grizzly-http-server-2.1.2.jar
grizzly-http-servlet-2.1.2.jar
grizzly-rcm-2.1.2.jar
guice-3.0.jar
hadoop-annotations-2.4.0-mmx6.jar
hadoop-auth-2.4.0-mmx6.jar
hadoop-client-2.4.0-mmx6.jar
hadoop-common-2.4.0-mmx6.jar
hadoop-hdfs-2.4.0-mmx6.jar
hadoop-mapreduce-client-app-2.4.0-mmx6.jar
hadoop-mapreduce-client-common-2.4.0-mmx6.jar
hadoop-mapreduce-client-core-2.4.0-mmx6.jar
hadoop-mapreduce-client-jobclient-2.4.0-mmx6.jar
hadoop-mapreduce-client-shuffle-2.4.0-mmx6.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.4.0-mmx6.jar
httpclient-4.3.6.jar
httpcore-4.3.3.jar
ivy-2.4.0.jar
jackson-annotations-2.4.0.jar
jackson-core-2.4.4.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.4.4.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-scala_2.10-2.4.4.jar
jackson-xc-1.9.13.jar
javax.inject-1.jar
java-xmlbuilder-1.0.jar
javax.servlet-3.0.0.v201112011016.jar
javax.servlet-3.1.jar
javax.servlet-api-3.0.1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jcl-over-slf4j-1.7.10.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jersey-grizzly2-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jersey-test-framework-core-1.9.jar
jersey-test-framework-grizzly2-1.9.jar
jets3t-0.9.3.jar
jettison-1.1.jar
jetty-util-6.1.26.jar
jline-0.9.94.jar
json4s-ast_2.10-3.2.10.jar
json4s-core_2.10-3.2.10.jar
json4s-jackson_2.10-3.2.10.jar
jsr305-1.3.9.jar
jul-to-slf4j-1.7.10.jar
kryo-2.21.jar
log4j-1.2.17.jar
lz4-1.3.0.jar
mail-1.4.7.jar
management-api-3.0.0-b012.jar
mesos-0.21.1-shaded-protobuf.jar
metrics-core-3.1.2.jar
metrics-graphite-3.1.2.jar
metrics-json-3.1.2.jar
metrics-jvm-3.1.2.jar
minlog-1.2.jar
mx4j-3.0.2.jar
netty-3.8.0.Final.jar
netty-all-4.0.29.Final.jar
objenesis-1.2.jar
oro-2.0.8.jar
paranamer-2.6.jar
protobuf-java-2.5.0.jar
py4j-0.8.2.1.jar
pyrolite-4.4.jar
reflectasm-1.07-shaded.jar
RoaringBitmap-0.4.5.jar
scala-compiler-2.10.4.jar
scala-library-2.10.4.jar
scalap-2.10.4.jar
scala-reflect-2.10.4.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.1.1.7.jar
spark-core_2.10-1.5.2-mmx4.jar
spark-launcher_2.10-1.5.2-mmx4.jar
spark-network-common_2.10-1.5.2-mmx4.jar
spark-network-shuffle_2.10-1.5.2-mmx4.jar
spark-unsafe_2.10-1.5.2-mmx4.jar
stream-2.7.0.jar
tachyon-client-0.7.1.jar
tachyon-underfs-hdfs-0.7.1.jar
tachyon-underfs-local-0.7.1.jar
uncommons-maths-1.2.2a.jar
unused-1.0.0.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.5.jar
So that's the list of jars Spark thinks it needs to get a driver to connect and
launch a task.
I haven't bothered to go through and clean out the unwanted jars because the
classloader isolation is smart (lucky?) enough that they don't interfere.
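The directory-of-jars approach can be sketched with a plain URLClassLoader. This is a simplified illustration of the general pattern only, not what HadoopTask actually does (that code does more careful parent/child wiring); the class and method names here are made up:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class IsolatedLoader {
    // Build a classloader over every jar in a directory, so swapping a jar
    // on disk swaps the class versions the driver sees on its next launch.
    static ClassLoader forJarDirectory(File dir, ClassLoader parent) throws Exception {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

Replacing mesos-0.21.1-shaded-protobuf.jar in that directory with a newer mesos jar changes what the isolated classloader resolves, without rebuilding anything.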
The point is, I can go replace specific jars to control what versions of things
are used. For example, I can update the mesos version for the driver
independently of the Spark version, or change the version of hadoop utilized by
Spark. There is an argument to be made that running the Spark test suite
against these libs is a good thing (especially the ones with not-so-strong API
guarantees), in which case being able to change classes (jars) after
compile/test time is probably bad.
If this approach doesn't fit in with how Spark is intended to work, I'd love to
hear better suggestions for Java-based, automated, Spark-driver-launching
frameworks.
Long story short (too late!), class and dependency problems are hard to solve
reliably, and shading is one mode of isolation that works better in some ways
and worse in others.
I think that's getting off track from the log4j question though.
Ultimately, if I'm building an application which incorporates Spark in the
solution, I want to be able to control the logging of the application in some
consistent manner. Operationally this usually means "have one logging config
that will work in most places," or at least one that works with minimal
modification.
So really there are two things that need to be propagated around:
1. A logging config
2. A logging impl that can understand and enforce the config
I *think* the OP for this request (and certainly the ask from my side) is that
*at least* the executor bundle be able to accommodate these two things so that
I can package them as part of my deployment.
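For concreteness, here's a sketch of those two pieces expressed as spark-submit-style properties (spark.files, spark.executor.extraClassPath, and spark.executor.extraJavaOptions are real Spark config keys, but the file paths and the logback jar/version are made up for illustration; substitute whatever slf4j impl you standardize on):

```java
import java.util.Properties;

public class ExecutorLoggingConf {
    // Hypothetical deployment sketch: ship a logging config plus an slf4j
    // impl to the executors alongside the job.
    static Properties customLogging() {
        Properties p = new Properties();
        // (1) the logging config file, distributed with the job
        p.setProperty("spark.files", "/opt/myapp/conf/logback.xml");
        // (2) the slf4j impl jar, prepended to the executor classpath
        p.setProperty("spark.executor.extraClassPath", "logback-classic-1.1.3.jar");
        // point the impl at the shipped config on each executor
        p.setProperty("spark.executor.extraJavaOptions",
                      "-Dlogback.configurationFile=logback.xml");
        return p;
    }
}
```

This only works if the executor classpath doesn't already contain a competing slf4j binding, which is exactly where the shaded assembly gets in the way.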
In Spark 1.6.1 there seems to be at least some awareness in spark.Logging of
non-log4j-1.x bindings.
I'm curious if you've ever run into the case where the user does something that
loads jul pretty early (like screwing around with remote jmx), which can cause
the jul bindings to not be quite as expected.
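To illustrate the early-jul problem in plain java.util.logging: if something touched jul before your logging setup ran, the root logger may already carry handlers that bypass slf4j. Before installing jul-to-slf4j's SLF4JBridgeHandler, the usual first step is clearing those stale handlers; the helper below is a hypothetical sketch of just that step:

```java
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class JulReset {
    // Remove any handlers already installed on the jul root logger
    // (e.g. the default ConsoleHandler, or one added by an agent),
    // returning how many were removed.
    static int clearRootHandlers() {
        Logger root = LogManager.getLogManager().getLogger("");
        Handler[] handlers = root.getHandlers();
        for (Handler h : handlers) {
            root.removeHandler(h);
        }
        return handlers.length;
    }
}
```

After this, SLF4JBridgeHandler.install() (from the jul-to-slf4j jar already in the list above) can route jul records into whatever slf4j impl is on the classpath.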
Anyways, it seems to me the options for allowing different logging methods at
the executor level (assuming they are slf4j implementations) are:
1. Make it easy to exclude the default log4j bindings in the assembly AND make
it easy to load extra files (config files and impl jars) in the executor
classloader.
2. Have a way to package the executor that doesn't use a shaded assembly
(maybe just for the slf4j impl?) and allows replacing the jars in the executor
distribution package.
3. Have an easy way to change the impl via a maven profile or similar (but I
don't think this solves the configuration side of the problem).
Anyways, just thinking out loud. I haven't been able to screw around with
executor-level logging much.
> Add support for log4j 2.x to Spark
> ----------------------------------
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Tal Sliwowicz
> Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j jars in the
> classpath. Since there are shaded jars, it must be done during the build.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]