On Mon, Jun 2, 2014 at 6:05 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> You mentioned something in your shading argument that kinda reminded
> me of something. Spark currently depends on slf4j implementations and
> log4j with "compile" scope. I'd argue that's the wrong approach if
> we're talking about Spark being used embedded inside applications;
> Spark should only depend on the slf4j API package, and let the
> application provide the underlying implementation.
Good idea in general; in practice, the drawback is that you can't do things like set log levels if you depend only on the SLF4J API. There are a few cases where it's nice to control that, and it's only possible if you also bind to a particular logger. You typically bundle an SLF4J binding anyway to provide a default, or else the end user has to know to bind some SLF4J logger themselves to get any output. Of course, it does make for a bit more surgery if you want to override the binding this way.

Shading can bring a whole new level of confusion; I myself would only use it where essential, as a workaround. Same with trying to build more elaborate custom classloading schemes -- never in my darkest nightmares have I imagined the failure modes that probably pop up when that goes wrong. I think the library collisions will get better over time as, for example, only later versions of Hadoop are in scope and/or one build system is in play. I like tackling complexity along those lines first.
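For concreteness, here's a minimal sketch of the log-level point. The class name is made up, and I'm assuming the log4j 1.x binding for the example; the detail that matters is that org.slf4j.Logger exposes no level-setting methods, so adjusting a level at runtime means reaching through to the concrete logger behind the facade:

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class LoggingExample {  // hypothetical class, purely for illustration
    public static void main(String[] args) {
      // Logging through the SLF4J facade needs only the slf4j-api dependency...
      Logger log = LoggerFactory.getLogger(LoggingExample.class);
      log.info("hello");

      // ...but there is no log.setLevel(...) on org.slf4j.Logger. Changing a
      // level programmatically means calling into a concrete binding, e.g. log4j 1.x:
      org.apache.log4j.Logger.getLogger(LoggingExample.class)
          .setLevel(org.apache.log4j.Level.WARN);
    }
  }

So if Spark only depended on slf4j-api, the second half of that wouldn't even compile without the application supplying log4j itself.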