Chetan,

Thanks for writing this up.
Apex depends on the Hadoop client (with its version of dependency X). If Apex changes X to another version and that version isn't backward compatible, then the Hadoop client is potentially broken. Trying to solve this with class loaders can be tricky. One approach is to completely isolate class loaders. In some systems that works (EJB or servlet containers). But here we depend on Hadoop, so we cannot isolate it. So what if we really, really need a newer version of X? That's where shading helps. We don't touch whatever Hadoop was certified to work with. Instead, we repackage X and change the import statements in our code that depends on it, so there is no conflict with the rest of the stack. That's what we had to do with ASM, and I guess this is why Flink shaded Guava.

Thomas

On Wed, Nov 25, 2015 at 2:35 PM, Chetan Narsude (cnarsude) <[email protected]> wrote:

> Continuing this thread to elaborate on what I meant:
>
> When an application developer writes an Apex application and uses a
> binary-incompatible version of a jar (using an extreme case to simplify
> the elaboration) relative to the one Hadoop depends on, then depending on
> how the class path is defined, the JVM will load either the version the
> application wants or the one Hadoop wants, and one of the two entities is
> destined to malfunction. I faced this situation with an early Apex
> customer, incidentally with the same jar under discussion, Guava, and had
> no way out. Incidentally, Cloudera's revision to their distro included
> the version of Guava we wanted and the conflict went away.
>
> Another flavor (which we have now fixed in both the CLI and the gateway)
> is when the CLI is used to launch application1, which has a class
> a.b.c.classname, and later, without terminating the CLI, is used to
> launch application2, which requires a different class that happens to use
> the same package and class name: application2 is launched with the class
> that was meant for application1.
> Although this is more likely to happen with dependency version changes,
> the first time I encountered it I was making changes to the code,
> recompiling, and creating a new jar.
>
> Servlet containers apparently solve this problem because they have to
> load multiple applications, each with its own (possibly conflicting)
> dependencies, in the same JVM space. They do some black magic (it seems
> trivial, though) with ClassLoaders. With Apex we do not do any of this
> magic (due to the tight coupling among Hadoop/Apex/Application), so the
> entire stack is vulnerable to class leaks.
>
> It can be solved in theory, and has been proven by servlet containers,
> but I hear one more person saying (Ted, correct me if I misinterpret)
> that it's a steep effort.
>
> --
> Chetan
>
>
> On 11/25/15, 12:31 PM, "Ted Dunning" <[email protected]> wrote:
>
> >On Thu, Nov 26, 2015 at 4:15 AM, Timothy Farkas <[email protected]>
> >wrote:
> >
> >> I thought frameworks such as OSGi are used to isolate classes.
> >> Application servers like GlassFish use that technique to isolate
> >> application dependencies from platform dependencies. Cask has also
> >> implemented a similar technique for their platform.
> >
> >I have tried to use OSGi in the past. It was mind-bogglingly difficult
> >to get right. My sense is that OSGi has decreased in popularity quite a
> >lot in recent years. GlassFish is hardly gaining adherents at this
> >point.
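[Editor's note] The "black magic with ClassLoaders" that servlet containers do is usually child-first (parent-last) delegation: look in the application's own jars before asking the parent loader. Below is a minimal illustrative sketch of that idea, not Apex or container code; the class name and behavior are assumptions for illustration only.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Illustrative sketch of child-first ("parent-last") delegation, the
 * technique servlet containers use to isolate per-application
 * dependencies. Hypothetical example code, not part of Apex.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // JDK classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Reverse the normal parent-first order: search the
                        // application's own jars before asking the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the child's jars; fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, every lookup falls through to the parent.
        ClassLoader cl = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader());
        System.out.println(cl.loadClass("java.lang.String") == String.class);
    }
}
```

With application jars supplied as URLs, two applications launched from the same JVM would each resolve a.b.c.classname from their own loader rather than leaking each other's copy, which is the failure mode described above.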

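[Editor's note] The shading Thomas describes (repackage X and rewrite the references to it) is typically done with the Maven Shade Plugin's relocation feature. A minimal sketch follows; the relocated package name and plugin version are illustrative assumptions, not Apex's actual build configuration.

```xml
<!-- Hypothetical pom.xml fragment: bundle a private, relocated copy of
     Guava so it cannot conflict with the version Hadoop ships. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- References to com.google.common.* in our own bytecode are
                 rewritten to the relocated package at build time. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Hadoop continues to see its own unmodified Guava on the classpath, while the shaded jar carries its dependency under shaded.com.google.common, so the two versions coexist without a classpath conflict.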