Initially this was done to support a multithreaded model of running tasks vs
the original multiprocess model of Hadoop.
As far as I remember, there were attempts to reuse Hadoop classes, but
they failed.

As for the high permgen load, it should not actually be that high: the number
of classloaders should be only slightly higher than the number of concurrently
running tasks, because task classloaders are pooled and reused.
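
Just to illustrate the idea, here is a rough sketch of such pooling (class and
field names are made up, this is not the actual HadoopClassLoader code):
loaders are handed out to tasks and returned afterwards, so the total count
stays close to the number of tasks running at the same time.

import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of task classloader pooling, not Ignite's real code.
public class TaskClassLoaderPool {
    private final ConcurrentLinkedQueue<URLClassLoader> pool = new ConcurrentLinkedQueue<>();
    private final URL[] taskClasspath;

    public TaskClassLoaderPool(URL[] taskClasspath) {
        this.taskClasspath = taskClasspath;
    }

    // Borrow a pooled loader, or create a new isolated one if the pool is empty.
    public URLClassLoader acquire() {
        URLClassLoader ldr = pool.poll();
        return ldr != null ? ldr : new URLClassLoader(taskClasspath, null);
    }

    // Return the loader after the task finishes so the next task can reuse it.
    public void release(URLClassLoader ldr) {
        pool.offer(ldr);
    }
}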

As for native code, I'm not sure what can be done here. I think
environments like OSGi (Eclipse, etc.) have
the same issue; maybe we can look at what they do in the case of native
dependencies in bundles?
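
To make the failure concrete, here is a rough sketch (the classpath URL and
the class name are placeholders; assume the class calls System.loadLibrary in
a static initializer): the JVM binds a native library to the first classloader
that loads it, so a second load from another isolated loader fails with an
UnsatisfiedLinkError saying the library is already loaded in another
classloader.

import java.net.URL;
import java.net.URLClassLoader;

public class NativeLibClash {
    public static void main(String[] args) throws Exception {
        // Placeholder classpath pointing at the jars with the native-backed class.
        URL[] cp = { new URL("file:/path/to/hadoop/jars/") };

        // Two isolated loaders with a null parent, similar to how
        // HadoopClassLoader loads Hadoop classes from scratch.
        ClassLoader l1 = new URLClassLoader(cp, null);
        ClassLoader l2 = new URLClassLoader(cp, null);

        // Placeholder name for a class whose static initializer calls System.loadLibrary(...).
        String cls = "org.example.NativeBackedClass";

        // First initialization succeeds and binds the .so/.dll to l1.
        Class.forName(cls, true, l1);

        // Second initialization runs the same static initializer under l2 and throws
        // java.lang.UnsatisfiedLinkError: library already loaded in another classloader.
        Class.forName(cls, true, l2);
    }
}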

Sergi





2015-12-24 9:09 GMT+03:00 Vladimir Ozerov <[email protected]>:

> Folks,
>
> In current Hadoop Accelerator design we always process user jobs in a
> separate classloader called HadoopClassLoader. It is somewhat special
> because it always loads Hadoop classes from scratch.
>
> This leads to at least two serious problems:
> 1) Very high permgen/metaspace load. Workaround - more permgen.
> 2) Native Hadoop libraries cannot be used. There are quite a few native
> methods in Hadoop. The corresponding dll/so files are loaded in static class
> initializers. As each HadoopClassLoader loads classes over and over again,
> the libraries are loaded several times as well. But Java does not allow loading
> the same native library from different classloaders. Result -> JNI
> linkage errors. For instance, this affects the Snappy compress/decompress
> library, which is pretty important in the Hadoop ecosystem.
>
> Clearly, this isolation with a custom classloader was done on purpose. And I
> understand why it is important, for example, for user-defined classes.
>
> But why do we load Hadoop classes (e.g. org.apache.hadoop.fs.FileSystem)
> multiple times? Does anyone have a clue?
>
> Vladimir.
>
