(answered inline)
Le mer. 13 juin 2018 à 18:42, Lukasz Cwik <lc...@google.com> a écrit : > Thanks for the example Romain. > > I took a look through it and was wondering whether it is only the root > objects in the deserialization tree that need to implement > SerializableService? > Do lots of things need to implement SerializableService typically? > yes, on the one entering the (de)serialization process must handle that > What do you do with types that you don't control (for example do you > create wrapper types)? > Like beam classes? ;) You can instrument their bytecode like in https://github.com/Talend/component-runtime/blob/master/component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transformer/BeamIOTransformer.java#L208 (sorry i dont use bytebuddy but directly asm). This is quite easy to do as soon as you have a classloader for these classes or - if you reuse the jvm classloader - a javaagent or equivalent. If you don't want/can't change the bytecode then you manipulate a proxy instead of the real instance (a wrapper): https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/asm/ProxyGenerator.java#L118 . > > On Wed, Jun 6, 2018 at 9:56 PM Romain Manni-Bucau <rmannibu...@gmail.com> > wrote: > >> Note sure the example is atomic enough but in >> https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40 >> the "instance()" is a singleton used by all the runtime of the framework. >> >> Deserialization happens in >> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/serialization/SerializableService.java#L26 >> and serialization is about creating this object in a write replace. Then >> the runtime is switching its classloader (runner for beam) as in >> https://github.com/Talend/component-runtime/blob/master/component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/base/LifecycleImpl.java#L60 >> asap and resets it once done to not break its environment for reused jvms >> case. >> >> If we take the case of an IO, the io would lazily creates its defined >> classloader from its spec and use some reference counting logic to destroy >> it when needed in its teardown for instance. The io then does the >> classloader switch in its callbacks (setup/teardown/process/bundle hooks >> etc). >> >> >> Le mer. 6 juin 2018 23:33, Lukasz Cwik <lc...@google.com> a écrit : >> >>> Romain, can you point to an example of a global singleton registry that >>> does this right for class loading (it may allow people to work towards such >>> an effort)? >>> >>> On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau < >>> rmannibu...@gmail.com> wrote: >>> >>>> It is actually very localised in runner code where beam should reset >>>> the classloader when the deserialization happens and then the runner owns >>>> the classloader all the way in evaluators. >>>> >>>> If IO change the classloader they must indeed handle it too and patch >>>> the deserialization too. >>>> >>>> Here again (we mentionned it multiple times in other threads) beam >>>> misses a global singleton registry where you can register a "service" to >>>> look it up based of a serialization configuration and a lifecycle allowing >>>> to close the classloader in all instances without hacks. >>>> >>>> >>>> Le mar. 5 juin 2018 23:50, Kenneth Knowles <k...@google.com> a écrit : >>>> >>>>> Perhaps we can also adopt a practice of making our own APIs explicitly >>>>> pass a Classloader when appropriate so we only have to set this when we >>>>> are >>>>> entering code that does not have good hygiene. It might actually be nice >>>>> to >>>>> have a lightweight static analysis to forbid bad methods in our code. >>>>> >>>>> Kenn >>>>> >>>>> On Mon, Jun 4, 2018 at 3:43 PM Lukasz Cwik <lc...@google.com> wrote: >>>>> >>>>>> I totally agree, but there are so many Java APIs (including ours) >>>>>> that messed this up so everyone lives with the same hack. >>>>>> >>>>>> On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud <apill...@google.com> >>>>>> wrote: >>>>>> >>>>>>> It seems like a terribly fragile way to pass arguments but my tests >>>>>>> pass when I wrap the JDBC path into Beam pipeline execution with that >>>>>>> pattern. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> Andrew >>>>>>> >>>>>>> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik <lc...@google.com> wrote: >>>>>>> >>>>>>>> It is a common mistake for APIs to not include a way to specify >>>>>>>> which class loader to use when doing something like deserializing an >>>>>>>> instance of a class via the ObjectInputStream. This common issue also >>>>>>>> affects Apache Beam (SerializableCoder, PipelineOptionsFactory, ...) >>>>>>>> and >>>>>>>> the way that typical Java APIs have gotten around this is to use to the >>>>>>>> thread context class loader (TCCL) as the way to plumb this additional >>>>>>>> attribute through. So Apache Beam is meant to in all places honor the >>>>>>>> TCCL >>>>>>>> if it has been set as most Java libraries (not all) do the same hack. >>>>>>>> >>>>>>>> In most environments the TCCL is not set and we are working with a >>>>>>>> single class loader. It turns out that in more complicated environments >>>>>>>> (like when loading a JDBC driver, or JNDI, or an application server, >>>>>>>> ...) >>>>>>>> this usually doesn't work without each caller knowing what class >>>>>>>> loading >>>>>>>> context they should be in. A common work around for most scenarios is >>>>>>>> to >>>>>>>> always set the TCCL to the current classes class loader like so before >>>>>>>> invoking any APIs that do class loading so you don't propagate the >>>>>>>> TCCL of >>>>>>>> the caller along since they may have set it for some other reason: >>>>>>>> >>>>>>>> ClassLoader originalClassLoader = >>>>>>>> Thread.currentThread().getContextClassLoader();try { >>>>>>>> >>>>>>>> Thread.currentThread().setContextClassLoader(getClass().getClassLoader()); >>>>>>>> // call some API that uses reflection without taking ClassLoader >>>>>>>> param} finally { >>>>>>>> Thread.currentThread().setContextClassLoader(originalClassLoader);} >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud <apill...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I'm having class loading issues that go away when I revert the >>>>>>>>> changes in our use of Class.forName added in >>>>>>>>> https://github.com/apache/beam/pull/4674. The problem I'm having >>>>>>>>> is that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an >>>>>>>>> isolated class loader to load our library. Things work if we call >>>>>>>>> Class.forName with the default class loader >>>>>>>>> [getClass().getClassLoader() or >>>>>>>>> no argument] but not if we use the thread context class loader >>>>>>>>> [Thread.currentThread().getContextClassLoader() or >>>>>>>>> ReflectHelpers.findClassLoader()]. Why is using the default class >>>>>>>>> loader >>>>>>>>> not the right thing to do? How can I fix this problem? >>>>>>>>> >>>>>>>>> See this integration test for an example: >>>>>>>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44 >>>>>>>>> >>>>>>>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k >>>>>>>>> >>>>>>>>> Andrew >>>>>>>>> >>>>>>>>