Agree, this makes sense. On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik <lc...@google.com> wrote:
> That looks good as well. > > I would suggest that we make the classpath scanning system pluggable using > PipelineOptions. For example in GcpOptions[1], we use two default instance > factories. The first one controls which class is used as the factory[2] and > the second one instantiates an instance of that class and creates the > credential[3]. The same strategy could be added where there is a default > instance factory for the set of resources and another option which controls > which class is instantiated to provide that default. > > Do you think that we could make the default always: > new ClassGraph() > .addClassLoader(classLoader) > .getClasspathURLs(); > > 1: > https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 > 2: > https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144 > 3: > https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 > > On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov <g...@spotify.com> wrote: > >> I didn't think it through, but this is something I have in mind. Keep >> existing implementation for URLClassLoader, and use URLClassLoader for >> experimental support of Java 11. >> >> List<URL> urls; >> if (classLoader instanceof URLClassLoader) { >> urls = Arrays.asList(((URLClassLoader) classLoader).getURLs()); >> } else { >> urls = new ClassGraph() >> .addClassLoader(classLoader) >> .getClasspathURLs(); >> } >> >> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy <lukasz.gaj...@gmail.com> >> wrote: >> >>> This looks promising. Do you think you could share your code as well? >>> >>> That part sounds very calming: >>> "ClassGraph is fully compatible with the new JPMS module system (Project >>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the >>> module path. However, the code is also fully backwards compatible with JDK >>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and >>> all interaction with the module system is implemented via reflection for >>> backwards compatibility)." >>> >>> I'm currently working on rebuilding the classpath detection mechanism so >>> that it scans java.class.path when URLClassLoader cannot be used (as Luke >>> suggested) but if we decide to use classgraph it should be relatively easy >>> to do that instead. Moreover, I want to enable the possibility of injecting >>> any algorithm implementation through pipeline options - this will enable >>> third-party vendors to inject their custom implementations if needed (SPI >>> pattern that was mentioned at some point in a jira ticket). I think I'm >>> pretty close to finishing that. >>> >>> Thanks! >>> >>> śr., 27 lis 2019 o 15:24 Gleb Kanterov <g...@spotify.com> napisał(a): >>> >>>> Today I tried using classgraph [1] library to scan classpath in Java 11 >>>> instead of using URLClassLoader, and after that, the job worked on >>>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and >>>> classgraph doesn't have any dependencies. I'm wondering if we can relocate >>>> it to java-core jar and use in for non-URLClassLoaders? >>>> >>>> [1]: https://github.com/classgraph/classgraph >>>> [2]: >>>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java >>>> >>>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik <lc...@google.com> wrote: >>>> >>>>> I believe the closest suggestion[1] we had that worked for Java 11 and >>>>> maintained backwards compatibility was to use the URLClassLoader to infer >>>>> the resources and if we couldn't do that then look at the java.class.path >>>>> system property to do the inference otherwise fail and force the users to >>>>> tell us what. There are too many scenarios where we will do it wrong >>>>> because of how people package and deploy their code whether it is an >>>>> embedded application server or some other application container with a >>>>> security manager that will prevent us from doing the right thing. >>>>> >>>>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw <rober...@google.com> >>>>> wrote: >>>>> >>>>>> Note that resources are more properly tied to specific operations and >>>>>> stages, not to the entire pipeline. This is especially true in the >>>>>> face of libraries (which should have the ability to declare their own >>>>>> resources) and cross-language. >>>>>> >>>>>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy <lgaj...@apache.org> >>>>>> wrote: >>>>>> > >>>>>> > I figured that it would be good to bump this thread for greater >>>>>> visibility even though I don't have a strong opinion about this (yet - >>>>>> hopefully, I will know more later to be able to share ;) ). >>>>>> > >>>>>> > Answering the questions Luke asked will unblock this issue: >>>>>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is >>>>>> needed for Java 11 migration (current detecting mechanism does not work >>>>>> with java > 8). >>>>>> > >>>>>> > >>>>>> >> >>>>>> >> That said letting the user resolve the jars to stage can be saner >>>>>> instead of assuming it is in the classpath/loader. I already have a few >>>>>> cases where it will fail cause the transforms load the jars from outside >>>>>> the app classloader (transforms are isolated). >>>>>> > >>>>>> > >>>>>> > >>>>>> > If I understand correctly, at least in Dataflow runner, if users >>>>>> want to provide custom resources to stage, they can use filesToStage >>>>>> pipeline option. Once the option is not null, the runner doesn't detect >>>>>> the >>>>>> resources automatically and stages resources enlisted in the option >>>>>> instead. I think this should be the approach common for all runners (if >>>>>> it >>>>>> is not the case already). >>>>>> >>>>> >>>>> Your understanding is correct and consistency across runners for a >>>>> pipeline option is good for our users. >>>>> >>>>> >>>>>> > >>>>>> > Thanks, >>>>>> > Łukasz >>>>>> > >>>>>> > >>>>>> > >>>>>> >>>>> >>>>> 1: https://github.com/apache/beam/pull/8775 >>>>> >>>>