Agree, this makes sense.

On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik <lc...@google.com> wrote:

> That looks good as well.
>
> I would suggest that we make the classpath scanning system pluggable using
> PipelineOptions. For example in GcpOptions[1], we use two default instance
> factories. The first one controls which class is used as the factory[2] and
> the second one instantiates an instance of that class and creates the
> credential[3]. The same strategy could be added where there is a default
> instance factory for the set of resources and another option which controls
> which class is instantiated to provide that default.
>
> Do you think that we could make the default always:
> new ClassGraph()
>           .addClassLoader(classLoader)
>           .getClasspathURLs();
>
> 1:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
> 2:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
> 3:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
>
> On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov <g...@spotify.com> wrote:
>
>> I didn't think it through, but this is something I have in mind. Keep
>> existing implementation for URLClassLoader, and use URLClassLoader for
>> experimental support of Java 11.
>>
>>     List<URL> urls;
>>     if (classLoader instanceof URLClassLoader) {
>>       urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
>>     } else {
>>       urls = new ClassGraph()
>>           .addClassLoader(classLoader)
>>           .getClasspathURLs();
>>     }
>>
>> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy <lukasz.gaj...@gmail.com>
>> wrote:
>>
>>> This looks promising. Do you think you could share your code as well?
>>>
>>> That part sounds very calming:
>>> "ClassGraph is fully compatible with the new JPMS module system (Project
>>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>>> module path. However, the code is also fully backwards compatible with JDK
>>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>>> all interaction with the module system is implemented via reflection for
>>> backwards compatibility)."
>>>
>>> I'm currently working on rebuilding the classpath detection mechanism so
>>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>>> suggested) but if we decide to use classgraph it should be relatively easy
>>> to do that instead. Moreover, I want to enable the possibility of injecting
>>> any algorithm implementation through pipeline options - this will enable
>>> third-party vendors to inject their custom implementations if needed (SPI
>>> pattern that was mentioned at some point in a jira ticket). I think I'm
>>> pretty close to finishing that.
>>>
>>> Thanks!
>>>
>>> śr., 27 lis 2019 o 15:24 Gleb Kanterov <g...@spotify.com> napisał(a):
>>>
>>>> Today I tried using classgraph [1] library to scan classpath in Java 11
>>>> instead of using URLClassLoader, and after that, the job worked on
>>>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>>>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>>>> it to java-core jar and use in for non-URLClassLoaders?
>>>>
>>>> [1]: https://github.com/classgraph/classgraph
>>>> [2]:
>>>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>>>
>>>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> I believe the closest suggestion[1] we had that worked for Java 11 and
>>>>> maintained backwards compatibility was to use the URLClassLoader to infer
>>>>> the resources and if we couldn't do that then look at the java.class.path
>>>>> system property to do the inference otherwise fail and force the users to
>>>>> tell us what. There are too many scenarios where we will do it wrong
>>>>> because of how people package and deploy their code whether it is an
>>>>> embedded application server or some other application container with a
>>>>> security manager that will prevent us from doing the right thing.
>>>>>
>>>>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Note that resources are more properly tied to specific operations and
>>>>>> stages, not to the entire pipeline. This is especially true in the
>>>>>> face of libraries (which should have the ability to declare their own
>>>>>> resources) and cross-language.
>>>>>>
>>>>>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy <lgaj...@apache.org>
>>>>>> wrote:
>>>>>> >
>>>>>> > I figured that it would be good to bump this thread for greater
>>>>>> visibility even though I don't have a strong opinion about this (yet -
>>>>>> hopefully, I will know more later to be able to share ;) ).
>>>>>> >
>>>>>> > Answering the questions Luke asked will unblock this issue:
>>>>>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is
>>>>>> needed for Java 11 migration (current detecting mechanism does not work
>>>>>> with java > 8).
>>>>>> >
>>>>>> >
>>>>>> >>
>>>>>> >> That said letting the user resolve the jars to stage can be saner
>>>>>> instead of assuming it is in the classpath/loader. I already have a few
>>>>>> cases where it will fail cause the transforms load the jars from outside
>>>>>> the app classloader (transforms are isolated).
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > If I understand correctly, at least in Dataflow runner, if users
>>>>>> want to provide custom resources to stage, they can use filesToStage
>>>>>> pipeline option. Once the option is not null, the runner doesn't detect 
>>>>>> the
>>>>>> resources automatically and stages resources enlisted in the option
>>>>>> instead. I think this should be the approach common for all runners (if 
>>>>>> it
>>>>>> is not the case already).
>>>>>>
>>>>>
>>>>> Your understanding is correct and consistency across runners for a
>>>>> pipeline option is good for our users.
>>>>>
>>>>>
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Łukasz
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>> 1: https://github.com/apache/beam/pull/8775
>>>>>
>>>>

Reply via email to