That looks good as well.

I would suggest that we make the classpath scanning system pluggable using
PipelineOptions. For example in GcpOptions[1], we use two default instance
factories. The first one controls which class is used as the factory[2] and
the second one instantiates an instance of that class and creates the
credential[3]. The same strategy could be added where there is a default
instance factory for the set of resources and another option which controls
which class is instantiated to provide that default.

Do you think that we could make the default always:
new ClassGraph()
          .addClassLoader(classLoader)
          .getClasspathURLs();

1:
https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
2:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
3:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159

On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov <g...@spotify.com> wrote:

> I didn't think it through, but this is something I have in mind. Keep
> existing implementation for URLClassLoader, and use URLClassLoader for
> experimental support of Java 11.
>
>     List<URL> urls;
>     if (classLoader instanceof URLClassLoader) {
>       urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
>     } else {
>       urls = new ClassGraph()
>           .addClassLoader(classLoader)
>           .getClasspathURLs();
>     }
>
> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy <lukasz.gaj...@gmail.com>
> wrote:
>
>> This looks promising. Do you think you could share your code as well?
>>
>> That part sounds very calming:
>> "ClassGraph is fully compatible with the new JPMS module system (Project
>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>> module path. However, the code is also fully backwards compatible with JDK
>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>> all interaction with the module system is implemented via reflection for
>> backwards compatibility)."
>>
>> I'm currently working on rebuilding the classpath detection mechanism so
>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>> suggested) but if we decide to use classgraph it should be relatively easy
>> to do that instead. Moreover, I want to enable the possibility of injecting
>> any algorithm implementation through pipeline options - this will enable
>> third-party vendors to inject their custom implementations if needed (SPI
>> pattern that was mentioned at some point in a jira ticket). I think I'm
>> pretty close to finishing that.
>>
>> Thanks!
>>
>> śr., 27 lis 2019 o 15:24 Gleb Kanterov <g...@spotify.com> napisał(a):
>>
>>> Today I tried using classgraph [1] library to scan classpath in Java 11
>>> instead of using URLClassLoader, and after that, the job worked on
>>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>>> it to java-core jar and use in for non-URLClassLoaders?
>>>
>>> [1]: https://github.com/classgraph/classgraph
>>> [2]:
>>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>>
>>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> I believe the closest suggestion[1] we had that worked for Java 11 and
>>>> maintained backwards compatibility was to use the URLClassLoader to infer
>>>> the resources and if we couldn't do that then look at the java.class.path
>>>> system property to do the inference otherwise fail and force the users to
>>>> tell us what. There are too many scenarios where we will do it wrong
>>>> because of how people package and deploy their code whether it is an
>>>> embedded application server or some other application container with a
>>>> security manager that will prevent us from doing the right thing.
>>>>
>>>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> Note that resources are more properly tied to specific operations and
>>>>> stages, not to the entire pipeline. This is especially true in the
>>>>> face of libraries (which should have the ability to declare their own
>>>>> resources) and cross-language.
>>>>>
>>>>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy <lgaj...@apache.org>
>>>>> wrote:
>>>>> >
>>>>> > I figured that it would be good to bump this thread for greater
>>>>> visibility even though I don't have a strong opinion about this (yet -
>>>>> hopefully, I will know more later to be able to share ;) ).
>>>>> >
>>>>> > Answering the questions Luke asked will unblock this issue:
>>>>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
>>>>> for Java 11 migration (current detecting mechanism does not work with java
>>>>> > 8).
>>>>> >
>>>>> >
>>>>> >>
>>>>> >> That said letting the user resolve the jars to stage can be saner
>>>>> instead of assuming it is in the classpath/loader. I already have a few
>>>>> cases where it will fail cause the transforms load the jars from outside
>>>>> the app classloader (transforms are isolated).
>>>>> >
>>>>> >
>>>>> >
>>>>> > If I understand correctly, at least in Dataflow runner, if users
>>>>> want to provide custom resources to stage, they can use filesToStage
>>>>> pipeline option. Once the option is not null, the runner doesn't detect 
>>>>> the
>>>>> resources automatically and stages resources enlisted in the option
>>>>> instead. I think this should be the approach common for all runners (if it
>>>>> is not the case already).
>>>>>
>>>>
>>>> Your understanding is correct and consistency across runners for a
>>>> pipeline option is good for our users.
>>>>
>>>>
>>>>> >
>>>>> > Thanks,
>>>>> > Łukasz
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>> 1: https://github.com/apache/beam/pull/8775
>>>>
>>>

Reply via email to