[ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=308360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308360
 ]

ASF GitHub Bot logged work on BEAM-8113:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Sep/19 15:52
            Start Date: 07/Sep/19 15:52
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on issue #9451: [BEAM-8113] Stage 
files from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-529119399
 
 
   > I cannot say I don't agree with both of you. I totally do. Let me recap 
the current state:
   > 
   > * we extract jar from classloader of some arbitrarily picked class
   > * we assume it is URLClassLoader
   That was the most common classloader before JDK 9.
   
   > * if user creates any user-supplied class loader and passes that as 
context class loader, we don't extract classes from that
   Supporting all use cases was never the goal, but supporting the common ones 
could be.
   
   > 
   > That has several consequences:
   > a) it fails on JDK >= 9
   > b) it stages some arbitrary subset of possible jar that exist on class 
path, even if more of them could be extracted
   > 
   > The best way to provide jars to use is actually to specify them by hand, 
but that
   > 
   > * is not standardized among runners (it probably should be part of 
`PipelineOptions`, but currently is not)
   > * even if we add some standard way to stage files to `PipelineOptions`, 
some runners (typically local runners) will tend to ignore those (because they 
assume, that all classes are loaded or able to load when the pipeline is run)
   > 
   > One might argue, that it is wrong if local runners ignore these files, but 
there is currently no way to supply any jars to local flink for instance.
   > 
   > So, to conclude what actually was my intent here:
   > 
   > * one can always take list of jars and create a context class loader like 
that
   >   ```java
   >    Thread.currentThread().setContextClassLoader(
   >        new URLClassLoader(new URL[] { /* my jars */}));
   >    Pipeline p = ...;
   >    p.run();
   >   ```
   > * if we correctly stage files, that should work for both _all_ distributed 
and _all_ local runners (provided they are well behaved, which unfortunately 
flink is not, but that is [different 
issue](https://issues.apache.org/jira/browse/FLINK-13925))
   > 
   > I really don't think that serializing class loader hierarchy along with 
the DoFns is a solution, because even if it would be possible (which seems to 
me is not), then it would still be fragile and error prone.
   > 
   I believe this is possible but not worth the effort since the majority of 
users have simple scenarios (one flat classpath). Advanced users will be able 
to modify the Beam Java container and then do whatever they want.
   
   > I'd be very glad to hear about some other possibility how to do the 
staging robust enough to
   > 
   > * work well for all runners (even runners that beam actually doesn't know 
about, because everybody can in theory create his own runner outside of beam 
repo)
   > * work well on all JDKs
   >   but I currently don't know any other way.
   
   From a change in this space, I was looking for anything that would improve 
the situation for all users, either by handling a wider range of supported 
scenarios or by expanding how many runners honor a classpath the user can 
provide. On #8775, I suggested a technique where we attempt to get the 
classpath from a URLClassLoader and, if that isn't available, fall back to 
the java.class.path system property. This would cover the most common 
scenario, where the user's application runs with a flat classpath, on JDK 8 
and below as well as on JDK 9 and above.
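   
   A minimal sketch of that fallback, for illustration only. The class name 
`ClasspathDetector` and the method `detectClasspath` are hypothetical and are 
not taken from Beam or from #8775; only standard JDK calls are used.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Beam's actual implementation. */
public class ClasspathDetector {

  /**
   * Returns the classpath entries to stage. If the loader is a URLClassLoader
   * (typical on JDK 8 and below), its URLs are used; otherwise (typical on
   * JDK 9 and above) the java.class.path system property is used instead.
   */
  public static List<String> detectClasspath(ClassLoader loader) {
    List<String> files = new ArrayList<>();
    if (loader instanceof URLClassLoader) {
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        try {
          // Only URLs that map to local files can be converted here.
          files.add(new File(url.toURI()).getAbsolutePath());
        } catch (IllegalArgumentException | URISyntaxException e) {
          throw new IllegalArgumentException("Unable to convert url to file: " + url, e);
        }
      }
      return files;
    }
    // Fallback: a flat classpath as seen by the JVM at startup.
    String classpath = System.getProperty("java.class.path", "");
    for (String entry : classpath.split(File.pathSeparator)) {
      if (!entry.isEmpty()) {
        files.add(new File(entry).getAbsolutePath());
      }
    }
    return files;
  }

  public static void main(String[] args) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    for (String entry : detectClasspath(loader)) {
      System.out.println(entry);
    }
  }
}
```

   Note that this only inspects the single loader it is given; it does not 
walk parent loaders, which matches the "one flat classpath" scenario and 
nothing more.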
   
   With portability, the Beam Java container will allow us to know what the 
execution environment will be for many users and hence make decisions that 
won't be impacted by how runners decide to package themselves up or load 
classes. Users of local runners will always have to deal with the limitations 
that the runner imposes on them.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 308360)
    Time Spent: 8h 10m  (was: 8h)

> FlinkRunner: Stage files from context classloader
> -------------------------------------------------
>
>                 Key: BEAM-8113
>                 URL: https://issues.apache.org/jira/browse/BEAM-8113
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Jan Lukavský
>            Assignee: Jan Lukavský
>            Priority: Major
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
