[
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307730
]
ASF GitHub Bot logged work on BEAM-8113:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Sep/19 09:55
Start Date: 06/Sep/19 09:55
Worklog Time Spent: 10m
Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files
from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-528792454
I can't say I disagree with either of you; I fully agree. Let me recap the
current state:
- we extract jars from the class loader of some arbitrarily picked class
- we assume it is a `URLClassLoader`
- if the user creates a custom class loader and sets it as the context
class loader, we don't extract classes from it
That has several consequences:
a) it fails on JDK >= 9 (the application class loader is no longer a
`URLClassLoader` there)
b) it stages an arbitrary subset of the jars that exist on the class
path, even if more of them could be extracted
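To make consequence a) concrete, here is a minimal sketch of a guarded lookup. The class and method names (`ClassLoaderProbe`, `urlsOrEmpty`) are hypothetical and are not part of Beam; the sketch only shows that an unconditional cast to `URLClassLoader` cannot be relied on, since on JDK 9 and later the application class loader is typically not one.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not Beam code.
public class ClassLoaderProbe {

  /** Returns the loader's URLs, or an empty list when it is not a URLClassLoader. */
  public static List<URL> urlsOrEmpty(ClassLoader loader) {
    if (loader instanceof URLClassLoader) {
      return Arrays.asList(((URLClassLoader) loader).getURLs());
    }
    // A blind cast here would throw ClassCastException instead.
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    ClassLoader app = ClassLoaderProbe.class.getClassLoader();
    System.out.println("URLClassLoader? " + (app instanceof URLClassLoader));
    System.out.println("URLs: " + urlsOrEmpty(app));
  }
}
```

Running `main` on different JDKs shows whether the class loader of an arbitrarily picked class exposes any URLs at all.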
The best way to provide the jars is actually to specify them by hand, but
that
- is not standardized among runners (it probably should be part of
`PipelineOptions`, but currently is not)
- even if we add a standard way to stage files to `PipelineOptions`,
some runners (typically local runners) will tend to ignore them (because they
assume that all classes are already loaded or loadable when the pipeline is run)
One might argue that it is wrong if local runners ignore these files, but
there is currently no way to supply any jars to local Flink, for instance.
So, to conclude, my actual intent here was:
- one can always take a list of jars and create a context class loader like
this:
```java
Thread.currentThread().setContextClassLoader(
new URLClassLoader(new URL[] { /* my jars */}));
Pipeline p = ...;
p.run();
```
- if we stage files correctly, that should work for *all* distributed
and *all* local runners (provided they are well behaved, which unfortunately
Flink is not, but that is a [different
issue](https://issues.apache.org/jira/browse/FLINK-13925))
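As a rough sketch of what staging from the context class loader could look like, the following walks the loader hierarchy and collects the URLs of every `URLClassLoader` found. The class and method names (`ContextClassLoaderFiles`, `collect`) are assumptions made for illustration; this is not the code in the pull request, and loaders that are not `URLClassLoader` instances are simply skipped.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not the PR's implementation.
public class ContextClassLoaderFiles {

  /** Collects URLs from every URLClassLoader in the chain, child first, without duplicates. */
  public static List<String> collect(ClassLoader loader) {
    Set<String> files = new LinkedHashSet<>();
    for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
      if (cl instanceof URLClassLoader) {
        for (URL url : ((URLClassLoader) cl).getURLs()) {
          files.add(url.toString());
        }
      }
      // Other loader types (e.g. the JDK 9+ application loader) expose no URLs.
    }
    return new ArrayList<>(files);
  }

  public static void main(String[] args) {
    System.out.println(collect(Thread.currentThread().getContextClassLoader()));
  }
}
```

With the context class loader set as in the snippet above, `collect` would return the user-supplied jars even when the application class loader itself contributes nothing.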
I really don't think that serializing the class loader hierarchy along with the
DoFns is a solution, because even if it were possible (which I don't think it
is), it would still be fragile and error-prone.
I'd be very glad to hear about another way to make the staging
robust enough to
* work well for all runners (even runners that Beam doesn't know
about, because anybody can in theory create their own runner outside of the Beam
repo)
* work well on all JDKs
but I currently don't know of any other way.
Issue Time Tracking
-------------------
Worklog Id: (was: 307730)
Time Spent: 7h 40m (was: 7.5h)
> FlinkRunner: Stage files from context classloader
> -------------------------------------------------
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged
> by default. Add also files from
> {{Thread.currentThread().getContextClassLoader()}}.