[
https://issues.apache.org/jira/browse/FLINK-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330832#comment-16330832
]
ASF GitHub Bot commented on FLINK-8455:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/5313
[FLINK-8455] [core] Make 'org.apache.hadoop.' a 'parent-first' classloading pattern
## What is the purpose of the change
This change avoids duplication of Hadoop classes between the Flink runtime
and the user code.
Hadoop (and transitively its dependencies) should be part of the
application class loader.
The user code classloader is allowed to duplicate transitive dependencies,
but not Hadoop's classes directly.
This change addresses an issue that various users have reported (mainly
using the BucketingSink) where they get ClassCastExceptions related to Hadoop
classes.
In all cases, users had Hadoop dependencies bundled into their application
jar files. To improve the experience, I suggest always loading Hadoop's
classes parent-first.
## Brief change log
- Add `org.apache.hadoop.` to the parent-first patterns.
- Add some tests for the parent-first patterns.
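
To illustrate what a parent-first pattern means in a child-first class loader, here is a minimal sketch. It is not the code from this pull request: the class name `ChildFirstSketch` is hypothetical, and the pattern list is illustrative, with only `org.apache.hadoop.` taken from the change log above. Class names matching a pattern prefix are delegated to the parent loader; everything else is looked up in the user jar first.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a child-first loader with parent-first patterns. */
class ChildFirstSketch extends URLClassLoader {

    // Illustrative list (assumption); only "org.apache.hadoop." comes from this change.
    static final List<String> PARENT_FIRST_PATTERNS =
            Arrays.asList("java.", "org.apache.flink.", "org.apache.hadoop.");

    ChildFirstSketch(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /** Returns true if the class name starts with one of the parent-first prefixes. */
    static boolean isParentFirst(String className) {
        for (String pattern : PARENT_FIRST_PATTERNS) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    // Standard delegation: ask the parent before looking at our own URLs.
                    return super.loadClass(name, resolve);
                }
                try {
                    // Child-first: look in the user jar(s) before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user jar(s), so fall back to the parent.
                    c = super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isParentFirst("org.apache.hadoop.fs.Path")); // true
        System.out.println(isParentFirst("com.example.MyJob"));         // false
        try (ChildFirstSketch loader =
                new ChildFirstSketch(new URL[0], ChildFirstSketch.class.getClassLoader())) {
            // "java." is parent-first, so the parent's String class is returned.
            System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        }
    }
}
```

Because the patterns end with a dot, a package such as `org.apache.hadoopfoo` would not match. With Hadoop resolved parent-first, a copy of Hadoop bundled in the user jar is ignored in favor of the one already visible to the runtime, so both sides see the same `Class` objects.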
## Verifying this change
This change added self-contained tests.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink hadoop_parent_first
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5313.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5313
----
commit b6bf9c9e32b0aef079662ff7040969614afa5bdc
Author: Stephan Ewen <sewen@...>
Date: 2018-01-18T16:57:10Z
[FLINK-8455] [core] Make 'org.apache.hadoop.' a 'parent-first' classloading pattern.
This change avoids duplication of Hadoop classes between the Flink runtime
and the user code.
Hadoop (and transitively its dependencies) should be part of the
application class loader.
The user code classloader is allowed to duplicate transitive dependencies,
but not Hadoop's classes directly.
This also adds tests to validate parent-first classloading patterns.
----
> Add Hadoop to the parent-first loading patterns
> -----------------------------------------------
>
> Key: FLINK-8455
> URL: https://issues.apache.org/jira/browse/FLINK-8455
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.4.0
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>
> Various users have reported issues (mainly in the BucketingSink) where they
> get ClassCastExceptions related to Hadoop classes.
> In all cases, users had Hadoop dependencies bundled into their application
> jar files. To improve the experience, I suggest always loading Hadoop's
> classes parent-first.
>