[
https://issues.apache.org/jira/browse/FLINK-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334208#comment-16334208
]
ASF GitHub Bot commented on FLINK-8406:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/5330
[FLINK-8406] [bucketing sink] Fix proper access of Hadoop File Systems
## What is the purpose of the change
Fixes the access to Hadoop file systems when initializing the BucketingSink.
## Verifying this change
The PR adds a unit test for the problem.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no)**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no)**
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no)**
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink bucket_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5330.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5330
----
commit 1a0736281710fc3710df3410c83b769d51d8cb07
Author: Stephan Ewen <sewen@...>
Date: 2018-01-22T12:32:09Z
[FLINK-8406] [bucketing sink] Fix proper access of Hadoop File Systems
----
> BucketingSink does not detect hadoop file systems
> -------------------------------------------------
>
> Key: FLINK-8406
> URL: https://issues.apache.org/jira/browse/FLINK-8406
> Project: Flink
> Issue Type: Bug
> Components: FileSystem
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Chesnay Schepler
> Assignee: Stephan Ewen
> Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> In {{BucketingSink#createHadoopFileSystem}} one can find this piece of code:
> {code}
> final org.apache.flink.core.fs.FileSystem flinkFs =
> org.apache.flink.core.fs.FileSystem.get(path.toUri());
> final FileSystem hadoopFs = (flinkFs instanceof HadoopFileSystem)
> ? ((HadoopFileSystem) flinkFs).getHadoopFileSystem()
> : null;
> {code}
> {{FileSystem#get()}} wraps the created {{FileSystem}} in a
> {{SafetyNetWrapperFileSystem}}, resulting in the instanceof check to
> categorically fail.
> We may want to replace the {{get()}} call with {{getUnguardedFileSystem()}}.
> We should also look for other occurrences of similar instanceof checks.
> According to a thread on the mailing list this causes the BucketingSink to be
> unusable.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)