GitHub user kl0u opened a pull request:
https://github.com/apache/flink/pull/2797
[FLINK-5056] Makes the BucketingSink rescalable.
This PR makes the BucketingSink rescalable, fixes a bug that could lead to
deleting
valid data and improves the javadocs of the class.
In the process of making the sink rescalable, we also stop deleting
lingering files upon restoring.
This is to avoid possible race-conditions that can lead to one task
deleting files that another
task uses.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kl0u/flink bucket-ref-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2797.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2797
----
commit d233d807d91e4438a07d3e38192a65e9f2c302bc
Author: kl0u <[email protected]>
Date: 2016-11-06T19:44:53Z
[FLINK-5054] Make the BucketingSink rescalable.
Refactors the BucketingSink to be able to change
parallelism after restoring from a savepoint. To
do so, this commit changes the following:
1) the sink does not clean up lingering files upon
restoring
2) the previous snapshot/restore cycle is replaced
by the new initializeState/snapshotState one.
commit fbf5c8699ee8c2e2c3b108ba6ec5051ff8d06f2a
Author: kl0u <[email protected]>
Date: 2016-11-06T19:44:53Z
[FLINK-5056] BucketingSink:Clear state only after committing all pending
data.
Before clearing up the state of the Sink upon receiving a notification
about a successful checkpoint, we also check if all pending buckets for
previous checkponts have already been committed.
commit d2d638eee240848c632ee54769ca844a131f216b
Author: kl0u <[email protected]>
Date: 2016-11-13T22:21:46Z
[FLINK-5056] Improve documentation of the BucketingSink.
This commit also removes an unused method that would replace
the close() when a dispose() is introduced in the RichFunction.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---