Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2629#discussion_r86524130
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -540,15 +540,12 @@ private boolean performCheckpoint(CheckpointMetaData checkpointMetaData) throws
         synchronized (lock) {
             if (isRunning) {
+                checkpointState(checkpointMetaData);
-                // Since both state checkpointing and downstream barrier emission occurs in this
-                // lock scope, they are an atomic operation regardless of the order in which they occur.
-                // Given this, we immediately emit the checkpoint barriers, so the downstream operators
-                // can start their checkpoint work as soon as possible
+                // broadcast barriers after snapshot operators' states.
                 operatorChain.broadcastCheckpointBarrier(
-                        checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp());
-
-                checkpointState(checkpointMetaData);
+                        checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp()
+                );
--- End diff --
I think the `ReentrantReadWriteLock` could work. However, I'm not sure that its
higher cost compared to the mutual exclusion lock we're currently using is
worth the change. I fear that we would be optimising for the case where you
have a long chain of `AsyncWaitOperators`. Instead, we could simply disallow
chaining for these operators. Then every chain would have at most two writer
threads (main and `Emitter`) competing for the lock. Thus, I would vote for
keeping the existing mutual exclusion lock.
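
To make the two-writer case concrete, here is a minimal sketch. It is not Flink code: the class `TwoWriterLockSketch`, the method names, and the thread names are hypothetical stand-ins, and the shared list merely plays the role of the chain's output. It only illustrates that with two writers (a main thread and an emitter thread), a plain `synchronized` monitor is enough to keep every write atomic.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoWriterLockSketch {
    // Hypothetical stand-in for the task's checkpoint lock (a plain monitor).
    private final Object lock = new Object();
    // Hypothetical stand-in for the output both threads write to.
    private final List<String> output = new ArrayList<>();

    // Every write happens under the same mutual exclusion lock.
    void emit(String record) {
        synchronized (lock) {
            output.add(record);
        }
    }

    // Starts two writer threads and returns how many records were written.
    static int run(int recordsPerThread) {
        TwoWriterLockSketch task = new TwoWriterLockSketch();
        Runnable writer = () -> {
            String name = Thread.currentThread().getName();
            for (int i = 0; i < recordsPerThread; i++) {
                task.emit(name + "-" + i);
            }
        };
        Thread mainWriter = new Thread(writer, "main-writer");
        Thread emitter = new Thread(writer, "emitter");
        mainWriter.start();
        emitter.start();
        try {
            mainWriter.join();
            emitter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        synchronized (task.lock) {
            return task.output.size();
        }
    }

    public static void main(String[] args) {
        // No write is lost, so the count is always 2 * recordsPerThread.
        System.out.println(run(10_000));
    }
}
```

The sketch says nothing about relative lock cost. It only shows that correctness does not require a read/write lock when at most two writers contend.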