Github user pnowojski commented on a diff in the pull request:
https://github.com/apache/flink/pull/4840#discussion_r145611565
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/legacy/metrics/MetricStore.java
---
@@ -38,29 +43,136 @@
import static
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_OPERATOR;
import static
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TASK;
import static
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TM;
+import static org.apache.flink.util.Preconditions.checkNotNull;
/**
* Nested data-structure to store metrics.
- *
- * <p>This structure is not thread-safe.
*/
+@ThreadSafe
public class MetricStore {
private static final Logger LOG =
LoggerFactory.getLogger(MetricStore.class);
- final JobManagerMetricStore jobManager = new JobManagerMetricStore();
- final Map<String, TaskManagerMetricStore> taskManagers = new
HashMap<>();
- final Map<String, JobMetricStore> jobs = new HashMap<>();
+ private final ComponentMetricStore jobManager = new
ComponentMetricStore();
+ private final Map<String, TaskManagerMetricStore> taskManagers = new
ConcurrentHashMap<>();
+ private final Map<String, JobMetricStore> jobs = new
ConcurrentHashMap<>();
+
+ /**
+ * Remove not active task managers.
+ *
+ * @param activeTaskManagers to retain.
+ */
+ public synchronized void retainTaskManagers(List<String>
activeTaskManagers) {
--- End diff --
Strictly speaking right now it isn't. However without synchronising those
methods, implementing them is much harder. For example I made one mistake where
I implemented:
```
if (!jobs.containKey(jobID)) {
return null;
}
return jobs.get(jobID).getTaskMetricStore(taskID)
```
Also without synchronising it's harder to understand correctness of code,
that's handling three separate fields (whether this is a correct access order;
whether the state of variable is/should be coherent, like if existence of key
in one map, should imply existence a key on another, etc).
Long story short, synchronising is not an issue here, but helps with
reasoning about this code.
---