Github user mxm commented on the pull request:
https://github.com/apache/flink/pull/896#issuecomment-120302065
> Looks like this change breaks the YARN integration. The YARN WordCount no
longer works.
Should be working again now.
> It would be good if the accumulator update interval was configurable.
> Edit: Is that the same value as the heartbeats?
Yes, that was a design rationale to keep the message count low. We could
only send the accumulators in every Nth heartbeat and let it be configurable.
> The is a potential modification conflict: Drawing a snapshot for
serialization and registering a new accumulator can lead to a
ConcurrentModificationException in the drawing of the snapshot.
I conducted tests with concurrent insertions and deletions and found that
only concurrent removals cause ConcurrentModificationExceptions. Removals are
not allowed for accumulators. Anyways, we could switch to a synchronized or
copy on write hash map. If we do I would opt for the latter.
> The naming of the accumulators refers sometimes to "flink vs.
user-defined", and sometimes to "internal vs. external". Can we make this
consistent? I actually like the "flink vs. user-defined" naming better.
Then let's stick to the "flink vs. user-defined" naming scheme.
> I think the code would be simpler is the registry simply always had a
created map for internal and external accumulators. Also, a "reporter" object
would help.
I agree that would be a nicer way of dealing with the API.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---