GitHub user NicoK commented on the issue:
https://github.com/apache/flink/pull/5550
Since the tests already cover a variety of scenarios, they are the natural
place to also verify the statistics, which should stay aligned with real-world
behaviour despite the extra overhead whenever the code changes.
The test failure is indeed interesting but, as suggested, unrelated; I
created [FLINK-8750](https://issues.apache.org/jira/browse/FLINK-8750) for it.
