cpugputpu created FLINK-16349:
---------------------------------
Summary: Use LinkedHashSet in TimeWindow.java
Key: FLINK-16349
URL: https://issues.apache.org/jira/browse/FLINK-16349
Project: Flink
Issue Type: Bug
Reporter: cpugputpu
The test in
_apache.flink,org.apache.flink.streaming.runtime.operators.windowing.MergingWindowSetTest#testMergeLargeWindowCoveringMultipleWindows_
can fail because it depends on the iteration order of a HashSet.
The failure looks like this:
java.lang.AssertionError: java.lang.AssertionError:
*Expected*: (iterable over <TimeWindow{start=0, end=3}>, <TimeWindow{start=5,
end=8}> in any order or iterable over <TimeWindow{start=0, end=3}>,
<TimeWindow{start=10, end=13}> in any order or iterable over
<TimeWindow{start=5, end=8}>, <TimeWindow{start=10, end=13}> in any order)
*but was*: <TimeWindow{start=1, end=3}, TimeWindow{start=10, end=13}>
at org.apache.flink.streaming.runtime.operators.windowing.MergingWindowSetTest.testMergeLargeWindowCoveringMultipleWindows(MergingWindowSetTest.java:358)
The root cause lies in TimeWindow.java, where _currentMerge.f1 = new
HashSet<>();_ is executed. When _W mergedStateWindow =
this.mapping.get(mergedWindows.iterator().next());_ is called
(flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/MergingWindowSet.java,
line 192), the _iterator()_ of HashSet makes no guarantee about the order, so
the element returned by _next()_ is not fixed.
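To make the dependence on iteration order concrete, here is a minimal sketch of the lookup pattern quoted above. It is not the Flink code: the class name, the _pickStateWindow_ helper, and the string labels are hypothetical stand-ins, and only the _mapping.get(mergedWindows.iterator().next())_ shape is taken from the issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class StateWindowPick {

    // Hypothetical helper mirroring the quoted pattern: return the state
    // window mapped to whichever merged window the iterator yields first.
    static <W> W pickStateWindow(Map<W, W> mapping, Set<W> mergedWindows) {
        return mapping.get(mergedWindows.iterator().next());
    }

    public static void main(String[] args) {
        // Hypothetical window -> state window mapping.
        Map<String, String> mapping = new HashMap<>();
        mapping.put("A", "state-A");
        mapping.put("B", "state-B");

        // Two sets with equal contents but different iteration order.
        // LinkedHashSet is used here only to force each order explicitly.
        Set<String> bFirst = new LinkedHashSet<>(Arrays.asList("B", "A"));
        Set<String> aFirst = new LinkedHashSet<>(Arrays.asList("A", "B"));

        System.out.println(bFirst.equals(aFirst));             // prints "true"
        System.out.println(pickStateWindow(mapping, bFirst));  // prints "state-B"
        System.out.println(pickStateWindow(mapping, aFirst));  // prints "state-A"
    }
}
```

The two sets compare as equal, yet the lookup returns different state windows, which is why a test asserting on the result is sensitive to the set implementation.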
The HashSet specification states that "it makes no guarantees as to the
iteration order of the set; in particular, it does not guarantee that the order
will remain constant over time". The documentation is here for your reference:
[https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html]
The fix is to use LinkedHashSet instead of HashSet. LinkedHashSet iterates in
insertion order, so the non-deterministic behaviour is eliminated and the test
becomes stable.
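As a rough illustration of the difference between the two set types, the sketch below compares what _iterator().next()_ returns for each. The class name, the _first_ helper, and the window labels are assumptions made for this example and do not appear in Flink.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FirstElementDemo {

    // Hypothetical helper: the first element produced by the set's iterator.
    static <T> T first(Set<T> set) {
        return set.iterator().next();
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for windows, listed in insertion order.
        List<String> inserted =
                Arrays.asList("window[10,13)", "window[0,3)", "window[5,8)");

        Set<String> linked = new LinkedHashSet<>(inserted);
        Set<String> hashed = new HashSet<>(inserted);

        // LinkedHashSet iterates in insertion order, so this is always
        // the first element that was added.
        System.out.println("LinkedHashSet first: " + first(linked)); // window[10,13)

        // HashSet returns some element of the set; which one is not
        // specified by the API and should not be relied upon.
        System.out.println("HashSet first: " + first(hashed));
    }
}
```

On a given JVM the HashSet result may look stable from run to run, but the specification does not promise this, so code and tests that depend on it can break under a different JDK or hashing behaviour.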