[
https://issues.apache.org/jira/browse/KAFKA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sophie Blee-Goldman reopened KAFKA-8940:
----------------------------------------
Assignee: (was: Guozhang Wang)
This failed again with
{code:java}
min fail: key=7-1006 actual=580 expected=7
{code}
The actual min output was
{code:java}
minEvents =
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 7,
CreateTime = 1602115190769, key = 7-1006, value = 7)
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 17,
CreateTime = 1602115193896, key = 7-1006, value = 7)
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 24,
CreateTime = 1602115199973, key = 7-1006, value = 7)
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 33,
CreateTime = 1602115204173, key = 7-1006, value = 580)
ConsumerRecord(topic = min, partition = 0, leaderEpoch = 0, offset = 43,
CreateTime = 1602115209606, key = 7-1006, value = 580)
{code}
So the verifier expects the min to stay at 7, but at some point it jumps up to
580. Why? The relevant inputs are
{code:java}
CreateTime = 1602115186447, key = 7-1006, value = 7
...
CreateTime = 1602115200806, key = 7-1006, value = 580
{code}
If we convert these timestamps to days, the "7" record was created 18,542.9999
days after the epoch, while the "580" record was created 18,543.0 days after
it.
So the "580" record was technically from the day after the "7" record. That
definitely seems like a clue...
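The boundary arithmetic can be checked directly. A minimal sketch (the `dayIndex` helper and the 86,400,000 ms-per-day divisor are mine, not from the test code):
{code:java}
public class WindowMath {
    // Milliseconds in one UTC day: 24 * 60 * 60 * 1000.
    static final long MS_PER_DAY = 86_400_000L;

    // Whole-day index since the epoch for a record timestamp in milliseconds.
    static long dayIndex(long timestampMs) {
        return timestampMs / MS_PER_DAY;
    }

    public static void main(String[] args) {
        long sevenTs = 1602115186447L;      // CreateTime of the "7" record
        long fiveEightyTs = 1602115200806L; // CreateTime of the "580" record
        System.out.println(dayIndex(sevenTs));      // 18542
        System.out.println(dayIndex(fiveEightyTs)); // 18543
    }
}
{code}
The two records land on different day indices, 18542 and 18543, i.e. on opposite sides of a day boundary.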
And sure enough, the SmokeTestClient's "min" aggregation is actually a windowed
aggregation with tumbling windows of 1 day. And it strips out the start-time
part of the windowed key to flatten the output back into the original keyspace,
so the output verifier has no way of knowing that these values actually refer
to different windows.
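To see why stripping the window start loses information, here is a plain-Java model of the mechanism (not the actual SmokeTestClient code; the `WindowedMin` class and its key encoding are illustrative):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class WindowedMin {
    static final long MS_PER_DAY = 86_400_000L;

    // Running minimum per (key, window start), as a tumbling 1-day window keeps.
    final Map<String, Integer> minPerWindow = new LinkedHashMap<>();

    // Process one record and return the flattened output: the windowed key with
    // its start time stripped -- exactly the step that confuses the verifier.
    Map.Entry<String, Integer> process(String key, long timestampMs, int value) {
        long windowStart = (timestampMs / MS_PER_DAY) * MS_PER_DAY;
        String windowedKey = key + "@" + windowStart;
        int min = Math.min(minPerWindow.getOrDefault(windowedKey, Integer.MAX_VALUE), value);
        minPerWindow.put(windowedKey, min);
        // Flatten back to the original keyspace: the window start is discarded.
        return Map.entry(key, min);
    }

    public static void main(String[] args) {
        WindowedMin agg = new WindowedMin();
        // The "7" record lands in one day's window...
        System.out.println(agg.process("7-1006", 1602115186447L, 7));
        // ...the "580" record in the next day's window: a fresh min, same flat key.
        System.out.println(agg.process("7-1006", 1602115200806L, 580));
    }
}
{code}
Both outputs carry the same flat key 7-1006, so a verifier that assumes one minimum per key sees the value "jump" from 7 to 580.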
I haven't personally checked the timestamps of every other failure reported,
but I'd be willing to bet there's a pattern of input records spanning the
24-hour mark and falling into different windows. The good news is this doesn't
reflect any bug in Streams, but it's definitely a bug in the SmokeTest.
We could try to manipulate the input data to avoid this, but the better fix is
to account for the potentially varying time windows when verifying the output.
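One way to sketch that fix (a hypothetical `MinVerifier`, not the real SmokeTestDriver verification code): bucket the input by day window and accept an observed value if it matches the expected minimum for any window the key's input spans, instead of assuming a single minimum per key.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class MinVerifier {
    static final long MS_PER_DAY = 86_400_000L;

    // Expected per-window minima for one key, built from its input records.
    static Map<Long, Integer> expectedMinByWindow(long[] timestamps, int[] values) {
        Map<Long, Integer> mins = new HashMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long window = timestamps[i] / MS_PER_DAY;
            mins.merge(window, values[i], Math::min);
        }
        return mins;
    }

    // An observed output is valid if it is the expected min of ANY spanned
    // window -- accounting for the window start stripped from the output key.
    static boolean isValid(int observed, Map<Long, Integer> mins) {
        return mins.containsValue(observed);
    }

    public static void main(String[] args) {
        long[] ts = {1602115186447L, 1602115200806L}; // inputs from this failure
        int[] vals = {7, 580};
        Map<Long, Integer> mins = expectedMinByWindow(ts, vals);
        System.out.println(isValid(7, mins));   // true
        System.out.println(isValid(580, mins)); // true
    }
}
{code}
With this check, both 7 and 580 are accepted for key 7-1006, since each is the legitimate minimum of its own day window.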
> Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
> -----------------------------------------------------------------
>
> Key: KAFKA-8940
> URL: https://issues.apache.org/jira/browse/KAFKA-8940
> Project: Kafka
> Issue Type: Bug
> Components: streams, unit tests
> Reporter: Guozhang Wang
> Priority: Major
> Labels: flaky-test
>
> I lost the screenshot unfortunately... it reports that the set of expected
> records does not match the received records.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)