gianm commented on code in PR #19188:
URL: https://github.com/apache/druid/pull/19188#discussion_r2986562244
##########
embedded-tests/src/test/java/org/apache/druid/testing/embedded/server/FaultyClusterTest.java:
##########
@@ -92,12 +93,21 @@ public void test_overlord_skipsCleanupOfPendingSegments()
cluster.callApi().postSupervisor(supervisorSpec);
final int recordCount = publish1kRecords(topic, true);
- waitUntilPublishedRecordsAreIngested(recordCount);
+ Assertions.assertEquals(expectedRecords, recordCount);
- cluster.callApi().postSupervisor(supervisorSpec.createSuspendedSpec());
+ waitUntilPublishedRecordsAreIngested(expectedRecords);
+
+ // Additionally, confirm that rows were placed into Druid.
+ // Caution: "ingest/rows/output" does not have a 'DATASOURCE' dimension.
+ indexer.latchableEmitter().waitForEventAggregate(
+ event -> event.hasMetricName("ingest/rows/output"),
Review Comment:
IIRC, the task completion timeout for this test is quite short (a few
seconds, I think). It's possible that a sequence of events like this happens:
- task publishes records
- task waits for handoff to historicals
- task hits completion timeout and exits *before* handoff has actually
happened
- handoff will eventually happen, but a query that comes in before it does
will "miss" segments that have not handed off yet
If this is what's happening, then extending the completion timeout should
fix the flakiness.
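To make the suspected ordering concrete, here is a minimal, self-contained sketch of the race. It is a toy model, not Druid code: the class and method names are hypothetical, and it assumes that once the task has exited, published rows are only queryable after handoff has completed.

```java
import java.time.Duration;

// Hypothetical, simplified model of the suspected race; none of these names are Druid APIs.
public class HandoffRaceSketch
{
  // Time after publishing at which the task exits: at handoff if handoff arrives
  // within the completion timeout, otherwise when the completion timeout fires.
  static Duration taskExitTime(Duration handoffDelay, Duration completionTimeout)
  {
    return handoffDelay.compareTo(completionTimeout) <= 0 ? handoffDelay : completionTimeout;
  }

  // True if a query issued right as the task exits can see every published row,
  // i.e. handoff has already happened by that point (assumption of this model).
  static boolean queryAtTaskExitSeesAllRows(Duration handoffDelay, Duration completionTimeout)
  {
    return handoffDelay.compareTo(taskExitTime(handoffDelay, completionTimeout)) <= 0;
  }

  public static void main(String[] args)
  {
    Duration handoffDelay = Duration.ofSeconds(10);

    // Short completion timeout: the task exits before handoff, so the query misses segments.
    System.out.println(queryAtTaskExitSeesAllRows(handoffDelay, Duration.ofSeconds(3)));

    // Longer completion timeout: the task waits for handoff, so the query sees everything.
    System.out.println(queryAtTaskExitSeesAllRows(handoffDelay, Duration.ofSeconds(60)));
  }
}
```

In this model the only way to close the window is for the completion timeout to be at least as long as the handoff delay, which is why extending the timeout is the suggested fix. The 3, 10, and 60 second values are illustrative only and are not taken from the test.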
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]