kfaraz commented on code in PR #18060:
URL: https://github.com/apache/druid/pull/18060#discussion_r2120088105
##########
indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskQueueConcurrencyTest.java:
##########
@@ -77,7 +80,20 @@ public void setUpIngestionTestBase() throws IOException
@Override
public ListenableFuture<TaskStatus> run(Task task)
{
-      return Futures.immediateFuture(TaskStatus.success(task.getId()));
+      // Skip the initialization; we just need to simulate a delayed future
+      Preconditions.checkArgument(task instanceof NoopTask, "task must be an instance of NoopTask");
+      final SettableFuture<TaskStatus> future = SettableFuture.create();
+
+      taskExecutor.submit(() -> {
+        try {
+          TaskStatus status = ((NoopTask) task).runTask(null);
+          future.set(status);
+        }
+        catch (Exception e) {
+          future.setException(e);
+        }
+      });
+      return future;
Review Comment:
> there's no point in measuring concurrency if you have no control over the blocking nature of the callbacks

The point is to verify the blocking relationships between the methods that are already being asserted in the existing unit tests.
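As an aside, a blocking relationship between two methods can be asserted deterministically with latches rather than timing. The sketch below is illustrative only (the lock and method roles are hypothetical stand-ins, not code from this patch): thread A holds a lock until the test releases it, and the test asserts that thread B cannot proceed until then.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class BlockingRelationshipSketch
{
  public static void main(String[] args) throws Exception
  {
    final Object lock = new Object();
    final CountDownLatch firstMethodEntered = new CountDownLatch(1);
    final CountDownLatch releaseFirstMethod = new CountDownLatch(1);
    final AtomicBoolean secondMethodRan = new AtomicBoolean(false);

    // Thread A: enters the critical section and holds it until released by the test.
    Thread a = new Thread(() -> {
      synchronized (lock) {
        firstMethodEntered.countDown();
        try {
          releaseFirstMethod.await();
        }
        catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });

    // Thread B: must block on the same lock until A exits.
    Thread b = new Thread(() -> {
      synchronized (lock) {
        secondMethodRan.set(true);
      }
    });

    a.start();
    firstMethodEntered.await();
    b.start();

    // While A still holds the lock, B cannot have run.
    Thread.sleep(100);
    if (secondMethodRan.get()) {
      throw new AssertionError("second method ran while first held the lock");
    }

    // Release A; B must now complete.
    releaseFirstMethod.countDown();
    b.join(5_000);
    if (!secondMethodRan.get()) {
      throw new AssertionError("second method never ran after release");
    }
    System.out.println("blocking relationship verified");
  }
}
```

The latches give the test full control over the ordering, so the assertion does not depend on scheduler timing.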
> Without a way to parameterize the task run time, you effectively negate any blocking effects of callbacks (which might reveal concurrency bugs themselves)
That is true, but the solution to that would be to add more test cases.
> This test file is currently only testing concurrency correctness between "visible" functions being called, but doesn't consider the background callbacks which acquire the same locks, etc. that the tested methods do (and hence can cause flakes, etc)

Yes, we had missed adding test cases for the callbacks in the original PR. Feel free to add them as needed.
> since they execute immediately, which is not at all how the production code runs.

I am not at all opposed to having the tasks run for a while (rather than return immediately) so that the tests more closely resemble a realistic scenario.
**Note on the current approach:**
The `NoopTask` already accepts a run duration parameter. Did you intend to use that here? I don't see the run time of any `NoopTask` updated in this patch.
Instead, the current patch launches a new thread and runs the task on it. This approach does not enforce whether the task run finishes __before__ or __after__ any other specific event being verified in a given test case.
> This leads to the flakes, etc. that I'm trying to fix.

Yes, this is exactly my concern. To avoid flakiness, unit tests should have predictable behaviour rather than run a task for an unspecified time, as the current patch does.
**Preferred approach:**
My suggestion is to add new test cases covering the flaky behaviour that you have fixed in this patch. In those test cases, return an incomplete future for the task, force the unit test to complete that task at a specific point, and then make the assertions. That way, we can have test cases that say:
"if a task finishes _before_ event A, B happens"
or "if a task finishes _after_ event A, C happens and B does not happen".
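The incomplete-future pattern described above can be sketched as follows. This uses a plain `CompletableFuture` as a self-contained stand-in for Guava's `SettableFuture`, and the event names and assertion helper are hypothetical, not from the patch:

```java
import java.util.concurrent.CompletableFuture;

public class IncompleteFutureSketch
{
  public static void main(String[] args) throws Exception
  {
    // 1. "Run" the task: the future is returned immediately but stays incomplete,
    //    so the task is logically still running until the test says otherwise.
    CompletableFuture<String> taskFuture = new CompletableFuture<>();
    assertState(!taskFuture.isDone(), "task must still be running at event A");

    // 2. Trigger event A here and assert the state that must hold
    //    while the task has not yet finished.

    // 3. Force the task to finish at this exact point, then assert the
    //    post-completion state (e.g. that B happens, or that C happens and B does not).
    taskFuture.complete("SUCCESS");
    assertState("SUCCESS".equals(taskFuture.get()), "task finished with expected status");
    System.out.println("ordering verified");
  }

  private static void assertState(boolean condition, String message)
  {
    if (!condition) {
      throw new AssertionError(message);
    }
  }
}
```

Because the test, not a background thread, decides when the future completes, the before/after ordering in each test case is fully deterministic.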
**Future steps:**
The Overlord is mired in concurrency problems. We need dedicated test setups to catch bugs like these; ordinary unit tests cannot, and are not meant to, achieve that. We have already started thinking about Overlord simulations for this exact purpose.
I hope that clarifies the comment. Please let me know if you agree.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
For additional commands, e-mail: [email protected]