clintropolis opened a new pull request, #13344:
URL: https://github.com/apache/druid/pull/13344

   `RemoteTaskRunnerTest.testRunPendingTaskFailToAssignTask` fails pretty 
consistently if run until failure in IntelliJ. After adding this `Thread.sleep`, I 
let it run for over 2k iterations without failure.
   
   I hate it, but it seems to significantly reduce the flakiness, and I wasn't 
able to determine a "good" fix in a short amount of time, so let's do this for 
now.
   
   The underlying issue appears to be a race condition between the test ZK server 
and worker startup: if the timing is off, an `INITIALIZED` event that arrives 
after the first pending task is added can cause the task runner to call 
`runPendingTask` before the test is able to call `runPendingTask` itself, 
which makes the test assertions no longer true.
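   
   To make the ordering dependence concrete, here is a toy model of it. This is a sketch only: `ToyRunner`, `Outcome`, and `testThreadOutcome` are hypothetical names invented for illustration, not Druid's `RemoteTaskRunner` API, and the background "event" is forced to finish before the test thread proceeds so that both orderings are reproducible.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: none of these types are Druid classes.
final class PendingTaskOrderingSketch {

    enum Outcome { ASSIGNED, FAILED_TO_ASSIGN, NOTHING_PENDING }

    /** Hypothetical stand-in for a task runner with a pending-task queue. */
    static final class ToyRunner {
        private final Queue<String> pending = new ConcurrentLinkedQueue<>();

        void addPendingTask(String taskId) {
            pending.add(taskId);
        }

        // Takes the next pending task (if any) and compares it to the work item id.
        Outcome runPendingTask(String workItemId) {
            String taskId = pending.poll();
            if (taskId == null) {
                return Outcome.NOTHING_PENDING;
            }
            return taskId.equals(workItemId) ? Outcome.ASSIGNED : Outcome.FAILED_TO_ASSIGN;
        }
    }

    // Returns what the "test thread" observes when it calls runPendingTask with a
    // mismatched id, depending on whether the background event ran after the add.
    static Outcome testThreadOutcome(boolean eventAfterAdd) {
        ToyRunner runner = new ToyRunner();
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            if (!eventAfterAdd) {
                // Event handled before the task exists: the background run finds nothing.
                background.submit(() -> runner.runPendingTask("task id with spaces")).get();
            }
            runner.addPendingTask("task id with spaces");
            if (eventAfterAdd) {
                // Event handled after the add: the background run takes the task first.
                background.submit(() -> runner.runPendingTask("task id with spaces")).get();
            }
            // The test's own call, using a deliberately wrong work item id.
            return runner.runPendingTask("wrongId");
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            background.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("event before add: " + testThreadOutcome(false));
        System.out.println("event after add:  " + testThreadOutcome(true));
    }
}
```

   In this model the test thread only sees the assignment failure it wants to assert on when the background run happens before the task is added; otherwise the task is already gone by the time the test calls `runPendingTask`.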
   
   In successful runs, the logs have a section like:
   ```
   2022-11-03T01:20:05,937 INFO [Time-limited test] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task task id with spaces
   2022-11-03T01:20:05,938 ERROR [Time-limited test] org.apache.druid.indexing.overlord.RemoteTaskRunner - Exception while trying to assign task: {class=org.apache.druid.indexing.overlord.RemoteTaskRunner, exceptionType=class java.lang.IllegalArgumentException, exceptionMessage=task id != workItem id, taskId=wrongId}
   java.lang.IllegalArgumentException: task id != workItem id
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125) ~[guava-16.0.1.jar:?]
        at org.apache.druid.indexing.overlord.RemoteTaskRunner.tryAssignTask(RemoteTaskRunner.java:847) ~[classes/:?]
        at org.apache.druid.indexing.overlord.RemoteTaskRunner.runPendingTask(RemoteTaskRunner.java:771) ~[classes/:?]
   ```
   
   but in a failing run there is no exception, and the task is assigned from the `rtr-pending-tasks-runner-0` thread instead:
   ```
   2022-11-03T01:20:17,391 INFO [Time-limited test] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task task id with spaces
   2022-11-03T01:20:17,399 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Assigning task [task id with spaces] to worker [worker]
   2022-11-03T01:20:17,423 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task [task id with spaces] started running on worker [worker]
   2022-11-03T01:20:18,316 INFO [SessionTracker] org.apache.zookeeper.server.SessionTrackerImpl - SessionTrackerImpl exited loop!
   2022-11-03T01:20:18,397 INFO [Time-limited test] org.apache.druid.indexing.overlord.RemoteTaskRunner - Stopping RemoteTaskRunner...
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

