kgyrtkirk commented on code in PR #18510:
URL: https://github.com/apache/druid/pull/18510#discussion_r2375263799


##########
embedded-tests/src/test/java/org/apache/druid/testing/embedded/indexing/IndexTaskTest.java:
##########
@@ -99,6 +99,9 @@ public void test_runIndexTask_forInlineDatasource()
     }
 
     cluster.callApi().waitForAllSegmentsToBeAvailable(dataSource, coordinator, broker);
+    broker.latchableEmitter().waitForMetricEvent(
+        event -> event.hasDimension(DruidMetrics.DATASOURCE, dataSource)
+    );

Review Comment:
   no



##########
embedded-tests/src/test/java/org/apache/druid/testing/embedded/server/HttpRemoteTaskRunnerWorkerFailTest.java:
##########
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.testing.embedded.server;
+
+import org.apache.druid.client.indexing.TaskStatusResponse;
+import org.apache.druid.common.utils.IdUtils;
+import org.apache.druid.indexer.TaskState;
+import org.apache.druid.indexing.common.task.NoopTask;
+import org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner;
+import org.apache.druid.query.DruidMetrics;
+import org.apache.druid.segment.TestDataSource;
+import org.apache.druid.testing.embedded.EmbeddedBroker;
+import org.apache.druid.testing.embedded.EmbeddedCoordinator;
+import org.apache.druid.testing.embedded.EmbeddedDruidCluster;
+import org.apache.druid.testing.embedded.EmbeddedIndexer;
+import org.apache.druid.testing.embedded.EmbeddedOverlord;
+import org.apache.druid.testing.embedded.junit5.EmbeddedClusterTestBase;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class HttpRemoteTaskRunnerWorkerFailTest extends EmbeddedClusterTestBase
+{
+  private final EmbeddedOverlord overlord = new EmbeddedOverlord();
+  private final EmbeddedIndexer indexer = new EmbeddedIndexer().addProperty("druid.worker.capacity", "3");
+
+  @Override
+  public EmbeddedDruidCluster createCluster()
+  {
+    return EmbeddedDruidCluster.withEmbeddedDerbyAndZookeeper()
+        .useLatchableEmitter()
+        .addServer(new EmbeddedCoordinator())
+        .addServer(new EmbeddedBroker())
+        .addServer(overlord)
+        .addServer(indexer);
+  }
+
+  @Test
+  public void test_overlord_marksTaskAsFailed_ifIndexerCrashes() throws Exception
+  {
+    final String taskId = IdUtils.newTaskId("sim_test_noop", TestDataSource.WIKI, null);
+    cluster.callApi().onLeaderOverlord(
+        o -> {
+          return o.runTask(taskId, new NoopTask(taskId, null, null, 8000L, 0L, null));
+        }
+    );
+    // wait for the overlord to dispatch the task and worker start it
+    indexer.latchableEmitter().waitForMetricEvent(
+        event -> event.hasMetricName(NoopTask.NOOP_TASK_EVENT_STARTED),
+        1000

Review Comment:
   10s is a lot...
   removed the method and the ability to set the timeout



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/NoopTask.java:
##########
@@ -97,13 +102,22 @@ public boolean isReady(TaskActionClient taskActionClient)
   @Override
   public void stopGracefully(TaskConfig taskConfig)
   {
+    aborted.set(true);
   }
 
   @Override
   public TaskStatus runTask(TaskToolbox toolbox) throws Exception
   {
-    Thread.sleep(runTime);
-    return TaskStatus.success(getId());
+    toolbox.getEmitter().emit(ServiceMetricEvent.builder().setMetric(NOOP_TASK_EVENT_STARTED, 1));
+    long endTime = System.currentTimeMillis() + runTime;
+    while (endTime > System.currentTimeMillis() && !aborted.get()) {

Review Comment:
   Regardless of what other classes do, this `runTask` method should return 
failure, since the task was aborted.
   
   I don't think interrupts are used, as the task kept running for the full 
`runTime`; commenting out the content of `stopGracefully` restores the original 
behaviour, which hangs for 8 seconds in the new test.
   
   I wonder: if the interrupt were handled correctly, would there still be a 
need for the `stopGracefully` method?
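
To make the suggestion concrete, here is a minimal standalone sketch of the behaviour I have in mind. It does not use any Druid classes: `AbortableNoopSketch`, its `Outcome` enum, and the 10 ms poll interval are hypothetical names and values chosen for illustration, not the actual `NoopTask` code. It assumes an abort (via the flag or via an interrupt) should be reported as a failure rather than a success.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a task that idles for a fixed time unless aborted.
final class AbortableNoopSketch
{
  // Hypothetical result type; the real code would return a TaskStatus.
  enum Outcome { SUCCESS, FAILED }

  // Assumed poll interval; the value used in the PR is not shown in the hunk.
  private static final long POLL_MILLIS = 10L;

  private final AtomicBoolean aborted = new AtomicBoolean(false);
  private final long runTimeMillis;

  AbortableNoopSketch(long runTimeMillis)
  {
    this.runTimeMillis = runTimeMillis;
  }

  // Mirrors the idea in the diff: flip a flag that the run loop checks.
  void stopGracefully()
  {
    aborted.set(true);
  }

  Outcome run()
  {
    final long endTime = System.currentTimeMillis() + runTimeMillis;
    try {
      while (endTime > System.currentTimeMillis() && !aborted.get()) {
        Thread.sleep(POLL_MILLIS);
      }
    }
    catch (InterruptedException e) {
      // Treat an interrupt as an abort and keep the interrupt flag set for callers.
      Thread.currentThread().interrupt();
      return Outcome.FAILED;
    }
    // An aborted run is reported as failed, not as a success.
    return aborted.get() ? Outcome.FAILED : Outcome.SUCCESS;
  }
}
```

With this shape, a run that reaches its end time returns `SUCCESS`, while a run cut short by `stopGracefully()` or by an interrupt returns `FAILED` without waiting out the full run time.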



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

