Re: [PR] IGNITE-21250 Client compute should pass all candidate nodes [ignite-3]

via GitHub Tue, 30 Jan 2024 04:55:50 -0800


ptupitsyn commented on code in PR #3104:
URL: https://github.com/apache/ignite-3/pull/3104#discussion_r1471161419



##########
modules/client-handler/src/main/java/org/apache/ignite/client/handler/requests/compute/ClientComputeExecuteRequest.java:
##########
@@ -50,29 +56,82 @@ public class ClientComputeExecuteRequest {
     public static CompletableFuture<Void> process(
             ClientMessageUnpacker in,
             ClientMessagePacker out,
-            IgniteCompute compute,
+            IgniteComputeInternal compute,
             ClusterService cluster,
-            NotificationSender notificationSender) {
-        var nodeName = in.tryUnpackNil() ? null : in.unpackString();
-
-        var node = nodeName == null
-                ? cluster.topologyService().localMember()
-                : cluster.topologyService().getByConsistentId(nodeName);
-
-        if (node == null) {
-            throw new IgniteException("Specified node is not present in the 
cluster: " + nodeName);
-        }
+            NotificationSender notificationSender
+    ) {
+        Set<ClusterNode> candidates = unpackCandidateNodes(in, cluster);
 
         List<DeploymentUnit> deploymentUnits = unpackDeploymentUnits(in);
         String jobClassName = in.unpackString();
         JobExecutionOptions options = 
JobExecutionOptions.builder().priority(in.unpackInt()).maxRetries(in.unpackInt()).build();
         Object[] args = unpackArgs(in);
 
-        JobExecution<Object> execution = compute.executeAsync(Set.of(node), 
deploymentUnits, jobClassName, options, args);
+        ClusterNode localNode = cluster.topologyService().localMember();
+        ClusterNode targetNode = selectTargetNode(candidates, localNode);
+        candidates.remove(targetNode);
+
+        JobExecution<Object> execution = 
compute.executeAsyncWithFailover(targetNode, candidates,
+                deploymentUnits, jobClassName, options, args);
         sendResultAndStatus(execution, notificationSender);
         return execution.idAsync().thenAccept(out::packUuid);
     }
 
+    private static Set<ClusterNode> unpackCandidateNodes(ClientMessageUnpacker 
in, ClusterService cluster) {
+        Set<String> candidateIds = 
Arrays.stream(in.unpackObjectArrayFromBinaryTuple())
+                .map(String.class::cast)
+                .collect(Collectors.toSet());
+
+        Set<ClusterNode> candidates = new HashSet<>();
+        List<String> notFoundNodes = new ArrayList<>();
+        candidateIds.forEach(id -> {
+            ClusterNode node = cluster.topologyService().getByConsistentId(id);
+            if (node == null) {
+                notFoundNodes.add(id);
+            } else {
+                candidates.add(node);
+            }
+        });
+
+        if (!notFoundNodes.isEmpty()) {
+            throw new IgniteException("Specified nodes are not present in the 
cluster: " + notFoundNodes);
+        }
+        return candidates;
+    }
+
+    /**
+     * Selects a random node from the set of candidates, preferably not a 
local node.

Review Comment:
   Agree with this, client retry is different and can even be disabled. 
   
   Still, using a remote node instead of a local one for job execution does not 
make sense. It degrades performance and does not improve anything.
   
   
   I think we should design client-side compute job failover. The design goals 
can't be achieved with "coordinator node" approach, because coordinator can 
fail, or the connection to coordinator can be briefly lost (and 
re-established). The client should be the coordinator.



##########
modules/client-handler/src/main/java/org/apache/ignite/client/handler/requests/compute/ClientComputeExecuteRequest.java:
##########
@@ -50,29 +56,82 @@ public class ClientComputeExecuteRequest {
     public static CompletableFuture<Void> process(
             ClientMessageUnpacker in,
             ClientMessagePacker out,
-            IgniteCompute compute,
+            IgniteComputeInternal compute,
             ClusterService cluster,
-            NotificationSender notificationSender) {
-        var nodeName = in.tryUnpackNil() ? null : in.unpackString();
-
-        var node = nodeName == null
-                ? cluster.topologyService().localMember()
-                : cluster.topologyService().getByConsistentId(nodeName);
-
-        if (node == null) {
-            throw new IgniteException("Specified node is not present in the 
cluster: " + nodeName);
-        }
+            NotificationSender notificationSender
+    ) {
+        Set<ClusterNode> candidates = unpackCandidateNodes(in, cluster);
 
         List<DeploymentUnit> deploymentUnits = unpackDeploymentUnits(in);
         String jobClassName = in.unpackString();
         JobExecutionOptions options = 
JobExecutionOptions.builder().priority(in.unpackInt()).maxRetries(in.unpackInt()).build();
         Object[] args = unpackArgs(in);
 
-        JobExecution<Object> execution = compute.executeAsync(Set.of(node), 
deploymentUnits, jobClassName, options, args);
+        ClusterNode localNode = cluster.topologyService().localMember();
+        ClusterNode targetNode = selectTargetNode(candidates, localNode);
+        candidates.remove(targetNode);
+
+        JobExecution<Object> execution = 
compute.executeAsyncWithFailover(targetNode, candidates,
+                deploymentUnits, jobClassName, options, args);
         sendResultAndStatus(execution, notificationSender);
         return execution.idAsync().thenAccept(out::packUuid);
     }
 
+    private static Set<ClusterNode> unpackCandidateNodes(ClientMessageUnpacker 
in, ClusterService cluster) {
+        Set<String> candidateIds = 
Arrays.stream(in.unpackObjectArrayFromBinaryTuple())
+                .map(String.class::cast)
+                .collect(Collectors.toSet());
+
+        Set<ClusterNode> candidates = new HashSet<>();
+        List<String> notFoundNodes = new ArrayList<>();
+        candidateIds.forEach(id -> {
+            ClusterNode node = cluster.topologyService().getByConsistentId(id);
+            if (node == null) {
+                notFoundNodes.add(id);
+            } else {
+                candidates.add(node);
+            }
+        });
+
+        if (!notFoundNodes.isEmpty()) {
+            throw new IgniteException("Specified nodes are not present in the 
cluster: " + notFoundNodes);
+        }
+        return candidates;
+    }
+
+    /**
+     * Selects a random node from the set of candidates, preferably not a 
local node.

Review Comment:
   Agree with this, client retry is different and can even be disabled. 
   
   Still, using a remote node instead of a local one for job execution does not 
make sense. It degrades performance and does not achieve anything.
   
   
   I think we should design client-side compute job failover. The design goals 
can't be achieved with "coordinator node" approach, because coordinator can 
fail, or the connection to coordinator can be briefly lost (and 
re-established). The client should be the coordinator.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] IGNITE-21250 Client compute should pass all candidate nodes [ignite-3]

Reply via email to