adoroszlai commented on code in PR #4069:
URL: https://github.com/apache/ozone/pull/4069#discussion_r1046117377


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/ReplicateContainerCommandHandler.java:
##########
@@ -66,12 +66,15 @@ public void handle(SCMCommand command, OzoneContainer 
container,
     final List<DatanodeDetails> sourceDatanodes =
         replicateCommand.getSourceDatanodes();
     final long containerID = replicateCommand.getContainerID();
+    final long deadline = replicateCommand.getDeadline();
 
     Preconditions.checkArgument(sourceDatanodes.size() > 0,
         "Replication command is received for container %s "
             + "without source datanodes.", containerID);
 
-    supervisor.addTask(new ReplicationTask(containerID, sourceDatanodes));
+    ReplicationTask task = new ReplicationTask(containerID, sourceDatanodes);
+    task.setDeadline(deadline);
+    supervisor.addTask(task);

Review Comment:
   For HDDS-7620 I will likely need to add `term` to `ReplicationTask` to be 
able to check it before the task is executed.  I was wondering if we should 
pass the entire command to `ReplicationTask` instead of duplicating more and 
more fields.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationSupervisorMetrics.java:
##########
@@ -66,6 +66,9 @@ public void getMetrics(MetricsCollector collector, boolean 
all) {
             supervisor.getQueueSize())
         .addGauge(Interns.info("numRequestedReplications",
             "Number of requested replications"),
-            supervisor.getReplicationRequestCount());
+            supervisor.getReplicationRequestCount())
+        .addGauge(Interns.info("numTimeoutReplications",
+            "Number of timeout replications in queue"),

Review Comment:
   Possible clarification:
   
   ```suggestion
               "Number of replication requests timed out before being 
processed"),
   ```



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/protocol/commands/SCMCommand.java:
##########
@@ -88,4 +90,40 @@ public String getEncodedToken() {
   public void setEncodedToken(String encodedToken) {
     this.encodedToken = encodedToken;
   }
+
+  /**
+   * Allows a deadline to be set on the command. The deadline is set as the
+   * milliseconds since the epoch when the command must have been completed by.
+   * It is up to the code processing the command to enforce the deadline by
+   * calling the hasExpired() method, and the code sending the command to set
+   * the deadline. The default deadline is zero, which means no deadline.
+   * @param deadlineMs The ms since epoch when the command must have completed
+   *                   by.
+   */
+  public void setDeadline(long deadlineMs) {
+    this.deadlineMsSinceEpoch = deadlineMs;
+  }
+
+  /**
+   * @return The deadline set for this command, or zero if no command has been
+   *         set.
+   */
+  public long getDeadline() {
+    return deadlineMsSinceEpoch;
+  }
+
+  /**
+   * If a deadline has been set to a non zero value, test if the current time
+   * passed is beyond the deadline or not.
+   * @param currentEpochMs current time in milliseconds since the epoch.
+   * @return false if there is no deadline, or it has not expired. True if the
+   *         set deadline has expired.
+   */
+  public boolean hasExpired(long currentEpochMs) {
+    if (deadlineMsSinceEpoch > 0 &&
+        currentEpochMs > deadlineMsSinceEpoch) {
+      return true;
+    }
+    return false;

Review Comment:
   Nit:
   
   ```suggestion
       return deadlineMsSinceEpoch > 0 &&
           currentEpochMs > deadlineMsSinceEpoch;
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -765,6 +767,34 @@ public void setEventTimeout(Duration timeout) {
       this.eventTimeout = timeout.toMillis();
     }
 
+    /**
+     * Deadline which should be set on commands sent from ReplicationManager
+     * to the datanodes, as a percentage of the event.timeout. If the command
+     * has not been processed on the datanode by this time, it will be dropped
+     * by the datanode and Replication Manager will need to resend it.
+     */
+    @Config(key = "command.deadline.factor",
+        type = ConfigType.DOUBLE,
+        defaultValue = "0.9",
+        tags = {SCM, OZONE},
+        description = "Fraction of the hdds.scm.replication.event.timeout "
+            + "from the current time which should be set as a deadline for "
+            + "commands sent from ReplicationManager to datanodes. "
+            + "Commands which are not processed before this deadline will be "
+            + "dropped by the datanodes. Should be a value > 0 and <= 1.")
+    private double commandDeadlineFactor = 0.9;
+    public double getCommandDeadlineFactor() {
+      return commandDeadlineFactor;
+    }
+
+    public void setCommandDeadlineFactor(double val) {
+      if (!(val > 0) || (val > 1)) {
+        throw new IllegalArgumentException(val
+            + " must be greater than 0 and less than equal to 1");
+      }

Review Comment:
   Validation is better added to a separate method, annotated with 
`@PostConstruct`.  This ensures validation is performed even if the instance is 
created and initialized via reflections (also allows multiple config items to 
be cross-validated).
   
   Some examples:
   
   
https://github.com/apache/ozone/blob/d0e6824eec7c015fb5066fefae89408220de1583/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java#L194-L220
   
   
https://github.com/apache/ozone/blob/d0e6824eec7c015fb5066fefae89408220de1583/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java#L171-L179



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to