[ 
https://issues.apache.org/jira/browse/GOBBLIN-1837?focusedWorklogId=863082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-863082
 ]

ASF GitHub Bot logged work on GOBBLIN-1837:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/May/23 15:53
            Start Date: 31/May/23 15:53
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3700:
URL: https://github.com/apache/gobblin/pull/3700#discussion_r1211927683


##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/MysqlSchedulerLeaseDeterminationStore.java:
##########
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.sql.Timestamp;
+
+import com.google.inject.Inject;
+import com.typesafe.config.Config;
+
+import javax.sql.DataSource;
+
+import org.apache.gobblin.broker.SharedResourcesBrokerFactory;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.gobblin.metastore.MysqlDataSourceFactory;
+import org.apache.gobblin.service.ServiceConfigKeys;
+import org.apache.gobblin.util.ConfigUtils;
+
+
+public class MysqlSchedulerLeaseDeterminationStore implements 
SchedulerLeaseDeterminationStore {

Review Comment:
   Can you add a javadoc describing what this class does?



##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/DagActionStoreChangeMonitor.java:
##########
@@ -151,7 +170,16 @@ protected void processMessage(DecodeableKafkaRecord 
message) {
           log.info("Received insert dag action and about to send kill flow 
request");
           dagManager.handleKillFlowRequest(flowGroup, flowName, 
Long.parseLong(flowExecutionId));
           this.killsInvoked.mark();
-        } else {
+        } else if (dagAction.equals(DagActionStore.DagActionValue.LAUNCH)) {
+          // If multi-active scheduler is NOT turned on we should not receive 
these type of events
+          if (!this.isMultiActiveSchedulerEnabled) {
+            log.warn("Received LAUNCH dagAction while not in multi-active 
scheduler mode for flow group: {}, flow name:"
+                + "{}, execution id: {}, dagAction: {}", flowGroup, flowName, 
flowExecutionId, dagAction);
+            this.unexpectedErrors.mark();
+          }
+          log.info("Received insert dag action and about to forward launch 
request to DagManager");
+          submitFlowToDagManager(flowGroup, flowName);
+        }else {

Review Comment:
   nit: missing space



##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/DagActionStore.java:
##########
@@ -27,7 +27,9 @@
 public interface DagActionStore {
   enum DagActionValue {
     KILL,
-    RESUME
+    RESUME,
+    // TODO: potentially combine this enum with {@link 
SchedulerLeaseDeterminationStore.FlowActionType}
+    LAUNCH

Review Comment:
   Naive question, for my own understanding:
   Are we assuming that the host whose scheduler wins the lease for a job is 
not necessarily the host that runs the job? In other words, can any host run 
a job that was stored/scheduled by another host?



##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/MysqlSchedulerLeaseDeterminationStore.java:
##########
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.runtime.api;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.sql.Timestamp;
+
+import com.google.inject.Inject;
+import com.typesafe.config.Config;
+
+import javax.sql.DataSource;
+
+import org.apache.gobblin.broker.SharedResourcesBrokerFactory;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.gobblin.metastore.MysqlDataSourceFactory;
+import org.apache.gobblin.service.ServiceConfigKeys;
+import org.apache.gobblin.util.ConfigUtils;
+
+
+public class MysqlSchedulerLeaseDeterminationStore implements 
SchedulerLeaseDeterminationStore {
+  public static final String CONFIG_PREFIX = 
"MysqlSchedulerLeaseDeterminationStore";
+
+  protected final DataSource dataSource;
+  private final DagActionStore dagActionStore;
+  private final String tableName;
+  private final long epsilon;
+  private final long linger;
+  /* TODO:
+     - define retention on this table
+     - initialize table with epsilon and linger if one already doesn't exist 
using these configs
+     - join with table above to ensure epsilon/linger values are consistent 
across hosts (in case hosts are deployed with different configs)
+   */
+  protected static final String WHERE_CLAUSE_TO_MATCH_ROW = "WHERE 
flow_group=? AND flow_name=? AND flow_execution_id=? "
+      + "AND flow_action=? AND ABS(trigger_event_timestamp-?) <= %s";
+  protected static final String 
ATTEMPT_INSERT_AND_GET_PURSUANT_TIMESTAMP_STATEMENT = "INSERT INTO %s 
(flow_group, "
+      + "flow_name, flow_execution_id, flow_action, trigger_event_timestamp) 
VALUES (?, ?, ?, ?, ?) WHERE NOT EXISTS ("
+      + "SELECT * FROM %s " + WHERE_CLAUSE_TO_MATCH_ROW + "; SELECT 
ROW_COUNT() AS rows_inserted_count, "
+      + "pursuant_timestamp FROM %s " + WHERE_CLAUSE_TO_MATCH_ROW;
+
+  protected static final String UPDATE_PURSUANT_TIMESTAMP_STATEMENT = "UPDATE 
%s SET pursuant_timestamp = NULL "
+      + WHERE_CLAUSE_TO_MATCH_ROW;
+  private static final String CREATE_TABLE_STATEMENT = "CREATE TABLE IF NOT 
EXISTS %S (" + "flow_group varchar("
+      + ServiceConfigKeys.MAX_FLOW_GROUP_LENGTH + ") NOT NULL, flow_name 
varchar("
+      + ServiceConfigKeys.MAX_FLOW_GROUP_LENGTH + ") NOT NULL, " + 
"flow_execution_id varchar("
+      + ServiceConfigKeys.MAX_FLOW_EXECUTION_ID_LENGTH + ") NOT NULL, 
flow_action varchar(100) NOT NULL, "
+      + "trigger_event_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
+      + "pursuant_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,"
+      + "PRIMARY KEY 
(flow_group,flow_name,flow_execution_id,flow_action,trigger_event_timestamp)";
+
+  @Inject
+  public MysqlSchedulerLeaseDeterminationStore(Config config, DagActionStore 
dagActionStore) throws IOException {
+    if (config.hasPath(CONFIG_PREFIX)) {
+      config = config.getConfig(CONFIG_PREFIX).withFallback(config);
+    } else {
+      throw new IOException("Please specify the config for 
MysqlSchedulerLeaseDeterminationStore");
+    }
+
+    this.tableName = ConfigUtils.getString(config, 
ConfigurationKeys.SCHEDULER_LEASE_DETERMINATION_STORE_DB_TABLE_KEY,
+        
ConfigurationKeys.DEFAULT_SCHEDULER_LEASE_DETERMINATION_STORE_DB_TABLE);
+    this.epsilon = ConfigUtils.getLong(config, 
ConfigurationKeys.SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS_KEY,
+        ConfigurationKeys.DEFAULT_SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS);
+    this.linger = ConfigUtils.getLong(config, 
ConfigurationKeys.SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS_KEY,
+        ConfigurationKeys.DEFAULT_SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS);
+
+    this.dataSource = MysqlDataSourceFactory.get(config, 
SharedResourcesBrokerFactory.getImplicitBroker());
+    try (Connection connection = dataSource.getConnection();
+        PreparedStatement createStatement = 
connection.prepareStatement(String.format(CREATE_TABLE_STATEMENT, tableName))) {
+      createStatement.executeUpdate();
+      connection.commit();
+    } catch (SQLException e) {
+      throw new IOException("Table creation failure for " + tableName, e);
+    }
+    this.dagActionStore = dagActionStore;
+  }
+
+  @Override
+  public LeaseAttemptStatus attemptInsertAndGetPursuantTimestamp(String 
flowGroup, String flowName,
+      String flowExecutionId, FlowActionType flowActionType, long 
triggerTimeMillis)
+      throws IOException {
+    Timestamp triggerTimestamp = new Timestamp(triggerTimeMillis);
+    try (Connection connection = this.dataSource.getConnection();
+        PreparedStatement insertStatement = connection.prepareStatement(
+            String.format(ATTEMPT_INSERT_AND_GET_PURSUANT_TIMESTAMP_STATEMENT, 
tableName, tableName, epsilon, tableName,
+                epsilon))) {
+      int i = 0;
+      // Values to set in new row
+      insertStatement.setString(++i, flowGroup);
+      insertStatement.setString(++i, flowName);
+      insertStatement.setString(++i, flowExecutionId);
+      insertStatement.setString(++i, flowActionType.toString());
+      insertStatement.setTimestamp(++i, triggerTimestamp);
+      // Values to check if existing row matches
+      insertStatement.setString(++i, flowGroup);
+      insertStatement.setString(++i, flowName);
+      insertStatement.setString(++i, flowExecutionId);
+      insertStatement.setString(++i, flowActionType.toString());
+      insertStatement.setTimestamp(++i, triggerTimestamp);
+      // Values to make select statement to read row
+      insertStatement.setString(++i, flowGroup);
+      insertStatement.setString(++i, flowName);
+      insertStatement.setString(++i, flowExecutionId);
+      insertStatement.setString(++i, flowActionType.toString());
+      insertStatement.setTimestamp(++i, triggerTimestamp);
+      ResultSet resultSet = insertStatement.executeQuery();
+      connection.commit();
+
+      if (!resultSet.next()) {
+        resultSet.close();
+        throw new IOException(String.format("Unexpected error where no result 
returned while trying to obtain lease. "
+                + "This error indicates that no entry existed for trigger flow 
event for table %s flow group: %s, flow "
+                + "name: %s flow execution id: %s and trigger timestamp: %s 
when one should have been inserted",
+            tableName, flowGroup, flowName, flowExecutionId, 
triggerTimestamp));
+      }
+      // If a row was inserted, then we have obtained the lease
+      int rowsUpdated = resultSet.getInt(1);
+      if (rowsUpdated == 1) {
+        // If the pursuing flow launch has been persisted to the {@link 
DagActionStore} we have completed lease obtainment
+        this.dagActionStore.addDagAction(flowGroup, flowName, flowExecutionId, 
DagActionStore.DagActionValue.LAUNCH);
+        if (this.dagActionStore.exists(flowGroup, flowName, flowExecutionId, 
DagActionStore.DagActionValue.LAUNCH)) {
+          if (updatePursuantTimestamp(flowGroup, flowName, flowExecutionId, 
flowActionType, triggerTimestamp)) {
+            // TODO: potentially add metric here to count number of flows 
scheduled by each scheduler

Review Comment:
   I think this would also be an excellent metric for load balancing; it may 
be worth implementing directly.



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/SchedulerLeaseAlgoHandler.java:
##########
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration;
+
+import java.io.IOException;
+import java.time.LocalDateTime;
+import java.time.format.DateTimeFormatter;
+import java.time.temporal.ChronoUnit;
+import java.util.Locale;
+import java.util.Properties;
+import java.util.Random;
+
+import org.quartz.JobKey;
+import org.quartz.SchedulerException;
+import org.quartz.Trigger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.typesafe.config.Config;
+
+import javax.inject.Inject;
+
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.gobblin.runtime.api.SchedulerLeaseDeterminationStore;
+import org.apache.gobblin.scheduler.JobScheduler;
+import org.apache.gobblin.scheduler.SchedulerService;
+import org.apache.gobblin.util.ConfigUtils;
+
+
+public class SchedulerLeaseAlgoHandler {
+  private static final Logger LOG = 
LoggerFactory.getLogger(SchedulerLeaseAlgoHandler.class);
+  private final long linger;
+  private final int staggerUpperBoundSec;
+  private static Random random = new Random();
+  protected SchedulerLeaseDeterminationStore leaseDeterminationStore;
+  protected JobScheduler jobScheduler;
+  protected SchedulerService schedulerService;
+  @Inject
+  public SchedulerLeaseAlgoHandler(Config config, 
SchedulerLeaseDeterminationStore leaseDeterminationStore,
+      JobScheduler jobScheduler, SchedulerService schedulerService)
+      throws IOException {
+    this.linger = ConfigUtils.getLong(config, 
ConfigurationKeys.SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS_KEY,
+        ConfigurationKeys.DEFAULT_SCHEDULER_TRIGGER_EVENT_EPSILON_MILLIS);
+    this.staggerUpperBoundSec = ConfigUtils.getInt(config,
+        ConfigurationKeys.SCHEDULER_STAGGERING_UPPER_BOUND_SEC_KEY,
+        ConfigurationKeys.DEFAULT_SCHEDULER_STAGGERING_UPPER_BOUND_SEC);
+    this.leaseDeterminationStore = leaseDeterminationStore;
+    this.jobScheduler = jobScheduler;
+    this.schedulerService = schedulerService;
+  }
+  private SchedulerLeaseDeterminationStore schedulerLeaseDeterminationStore;
+
+  /**
+   * This method is used in the multi-active scheduler case for one or more 
hosts to respond to a flow's trigger event
+   * by attempting a lease for the flow event.
+   * @param jobProps
+   * @param flowGroup
+   * @param flowName
+   * @param flowExecutionId
+   * @param flowActionType
+   * @param triggerTimeMillis
+   * @return true if this host obtained the lease for this flow's trigger 
event, false otherwise.
+   * @throws IOException
+   */
+  public boolean handleNewTriggerEvent(Properties jobProps, String flowGroup, 
String flowName, String flowExecutionId,
+      SchedulerLeaseDeterminationStore.FlowActionType flowActionType, long 
triggerTimeMillis)
+      throws IOException {
+    SchedulerLeaseDeterminationStore.LeaseAttemptStatus leaseAttemptStatus =
+        
schedulerLeaseDeterminationStore.attemptInsertAndGetPursuantTimestamp(flowGroup,
 flowName, flowExecutionId,
+            flowActionType, triggerTimeMillis);
+    // TODO: add a log event or metric for each of these cases
+    switch (leaseAttemptStatus) {
+      case LEASE_OBTAINED:
+        return true;
+      case PREVIOUS_LEASE_EXPIRED:
+        // recursively try obtaining lease again immediately, stops when 
reaches one of the other cases
+        return handleNewTriggerEvent(jobProps, flowGroup, flowName, 
flowExecutionId, flowActionType, triggerTimeMillis);
+      case PREVIOUS_LEASE_VALID:
+        scheduleReminderForTriggerEvent(jobProps, flowGroup, flowName, 
flowExecutionId, flowActionType, triggerTimeMillis);
+    }
+    return false;
+  }
+
+  /**
+   * This method is used by {@link 
SchedulerLeaseAlgoHandler.handleNewTriggerEvent} to schedule a reminder for 
itself to
+   * check on the other participant's progress during pursuing orchestration 
after the time the lease should expire.
+   * If the previous participant was successful, then no further action is 
taken otherwise we re-attempt pursuing
+   * orchestration ourselves.
+   * @param flowGroup
+   * @param flowName
+   * @param flowExecutionId
+   * @param flowActionType
+   * @param triggerTimeMillis
+   */
+  protected void scheduleReminderForTriggerEvent(Properties jobProps, String 
flowGroup, String flowName, String flowExecutionId,
+      SchedulerLeaseDeterminationStore.FlowActionType flowActionType, long 
triggerTimeMillis) {
+    // Check-in `linger` time after the current timestamp which is 
"close-enough" to the time the pursuant attempted
+    // the flow action. We also add a small randomization to avoid 'thundering 
herd' issue
+    String cronExpression = createCronFromDelayPeriod(linger + 
random.nextInt(staggerUpperBoundSec));
+    jobProps.setProperty(ConfigurationKeys.JOB_SCHEDULE_KEY, cronExpression);
+    // This timestamp is what will be used to identify the particular flow 
trigger event it's associated with
+    
jobProps.setProperty(ConfigurationKeys.SCHEDULER_ORIGINAL_TRIGGER_TIMESTAMP_MILLIS_KEY,
 String.valueOf(triggerTimeMillis));
+    JobKey key = new JobKey(flowName, flowGroup);
+    Trigger trigger = this.jobScheduler.getTrigger(key, jobProps);
+    try {
+      LOG.info("Attempting to add job reminder to Scheduler Service where job 
is %s trigger event %s and reminder is at "
+          + "%s.", key, triggerTimeMillis, trigger.getNextFireTime());
+      this.schedulerService.getScheduler().scheduleJob(trigger);
+    } catch (SchedulerException e) {
+      LOG.warn("Failed to add job reminder due to SchedulerException for job 
%s trigger event %s ", key, triggerTimeMillis, e);
+    }
+    LOG.info(String.format("Scheduled reminder for job %s trigger event %s. 
Next run: %s.", key, triggerTimeMillis, trigger.getNextFireTime()));
+  }
+
+  /**
+   * These methods should only be called from the Orchestrator or JobScheduler 
classes as it directly adds jobs to the
+   * Quartz scheduler
+   * @param delayPeriodSeconds
+   * @return
+   */
+  protected static String createCronFromDelayPeriod(long delayPeriodSeconds) {
+    LocalDateTime now = LocalDateTime.now();
+    LocalDateTime delaySecondsLater = now.plus(delayPeriodSeconds, 
ChronoUnit.SECONDS);
+    // TODO: investigate potentially better way of generating cron expression 
that does not make it US dependent
+    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("ss mm HH dd MM 
? yyyy", Locale.US);

Review Comment:
   Does LocalDateTime.now() default to a US timezone? If not, this could cause 
issues. I would suggest using LocalDateTime.now(<timezone>) to ensure 
consistency in this system across timezones.
   Also, I think the GaaS scheduler defaults to UTC.
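
   A minimal sketch of what pinning the zone could look like, assuming UTC is 
the right zone for the scheduler (unconfirmed). The class name `CronFromDelay` 
and the extra `Clock` parameter are hypothetical and not part of the PR; the 
clock is only there so the result can be checked against a fixed instant.

```java
import java.time.Clock;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not the PR's code: builds a one-shot cron expression
// for "now + delay", always computed in UTC.
final class CronFromDelay {
  // Same field order as the pattern in the PR: sec min hour day month ? year
  private static final DateTimeFormatter CRON_FORMAT =
      DateTimeFormatter.ofPattern("ss mm HH dd MM ? yyyy", Locale.ROOT);

  static String createCronFromDelayPeriod(long delayPeriodSeconds, Clock clock) {
    // Convert the clock's instant to UTC explicitly so the JVM's default
    // time zone never influences the generated expression.
    ZonedDateTime fireTime =
        ZonedDateTime.ofInstant(clock.instant(), ZoneOffset.UTC).plusSeconds(delayPeriodSeconds);
    return fireTime.format(CRON_FORMAT);
  }

  public static void main(String[] args) {
    // In production code the caller would pass Clock.systemUTC().
    System.out.println(createCronFromDelayPeriod(30, Clock.systemUTC()));
  }
}
```

   The Quartz trigger built from this expression would also need to be 
evaluated in the same zone for the two to line up.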



##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/DagActionStoreChangeMonitor.java:
##########
@@ -151,7 +170,16 @@ protected void processMessage(DecodeableKafkaRecord 
message) {
           log.info("Received insert dag action and about to send kill flow 
request");
           dagManager.handleKillFlowRequest(flowGroup, flowName, 
Long.parseLong(flowExecutionId));
           this.killsInvoked.mark();
-        } else {
+        } else if (dagAction.equals(DagActionStore.DagActionValue.LAUNCH)) {
+          // If multi-active scheduler is NOT turned on we should not receive 
these type of events
+          if (!this.isMultiActiveSchedulerEnabled) {
+            log.warn("Received LAUNCH dagAction while not in multi-active 
scheduler mode for flow group: {}, flow name:"
+                + "{}, execution id: {}, dagAction: {}", flowGroup, flowName, 
flowExecutionId, dagAction);
+            this.unexpectedErrors.mark();
+          }
+          log.info("Received insert dag action and about to forward launch 
request to DagManager");
+          submitFlowToDagManager(flowGroup, flowName);

Review Comment:
   I agree with this sentiment; I would rather let it fail loudly if it's not 
expected.
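
   One possible reading of "fail loudly", sketched under assumptions: the 
LAUNCH branch throws instead of logging a warning and submitting anyway. The 
names `LaunchGuard` and `shouldSubmitLaunch` are hypothetical, and the enum 
only mirrors `DagActionStore.DagActionValue` from the diff.

```java
// Hypothetical sketch, not the PR's code: decide whether a dag action should
// be forwarded as a launch, and throw if LAUNCH arrives in the wrong mode.
final class LaunchGuard {
  // Mirrors DagActionStore.DagActionValue as shown in the diff.
  enum DagActionValue { KILL, RESUME, LAUNCH }

  static boolean shouldSubmitLaunch(DagActionValue dagAction, boolean isMultiActiveSchedulerEnabled) {
    if (dagAction != DagActionValue.LAUNCH) {
      return false; // KILL and RESUME are handled by other branches
    }
    if (!isMultiActiveSchedulerEnabled) {
      // Fail loudly: a LAUNCH event is unexpected outside multi-active mode.
      throw new IllegalStateException(
          "Received LAUNCH dagAction while not in multi-active scheduler mode");
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(shouldSubmitLaunch(DagActionValue.LAUNCH, true));
  }
}
```

   Whether an exception, a metric plus early return, or something else is the 
right failure mode inside processMessage is a design choice for the PR author.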





Issue Time Tracking
-------------------

    Worklog Id:     (was: 863082)
    Time Spent: 5h 40m  (was: 5.5h)

> Implement multi-active, non blocking for leader host
> ----------------------------------------------------
>
>                 Key: GOBBLIN-1837
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1837
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> This task will include the implementation of non-blocking, multi-active 
> scheduler for each host. It will NOT include metric emission or unit tests 
> for validation. That will be done in a separate follow-up ticket. The work in 
> this ticket includes
>  * define a table to do scheduler lease determination for each flow's trigger 
> event and related methods to execute actions on this table
>  * update DagActionStore schema and DagActionStoreMonitor to act upon new 
> "LAUNCH" type events in addition to KILL/RESUME
>  * update scheduler/orchestrator logic to apply the non-blocking algorithm 
> when "multi-active scheduler mode" is enabled, otherwise submit events 
> directly to the DagManager after receiving a scheduler trigger
>  * implement the non-blocking algorithm, particularly handling reminder 
> events if another host is in the process of securing the lease for a 
> particular flow trigger



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
