[
https://issues.apache.org/jira/browse/GOBBLIN-1921?focusedWorklogId=883015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-883015
]
ASF GitHub Bot logged work on GOBBLIN-1921:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Oct/23 23:13
Start Date: 02/Oct/23 23:13
Worklog Time Spent: 10m
Work Description: umustafi commented on code in PR #3790:
URL: https://github.com/apache/gobblin/pull/3790#discussion_r1343247173
##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/MysqlMultiActiveLeaseArbiter.java:
##########
@@ -86,16 +88,26 @@ protected interface CheckedFunction<T, R> {
private final int epsilon;
private final int linger;
private String thisTableGetInfoStatement;
+ private String thisTableGetInfoStatementForReminder;
private String thisTableSelectAfterInsertStatement;
private String thisTableAcquireLeaseIfMatchingAllStatement;
private String thisTableAcquireLeaseIfFinishedStatement;
// TODO: define retention on this table
+ /*
+ Notes:
+ - Set `event_timestamp` default value to turn off timestamp auto-updates
for row modifications which alters this col
+ in an unexpected way upon completing the lease
+ - Upon reading any timestamps from MySQL we convert the timezone from
session (default) to UTC to consistently
+ use epoch-millis in UTC locally
+ - Upon using any timestamps from local we convert the timezone from UTC to
session
+ - We desire millisecond level precision and denote that with `(3)` for the
TIMESTAMP types
+ */
private static final String CREATE_LEASE_ARBITER_TABLE_STATEMENT = "CREATE
TABLE IF NOT EXISTS %s ("
+ "flow_group varchar(" + ServiceConfigKeys.MAX_FLOW_GROUP_LENGTH + ")
NOT NULL, flow_name varchar("
+ ServiceConfigKeys.MAX_FLOW_GROUP_LENGTH + ") NOT NULL, " + "
flow_action varchar(100) NOT NULL, "
- + "event_timestamp TIMESTAMP, "
- + "lease_acquisition_timestamp TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
"
+ + "event_timestamp TIMESTAMP(3) DEFAULT CURRENT_TIMESTAMP(3), "
Review Comment:
I converted everything to BIGINT and updated the tests but reverted back
because I removed the database laundering after realizing that not having
database laundering made it very hard to deal with `reminderEvents`. We end up
having to think about every possible case of what to do with reminder events
and `withinEpsilon` and `afterLinger` or not while db laundering instead of
using the `triggerTimeMillis` from the hosts may result in out of order
timestamps if we have host drift. With db laundering, we use database timestamp
and it becomes easier to do time conversions with timestamp type. There's a lot
of back and forth that went into this
Issue Time Tracking
-------------------
Worklog Id: (was: 883015)
Time Spent: 2h 10m (was: 2h)
> Properly handle reminder events
> -------------------------------
>
> Key: GOBBLIN-1921
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1921
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-service
> Reporter: Urmi Mustafi
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Reminder flow trigger events were being improperly handled and interpreted as
> new events because they are triggered {{linger}} time after the original
> trigger where {{epsilon < linger}} and we use {{epsilon}} to determine event
> distinctness. With reminder events being considered distinct events, we were
> launching excess concurrent flows that were then being cancelled. Now we
> handle reminder events differently from normal event triggers to ensure
> they're properly evaluated. Because of db laundering, reminder events are
> easy to handle - if they're older than the currently worked upon event in the
> database they can be skipped and if they're equal to the current event in the
> database they are handled like normal. Reminder events should never be newer
> than the current event in the lease arbiter table because db laundering
> always results in increasing event times.Â
--
This message was sent by Atlassian Jira
(v8.20.10#820010)