[
https://issues.apache.org/jira/browse/AURORA-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864569#comment-15864569
]
David McLaughlin commented on AURORA-1890:
------------------------------------------
You're right: the write volume depends entirely on your update volume and the
pulse interval, and for many use cases the cost of the extra writes would be
negligible. I think the real concern was the cost of reading the last pulse
time.
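To put rough numbers on the write-volume point, here is a back-of-the-envelope sketch. It assumes every pulse for an active update becomes exactly one storage write and that clients pulse at a fixed interval. `PulseWriteEstimate` and `writesPerHour` are hypothetical names, not Aurora code.

```java
// Hypothetical estimator (not Aurora code): if every pulse were persisted,
// the write rate scales with the number of active pulsed updates and
// inversely with the pulse interval.
final class PulseWriteEstimate {
    private PulseWriteEstimate() {}

    /**
     * Estimated storage writes per hour, assuming each active update is pulsed
     * once per pulseIntervalMs and each pulse becomes one write. Pulses per
     * hour are rounded down when the interval does not divide an hour evenly.
     */
    static long writesPerHour(long activePulsedUpdates, long pulseIntervalMs) {
        if (pulseIntervalMs <= 0) {
            throw new IllegalArgumentException("pulseIntervalMs must be positive");
        }
        final long hourMs = 60L * 60L * 1000L;
        return activePulsedUpdates * (hourMs / pulseIntervalMs);
    }
}
```

For example, 50 active updates pulsed every 5 minutes come to 600 writes per hour, while the same 50 updates pulsed every 10 seconds come to 18,000. The read cost of the last pulse time is the separate concern raised above and is not modeled here.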
One other reason persisting the pulse is not very useful is that scheduler
failover time typically exceeds a sane pulse timeout. The same applies to
automatically setting it to the last event time (which would be preferable,
IMO). I think the reason we backed out of the grace period change (which was
going to be implemented by setting the timestamp to the time the scheduler
acquired leadership) is that it could reactivate a bunch of updates that were
legitimately blocked. In the end, we agreed that the churn from
ROLLING_FORWARD -> BLOCKED_AWAITING_PULSE -> ROLLING_FORWARD was harmless. But
I suppose if you have automation on top of this that reacts to state changes,
it could be annoying.
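The trade-off between these restore strategies can be modeled as a small function. This is a simplified, hypothetical sketch: `RecoveryPolicy`, `PulseRecovery`, and `statusAfterFailover` are made-up names, the two statuses are borrowed from the comment rather than from Aurora's actual status enum, and an update is assumed to be active only while `now - lastPulse < pulseTimeout`.

```java
// Hypothetical model (not Aurora code) of the state a pulsed update lands in
// right after a scheduler failover, under different ways of restoring the
// "last pulse" timestamp.
enum UpdateStatus { ROLLING_FORWARD, BLOCKED_AWAITING_PULSE }

enum RecoveryPolicy {
    IN_MEMORY_ONLY,   // pulse history lost on failover (behavior reported in AURORA-1890)
    PERSISTED_PULSE,  // last pulse time restored from storage (hypothetical)
    LEADERSHIP_GRACE  // last pulse time set to when the new leader took over (hypothetical)
}

final class PulseRecovery {
    private PulseRecovery() {}

    /**
     * @param lastPulseMs    when the client last pulsed, before the failover
     * @param leaderSinceMs  when the new scheduler acquired leadership
     * @param nowMs          when the status is evaluated
     * @param pulseTimeoutMs the update's pulse timeout
     */
    static UpdateStatus statusAfterFailover(RecoveryPolicy policy, long lastPulseMs,
            long leaderSinceMs, long nowMs, long pulseTimeoutMs) {
        final Long restored;  // null means "no pulse received"
        switch (policy) {
            case IN_MEMORY_ONLY:
                restored = null;
                break;
            case PERSISTED_PULSE:
                restored = lastPulseMs;
                break;
            case LEADERSHIP_GRACE:
                restored = leaderSinceMs;
                break;
            default:
                throw new AssertionError(policy);
        }
        // Active only if some pulse time was restored and it is still within the timeout.
        boolean active = restored != null && nowMs - restored < pulseTimeoutMs;
        return active ? UpdateStatus.ROLLING_FORWARD : UpdateStatus.BLOCKED_AWAITING_PULSE;
    }
}
```

With a 1 minute timeout and a 2 minute failover, `PERSISTED_PULSE` still yields `BLOCKED_AWAITING_PULSE`, which is the point about failover time exceeding a sane timeout. `LEADERSHIP_GRACE` yields `ROLLING_FORWARD` even for an update whose last pulse was 10 minutes before the new leader took over, i.e. one that was already legitimately blocked.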
> Job Update Pulse History is not durably stored
> ----------------------------------------------
>
> Key: AURORA-1890
> URL: https://issues.apache.org/jira/browse/AURORA-1890
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
>
> I have experienced the following problem with pulse updates. To reproduce:
> 1. Create an update with a pulse timeout of 1h
> 2. Send a pulse to get the update going.
> 3. Fail over the scheduler immediately afterward.
> 4. Observe that the update is awaiting another pulse right after the failover.
> This is because the {{JobUpdateControllerImpl}} stores pulse history and
> state in memory in {{PulseHandler}}. On scheduler startup, the pulse state is
> reset to no pulse received.
> We can solve this by durably storing the timestamp of the last pulse received
> in storage.
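The fix proposed in the issue could look roughly like the following sketch: write each pulse through to storage, and fall back to storage when the in-memory cache is empty after startup. `PulseStore`, `MapPulseStore`, and `DurablePulseHandler` are hypothetical names; in the real scheduler the store would be the replicated storage layer, and `MapPulseStore` only stands in for something that survives a failover.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction (not Aurora's storage API).
interface PulseStore {
    void saveLastPulse(String updateKey, long timestampMs);
    Optional<Long> loadLastPulse(String updateKey);
}

// Stand-in "durable" store: it outlives handler instances, which is the
// property a replicated store would provide across failovers.
final class MapPulseStore implements PulseStore {
    private final Map<String, Long> pulses = new ConcurrentHashMap<>();

    @Override public void saveLastPulse(String updateKey, long timestampMs) {
        pulses.put(updateKey, timestampMs);
    }

    @Override public Optional<Long> loadLastPulse(String updateKey) {
        return Optional.ofNullable(pulses.get(updateKey));
    }
}

// Hypothetical handler: caches pulses in memory for cheap reads, writes them
// through to the store, and consults the store on a cache miss after restart.
final class DurablePulseHandler {
    private final PulseStore store;
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    DurablePulseHandler(PulseStore store) {
        this.store = store;
    }

    void pulse(String updateKey, long nowMs) {
        store.saveLastPulse(updateKey, nowMs);  // one storage write per pulse
        cache.put(updateKey, nowMs);
    }

    boolean isPulseFresh(String updateKey, long nowMs, long pulseTimeoutMs) {
        Long last = cache.get(updateKey);
        if (last == null) {
            // Cache miss (e.g. right after failover): read from storage once.
            Optional<Long> stored = store.loadLastPulse(updateKey);
            if (!stored.isPresent()) {
                return false;  // never pulsed
            }
            last = stored.get();
            cache.put(updateKey, last);
        }
        return nowMs - last < pulseTimeoutMs;
    }
}
```

Constructing a second `DurablePulseHandler` over the same store simulates the failover from the reproduction steps: a pulse sent at t=0 is still considered fresh at t=30s with a 1h timeout. The cache keeps the read path cheap after the first lookup, though in this sketch updates that have never been pulsed still hit storage on every check.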