[
https://issues.apache.org/jira/browse/KUDU-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jean-Daniel Cryans reassigned KUDU-2083:
----------------------------------------
Assignee: David Alves
Code Review: https://gerrit.cloudera.org/#/c/7610/
David pushed a fix to master, but we ought to backport it, so I'm leaving this
open until that's done.
Also, for future Google searches, here's the smoking gun:
{noformat}
I0825 00:57:57.157274 26537 maintenance_manager.cc:265] P
2ff1960fe2ef43a09f7458a1883bdca1: Prepare failed for
FlushMRSOp(5ed8492e345f45cbbf4a7c2f706605a5). Re-running scheduler.
{noformat}
Each occurrence leaks one slot in the running-op count, so once you hit this
as many times as you have maintenance manager threads configured (default: 1),
you stop flushing.
> MaintenanceManager running_op_ count not decremented if
> MaintenanceOp::Prepare() fails
> --------------------------------------------------------------------------------------
>
> Key: KUDU-2083
> URL: https://issues.apache.org/jira/browse/KUDU-2083
> Project: Kudu
> Issue Type: Bug
> Reporter: Samuel Okrent
> Assignee: David Alves
> Priority: Critical
>
> In MaintenanceManager::RunSchedulerThread(), an op is selected,
> running_ops_ is incremented, and Prepare() is called on the op. If Prepare()
> returns false, the op isn't run, so running_ops_ never gets decremented.
> This is a problem because the maintenance manager compares running_ops_ to
> the number of operation threads to determine whether it can run another
> operation, so each failed Prepare() leaks one slot. Prepare() generally
> doesn't fail, but one case where it can is when Tablet::AlterSchema() is
> called between FlushMRSOp::UpdateStats() and FlushMRSOp::Prepare().
> To fix, decrement running_ops_ in the code path that follows a Prepare()
> failure.
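For anyone skimming this later, the bookkeeping described above can be modeled
roughly as follows. This is only a minimal single-threaded sketch and not the
actual Kudu code: FakeOp, FakeScheduler, TrySchedule() and the
release_on_failure switch are hypothetical names made up for illustration, and
the real scheduler hands ops to worker threads instead of completing them
inline.

```cpp
#include <cassert>

// Hypothetical stand-in for a maintenance op; only Prepare() matters here.
struct FakeOp {
  bool prepare_ok;
  bool Prepare() const { return prepare_ok; }
};

// Simplified model of the scheduler's slot accounting (assumption: one
// counter compared against a fixed number of threads, as in the description).
class FakeScheduler {
 public:
  FakeScheduler(int num_threads, bool release_on_failure)
      : num_threads_(num_threads), release_on_failure_(release_on_failure) {}

  // True while running_ops_ is below the configured thread count.
  bool CanSchedule() const { return running_ops_ < num_threads_; }

  // Tries to schedule one op; returns true only if the op actually ran.
  bool TrySchedule(const FakeOp& op) {
    if (!CanSchedule()) return false;
    ++running_ops_;  // The slot is reserved before Prepare() is called.
    if (!op.Prepare()) {
      // Without this release, the slot reserved above is never given back.
      if (release_on_failure_) ReleaseSlot();
      return false;
    }
    // In this model the op completes immediately and frees its slot.
    ReleaseSlot();
    return true;
  }

  int running_ops() const { return running_ops_; }

 private:
  void ReleaseSlot() {
    assert(running_ops_ > 0);
    --running_ops_;
  }

  int num_threads_;
  bool release_on_failure_;
  int running_ops_ = 0;
};
```

With one thread and release_on_failure set to false, a single failed Prepare()
leaves running_ops() at 1 and every later TrySchedule() call is refused, which
matches the "stop flushing" symptom. With it set to true, the count returns to
0 and the next op is scheduled normally.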