[ https://issues.apache.org/jira/browse/KUDU-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned KUDU-2083:
----------------------------------------

       Assignee: David Alves
    Code Review: https://gerrit.cloudera.org/#/c/7610/

David pushed a fix to master, but we ought to backport it, so I'm leaving this
open until that's done.

Also, for future Google searches, here's the smoking gun:

{noformat}
I0825 00:57:57.157274 26537 maintenance_manager.cc:265] P 2ff1960fe2ef43a09f7458a1883bdca1: Prepare failed for FlushMRSOp(5ed8492e345f45cbbf4a7c2f706605a5).  Re-running scheduler.
{noformat}

Once this happens as many times as there are maintenance manager threads
configured (default: 1), the maintenance manager stops scheduling ops, so
flushing stops.
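To make the failure mode concrete, here is a minimal C++ sketch of the counter bookkeeping. It assumes a heavily simplified, hypothetical scheduler in which ops complete synchronously; `BuggyScheduler`, `RunOnce`, and `CanScheduleMore` are illustrative names, not Kudu's actual MaintenanceManager API. With one thread, a single failed Prepare() leaves the counter at 1, and every later op is refused:

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified model of the bookkeeping described in KUDU-2083.
// This is NOT Kudu's MaintenanceManager; ops complete synchronously here.
class BuggyScheduler {
 public:
  explicit BuggyScheduler(int num_threads) : num_threads_(num_threads) {}

  // A new op may start only while running_ops_ is below the thread count.
  bool CanScheduleMore() const { return running_ops_ < num_threads_; }

  // One scheduler iteration; 'prepare' stands in for MaintenanceOp::Prepare().
  // Returns true if an op was run.
  bool RunOnce(const std::function<bool()>& prepare) {
    if (!CanScheduleMore()) return false;  // no free slot: scheduler is stuck
    ++running_ops_;                        // slot reserved before Prepare()
    if (!prepare()) {
      // BUG: the reserved slot is never released on this path.
      return false;
    }
    // The op "runs" and then releases its slot.
    assert(running_ops_ > 0);
    --running_ops_;
    return true;
  }

  int running_ops() const { return running_ops_; }

 private:
  const int num_threads_;
  int running_ops_ = 0;
};
```

With `BuggyScheduler s(1)`, one call `s.RunOnce([] { return false; })` leaves `s.running_ops()` at 1, after which even an op whose Prepare() succeeds is never run.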

> MaintenanceManager running_op_ count not decremented if 
> MaintenanceOp::Prepare() fails
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-2083
>                 URL: https://issues.apache.org/jira/browse/KUDU-2083
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Samuel Okrent
>            Assignee: David Alves
>            Priority: Critical
>
> In MaintenanceManager::RunSchedulerThread(), an op is selected,
> running_ops_ is incremented, and Prepare() is called on the op. If Prepare()
> returns false, the op isn't run, and running_ops_ is never decremented. This
> is a problem because the maintenance manager compares running_ops_ to the
> number of operation threads to decide whether it can run another operation,
> so every failed Prepare() permanently consumes one slot. Prepare() rarely
> fails, but one case where it can is when Tablet::AlterSchema() is called
> between FlushMRSOp::UpdateStats() and FlushMRSOp::Prepare().
> To fix this, decrement running_ops_ on the code path taken when Prepare()
> fails.
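A minimal sketch of the proposed fix, in the same kind of hypothetical, simplified model: `FixedScheduler` and `RunOnce` are illustrative names, not Kudu's code, and this is not necessarily how the committed patch is written. The only change from a naive version is that the slot reserved before Prepare() is released on the failure path:

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified model of the scheduler bookkeeping.
// This is NOT Kudu's MaintenanceManager; ops complete synchronously here.
class FixedScheduler {
 public:
  explicit FixedScheduler(int num_threads) : num_threads_(num_threads) {}

  // A new op may start only while running_ops_ is below the thread count.
  bool CanScheduleMore() const { return running_ops_ < num_threads_; }

  // One scheduler iteration; 'prepare' stands in for MaintenanceOp::Prepare().
  // Returns true if an op was run.
  bool RunOnce(const std::function<bool()>& prepare) {
    if (!CanScheduleMore()) return false;
    ++running_ops_;  // reserve a slot before Prepare()
    if (!prepare()) {
      // Fix: release the reserved slot when Prepare() fails.
      assert(running_ops_ > 0);
      --running_ops_;
      return false;
    }
    // The op "runs" and then releases its slot.
    assert(running_ops_ > 0);
    --running_ops_;
    return true;
  }

  int running_ops() const { return running_ops_; }

 private:
  const int num_threads_;
  int running_ops_ = 0;
};
```

The explicit decrement mirrors the fix described in the report. An RAII guard that decrements on scope exit unless ownership of the slot is handed to the running op would be an alternative that also covers any future early-return paths.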



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)