[jira] [Resolved] (KUDU-1362) Ensure master behaves correctly after a sys_catalog write failure

Adar Dembo (JIRA) Mon, 07 Mar 2016 14:48:18 -0800

     [ https://issues.apache.org/jira/browse/KUDU-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adar Dembo resolved KUDU-1362.
------------------------------
    Resolution: Duplicate

Whoops, Alex already had an issue tracking this.

> Ensure master behaves correctly after a sys_catalog write failure
> -----------------------------------------------------------------
>
>                 Key: KUDU-1362
>                 URL: https://issues.apache.org/jira/browse/KUDU-1362
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.7.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Critical
>
> For multi-master usage to be truly safe, we must ensure that a failure to
> write to the system catalog table is handled correctly. With a single master,
> such a failure can only happen after a disk failure or equivalent, but with
> multiple masters, failures can happen all the time (e.g. failed replicas or
> network partitions).
> So far I've only found one case where this is truly broken, in
> catalog_manager.cc:L2444, where CHECK_OK turns a failed write into a crash:
> {noformat}
>    2433 void CatalogManager::DeleteTabletsAndSendRequests(const scoped_refptr<TableInfo>& table) {
>    2434   vector<scoped_refptr<TabletInfo> > tablets;
>    2435   table->GetAllTablets(&tablets);
>    2436
>    2437   string deletion_msg = "Table deleted at " + LocalTimeAsString();
>    2438
>    2439   for (const scoped_refptr<TabletInfo>& tablet : tablets) {
>    2440     DeleteTabletReplicas(tablet.get(), deletion_msg);
>    2441
>    2442     TabletMetadataLock tablet_lock(tablet.get(), TabletMetadataLock::WRITE);
>    2443     tablet_lock.mutable_data()->set_state(SysTabletsEntryPB::DELETED, deletion_msg);
>   >2444     CHECK_OK(sys_catalog_->UpdateTablets({ tablet.get() }));
>    2445     tablet_lock.Commit();
>    2446   }
>    2447 }
> {noformat}
> In this case we should batch up all of the tablet deletions into one 
> UpdateTablets() call, and pass the status up to the DeleteTable caller too.
> Part of the work here is an integration test that provides good coverage for 
> the various failure paths.
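
The batching idea in the description above could look roughly like the following. This is only a minimal, self-contained sketch and not the actual Kudu change: Status, Tablet, FakeSysCatalog, and MarkTabletsDeleted are hypothetical stand-ins invented for illustration, and the real code would also need to deal with the metadata locks and with DeleteTabletReplicas(). It stages every state change, issues a single batched write, and commits in memory only if that write succeeded, returning the status to the caller instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; NOT the real kudu::Status.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status IOError(const std::string& msg) { return Status(false, msg); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

enum class TabletState { RUNNING, DELETED };

// Hypothetical tablet record with a committed state and a staged state.
struct Tablet {
  TabletState committed_state = TabletState::RUNNING;  // visible to readers
  TabletState pending_state = TabletState::RUNNING;    // staged, not yet durable
};

// Hypothetical system catalog whose writes can be forced to fail.
struct FakeSysCatalog {
  bool fail_writes = false;
  int num_write_calls = 0;
  std::size_t last_batch_size = 0;

  Status UpdateTablets(const std::vector<Tablet*>& tablets) {
    ++num_write_calls;
    last_batch_size = tablets.size();
    if (fail_writes) {
      return Status::IOError("simulated sys_catalog write failure");
    }
    return Status::OK();
  }
};

// Stage DELETED on every tablet, persist all of them with ONE write, and
// commit the in-memory state only if the write succeeded. On failure the
// staged changes are rolled back and the error is returned to the caller.
Status MarkTabletsDeleted(FakeSysCatalog* sys_catalog,
                          std::vector<Tablet>* tablets) {
  std::vector<Tablet*> batch;
  for (Tablet& t : *tablets) {
    t.pending_state = TabletState::DELETED;
    batch.push_back(&t);
  }

  Status s = sys_catalog->UpdateTablets(batch);
  if (!s.ok()) {
    for (Tablet* t : batch) t->pending_state = t->committed_state;  // roll back
    return s;
  }

  for (Tablet* t : batch) t->committed_state = t->pending_state;  // commit
  return Status::OK();
}
```

A caller in the DeleteTable path would check the returned status and report the failure to the client; an integration test could flip something like fail_writes to exercise that path.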



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
