[ https://issues.apache.org/jira/browse/KUDU-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned KUDU-915:
--------------------------------

    Assignee:     (was: Todd Lipcon)

> Bootstrap can fail shortly after an alter-table
> -----------------------------------------------
>
>                 Key: KUDU-915
>                 URL: https://issues.apache.org/jira/browse/KUDU-915
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: alter_table-randomized-test (5).txt.gz
>
>
> I saw a test failure which seems to be due to the following sequence:
> 1) Log: REPLICATE 1.8 ALTER_SCHEMA
> 2) Log: REPLICATE 1.9 WRITE
> 3) Log: COMMIT 1.9 WRITE
> 4) TabletMetadata::Flush()
> 5) crash (before COMMIT 1.8 ALTER_SCHEMA)
> During bootstrap, we then hit a problem: because we haven't seen a COMMIT 
> message for 1.8, we consider operation 1.9 to still be pending. We are 
> relying on the tablet peer's FlushInFlightsToLogCallback to ensure that we 
> don't flush metadata until the COMMIT message is in the log, but that isn't 
> strong enough: we need to wait until COMMIT messages are in the log for 
> _all_ prior operations, not just all prior _writes_. The implementation 
> currently uses MvccManager::WaitForAllInFlightToCommit, but since 
> AlterSchema doesn't use MvccManager, we aren't waiting for it.
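
For illustration only, the sequence above can be modeled with a small Python sketch. This is not Kudu code: LogEntry, uncommitted_ops, and safe_to_flush are hypothetical names, and the "only_writes" flag is an assumed stand-in for the difference between waiting on writes alone (what the description says happens today via MvccManager) and waiting on all prior operations (what the description says is needed).

```python
from typing import List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    """Hypothetical model of one log message; not a Kudu type."""
    kind: str                # "REPLICATE" or "COMMIT"
    op_id: Tuple[int, int]   # (term, index), e.g. (1, 8) for op 1.8
    op_type: str             # "WRITE" or "ALTER_SCHEMA"


def uncommitted_ops(log: List[LogEntry], only_writes: bool) -> List[Tuple[int, int]]:
    """Return ids of ops that have a REPLICATE entry but no COMMIT entry.

    With only_writes=True, non-WRITE ops are ignored. This is an assumed
    model of a gate that tracks writes only.
    """
    replicated = {}
    committed = set()
    for entry in log:
        if entry.kind == "REPLICATE":
            replicated[entry.op_id] = entry.op_type
        elif entry.kind == "COMMIT":
            committed.add(entry.op_id)
    return sorted(
        op_id
        for op_id, op_type in replicated.items()
        if op_id not in committed and (not only_writes or op_type == "WRITE")
    )


def safe_to_flush(log: List[LogEntry], only_writes: bool) -> bool:
    """A metadata flush is allowed only if no tracked op is uncommitted."""
    return not uncommitted_ops(log, only_writes)


# Log contents at step 4 of the sequence, just before TabletMetadata::Flush().
LOG_AT_FLUSH = [
    LogEntry("REPLICATE", (1, 8), "ALTER_SCHEMA"),
    LogEntry("REPLICATE", (1, 9), "WRITE"),
    LogEntry("COMMIT", (1, 9), "WRITE"),
]

if __name__ == "__main__":
    # A writes-only gate lets the flush through despite the pending 1.8.
    print("writes-only gate allows flush:", safe_to_flush(LOG_AT_FLUSH, only_writes=True))
    # A gate over all ops blocks the flush until COMMIT 1.8 is logged.
    print("all-ops gate allows flush:", safe_to_flush(LOG_AT_FLUSH, only_writes=False))
    print("ops still pending:", uncommitted_ops(LOG_AT_FLUSH, only_writes=False))
```

In this model the writes-only gate permits the flush at step 4 even though 1.8 has no COMMIT yet. The all-ops gate holds the flush until that COMMIT is appended.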



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
