Andrew Wong created KUDU-2690:
---------------------------------

             Summary: Alter schema seems to be missing
                 Key: KUDU-2690
                 URL: https://issues.apache.org/jira/browse/KUDU-2690
             Project: Kudu
          Issue Type: Bug
          Components: log, master, tablet
    Affects Versions: 1.7.1
            Reporter: Andrew Wong


I've seen an issue in which an ADD_COLUMN appears not to be fully applied 
before writes that use the new column are performed. This results in a 
failure to bootstrap with an error like:

{{F0112 19:58:08.591284  8692 transaction_driver.cc:383] T 
578f2c6e60d84cb18d704889ea323cda P dc0af5867d52468f8fd47abf13c08040 S R-NP Ts 
6317323785408049152: Cannot cancel transactions that have already replicated: 
Invalid argument: Client provided column <COLUMN NAME>[double NULLABLE] not 
present in tablet transaction:R-NP WriteTransaction [type=REPLICA, 
start_time=2019-01-12 19:58:08, state=WriteTransactionState 0x5d52000 
[op_id=(term: 2548 index: 160364490), ts=6317323785408049152, rows=[]]]}}

 

One clue is that in the WALs, the "client schema" (the schema in each write 
request) contains a column that is not in the "tablet schema" (the schema in 
the log segment), and so dumping the WALs fails. This alone shouldn't 
prevent bootstrapping, but when replaying the WAL, we decode each write 
request against the schema in the tablet metadata. The failure therefore 
seems to indicate that the tablet metadata's schema is missing a column that 
is used by a committed write. I've been trying to piece together various 
known ALTER SCHEMA bugs (e.g. KUDU-860) to reproduce this, but haven't had 
much luck.
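
To make the suspected mismatch concrete, here is a minimal sketch of the kind 
of check that fails. It is a simplified model and not Kudu's actual decoding 
code: the Column type, the missing_from_tablet helper, and the column names 
are all hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for a column schema (not a Kudu type)."""
    name: str
    type: str
    nullable: bool


def missing_from_tablet(client_schema: List[Column],
                        tablet_schema: List[Column]) -> List[Column]:
    """Return client columns with no exact match in the tablet schema."""
    tablet_cols = {c.name: c for c in tablet_schema}
    return [c for c in client_schema if tablet_cols.get(c.name) != c]


# Tablet metadata schema as it would look if the ADD_COLUMN never landed.
tablet_schema = [Column("key", "int64", False)]
# Schema carried by the write request, which already includes the new column.
client_schema = tablet_schema + [Column("new_col", "double", True)]

for col in missing_from_tablet(client_schema, tablet_schema):
    nullability = "NULLABLE" if col.nullable else "NOT NULL"
    print(f"Client provided column {col.name}[{col.type} {nullability}] "
          "not present in tablet")
```

In this model, a write is decodable only when the helper returns an empty 
list, which is the state the tablet metadata would be in had the alter been 
applied before the write.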

 

It's worth noting that this cluster is misconfigured: its tablet servers 
point to duplicate master addresses, so it is susceptible to KUDU-2681 and 
KUDU-2684, meaning each tablet report results in multiple concurrent tasks 
being scheduled in response.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
