[jira] [Created] (CASSANDRA-15844) Create table Asynchronously or contact the same node from many client threads at same time may causing data lose

maxwellguo (Jira) Mon, 01 Jun 2020 20:40:07 -0700

maxwellguo created CASSANDRA-15844:
--------------------------------------

             Summary: Create table Asynchronously or contact the same node from 
many client threads at same time may causing data lose
                 Key: CASSANDRA-15844
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15844
             Project: Cassandra
          Issue Type: Bug
          Components: Cluster/Schema
            Reporter: maxwellguo
            Assignee: maxwellguo
         Attachments: createkeyspace.jpg, keyspace inner.jpg, schemaversion.jpg


When a table is created through one coordinator node from several client 
threads at the same time, or through the session.executeAsync() method, the 
schema information may end up incorrect. In serious cases this can cause data loss.
In my test I used executeAsync() to issue CREATE TABLE statements one after 
another with the same table name (I do know that tables should be created 
synchronously, but some of our customers may create tables using 
executeAsync()). My expectation was that the last CQL statement 
{code:java}
CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, height text)
{code}
should take effect.

 !createkeyspace.jpg! 

But after running the code, I found that the result was not what I expected. 
The resulting schema is:

{code:java}
CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, sex int, height int)
{code}
 !keyspace inner.jpg! 
In addition, the schema version in memory is not the same as the one on disk. 
 !schemaversion.jpg! 

When a new column family is added (a new table is created), requests to create 
the same table with different schema definitions can arrive at the same time, 
either from different clients or through the executeAsync() method. 
{code:java}
    private static void announceNewColumnFamily(CFMetaData cfm, boolean announceLocally, boolean throwOnDuplicate, long timestamp) throws ConfigurationException
    {
        cfm.validate();

        KeyspaceMetadata ksm = Schema.instance.getKSMetaData(cfm.ksName);
        if (ksm == null)
            throw new ConfigurationException(String.format("Cannot add table '%s' to non existing keyspace '%s'.", cfm.cfName, cfm.ksName));
        // If we have a table or a view which has the same name, we can't add a new one
        else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != null)
            throw new AlreadyExistsException(cfm.ksName, cfm.cfName);

        logger.info("Create new table: {}", cfm);
        announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), announceLocally);
    }
{code}
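The table-existence check may fail to catch the duplicate, because the check 
and the following announce() call are not performed atomically. Several 
requests for the same table may therefore all go on to call announce():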
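The check-then-act pattern described above can be illustrated with a small, deterministic sketch. This is not Cassandra code and not a proposed patch: the class `CheckThenActSketch`, its in-memory map, and the methods `exists`, `announce` and `createIfAbsent` are hypothetical stand-ins for the duplicate check and the announce() call. The interleaving of two requests is written out by hand so that the outcome does not depend on thread scheduling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckThenActSketch {
    // Hypothetical stand-in for the in-memory schema: table name -> definition.
    private final Map<String, String> schema = new HashMap<>();
    // Hypothetical stand-in for the mutations handed to announce().
    private final List<String> announced = new ArrayList<>();

    // The duplicate check on its own (plays the role of getTableOrViewNullable() != null).
    boolean exists(String table) {
        return schema.containsKey(table);
    }

    // The announce step on its own: records the definition unconditionally.
    void announce(String table, String definition) {
        announced.add(definition);
        schema.put(table, definition);
    }

    // Check and announce under one lock, so no other request can slip in between.
    synchronized boolean createIfAbsent(String table, String definition) {
        if (exists(table))
            return false;
        announce(table, definition);
        return true;
    }

    int announcedCount() {
        return announced.size();
    }

    // Two requests both run the check before either one announces.
    public static int unsafeInterleaving() {
        CheckThenActSketch s = new CheckThenActSketch();
        boolean requestASeesTable = s.exists("tb");
        boolean requestBSeesTable = s.exists("tb");
        if (!requestASeesTable)
            s.announce("tb", "definition A");
        if (!requestBSeesTable)
            s.announce("tb", "definition B");
        return s.announcedCount();
    }

    // The same two requests when check and announce form a single step.
    public static int atomicCreation() {
        CheckThenActSketch s = new CheckThenActSketch();
        s.createIfAbsent("tb", "definition A");
        s.createIfAbsent("tb", "definition B");
        return s.announcedCount();
    }
}
```

In this sketch `unsafeInterleaving()` announces both definitions, while `atomicCreation()` announces only the first one and rejects the second.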

{code:java}
    public static synchronized void mergeSchema(Collection<Mutation> mutations, boolean forDynamoTTL)
    {
        // only compare the keyspaces affected by this set of schema mutations
        Set<String> affectedKeyspaces =
        mutations.stream()
                 .map(m -> UTF8Type.instance.compose(m.key().getKey()))
                 .collect(Collectors.toSet());

        // fetch the current state of schema for the affected keyspaces only
        Keyspaces before = Schema.instance.getKeyspaces(affectedKeyspaces);

        // apply the schema mutations and flush
        mutations.forEach(Mutation::apply);
        if (FLUSH_SCHEMA_TABLES)
            flush();

        // fetch the new state of schema from schema tables (not applied to Schema.instance yet)
        Keyspaces after = fetchKeyspacesOnly(affectedKeyspaces);

        mergeSchema(before, after);
        scheduleDynamoTTLClean(forDynamoTTL, mutations);
    }
{code}
Because each of these requests may write its table definition to disk, we ended up seeing 
{code:java}
CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, sex int, height int)
{code}
in our case.
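One possible reading of this result is sketched below. It assumes that the schema tables keep one row per column and that cells are reconciled by write timestamp, so that two CREATE TABLE mutations for the same table merge column by column instead of one replacing the other. The class `SchemaMergeSketch`, the wider definition containing `sex int`, and both timestamps are hypothetical: the other statements used in the test appear only in the attached screenshots, and the real timestamps are not given.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SchemaMergeSketch {
    // One stored cell: a column's type plus the timestamp it was written with.
    static final class Cell {
        final String type;
        final long timestamp;

        Cell(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical stand-in for the per-column schema rows of a single table.
    private final Map<String, Cell> columns = new TreeMap<>();

    // Apply one CREATE TABLE mutation: per column, the newer timestamp wins.
    // Columns missing from this definition are left untouched, not deleted.
    void apply(Map<String, String> definition, long timestamp) {
        for (Map.Entry<String, String> e : definition.entrySet()) {
            Cell existing = columns.get(e.getKey());
            if (existing == null || timestamp > existing.timestamp)
                columns.put(e.getKey(), new Cell(e.getValue(), timestamp));
        }
    }

    // The merged table definition as column name -> type, sorted by name.
    Map<String, String> describe() {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Cell> e : columns.entrySet())
            result.put(e.getKey(), e.getValue().type);
        return result;
    }

    public static Map<String, String> demo() {
        // The statement the reporter expected to take effect.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("name", "text");
        expected.put("age", "int");
        expected.put("adds", "text");
        expected.put("height", "text");

        // Hypothetical competing statement with an extra column and another type.
        Map<String, String> competing = new LinkedHashMap<>();
        competing.put("name", "text");
        competing.put("age", "int");
        competing.put("adds", "text");
        competing.put("sex", "int");
        competing.put("height", "int");

        SchemaMergeSketch table = new SchemaMergeSketch();
        // Assumption: the competing statement was assigned the later timestamp.
        table.apply(expected, 1L);
        table.apply(competing, 2L);
        return table.describe();
    }
}
```

Under these assumptions `demo()` yields a table that contains `sex int` and `height int`, matching the definition observed above even though neither column was part of the expected statement in that form.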
We also saw different schema versions in memory and on disk. 
Writes use the schema held in memory, but after a node restart the schema 
definition on disk is used. This mismatch may then cause data loss. 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

