[ANNOUNCE] 3.0.23/3.0.24/3.11.9/3.11.10 Can Potentially Corrupt Data During Schema Changes

2021-07-25 Thread Jordan West
The bug reported in CASSANDRA-16735 [1] was previously thought to cause
only recoverable corruption, but it can, in fact, induce *non-recoverable*
corruption in some partitions. If you are not yet on 3.0.23, 3.0.24,
3.11.9, or 3.11.10, it is recommended that you wait to upgrade until the
Cassandra community releases 3.0.25 and 3.11.11. Once they are released,
skip directly from 3.0.22 to 3.0.25 or from 3.11.8 to 3.11.11. For those
already on the affected versions, the Cassandra community is working to
release 3.0.25 and 3.11.11 immediately. Immediate upgrade to 3.0.25 or
3.11.11 is recommended, and all schema changes should be stopped until
the upgrade is complete.

While the issue has been known for some time, its severity was not
well understood. That understanding has since improved, and with it we
are suggesting the above actions for all users.


The issue was introduced by a fix for CASSANDRA-15899 [2], which
affected all versions up to and including 3.0.22 and 3.11.8. The fix
for CASSANDRA-16735 was to revert the patch made in CASSANDRA-15899,
meaning clusters will continue to be susceptible to that transient
issue.

In summary:

- 3.0.22 and earlier / 3.11.8 and earlier: susceptible to CASSANDRA-15899,
which carries considerably less risk than CASSANDRA-16735.

- 3.0.23, 3.0.24, 3.11.9, 3.11.10: carry the CASSANDRA-15899 patch that
introduces the bug reported in CASSANDRA-16735. These versions are
susceptible to non-recoverable corruption and should be upgraded
immediately.

- 3.0.25, 3.11.11: have the CASSANDRA-15899 patch reverted by the patch in
CASSANDRA-16735 -- no longer susceptible to unrecoverable corruption, but
still susceptible to CASSANDRA-15899.


[1] https://issues.apache.org/jira/browse/CASSANDRA-16735
[2] https://issues.apache.org/jira/browse/CASSANDRA-15899


Re: multiple clients making schema changes at once

2021-06-03 Thread Erick Ramirez
Having said that, I'm still not a fan of making schema changes
programmatically. I spend way too much time helping users unscramble their
schema after they've hit multiple disagreements. I do understand the need
for it, but avoid it if you can, particularly in production.

On Fri, 4 Jun 2021 at 09:41, Erick Ramirez 
wrote:

> I wonder if there’s a way to query the driver to see if your schema change
>> has fully propagated.  I haven’t looked into this.
>>
>
> Yes, the drivers have APIs for this. For example, the Java driver has
> isSchemaInAgreement() and checkSchemaAgreement().
>
> See
> https://docs.datastax.com/en/developer/java-driver/latest/manual/core/metadata/schema/.
> Cheers!
>
>


Re: multiple clients making schema changes at once

2021-06-03 Thread Erick Ramirez
>
> I wonder if there’s a way to query the driver to see if your schema change
> has fully propagated.  I haven’t looked into this.
>

Yes, the drivers have APIs for this. For example, the Java driver has
isSchemaInAgreement() and checkSchemaAgreement().

See
https://docs.datastax.com/en/developer/java-driver/latest/manual/core/metadata/schema/.
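
For example, a minimal sketch with the 3.x Java driver (the contact point,
keyspace, table, and column names here are placeholders, and the simple
polling loop is just one way to wait):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class SchemaAgreementExample {
    public static void main(String[] args) throws InterruptedException {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Run the DDL statement.
            ResultSet rs = session.execute("ALTER TABLE my_ks.my_table ADD new_col text");

            // Fast path: the coordinator already saw agreement for this statement.
            if (!rs.getExecutionInfo().isSchemaInAgreement()) {
                // Otherwise poll the cluster-wide check before issuing inserts.
                while (!cluster.getMetadata().checkSchemaAgreement()) {
                    Thread.sleep(500);
                }
            }
            // Only now start the inserts that rely on the new column.
        }
    }
}
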
Cheers!


Re: multiple clients making schema changes at once

2021-06-03 Thread Max C.
Hi Joe,

In our case we only do this in the test environment, and it can be the case
that there are several seconds or even minutes between when a schema change
occurs and when a test that depends on said schema change executes.  Perhaps
we have been lucky thus far.  :-)

I wonder if there’s a way to query the driver to see if your schema change has 
fully propagated.  I haven’t looked into this.

- Max

> On Jun 3, 2021, at 8:23 am, Joe Obernberger  
> wrote:
> 
> How does this work?  I have a program that runs a series of alter table 
> statements, and then does inserts.  In some cases, the insert happens 
> immediately after the alter table statement and the insert fails because the 
> schema (apparently) has not had time to propagate.  I get an Undefined column 
> name error.
> 
> The alter statements run single threaded, but the inserts run in multiple 
> threads.  The alter statement is run in a synchronized block (Java).  Should 
> I put an artificial delay after the alter statement?
> 
> -Joe
> 
> On 6/1/2021 2:59 PM, Max C. wrote:
>> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client 
>> library for ZooKeeper.
>> 
>> - Max
>> 
>>> Yes this is quite annoying. How did you implement that "external lock"? I 
>>> also thought of doing an external service that would be dedicated to that. 
>>> Cassandra client apps would send create instruction to that service, that 
>>> would receive them and do the creates 1 by 1, and the client app would wait 
>>> the response from it before starting to insert.
>>> 
>>> Best,
>>> 
>>> Sébastien.
>>> 
>>> On Tue, 1 Jun 2021 at 05:21, Max C. <mc_cassand...@core43.com> wrote:
>>> In our case we have a shared dev cluster with (for example) a key space for 
>>> each developer, a key space for each CI runner, etc.   As part of 
>>> initializing our test suite we setup the schema to match the code that is 
>>> about to be tested.  This can mean multiple CI runners each adding/dropping 
>>> tables at the same time but for different key spaces.   
>>>   
>>> 
>>> Our experience is even though the schema changes do not conflict, we still 
>>> run into schema mismatch problems.   Our solution to this was to have a 
>>> lock (external to Cassandra) that ensures only a single schema change 
>>> operation is being issued at a time.
>>> 
>>> People assume schema changes in Cassandra work the same way as MySQL or 
>>> multiple users editing files on disk — i.e. as long as you’re not editing 
>>> the same file (or same MySQL table), then there’s no problem.  This is NOT 
>>> the case.  Cassandra schema changes are more like “git push”ing a commit to 
>>> the same branch — i.e. at most one change can be outstanding at a time 
>>> (across all tables, all key spaces)…otherwise you will run into trouble.
>>> 
>>> Hope that helps.  Best of luck.
>>> 
>>> - Max
>>> 
>>> 
>>> Hello,
>>> 
>>> I have a more general question about that, I cannot find clear answer.
>>> 
>>> In my use case I have many tables (around 10k new tables created per 
>>> months) and they are created from many clients and only dynamically, with 
>>> several clients creating same tables simulteanously.
>>> 
>>> What is the recommended way of creating tables dynamically? If I am doing 
>>> "if not exists" queries + wait for schema aggreement before and after each 
>>> create statement, will it work correctly for Cassandra?
>>> 
>>> Sébastien.
>>> 
>> 
>> 


Re: multiple clients making schema changes at once

2021-06-03 Thread Jeff Jirsa
CFID mismatch is not "schema not propagated", it means you created the
table twice at the same time, and you have an inconsistent view of the
table within your cluster.

This is bad. Really bad. Worse than you expect. It's a bug in cassandra,
but until it's fixed, you should stop doing concurrent schema modifications.



On Thu, Jun 3, 2021 at 8:37 AM Sébastien Rebecchi 
wrote:

> Sometimes even waiting hours does not change. I have a cluster where I did
> like you, synchronization of create tables statement, then even I tried
> waiting for schema agreement, in loop until success, but sometimes the
> success never happens, i got that error in loop in the logs of a node, it
> seems we must restart nodes really often :(
>
> Sébastien
>
> ERROR [InternalResponseStage:1117] 2021-06-03 17:32:34,937
> MigrationCoordinator.java:408 - Unable to merge schema from /
> 135.181.222.100
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID
> mismatch (found a991bb50-c475-11eb-83cb-df35fc5a9bea; expected
> 994bee02-c475-11eb-beff-6d70d473832f)
> at
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:984)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:938)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at org.apache.cassandra.config.Schema.updateTable(Schema.java:687)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1478)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1434)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1403)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.service.MigrationCoordinator.mergeSchemaFrom(MigrationCoordinator.java:367)
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:404)
> [apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:393)
> [apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
> [apache-cassandra-3.11.10.jar:3.11.10]
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> [apache-cassandra-3.11.10.jar:3.11.10]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_292]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_292]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_292]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_292]
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.10.jar:3.11.10]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
>
> On Thu, 3 Jun 2021 at 17:23, Joe Obernberger wrote:
>
>> How does this work?  I have a program that runs a series of alter table
>> statements, and then does inserts.  In some cases, the insert happens
>> immediately after the alter table statement and the insert fails because
>> the schema (apparently) has not had time to propagate.  I get an Undefined
>> column name error.
>>
>> The alter statements run single threaded, but the inserts run in multiple
>> threads.  The alter statement is run in a synchronized block (Java).
>> Should I put an artificial delay after the alter statement?
>>
>> -Joe
>> On 6/1/2021 2:59 PM, Max C. wrote:
>>
>> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client
>> library for ZooKeeper.
>>
>> - Max
>>
>> Yes this is quite annoying. How did you implement that "external lock"? I
>> also thought of doing an external service that would be dedicated to that.
>> Cassandra client apps would send create instruction to that service, that
>> would receive them and do the creates 1 by 1, and the client app would wait
>> the response from it before starting to insert.
>>
>> Best,
>>
>> Sébastien.
>>
> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
>>
>>> In our case we have a shared dev cluster with (for example) a key space
>>> for each developer, a key spa

Re: multiple clients making schema changes at once

2021-06-03 Thread Sébastien Rebecchi
Sometimes even waiting for hours does not change anything. I have a cluster
where I did as you suggested, synchronizing the create table statements, and
then also tried waiting for schema agreement in a loop until success, but
sometimes success never comes. I got that error in a loop in the logs of a
node; it seems we must restart nodes really often :(

Sébastien

ERROR [InternalResponseStage:1117] 2021-06-03 17:32:34,937
MigrationCoordinator.java:408 - Unable to merge schema from /135.181.222.100
org.apache.cassandra.exceptions.ConfigurationException: Column family ID
mismatch (found a991bb50-c475-11eb-83cb-df35fc5a9bea; expected
994bee02-c475-11eb-beff-6d70d473832f)
at
org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:984)
~[apache-cassandra-3.11.10.jar:3.11.10]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:938)
~[apache-cassandra-3.11.10.jar:3.11.10]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:687)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1478)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1434)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1403)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.service.MigrationCoordinator.mergeSchemaFrom(MigrationCoordinator.java:367)
~[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:404)
[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:393)
[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
[apache-cassandra-3.11.10.jar:3.11.10]
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
[apache-cassandra-3.11.10.jar:3.11.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_292]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_292]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[na:1.8.0_292]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[na:1.8.0_292]
at
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
[apache-cassandra-3.11.10.jar:3.11.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]

On Thu, 3 Jun 2021 at 17:23, Joe Obernberger wrote:

> How does this work?  I have a program that runs a series of alter table
> statements, and then does inserts.  In some cases, the insert happens
> immediately after the alter table statement and the insert fails because
> the schema (apparently) has not had time to propagate.  I get an Undefined
> column name error.
>
> The alter statements run single threaded, but the inserts run in multiple
> threads.  The alter statement is run in a synchronized block (Java).
> Should I put an artificial delay after the alter statement?
>
> -Joe
> On 6/1/2021 2:59 PM, Max C. wrote:
>
> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client
> library for ZooKeeper.
>
> - Max
>
> Yes this is quite annoying. How did you implement that "external lock"? I
> also thought of doing an external service that would be dedicated to that.
> Cassandra client apps would send create instruction to that service, that
> would receive them and do the creates 1 by 1, and the client app would wait
> the response from it before starting to insert.
>
> Best,
>
> Sébastien.
>
> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
>
>> In our case we have a shared dev cluster with (for example) a key space
>> for each developer, a key space for each CI runner, etc.   As part of
>> initializing our test suite we setup the schema to match the code that is
>> about to be tested.  This can mean multiple CI runners each adding/dropping
>> tables at the same time but for different key spaces.
>>
>> Our experience is even though the schema changes do not conflict, we
>> still run into schema mismatch problems.   Our solution to this was to have
>> a lock (external to Cassandra) that ensures only a single schema change
>> operation is being issued at a time.
>>
>> People assume schema changes in Cassandra work the same way as MySQL or
>> multiple users editing files on disk — i.e. as long as you’re not editing
>> the same file (or same MySQL table), then there’s no problem.  *This is
>> NO

Re: multiple clients making schema changes at once

2021-06-03 Thread Joe Obernberger
How does this work?  I have a program that runs a series of alter table 
statements, and then does inserts.  In some cases, the insert happens 
immediately after the alter table statement and the insert fails because 
the schema (apparently) has not had time to propagate.  I get an 
Undefined column name error.


The alter statements run single threaded, but the inserts run in 
multiple threads.  The alter statement is run in a synchronized block 
(Java).  Should I put an artificial delay after the alter statement?


-Joe

On 6/1/2021 2:59 PM, Max C. wrote:
We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python 
client library for ZooKeeper.


- Max

Yes this is quite annoying. How did you implement that "external 
lock"? I also thought of doing an external service that would be 
dedicated to that. Cassandra client apps would send create 
instruction to that service, that would receive them and do the 
creates 1 by 1, and the client app would wait the response from it 
before starting to insert.


Best,

Sébastien.

On Tue, 1 Jun 2021 at 05:21, Max C. wrote:


In our case we have a shared dev cluster with (for example) a key
space for each developer, a key space for each CI runner, etc.  
As part of initializing our test suite we setup the schema to
match the code that is about to be tested.  This can mean
multiple CI runners each adding/dropping tables at the same time
but for different key spaces.

Our experience is even though the schema changes do not conflict,
we still run into schema mismatch problems.   Our solution to
this was to have a lock (external to Cassandra) that ensures only
a single schema change operation is being issued at a time.

People assume schema changes in Cassandra work the same way as
MySQL or multiple users editing files on disk — i.e. as long as
you’re not editing the same file (or same MySQL table), then
there’s no problem.  *_This is NOT the case._*  Cassandra
schema changes are more like “git push”ing a commit to the
same branch — i.e. at most one change can be outstanding at a
time (across all tables, all key spaces)…otherwise you will run
into trouble.

Hope that helps.  Best of luck.

- Max

Hello,

I have a more general question about that, I cannot find
clear answer.

In my use case I have many tables (around 10k new tables
created per months) and they are created from many clients
and only dynamically, with several clients creating same
tables simulteanously.

What is the recommended way of creating tables dynamically?
If I am doing "if not exists" queries + wait for schema
aggreement before and after each create statement, will it
work correctly for Cassandra?

Sébastien.






Re: multiple clients making schema changes at once

2021-06-01 Thread Max C.
We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client 
library for ZooKeeper.
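
A rough Java sketch of the same pattern (not our actual setup -- kazoo is
Python-only, so this uses Apache Curator's InterProcessMutex instead, and the
ZooKeeper connection string and lock path are just placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SchemaChangeLock {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        InterProcessMutex lock = new InterProcessMutex(zk, "/locks/cassandra-schema");
        lock.acquire();           // blocks until no other client holds the lock
        try {
            // Issue the CREATE/ALTER statement and wait for schema agreement here,
            // so only one schema change is in flight cluster-wide at a time.
        } finally {
            lock.release();
        }
        zk.close();
    }
}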

- Max

> Yes this is quite annoying. How did you implement that "external lock"? I 
> also thought of doing an external service that would be dedicated to that. 
> Cassandra client apps would send create instruction to that service, that 
> would receive them and do the creates 1 by 1, and the client app would wait 
> the response from it before starting to insert.
> 
> Best,
> 
> Sébastien.
> 
> On Tue, 1 Jun 2021 at 05:21, Max C. <mc_cassand...@core43.com> wrote:
> In our case we have a shared dev cluster with (for example) a key space for 
> each developer, a key space for each CI runner, etc.   As part of 
> initializing our test suite we setup the schema to match the code that is 
> about to be tested.  This can mean multiple CI runners each adding/dropping 
> tables at the same time but for different key spaces.
> 
> Our experience is even though the schema changes do not conflict, we still 
> run into schema mismatch problems.   Our solution to this was to have a lock 
> (external to Cassandra) that ensures only a single schema change operation is 
> being issued at a time.
> 
> People assume schema changes in Cassandra work the same way as MySQL or 
> multiple users editing files on disk — i.e. as long as you’re not editing the 
> same file (or same MySQL table), then there’s no problem.  This is NOT the 
> case.  Cassandra schema changes are more like “git push”ing a commit to the 
> same branch — i.e. at most one change can be outstanding at a time (across 
> all tables, all key spaces)…otherwise you will run into trouble.
> 
> Hope that helps.  Best of luck.
> 
> - Max
> 
> 
> Hello,
> 
> I have a more general question about that, I cannot find clear answer.
> 
> In my use case I have many tables (around 10k new tables created per months) 
> and they are created from many clients and only dynamically, with several 
> clients creating same tables simulteanously.
> 
> What is the recommended way of creating tables dynamically? If I am doing "if 
> not exists" queries + wait for schema aggreement before and after each create 
> statement, will it work correctly for Cassandra?
> 
> Sébastien.
> 



Re: multiple clients making schema changes at once

2021-06-01 Thread Sébastien Rebecchi
Hello,

Yes, this is quite annoying. How did you implement that "external lock"? I
also thought of building an external service dedicated to that: Cassandra
client apps would send create instructions to that service, which would
receive them and execute the creates one by one, and the client apps would
wait for its response before starting to insert.

Best,

Sébastien.

On Tue, 1 Jun 2021 at 05:21, Max C. wrote:

> In our case we have a shared dev cluster with (for example) a key space
> for each developer, a key space for each CI runner, etc.   As part of
> initializing our test suite we setup the schema to match the code that is
> about to be tested.  This can mean multiple CI runners each adding/dropping
> tables at the same time but for different key spaces.
>
> Our experience is even though the schema changes do not conflict, we still
> run into schema mismatch problems.   Our solution to this was to have a
> lock (external to Cassandra) that ensures only a single schema change
> operation is being issued at a time.
>
> People assume schema changes in Cassandra work the same way as MySQL or
> multiple users editing files on disk — i.e. as long as you’re not editing
> the same file (or same MySQL table), then there’s no problem.  *This is
> NOT the case.*  Cassandra schema changes are more like “git push”ing a
> commit to the same branch — i.e. at most one change can be outstanding at a
> time (across all tables, all key spaces)…otherwise you will run into
> trouble.
>
> Hope that helps.  Best of luck.
>
> - Max
>
> Hello,
>>
>> I have a more general question about that, I cannot find clear answer.
>>
>> In my use case I have many tables (around 10k new tables created per
>> months) and they are created from many clients and only dynamically, with
>> several clients creating same tables simulteanously.
>>
>> What is the recommended way of creating tables dynamically? If I am doing
>> "if not exists" queries + wait for schema aggreement before and after each
>> create statement, will it work correctly for Cassandra?
>>
>> Sébastien.
>>
>
>


Re: multiple clients making schema changes at once

2021-05-31 Thread Max C.
In our case we have a shared dev cluster with (for example) a key space for 
each developer, a key space for each CI runner, etc.   As part of initializing 
our test suite we setup the schema to match the code that is about to be 
tested.  This can mean multiple CI runners each adding/dropping tables at the 
same time but for different key spaces.

Our experience is even though the schema changes do not conflict, we still run 
into schema mismatch problems.   Our solution to this was to have a lock 
(external to Cassandra) that ensures only a single schema change operation is 
being issued at a time.

People assume schema changes in Cassandra work the same way as MySQL or 
multiple users editing files on disk — i.e. as long as you’re not editing the 
same file (or same MySQL table), then there’s no problem.  This is NOT the 
case.  Cassandra schema changes are more like “git push”ing a commit to the 
same branch — i.e. at most one change can be outstanding at a time (across all 
tables, all key spaces)…otherwise you will run into trouble.

Hope that helps.  Best of luck.

- Max


Hello,

I have a more general question about that, for which I cannot find a clear answer.

In my use case I have many tables (around 10k new tables created per month), 
and they are created from many clients and only dynamically, with several 
clients creating the same tables simultaneously.

What is the recommended way of creating tables dynamically? If I issue "if 
not exists" queries and wait for schema agreement before and after each create 
statement, will it work correctly with Cassandra?

Sébastien.



Re: Schema Changes

2016-11-17 Thread Fabrice Facorat
Schemas are propagated by GOSSIP.

you can check schema propagation cluster wide with nodetool describecluster
or "nodetool gossipinfo | grep SCHEMA | cut -f3 -d: | sort | uniq -c"

You'd better send your DDL instructions to only one node (for example by
using the whitelist load balancing policy with only 1 host specified); this
way your schema changes will be serialized and you will avoid issues and
race conditions.
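
For example, assuming the 3.x Java driver, a DDL-only Cluster could be built
roughly like this (the host address and port below are placeholders):

import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class DdlOnlyCluster {
    // A Cluster whose load balancing policy only ever picks one node,
    // so all DDL statements sent through it are coordinated by that node.
    public static Cluster build(String ddlHost) {
        return Cluster.builder()
                .addContactPoint(ddlHost)
                .withLoadBalancingPolicy(new WhiteListPolicy(
                        new RoundRobinPolicy(),
                        Collections.singletonList(new InetSocketAddress(ddlHost, 9042))))
                .build();
    }
}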



2016-11-15 19:04 GMT+01:00 Josh Smith <jsm...@ionicsecurity.com>:

> Would someone please explain how schema changes happen?
>
> Here are some of the ring details
>
> We have 5 nodes in 1 DC and 5 nodes in another DC across the country.
>
> Here is our problem, we have a tool which automates our schema creation.
> Our schema consists of 7 keyspaces with 21 tables in each keyspace, so a
> total of 147 tables are created at the initial provisioning.  During this
> schema creation we end up with system_schema keyspace corruption, we have
> found that it is due to schema version disagreement. To combat this we
> setup a wait until there is only one version in both system.local and
> system.peers tables.
>
> The way I understand it schema changes are made on the local node only;
> changes are then propagated through either Thrift or Gossip, I could not
> find a definitive answer online if thrift or gossip was the carrier. So if
> I make all of the schema changes to one node it should propagate the
> changes to the other nodes one at a time. This is how I used to think that
> schema changes are propagated but we still get schema disagreement when
> changing the schema only on one node. Is the only option to introduce a
> wait after every table creation?  Should we be looking at another table
> besides system.local and peers? Any help would be appreciated.
>
>
>
> Josh Smith
>



-- 
Close the World, Open the Net
http://www.linux-wizard.net


Re: Schema Changes

2016-11-15 Thread Matija Gobec
We used the Cassandra migration tool for schema versioning and schema
agreement. Check it out here:
<https://github.com/smartcat-labs/cassandra-migration-tool-java>.

In short: when executing schema-altering statements, use these to wait for
schema propagation:
resultSet.getExecutionInfo().isSchemaInAgreement()
and
session.getCluster().getMetadata().checkSchemaAgreement()

For detailed info check driver documentation. This solution is based on this
fix <https://datastax-oss.atlassian.net/browse/JAVA-669>.

Matija

On Tue, Nov 15, 2016 at 7:32 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> You can start here:
>
> https://issues.apache.org/jira/browse/CASSANDRA-10699
>
> And here:
>
> http://stackoverflow.com/questions/20293897/cassandra-
> resolution-of-concurrent-schema-changes
>
> In a nutshell, schema changes works best when issued serially, when all
> nodes are up, and reachable. When these 3 conditions are not met a variety
> of behavior can be observed.
>
> On Tue, Nov 15, 2016 at 1:04 PM, Josh Smith <jsm...@ionicsecurity.com>
> wrote:
>
>> Would someone please explain how schema changes happen?
>>
>> Here are some of the ring details
>>
>> We have 5 nodes in 1 DC and 5 nodes in another DC across the country.
>>
>> Here is our problem, we have a tool which automates our schema creation.
>> Our schema consists of 7 keyspaces with 21 tables in each keyspace, so a
>> total of 147 tables are created at the initial provisioning.  During this
>> schema creation we end up with system_schema keyspace corruption, we have
>> found that it is due to schema version disagreement. To combat this we
>> setup a wait until there is only one version in both system.local and
>> system.peers tables.
>>
>> The way I understand it schema changes are made on the local node only;
>> changes are then propagated through either Thrift or Gossip, I could not
>> find a definitive answer online if thrift or gossip was the carrier. So if
>> I make all of the schema changes to one node it should propagate the
>> changes to the other nodes one at a time. This is how I used to think that
>> schema changes are propagated but we still get schema disagreement when
>> changing the schema only on one node. Is the only option to introduce a
>> wait after every table creation?  Should we be looking at another table
>> besides system.local and peers? Any help would be appreciated.
>>
>>
>>
>> Josh Smith
>>
>
>


Re: Schema Changes

2016-11-15 Thread Edward Capriolo
You can start here:

https://issues.apache.org/jira/browse/CASSANDRA-10699

And here:

http://stackoverflow.com/questions/20293897/cassandra-resolution-of-concurrent-schema-changes

In a nutshell, schema changes work best when issued serially, when all
nodes are up and reachable. When these 3 conditions are not met, a variety
of behaviors can be observed.

On Tue, Nov 15, 2016 at 1:04 PM, Josh Smith <jsm...@ionicsecurity.com>
wrote:

> Would someone please explain how schema changes happen?
>
> Here are some of the ring details
>
> We have 5 nodes in 1 DC and 5 nodes in another DC across the country.
>
> Here is our problem, we have a tool which automates our schema creation.
> Our schema consists of 7 keyspaces with 21 tables in each keyspace, so a
> total of 147 tables are created at the initial provisioning.  During this
> schema creation we end up with system_schema keyspace corruption, we have
> found that it is due to schema version disagreement. To combat this we
> setup a wait until there is only one version in both system.local and
> system.peers tables.
>
> The way I understand it schema changes are made on the local node only;
> changes are then propagated through either Thrift or Gossip, I could not
> find a definitive answer online if thrift or gossip was the carrier. So if
> I make all of the schema changes to one node it should propagate the
> changes to the other nodes one at a time. This is how I used to think that
> schema changes are propagated but we still get schema disagreement when
> changing the schema only on one node. Is the only option to introduce a
> wait after every table creation?  Should we be looking at another table
> besides system.local and peers? Any help would be appreciated.
>
>
>
> Josh Smith
>


Schema Changes

2016-11-15 Thread Josh Smith
Would someone please explain how schema changes happen?
Here are some of the ring details
We have 5 nodes in 1 DC and 5 nodes in another DC across the country.
Here is our problem: we have a tool which automates our schema creation. Our 
schema consists of 7 keyspaces with 21 tables in each keyspace, so a total of 
147 tables are created at the initial provisioning.  During this schema 
creation we end up with system_schema keyspace corruption, which we have found 
is due to schema version disagreement. To combat this we set up a wait until 
there is only one version in both the system.local and system.peers tables.
The way I understand it, schema changes are made on the local node only; 
changes are then propagated through either Thrift or Gossip (I could not find 
a definitive answer online as to which of the two is the carrier). So if I 
make all of the schema changes on one node, it should propagate the changes to 
the other nodes one at a time. This is how I used to think schema changes are 
propagated, but we still get schema disagreement when changing the schema only 
on one node. Is the only option to introduce a wait after every table creation? 
 Should we be looking at another table besides system.local and peers? Any help 
would be appreciated.

Josh Smith


Re: Steps to do after schema changes

2015-03-12 Thread Mark Reddy
It's always good to run nodetool describecluster after a schema change;
this will show you all the nodes in your cluster and what schema version
they have. If they have different versions, you have a schema disagreement
and should follow this guide to resolution:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_handle_schema_disagree_t.html

Regards,
Mark

On 12 March 2015 at 05:47, Phil Yang ud1...@gmail.com wrote:

 Usually, you have nothing to do. Changes will be synced to every nodes
 automatically.

 2015-03-12 13:21 GMT+08:00 Ajay ajay.ga...@gmail.com:

 Hi,

 Are there any steps to do (like nodetool or restart node) or any
 precautions after schema changes are done in a column family say adding a
 new column or modifying any table properties?

 Thanks
 Ajay




 --
 Thanks,
 Phil Yang




Re: Steps to do after schema changes

2015-03-12 Thread Ajay
Thanks Mark.

-
Ajay
On 12-Mar-2015 11:08 pm, Mark Reddy mark.l.re...@gmail.com wrote:

 It's always good to run nodetool describecluster after a schema change,
 this will show you all the nodes in your cluster and what schema version
 they have. If they have different versions you have a schema disagreement
 and should follow this guide to resolution:
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_handle_schema_disagree_t.html

 Regards,
 Mark

 On 12 March 2015 at 05:47, Phil Yang ud1...@gmail.com wrote:

 Usually, you have nothing to do. Changes will be synced to every nodes
 automatically.

 2015-03-12 13:21 GMT+08:00 Ajay ajay.ga...@gmail.com:

 Hi,

 Are there any steps to do (like nodetool or restart node) or any
 precautions after schema changes are done in a column family say adding a
 new column or modifying any table properties?

 Thanks
 Ajay




 --
 Thanks,
 Phil Yang





Steps to do after schema changes

2015-03-11 Thread Ajay
Hi,

Are there any steps to do (like nodetool or restart node) or any
precautions after schema changes are done in a column family say adding a
new column or modifying any table properties?

Thanks
Ajay


Re: Steps to do after schema changes

2015-03-11 Thread Phil Yang
Usually, you have nothing to do. Changes will be synced to every node
automatically.

2015-03-12 13:21 GMT+08:00 Ajay ajay.ga...@gmail.com:

 Hi,

 Are there any steps to do (like nodetool or restart node) or any
 precautions after schema changes are done in a column family say adding a
 new column or modifying any table properties?

 Thanks
 Ajay




-- 
Thanks,
Phil Yang


Re: Schema changes: where in Java code are they sent?

2015-02-25 Thread Richard Dawe
Good morning,

Sorry for the slow reply here. I finally had some time to test cqlsh tracing on 
a ccm cluster with 2 of 3 nodes down, to see if the unavailable error was due 
to cqlsh or my query. Reply inline below.

On 15/01/2015 12:46, Tyler Hobbs <ty...@datastax.com> wrote:

On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe <rich.d...@messagesystems.com> wrote:

I thought it might be quorum consistency level, because of the behavior I was 
seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes 
enabled (“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, 
my schema operations were working fine. When I took down two nodes using “ccm 
node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm 
node1 cqlsh” were failing like this:

  cqlsh ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output — I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it 
seems like the “one or more nodes were unavailable” may be due to cqlsh’s error 
handling.

If I wait a bit, and re-run my schema operations, they work fine with only one 
node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) 
to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line 
session log.

That Unavailable error may actually be from the tracing-related queries failing 
(that's what I suspect, at least).  Starting cqlsh with --debug might show you 
a stacktrace in that case, but I'm not 100% sure.

Yes, it does seem to be cqlsh tracing. The debug output below was generated 
with:

 * A 3 node ccm cluster, running Cassandra 2.0.8 on Ubuntu 14.10 x86_64.
 * I took down 2 of the 3 nodes.
 * Table test5 has a replication factor of 3, primary key is “id text”.
 * cqlsh session was started after 2 of the 3 nodes had been shut down.

Debug output:

rdawe@cstar:~$ ccm node1 cqlsh --debug
Using CQL driver: module 'cql' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'
Using thrift lib: module 'thrift' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'
Connected to test at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 
19.39.0]
Use HELP for help.
cqlsh> USE test;
cqlsh:test> TRACING ON
Now tracing requests.
cqlsh:test> SELECT * FROM test5;

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

Traceback (most recent call last):
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 827, in onecmd
self.handle_statement(st, statementtext)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 865, in 
handle_statement
return custom_handler(parsed)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 901, in do_select
with_default_limit=with_default_limit)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 910, in 
perform_statement
print_trace_session(self, self.cursor, session_id)
  File /home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py, 
line 26, in print_trace_session
rows  = fetch_trace_session(cursor, session_id)
  File /home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py, 
line 47, in fetch_trace_session
consistency_level='ONE')
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/cursor.py,
 line 80, in execute
response = self.get_response(prepared_q, cl)
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py,
 line 77, in get_response
return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py,
 line 102, in handle_cql_execution_errors
raise cql.OperationalError(Unable to complete request: one or 
OperationalError: Unable to complete request: one or more nodes were 
unavailable.

Sometimes I get a different error:

rdawe@cstar:~$ echo -e 'TRACING ON\nSELECT * FROM test.test5;\n' | ccm node1 
cqlsh --debug
Using CQL driver: module 'cql' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'
Using thrift lib: module 'thrift' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'
Now tracing requests.

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

stdin:3:Session edc8c010-bcd5-11e4-a008-1dd7f4de70a1 wasn't found.

I notice that the system_traces keyspace has replication factor 2. Since 2 
nodes are down, perhaps sometimes the tracing session would be stored on nodes 
that are down. And other times one of the two replicas for 

Re: Schema changes: where in Java code are they sent?

2015-01-15 Thread Tyler Hobbs
On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe rich.d...@messagesystems.com
wrote:


  I thought it might be quorum consistency level, because of the because I
 was seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes,
 vnodes enabled (ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all
 three nodes up, my schema operations were working fine. When I took down
 two nodes using “ccm node2 stop”, “ccm node3 stop”, I found that schema
 operations through “ccm node1 cqlsh” were failing like this:

cqlsh ALTER TABLE test.test3 ADD fred text;
   Unable to complete request: one or more nodes were unavailable.

  That’s the full output — I had enabled tracing, but only that error came
 back.

  After reading your reply, I went back and re-ran my tests with cqlsh,
 and it seems like the “one or more nodes were unavailable” may be due to
 cqlsh’s error handling.

  If I wait a bit, and re-run my schema operations, they work fine with
 only one node up. I can see in the tracing that it’s only talking to node1
 (127.0.0.1) to make the schema modifications.

  Is this a known issue in cqlsh? If it helps I can send the full
 command-line session log.


That Unavailable error may actually be from the tracing-related queries
failing (that's what I suspect, at least).  Starting cqlsh with --debug
might show you a stacktrace in that case, but I'm not 100% sure.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Schema changes: where in Java code are they sent?

2015-01-15 Thread Richard Dawe
Hi Tyler,

Thank you for your quick reply; follow-up inline below.

On 14/01/2015 19:36, Tyler Hobbs <ty...@datastax.com> wrote:

On Wed, Jan 14, 2015 at 5:13 PM, Richard Dawe <rich.d...@messagesystems.com> wrote:

I’ve been trying to find the Java code where the schema migration is sent to 
the other nodes in the cluster, to understand what the requirements are for 
successfully applying the update. E.g.: is QUORUM consistency level applied?

A quorum isn't required.  Schema changes are simply applied against the local 
node (whichever node the client sends the query to) and then are pushed out to 
the other nodes.  Nodes will also pull the latest schema from other nodes as 
needed (for example, if a node was down during a schema change).

I thought it might be quorum consistency level, because of the behavior I was 
seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes 
enabled (“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, 
my schema operations were working fine. When I took down two nodes using “ccm 
node2 stop”, “ccm node3 stop”, I found that schema operations through “ccm 
node1 cqlsh” were failing like this:

  cqlsh> ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output — I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it 
seems like the “one or more nodes were unavailable” may be due to cqlsh’s error 
handling.

If I wait a bit, and re-run my schema operations, they work fine with only one 
node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) 
to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line 
session log.


I spent an hour looking through the Java code last night, with no luck. I 
thought this code would be in StorageProxy.java, but I have not found it there, 
or in any of the other classes I looked at.

MigrationManager is probably the most central class for this stuff.

Thank you. That code makes a lot more sense now. :)

Best regards, Rich



Schema changes: where in Java code are they sent?

2015-01-14 Thread Richard Dawe
Hello,

I’m doing some research on schema migrations for Cassandra.

I’ve been playing with cqlsh with TRACING ON, and I can see that a schema 
change like “CREATE TABLE” is sent to all nodes in the cluster. And also that 
“CREATE TABLE” fails if only one of my three nodes is up (with replication 
factor = 3).

I’ve been trying to find the Java code where the schema migration is sent to 
the other nodes in the cluster, to understand what the requirements are for 
successfully applying the update. E.g.: is QUORUM consistency level applied?

I spent an hour looking through the Java code last night, with no luck. I 
thought this code would be in StorageProxy.java, but I have not found it there, 
or in any of the other classes I looked at.

Any pointers would be appreciated.

Thanks, best regards, Rich



Re: Schema changes: where in Java code are they sent?

2015-01-14 Thread Tyler Hobbs
On Wed, Jan 14, 2015 at 5:13 PM, Richard Dawe rich.d...@messagesystems.com
wrote:


  I’ve been trying to find the Java code where the schema migration is
 sent to the other nodes in the cluster, to understand what the requirements
 are for successfully applying the update. E.g.: is QUORUM consistency level
 applied?


A quorum isn't required.  Schema changes are simply applied against the
local node (whichever node the client sends the query to) and then are
pushed out to the other nodes.  Nodes will also pull the latest schema from
other nodes as needed (for example, if a node was down during a schema
change).



  I spent an hour looking through the Java code last night, with no luck.
 I thought this code would be in StorageProxy.java, but I have not found it
 there, or in any of the other classes I looked at.


MigrationManager is probably the most central class for this stuff.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: best practice for waiting for schema changes to propagate

2014-09-30 Thread Ben Bromhead
The system.peers table is a copy of some of the gossip info the node has
stored, including the schema version. You should query this and wait until
all schema versions have converged.

http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_sys_tab_cluster_t.html

http://www.datastax.com/dev/blog/the-data-dictionary-in-cassandra-1-2
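
A minimal sketch of that wait loop with the DataStax Java driver (the poll
interval, and the absence of a timeout or of filtering out down peers, are
simplifications):

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SchemaVersionWait {
    // Poll system.local and system.peers until a single schema_version remains.
    public static void waitForConvergence(Session session) throws InterruptedException {
        while (true) {
            Set<UUID> versions = new HashSet<UUID>();
            for (Row row : session.execute("SELECT schema_version FROM system.local")) {
                versions.add(row.getUUID("schema_version"));
            }
            for (Row row : session.execute("SELECT schema_version FROM system.peers")) {
                versions.add(row.getUUID("schema_version"));
            }
            if (versions.size() == 1) {
                return;  // every known node reports the same schema version
            }
            Thread.sleep(500);
        }
    }
}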

As for ensuring that the driver keeps talking to the node you made the
schema change on, I would ask on the driver-specific mailing list / IRC:


   - MAILING LIST:
   https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
   - IRC: #datastax-drivers on irc.freenode.net http://freenode.net/



On 30 September 2014 10:16, Clint Kelly clint.ke...@gmail.com wrote:

 Hi all,

 I often have problems with code that I write that uses the DataStax Java
 driver to create / modify a keyspace or table and then soon after reads the
 metadata for the keyspace to verify that whatever changes I made the
 keyspace or table are complete.

 As an example, I may create a table called `myTableName` and then very
 soon after do something like:

 assert(session
   .getCluster()
   .getMetaData()
   .getKeyspace(myKeyspaceName)
   .getTable(myTableName) != null)

 I assume this fails sometimes because the default round-robin load
 balancing policy for the Java driver will send my create-table request to
 one node and the metadata read to another, and because it takes some time
 for the table creation to propagate across all of the nodes in my cluster.

 What is the best way to deal with this problem?  Is there a standard way
 to wait for schema changes to propagate?

 Best regards,
 Clint




-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | +61 415 936 359


Re: best practice for waiting for schema changes to propagate

2014-09-30 Thread graham sanderson
Also be aware of https://issues.apache.org/jira/browse/CASSANDRA-7734 if you 
are using C* 2.0.6+ (2.0.6 introduced a change that can sometimes cause 
initial schema propagation not to happen, introducing potentially long delays 
until some other code path repairs it later).

On Sep 30, 2014, at 1:54 AM, Ben Bromhead b...@instaclustr.com wrote:

 The system.peers table which is a copy of some gossip info the node has 
 stored, including the schema version. You should query this and wait until 
 all schema versions have converged.
 
 http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_sys_tab_cluster_t.html
 
 http://www.datastax.com/dev/blog/the-data-dictionary-in-cassandra-1-2
 
 As ensuring that the driver keeps talking to the node you made the schema 
 change on I would ask the drivers specific mailing list / IRC:
 
 MAILING LIST: 
 https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
 IRC: #datastax-drivers on irc.freenode.net
 
 
 On 30 September 2014 10:16, Clint Kelly clint.ke...@gmail.com wrote:
 Hi all,
 
 I often have problems with code that I write that uses the DataStax Java 
 driver to create / modify a keyspace or table and then soon after reads the 
 metadata for the keyspace to verify that whatever changes I made the keyspace 
 or table are complete.
 
 As an example, I may create a table called `myTableName` and then very soon 
 after do something like:
 
 assert(session
   .getCluster()
   .getMetaData()
   .getKeyspace(myKeyspaceName)
   .getTable(myTableName) != null)
 
 I assume this fails sometimes because the default round-robin load balancing 
 policy for the Java driver will send my create-table request to one node and 
 the metadata read to another, and because it takes some time for the table 
 creation to propagate across all of the nodes in my cluster.
 
 What is the best way to deal with this problem?  Is there a standard way to 
 wait for schema changes to propagate?
 
 Best regards,
 Clint
 
 
 
 -- 
 Ben Bromhead
 
 Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
 





best practice for waiting for schema changes to propagate

2014-09-29 Thread Clint Kelly
Hi all,

I often have problems with code that I write that uses the DataStax Java
driver to create / modify a keyspace or table and then soon after reads the
metadata for the keyspace to verify that whatever changes I made the
keyspace or table are complete.

As an example, I may create a table called `myTableName` and then very soon
after do something like:

assert(session
  .getCluster()
  .getMetaData()
  .getKeyspace(myKeyspaceName)
  .getTable(myTableName) != null)

I assume this fails sometimes because the default round-robin load
balancing policy for the Java driver will send my create-table request to
one node and the metadata read to another, and because it takes some time
for the table creation to propagate across all of the nodes in my cluster.

What is the best way to deal with this problem?  Is there a standard way to
wait for schema changes to propagate?

Best regards,
Clint


Re: Schema changes not getting picked up from different process

2012-05-30 Thread aaron morton
What clients are the scripts using? This sounds like something that should be 
handled in the client. 

I would worry about holding a long-running connection to a single node. There 
are several situations where the correct behaviour for a client is to kill a 
connection and connect to another node. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/05/2012, at 12:11 AM, Victor Blaga wrote:

 Hi Dave,
 
 Thank you for your answer.
 
 2012/5/25 Dave Brosius dbros...@mebigfatguy.com
 What version are you using?
 
 I am using version 1.1.0
  
 It might be related to https://issues.apache.org/jira/browse/CASSANDRA-4052
  
  Indeed the Issue you suggested goes into the direction of my problem. 
 However, things are a little bit more complex. I used the cassandra-cli just 
 for this example, although I'm getting this behavior from other clients (I'm 
 using python and ruby scripts). Basically I'm modifying the schema through 
 the ruby script and I'm trying to query and insert data through the python 
 script. Both of the scripts are meant to be on forever (sort of daemons) and 
 thus they establish once at start a connection to the Cassandra which is kept 
 alive.
 
 I can see from the comments on the issue that keeping a long-lived connection 
 to the Cluster might not be ideal and it would probably be better to 
 reconnect upon executing a set of queries.



Schema changes not getting picked up from different process

2012-05-25 Thread Victor Blaga
Hi all,

This is my first message on this mailing list, so I'm sorry if I am breaking
any rules. I just wanted to report some sort of a problem that I'm having
with Cassandra.
Short version of my problem: if I make changes to the schema from within a
process, they do not get picked up by the other processes that are
connected to the Cassandra cluster unless I trigger a reconnect.

Long version:

Process 1: cassandra-cli connected to cluster and keyspace
Process 2: cassandra-cli connected to cluster and keyspace

From within process 1 - create column family test;
From within process 2 - describe test; - fails with an error (other
query/insert methods fail as well).

I'm not sure if this is indeed a bug or just a misunderstanding from my
part.

Regards,
Victor


Re: Schema changes not getting picked up from different process

2012-05-25 Thread Dave Brosius

What version are you using?

It might be related to https://issues.apache.org/jira/browse/CASSANDRA-4052

On 05/25/2012 07:32 AM, Victor Blaga wrote:

Hi all,

This is my first message on this posting list so I'm sorry if I am 
breaking any rules. I just wanted to report some sort of a problem 
that I'm having with Cassandra.
Short version of my problem: if I make changes to the schema from 
within a process, they do not get picked up by the other processes 
that are connected to the Cassandra cluster unless I trigger a reconnect.


Long version:

Process 1: cassandra-cli connected to cluster and keyspace
Process 2: cassandra-cli connected to cluster and keyspace

From within process 1 - create column family test;
From within process 2 - describe test; - fails with an error (other 
query/insert methods fail as well).


I'm not sure if this is indeed a bug or just a misunderstanding from 
my part.


Regards,
Victor




Re: Schema changes not getting picked up from different process

2012-05-25 Thread Victor Blaga
Hi Dave,

Thank you for your answer.

2012/5/25 Dave Brosius dbros...@mebigfatguy.com

  What version are you using?


I am using version 1.1.0


 It might be related to
 https://issues.apache.org/jira/browse/CASSANDRA-4052


 Indeed, the issue you suggested goes in the direction of my problem.
However, things are a little bit more complex. I used the cassandra-cli
just for this example, although I'm getting this behavior from other
clients (I'm using Python and Ruby scripts). Basically I'm modifying the
schema through the Ruby script and I'm trying to query and insert data
through the Python script. Both of the scripts are meant to run forever
(sort of daemons) and thus they establish a connection to Cassandra once
at start, which is kept alive.

I can see from the comments on the issue that keeping a long-lived
connection to the cluster might not be ideal and that it would probably
be better to reconnect upon executing a set of queries.


Re: Replication factor and other schema changes in >= 0.7

2010-08-20 Thread Thorvaldsson Justus
KsDef
CfDef - has metadata
And perhaps ColumnDef

How to make a KsDef:
KsDef k = new KsDef();
k.setName(keyspacename);
k.setReplication_factor(replicafactor);
k.setStrategy_class("org.apache.cassandra.locator.RackUnawareStrategy");
List<CfDef> cfDefs = new ArrayList<CfDef>();
k.setCf_defs(cfDefs);
// c is the Thrift Cassandra.Client connected to the cluster
c.system_add_keyspace(k);

/Justus
www.justus.st

From: Andres March [mailto:ama...@qualcomm.com]
Sent: 20 August 2010 01:01
To: user@cassandra.apache.org
Subject: Replication factor and other schema changes in >= 0.7

How should we go about changing the replication factor and other keyspace 
settings now that it and other KSMetaData are no longer managed in 
cassandra.yaml?

I found makeDefinitionMutation() in the Migration class and see that it is 
called for the other schema migrations.  There just seems to be a big gap in 
the management API for the KSMetaData we might want to change.
--
Andres March
ama...@qualcomm.com
Qualcomm Internet Services


Re: Replication factor and other schema changes in >= 0.7

2010-08-20 Thread Gary Dusbabek
It is coming.  In fact, I started working on this ticket yesterday.
Most of the settings that you could change before will be modifiable.
Unfortunately, you must still manually perform the repair operations,
etc., afterward.

https://issues.apache.org/jira/browse/CASSANDRA-1285

Gary.


On Thu, Aug 19, 2010 at 18:00, Andres March ama...@qualcomm.com wrote:
 How should we go about changing the replication factor and other keyspace
 settings now that it and other KSMetaData are no longer managed in
 cassandra.yaml?

 I found makeDefinitionMutation() in the Migration class and see that it is
 called for the other schema migrations.  There just seems to be a big gap in
 the management API for the KSMetaData we might want to change.
 --
 Andres March
 ama...@qualcomm.com
 Qualcomm Internet Services


Re: Replication factor and other schema changes in >= 0.7

2010-08-20 Thread Andres March

 Cool, thanks.  I suspected the same, including the repair.

On 08/20/2010 06:05 AM, Gary Dusbabek wrote:

It is coming.  In fact, I started working on this ticket yesterday.
Most of the settings that you could change before will be modifiable.
Unfortunately, you must still manually perform the repair operations,
etc., afterward.

https://issues.apache.org/jira/browse/CASSANDRA-1285

Gary.


On Thu, Aug 19, 2010 at 18:00, Andres Marchama...@qualcomm.com  wrote:

How should we go about changing the replication factor and other keyspace
settings now that it and other KSMetaData are no longer managed in
cassandra.yaml?

I found makeDefinitionMutation() in the Migration class and see that it is
called for the other schema migrations.  There just seems to be a big gap in
the management API for the KSMetaData we might want to change.
--
Andres March
ama...@qualcomm.com
Qualcomm Internet Services


--
*Andres March*
ama...@qualcomm.com
Qualcomm Internet Services


Replication factor and other schema changes in >= 0.7

2010-08-19 Thread Andres March
 How should we go about changing the replication factor and other 
keyspace settings now that it and other KSMetaData are no longer managed 
in cassandra.yaml?


I found makeDefinitionMutation() in the Migration class and see that it 
is called for the other schema migrations.  There just seems to be a big 
gap in the management API for the KSMetaData we might want to change.

--
*Andres March*
ama...@qualcomm.com
Qualcomm Internet Services