Re: Schema collision results in multiple data directories per table

2021-10-18 Thread Erick Ramirez
> Erick, one last question: Is there a quick and easy way to extract the date from a time UUID?

Yeah, just use any of the online converters on the web. Cheers!

Re: Schema collision results in multiple data directories per table

2021-10-18 Thread Tom Offermann
> Any possibility that you "merged" two clusters together?

Ooohh... I think that's the missing piece of this puzzle! A couple weeks earlier, prior to the problem described in this thread, we did inadvertently merge two clusters together. We merged the original 'dc1' cluster with an entirely

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Erick Ramirez
I agree with Jeff that this isn't related to ALTER TABLE. FWIW, the original table was created in 2017, but a new version got created on August 5:

- 20739eb0-d92e-11e6-b42f-e7eb6f21c481: Friday, January 13, 2017 at 1:18:01 GMT
- 8ad72660-f629-11eb-a217-e1a09d8bc60c: Thursday, August 5,
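As an offline alternative to the online converters, the dates above can be recovered with Python's standard `uuid` module. This is a minimal sketch that assumes the table IDs are RFC 4122 version 1 (time-based) UUIDs. The two IDs quoted above are version 1, as the leading `1` in their third group (`11e6`, `11eb`) shows. A version 1 UUID stores a 60-bit count of 100-nanosecond intervals since 1582-10-15 00:00:00 UTC. The helper name `timeuuid_to_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 (time-based) UUIDs count 100-nanosecond intervals since the start
# of the Gregorian calendar, 1582-10-15 00:00:00 UTC (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    """Return the UTC timestamp embedded in a version 1 (time) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError(f"{value} is not a version 1 (time) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; // 10 gives microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


if __name__ == "__main__":
    for table_id in (
        "20739eb0-d92e-11e6-b42f-e7eb6f21c481",
        "8ad72660-f629-11eb-a217-e1a09d8bc60c",
    ):
        print(table_id, timeuuid_to_datetime(table_id).isoformat())
```

The first ID decodes to 2017-01-13 01:18:01 UTC, which matches the date Erick quotes. The second decodes to a time on 2021-08-05.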

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Jeff,

Ahh... I see. That makes sense. I'll add this to the list of things to check before making a schema change. Thanks so much for taking the time to walk me through this. Really appreciate all of your help!

On Fri, Oct 15, 2021 at 3:52 PM Jeff Jirsa wrote:
> Consistency doesn't matter for

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
Consistency doesn't matter for schema. For every host, run:

"select id from system_schema.tables WHERE keyspace_name=? and table_name=?"

(https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L144)

Then, compare that to the
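The per-host comparison Jeff outlines can be sketched as a small offline check. The sketch takes results you have already collected: the `id` returned by the query above on each host, and the directory names listed under that host's `<data_dir>/<keyspace>/`. It does not connect to Cassandra. It assumes the 3.x on-disk layout, where a table's data directory is named `<table>-<table id as 32 hex digits, no dashes>`. The function names and the host/table names in the test data are hypothetical.

```python
import re
import uuid
from typing import Dict, List


def expected_dir_name(table: str, table_id: str) -> str:
    """Directory name Cassandra 3.x uses for a table: '<table>-<id as 32 hex digits>'."""
    return f"{table}-{uuid.UUID(table_id).hex}"


def find_mismatches(
    table: str,
    ids_by_host: Dict[str, str],
    dirs_by_host: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return {host: [problems]} for hosts whose schema ID and data directories disagree.

    ids_by_host:  host -> id from
                  SELECT id FROM system_schema.tables WHERE keyspace_name=? AND table_name=?
    dirs_by_host: host -> directory names listed under <data_dir>/<keyspace>/
    """
    # Matches any data directory for this table, whatever its ID suffix.
    table_dir = re.compile(re.escape(table) + r"-[0-9a-f]{32}")
    distinct_ids = {uuid.UUID(i) for i in ids_by_host.values()}
    report: Dict[str, List[str]] = {}
    for host, table_id in ids_by_host.items():
        issues = []
        if len(distinct_ids) > 1:
            issues.append(f"schema id {table_id} is not the same on every host")
        expected = expected_dir_name(table, table_id)
        found = [d for d in dirs_by_host.get(host, []) if table_dir.fullmatch(d)]
        if expected not in found:
            issues.append(f"missing data directory {expected}")
        # Any other '<table>-<hex>' directory belongs to a different (stale) ID.
        issues.extend(f"extra data directory {d}" for d in found if d != expected)
        if issues:
            report[host] = issues
    return report
```

An empty report means every host agrees on one ID and has exactly one matching data directory. An "extra data directory" entry is the multiple-directories-per-table symptom this thread is about.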

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
So, if I were to do `CONSISTENCY ALL; select *` from each of the system_schema tables, then on-disk and in-memory should be in sync?

On Fri, Oct 15, 2021 at 3:38 PM Jeff Jirsa wrote:
> Heap dumps + filesystem inspection + SELECT from schema tables.
>
> On Fri, Oct 15, 2021 at 3:28 PM Tom

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
Heap dumps + filesystem inspection + SELECT from schema tables.

On Fri, Oct 15, 2021 at 3:28 PM Tom Offermann wrote:
> Interesting!
>
> Is there a way to determine if the on-disk schema and the in-memory schema
> are in sync? Is there a way to force them to sync? If so, would it help to
>

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Interesting!

Is there a way to determine if the on-disk schema and the in-memory schema are in sync? Is there a way to force them to sync? If so, would it help to force a sync before running an `ALTER KEYSPACE` schema change?

On Fri, Oct 15, 2021 at 3:08 PM Jeff Jirsa wrote:
> I would not

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
I would not expect an ALTER KEYSPACE to introduce a divergent CFID; that usually happens during a CREATE TABLE. With no other evidence or ability to debug, I would guess that the CFIDs diverged previously, but due to the race(s) I described, the on-disk schema and the in-memory schema differed,

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Jeff,

Thanks for describing the race condition. I understand that performing concurrent schema changes is dangerous, and that running an `ALTER KEYSPACE` on one node, and then running another `ALTER KEYSPACE` on a different node, before the first has fully propagated throughout the cluster, can
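One common guard against this is to wait until every node reports the same schema version before issuing the next schema change. The sketch below works on versions you collect yourself, for example from `SELECT schema_version FROM system.local` on the coordinator and `SELECT peer, schema_version FROM system.peers`, or from the schema versions shown by `nodetool describecluster`. The `fetch` callable and both function names are hypothetical. Agreement on the version is a precondition for a safe change, not proof that each node's on-disk and in-memory schema match in the way Jeff describes.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple

# (coordinator schema_version, {peer address: peer schema_version})
SchemaSnapshot = Tuple[Optional[str], Dict[str, Optional[str]]]


def schema_in_agreement(local: Optional[str], peers: Dict[str, Optional[str]]) -> bool:
    """True when the coordinator and every peer report one identical schema version."""
    versions = [local, *peers.values()]
    if any(v is None for v in versions):
        # A node with no known version has not settled; treat it as disagreement.
        return False
    return len({uuid.UUID(v) for v in versions}) == 1


def wait_for_agreement(
    fetch: Callable[[], SchemaSnapshot], timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Poll fetch() until all versions match; return False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        local, peers = fetch()
        if schema_in_agreement(local, peers):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Used this way, one schema change is issued at a time, and the next one waits until `wait_for_agreement` returns True.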

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Vytenis,

I ran the `ALTER KEYSPACE` command on one of the original `dc1` nodes. Should it make any difference? My understanding was that it could be run from any node in either datacenter. But, if there's a reason to prefer running it on a new datacenter node, I'm happy to do it that way.

--Tom

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Tom Offermann
Stefan,

Yes, this is probably one of many good reasons to upgrade! Upgrading to Cassandra 4.0 is definitely on our roadmap, but we're hoping to do these migrations first before we upgrade. However, if we keep running into this problem, we may have to rethink that ordering.

--Tom

On Wed, Oct

Re: Schema collision results in multiple data directories per table

2021-10-13 Thread Jeff Jirsa
I've described this race a few times on the list. It is very, very dangerous to do concurrent table creation in Cassandra with non-deterministic CFIDs. I'll try to describe it quickly right now:

- Imagine you have 3 hosts: A, B, and C. You connect to A and issue a "CREATE TABLE ... IF NOT EXISTS". A
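The core of the race can be shown with a toy model. Each host checks IF NOT EXISTS against its own local schema view only. Each creation generates a fresh ID, here a time UUID, matching the version 1 IDs quoted in this thread. This is an illustration, not Cassandra's code. `Host`, `create_table_if_not_exists` and `propagate` are hypothetical names. The model does not cover how a real cluster reconciles competing IDs, or which ID ends up on disk versus in memory.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    """Toy node: holds only its own local view of the schema."""
    name: str
    tables: Dict[str, uuid.UUID] = field(default_factory=dict)

    def create_table_if_not_exists(self, table: str) -> uuid.UUID:
        # IF NOT EXISTS is evaluated against this host's local schema only.
        if table not in self.tables:
            # Non-deterministic ID: a fresh time UUID for every creation.
            self.tables[table] = uuid.uuid1()
        return self.tables[table]


def propagate(src: Host, targets: List[Host]) -> None:
    """Toy schema push: targets keep whatever ID they already have for a table."""
    for target in targets:
        for table, table_id in src.tables.items():
            target.tables.setdefault(table, table_id)


if __name__ == "__main__":
    a, b, c = Host("A"), Host("B"), Host("C")
    # Two clients (or a client retry) hit different coordinators before the
    # first CREATE has propagated: both checks pass and two IDs are generated.
    a.create_table_if_not_exists("mytable")
    b.create_table_if_not_exists("mytable")
    propagate(a, [b, c])
    propagate(b, [a, c])
    print({h.name: h.tables["mytable"] for h in (a, b, c)})
```

Run sequentially, with propagation finishing before the second CREATE, every host ends up with a single ID. Run concurrently, the cluster holds two IDs for one table name.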

Re: Schema collision results in multiple data directories per table

2021-10-13 Thread vytenis silgalis
You ran the `alter keyspace` command on the original dc1 nodes or the new dc2 nodes?

On Wed, Oct 13, 2021 at 8:15 AM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> Hi Tom,
>
> while I am not completely sure what might cause your issue, I just
> want to highlight that schema

Re: Schema collision results in multiple data directories per table

2021-10-13 Thread Stefan Miklosovic
Hi Tom,

While I am not completely sure what might cause your issue, I just want to highlight that schema agreement was overhauled a lot in 4.0 (1), so that may be somehow related to what that ticket was trying to fix.

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-15158

On Fri, 1