Recently, I successfully used the following procedure when decommissioning a
datacenter:
1. Reduced the replication factor for this DC to zero for all keyspaces except
the system_auth keyspace. For that keyspace, I reduced the RF to one.
2. Decommissioned all nodes except one in the DC using
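Step 1 of the procedure can be sketched in CQL; the keyspace and datacenter names here ('my_keyspace', 'dc1', 'dc2') are placeholders, not taken from the thread:

```cql
-- Remove the DC being decommissioned (assumed name: 'dc2') from replication
-- by setting its RF to 0; the remaining DC ('dc1') keeps its existing RF.
ALTER KEYSPACE my_keyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 0};

-- system_auth keeps RF 1 in the old DC, as described in step 1.
ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 1};
```

Repeat the first statement for every keyspace replicated to the DC being decommissioned.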
> It's actually correct to do it how it is today.
> Insertion date does not matter, what matters is the time after tombstones are
> supposed to be deleted.
> If the delete got to all nodes, sure, no problem, but if any of the nodes
> didn't get the delete, and you would get rid of the
> That's not how gc_grace_seconds works.
> gc_grace_seconds controls how much time must pass *after* a tombstone is
> created before it can actually be deleted, in order to give you enough time
> to run repairs.
>
> Say you have data that is about to expire on March 16 8am, and
> gc_grace_seconds is 10 days.
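That timeline can be checked with a quick shell calculation (GNU date; the year 2024 is an assumption, the thread gives none):

```shell
# Data expires (turns into a tombstone) on 2024-03-16 08:00 UTC;
# gc_grace_seconds = 10 days = 864000 s. Only after expiry + gc_grace
# may compaction actually purge the tombstone.
expiry=$(date -u -d '2024-03-16 08:00:00' +%s)
purgeable=$((expiry + 864000))
date -u -d "@$purgeable" '+%Y-%m-%d %H:%M'   # 2024-03-26 08:00
```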
> by reading the documentation about TTL
> https://cassandra.apache.org/doc/4.1/cassandra/operating/compaction/index.html#ttl
> It mentions that a tombstone is created when data expires. How is that
> possible without writing the tombstone to the table? I thought TTL
> doesn't create
You might find the following discussion from the mailing-list archive helpful:
https://lists.apache.org/thread/6hnypp6vfxj1yc35ptp0xf15f11cx77d
This thread discusses a similar situation and gives a few pointers on when it
might be safe to simply move the SSTables around.
> On 08.02.2024 at 13:06
paired SSTables because some unrepaired SSTables are
> being marked as repaired on one node but not on another, you would then
> understand why over-streaming can happen. The over-streaming is only
> problematic for the repaired SSTables, because they are much bigger than the
> unrepai
> Caution, using the method you described, the amount of data streamed at the
> end with the full repair is not the amount of data written between stopping
> the first node and the last node, but depends on the table size, the number
> of partitions written, their distribution in the ring and
> That's a feature we need to implement in Reaper. I think disallowing the
> start of the new incremental repair would be easier to manage than pausing
> the full repair that's already running. It's also what I think I'd expect as
> a user.
>
> I'll create an issue to track this.
Thank you,
> Full repair running for an entire week sounds excessively long. Even if
> you've got 1 TB of data per node, 1 week means the repair speed is less than
> 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck
> of the full repair speed and work on that instead.
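The arithmetic behind that estimate, assuming 1 TB per node repaired over one full week:

```shell
# 1 TB in one week works out to roughly 1.65 MB/s per node,
# which is indeed well under 2 MB/s.
awk 'BEGIN { printf "%.2f MB/s\n", 1e12 / (7 * 24 * 3600) / 1e6 }'
```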
We store
Hi,
> 2. use an orchestration tool, such as Cassandra Reaper, to take care of that
> for you. You will still need monitoring and alerting to ensure the repairs
> are run successfully, but fixing a stuck or failed repair is not very
> time-sensitive,
> you can usually leave it till Monday morning if it
I would check whether some SSTables are marked as repaired while others are not
(by running sstablemetadata and checking the value of repairedAt).
An inconsistency in the repaired state might explain the overstreaming. During
repairs, data from repaired SSTables on one node is only compared
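A sketch of that check, assuming a default data directory layout (the path, keyspace, and table names are assumptions):

```shell
# Print the repairedAt value for every SSTable of one table; a mix of
# zero (unrepaired) and non-zero values across nodes would indicate the
# inconsistency described above.
for sstable in /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db; do
    echo "== $sstable"
    sstablemetadata "$sstable" | grep -i 'repaired'
done
```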
Hi Arjun,
this is strange. You should be able to use a range query on a column that is
part of the clustering key, as long as all clustering-key columns to the left
of this column are set to fixed values.
So, given the table definition that you specified, your query should work (I
just
> I assume these are column names of a non-system table.
>
This is correct. It is one of our application tables. The table has the
following schema:
CREATE TABLE pv_archive.channels (
    channel_name text,
    decimation_level int,
    bucket_start_time bigint,
    channel_data_id uuid static,
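Assuming the truncated primary key above is PRIMARY KEY ((channel_name), decimation_level, bucket_start_time) — an assumption, since the schema is cut off — a valid clustering-range query would look like this:

```cql
-- channel_name (partition key) and decimation_level (first clustering
-- column) are fixed; the range is on bucket_start_time. The literal
-- values are placeholders.
SELECT bucket_start_time, channel_data_id
FROM pv_archive.channels
WHERE channel_name = 'some_channel'
  AND decimation_level = 0
  AND bucket_start_time >= 1700000000000
  AND bucket_start_time < 1700086400000;
```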
> If an upgrade involves changing the schema, I think backwards compatibility
> would be out of the question?
That’s a good point.
I just noticed that during the upgrade, the output of “nodetool
describecluster” showed a schema version disagreement, where the nodes running
3.11.14 were on
Hi,
while upgrading our production cluster from C* 3.11.14 to 4.1.3, we experienced
the issue that some SELECT queries failed due to supposedly no replica being
available. The system logs on the C* nodes were full of messages like the
following one:
ERROR [ReadStage-1] 2023-12-11
Hi,
we are currently in the process of migrating from C* 3.11 to C* 4.1 and we want
to start using incremental repairs after the upgrade has been completed. It
seems like all the really bad bugs that made using incremental repairs
dangerous in C* 3.x have been fixed in 4.x, and for our
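For reference, in C* 4.x incremental repair is the default mode of nodetool repair, and a full repair has to be requested explicitly (the keyspace name is a placeholder):

```shell
nodetool repair my_keyspace          # incremental repair (4.x default)
nodetool repair --full my_keyspace   # full repair
```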
Hi,
as we are currently facing the same challenge (upgrading an existing cluster
from C* 3 to C* 4), I wanted to share our strategy with you. It is largely what
Scott already suggested, but I have some extra details, so I thought it might
still be useful.
We duplicated our cluster using the