Re: Cassandra 4.0 upgrade - Upgradesstables
Running upgradesstables is not required, but upgradesstables -a rewrites
every file, which also kicks out tombstones. With size-tiered compaction,
the largest files may sit for a long time waiting for the next compaction
to kick out their tombstones. So whether to run it really depends. Upgrades
usually have a change window in which applications carry little or no load,
so why not take the chance to run it?

Regards,
Jim

On Tue, Aug 16, 2022 at 3:17 PM Jai Bheemsen Rao Dhanwada
<jaibheem...@gmail.com> wrote:

> Thank you
>
> On Tue, Aug 16, 2022 at 11:48 AM C. Scott Andreas wrote:
>
>> No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1
>> SSTables incurred a performance hit).
>>
>> Many users of Cassandra don't run upgradesstables after 3.x -> 4.x
>> upgrades at all. It's not necessary to run until a hypothetical future
>> time if/when support for reading Cassandra 3.x SSTables is removed from
>> Cassandra. One of the most common reasons to avoid running
>> upgradesstables is because doing so causes 100% churn of the data files,
>> meaning your backup processes will need to upload a full copy of the
>> data. Allowing SSTables to organically churn into the new version via
>> compaction avoids this.
>>
>> If you're upgrading from 3.x to 4.x, don't feel like you have to - but
>> it does avoid the need to run upgradesstables in a hypothetical distant
>> future.
>>
>> – Scott
>>
>> On Aug 16, 2022, at 6:32 AM, Jai Bheemsen Rao Dhanwada
>> <jaibheem...@gmail.com> wrote:
>>
>> Thank you Erick,
>>
>> > it is going to be single-threaded by default so it will take a while
>> > to get through all the sstables on dense nodes
>>
>> Is there any downside if upgradesstables takes longer (for example,
>> 1-2 days), other than I/O?
>>
>> Also, when does upgradesstables get triggered? After each node is
>> upgraded, or only once all the nodes in the cluster are upgraded to
>> 4.0.x?
>>
>> On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez wrote:
>>
>>> As convenient as it is, there are a few caveats and it isn't a silver
>>> bullet. The automatic feature will only kick in if there are no other
>>> compactions scheduled. Also, it is going to be single-threaded by
>>> default so it will take a while to get through all the sstables on
>>> dense nodes.
>>>
>>> In contrast, you'll have a bit more control if you manually upgrade
>>> the sstables. For example, you can schedule the upgrade during low
>>> traffic periods so reads are not competing with compactions for IO.
>>> Cheers!
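For the manual route discussed in this thread, a maintenance-window script might assemble the nodetool call along the following lines. This is only a sketch: the helper name `upgradesstables_command` and the keyspace `my_keyspace` are made up for illustration, and it only builds the command line without running it. The `-a` flag is the one Jim mentions; `-j` is the nodetool option for the number of SSTables to upgrade in parallel.

```python
from typing import List, Optional


def upgradesstables_command(keyspace: Optional[str] = None,
                            include_all: bool = False,
                            jobs: Optional[int] = None) -> List[str]:
    """Build (but do not run) a `nodetool upgradesstables` command line.

    Hypothetical helper for illustration only.
    """
    cmd = ["nodetool", "upgradesstables"]
    if include_all:
        # -a: rewrite all SSTables, including ones already on the current version
        cmd.append("-a")
    if jobs is not None:
        # -j: number of SSTables to upgrade simultaneously
        cmd += ["-j", str(jobs)]
    if keyspace:
        cmd.append(keyspace)
    return cmd


# "my_keyspace" is a placeholder, not a name from the thread.
command = upgradesstables_command(keyspace="my_keyspace", include_all=True, jobs=2)
print(" ".join(command))  # nodetool upgradesstables -a -j 2 my_keyspace
```

Passing the resulting list to `subprocess.run` on each node during the change window would be one way to use it. Whether to include `-a` at all is the trade-off Scott describes: every data file is rewritten, so backups see a full copy.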
Re: cell vs row timestamp tie resolution
Andrey:

In Cassandra, every cell has its own timestamp; you can see it with
SELECT WRITETIME(..). Cassandra merges cells during compaction and on
read, ordering them by timestamp. For your example, if you left-pad each
column value with its writetime (writetime + cell value) and then sort,
you get back what you see now.

Regards,
Jim

On Tue, Aug 16, 2022 at 10:25 AM Andrey Zapariy wrote:

> Hello Cassandra users!
>
> I'm dealing with unexpected behavior of the tie resolution for inserts
> with the same timestamp. At least, unexpected for me.
> The following simple repro under Cassandra 3.11.4 illustrates the
> question:
>
> CREATE KEYSPACE the_test WITH replication = {'class': 'SimpleStrategy',
>   'replication_factor': '2'} AND durable_writes = true;
> CREATE TABLE the_test.case (id int, sort int, body text, size int,
>   PRIMARY KEY (id, sort)) WITH CLUSTERING ORDER BY (sort ASC);
> INSERT INTO the_test.case (id, sort, body, size)
>   VALUES (1, 2, 'foo foo', 7) USING TIMESTAMP 1660596312240;
> INSERT INTO the_test.case (id, sort, body, size)
>   VALUES (1, 2, 'flap flap', 9) USING TIMESTAMP 1660596312240;
>
> After these two inserts I expect that either the combination
> <'foo foo',7> or the combination <'flap flap',9> would survive.
> But the select
>
> select id, sort, body, size from the_test.case where id=1 and sort=2;
>
> gives a rather uncomfortable result:
>
>  id | sort | body    | size
> ----+------+---------+------
>   1 |    2 | foo foo |    9
>
> Essentially, this shows that timestamp tie resolution is performed on a
> per-cell basis, and not on a per-row basis, as I was expecting.
>
> My questions are:
> Am I right about the way Cassandra resolves timestamp ties?
> Or is there a way to configure Cassandra to perform per-row resolution?
>
> Flushing data to sstables and dumping them suggests that these inserts
> are stored as rows. And, naively thinking, I hope there is a way to make
> the whole row insert survive.
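The "writetime + cell value" ordering Jim describes can be sketched as a small model of per-cell reconciliation. This is a simplified illustration, not Cassandra's actual code: it assumes the higher timestamp wins, that a tie is broken in favour of the greater serialized value (compared as bytes), and it ignores tombstones and TTLs. The names `reconcile` and `merge_rows` are made up for the sketch.

```python
from typing import Dict, Tuple, Union

Value = Union[int, str]
Cell = Tuple[int, Value]   # (write timestamp, value)
Row = Dict[str, Cell]      # column name -> cell


def to_bytes(value: Value) -> bytes:
    """Serialize roughly the way an int or text column would be stored (assumption)."""
    if isinstance(value, int):
        return value.to_bytes(4, "big", signed=True)
    return value.encode("utf-8")


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick one cell: higher timestamp first, greater byte value on a tie."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if to_bytes(a[1]) >= to_bytes(b[1]) else b


def merge_rows(first: Row, second: Row) -> Row:
    """Merge two versions of a row column by column, never as a whole row."""
    merged = dict(first)
    for column, cell in second.items():
        merged[column] = reconcile(merged[column], cell) if column in merged else cell
    return merged


TS = 1660596312240  # the timestamp shared by both inserts in the repro
insert_1 = {"body": (TS, "foo foo"), "size": (TS, 7)}
insert_2 = {"body": (TS, "flap flap"), "size": (TS, 9)}

merged = merge_rows(insert_1, insert_2)
result = {column: value for column, (_, value) in merged.items()}
print(result)  # {'body': 'foo foo', 'size': 9}
```

Under these assumptions each column is decided independently: 'foo foo' sorts after 'flap flap' and 9 sorts after 7, which reproduces the mixed row from the select in the repro.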