Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-21 Thread Jim Shaw
Although running upgradesstables is not required, upgradesstables -a
rewrites the files and kicks out tombstones. With size-tiered compaction,
the largest files may sit for a long time waiting for the next compaction
to kick out their tombstones.
So whether to run it really depends. Upgrades usually have a change
window when applications have little or no load, so why not take the
chance to run it?
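
As a rough sketch of what a manual run during the change window might
look like, the snippet below only builds the nodetool command line and
prints it; it does not execute anything. The helper name, the keyspace
name and the job count are made up for illustration. The -a and -j
options are the ones I know from nodetool upgradesstables, but please
check `nodetool help upgradesstables` on your version before relying on
them.

```python
import shlex


def upgradesstables_command(keyspace=None, tables=(), include_all=False, jobs=None):
    """Build (but do not run) a `nodetool upgradesstables` argument list.

    Hypothetical helper for illustration only.
    """
    cmd = ["nodetool", "upgradesstables"]
    if include_all:
        # -a: also rewrite SSTables that are already on the current version
        cmd.append("-a")
    if jobs is not None:
        # -j: number of SSTables to upgrade simultaneously
        cmd += ["-j", str(jobs)]
    if keyspace:
        cmd.append(keyspace)
        cmd += list(tables)
    return cmd


# "my_keyspace" and jobs=2 are placeholder values, not recommendations.
cmd = upgradesstables_command("my_keyspace", include_all=True, jobs=2)
print(shlex.join(cmd))
```

Printing the command first makes it easy to review what would run on each
node before the window opens.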

Regards,

Jim

On Tue, Aug 16, 2022 at 3:17 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thank you
>
> On Tue, Aug 16, 2022 at 11:48 AM C. Scott Andreas 
> wrote:
>
>> No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1
>> SSTables incurred a performance hit).
>>
>> Many users of Cassandra don't run upgradesstables after 3.x -> 4.x
>> upgrades at all. It's not necessary to run until a hypothetical future time
>> if/when support for reading Cassandra 3.x SSTables is removed from
>> Cassandra. One of the most common reasons to avoid running upgradesstables
>> is because doing so causes 100% churn of the data files, meaning your
>> backup processes will need to upload a full copy of the data. Allowing
>> SSTables to organically churn into the new version via compaction avoids
>> this.
>>
>> If you're upgrading from 3.x to 4.x, don't feel like you have to - but it
>> does avoid the need to run upgradesstables in a hypothetical distant future.
>>
>> – Scott
>>
>> On Aug 16, 2022, at 6:32 AM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>
>> Thank you Erick,
>>
>> > it is going to be single-threaded by default so it will take a while to
>> get through all the sstables on dense nodes
>> Is there any downside if upgradesstables takes longer (for example, 1-2
>> days), other than the I/O?
>>
>> Also, when does upgradesstables get triggered: after each node is
>> upgraded, or only once all the nodes in the cluster have been upgraded
>> to 4.0.x?
>>
>> On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez 
>> wrote:
>>
>>> As convenient as it is, there are a few caveats and it isn't a silver
>>> bullet. The automatic feature will only kick in if there are no other
>>> compactions scheduled. Also, it is going to be single-threaded by default
>>> so it will take a while to get through all the sstables on dense nodes.
>>>
>>> In contrast, you'll have a bit more control if you manually upgrade the
>>> sstables. For example, you can schedule the upgrade during low traffic
>>> periods so reads are not competing with compactions for IO. Cheers!
>>>



Re: cell vs row timestamp tie resolution

2022-08-21 Thread Jim Shaw
Andrey:
In Cassandra every cell has its own timestamp; SELECT WRITETIME(..) shows
it.
Cassandra merges cells during compaction and at read time, ordering them
by timestamp.
For your example, if you prefix each column value with its writetime
(writetime + cell value) and then sort, you should get back what you see
now.
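
To make that concrete, here is a small sketch of the per-cell merge. It
is a simulation, not Cassandra code: merge_rows and encode are names I
made up, and it assumes that a tie between two live cells is broken by
comparing the serialized value bytes, with the larger value winning.
Deletes and TTLs are ignored.

```python
import struct


def encode(value):
    """Serialize a value roughly as it might be stored (assumption)."""
    if isinstance(value, int):
        return struct.pack(">i", value)  # 4-byte big-endian int
    return value.encode("utf-8")         # text as UTF-8 bytes


def merge_rows(*versions):
    """Merge row versions cell by cell.

    Each version maps column name -> (timestamp, value). For every column
    the cell with the highest (timestamp, serialized value) pair wins.
    """
    merged = {}
    for row in versions:
        for column, (ts, value) in row.items():
            key = (ts, encode(value))
            if column not in merged or key > merged[column][0]:
                merged[column] = (key, value)
    return {column: value for column, (_, value) in merged.items()}


TS = 1660596312240
first = {"body": (TS, "foo foo"), "size": (TS, 7)}
second = {"body": (TS, "flap flap"), "size": (TS, 9)}

result = merge_rows(first, second)
print(result)  # {'body': 'foo foo', 'size': 9}
```

Each column is decided independently, so with equal timestamps 'foo foo'
beats 'flap flap' and 9 beats 7, which is the mixed row in your select.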

Regards,

Jim


On Tue, Aug 16, 2022 at 10:25 AM Andrey Zapariy 
wrote:

> Hello Cassandra users!
>
> I'm dealing with the unexpected behavior of the tie resolution for the
> same timestamp inserts. At least, unexpected for me.
> The following simple repro under Cassandra 3.11.4 illustrates the question:
>
> CREATE KEYSPACE the_test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '2'}  AND durable_writes = true;
> CREATE TABLE the_test.case (id int, sort int, body text, size int, PRIMARY
> KEY (id, sort)) WITH CLUSTERING ORDER BY (sort ASC);
> INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'foo foo',
> 7) USING TIMESTAMP 1660596312240;
> INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'flap
> flap', 9) USING TIMESTAMP 1660596312240;
>
> After these two inserts I expect that either the combination <'foo foo',7>
> or the combination <'flap flap',9> would survive.
> But the select
> select id, sort, body, size from the_test.case where id=1 and sort=2;
> gives a rather uncomfortable result:
>  id | sort | body    | size
> ----+------+---------+------
>   1 |    2 | foo foo |    9
> This essentially shows that timestamp tie resolution is performed on a
> per-cell basis, not on a per-row basis as I was expecting.
>
> My questions are:
> Am I right about the way Cassandra resolves timestamp ties?
> Or is there a way to configure Cassandra to perform per-row resolution?
>
> Flushing the data to sstables and dumping them suggests that these inserts
> are stored as rows. And, thinking naively, I hope there is a way to make
> the whole-row insert survive.
>
>
>