Re: Backup and Restore Strategy and Tools

2023-10-26 Thread Miklosovic, Stefan via user
Hi Bhavesh,

have you gone through the Cassandra tools listed here (1)?

Just search for "backup"; there are a couple of (CLI) solutions out there for your 
problem.

Feel free to ping me on Cassandra Slack or privately if you want.

Cheers

(1) https://cassandra.apache.org/_/ecosystem.html


From: Bhavesh Prajapati via user 
Sent: Thursday, October 26, 2023 20:44
To: user@cassandra.apache.org
Cc: Bhavesh Prajapati
Subject: Backup and Restore Strategy and Tools

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



Hi,

I have a 48-node, single-DC Apache Cassandra cluster running in production; the 
version is 4.0.6.
Currently, we are using a home-grown backup script based on nodetool snapshot 
that uploads backups to S3. We are using a home-grown restore script to recover 
in case of disaster.
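One piece of a snapshot-and-upload script like this is deciding where each node's files land in S3, so that 48 nodes never overwrite each other. The minimal sketch below maps a local snapshot file to an object key. The `<data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>/<file>` layout is Cassandra's on-disk convention for `nodetool snapshot -t <tag>`; the key scheme and the `cluster`/`host` parameters are hypothetical assumptions, not part of any existing tool.

```python
from pathlib import PurePosixPath


def snapshot_s3_key(data_dir: str, file_path: str, cluster: str, host: str) -> str:
    """Map a snapshot file to a per-node S3 object key (hypothetical layout).

    Cassandra keeps snapshots under
    <data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>/<file>.
    The key produced here is <cluster>/<host>/<tag>/<keyspace>/<table>/<file>,
    so every node writes under its own prefix.
    """
    rel = PurePosixPath(file_path).relative_to(data_dir)  # ValueError if outside data_dir
    parts = rel.parts
    if len(parts) < 5 or parts[2] != "snapshots":
        raise ValueError(f"not a snapshot file: {file_path}")
    keyspace, table_dir, _, tag = parts[:4]
    # Table directories are named "<table>-<id>"; table names cannot contain "-".
    table = table_dir.rsplit("-", 1)[0]
    rest = "/".join(parts[4:])  # keeps nested paths such as secondary-index dirs
    return f"{cluster}/{host}/{tag}/{keyspace}/{table}/{rest}"
```

Keying by host first keeps each node's backup self-contained, which matters when a restore has to put each node's SSTables back on the node that owned those token ranges.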

I am looking for guidance on what a good backup and restore strategy is when a 
cluster has 48 nodes.
Is it possible to use DataStax OpsCenter with an Apache Cassandra cluster? Is it 
free to use?
Are there any other UI or command-line tools that you recommend?

Thanks,
Bhavesh


Re: Unsubscribe

2023-10-25 Thread Miklosovic, Stefan via user
Hi,

you need to unsubscribe as shown here 
https://cassandra.apache.org/_/community.html#discussions

Regards


From: Daniel Stibor 
Sent: Wednesday, October 25, 2023 15:17
To: user@cassandra.apache.org
Subject: Unsubscribe

You don't often get email from daniel.sti...@gmail.com. Learn why this is 
important



Hey, I'd like to unsubscribe. Thanks


Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Miklosovic, Stefan via user
It will use the first contact point to connect to the database and, once 
connected, it will read the peers table, which is empty. Contact points are 
really just that: contact points. I do not think it means that all of them 
will be used in some round-robin fashion or anything like that. They are there 
just to read the peers table and use the nodes listed there, not the contact 
points.

I think the same would be seen if you specified two contact points where the 
first one is a non-existent IP address and the second one is a proxy. It should 
connect to that proxy, which again reads the peers table as empty.

I was involved in some investigation around this functionality and I hit the 
same problem, basically. My idea was to put these proxies into the peers table, 
but that complicates things quite fast as they are not proper members of the 
ring by definition, as they do not hold data etc.

I think this would need to be fixed in the driver, to include all contact 
points even if they are not found in peers. But if they are not part of the 
ring, they can never "leave" the ring. I wonder if they are visible in gossip 
etc... I do not remember. Hence, how would you know that your proxy went down?
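The discovery behaviour described above can be modelled roughly as follows. This is a deliberately simplified sketch of what is described in this reply, not the DataStax Java driver's actual code: the driver connects to the first reachable contact point, then builds its node list from that node plus whatever that node's `system.peers` table returns. `discover_hosts`, `reachable` and `peers_of` are hypothetical names.

```python
from typing import Dict, List, Set


def discover_hosts(contact_points: List[str],
                   reachable: Set[str],
                   peers_of: Dict[str, List[str]]) -> Set[str]:
    """Simplified model of driver node discovery (illustrative, not driver code).

    1. Try contact points in order until one accepts a connection.
    2. Use that node (system.local) plus the rows of its system.peers table.
    Contact points that do not appear in system.peers are otherwise ignored.
    """
    for cp in contact_points:
        if cp in reachable:
            return {cp, *peers_of.get(cp, [])}
    raise ConnectionError("no contact point reachable")
```

With three `join_ring=false` proxies whose peers tables are empty, passing all three as contact points still yields a pool containing only the first proxy, which matches the behaviour reported in this thread.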


From: Jeff Jirsa 
Sent: Thursday, October 12, 2023 14:20
To: user@cassandra.apache.org
Subject: Re: java driver with cassandra proxies (option: 
-Dcassandra.join_ring=false)




Just to be clear:

- How many of the proxy nodes are you providing as contact points? One of them 
or all of them?

It sounds like you're saying you're passing all of them, and only one is 
connecting, and the driver is declining to connect to the rest because they're 
not in system.peers. I'm not surprised that the proxies aren't in system.peers, 
but I'd have also expected that if you pass all proxies in contact points, it'd 
connect to all of them, so I think you're appropriately surprised here.



On Thu, Oct 12, 2023 at 5:09 AM Regis Le Bretonnic 
<r.lebreton...@meetic-corp.com> wrote:
We have tested Stargate and were very disappointed...

Originally, our architecture was PHP microservices (with FPM) + Cassandra 
proxies.
But we were blocked because the PHP driver is no longer supported.

We ran tests to keep PHP + Stargate, but there were many issues, the main one 
(but not the only one) being that Stargate does not support the "ALLOW FILTERING" 
clause. I don't want to re-open this debate I already had with the Stargate 
maintainers...

We finally decided to move from PHP to Java, but we'd like to keep the Cassandra 
proxies, which are very useful.

Regards

On Thu, Oct 12, 2023 at 12:05, Erick Ramirez 
<erickramire...@apache.org> wrote:
Those nodes are not in the peers table(s) because you told them NOT to join the 
ring with `join_ring=false` so it is working by design.

I'm not really sure what you're trying to achieve but if you want to separate 
the coordinator functions from the storage then what you probably want is to 
deploy Stargate nodes. Stargate is a data API gateway 
that sits between the app instances and the Cassandra database. It decouples 
client request coordination from the storage aspects of C*. It also allows you 
to perform CRUD operations against C* using APIs -- REST, JSON, gRPC, GraphQL.

See the docs on Using the Stargate CQL API for details on how to set up 
Stargate nodes as coordinators for your C* database.

If you want to see it in action, you can try it free on Astra DB 
(Cassandra-as-a-service). Cheers!


Re: Restricting data access at column and/or row level

2023-10-03 Thread Miklosovic, Stefan
Hi,

Columns can be restricted per user with Dynamic Data Masking (coming in 5.0).

https://cassandra.apache.org/doc/trunk/cassandra/developing/cql/dynamic_data_masking.html

I am not sure about specific rows; to my knowledge, that is not possible.


From: Nitan Kainth 
Sent: Tuesday, October 3, 2023 23:31
To: cassandra
Subject: Restricting data access  at column and/or row level




Hi Team,

I have a requirement to grant select privileges on a table to some users while 
restricting a few columns and a few rows. Something like this:

 c1 | c2 | c3 | c4
----+----+----+----
  5 |  1 |  5 |  1
 10 |  1 |  1 |  1
  1 |  1 |  1 |  1
  8 |  1 |  1 |  1
  2 |  1 |  1 |  1
  4 |  1 |  1 |  1
  7 |  5 |  1 |  1
  6 |  1 |  1 |  1
  9 |  1 |  1 |  1
  3 |  4 |  1 |  1

User1 should be able to see only c1 and c2:
 c1 | c2
----+----
  5 |  1
 10 |  1
  1 |  1
  8 |  1
  2 |  1
  4 |  1
  7 |  5
  6 |  1
  9 |  1
  3 |  4

User2 should be able to see only c1=5:
 c1 | c2
----+----
  5 |  1

I created a materialized view, but unfortunately it doesn't work without granting 
permission on the base table.


Regards,
Nitan
Cell: 510 449 9629



[RELEASE] Apache Cassandra 3.11.16 released

2023-08-20 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 3.11.16.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 https://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 https://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/311x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-3.11.16/CHANGES.txt
[2]: NEWS.txt 
https://github.com/apache/cassandra/blob/cassandra-3.11.16/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

Re: Materialized View inconsistency issue

2023-08-18 Thread Miklosovic, Stefan
Well, you could always do it like this:

cqlsh> CREATE TABLE dating.visits2 (user_id int, visitor_id int, visit_month 
int, visit_date int, primary key (user_id, visitor_id, visit_month)) WITH 
CLUSTERING ORDER BY (visitor_id ASC, visit_month DESC );

This means that with 6 months of data, you have at most 6 entries per user and 
visitor pair. If your primary key is user_id, visitor_id and visit_month, then 
the clustering columns are visitor_id and visit_month, and visit_month is in 
descending order.

// user 300 visits user 100 in August (8) at some specific timestamp
cqlsh> insert into dating.visits2 (user_id , visitor_id , visit_month , 
visit_date ) VALUES ( 100, 300, 8, 123);

// user 200 visits 100 in July and June on some timestamps.
cqlsh> insert into dating.visits2 (user_id , visitor_id , visit_month , 
visit_date ) VALUES ( 100, 200, 7, 456);
cqlsh> insert into dating.visits2 (user_id , visitor_id , visit_month , 
visit_date ) VALUES ( 100, 200, 6, 456);

cqlsh> select * from dating.visits2 WHERE user_id = 100 and visitor_id = 200;

 user_id | visitor_id | visit_month | visit_date
---------+------------+-------------+------------
     100 |        200 |           7 |        456
     100 |        200 |           6 |        456
 100 |200 |   6 |456

(2 rows)


This is the most important query. You always get the rows sorted by month, 
latest month on top, each with its visit date.

cqlsh> select * from dating.visits2 WHERE user_id = 100 and visitor_id = 200 
limit 1;

 user_id | visitor_id | visit_month | visit_date
---------+------------+-------------+------------
     100 |        200 |           7 |        456

The trick is that if the same visitor visits that user again later in July 
(visit_month 7), the row gets overwritten because the whole primary key is the same:

cqlsh> insert into dating.visits2 (user_id , visitor_id , visit_month , 
visit_date ) VALUES ( 100, 200, 7, 12345);
cqlsh> select * from dating.visits2 WHERE user_id = 100 and visitor_id = 200 
limit 1;

 user_id | visitor_id | visit_month | visit_date
---------+------------+-------------+------------
     100 |        200 |           7 |      12345

So you will only ever have one entry per month, and at most six entries for six 
months; each entry always tells you the most recent visit in that month.
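The upsert behaviour this design relies on can be checked with a small in-memory model. A minimal sketch, assuming last-write-wins on the full primary key and modelling the clustering order (visitor_id ASC, visit_month DESC) by sorting; it ignores write timestamps, TTLs and partitioning. `Visits2` is a hypothetical name standing in for `dating.visits2`.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int, int]       # (user_id, visitor_id, visit_month)
Row = Tuple[int, int, int, int]  # (user_id, visitor_id, visit_month, visit_date)


class Visits2:
    """In-memory stand-in for dating.visits2 (illustrative only)."""

    def __init__(self) -> None:
        self.rows: Dict[Key, int] = {}

    def insert(self, user_id: int, visitor_id: int, visit_month: int, visit_date: int) -> None:
        # An INSERT with an existing primary key overwrites visit_date (upsert).
        self.rows[(user_id, visitor_id, visit_month)] = visit_date

    def select(self, user_id: int, visitor_id: int, limit: Optional[int] = None) -> List[Row]:
        matches = [(u, v, m, d) for (u, v, m), d in self.rows.items()
                   if u == user_id and v == visitor_id]
        # Clustering order within one visitor: visit_month DESC.
        matches.sort(key=lambda row: row[2], reverse=True)
        return matches if limit is None else matches[:limit]
```

Replaying the inserts from this message, `select(100, 200, limit=1)` returns `(100, 200, 7, 12345)` after the second July visit, and the partition still holds only two rows for visitor 200.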



From: Regis Le Bretonnic 
Sent: Friday, August 18, 2023 11:30
To: user@cassandra.apache.org
Subject: Re: Materialized View inconsistency issue




What you propose is another debate 

Most of the time there is a product department and a tech department (I'm sure 
it is the case at NetApp)... I'd like to have a voice loud enough to influence 
product requirements, but that is not the way it works. I'm paid to work 
miracles, not to explain to the Director of Product that he cannot do what he 
wants...
I know that "6 months" is arbitrary and a shorter period could simplify 
things... but basically it is a compromise I cannot challenge.

- 1 month is not enough, for different reasons:
 - it is long enough for a "jet fighter" that received 1 visits per months... 
but not long enough for people who receive 4 visits per month (because they 
live in a low-density area or for other reasons). This has a psychological 
impact that directly influences the experience (and revenue).
 - you can suspend your account, for instance because you are on holiday... 
and when you come back, the list of visits received will be empty. This also 
has a psychological impact (also impacting revenue).
- 1 year is probably too long...

The compromise with the product team is 6 months and I cannot change that, even 
if it is stupid.

I am sure that most readers of this forum are technical folks who are in the 
same situation as me.
Let's stay on the technical point of view...


On Fri, Aug 18, 2023 at 10:48, Miklosovic, Stefan 
<stefan.mikloso...@netapp.com> wrote:

Re: Materialized View inconsistency issue

2023-08-18 Thread Miklosovic, Stefan
The 2 tables you propose, Stefan, cannot natively order rows by time (they will 
be ordered by visitor_id), unless you sort rows after the select.

So what? I think this is way better than dealing with an MV, which will 
eventually become inconsistent. Do you want a broken MV, or do you want to sort 
on the client? Which is better?

The tables will look like this:

cqlsh> select * from dating.visits_by_visitor_id ;

 user_id | visitor_id
---------+------------
     100 |        200
     100 |        300

(2 rows)
cqlsh> select * from dating.visits;

 user_id | visitor_id | visit_date
---------+------------+------------
     100 |        300 |          3
     100 |        200 |          5
     100 |        200 |          2
     100 |        200 |          1

(4 rows)
cqlsh>

Now if you iterate over the visitors of user 100 (200 and 300) and query each 
with LIMIT 1, you get the latest results.

It might be true that the results you get are not sorted by timestamp, but does 
that really matter? You can always sort them on the client.

The advantage of this approach is that you get all visitors of somebody with one 
query, if that ever matters.
You also know when somebody was visited by a given visitor within some period of 
time:

select visit_date from dating.visits where user_id = 100 and visitor_id = 200 
and visit_date > 3 and visit_date < 8;

Also, I don't know what your business logic is in detail, but why would 
somebody be interested in who visited them 6 months ago? What is that 
information good for in practice? Why don't you do it like this?

INSERT INTO dating.visits (user_id , visitor_id, visit_date ) VALUES ( 100, 
300, 60) USING TTL 10;

Use a TTL (in seconds) of, e.g., 1 month, so rows start to disappear 
automatically? If somebody visited me 2 months ago and that entry then 
disappears, I would not care at all. A user who visited me 2 months ago is 
basically equal to a user who has never visited me.
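CQL TTLs are expressed in seconds, so the `USING TTL 10` above expires after ten seconds; a one-month retention needs a much larger value. A small worked example, assuming 30-day months: one month is 30 × 86,400 = 2,592,000 seconds (i.e. `USING TTL 2592000`), and six months is 15,552,000 seconds, both well below Cassandra's maximum TTL of 630,720,000 seconds (20 years). `ttl_for_days` and `is_expired` are hypothetical helpers, and the expiry check is a simplification that ignores clock skew and compaction.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MAX_TTL = 20 * 365 * SECONDS_PER_DAY  # Cassandra's upper bound on a TTL: 630,720,000 s


def ttl_for_days(days: int) -> int:
    """TTL in seconds for a retention period given in days."""
    ttl = days * SECONDS_PER_DAY
    if not 0 < ttl <= MAX_TTL:
        raise ValueError(f"TTL {ttl} s is out of range")
    return ttl


def is_expired(written_at: int, ttl: int, now: int) -> bool:
    """Simplified model: a cell written at `written_at` (epoch seconds) stops
    being returned once `now` reaches `written_at + ttl`."""
    return now >= written_at + ttl
```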



From: Regis Le Bretonnic 
Sent: Friday, August 18, 2023 9:47
To: user@cassandra.apache.org
Subject: Re: Materialized View inconsistency issue




Hi Stefan

Happy to see that our use case interests you :-)
I'm not sure I explained well what we want.

Imagine this sequence of events:
- Julia visits Joe at t1
- Julia visits Joe at t2
- Karen visits Joe at t3
- Silvia visits Joe at t4
- Karen visits Joe at t5
- Karen visits Joe at t6
- Julia visits Joe at t7

We want to provide Joe with a webpage listing the visits he received in this order:
- Julia at t7 (the most recent)
- Karen at t6
- Silvia at t4

The 2 tables you propose, Stefan, cannot natively order rows by time (they will 
be ordered by visitor_id), unless you sort rows after the select.

Keep in mind that some people can receive a lot of visits in 6 months 
(200,000 or 300,000 deduplicated visits, and many more if you keep duplicate 
visits), and sorting such a volume of rows in code is not easy (in fact 
impossible, because we use PHP and we can't do that within FPM memory...)
... and of course, because we cannot show 200,000 or 300,000 member stickers on 
a single page in one shot, the webpage requires pagination (with batches of 100 
profiles per page). If you decide that sorting should be done on the code side, 
pagination becomes awful to manage.

PS 1: when we decided to do this, MVs had not yet been moved back to experimental
PS 2: the code to manage a visit received is very easy... we just do an insert 
into the master table without doing any select first... we just don't care about 
what happened in the past...
PS 3: the pagination is very easy... we just do a
- select * from visits_received_by_date where receiver_id=111 and 
visit_date …

Re: Materialized View inconsistency issue

2023-08-17 Thread Miklosovic, Stefan
Why can't you do it like this? You would have two tables:

create table visits (user_id bigint, visitor_id bigint, visit_date timestamp, 
primary key ((user_id, visitor_id), visit_date)) with clustering order by (visit_date desc);

create table visitors_by_user_id (user_id bigint, visitor_id bigint, primary 
key ((user_id), visitor_id));

The logic behind the second table, visitors_by_user_id, is that you do not care 
whether a user visited you twice: because visitor_id is a clustering column in 
the primary key, if the same user visits you twice, the second insert basically 
does nothing, because such an entry is already there.

For example:

user_id | visitor_id
joe | karen
joe | julia

If Karen visits me again, nothing happens as that entry is already there.

Then, when Karen visits me, I also put a row into the visits table:

joe | karen | tuesday
joe | karen | monday
joe | karen | last friday
joe | julia | today

So to know who visited me recently, I do

select visitor_id from visitors_by_user_id where user_id = Joe;

So I get Karen and Julia

And then for each such visitor I do

select visit_date from visits where user_id = Joe and visitor_id = Julia limit 1
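To make the query pattern concrete, here is a minimal in-memory sketch of the two-table approach, assuming integer timestamps and last-write-wins upserts. It also performs the client-side sort suggested in this thread and replays the Julia/Karen/Silvia sequence of visits from Regis's example. The class and method names are hypothetical; against a real cluster, each `latest_visit` call would correspond to one `... limit 1` query like the one above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class VisitStore:
    """In-memory model of the visits + visitors_by_user_id tables (illustrative)."""

    def __init__(self) -> None:
        # visits: partition (user_id, visitor_id) -> visit_date clustering values
        self.visits: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
        # visitors_by_user_id: partition user_id -> visitor_id clustering values
        self.visitors: Dict[str, Set[str]] = defaultdict(set)

    def record_visit(self, user_id: str, visitor_id: str, visit_date: int) -> None:
        # Both writes are plain upserts; re-adding a known visitor changes nothing.
        self.visits[(user_id, visitor_id)].add(visit_date)
        self.visitors[user_id].add(visitor_id)

    def latest_visit(self, user_id: str, visitor_id: str) -> int:
        # Equivalent of "select visit_date ... limit 1" with visit_date DESC clustering.
        return max(self.visits[(user_id, visitor_id)])

    def recent_visitors(self, user_id: str) -> List[Tuple[str, int]]:
        # One LIMIT 1 lookup per visitor, then a client-side sort, newest first.
        latest = [(v, self.latest_visit(user_id, v)) for v in self.visitors[user_id]]
        return sorted(latest, key=lambda pair: pair[1], reverse=True)
```

The trade-off raised in the reply is visible here: `recent_visitors` touches every visitor of the user before sorting, which is cheap for a handful of visitors but is exactly the step that becomes expensive at 200,000 or more visitors per user.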


From: Regis Le Bretonnic 
Sent: Tuesday, August 15, 2023 17:49
To: user@cassandra.apache.org
Subject: Re: Materialized View inconsistency issue




Hi Josh...

A long (and almost private) message to explain how we fix materialized views.

Let me first explain our use case... I work for a European dating website.
Users can receive visits from other users (typically when someone looks at a 
member's profile page), and we want to inform them of each visit received 
(sorted from the most recent one to the oldest one).
But imagine that Karen visits my profile page several times... I don't want to 
see all her visits but only the last one. So we want to deduplicate rows (see 
Karen only once) and order the rows (showing Julia, who visited me 1 minute 
ago, Sophia, who visited me 3 minutes ago, Karen, who visited me 10 minutes 
ago, and so on).

You cannot do that in Cassandra. If you want to deduplicate rows by pairs of 
users, the "visit timestamp" cannot be in the primary key... and if you want 
to order rows by the "visit timestamp", this field must be in the clustering 
columns and consequently in the primary key. That is just not possible!

What we do is:
- a master table like this:

CREATE TABLE visits_received (
receiver_id bigint,
sender_id bigint,
visit_date timestamp,
PRIMARY KEY ((receiver_id), sender_id)
) WITH CLUSTERING ORDER BY (sender_id ASC);

- and a materialized view like this:

CREATE MATERIALIZED VIEW visits_received_by_date as
SELECT receiver_id, sender_id, visit_date
FROM visits_received
WHERE receiver_id IS NOT NULL AND sender_id IS NOT NULL AND visit_date IS 
NOT NULL
PRIMARY KEY ((receiver_id), visit_date, sender_id)
WITH CLUSTERING ORDER BY (visit_date DESC, sender_id ASC);

With this the master table deduplicates, and the MV sorts rows the way we want.


Most of the time, the problem we have is rows in the MV that should not 
exist...
Let's say that I have this row in the master table:
- 111, 222, t3
and that because of materialized view inconsistency, I have 3 rows in the MV:
- 111, 222, t3
- 111, 222, t2
- 111, 222, t1

Then, to remove the 2 wrong rows in the MV, we do a double insert on the master 
table:
insert (111, 222, t1) + insert (111, 222, t3) -> this removes the row "111, 222, 
t1"
insert (111, 222, t2) + insert (111, 222, t3) -> this removes the row "111, 222, 
t2"

We can very, very rarely have other cases (rows in master and not in MV), but 
these are also very easy to fix by just re-inserting the master rows.
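Why the double insert works can be seen with a simplified model of view maintenance: on each base-table write, the view row derived from the previous base value is deleted and the row for the new value is written. This is an assumption-laden sketch, not Cassandra's implementation. It ignores write timestamps, tombstones and replication, which is where real MV inconsistencies come from. `MaterializedViewModel` and `repair` are hypothetical names. Starting from the inconsistent state above, the two double inserts leave only the correct row.

```python
from typing import Dict, Set, Tuple

BaseKey = Tuple[int, int]       # (receiver_id, sender_id)
ViewRow = Tuple[int, int, int]  # (receiver_id, visit_date, sender_id)


class MaterializedViewModel:
    """Toy base table + view kept in sync by read-before-write (illustrative)."""

    def __init__(self) -> None:
        self.base: Dict[BaseKey, int] = {}
        self.view: Set[ViewRow] = set()

    def insert(self, receiver_id: int, sender_id: int, visit_date: int) -> None:
        old = self.base.get((receiver_id, sender_id))
        if old is not None and old != visit_date:
            # The previous base value tells us which view row to delete.
            self.view.discard((receiver_id, old, sender_id))
        self.base[(receiver_id, sender_id)] = visit_date
        self.view.add((receiver_id, visit_date, sender_id))

    def repair(self, receiver_id: int, sender_id: int, stale_date: int) -> None:
        """Double insert: write the stale value, then restore the correct one."""
        correct = self.base[(receiver_id, sender_id)]
        self.insert(receiver_id, sender_id, stale_date)  # view row for stale_date becomes "tracked"
        self.insert(receiver_id, sender_id, correct)     # ...and is deleted here
```

The first insert makes the base row point at the stale value, so the second insert has a legitimate "old value" to clean up, which is exactly the stale view row.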


Now about our Spark script:
- we sequentially download the master table and the MV
- we compare them to find... "potential inconsistencies" (because the tables 
are not downloaded at the same time and data can have changed, we can find 
false positives)
- we loop over all the "potential inconsistencies" and force a new read on the 
table and the MV to check whether there is truly an inconsistency when both 
reads are made within a few milliseconds
- if it is a true inconsistency, we force inserts on the master table to fix 
the MV as described above
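The comparison step of such a script can be sketched in a few lines. This is a minimal single-process illustration in Python rather than Spark, assuming both tables fit in memory as sets of tuples; `find_potential_inconsistencies`, `confirm` and the `still_inconsistent` callback are hypothetical names. Rows present in the view but not derivable from the master table are stale, and rows derivable from the master table but absent from the view are missing. Each candidate is then re-checked against fresh reads to discard false positives caused by the two downloads not being simultaneous.

```python
from typing import Callable, Set, Tuple

MasterRow = Tuple[int, int, int]  # (receiver_id, sender_id, visit_date)
ViewRow = Tuple[int, int, int]    # (receiver_id, visit_date, sender_id)


def find_potential_inconsistencies(
        master: Set[MasterRow], view: Set[ViewRow]) -> Tuple[Set[ViewRow], Set[ViewRow]]:
    # The view row every master row should produce, in view column order.
    expected = {(r, d, s) for (r, s, d) in master}
    stale = view - expected    # in the MV but not backed by a master row
    missing = expected - view  # in the master table but absent from the MV
    return stale, missing


def confirm(candidates: Set[ViewRow],
            still_inconsistent: Callable[[ViewRow], bool]) -> Set[ViewRow]:
    """Keep only candidates that are still wrong after a fresh read of both tables."""
    return {row for row in candidates if still_inconsistent(row)}
```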


Now, about the volume of inconsistencies:
- on a master table with 1.7 billion rows
- we have ~12.5 thousand rows that are inconsistent (0.0007%) after 2 years... 
clearly better than what our developers would achieve by managing inserts and 
deletes themselves (and acceptable for our use case)


On Mon, Aug 14, 2023 at 16:36, Josh McKenzie 
<jmcken...@apache.org> wrote:
When it comes to denormalization in Cassandra today your options are to either 
do it 

Re: Cassandra FQL question-

2023-08-08 Thread Miklosovic, Stefan
Hey,

I did the same steps from an extracted 4.1.3 tarball and it just works. It also 
works when I install the Debian package.

The error you get suggests that the class path is not set correctly, so the 
class cannot be loaded.

You can probably debug this by wrapping the last line in fqltool in "echo" to 
see what is on the class path.

Regards


From: Akshith Mull 
Sent: Tuesday, August 8, 2023 18:30
To: user@cassandra.apache.org
Subject: Cassandra FQL question-




Hi All,

We have Cassandra 4.x.

I have enabled FQL as per the documentation-


Full Query Logging | Apache Cassandra Documentation (cassandra.apache.org)


nodetool enablefullquerylog --path /tmp/cassandrafullquerylog



Files generated.


-rw-r--r--   1131072 Aug  8 16:22 metadata.cq4t

-rw-r--r--   1 83886080 Aug  8 16:22 20230808-16.cq4


It's generated in binary.


As per the doc, I'm running the dump command to convert it to a human-readable format.


fqltool dump /tmp/cassandrafullquerylog


But I'm getting the error below.


Error: Could not find or load main class 
org.apache.cassandra.fqltool.FullQueryLogTool

Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.fqltool.FullQueryLogTool


Do we need to install anything to generate the output?



Any answers would be greatly appreciated.



thanks









[RELEASE] Apache Cassandra 4.1.3 released

2023-07-24 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.1.3.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.1 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/41x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-4.1.3/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-4.1.3/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RELEASE] Apache Cassandra 4.0.11 released

2023-07-18 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.0.11.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/40x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-4.0.11/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-4.0.11/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

Re: Backporting CASSANDRA-18560 to Cassandra 4.0.10

2023-07-17 Thread Miklosovic, Stefan
Hi Manish,

I do not think that is possible. 4.0.10 has already been released, and we cannot 
backport anything to a version that is already released. I believe you need to 
upgrade to 4.0.11.

Regards


From: manish khandelwal 
Sent: Monday, July 17, 2023 11:40
To: user@cassandra.apache.org
Subject: Backporting CASSANDRA-18560 to Cassandra 4.0.10




I see that a critical bug, 
https://issues.apache.org/jira/browse/CASSANDRA-18507, 
was fixed in Cassandra 4.0.10. But I also see that another critical bug, 
https://issues.apache.org/jira/browse/CASSANDRA-18560, 
was introduced and is going to be fixed in 4.0.11. Can the fix for 
https://issues.apache.org/jira/browse/CASSANDRA-18560 
(which is basically a revert) be easily backported to Cassandra 4.0.10 
without any impact?


Regards
Manish


Re: Survey about the parsing of the tooling's output

2023-07-11 Thread Miklosovic, Stefan
Sorry, this is the correct link:

https://lists.apache.org/thread/72j5qfgbttjcmylhcmfq1ptboh641ns0


From: Miklosovic, Stefan 
Sent: Wednesday, July 12, 2023 0:08
To: user@cassandra.apache.org
Subject: Re: Survey about the parsing of the tooling's output





Thank you very much for your valuable feedback and insights.

There is a thread (1) where we are discussing this as well. We should come to 
some decisions; you are welcome to participate in or follow the discussion 
there as well if you wish.

(1) https://lists.apache.org/list.html?d...@cassandra.apache.org


From: Andrew Weaver 
Sent: Monday, July 10, 2023 17:37
To: user@cassandra.apache.org
Subject: Re: Survey about the parsing of the tooling's output




+1 to Bowen Song's feedback for the most part.

We have processes that parse output from these nodetool commands:

  *   info
  *   netstats
  *   status
  *   version

My opinion is that anyone running a reasonably sized fleet of Cassandra nodes 
will have different flavors of automation: some things running on the nodes 
themselves, where nodetool is very handy, and some things running outside the 
cluster, where virtual tables accessed via CQL are preferred.

I propose a rule that within a given major version, additional lines of output 
are acceptable changes, but changes to the format of existing lines of output 
are forbidden.

I would be inclined to accept JSON or YAML output from nodetool for 
Ruby/Python/etc. scripts, but for bash, the human-readable output is more 
workable.
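Andrew's proposed rule can be illustrated with the kind of parser many operators write for `key: value` style tool output. A minimal sketch, assuming one `key: value` pair per line; the sample strings in the usage below are made up for illustration, not real nodetool output, and `parse_key_values`/`read_statistic` are hypothetical helpers. Extra lines do not break a script that looks up a known key, whereas renaming a key (the "someStatistic" to "Some Statistic" example from the original survey question) does.

```python
from typing import Dict


def parse_key_values(output: str) -> Dict[str, str]:
    """Parse 'key: value' lines; lines without a colon are ignored.

    Only the first colon splits key from value, so values such as times
    that contain colons are kept intact.
    """
    result: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def read_statistic(output: str) -> int:
    # A typical script looks up one key by its exact name.
    return int(parse_key_values(output)["someStatistic"])


# Usage with made-up sample output:
# read_statistic("someStatistic: 10\n")                   -> 10
# read_statistic("someStatistic: 10\nNew Field: yes\n")   -> 10 (added line is harmless)
# read_statistic("Some Statistic: 10\n")                  -> KeyError (renamed key breaks it)
```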

On Mon, Jul 10, 2023 at 4:35 AM Miklosovic, Stefan 
<stefan.mikloso...@netapp.com> wrote:
Hi Cassandra users,

I am a Cassandra developer, and we in the Cassandra project would love to know 
whether there are users out there for whom the output of tools like nodetool is 
important because they parse it.

We are evaluating the consequences of changing nodetool's output for various 
commands: we are not completely sure whether users parse this output in their 
custom scripts, in which case changing the output would break those scripts.

Additionally, how big a problem would an output change be if it happened only 
between major Cassandra versions, e.g. 4.0 -> 5.0 or 5.0 -> 6.0? In other words, 
there would be a guarantee that no breaking changes would ever occur in minor 
versions, only in majors.

Is anybody out there relying on the output of particular nodetool commands (or 
any command in tools/bin) in production? How often do you rely on parsing 
nodetool's output, and how much work would it be for you to adapt to minor 
changes? For example, if the tool prints "someStatistic: 10" and we change it to 
"Some Statistic: 10".

Would you be OK with the output changing if you had a way to get, e.g., JSON or 
YAML output instead via some flag on the nodetool command, so that the default 
output would be irrelevant?

We would greatly appreciate more feedback on this. I understand that not all 
questions are relevant to everyone.

Even if you are not relying on the tooling's output in custom scripts where you 
parse it, please tell us so. We are progressively trying to provide a CQL way to 
query the internal state of Cassandra, via virtual tables, for example.

Regards

Stefan Miklosovic


--
Andrew Weaver


Re: Survey about the parsing of the tooling's output

2023-07-11 Thread Miklosovic, Stefan
Thank you very much for your valuable feedback and insights.

There is a thread (1) where we are discussing this as well. We should come up 
with some decisions, you are welcome to participate / follow the discussion 
there as well if you wish.

(1) https://lists.apache.org/list.html?d...@cassandra.apache.org


From: Andrew Weaver 
Sent: Monday, July 10, 2023 17:37
To: user@cassandra.apache.org
Subject: Re: Survey about the parsing of the tooling's output

+1 to Bowen Song's feedback for the most part.

We have processes that parse output from these nodetool commands:

  *   info
  *   netstats
  *   status
  *   version

In my opinion, anyone running a reasonably sized fleet of Cassandra will have 
different flavors of automation: some things run on the nodes themselves, 
where nodetool is very handy, and some things run outside the cluster, where 
virtual tables accessed via CQL are preferred.

I propose a rule that within a given major version, additional lines of output 
are acceptable changes, but changes to the format of existing lines of output 
are forbidden.
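To make the proposed rule concrete: a script that looks up only the keys it needs keeps working when new lines are added, and fails loudly when a line it depends on disappears or is renamed. The following is a minimal Python sketch of that pattern, assuming nodetool info style "Key : value" lines. SAMPLE, parse_key_values and require are hypothetical names, and the sample text is illustrative rather than captured from a real node.

```python
# Sketch: read "Key : value" lines from nodetool-style output while
# tolerating lines that a later minor version might add.
# Assumption: lines look like nodetool info's "Key<padding>: value";
# SAMPLE is illustrative text, not captured from a real node.

SAMPLE = """\
Gossip active          : true
Native Transport active: true
Load                   : 1.5 GiB
Hypothetical New Line  : added in a later minor
"""


def parse_key_values(text):
    """Map every "Key : value" line to {key: value}; skip other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon are not key/value lines
            fields[key.strip()] = value.strip()
    return fields


def require(fields, *keys):
    """Keep only the keys the script uses; fail loudly if one disappeared."""
    missing = [k for k in keys if k not in fields]
    if missing:
        raise KeyError(f"expected keys missing from output: {missing}")
    return {k: fields[k] for k in keys}


if __name__ == "__main__":
    print(require(parse_key_values(SAMPLE), "Gossip active", "Load"))
```

The extra "Hypothetical New Line" is parsed but ignored, which matches the "additional lines are acceptable" half of the rule; a renamed "Load" line would raise KeyError instead of silently producing wrong data.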

I would be inclined to accept JSON or YAML output from nodetool for 
Ruby/Python/etc. scripts, but for bash, the human-readable output is more 
workable.

On Mon, Jul 10, 2023 at 4:35 AM Miklosovic, Stefan 
<stefan.mikloso...@netapp.com> wrote:
Hi Cassandra users,

I am a Cassandra developer, and we in the Cassandra project would love to know 
whether there are users out there for whom the output of tools like nodetool 
is important because they parse it.

We are considering the consequences of changing nodetool's output for various 
commands. We are not completely sure whether users parse this output in their 
custom scripts, in which case changing the output would break those scripts.

Additionally, how big of a problem would an output change be if it happened 
only between major Cassandra versions, e.g. 4.0 -> 5.0 or 5.0 -> 6.0? In other 
words, there would be a guarantee that breaking changes never occur in minor 
versions, only in majors.

Is anybody out there relying on the output of particular nodetool commands (or 
any command in tools/bin) in production? How often do you rely on parsing 
nodetool's output, and how much work would it be for you to adapt to minor 
changes? For example, the tool prints "someStatistic: 10" and we would change 
it to "Some Statistic: 10".

Would you be OK with the output changing if you had a way to get, e.g., JSON 
or YAML output instead via a flag on the nodetool command, so that the default 
output would be irrelevant to you?

We would greatly appreciate more feedback on this. I understand that not all 
questions are relevant to everyone.

Even if you are not relying on the output of the tooling in custom scripts 
where you parse it, please tell us so. We are progressively trying to provide 
a CQL way to query the internal state of Cassandra, for example via virtual 
tables.

Regards

Stefan Miklosovic


--
Andrew Weaver


Survey about the parsing of the tooling's output

2023-07-10 Thread Miklosovic, Stefan
Hi Cassandra users,

I am a Cassandra developer, and we in the Cassandra project would love to know 
whether there are users out there for whom the output of tools like nodetool 
is important because they parse it.

We are considering the consequences of changing nodetool's output for various 
commands. We are not completely sure whether users parse this output in their 
custom scripts, in which case changing the output would break those scripts.

Additionally, how big of a problem would an output change be if it happened 
only between major Cassandra versions, e.g. 4.0 -> 5.0 or 5.0 -> 6.0? In other 
words, there would be a guarantee that breaking changes never occur in minor 
versions, only in majors.

Is anybody out there relying on the output of particular nodetool commands (or 
any command in tools/bin) in production? How often do you rely on parsing 
nodetool's output, and how much work would it be for you to adapt to minor 
changes? For example, the tool prints "someStatistic: 10" and we would change 
it to "Some Statistic: 10".

Would you be OK with the output changing if you had a way to get, e.g., JSON 
or YAML output instead via a flag on the nodetool command, so that the default 
output would be irrelevant to you?

We would greatly appreciate more feedback on this. I understand that not all 
questions are relevant to everyone.

Even if you are not relying on the output of the tooling in custom scripts 
where you parse it, please tell us so. We are progressively trying to provide 
a CQL way to query the internal state of Cassandra, for example via virtual 
tables.
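For a purely cosmetic rename like the survey's example ("someStatistic: 10" -> "Some Statistic: 10"), a parsing script could compare labels in a normalized form instead of verbatim. A minimal Python sketch; normalize_key and lookup are hypothetical helpers for a user's own script, not part of nodetool.

```python
import re

# Sketch: compare labels by a normalized form so that a purely cosmetic
# rename such as "someStatistic" -> "Some Statistic" does not break lookups.
# Hypothetical helper for a user's own script, not a nodetool feature.


def normalize_key(label):
    """Drop whitespace, underscores and hyphens, then lower-case."""
    return re.sub(r"[\s_-]+", "", label).lower()


def lookup(fields, wanted):
    """Return the value whose label matches `wanted` after normalization."""
    target = normalize_key(wanted)
    for label, value in fields.items():
        if normalize_key(label) == target:
            return value
    raise KeyError(wanted)


if __name__ == "__main__":
    old_style = {"someStatistic": "10"}
    new_style = {"Some Statistic": "10"}
    print(lookup(old_style, "someStatistic"), lookup(new_style, "someStatistic"))
```

This only absorbs renames of labels; a change to the value format (for example, different units) would still break such a script.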

Regards

Stefan Miklosovic

Re: unsubscribe

2023-06-29 Thread Miklosovic, Stefan
Hi,

you need to send an email to the following address in order to unsubscribe:

user-unsubscr...@cassandra.apache.org


From: zbg...@gmail.com 
Sent: Thursday, June 29, 2023 12:56
To: user
Subject: unsubscribe


Is anybody out there using CloudstackSnitch?

2023-06-29 Thread Miklosovic, Stefan
Hi users,

I would like to know whether anybody has recently used, is currently using, or 
has ever used CloudstackSnitch (1), which is supposed to be a snitch for 
Cassandra deployed in Apache CloudStack (2).

Regards

(1) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/CloudstackSnitch.java
(2) https://cloudstack.apache.org/


Re: Is there a way to find out if a server is part of application connection string?

2023-06-07 Thread Miklosovic, Stefan
Hi Surbhi,

maybe the system_views.clients virtual table would help, if your cluster is on 
version 4.0+? That table contains all clients connected to that particular 
Cassandra node, with "address" and "hostname" columns as well as a "username" 
column.

I am not sure there is any equivalent of this in 3.11.
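A minimal Python sketch of that suggestion, assuming a 4.0+ cluster and the DataStax Python driver (cassandra-driver). fetch_clients and summarize_clients are hypothetical names; the query uses only the "address", "hostname" and "username" columns mentioned above, and the session is pinned to one node because a virtual table returns the local data of the node that coordinates the query.

```python
# Sketch: list the clients connected to one Cassandra 4.0+ node via the
# system_views.clients virtual table. Assumes the DataStax Python driver
# (pip install cassandra-driver) for the live query; only the "address",
# "hostname" and "username" columns are used.
from collections import Counter


def summarize_clients(rows):
    """Count connections per (address, hostname, username)."""
    return Counter((row.address, row.hostname, row.username) for row in rows)


def fetch_clients(node_ip):
    """Query system_views.clients on node_ip only (virtual tables are node-local)."""
    # Imported here so summarize_clients() can be used without the driver.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    # Pin every request to node_ip so that node coordinates the query.
    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([node_ip]))
    cluster = Cluster([node_ip], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    try:
        session = cluster.connect()
        return list(session.execute(
            "SELECT address, hostname, username FROM system_views.clients"))
    finally:
        cluster.shutdown()
```

For example, summarize_clients(fetch_clients("10.0.0.1")), with 10.0.0.1 standing in for the node to be decommissioned, counts connections per client, including the script's own. If authentication is enabled, the Cluster also needs an auth_provider such as cassandra.auth.PlainTextAuthProvider. Drivers usually connect to the nodes they discover, not only to their contact points, so this shows who is connected to the node rather than exactly what is in each application's connection string.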

Regards


From: Surbhi Gupta 
Sent: Wednesday, June 7, 2023 2:10
To: user@cassandra.apache.org
Subject: Is there a way to find out if a server is part of application 
connection string?

Hi,

We have a cluster with many applications connecting to it.
We need to decommission a few of the servers from the cluster.
Without asking the application teams, is there any way to know which IPs
are in the applications' connection strings?
Does Cassandra log this information somewhere (system or debug log)?

The application teams might be using IPs other than the seed nodes.

Thanks in advance.
Surbhi


[RELEASE] Apache Cassandra 3.0.29 released

2023-05-15 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 3.0.29.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/30x/ .
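As an illustration of the repository move, the Python sketch below builds the new settings for a given release series. The Debian and RedHat base URLs are the ones in this announcement; the "<series>x main" apt suite name and the yum stanza (gpgcheck, gpgkey) are assumptions based on the layout of the official installation docs and should be checked there. The helper names are hypothetical.

```python
# Sketch: repository settings for a Cassandra release series after the move.
# Base URLs are from the announcement; the "<tag> main" apt suite and the yum
# stanza follow the official installation docs' layout (verify there).


def series_tag(series):
    """'3.0' -> '30x', '3.11' -> '311x', '4.1' -> '41x'."""
    return series.replace(".", "") + "x"


def debian_sources_line(series):
    """Line for /etc/apt/sources.list.d/cassandra.sources.list."""
    return f"deb https://debian.cassandra.apache.org {series_tag(series)} main"


def redhat_repo_file(series):
    """Contents for /etc/yum.repos.d/cassandra.repo."""
    return "\n".join([
        "[cassandra]",
        "name=Apache Cassandra",
        f"baseurl=https://redhat.cassandra.apache.org/{series_tag(series)}/",
        "gpgcheck=1",
        "repo_gpgcheck=1",
        "gpgkey=https://downloads.apache.org/cassandra/KEYS",
        "",
    ])


if __name__ == "__main__":
    print(debian_sources_line("3.0"))
    print(redhat_repo_file("3.0"), end="")
```

The series tags it produces (30x, 311x, 40x, 41x) match the RedHat URLs given in the 3.0, 3.11, 4.0 and 4.1 announcements.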

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-3.0.29/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-3.0.29/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RELEASE] Apache Cassandra 3.11.15 released

2023-05-05 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 3.11.15.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/311x/ .

Enjoy!

[1]: CHANGES.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-3.11.15
[2]: NEWS.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.11.15
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RELEASE] Apache Cassandra 4.0.9 released

2023-04-15 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.0.9.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/40x/ .

Enjoy!

[1]: CHANGES.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.9
[2]: NEWS.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.9
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RELEASE] Apache Cassandra 4.1.1 released

2023-03-21 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.1.1.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.1 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/41x/ .

Enjoy!

[1]: CHANGES.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.1.1
[2]: NEWS.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.1.1
[3]: https://issues.apache.org/jira/browse/CASSANDRA

Re: Does Coordinator select fastest node for Digest request In Read Path

2023-03-09 Thread Miklosovic, Stefan
Hi Ranju,

I see this in the code: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L2096
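To frame the question: with RF=3, LOCAL_QUORUM waits for floor(3/2) + 1 = 2 replicas. The Python sketch below is a simplified model, not Cassandra's code, assuming the coordinator contacts the best-ranked replicas the consistency level needs, sending the full data request to the first and digest requests to the rest. Under that assumption the digest replica is the next one in the snitch's ranking rather than an arbitrary replica; the linked StorageProxy code remains the authority. plan_read and quorum are hypothetical names.

```python
# Simplified model of how a coordinator could split a read between one full
# data request and digest requests. Not Cassandra's code: assumes all
# replicas are full replicas, ignores speculative retry and the dynamic
# snitch's badness threshold, and represents the snitch ranking as a score
# (lower = faster).


def quorum(rf):
    """Replicas a QUORUM / LOCAL_QUORUM read waits for: floor(rf / 2) + 1."""
    return rf // 2 + 1


def plan_read(scores, block_for):
    """Return (data_replica, digest_replicas) for one read.

    scores maps replica name -> snitch score; the block_for best-ranked
    replicas are contacted, the first for data and the rest for digests.
    """
    ranked = sorted(scores, key=scores.get)
    contacts = ranked[:block_for]
    return contacts[0], contacts[1:]


if __name__ == "__main__":
    # RF=3, LOCAL_QUORUM -> 2 replicas contacted.
    print(plan_read({"A": 0.9, "B": 0.1, "C": 0.5}, quorum(3)))  # ('B', ['C'])
```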


From: ranju goel 
Sent: Thursday, March 9, 2023 13:20
To: user@cassandra.apache.org
Subject: Does Coordinator select fastest node for Digest request In Read Path

Hi Everyone,

If I have LOCAL_QUORUM CL and RF=3, on the read path the coordinator selects 
the fastest replica using the dynamic snitch to read the full data. Does it 
also use the dynamic snitch (i.e. the fastest replicas) for reading the digest, 
or does it choose any of the replicas for the digest?

Regards
Ranju


[RELEASE] Apache Cassandra 4.0.8 released

2023-02-14 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.0.8.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/40x/ .

Enjoy!

[1]: CHANGES.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.8
[2]: NEWS.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.8
[3]: https://issues.apache.org/jira/browse/CASSANDRA