The DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls which
Cassandra coordinator node the client sends queries to, not the nodes it
connects to, nor the nodes that perform the actual read.
A client sends a CQL read query to a coordinator node. The coordinator
parses the CQL query and sends READ requests to replica nodes in the
cluster; how many replicas it contacts, and in which DCs, depends on the
consistency level.
Have you checked the consistency level of the session (and of the query,
if set per query)? Is it prefixed with "LOCAL_"? If not, the coordinator
can send READ requests to nodes in non-local DCs.
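To make the consistency-level point concrete, here is a minimal sketch in plain Python (no driver involved) that counts how many replica responses a read needs and, at minimum, how many of them must come from outside the local DC, using the keyspace from the quoted message (2 replicas in each of three DCs). `REPLICATION` and `replicas_needed` are hypothetical names for illustration; the only rule assumed is that a quorum is a strict majority (`n // 2 + 1`) of the replicas the level counts: all DCs for QUORUM, the local DC only for LOCAL_QUORUM.

```python
from __future__ import annotations

# Replication map of the keyspace from the question (NetworkTopologyStrategy).
REPLICATION = {"ap-southeast-1": 2, "eu-west-1": 2, "us-east-1": 2}


def quorum(n: int) -> int:
    """A strict majority of n replicas."""
    return n // 2 + 1


def replicas_needed(cl: str, replication: dict[str, int], local_dc: str) -> tuple[int, int]:
    """Return (responses needed, minimum responses that must come from remote DCs).

    Hypothetical helper: it models the counting rules only, not how the
    coordinator actually ranks and picks replicas.
    """
    total_rf = sum(replication.values())
    local_rf = replication[local_dc]
    if cl == "LOCAL_ONE":
        return 1, 0
    if cl == "LOCAL_QUORUM":
        return quorum(local_rf), 0
    needed = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(total_rf),
        "ALL": total_rf,
    }[cl]
    # Non-LOCAL levels may use replicas in any DC; whatever the local DC
    # cannot supply has to come from remote DCs.
    return needed, max(0, needed - local_rf)


if __name__ == "__main__":
    for cl in ("ONE", "TWO", "THREE", "QUORUM", "ALL", "LOCAL_ONE", "LOCAL_QUORUM"):
        needed, remote = replicas_needed(cl, REPLICATION, "eu-west-1")
        print(f"{cl:<13} needs {needed}, at least {remote} from remote DCs")
```

With this keyspace, QUORUM needs 4 of the 6 replicas while each DC only holds 2, so every QUORUM read must cross DCs. A zero minimum for ONE or TWO only means the read *can* be served locally; only the LOCAL_* levels keep the coordinator inside the local DC.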
On 05/08/2022 19:40, Raphael Mazelier wrote:
Hi Cassandra Users,
I'm relatively new to Cassandra, and first I have to say I'm really
impressed by the technology.
Good design, and a lot to learn about the internals (the O'Reilly book
helped a lot, as did The Last Pickle blog posts).
I have a multi-datacenter C* cluster (US, Europe, Singapore) with
eight nodes in each DC (two seeds per region), two racks in Europe and
Singapore, and three in the US. Everything is deployed in AWS.
We have a keyspace configured with NetworkTopologyStrategy and two
replicas in every region, like this:
{'class': 'NetworkTopologyStrategy', 'ap-southeast-1': '2', 'eu-west-1': '2', 'us-east-1': '2'}
While investigating a performance issue, I noticed strange things in my
experiments.
What we expect is very low latency, 3-5 ms max, for this specific SELECT
query, so we want every read to be local to each datacenter.
We configure DCAwareRoundRobinPolicy(local_dc=DC) in Python, and the
same in Go: gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC")).
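As the reply above notes, these policies only order the coordinators the client tries. The sketch below is a hypothetical, heavily simplified model of a TokenAware(DCAwareRoundRobin) query plan, not the Python driver's or gocql's actual implementation: local replicas of the partition come first, then the other local hosts in rotating order. The `include_remote` flag is an assumption standing in for the fact that drivers differ in whether remote-DC hosts are ever used as a fallback.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    address: str
    dc: str


class ToyTokenAwareDCAwarePlan:
    """Hypothetical, simplified model of a TokenAware(DCAwareRoundRobin) policy.

    It only decides the order in which the client tries coordinators; it
    has no say in which replicas the coordinator then reads from.
    """

    def __init__(self, hosts: list[Host], local_dc: str, include_remote: bool = False):
        self.local_dc = local_dc
        self.local = [h for h in hosts if h.dc == local_dc]
        self.remote = [h for h in hosts if h.dc != local_dc]
        # Whether remote-DC hosts are appended as a last resort (assumption).
        self.include_remote = include_remote
        self._counter = itertools.count()

    def query_plan(self, replicas: list[Host]) -> list[Host]:
        """Coordinators to try, in order, for a partition owned by `replicas`."""
        start = next(self._counter)
        # Token awareness: local replicas of the partition go first.
        plan = [h for h in replicas if h.dc == self.local_dc]
        # DC-aware round robin over the rest of the local DC.
        n = len(self.local)
        for i in range(n):
            h = self.local[(start + i) % n]
            if h not in plan:
                plan.append(h)
        if self.include_remote:
            plan.extend([h for h in self.remote if h not in plan])
        return plan


if __name__ == "__main__":
    eu = [Host(f"10.0.0.{i}", "eu-west-1") for i in (1, 2, 3)]
    us = [Host("10.1.0.1", "us-east-1")]
    policy = ToyTokenAwareDCAwarePlan(eu + us, "eu-west-1")
    owners = [eu[1], us[0]]  # hypothetical replicas of one partition
    print([h.address for h in policy.query_plan(owners)])
    # ['10.0.0.2', '10.0.0.1', '10.0.0.3']
```

Even with a perfectly local plan, the chosen coordinator still applies the consistency level when it forwards READ requests, which is why the policy alone cannot guarantee DC-local reads.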
Testing with two short programs (I can provide them) in Go and
Python, I noticed very strange results. Basically I run the same query
over and over with a very limited set of ids.
The first results were surprising because the very first query always
took more than 250 ms, and afterwards, by stressing C* (playing with the
sleep between queries), I could achieve a good ratio of queries at
3-4 ms (what I expected).
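When comparing runs like these, it helps to report the first call separately from steady-state percentiles, since one slow first query can hide in an average. This is a minimal stdlib-only sketch; `measure`, `percentile` and the `run_query` callable are hypothetical names, and with the Python driver `run_query` could be something like `lambda: session.execute(stmt, [some_id])`, where `session` and `stmt` are assumed to exist in your program.

```python
import math
import time
from typing import Callable


def percentile(sorted_ms: list, p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_ms:
        raise ValueError("no samples")
    k = max(0, math.ceil(p * len(sorted_ms) / 100) - 1)
    return sorted_ms[k]


def measure(run_query: Callable[[], object], n: int = 200, sleep_s: float = 0.0) -> dict:
    """Time n calls of run_query, reporting the first call separately.

    run_query is a hypothetical zero-argument callable wrapping one read.
    """
    if n < 2:
        raise ValueError("need at least two calls")
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - t0) * 1000.0)
        if sleep_s:
            # Optional pause between queries, as in the experiment above.
            time.sleep(sleep_s)
    first, rest = samples[0], sorted(samples[1:])
    return {
        "first_ms": first,
        "p50_ms": percentile(rest, 50),
        "p99_ms": percentile(rest, 99),
        "max_ms": rest[-1],
    }
```

Running `measure` with several `sleep_s` values makes it easier to see whether the slow reads track idle time between queries or show up at random.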
My guess was that the slow queries were somehow not executed locally (or
at least involved multi-datacenter requests) and the fast ones were.
Enabling tracing in my programs (like enabling TRACING in cqlsh) more or
less confirmed my suspicion.
(I will attach the traces.)
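One way to check a trace for cross-DC activity is to map each event's source address to its datacenter and count the events that did not happen in the local DC. The sketch below assumes events arrive as `(source_address, description)` pairs; with the DataStax Python driver, the trace from `ResultSet.get_query_trace()` has `events` carrying `source` and `description`, but verify that against your driver version. `remote_dcs_in_trace` and the `dc_of` mapping are hypothetical; `dc_of` could be built from the `data_center` column of `system.local` and `system.peers`.

```python
from collections import Counter


def remote_dcs_in_trace(events, dc_of, local_dc):
    """Count trace events per non-local DC.

    events: iterable of (source_address, description) pairs.
    dc_of:  mapping from node address to datacenter name (hypothetical).
    Addresses missing from dc_of are reported under "unknown".
    """
    counts = Counter()
    for source, _description in events:
        dc = dc_of.get(source, "unknown")
        if dc != local_dc:
            counts[dc] += 1
    return counts
```

An empty result for the fast queries and a non-empty one for the slow queries would support the guess that the slow reads leave the local DC.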
My question is: why does C* sometimes not read locally? How can we
disable that? What are the criteria for this?
(By the way, I'm really not a fan of this multi-region design because of
these very specific kinds of issues...)
Also, a side question: why is C* so slow to connect? It's as if it's
trying to reach every node in each DC (we only provide the local seeds,
however). Sometimes it takes more than 20 s...
Any help appreciated.
Best,
--
Raphael Mazelier