Re: Using keyspaces for virtual clusters

Eric Stevens Wed, 21 Sep 2016 07:57:48 -0700

Using keyspaces to support multi-tenancy is very close to an anti-pattern
unless there is a finite and reasonable upper bound on how many tenants
you'll support overall. Large numbers of tables come with cluster overhead
and operational complexity that you will come to regret eventually.
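
To put rough numbers on that, here is a minimal sketch of how the schema
grows with tenant count. The function name, the 20 tables per tenant, and
the 1 MiB of heap per table are all assumptions for illustration (the heap
figure is only a commonly quoted rule of thumb), so measure on your own
cluster before relying on it.

```python
def schema_footprint(tenants, tables_per_tenant, heap_mib_per_table=1.0):
    """Estimate schema size for a keyspace-per-tenant layout.

    heap_mib_per_table is an ASSUMED rule-of-thumb value, not a measured one.
    Every node in the cluster holds the full schema, so the heap estimate
    applies per node, not per cluster.
    """
    total_tables = tenants * tables_per_tenant
    return {
        "keyspaces": tenants,
        "tables": total_tables,
        "approx_heap_mib_per_node": total_tables * heap_mib_per_table,
    }


if __name__ == "__main__":
    # 200 tenants (the figure from this thread), 20 tables each (hypothetical).
    print(schema_footprint(200, 20))
```

The point is only that the table count multiplies with the tenant count,
which is why the upper bound on tenants matters.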


> and because I don't like having multiple cql clients/connections on my
> app-code

You should note that although Cassandra drivers present a single logical
connection per cluster, under the hood they maintain connection pools per C*
host. You might be able to do a slightly better job of managing those pools
with a single cluster and logical connection, but I doubt the difference
will be very significant. It would depend on what options you have available
in your driver of choice.

Application logic complexity would not be greatly improved, because you
still need to switch by tenant. Whether you switch on keyspace name or
connection name doesn't seem like it would make much difference.
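
As a minimal sketch of what switching by keyspace name looks like in app
code: the tenant names, the TENANT_KEYSPACES mapping, and the helper names
below are all hypothetical, and the identifier pattern is a deliberately
conservative assumption (lowercase, at most 48 characters). The sketch only
builds CQL strings; it does not talk to a driver.

```python
import re

# Hypothetical tenant -> keyspace mapping; in practice this would come from
# configuration or a lookup table.
TENANT_KEYSPACES = {
    "acme": "tenant_acme",
    "globex": "tenant_globex",
}

# Conservative identifier check (an assumption, stricter than CQL requires):
# a lowercase letter followed by up to 47 lowercase letters, digits, or "_".
_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]{0,47}$")


def qualified_table(tenant, table):
    """Return "keyspace.table" for a tenant, refusing unexpected names."""
    keyspace = TENANT_KEYSPACES[tenant]  # raises KeyError for unknown tenants
    for identifier in (keyspace, table):
        if not _IDENTIFIER.match(identifier):
            raise ValueError("unsafe identifier: %r" % identifier)
    return "%s.%s" % (keyspace, table)


def select_user_cql(tenant):
    """Build a tenant-specific statement with a bind marker for the id."""
    return "SELECT * FROM %s WHERE id = ?" % qualified_table(tenant, "users")


if __name__ == "__main__":
    print(select_user_cql("acme"))
```

With one cluster per tenant, the same lookup would return a session or
contact-point list instead of a keyspace name, so the branching in the
application looks about the same either way.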

As Alain pointed out, upgrades will be painful and maybe even dangerous with
a monolithic cluster.

On Wed, Sep 21, 2016, 3:50 AM Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Dorian,
>
>> I'm thinking of creating many keyspaces and storing them into many virtual
>> datacenters (the servers will be in 1 logical datacenter, but separated by
>> keyspaces).
>>
>> Does that make sense (so growing up to 200 dcs of 3 servers each in best
>> case scenario)?
>
>
> There are 3 main things you can do here:
>
> 1 - Use 1 DC, with 200 keyspaces using that DC
> 2 - Use 200 DCs, with 1 keyspace per DC
> 3 - Use 200 clusters, each with 1 DC and 1 keyspace per client (or many
> keyspaces, but all related to 1 client)
>
> I am not sure whether you want to go with 1 or 2. My understanding is that
> you meant to write "the servers will be in 1 physical datacenter" (rather
> than "logical") and that you are willing to do as described in 2.
>
> This looks to be a good idea to me, but for other reasons (clients /
> workload isolation, limited risk, independent growth for each client,
> visibility on cost per client, ...)
>
>> Does that make sense (so growing up to 200 dcs of 3 servers each in best
>> case scenario)?
>
>
> Yet I would not go with distinct DCs, but rather distinct C* clusters
> (different cluster names, seeds, etc).
>
> I see no good reason to use virtual clusters instead of distinct clusters.
> Keeping each keyspace in a distinct, isolated datacenter would work. The
> datacenters would be quite isolated, since no information or load would be
> shared except for gossip.
>
> Yet there are some issues with big clusters due to gossip, and I have had
> issues in the past due to gossip that affected all the DCs within a cluster.
> In this case you would face a major issue that you could have avoided or
> limited. Plus, when upgrading Cassandra, you would have to upgrade 600 nodes
> quite quickly, whereas distinct clusters can be upgraded independently. I
> would then go with either option 1 or 3.
>
>> and because I don't like having multiple cql clients/connections on my
>> app-code
>
>
> In this case, wouldn't it make sense for you to have per-customer app-code,
> or just a conditional connection creation depending on the client?
>
> I am just trying to give you some ideas.
>
>> Are the keyspaces+tables of dc1 stored in a cassandra node of dc2 ?(since
>> there is overhead with each keyspace + table which would probably break
>> this design)
>> Or is it just a simple map dcx--->ip1,ip2,ip3 ?
>
>
> I just checked it. All the nodes would know about every keyspace and
> table if they are in the same Cassandra cluster (in my test version, C* 3.7,
> this is stored under system_schema.tables, which uses the local strategy
> with no replication). To avoid that, using distinct clusters is the way to go.
>
> https://gist.github.com/arodrime/2f4fb2133c5b242b9500860ac8c6d89c
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-09-20 22:49 GMT+02:00 Dorian Hoxha <dorian.ho...@gmail.com>:
>
>> Hi,
>>
>> I need to separate clients data into multiple clusters and because I
>> don't like having multiple cql clients/connections on my app-code, I'm
>> thinking of creating many keyspaces and storing them into many virtual
>> datacenters (the servers will be in 1 logical datacenter, but separated by
>> keyspaces).
>>
>> Does that make sense (so growing up to 200 dcs of 3 servers each in best
>> case scenario)?
>>
>> Does the cql-engine make a new connection (like "use keyspace") when
>> specifying "keyspace.table" on the query ?
>>
>> Are the keyspaces+tables of dc1 stored in a cassandra node of dc2 ?(since
>> there is overhead with each keyspace + table which would probably break
>> this design)
>> Or is it just a simple map dcx--->ip1,ip2,ip3 ?
>>
>> Thank you!
>>
>
>
