@Alain I wanted to do 2, but it looks like that won't be possible because of too much overhead.
@Eric Yeah, that's what I was afraid of. Though I know that the client connects to every server, I just didn't want to write the extra code.

On Wed, Sep 21, 2016 at 4:56 PM, Eric Stevens <migh...@gmail.com> wrote:

> Using keyspaces to support multi-tenancy is very close to an anti-pattern
> unless there is a finite and reasonable upper bound to how many tenants
> you'll support overall. Large numbers of tables come with cluster overhead
> and operational complexity that you will come to regret eventually.
>
> > and because I don't like having multiple cql clients/connections on my
> > app-code
>
> You should note that although Cassandra drivers present a single logical
> connection per cluster, under the hood they maintain connection pools per
> C* host. You might be able to do a slightly better job of managing those
> pools as a single cluster and logical connection, but I doubt it will be
> very significant. It would depend on what options you have available in
> your driver of choice.
>
> Application logic complexity would not be greatly improved, because you
> still need to switch by tenant; whether it's a keyspace name or a
> connection name doesn't seem like it would make much difference.
>
> As Alain pointed out, upgrades will be painful and maybe even dangerous as
> a monolithic cluster.
>
> On Wed, Sep 21, 2016, 3:50 AM Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
>> Hi Dorian,
>>
>>> I'm thinking of creating many keyspaces and storing them into many
>>> virtual datacenters (the servers will be in 1 logical datacenter, but
>>> separated by keyspaces).
>>>
>>> Does that make sense (so growing up to 200 dcs of 3 servers each in best
>>> case scenario)?
>>
>> There are three main things you can do here:
>>
>> 1 - Use 1 DC, 200 keyspaces using the DC.
>> 2 - Use 200 DCs, 1 keyspace per DC.
>> 3 - Use 200 clusters, 1 DC each, 1 keyspace per client (or many
>> keyspaces, but each related to 1 client).
>>
>> I am not sure whether you want to go with 1 or 2; my understanding is
>> that you meant to write "the servers will be in 1 -*logical- **physical*
>> datacenter" and that you are willing to do as described in 2.
>>
>> This looks like a good idea to me, but for other reasons (client /
>> workload isolation, limited risk, independent growth for each client,
>> visibility on cost per client, ...).
>>
>>> Does that make sense (so growing up to 200 dcs of 3 servers each in best
>>> case scenario)?
>>
>> Yet I would not go with distinct DCs, but rather with distinct C*
>> clusters (different cluster names, seeds, etc.).
>>
>> I see no good reason to use virtual clusters instead of distinct
>> clusters. Keeping each keyspace in a distinct, isolated datacenter would
>> work; the datacenters would be quite isolated, since no information or
>> load would be shared except through gossip.
>>
>> Yet there are some issues with big clusters due to gossip, and I have had
>> gossip issues in the past that affected all the DCs within a cluster. In
>> that case you would face a major issue that you could have avoided or
>> limited. Plus, when upgrading Cassandra you would have to upgrade 600
>> nodes quite quickly, whereas distinct clusters can be upgraded
>> independently. I would then go with either option 1 or 3.
>>
>>> and because I don't like having multiple cql clients/connections on my
>>> app-code
>>
>> In this case, wouldn't it make sense for you to have per-customer app
>> code, or just conditional connection creation depending on the client?
>>
>> I am just trying to give you some ideas.
>>
>>> Are the keyspaces+tables of dc1 stored in a cassandra node of dc2? (since
>>> there is overhead with each keyspace + table, which would probably break
>>> this design)
>>> Or is it just a simple map dcx--->ip1,ip2,ip3?
>>
>> I just checked it.
All the nodes know about every keyspace and
>> table when they are part of the same Cassandra cluster (in my testing
>> version, C* 3.7, this is stored under system_schema.tables - local
>> strategy, no replication). To avoid that, using distinct clusters is the
>> way to go.
>>
>> https://gist.github.com/arodrime/2f4fb2133c5b242b9500860ac8c6d89c
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-09-20 22:49 GMT+02:00 Dorian Hoxha <dorian.ho...@gmail.com>:
>>
>>> Hi,
>>>
>>> I need to separate clients' data into multiple clusters, and because I
>>> don't like having multiple cql clients/connections in my app code, I'm
>>> thinking of creating many keyspaces and storing them in many virtual
>>> datacenters (the servers will be in 1 logical datacenter, but separated
>>> by keyspaces).
>>>
>>> Does that make sense (so growing up to 200 dcs of 3 servers each in the
>>> best-case scenario)?
>>>
>>> Does the cql engine make a new connection (like "use keyspace") when
>>> specifying "keyspace.table" in the query?
>>>
>>> Are the keyspaces+tables of dc1 stored in a cassandra node of dc2? (since
>>> there is overhead with each keyspace + table, which would probably break
>>> this design)
>>> Or is it just a simple map dcx--->ip1,ip2,ip3?
>>>
>>> Thank you!
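[Editor's note] The single-session approach raised in the thread - one logical cluster connection serving many tenant keyspaces by using fully-qualified "keyspace.table" names instead of per-tenant "USE keyspace" statements - can be sketched as below. This is a minimal illustration, not code from the thread: the keyspace-naming convention, the "events" table, and the helper functions are all hypothetical.

```python
# Sketch: one logical connection, many tenant keyspaces.
# Rather than mutating shared session state with "USE <keyspace>" per
# tenant, each query names its table with a fully qualified
# "keyspace.table" identifier, so a single session serves all tenants.

def tenant_keyspace(tenant_id: str) -> str:
    """Map a tenant id to its keyspace name (hypothetical convention).

    Keeps only [a-z0-9_] so the result stays a valid unquoted CQL
    identifier; anything else is replaced with an underscore.
    """
    safe = "".join(c if c.isalnum() else "_" for c in tenant_id.lower())
    return f"tenant_{safe}"


def select_events_cql(tenant_id: str) -> str:
    """Build a SELECT against the tenant's keyspace via a qualified table name."""
    return f"SELECT * FROM {tenant_keyspace(tenant_id)}.events WHERE id = ?"


# With the DataStax Python driver this would be used roughly as
# (not run here, since it needs a live cluster):
#
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect()   # one session for all tenants
#   stmt = session.prepare(select_events_cql("acme"))
#   rows = session.execute(stmt, [some_id])

print(select_events_cql("acme"))
print(select_events_cql("acme-eu"))  # "acme-eu" maps to keyspace tenant_acme_eu
```

Note the trade-off the thread settles on still applies: this only avoids per-tenant connections in application code; the per-keyspace/per-table overhead on the cluster itself remains, which is why distinct clusters were recommended for large tenant counts.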