Some thoughts I've had around multi-tenancy from a while back, outside of the 
CF/keyspace limit issues… not sure how relevant they are nowadays. 

Assuming that you have no control over your tenants, that they have direct 
thrift/native access, and that they may be malicious (whether intentionally or not):

- Resource limits on the read / write path. For example, 
https://issues.apache.org/jira/browse/CASSANDRA-6117 ensures Cassandra won't 
fall over if a query reads a whole bunch of tombstones. I'm not sure how many 
similar issues exist where a single read could consume a disproportionate 
amount of resources. 
- Operation limits per node, currently configurable via cassandra.yaml… but you 
might want a more flexible, per-tenant solution (see the rough sketch after 
this list). I'm not sure how this applies to large batches etc. 
- Sandboxing tenant triggers.
- Resource limits on expensive operations like CAS and CL=ALL.
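
On the operation-limits point, here is a rough sketch of what a more flexible, 
per-tenant throttle might look like. This is purely illustrative: the 
TenantRateLimiter class and the idea of keying limits on keyspace (one keyspace 
per tenant, as discussed below) are my own assumptions, not anything that 
exists in Cassandra today. It leans on the Guava RateLimiter that Cassandra 
already ships:

    import java.util.concurrent.ConcurrentHashMap;
    import com.google.common.util.concurrent.RateLimiter;

    // Hypothetical per-tenant throttle; nothing like this exists in Cassandra yet.
    public class TenantRateLimiter
    {
        private final ConcurrentHashMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();
        private final double opsPerSecond;

        public TenantRateLimiter(double opsPerSecond)
        {
            this.opsPerSecond = opsPerSecond;
        }

        // Call before doing a read/write on behalf of a tenant; blocks until a
        // permit is available. Keying on keyspace assumes one keyspace per tenant.
        public void acquire(String keyspace)
        {
            limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(opsPerSecond))
                    .acquire();
        }
    }

Something like this would presumably need to be wired in ahead of the 
coordinator read/write paths, and large batches would need to acquire more 
than one permit.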

We would be happy to work on some tickets around this, especially as our 
in-progress multi-tenant solution currently just uses containers, namespaces, 
et al. for isolation. 

Cheers

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 31/08/2014, at 10:06 AM, Jay Patel <pateljay3...@gmail.com> wrote:

> Hi Folks,
> 
> Ideally, it would be awesome if multitenancy were a first-class citizen in
> Cassandra. But as of today, the easiest way to get multitenancy (on-disk
> data isolation, per-tenant recovery & backup, per-tenant replication
> strategy) is by having one keyspace per tenant. However, it’s not
> recommended to go beyond 300 to 500 tables in one cluster today.
> 
> By this thread, I would like to find out the current blocking issues for
> supporting a high number of tables (10K/50K/?), and contribute the fixes.
> I'm also open to any ideas for making the keyspace itself tenant-aware and
> supporting multitenancy out of the box, but having a replication strategy
> (NTS) per tenant & on-disk data isolation are the minimal features to have.
> I'm not sure, but supporting a high number of tables in a cluster may lead
> us to support multitenancy out of the box in the future..
> 
> As per my quick discussion with Jonathan & a few other folks, I think we
> already know about the issues below:
> 
> - 1 MB of heap per memtable
> - Creating CFs can take a long time (fixed by CASSANDRA-6977)
> - Multiple flushes turn writes from sequential into random I/O (should we
> worry if we use SSDs?)
> - Unknowns!
> 
> Regarding '1 MB per memtable', CASSANDRA-5935 adds an option to allow
> disabling slab allocation so more CFs can be packed in, but at the cost of
> GC pain. It seems like Cassandra 2.1's off-heap memtables will be a better
> option. However, it looks like they also use region-based memory allocation
> to avoid fragmentation. Does this mean no GC pain, but still a need for a
> lot of RAM (for 50K tables, you end up with 50GB)?
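> 
> (Back of the envelope, assuming each active memtable pins at least one 1 MB
> region, as in the allocator below: 50,000 tables x 1 MB region is roughly
> 50GB of RAM before any real data is written.)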
> 
> (Please correct me if this is not the right file to be looking at.)
> 
> public class NativeAllocator extends MemtableAllocator
> {
>     private static final Logger logger = LoggerFactory.getLogger(NativeAllocator.class);
> 
>     private final static int REGION_SIZE = 1024 * 1024;
>     private final static int MAX_CLONED_SIZE = 128 * 1024; // bigger than this don't go in the region
> 
> I would like to know about any other known issues that I haven't listed
> here, and/or any recommendations for multitenancy. Also, any thoughts on
> supporting an efficient off-heap allocator option for a high number of tables?
> 
> BTW, having 10K tables brings up many other issues around management,
> tooling, etc., but I'm less worried about those at this point.
> 
> Thanks,
> Jay
