Some thoughts I've had on multi-tenancy from a while back, outside of the CF/keyspace limit issues… not sure how relevant they are nowadays.
Assuming that you have no control over your tenants, that they have direct Thrift/native access, and that they may be malicious (intentionally or not):

- Resource limits on the read/write path. For example, https://issues.apache.org/jira/browse/CASSANDRA-6117 ensures Cassandra won't fall over if a read hits a whole bunch of tombstones. I'm not sure how many similar cases exist where a single read can use a disproportionate amount of resources.
- Operation limits per node, currently configurable via cassandra.yaml… but you might want a more flexible solution. Not sure how this applies to large batches etc.
- Sandboxing tenant triggers.
- Resource limits on expensive operations like CAS and CL=ALL.

We would be happy to work on some tickets around this, as our in-progress multi-tenant solution just uses containers, namespaces et al. for isolation.

Cheers

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 31/08/2014, at 10:06 AM, Jay Patel <pateljay3...@gmail.com> wrote:

> Hi Folks,
>
> Ideally, it would be awesome if multitenancy were a first-class citizen in
> Cassandra. But as of today, the easiest way to get multitenancy (on-disk
> data isolation, per-tenant recovery & backup, per-tenant replication
> strategy) is by having one keyspace per tenant. However, it's not
> recommended to go beyond 300 to 500 tables in one cluster today.
>
> With this thread, I would like to find out the current blocking issues for
> supporting a high number of tables (10K/50K/?), and contribute the fixes.
> I'm also open to any ideas for making the keyspace itself tenant-aware and
> supporting multitenancy out of the box, but having a replication strategy
> (NTS) per tenant & on-disk data isolation are the minimal features to
> have. I'm not sure, but supporting a high table count in a cluster may
> lead us to support multitenancy out of the box in the future..
>
> As per my quick discussion with Jonathan & a few other folks, I think we
> already know about the issues below:
>
> - 1 MB heap per memtable
> - Creating CFs can take a long time (fixed - CASSANDRA-6977)
> - Multiple flushes turn writes from sequential into random (should we
>   worry if we use SSDs?)
> - Unknowns!
>
> Regarding '1 MB per memtable', CASSANDRA-5935 adds an option to allow
> disabling slab allocation to pack in more CFs, but at the cost of GC pain.
> It seems like the Cassandra 2.1 off-heap memtables will be a better
> option. However, it looks like they also use region-based memory
> allocation to avoid fragmentation. Does this mean no GC pain but still a
> high RAM requirement (for 50K tables, we end up with 50 GB)?
>
> (pls. correct me if this is not the right file I'm looking into)
>
> public class NativeAllocator extends MemtableAllocator
> {
>     private static final Logger logger =
>         LoggerFactory.getLogger(NativeAllocator.class);
>
>     private final static int REGION_SIZE = 1024 * 1024;
>     private final static int MAX_CLONED_SIZE = 128 * 1024; // bigger than this don't go in the region
>
> I would like to know about any other known issues that I haven't listed
> here and/or any recommendations for multitenancy. Also, any thoughts on
> supporting an efficient off-heap allocator option for a high # of tables?
>
> BTW, having 10K tables brings up many other issues around management,
> tooling, etc., but I'm less worried about those at this point.
>
> Thanks,
> Jay
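To make the "50K tables ≈ 50 GB" arithmetic concrete, here is a back-of-the-envelope sketch. It only assumes that every open memtable holds at least one 1 MiB region, as in the REGION_SIZE constant from the NativeAllocator snippet quoted above; the class and method names are mine, not Cassandra's:

```java
public class MemtableRegionEstimate {
    // Region size used by slab/region-based memtable allocation (1 MiB),
    // matching NativeAllocator.REGION_SIZE in the quoted snippet.
    private static final long REGION_SIZE = 1024 * 1024;

    // Lower-bound memory footprint if every table's memtable
    // has allocated at least one region.
    static long minRegionBytes(int tableCount) {
        return tableCount * REGION_SIZE;
    }

    public static void main(String[] args) {
        long bytes = minRegionBytes(50_000);
        // Integer division by 1 GiB; prints "48 GiB" for 50K tables,
        // i.e. roughly the 50 GB figure in the thread.
        System.out.println(bytes / (1024L * 1024 * 1024) + " GiB");
    }
}
```

Of course this is only the floor from region allocation itself; actual memtable contents, partition indexes, and JVM overhead come on top of it.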