Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread J. D. Jordan
I think the resource-constraining aspects are one of the most important things 
we are missing. Actually doing resource constraints in SEDA is hard. In TPC it 
should be easier, so we put off some discussions we were having about it until 
we have TPC in place; tracking the resource use of a given query should be 
much easier when a given request is serviced by a single thread.


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread Jason Brown
Heh, nice find, Jeremy. Thanks for digging it up


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread Jeremy Hanna
For posterity, our wiki page from many moons ago was 
https://wiki.apache.org/cassandra/MultiTenant. It was a different era of the 
project but there might be some useful bits in there for anyone interested in 
MT.


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread Jason Brown
The current implementation will probably be yanked when Thrift as a whole
is removed for 4.0, and I'm OK with that.

That being said, there has been an undercurrent of interest over time about
multitenancy, and I'm willing to entertain a renewed discussion. It might
be instructive to see whether any other systems currently offer multitenancy
and whether there's something to be learned there. If not, we could at least
explore the topic more seriously and then document for posterity the
well-informed pros/cons of why we as a community chose not to do it, postpone
it, or actually do it. Of course, it would be great for a motivated
individual to lead the effort if we really want to entertain it.


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread Jeremy Hanna
I agree that the request scheduler should probably be deprecated and removed 
unless someone wants to put in something that's usable from the non-Thrift 
request processor. We added it for prioritization and QoS, but I don't know of 
anyone ever using it. The project we thought of using it for got shelved.

Unless it's just multiple clients with the same general use case, I think 
multi-tenancy is going to be quite difficult to tune and diagnose problems 
for. I would steer clear and have a cluster per logical app if at all possible.


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread Mick Semb Wever
I had needs for this in the past, and my questions always seemed to end in
answers along the lines of: this should be done at the resource level. There
are a variety of ways a bad data model or client can bring a cluster down,
not just at request time.

There were some thoughts, IIRC, around a resource scheduler somewhere
post-3.0, but I don't think that ever eventuated (someone more knowledgeable
please correct me).

Otherwise you could look into using tiered storage so that you have at least
disk isolation per keyspace. That solves some things, but it won't help with
the overhead and memtable impact from the number of keyspaces/tables, or the
lack of heap/throughput isolation/scheduling.

The approach of doing this at the driver level, prefixing the partition key,
is as good as any approach for now.

It could be an idea to remove/deprecate the request_scheduler from code and
yaml.

~mck
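
A minimal sketch of the partition-key-prefixing approach Mick describes,
against a hypothetical shared.events table: tenant_id becomes the first
component of a composite partition key, so each tenant's rows hash to
separate partitions and every query has to name its tenant.

    -- Hypothetical schema: tenant_id prefixes the partition key.
    CREATE TABLE shared.events (
        tenant_id text,
        event_id  timeuuid,
        payload   blob,
        PRIMARY KEY ((tenant_id, event_id))
    );

    -- Per-tenant reads stay partition-local because both partition key
    -- components are bound (the ? is filled in by the driver):
    SELECT payload FROM shared.events
    WHERE tenant_id = 'acme' AND event_id = ?;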


Re: Support Multi-Tenant in Cassandra

2016-09-09 Thread jason zhao yang
Hi Romain,

Thanks for the reply.

> request_scheduler

It is a legacy feature which only works for the Thrift API.

It would be great to have some sort of scheduling per user/role, but
scheduling on the request will only provide limited isolation. If the JVM
crashes due to one tenant's invalid request (e.g. inserting a huge blob into
a collection column), it will be awful.

Thank you.


Re: Support Multi-Tenant in Cassandra

2016-08-05 Thread jason zhao yang
We considered splitting by keyspace or table before, but Cassandra's table is
a costly structure (more CPU, flushes, memory...).

In our use case, we expect to have more than 50 tenants on the same cluster.

> As it was already mentioned in the ticket itself, filtering is a highly
> inefficient operation.

I totally agree, but it's still better to have data filtered on the server
side rather than the client side.

How about adding a logical tenant concept to Cassandra? All logical tenants
would share the same table schemas, but queries/storage would be separated.


On Friday, 15 July 2016 at 16:28, Oleksandr Petrov wrote:

> There's a ticket on filtering (#11031), although I would not count on
> filtering in production.
>
> As it was already mentioned in the ticket itself, filtering is a highly
> inefficient operation. It was thought of as an aid for people who are
> exploring data and/or can structure a query in such a way that it will at
> least be local (for example, with an IN or EQ query on the partition key
> and filtering out results from the small partition). However, filtering on
> the partition key assumes that _every_ replica has to be queried for the
> results, as we do not know which partitions are going to be holding the
> data. Having every query in your system rely on filtering, plus a big
> amount of data and high load, will eventually have a substantial negative
> impact on performance.
>
> I'm not sure how many tenants you're working with, although I've seen
> setups where tenancy was solved by using multiple keyspaces, which helps to
> completely isolate the data and avoid filtering. Given that you've tried
> splitting sstables on tenant_id, that might be solved by using multiple
> keyspaces. This will also help with server resource isolation and most of
> the issues you've raised.
>
> --
> Alex Petrov
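
A minimal sketch of the keyspace-per-tenant layout Oleksandr describes, with
hypothetical keyspace and role names; each tenant gets its own keyspace, so
plain GRANTs isolate access without any customized statements:

    CREATE KEYSPACE tenant_acme
        WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    -- One role per tenant application ('secret' is a placeholder):
    CREATE ROLE acme_app WITH LOGIN = true AND PASSWORD = 'secret';
    GRANT SELECT ON KEYSPACE tenant_acme TO acme_app;
    GRANT MODIFY ON KEYSPACE tenant_acme TO acme_app;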


Re: Support Multi-Tenant in Cassandra

2016-07-15 Thread Romain Hardouin
I don't use C* in such a context, but out of curiosity: did you set the
request_scheduler to RoundRobin, or did you implement your own scheduler?
Romain
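
For reference, the legacy knob Romain mentions lives in cassandra.yaml and
applies only to Thrift requests (it goes away when Thrift does). A sketch,
with hypothetical per-tenant keyspace names:

    request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
    request_scheduler_id: keyspace
    request_scheduler_options:
        throttle_limit: 80    # in-flight requests allowed before queuing
        default_weight: 5     # requests per turn for unlisted keyspaces
        weights:
            tenant_a: 10      # hypothetical keyspace names
            tenant_b: 1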

Support Multi-Tenant in Cassandra

2016-07-15 Thread jason zhao yang
Hi,

May I ask if there is any plan to extend functionality related to
multi-tenancy?

Our current approach is to define an extra partition key column called
"tenant_id". In my use cases, all tenants will have the same table schemas.

* For security isolation: we customized the GRANT statement to be able to
restrict user queries based on the "tenant_id" partition.

* For getting all data of a single tenant: we customized the SELECT statement
to support ALLOW FILTERING on the "tenant_id" partition key.

* For server resource isolation: I have no idea how to do it.

* For per-tenant backup/restore: I tried a tenant_base_compaction_strategy to
split sstables based on tenant_id. It turned out to be very inefficient.

What's the community's opinion about submitting those patches to Cassandra?
It would be great if you could share the ideal multi-tenant architecture for
Cassandra.

jasonstack
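
To illustrate the cost Oleksandr points out above, a sketch against the
hypothetical shared.events table from earlier: the filtering form has to
consult every node, because only part of the composite partition key is
restricted, while the fully keyed form touches only the replicas for one
partition.

    -- Scans every partition on every node (the per-tenant dump):
    SELECT * FROM shared.events WHERE tenant_id = 'acme' ALLOW FILTERING;

    -- Partition-local: both partition key components are restricted.
    SELECT * FROM shared.events WHERE tenant_id = 'acme' AND event_id = ?;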