Hi,

Having to run Spark queries for each tenant can be very expensive with a large number of tenants, but in terms of data isolation I believe the current design is better. If we can come up with a good design that supports tenant-level data isolation, this is something we can indeed do.
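To make this trade-off concrete, here is a minimal sketch in plain Python, standing in for Spark queries (it is not the WSO2 analytics or Spark API). It contrasts the current design, one table and one query per tenant, with a single shared table carrying a `tenant_id` column that is aggregated in one grouped pass. It also adds a hypothetical tenant-scoped view that filters rows and hides `tenant_id`, which is one way the "tenant-level data isolation" above might be kept on top of a shared table. All table, field and function names (`per_tenant_tables`, `shared_table`, `tenant_view`, `hits`) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical data and names; plain Python stands in for Spark queries here.
# Design A (current): one physical table per tenant.
per_tenant_tables = {
    "tenant_a": [{"api": "orders", "hits": 3}, {"api": "users", "hits": 2}],
    "tenant_b": [{"api": "orders", "hits": 7}],
}


def hits_per_tenant_separate(tables):
    """Design A: one query per tenant table (N tenants -> N queries)."""
    return {tenant: sum(row["hits"] for row in rows) for tenant, rows in tables.items()}


# Design B (proposed): one shared table with a tenant_id column.
shared_table = [
    {"tenant_id": tenant, **row}
    for tenant, rows in per_tenant_tables.items()
    for row in rows
]


def hits_per_tenant_grouped(table):
    """Design B: a single pass over all rows, grouped by tenant_id.

    Roughly: SELECT tenant_id, SUM(hits) FROM shared_table GROUP BY tenant_id
    """
    totals = defaultdict(int)
    for row in table:
        totals[row["tenant_id"]] += row["hits"]
    return dict(totals)


def tenant_view(table, tenant_id):
    """Hypothetical isolation layer for a regular tenant: returns only that
    tenant's rows, with the tenant_id column removed."""
    return [
        {key: value for key, value in row.items() if key != "tenant_id"}
        for row in table
        if row["tenant_id"] == tenant_id
    ]


# Both designs give the same per-tenant totals.
print(hits_per_tenant_separate(per_tenant_tables))  # {'tenant_a': 5, 'tenant_b': 7}
print(hits_per_tenant_grouped(shared_table))        # {'tenant_a': 5, 'tenant_b': 7}
print(tenant_view(shared_table, "tenant_b"))        # [{'api': 'orders', 'hits': 7}]
```

In this sketch the grouped version is a single pass over every tenant's rows, which is why a small tenant's results could end up waiting on the whole dataset, and the isolation in `tenant_view` only holds if the platform guarantees that regular tenants can never read `shared_table` directly.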
On the other hand, let's say we keep all the data in a single table and process it with a single query. Since that table contains the entire dataset, the query could still be fairly expensive and take longer to complete. In that case, tenants with only a small amount of data would be affected, because they would have to wait longer to see their results.

On Thu, Mar 31, 2016 at 5:25 AM, Anjana Fernando <[email protected]> wrote:

> Hi Srinath,
>
> I'm not sure if this is something we would have to "fix". It was a clear
> design decision we took in order to isolate tenant data, so that no tenant
> can access another tenant's data. Likewise, in Spark virtual tables, each
> tenant's tables map directly to its own analytics tables. If we allow, say,
> the super tenant to access other tenants' data, it can be seen as a
> security threat. The idea should be that no single tenant has any special
> access to other tenants' data.
>
> So, setting aside the physical representation (which has other
> complications, like adding another index for tenantId and so on, which
> should be supported by all data sources), if we are to do this, we need a
> special view for super tenant tables in Spark virtual tables, so that the
> super tenant has access to the "tenantId" property of that table. In other
> tenants' tables, we would of course need to hide this property and not let
> them use it. This looks like a bit of a hack to implement one specific
> scenario we have.
>
> As far as I know, this requirement mainly came from APIM analytics, where
> its in-built analytics publishes all tenants' data to the super tenant's
> tables and the data is processed from there. So if we are doing this, the
> data is only used internally and cannot be shown to the respective tenants
> for their own analytics. If a tenant needs to do its own analytics, it
> should be configured to get data into its own tenant space and write its
> own analytics scripts.
> This may, in the end, mean some data duplication, but that is expected,
> because two different users are doing their own processing. IMO, we should
> not try to share whatever common data they may have by hacking the system.
>
> In the end, the point is that we should not take lightly what we are trying
> to achieve with multi-tenancy, or compromise its fundamentals. For now, the
> idea should be that each tenant has its own data and its own analytics
> scripts and, if it needs to scale, separate hardware. Running separate
> queries for different tenants does not necessarily make things much slower,
> since the data load is divided between the tenants; the only extra cost
> would be the possible ramp-up time for each query execution.
>
> Cheers,
> Anjana.
>
> On Thu, Mar 31, 2016 at 11:45 AM, Srinath Perera <[email protected]> wrote:
>
>> Hi Anjana,
>>
>> Currently we keep a different HBase/RDBMS table per tenant. In a
>> multi-tenant environment, this is very expensive, as we will have to run a
>> query per tenant.
>>
>> How can we fix this? E.g., if we keep the tenant as a field in the table,
>> that lets us do a "group by".
>>
>> --Srinath
>>
>> --
>> ============================
>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>> Site: http://home.apache.org/~hemapani/
>> Photos: http://www.flickr.com/photos/hemapani/
>> Phone: 0772360902
>>
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
Thanks & Regards,
Inosh Goonewardena
Associate Technical Lead- WSO2 Inc.
Mobile: +94779966317
