Hi Srinath,

I'm not sure this is something we would have to "fix". It was a deliberate design decision we took to isolate tenant data, so that no tenant can access another tenant's data. Accordingly, in Spark virtual tables too, each tenant's tables map directly to that tenant's own analytics tables. If we allow any tenant, even the super tenant, to access other tenants' data, it can be seen as a security threat. The idea should be that no single tenant has any special access to other tenants' data.
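To make the isolation above concrete, here is a minimal Python sketch using an in-memory SQLite database. It is not the actual DAS/Spark implementation; the naming scheme in `physical_table`, the `TenantStore` class, and the `requests` table with its `api`/`hits` columns are all hypothetical, chosen only to show a logical table name resolving to a tenant's own physical table.

```python
import sqlite3


def physical_table(tenant_id: int, logical_name: str) -> str:
    """Hypothetical naming scheme: one physical table per (tenant, logical table)."""
    if not logical_name.isidentifier():
        raise ValueError(f"invalid table name: {logical_name!r}")
    # Quote the identifier so that any integer tenant id yields a valid name.
    return f'"t{int(tenant_id)}_{logical_name}"'


class TenantStore:
    """Every operation is scoped to exactly one tenant's physical table."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def insert(self, tenant_id: int, logical_name: str, api: str, hits: int) -> None:
        table = physical_table(tenant_id, logical_name)
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (api TEXT, hits INTEGER)"
        )
        self.conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (api, hits))

    def total_hits(self, tenant_id: int, logical_name: str) -> list:
        # The query can only reach the caller's own table; there is no
        # tenant column to filter on and no way to name another tenant's table.
        table = physical_table(tenant_id, logical_name)
        cursor = self.conn.execute(
            f"SELECT api, SUM(hits) FROM {table} GROUP BY api ORDER BY api"
        )
        return cursor.fetchall()
```

With this layout, an aggregate across tenants means one query per tenant, which is the cost raised in the original question; the trade-off is that no query issued on behalf of one tenant can see another tenant's rows.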
So, setting aside the physical representation (which has other complications, such as adding another index for tenantId, which would have to be supported by all data sources), if we are to do this, we need a special view for super tenant tables in Spark virtual tables, so that the super tenant has access to the "tenantId" property of that table. In other tenants' tables, we need to hide this property and, of course, not let them use it. This looks like a bit of a hack to implement one specific scenario we have.

As far as I know, this requirement mainly came from APIM analytics, where its in-built analytics publishes all tenants' data to the super tenant's tables and the data is processed from there. If we are doing this, this data is only used internally and cannot be shown to the respective tenants for their own analytics. If each tenant needs to do their own analytics, they should configure things to get data into their own tenant space and write their own analytics scripts. In the end this may mean some data duplication, but that is to be expected, because two different users are doing their own different processing. And IMO, we should not try to share any common data they may have and hack the system.

In the end, the point is that we should not take lightly what we are trying to achieve with multi-tenancy, or compromise its fundamentals. At the moment, the idea should be that each tenant has its own data and its own analytics scripts, and if you need to scale accordingly, you have separate hardware for those tenants. Running separate queries for different tenants does not necessarily make it very slow, since the data load is divided between the tenants, and the only extra processing would be the possible ramp-up times for query executions.

Cheers,
Anjana.

On Thu, Mar 31, 2016 at 11:45 AM, Srinath Perera <[email protected]> wrote:

> Hi Anjana,
>
> Currently we keep a different HBase/RDBMS table per tenant. In a
> multi-tenant environment, this is very expensive, as we will have to run a
> query per tenant.
>
> How can we fix this? E.g. if we keep the tenant as a field in the table, that
> lets us do a "group by".
>
> --Srinath
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
