Hi Srinath,

I'm not sure this is something we have to "fix". It was a deliberate
design decision to isolate tenant data, so that no tenant can access
another tenant's data. Likewise, in Spark, each tenant's virtual tables map
directly to that tenant's own analytics tables. If we allow any tenant,
even the super tenant, to access other tenants' data, it can be seen as a
security threat. The idea should be that no single tenant has any special
access to other tenants' data.

Setting aside the physical representation (which has other complications,
such as adding another index for tenantId, which would have to be
supported by all data sources), if we are to do this, we need a special
view for super tenant tables in Spark virtual tables, so that they have
access to the "tenantId" property of the table. In other tenants' tables,
we of course need to hide this property and not let them use it. This
looks like a bit of a hack to implement one specific scenario we have.
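
To make the two kinds of views concrete, here is a rough sketch. It uses
Python's built-in sqlite3 module as a stand-in, not Spark or our actual
analytics tables, and the table name, column names, view names, and tenant
ids are all hypothetical. It only illustrates the shape of the idea: one
shared table with a tenantId column, a super tenant view that exposes
tenantId, and per-tenant views that filter on it and hide it.

```python
import sqlite3


def build_demo():
    """Create an in-memory shared table plus a super tenant view.

    All names here (api_requests, super_tenant_view) are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    # One shared table for all tenants, with tenantId as an ordinary column.
    conn.execute(
        "CREATE TABLE api_requests (tenantId INTEGER, api TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO api_requests VALUES (?, ?, ?)",
        [(1, "orders", 10), (1, "users", 3), (2, "orders", 7)],
    )
    # Super tenant view: exposes tenantId, so one GROUP BY covers all tenants.
    conn.execute(
        "CREATE VIEW super_tenant_view AS "
        "SELECT tenantId, api, hits FROM api_requests"
    )
    return conn


def create_tenant_view(conn, tenant_id):
    """Create a view for one tenant that filters on tenantId and hides it."""
    # SQLite views cannot contain bound parameters, so the id is forced to a
    # non-negative int before being formatted into the statement.
    tenant_id = int(tenant_id)
    if tenant_id < 0:
        raise ValueError("this sketch only handles non-negative tenant ids")
    name = f"tenant_{tenant_id}_view"
    conn.execute(
        f"CREATE VIEW {name} AS "
        f"SELECT api, hits FROM api_requests WHERE tenantId = {tenant_id}"
    )
    return name


def column_names(conn, relation):
    """Return the column names visible through a table or view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


if __name__ == "__main__":
    conn = build_demo()
    view = create_tenant_view(conn, 1)
    # The tenant view shows no tenantId column; the super tenant view does.
    print(column_names(conn, view))
    print(column_names(conn, "super_tenant_view"))
    print(conn.execute(
        "SELECT tenantId, SUM(hits) FROM super_tenant_view "
        "GROUP BY tenantId ORDER BY tenantId"
    ).fetchall())
```

Even in this toy form, the super tenant ends up with a different kind of
view from every other tenant, which is the special-casing I am wary of.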

As far as I know, this requirement mainly came from APIM analytics, where
the in-built analytics publishes all tenants' data to the super tenant's
tables, and the data is processed from there. If we are doing this, the
data is only used internally and cannot be shown to the respective tenants
for their own analytics. If each tenant needs to do their own analytics,
they should configure it to get data into their own tenant space and write
their own analytics scripts. In the end this may mean some data
duplication, but that is to be expected, because two different users are
doing their own separate processing. IMO, we should not hack the system to
share whatever common data they may have.

In the end, the point is that we should not take lightly what we are
trying to achieve with multi-tenancy, nor compromise its fundamentals. At
the moment, the idea should be that each tenant has its own data and its
own analytics scripts, and if you need to scale, you have separate
hardware for those tenants. Running separate queries for different tenants
does not necessarily make things very slow, since the data load is divided
between the tenants, and the only extra processing would be the possible
ramp-up times for query executions.
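
As a back-of-the-envelope illustration of that last point, here is a toy
cost model in Python. It assumes every query pays a fixed ramp-up cost plus
a cost proportional to the rows it scans. The function names and the
numbers in the example are made up for illustration and are not
measurements from any real deployment.

```python
def per_tenant_cost_ms(rows_per_tenant, startup_ms, per_row_ms):
    """Total cost when each tenant's table gets its own query."""
    return sum(startup_ms + rows * per_row_ms for rows in rows_per_tenant)


def shared_table_cost_ms(rows_per_tenant, startup_ms, per_row_ms):
    """Total cost of one query over a single table holding all tenants."""
    return startup_ms + sum(rows_per_tenant) * per_row_ms


def extra_cost_ms(rows_per_tenant, startup_ms, per_row_ms):
    """Overhead of running per-tenant queries instead of one shared query."""
    return (
        per_tenant_cost_ms(rows_per_tenant, startup_ms, per_row_ms)
        - shared_table_cost_ms(rows_per_tenant, startup_ms, per_row_ms)
    )


if __name__ == "__main__":
    # Hypothetical workload: three tenants with different data volumes.
    rows = [1000, 2000, 3000]
    print(per_tenant_cost_ms(rows, startup_ms=2000, per_row_ms=1))
    print(shared_table_cost_ms(rows, startup_ms=2000, per_row_ms=1))
    print(extra_cost_ms(rows, startup_ms=2000, per_row_ms=1))
```

Under these assumptions the rows scanned are the same either way, and the
difference is just the ramp-up cost paid once per additional tenant, i.e.
(number of tenants - 1) times the ramp-up time.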

Cheers,
Anjana.

On Thu, Mar 31, 2016 at 11:45 AM, Srinath Perera <[email protected]> wrote:

> Hi Anjana,
>
> Currently we keep a different HBase/RDBMS table per tenant. In a
> multi-tenant environment, this is very expensive, as we have to run a
> query per tenant.
>
> How can we fix this? E.g., if we keep the tenant as a field in the table,
> that lets us do a "group by".
>
> --Srinath
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware