paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-887041732


   @vdiravka, I had an opportunity to get a bit more background info for this 
project. I am going out of my way to try to facilitate this PR; normally we'd 
require that the PR author provide this information so that we reviewers can 
simply review the code and not have to reverse-engineer requirements and 
design.
   
   Sounds like the requirements are for a very specific light-weight 
multi-tenant model: one that allows tenants to set options, create storage 
plugin configs, and run queries, but not access any other part of Drill. 
Tenants are to be trusted to not make mistakes. Specifically:
   
   * A *tenant* has a set of "system" options (maybe call them *tenant 
options*) available when that tenant creates a Drill session.
   * A tenant can define a set of storage plugin configs which are visible to 
*only* that tenant. Perhaps call these *tenant plugin configs*.
   * A tenant can run queries that use the tenant options and tenant plugin 
configs.
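   
   To make the three requirements above concrete, here is a minimal sketch of 
tenant-scoped state. Everything in it is an assumption for illustration: 
`TenantScopedRegistry` and its methods are hypothetical names, not existing 
Drill classes, and a real version would sit behind Drill's persistent store 
rather than in-memory maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; these names do not exist in Drill. */
final class TenantScopedRegistry {
    // tenant name -> (option name -> value)
    private final Map<String, Map<String, String>> options = new HashMap<>();
    // tenant name -> (plugin config name -> config body, e.g. JSON text)
    private final Map<String, Map<String, String>> pluginConfigs = new HashMap<>();

    void setOption(String tenant, String name, String value) {
        options.computeIfAbsent(tenant, t -> new HashMap<>()).put(name, value);
    }

    void putPluginConfig(String tenant, String pluginName, String config) {
        pluginConfigs.computeIfAbsent(tenant, t -> new HashMap<>()).put(pluginName, config);
    }

    /** Returns the option only if this tenant set it. */
    Optional<String> option(String tenant, String name) {
        return Optional.ofNullable(
            options.getOrDefault(tenant, Collections.emptyMap()).get(name));
    }

    /** Returns the plugin config only if it belongs to this tenant. */
    Optional<String> pluginConfig(String tenant, String pluginName) {
        return Optional.ofNullable(
            pluginConfigs.getOrDefault(tenant, Collections.emptyMap()).get(pluginName));
    }
}
```

   Every lookup is keyed by tenant first, so a config stored by one tenant is 
never returned to another; that is the "visible to *only* that tenant" rule.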
   
   This use case is limited compared to the normal multi-tenant requirements. 
The following appear to be restrictions for this project:
   
   * A tenant does not have access to the Drill Web Console or the Drill REST 
API and thus does not have access to query profiles.
   * A tenant does not have access to Zookeeper or the Drill native API. 
Queries sent by the tenant must go through an intermediate software layer 
provided by the service provider.
   * A tenant does not have access to Drill logs to diagnose failed queries.
   
   The above restrictions say that the feature is not useful for open source 
Drill users who use the Drill-provided UI and APIs. This makes the feature of 
very limited appeal to the Drill community. So, one of our challenges is to 
design the feature in a way that users of the "out-of-the-box" Drill can 
benefit.
   
   Additional restrictions for this one use case:
   
   * A tenant cannot start, stop or restart Drillbits, nor can they change 
startup properties.
   * A tenant cannot upload a UDF nor can a tenant provide custom *connectors* 
(storage plugin classes). (Note that 
[DRILL-7916](https://github.com/apache/drill/pull/2215) is working at 
cross-purposes to this PR.)
   * Tenants are trusted to not change system-wide performance-related options 
(queueing, resource allocation, etc.). If those options are changed, the 
resulting behavior is undefined and must be dealt with by the service 
provider.
   * No provision for the Drill admin to view or modify tenant options or 
plugin configs. If such behavior is desired, a service provider must write 
tools that work with Drill's persistent storage.
   * Tenants are trusted to not consume excess resources, so no resource 
isolation between tenants. Tenant A might try to sort a trillion rows, which 
might deny resources to other tenants.
   * Tenants cannot (?) create views or a metadata store.
   * Parquet metadata caching is either unsupported (?) or must be written to 
the tenant's S3 bucket; Drill provides no storage for the metadata.
   
   The above restrictions limit the solution but leave the door open to 
eventually providing more general multi-tenant support.
   
   A final question is the relation between *tenant* and *user*. This PR 
assumes that they are identical: that "fred" is either a normal Drill user in 
"normal mode", or a tenant in "tenant mode." That is, each tenant has a single 
Drill user (which works in this use case because of the intermediate software 
layer). This explains why this PR is labeled as "instances for different 
*users*", though the discussion has revealed the goal to be "instances for 
different *tenants*."
   
   Since the "tenant = user" model applies to only this one use case, it again 
is not a general enough feature to add to the Apache Drill code. Instead, 
Apache Drill must provide a generally useful solution. 
   
   Prior notes have explained how a "per-user" model should work (it requires 
sharing between users). Specifically, a "config per user" solution must 
recognize that users work on a team, allow sharing, and permit admin abilities. 
Similarly, an "options per user" solution must persist only a subset of options 
(those which are neither system nor per-query options), and must solve the 
synchronization issue.
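   
   As a rough illustration of the "options per user" point, the sketch below 
persists a per-user value only for options on an explicit allow-list and 
resolves a value by checking the query level, then the user level, then the 
system level. The class `LayeredOptions`, the option names, and the exact 
lookup order are assumptions made for this example, and the sketch does not 
attempt to solve the synchronization issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; not Drill's option manager. */
final class LayeredOptions {
    // Options that may be stored per user (neither system-only nor per-query).
    private final Set<String> userPersistable;
    private final Map<String, String> system = new HashMap<>();
    private final Map<String, Map<String, String>> perUser = new HashMap<>();

    LayeredOptions(Set<String> userPersistable) {
        this.userPersistable = new HashSet<>(userPersistable);
    }

    void setSystem(String name, String value) {
        system.put(name, value);
    }

    /** Rejects options that must not be persisted at the user level. */
    void setUser(String user, String name, String value) {
        if (!userPersistable.contains(name)) {
            throw new IllegalArgumentException("Not a per-user option: " + name);
        }
        perUser.computeIfAbsent(user, u -> new HashMap<>()).put(name, value);
    }

    /** Assumed precedence: query options, then user options, then system options. */
    Optional<String> resolve(String user, Map<String, String> queryOptions, String name) {
        if (queryOptions.containsKey(name)) {
            return Optional.ofNullable(queryOptions.get(name));
        }
        Map<String, String> mine = perUser.get(user);
        if (mine != null && mine.containsKey(name)) {
            return Optional.ofNullable(mine.get(name));
        }
        return Optional.ofNullable(system.get(name));
    }
}
```

   The allow-list is the part that matters here: it keeps system-wide settings 
such as queueing or resource allocation out of per-user storage.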
   
   Notes have also explained that the customary definition of "tenant" is an 
organization with multiple users. A true multi-tenant solution must allow 
multiple users per tenant, and provide a path toward full tenant isolation 
later.
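   
   The difference from "tenant = user" can be sketched as a mapping from many 
users to one tenant, with visibility decided at the tenant level. The class 
`TenantDirectory` is hypothetical, and the rule that a user belongs to exactly 
one tenant is an assumption of this sketch, not something this PR defines.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; these names do not exist in Drill. */
final class TenantDirectory {
    // user name -> tenant name (assumes one tenant per user)
    private final Map<String, String> tenantByUser = new HashMap<>();

    void addUser(String tenant, String user) {
        String existing = tenantByUser.putIfAbsent(user, tenant);
        if (existing != null && !existing.equals(tenant)) {
            throw new IllegalStateException(user + " already belongs to " + existing);
        }
    }

    Optional<String> tenantOf(String user) {
        return Optional.ofNullable(tenantByUser.get(user));
    }

    /** A user can see a resource only if the user's tenant owns it. */
    boolean canSee(String user, String owningTenant) {
        return tenantOf(user).map(owningTenant::equals).orElse(false);
    }
}
```

   With this shape, two users of the same tenant share that tenant's plugin 
configs, while users of other tenants, and unknown users, see nothing.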
   
   The challenge here is to find a design that balances the very specific, 
ad-hoc, unusual needs of this one use case, with something that can evolve to 
become of general use to the Drill community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

