paul-rogers commented on pull request #2251: URL: https://github.com/apache/drill/pull/2251#issuecomment-887041732
@vdiravka, I had an opportunity to get a bit more background info for this project. I am going out of my way to try to facilitate this PR; normally we'd require that the PR author provide this information so that we reviewers can simply review the code, and not have to reverse-engineer requirements and design.

It sounds like the requirements are for a very specific, lightweight multi-tenant model: one that allows tenants to set options, create storage plugin configs, and run queries, but not access any other part of Drill. Tenants are trusted not to make mistakes. Specifically:

* A *tenant* has a set of "system" options (maybe call them *tenant options*) available when that tenant creates a Drill session.
* A tenant can define a set of storage plugin configs which are visible to *only* that tenant. Perhaps call these *tenant plugin configs*.
* A tenant can run queries that use the tenant options and tenant plugin configs.

This use case is limited compared to the normal multi-tenant requirements. The following appear to be restrictions for this project:

* A tenant does not have access to the Drill Web Console or the Drill REST API, and thus does not have access to query profiles.
* A tenant does not have access to ZooKeeper or the Drill native API. Queries sent by the tenant must go through an intermediate software layer provided by the service provider.
* A tenant does not have access to Drill logs to diagnose failed queries.

These restrictions mean that the feature is not useful for open source Drill users who use the Drill-provided UI and APIs, which gives the feature very limited appeal to the Drill community. So, one of our challenges is to design the feature in a way that users of "out-of-the-box" Drill can also benefit.

Additional restrictions for this one use case:

* A tenant cannot start, stop or restart Drillbits, nor can they change startup properties.
* A tenant cannot upload a UDF, nor can a tenant provide custom *connectors* (storage plugin classes). (Note that [DRILL-7916](https://github.com/apache/drill/pull/2215) is working at cross-purposes to this PR.)
* Tenants are trusted not to change system-wide performance-related options (queueing, resource allocation, etc.). If those options are changed, the resulting behavior is undefined, and the service provider must deal with it.
* There is no provision for the Drill admin to view or modify tenant options or plugin configs. If such behavior is desired, the service provider must write tools that work with Drill's persistent storage.
* Tenants are trusted not to consume excess resources, so there is no resource isolation between tenants. Tenant A might try to sort a trillion rows, which might deny resources to other tenants.
* Tenants cannot (?) create views or a metadata store.
* Parquet metadata caching is either unsupported (?) or must be written to the tenant's S3 bucket; Drill provides no storage for the metadata.

These restrictions limit the solution, but leave the door open to eventually providing more general multi-tenant support.

A final question is the relation between *tenant* and *user*. This PR assumes that they are identical: that "fred" is either a normal Drill user in "normal mode", or a tenant in "tenant mode." That is, each tenant has a single Drill user (which works in this use case because of the intermediate software layer). This explains why this PR is labeled as "instances for different *users*", but the discussion has revealed the goal to be "instances for different *tenants*."

Since the "tenant = user" model applies to only this one use case, it again is not a general enough feature to add to the Apache Drill code. Instead, Apache Drill must provide a generally useful solution. Prior notes have explained how a "per-user" model should work (it requires sharing between users).
Specifically, a "config per user" solution must recognize that users work on a team, allow sharing, and permit admin abilities. Similarly, an "options per user" solution must persist only a subset of options (those which are neither system nor per-query options), and must solve the synchronization issue. Notes have also explained that the customary definition of "tenant" is an organization with multiple users. A true multi-tenant solution must allow multiple users per tenant, and provide a path toward full tenant isolation later.

The challenge here is to find a design that balances the very specific, ad-hoc, unusual needs of this one use case with something that can evolve to become of general use to the Drill community.
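To make the "tenant vs. user" distinction concrete, here is a minimal sketch of the multi-user tenant model described above. It assumes a hypothetical `TenantRegistry` (nothing like it exists in Drill today) in which a tenant owns several users, plugin configs are keyed by tenant so all of that tenant's users share them, and option lookup falls back from session value to tenant value to system default. The option name `store.format` and the plugin config string are illustrative only; a real design would also have to restrict which options a tenant may set and solve persistence and synchronization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: none of these classes exist in Apache Drill.
public class TenantModelSketch {

  /** A tenant is an organization that may have many users. */
  public record Tenant(String name, List<String> users) {}

  /** Holds system defaults plus per-tenant options and plugin configs. */
  public static class TenantRegistry {
    private final Map<String, String> systemDefaults = new HashMap<>();
    private final Map<String, String> userToTenant = new HashMap<>();
    private final Map<String, Map<String, String>> tenantOptions = new HashMap<>();
    private final Map<String, Map<String, String>> tenantPlugins = new HashMap<>();

    public void setSystemDefault(String key, String value) {
      systemDefaults.put(key, value);
    }

    public void addTenant(Tenant tenant) {
      for (String user : tenant.users()) {
        userToTenant.put(user, tenant.name());
      }
      tenantOptions.putIfAbsent(tenant.name(), new HashMap<>());
      tenantPlugins.putIfAbsent(tenant.name(), new HashMap<>());
    }

    private String tenantOf(String user) {
      String tenant = userToTenant.get(user);
      if (tenant == null) {
        throw new IllegalArgumentException("Unknown user: " + user);
      }
      return tenant;
    }

    /** Any user may set a tenant option; the value is shared by the whole tenant. */
    public void setTenantOption(String user, String key, String value) {
      tenantOptions.get(tenantOf(user)).put(key, value);
    }

    /** Configs are stored per tenant, so every user in that tenant sees them. */
    public void putPluginConfig(String user, String plugin, String config) {
      tenantPlugins.get(tenantOf(user)).put(plugin, config);
    }

    /** Returns a config only if it belongs to the caller's own tenant. */
    public Optional<String> pluginConfig(String user, String plugin) {
      return Optional.ofNullable(tenantPlugins.get(tenantOf(user)).get(plugin));
    }

    /** Lookup order: session value, then tenant value, then system default. */
    public Optional<String> resolveOption(String user, Map<String, String> session, String key) {
      if (session.containsKey(key)) {
        return Optional.of(session.get(key));
      }
      String tenantValue = tenantOptions.get(tenantOf(user)).get(key);
      if (tenantValue != null) {
        return Optional.of(tenantValue);
      }
      return Optional.ofNullable(systemDefaults.get(key));
    }
  }

  public static void main(String[] args) {
    TenantRegistry registry = new TenantRegistry();
    registry.setSystemDefault("store.format", "parquet");
    registry.addTenant(new Tenant("acme", List.of("fred", "wilma")));
    registry.addTenant(new Tenant("globex", List.of("hank")));

    registry.setTenantOption("fred", "store.format", "json");
    // Illustrative config string, not Drill's actual S3 plugin format.
    registry.putPluginConfig("fred", "s3", "{\"bucket\": \"acme-data\"}");

    System.out.println(registry.resolveOption("wilma", Map.of(), "store.format")); // Optional[json]
    System.out.println(registry.resolveOption("hank", Map.of(), "store.format"));  // Optional[parquet]
    System.out.println(registry.pluginConfig("wilma", "s3").isPresent());          // true
    System.out.println(registry.pluginConfig("hank", "s3").isPresent());           // false
  }
}
```

Because state is keyed by tenant rather than by user, the "tenant = user" model in this PR falls out as the special case of one user per tenant, while teams (fred and wilma above) get sharing for free.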
