The issue is that for customers who do have such storage plugin names, it is too late to rename them after an offline upgrade: there is no easy way to access the storage plugin configurations while the Drillbits are down (because Drillbit start-up fails). It might be okay if admins perform a rolling upgrade (newer Drillbits would fail, but older Drillbits could still be used to update the storage plugin config), but that is not fully supported.

Ideally, we would find a way to not fail start-up and instead disable only the plugins that have issues. If that turns out to be a complex and separate task, then for now we should clearly document that this is a breaking change on upgrade, so that users fix their plugin names before they proceed.
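To make the "fix the plugins before they proceed" step concrete, here is a minimal sketch of a pre-upgrade check. It assumes the admin has already collected the storage plugin names into a list by whatever means is available; the names below are hypothetical, and using `casefold()` for the comparison is my assumption, not necessarily what Drill itself will do.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that are equal when compared case-insensitively.

    Returns a dict mapping the case-folded name to the sorted list of
    original spellings, keeping only groups with more than one spelling.
    """
    groups = defaultdict(set)
    for name in names:
        # casefold() is Python's caseless-matching form of lower();
        # this is an assumed stand-in for Drill's own comparison.
        groups[name.casefold()].add(name)
    return {
        folded: sorted(spellings)
        for folded, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical plugin names, as an admin might list them before upgrading.
plugin_names = ["dfs", "DFS", "cp", "hive", "Hive", "mongo"]
collisions = find_case_collisions(plugin_names)

for folded, spellings in sorted(collisions.items()):
    print(f"rename needed for '{folded}': {', '.join(spellings)}")
```

Any group that is reported would have to be renamed while the old Drillbits are still up. The same check could be run per plugin over workspace names (tmp vs TMP).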
On Wed, Jun 13, 2018 at 3:42 AM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

> From the Drill code workspaces are already case insensitive (though the
> documentation states the opposite). Since there were no complaints from the
> users so far, I believe there are not many (if any) who uses the same names
> in different case.
> Regarding those users that already have duplicating storage plugins names,
> after the change Drill start up will fail with appropriate error message
> and they would have to rename those storage plugins.
>
> Kind regards,
> Arina
>
>
> On Tue, Jun 12, 2018 at 8:45 PM Abhishek Girish <agir...@apache.org>
> wrote:
>
> > Paul, I think this proposal was specific to storage plugin and workspace
> > *names*. And not for the whole of Drill.
> >
> > I agree it makes sense to have these names case insensitive, to improve
> > user experience. The only impact to current users I can think of is if
> > someone created two storage plugins dfs and DFS. Or configured workspaces
> > tmp and TMP. In this case, they'd need to rename those. One thing I'm not
> > clear on is how we'll handle upgrades in these cases.
> >
> > On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers <par0...@yahoo.com.invalid>
> > wrote:
> >
> > > Hi All,
> > >
> > > As it turns out, this topic has been discussed, in depth, previously.
> > > Can't recall if it was on this list, or in a JIRA.
> > >
> > > We face a number of constraints:
> > >
> > > * As was noted, for some data sources, the data source itself has case
> > > insensitive names. (Windows file systems, RDBMSs, etc.)
> > > * In other cases, the data source itself has case sensitive names. (HDFS
> > > file system, Linux file systems, JSON, etc.)
> > > * SQL is defined to be case insensitive.
> > > * We now have several years of user queries, in production, based on the
> > > current semantics.
> > >
> > > Given all this, it is very likely that simply shifting to case-sensitive
> > > will break existing applications.
> > >
> > > Perhaps a more subtle solution is to make the case-sensitivity a property
> > > of the symbol that is carried through the query pipeline as another piece
> > > of metadata.
> > >
> > > Thus, a workspace that corresponds to a DB schema would be labeled as case
> > > insensitive. A workspace that corresponds to an HDFS directory would be
> > > case sensitive. Names defined within Drill (as part of an AS clause), would
> > > follow SQL rules and be case insensitive.
> > >
> > > I believe that, if we sit down and work out exactly what users would
> > > expect, and what is required to handle both case sensitive and case
> > > insensitive names, we'll end up with a solution not far from the above --
> > > out of simple necessity.
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> > > arina.yelchiy...@gmail.com> wrote:
> > >
> > > To make it clear we have three notions here: storage plugin name, workspace
> > > (schema) and table name (dfs.root.`/tmp/t`).
> > > My suggestion is the following:
> > > Storage plugin names to be case insensitive (DFS vs dfs, INFORMATION_SCHEMA
> > > vs information_schema).
> > > Workspace (schemas) names to be case insensitive (ROOT vs root, TMP vs
> > > tmp). Even if user has two directories /TMP and /tmp, he can create two
> > > workspaces but not both with tmp name. For example, tmp vs tmp_u.
> > > Table names case sensitivity are treated per plugin. For example, system
> > > plugins (information_schema, sys) table names (views, tables) should be
> > > case insensitive. Actually, currently for sys plugin table names are case
> > > insensitive, information_schema table names are case sensitive. That needs
> > > to be synchronized. For file system plugins table names must be case
> > > sensitive, since under table name we imply directory / file name and their
> > > case sensitivity depends on file system.
> > >
> > > Kind regards,
> > > Arina
> > >
> > > On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha <amansi...@gmail.com> wrote:
> > >
> > > > Drill is dependent on the underlying file system's case sensitivity. On
> > > > HDFS one can create 'hadoop fs -mkdir /tmp/TPCH' and /tmp/tpch which are
> > > > separate directories.
> > > > These could be set as workspace in Drill's storage plugin configuration and
> > > > we would want the ability to query both. If we change the current
> > > > behavior, we would want some way, either using back-quotes ` or other way
> > > > to support that.
> > > >
> > > > RDBMSs seem to have vendor-specific behavior...
> > > > In MySQL [1] the database name and schema name are case-sensitive on Linux
> > > > and case-insensitive on Windows. Whereas in Postgres it converts the
> > > > database name and schema name to lower-case by default but one can put
> > > > double-quotes to make it case-sensitive [2].
> > > >
> > > > [1] https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
> > > > [2] http://www.postgresqlforbeginners.com/2010/11/gotcha-case-sensitivity.html
> > > >
> > > >
> > > > On Tue, Jun 12, 2018 at 5:01 AM, Arina Yelchiyeva <
> > > > arina.yelchiy...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Currently Drill we treat storage plugin names and workspaces as
> > > > > case-sensitive [1].
> > > > > Names for storage plugins and workspaces are defined by the user. So we
> > > > > allow to create plugin -> DFS and dfs, workspace -> tmp and TMP.
> > > > > I have a suggestion to move to case insensitive approach and won't allow
> > > > > creating two plugins / workspaces with the same name in different case at
> > > > > least for the following reasons:
> > > > > 1. usually rdbms schema and table names are case insensitive and many users
> > > > > are used to this approach;
> > > > > 2. in Drill we have INFORMATION_SCHEMA schema which is in upper case, sys
> > > > > in lower case.
> > > > > personally I find it's extremely inconvenient.
> > > > >
> > > > > Also we should consider making table names case insensitive for system
> > > > > schemas (info, sys).
> > > > >
> > > > > Any thoughts?
> > > > >
> > > > > [1] https://drill.apache.org/docs/lexical-structure/
> > > > >
> > > > >
> > > > > Kind regards,
> > > > > Arina