Based on the discussion in the hangout, I wanted to start a thread about
Drop table support.
A couple of high-level points about what we plan to support:
1. In the first iteration, Drop table will only support dropping tables in
the file system, not tables in Hive, HBase or other storage plugins.
2. Since Drop table is potentially "risky", we want to be pessimistic about
dropping tables.
There are two broad scenarios when dealing with Drop table: security
enabled and security disabled. In both cases we would like to follow the
workflow below:
1. Check if the table being dropped can be consumed by Drill.
* That is, do all the files in the directory conform to a format that
Drill can read (Parquet, JSON, CSV, etc.)? Jacques pointed out that there
is a bug in the current logic: if even one of the files in the directory
conforms to a format that Drill can read, we create a DrillTable, and only
error out later when we encounter the other files we cannot read.
* In the worst case this check can entail walking the entire file
system, if a user issues a drop table command on the root of the file
system. But it is more likely that we will soon encounter a file that Drill
cannot read and abort the drop with an error.
* Another minor clarification: we consider a directory consumable by
Drill only if its files are all in a single (homogeneous) format that Drill
can read. For example, we should fail if a user tries to drop a directory
that contains both JSON and Parquet files.
2. Once we have confirmed that the table requested to be dropped contains
homogeneous files that Drill can read, we check the file permissions.
* If security is enabled, we impersonate the user issuing the command
and drop the directory (this succeeds only if the file system allows it and
the user has the correct permissions).
* If security is not enabled, we only drop the directory if all the
files are owned by the user the Drillbit is running as (again, being
pessimistic about the drop). We should collect this ownership information
in the same pass that checks for homogeneous files.
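A minimal sketch of the two-step workflow above, using java.nio.file on a local directory as a stand-in for Drill's file system plugin (in Drill itself this would presumably go through Hadoop's FileSystem API, e.g. FileStatus.getOwner()). All class and method names, the extension-only format detection, the READABLE set, and the choice to refuse an empty directory are hypothetical assumptions for illustration. The "security enabled" branch only returns a decision; it does not actually impersonate anyone. The sketch does follow the suggestion of collecting owners in the same pass as the homogeneity check, and it aborts on the first unreadable or mismatched file.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not Drill's actual Drop table implementation.
class DropTableCheckSketch {
    // Assumption: formats are identified by file extension only.
    static final Set<String> READABLE = Set.of("parquet", "json", "csv");

    enum Decision { IMPERSONATE_AND_DROP, DROP_AS_DRILLBIT, REFUSE }

    /** Result of one pass over the directory: the shared format and every owner seen. */
    static final class ScanResult {
        String format; // null until the first file is seen
        final Set<String> owners = new HashSet<>();
    }

    /** Step 1, per file: reject unreadable files and format mismatches, record the owner. */
    static void accept(ScanResult scan, String fileName, String owner) {
        String ext = extensionOf(fileName);
        if (!READABLE.contains(ext)) {
            throw new IllegalStateException("Drill cannot read " + fileName + "; refusing to drop");
        }
        if (scan.format == null) {
            scan.format = ext;
        } else if (!scan.format.equals(ext)) {
            throw new IllegalStateException(
                "Mixed formats " + scan.format + " and " + ext + "; refusing to drop");
        }
        scan.owners.add(owner);
    }

    /** Walks the tree once; the first exception escapes walkFileTree, so bad trees fail fast. */
    static ScanResult scan(Path root) throws IOException {
        ScanResult result = new ScanResult();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                if (attrs.isRegularFile()) {
                    accept(result, file.getFileName().toString(), Files.getOwner(file).getName());
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    /** Step 2: with security on, defer to the file system; otherwise require Drillbit ownership. */
    static Decision decide(boolean securityEnabled, ScanResult scan, String drillbitUser) {
        if (scan.format == null) {
            // Assumption: a tree with no readable files is not treated as a Drill table.
            return Decision.REFUSE;
        }
        if (securityEnabled) {
            // The impersonated user's permissions decide whether the delete succeeds.
            return Decision.IMPERSONATE_AND_DROP;
        }
        boolean allOwned = scan.owners.stream().allMatch(drillbitUser::equals);
        return allOwned ? Decision.DROP_AS_DRILLBIT : Decision.REFUSE;
    }

    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Separating the per-file `accept` step from the directory walk keeps the pessimistic rules testable without touching a real file system.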
Open Questions:
Views: How do we handle views that were created on top of the dropped
table? Here are a couple of scenarios we might want to explore:
* Views are treated as a separate entity. It is useful for the user to
keep the view definition in place: if the dropped table is later replaced
with a new set of files with the exact same schema, the existing view
definition still suffices. AFAIK, Oracle and SQL Server follow this model
and don't drop views when the base table is dropped.
* Once the table is dropped, the view definition is no longer needed
and should therefore be dropped automatically. We can probably punt on this
until we have dotdrill files, which would let us record which views are
defined on a table and drop those views implicitly. But given that some
popular databases don't do this, we might want to conform to their
behavior instead.
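To make the two options concrete, here is a hypothetical sketch of the bookkeeping that dotdrill files might enable: a reverse map from table path to the views defined on it, with both candidate behaviors behind a policy flag. None of these names exist in Drill; the in-memory maps merely stand in for whatever dotdrill files would actually persist.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of view tracking; not an existing Drill API.
class ViewDependencySketch {
    enum ViewPolicy { KEEP_VIEWS, DROP_VIEWS }

    // Stand-in for dotdrill metadata: table path -> views defined on that table.
    private final Map<String, Set<String>> viewsByTable = new HashMap<>();
    private final Set<String> views = new HashSet<>();

    void createView(String view, String baseTable) {
        views.add(view);
        viewsByTable.computeIfAbsent(baseTable, t -> new HashSet<>()).add(view);
    }

    /** Returns the views removed as a side effect of dropping the table. */
    Set<String> onDropTable(String table, ViewPolicy policy) {
        if (policy == ViewPolicy.KEEP_VIEWS) {
            // Oracle / SQL Server style: views survive and work again if the table is re-created.
            return Collections.emptySet();
        }
        // Copy before mutating so the returned set is unaffected by the cleanup below.
        Set<String> dependents = new HashSet<>(viewsByTable.getOrDefault(table, Collections.emptySet()));
        views.removeAll(dependents);
        viewsByTable.remove(table);
        // A dropped view may also be listed under other base tables (e.g. a join); clean those too.
        viewsByTable.values().forEach(s -> s.removeAll(dependents));
        return dependents;
    }

    boolean viewExists(String view) {
        return views.contains(view);
    }
}
```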
Thanks
Mehant