Answers inline.

On 7/30/15 4:56 PM, Neeraja Rentachintala wrote:
Few questions/comments inline.

On Thu, Jul 30, 2015 at 2:53 PM, mehant baid <[email protected]> wrote:

  Based on the discussion in the hangout I wanted to start a thread around
Drop table support.

Couple of high level points about what is planned to be supported

1. In the first iteration Drop table will only support dropping tables in
the file system and not dropping tables in Hive/HBase or other storage
plugins.
2. Since Drop table is potentially "risky" we want to be pessimistic about
dropping tables.

There are two broad scenarios while dealing with Drop table - Security
enabled and Security Disabled. In both cases we would like to follow the
below workflow

1. Check if the table being dropped can be consumed by Drill.

[Neeraja] I am assuming if security is enabled, this is done with the
impersonated user identity. Is this accurate?
/This is orthogonal to security/file permissions. We want to make sure the directory we are dropping contains only homogenous file formats that Drill can read (e.g., only .parquet, only .json, etc.)./
     * Meaning, do all the files in the directories conform to a format that
Drill can read (parquet, json, csv, etc.)? Jacques pointed out that there is
a bug in this logic: if one of the files in the directory conforms to a
format that Drill can read, we create a DrillTable and error out if we
encounter other files we cannot read.

[Neeraja] What does it mean to create DrillTable here?
/I leaked a bit of existing implementation detail here. The point I was trying to make was that the check for homogenous files in a directory applies to both select and drop./

     * The above point can in the worst case entail reading the entire file
system, if a user issues a drop table command on the root of the file
system. But it's more likely that we will soon encounter a file that Drill
cannot read and abort the Drop with an error.
     * Another minor clarification is we consider only those directories to
be consumable by Drill if they contain file formats that are homogenous and
can be read by Drill. For example, we should fail if a user is trying to delete
a directory that contains both JSON and Parquet files.
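
A minimal sketch of the step 1 check described above, not Drill's actual
implementation. It assumes a local file system, assumes a file's format can
be identified by its extension alone, and treats an empty directory as an
error; the names READABLE_EXTENSIONS, DropTableError and find_table_format
are hypothetical.

```python
import os

# Assumption: formats are identified purely by extension; this set is illustrative.
READABLE_EXTENSIONS = {".parquet", ".json", ".csv"}


class DropTableError(Exception):
    """Hypothetical error: the directory does not qualify as a droppable table."""


def find_table_format(root, readable=READABLE_EXTENSIONS):
    """Return the one extension shared by every file under root.

    Aborts on the first unreadable or mismatched file, so a drop issued
    on a large tree usually fails long before the whole tree is read.
    """
    seen = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if ext not in readable:
                raise DropTableError(f"unreadable file: {path}")
            if seen is None:
                seen = ext
            elif ext != seen:
                raise DropTableError(f"mixed formats {seen} and {ext}: {path}")
    if seen is None:
        # Assumption: a directory with no files is not treated as a table.
        raise DropTableError(f"no files found under {root}")
    return seen
```

Raising on the first offending file is what keeps the worst case (a drop on
the root of the file system) from being the common case.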

2. Once we have confirmed that the table requested to be dropped contains
homogenous files which can be read by Drill, we delve into the file
permissions.
     * If security is enabled, we impersonate the user issuing the command
and drop the directory (succeeds if FS allows and user has correct
permissions).
     * If security is not enabled, we only drop the directory if all the
files are owned by the user Drillbit is running as (being pessimistic about
drop). We should collect this information when checking for homogenous
files.

[Neeraja] Why do we need this check? How is this different from the
impersonated user scenario?
/The above check applies when security is not enabled, meaning we are executing as the Drill user. If we are running as the Drill user (which might be root or a superuser), it's likely that this user has permission to delete most files, so checking permissions alone might not suffice. So when security isn't enabled, the proposal is to delete only those files that are owned (created) by the Drill user./
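
A minimal sketch of the security-disabled branch described above, assuming a
local POSIX file system as a stand-in for the real storage layer. The helper
names all_files_owned_by and drop_if_owned are hypothetical, and only file
ownership is checked (not directory ownership), following the wording of the
proposal.

```python
import os
import shutil


def all_files_owned_by(root, uid):
    """Return True only if every file under root is owned by uid."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.stat(os.path.join(dirpath, name)).st_uid != uid:
                return False
    return True


def drop_if_owned(root, process_uid=None):
    """Delete root only when the running process owns every file in it.

    Being able to delete is not enough: a process running as root or a
    superuser can delete almost anything, so ownership is the stricter test.
    """
    if process_uid is None:
        process_uid = os.geteuid()  # the user this process is running as
    if not all_files_owned_by(root, process_uid):
        raise PermissionError(f"refusing to drop {root}: not all files are owned by uid {process_uid}")
    shutil.rmtree(root)
```

In practice the ownership information could be gathered during the same walk
that checks for homogenous files, as the proposal suggests, instead of in a
second pass.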

Open Questions:

Views: How do we handle views that were created on top of the dropped
table? Following are a couple of scenarios we might want to explore:
     * Views are treated as a different entity, and it's useful for the user
to have a view definition still in place, as the dropped table will be
replaced with a new set of files with the exact schema and the existing view
definition suffices. AFAIK, Oracle and SQL Server have this model and don't
drop the views if the base table is dropped.
     * Once the table is dropped, the view definition is no longer needed
and hence should be dropped automatically. We can probably punt on this
till we have dotdrill files. With dotdrill files we can maintain some
information to indicate the views on this table and can drop the views
implicitly. But given that some of the popular databases don't do this, we
might want to conform to the standard behavior.

[Neeraja] Agree with the recommendation here. It seems we can go with a
simpler approach here, i.e., treat views as a different entity.

Also, will there be any mechanism to recover once you accidentally drop a table?

Thanks
Mehant

