I think there has been reasonable agreement as to what is to be
supported in the first iteration of this feature. I have summarized the
decisions made on this list in a document. If you have any more
suggestions, please get them in by today.
https://docs.google.com/document/d/1XFdNMXnCZ4cLFcg1gHRutBo_hZ9WzvCuKx3Fithd4-k/edit?usp=sharing
Thanks
Mehant
On 8/5/15 1:49 PM, Neeraja Rentachintala wrote:
Another question/comment.
Does Drill need to manage concurrency for DROP TABLE, i.e., how do you
deal with users trying to read the data while somebody is dropping it? Does
it need to implement some kind of locking?
I have some thoughts on that but would like to know what others think. Drill
is not (yet) a transactional system but rather an interactive query layer on
a variety of stores. The two most common use cases I can think of in this
context are a user doing analytics/exploration who, as part of it, creates
some intermediate tables, inserts data into them, and drops the tables, and
BI tools generating these intermediate tables for processing queries.
Neither of these has the concurrency issue.
Additionally, given that the data is externally managed, there could always
be other processes adding and deleting files, and Drill doesn't even have
control over them.
Overall, I think it is OK for the first phase of the DROP implementation not
to have these locking/concurrency checks.
Thoughts?
-Neeraja
On Wed, Aug 5, 2015 at 11:54 AM, Mehant Baid <[email protected]> wrote:
What you are suggesting makes sense in the case when security is enabled.
So when Drill is accessing the file system it will impersonate the user who
issued the command and drop will happen if the user has sufficient
permissions.
However, when security isn't enabled, Drill will be accessing the file
system as the Drill user itself, which is most likely a super user who has
permissions to delete most files. To prevent any catastrophic drops,
checking for homogenous file formats makes sure that at least the directory
being dropped is something that can be read by Drill. This will prevent
accidental drops (like dropping the home directory, because it's likely
to have file formats that cannot be read by Drill). This will not protect
against malicious behavior (to handle that, security should be enabled).
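
For illustration only, the homogeneity check described above could be sketched
roughly as follows. This is not code from Drill or from the design document:
the function name, the extension-based test, and the READABLE_EXTENSIONS set
are all hypothetical, and a local directory stands in for whatever file system
Drill is actually pointed at.

```python
import os

# Hypothetical set of extensions this sketch treats as "readable by Drill".
READABLE_EXTENSIONS = {".parquet", ".json", ".csv"}


def is_safe_to_drop(directory, readable=READABLE_EXTENSIONS):
    """Return True only if every file under `directory` has the same,
    readable extension. An empty directory is treated as not safe."""
    seen = set()
    for _root, _dirs, files in os.walk(directory):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext not in readable:
                return False  # unknown format: stop at the first offender
            seen.add(ext)
            if len(seen) > 1:
                return False  # mixed formats: not homogenous
    return len(seen) == 1
```

Under these assumptions, a directory holding only .parquet files passes, while
a home directory with assorted file types fails on the first unrecognized file.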
Thanks
Mehant
On 8/5/15 11:43 AM, Ted Dunning wrote:
Is any check really necessary?
Can't we just say that for file-like data sources, drop is a rough synonym
for rm? If you have permission to remove files and
directories, you can do it. If you don't, it will fail, possibly half
done. I have never seen a bug filed against rm to add more elaborate
semantics, so why is it so necessary for Drill to have elaborate semantics
here?
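
As a minimal sketch of the "drop is roughly rm" semantics suggested above
(an illustration on a local file system, not Drill code; drop_table is a
hypothetical name):

```python
import shutil


def drop_table(path):
    """Remove the table directory the way `rm -r` would, with no extra checks.

    shutil.rmtree raises OSError (for example PermissionError) at the first
    removal that fails, so anything deleted before that point stays deleted:
    the drop can end up half done.
    """
    shutil.rmtree(path)
```

The permission model here is whatever the underlying file system enforces for
the user running the process.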
On Wed, Aug 5, 2015 at 11:09 AM, Ramana I N <[email protected]> wrote:
The homogenous check: will it just check that the types are homogenous, or
that they are actually types that can be read by Drill?
Also, is there a good way to determine if a file can be read by Drill? And
will there be a perf hit if there are a large number of files?
Regards
Ramana
On Wed, Aug 5, 2015 at 11:03 AM, Mehant Baid <[email protected]>
wrote:
I agree, it is definitely restrictive. We can lift the restriction that a
table can be dropped (when security is off) only if the Drill user owns it.
I think the check for homogenous files should give us enough confidence that
we are not deleting a non-Drill directory.
Thanks
Mehant
On 8/4/15 10:00 PM, Neeraja Rentachintala wrote:
Ted, that's a fair point on the recovery part.
Regarding the other point by Mehant (copied below), there is an implication
that a user can drop only Drill-managed tables (i.e., created as the Drill
user) when security is not enabled. I think this check is too restrictive
(also unintuitive). Drill doesn't have the concept of external/managed
tables, and a user (the impersonated user if security is enabled, or the
Drillbit service user if no security is enabled) should be able to drop the
table if they have permissions to do so. The above design proposes a check
to verify that the files that need to be deleted are readable by Drill, and
I believe that is a good validation to have.
/The above check is in the case when security is not enabled, meaning we
are executing as the Drill user. If we are running as the Drill user (which
might be root or a super user) it's likely that this user has permissions
to delete most files and checking for permissions might not suffice. So
when security isn't enabled the proposal is to delete only those files that
are owned (created) by the Drill user./
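
For illustration, the ownership rule quoted above could be sketched like this
on a local POSIX file system. This is an assumption-laden example, not Drill
code: all_owned_by is a hypothetical helper, and Drill itself would go through
its file system layer rather than os.lstat.

```python
import os


def all_owned_by(directory, uid):
    """Return True if `directory` and everything beneath it is owned by `uid`."""
    if os.lstat(directory).st_uid != uid:
        return False
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            # lstat so that symlinks are judged by their own owner
            if os.lstat(os.path.join(root, name)).st_uid != uid:
                return False
    return True
```

A drop would then proceed only if all_owned_by(table_dir, os.geteuid()) holds
for the user the process is running as.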
On Fri, Jul 31, 2015 at 12:09 AM, Ted Dunning <[email protected]>
wrote:
On Thu, Jul 30, 2015 at 4:56 PM, Neeraja Rentachintala <
[email protected]> wrote:
Also, will there be any mechanism to recover once you accidentally drop?
Yes. Snapshots <https://www.mapr.com/resources/videos/mapr-snapshots>.
Seriously, recovery of data due to user error is a platform thing. How can
we recover from turning off the cluster? From removing a disk on an Oracle
node?
I don't think that this is Drill's business.