Hi Vinayakumar,

The staging tables are dynamic. From the Hadoop security team's perspective,
it is unrealistic to force every data writer to set default ACLs on them,
because there are so many writers and they write in different ways.

Rename is just one scenario; there are others. For example, when a
permission changes, we have to apply that change to every file today. With
that flag, we would only change the table or partition directories.
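
As a rough illustration of both scenarios, here is a toy in-memory model. It
is not HDFS code: the Inode class, the inherit_flag field, and the choice to
resolve against the nearest flagged ancestor are all assumptions made for the
sketch. It shows a renamed file missing the default ACL, and a single
directory-level change covering every file once the flag is set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Inode:
    """Hypothetical inode; real HDFS inodes look nothing like this."""
    name: str
    is_dir: bool = False
    acl: Set[str] = field(default_factory=set)   # groups granted access
    default_acl: Optional[Set[str]] = None        # applied to children at create time
    inherit_flag: bool = False                    # assumed flag from the proposal
    parent: Optional["Inode"] = None
    children: Dict[str, "Inode"] = field(default_factory=dict)


def create(parent: Inode, name: str, is_dir: bool = False) -> Inode:
    """Create a child; the parent's default ACL is copied only here."""
    child = Inode(name, is_dir=is_dir, parent=parent)
    if parent.default_acl is not None:
        child.acl = set(parent.default_acl)
        if is_dir:
            child.default_acl = set(parent.default_acl)
    parent.children[name] = child
    return child


def rename(node: Inode, new_parent: Inode) -> None:
    """Move a node; its ACL is left untouched, as with rename today."""
    del node.parent.children[node.name]
    new_parent.children[node.name] = node
    node.parent = new_parent


def effective_acl(node: Inode) -> Set[str]:
    """Use the nearest flagged ancestor's ACL, else the node's own ACL."""
    cur = node.parent
    while cur is not None:
        if cur.inherit_flag:
            return cur.acl
        cur = cur.parent
    return node.acl


root = Inode("/", is_dir=True)
table = create(root, "table", is_dir=True)
table.acl = {"analysts"}
table.default_acl = {"analysts"}
staging = create(root, "staging", is_dir=True)   # no default ACL set here

part = create(staging, "part-0")
rename(part, table)
acl_after_rename = set(effective_acl(part))      # default ACL was not applied

table.inherit_flag = True
acl_with_flag = set(effective_acl(part))         # now resolved from the table

table.acl = {"analysts", "auditors"}             # one change on the directory
acl_after_change = set(effective_acl(part))
```

In this sketch the renamed file ends up with an empty ACL until the flag is
set, after which one update on the table directory is visible on every file
below it.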

Xinli


On Sat, Oct 17, 2020 at 12:14 AM Vinayakumar B <vinayakum...@apache.org>
wrote:

> IIUC, Hive renames are from Hive's staging directory, used during the
> write, to the final destination within the table.
>
> Why not set the default ACLs of the staging directory to whatever is
> expected, and then continue writing the remaining files?
>
> This way, even after the rename, you will have the expected ACLs on the
> final files.
>
> Setting default ACLs on the staging directory can be done with a single RPC.
>
> -Vinay
>
> On Sat, 17 Oct 2020 at 8:08 AM, Xinli shang <sha...@uber.com.invalid>
> wrote:
>
> > Thanks Owen for your reply! As mentioned in the Jira, default ACLs don't
> > apply to rename. Any idea how rename can work without setting ACLs per
> > file?
> >
> > On Fri, Oct 16, 2020 at 7:25 PM Owen O'Malley <owen.omal...@gmail.com>
> > wrote:
> >
> > > I'm very -1 on adding these semantics.
> > >
> > > When you create the table's directory, set the default ACL. That will
> > > have exactly the effect that you are looking for without creating
> > > additional semantics.
> > >
> > > .. Owen
> > >
> > > On Fri, Oct 16, 2020 at 7:02 PM Xinli shang <sha...@uber.com.invalid>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I opened https://issues.apache.org/jira/browse/HDFS-15638 and want to
> > > > collect feedback from the community. I know that changing a permission
> > > > model that follows the POSIX model is never a trivial change, so please
> > > > comment if you have concerns. For reading convenience, here is a copy
> > > > of the ticket.
> > > >
> > > > *Problem*: Currently, when a user tries to access a file, he/she needs
> > > > permissions on its parent and ancestors as well as on the file itself.
> > > > This is correct in general, but for Hive table directories/files, all
> > > > the files under a partition or even a table usually have the same
> > > > permissions for the same set of ACL groups. Although the permissions
> > > > and ACL groups are the same, the writer still needs to call setfacl()
> > > > for every file to add LDAP groups. This results in a huge number of RPC
> > > > calls to the NN. HDFS has default ACLs to solve that, but they only
> > > > apply to create and copy, not to rename. However, in Hive ETL, rename
> > > > is very common.
> > > >
> > > > *Proposal*: Add a 1-bit flag to directory inodes to indicate whether or
> > > > not it is a Hive table directory. If that flag is set, then all the
> > > > sub-directories and files under it will just use its permission and ACL
> > > > group settings. This way, Hive ETL doesn't need to set permissions at
> > > > the file level. If that flag is not set (the default), everything works
> > > > as before. Setting or unsetting that flag would require admin
> > > > privilege.
> > > >
> > > > --
> > > > Xinli Shang
> > > >
> > >
> >
> >
> > --
> > Xinli Shang
> >
> --
> -Vinay
>
