The "Hive/AuthDev" page has been changed by HeYongqiang.
http://wiki.apache.org/hadoop/Hive/AuthDev

--------------------------------------------------

New page:
= 1. Privilege =
== 1.1 Access Privilege ==
There are four levels of access privilege: admin privilege, DB privilege, table-level privilege, and column-level privilege.

1.1.1 Admin privileges are global privileges and are used to perform administration.

1.1.2 DB privileges are database specific and apply to all objects inside that database.

1.1.3 Table privileges apply to a table/view/index in a given database.

1.1.4 Column privileges apply at the column level.

All DB/table/column privileges differentiate read and write privileges, even though Hive does not currently support column-level overwrite. There is no partition-level privilege.

= 2. Hive Operations =
create index/drop index
create database/drop database
create table/drop table
create view/drop view
alter table
show databases
lock table/unlock table/show lock
add partition
archive
select
insert overwrite directory
insert overwrite table
Others include "create table as", "create table like", etc.

= 3. Metadata =
Store the privilege information in new metastore tables: 'user', 'db', ['host'], 'tables_priv', 'columns_priv'.

The user table indicates a user's global privileges, which apply to all databases.

The db table determines database-level access privileges, which apply to all objects inside that database.

The host table is used to constrain the host names from which the privileges are granted to the given user. [I am not sure if we need to have this table.]

== 3.1 user, group, and roles ==
A user can belong to several groups. The group information is provided by the authenticator. Each user or group can have roles. A role can be a member of another role, but role membership must not be circular.
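The rule that roles may nest but must never form a cycle could be enforced at the moment one role is added as a member of another. The following is a minimal sketch, assuming role membership is held in an in-memory dictionary; the names `member_of`, `would_create_cycle`, and `add_role_to_role` are hypothetical and are not part of Hive.

```python
from collections import defaultdict

# Hypothetical in-memory view of role nesting: role -> roles it is a member of.
member_of = defaultdict(set)


def would_create_cycle(role, parent):
    """Return True if making `role` a member of `parent` closes a cycle."""
    # A cycle appears if `parent` already reaches `role` through memberships.
    stack, seen = [parent], set()
    while stack:
        current = stack.pop()
        if current == role:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(member_of[current])
    return False


def add_role_to_role(role, parent):
    """Record that `role` is a member of `parent`, rejecting circular nesting."""
    if would_create_cycle(role, parent):
        raise ValueError(f"adding {role} to {parent} would create a cycle")
    member_of[role].add(parent)


add_role_to_role("analyst", "reader")   # analyst is a member of reader
add_role_to_role("admin", "analyst")    # admin is a member of analyst
```

With these two memberships in place, trying to make `reader` a member of `admin` would be rejected, because `admin` already reaches `reader` through `analyst`.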
So the Hive metadata needs to store:
 1) roles
 2) Hive user/group -> role mapping

=== 3.1.1 Role management ===
create role
drop role
add a user to a role
remove a user from a role

=== 3.1.2 role metadata ===
role_name - string
create_time - int

=== 3.1.3 hive role user membership table ===
role_name - string
user_name - string
is_group -- whether the user name is a group name
is_role -- whether the user name is a role name

== 3.2 Privileges to be supported by Hive ==
=== 3.2.1 metadata ===
The following shows how we store the grant information in the metastore. The deny information is stored in the same manner (just in different tables), so for each grant table there is also a deny table. The metastore tables are user, deny_user, db, deny_db, tables_priv, deny_tables_priv, columns_priv, deny_columns_priv.

Another way to do it is to add a column to the grant table that records whether the row is a grant or a deny.

We store privileges in one column and use commas to separate different privileges.

hive> desc user;
Field
- - - -
User
isRole
isGroup
isSuper
db_priv -- set (Select_priv, Insert_priv, Create_priv, Drop_priv, Reload_priv, Grant_priv, Index_priv, Alter_priv, Show_db_priv, Lock_tables_priv, Create_view_priv, Show_view_priv)

hive> desc db;
Field
- - - -
Db
User
isRole
isGroup
Table_priv -- set (Select_priv, Insert_priv, Create_priv, Drop_priv, Grant_priv, Index_priv, Reload_priv, Alter_priv, Create_tmp_table_priv, Lock_tables_priv, Create_view_priv, Show_view_priv)

hive> desc tables_priv;
Field
- - - -
Db
User
isRole
isGroup
Table_name
Grantor
Timestamp
Table_priv -- set('Select','Insert','Create','Drop','Grant','Index','Alter','Create View','Show view')
Column_priv -- set('Select','Insert')

hive> desc columns_priv;
Field
- - - -
Db
User
isRole
isGroup
Table_name
Column_name
Timestamp
Column_priv -- set('Select','Insert','Update')

= 4. grant/revoke access privilege =
== 4.1 Privilege names/types: ==
ALL Privileges
Alter
Create
Create temporary tables
Create view
Delete
Drop
Index
Insert
Lock Tables
Select
Show databases
Show view
Super
Update

== 4.2 show grant ==
== 4.3 grant/revoke statement ==
GRANT
    priv_type [(column_list)]
      [, priv_type [(column_list)]] ...
    ON [object_type] priv_level
    TO user [, user] ...
    WITH ADMIN OPTION

object_type:
    TABLE

priv_level:
    *
  | *.*
  | db_name.*
  | db_name.tbl_name
  | tbl_name

REVOKE
    priv_type [(column_list)]
      [, priv_type [(column_list)]] ...
    ON [object_type] priv_level
    FROM user [, user] ...

REVOKE ALL PRIVILEGES, GRANT OPTION
    FROM user [, user] ...

DENY
    priv_type [(column_list)]
      [, priv_type [(column_list)]] ...
    ON [object_type] priv_level
    FROM user [, user] ...

= 5. Authorization verification =
== 5.1 USER/GROUP/ROLE ==
USER
GROUP
ROLE

A GROUP is very similar to a role. We support groups because we may need to pass the group information to HDFS/Map-reduce. A role, however, does not need to be a group. Roles can be nested but not circular.

[ In Oracle, a role groups several privileges and roles so that they can be granted to and revoked from users simultaneously. A role must be enabled for a user before the user can use it. Oracle also has role authorization: creating or dropping a role requires the CREATE ROLE system privilege. ]

== 5.2 The verification steps ==
When a user logs in to the system, he has a user name and one or more groups that he belongs to. He may also have been granted some roles. So the identity is [ username, list of group names, list of roles that have been directly granted to the user, list of roles that have been directly granted to groups that the user belongs to ].

First, try the user name. Start by trying to deny the access, looking up the deny tables by user name:
 1. If there is an entry in 'user' that denies this access, return DENY
 2. If there is an entry in 'db' that denies this access, return DENY
 3. If there is an entry in 'table' that denies this access, return DENY
 4. If there is an entry in 'column' that denies this access, return DENY

If no deny entry matched, go through all privilege levels with the user name:
 5. If there is an entry in 'user' that accepts this access, return ACCEPT
 6. If there is an entry in 'db' that accepts this access, return ACCEPT
 7. If there is an entry in 'table' that accepts this access, return ACCEPT
 8. If there is an entry in 'column' that accepts this access, return ACCEPT

Second, try the user's group/role names one by one until we get an ACCEPT or DENY. If we get a DENY from any one group/role, we DENY this access. For each role/group, we follow the same routine as for the user name.

== 5.3 Examples ==
5.3.1 I want to grant everyone (new people may join at any time) access to db_name.*, and later I want to protect one table db_name.T from ALL users but a few.
 1) Add all users to a group 'users' (assumption: new users will automatically join this group), and grant 'users' ALL privileges on db_name.*
 2) Add those few users to a new group 'users2', AND REMOVE them from 'users'
 3) DENY 'users' on db_name.T
 4) Grant ALL on db_name.T to 'users2'

5.3.2 I want to protect one table db_name.T from one or a few users, but all other people can access it.
 1) Add all users to a group 'users' (assumption: new users will automatically join this group), and grant 'users' ALL privileges on db_name.*
 2) Add those few users to a new group 'users2'. (Note: those few users will now belong to 2 groups: users and users2)
 3) DENY 'users2' on db_name.T

= 6. Where to add authorization in Hive =
CliDriver and HiveServer. Basically they share the same code. If HiveServer invokes CliDriver, we can just add it into CliDriver. We also need to make HiveServer able to support multiple users/connections.

= 7. Implementation =
== 7.1 Authenticator interface ==
We only get the user's user name and group names from the authenticator.
The authenticator implementations need to provide this information. This is the only interface between the authenticator and authorization.

== 7.2 Authorization ==
The authorization decision manager manages a set of authorization providers, and each provider can decide to accept or deny. The decision manager makes the final decision. The decision can be vote based, or deny on a single -1, or accept on a single +1. An authorization provider decides whether to accept or deny an access based on its own information.

------------

= HDFS Permission =
The above makes a STRONG assumption about file-layer security. Users can easily bypass the security if the HDFS file permissions are open to them. We hope to be able to plug in external authorizations (like HDFS permissions) easily to alter the authorization result or even the rule.
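The decision manager from section 7.2 could combine pluggable providers (for example, one backed by the metastore grant tables and one backed by HDFS permissions) along the following lines. This is a sketch only: the `decide` function, the two example providers, and the policy names are hypothetical, and each provider is assumed to return +1 (accept), -1 (deny), or 0 (no opinion).

```python
ACCEPT, ABSTAIN, DENY = 1, 0, -1


def decide(providers, request, policy="one_deny"):
    """Combine provider votes (+1, 0, -1) into a final True/False decision."""
    votes = [provider(request) for provider in providers]
    if policy == "one_deny":
        # A single -1 denies. Assumption: with no +1 at all, access is denied.
        return DENY not in votes and ACCEPT in votes
    if policy == "one_accept":
        # A single +1 accepts, regardless of any -1.
        return ACCEPT in votes
    if policy == "vote":
        # Accepts must outnumber denies.
        return sum(votes) > 0
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical providers; a request here is just a dict with a user name.
def metastore_provider(request):
    return ACCEPT if request["user"] in {"alice", "bob"} else ABSTAIN


def hdfs_provider(request):
    return DENY if request["user"] == "bob" else ABSTAIN


providers = [metastore_provider, hdfs_provider]
print(decide(providers, {"user": "alice"}))                     # True
print(decide(providers, {"user": "bob"}))                       # False
print(decide(providers, {"user": "bob"}, policy="one_accept"))  # True
```

Keeping each provider as a plain callable means an external check can be added to the list without changing the manager; only the policy determines how a conflicting result alters the final decision.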

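The verification steps in section 5.2 can be sketched as follows. This is a minimal illustration, not Hive code: the grant and deny tables are modelled as sets of (principal, level, privilege) tuples that are assumed to be already filtered to the object being accessed, and the names `check_principal` and `authorize` are hypothetical. The sketch checks every group/role and lets any DENY win, which is one reading of the steps and the one that example 5.3.2 relies on.

```python
LEVELS = ("user", "db", "table", "column")


def check_principal(name, privilege, grants, denies):
    """Apply steps 1-8 for one user, group, or role name."""
    # Steps 1-4: any matching deny entry at any level wins.
    for level in LEVELS:
        if (name, level, privilege) in denies:
            return "DENY"
    # Steps 5-8: otherwise look for a grant at any level.
    for level in LEVELS:
        if (name, level, privilege) in grants:
            return "ACCEPT"
    return None  # no entry for this principal


def authorize(user, groups_and_roles, privilege, grants, denies):
    """Try the user name first, then each group/role; any DENY wins."""
    result = check_principal(user, privilege, grants, denies)
    if result is not None:
        return result
    outcomes = [check_principal(name, privilege, grants, denies)
                for name in groups_and_roles]
    if "DENY" in outcomes:
        return "DENY"
    if "ACCEPT" in outcomes:
        return "ACCEPT"
    return "DENY"  # assumption: no matching entry means no access


# Example 5.3.2, for a select on db_name.T:
grants = {("users", "db", "select")}       # 'users' has access to db_name.*
denies = {("users2", "table", "select")}   # 'users2' is denied db_name.T
print(authorize("carol", ["users"], "select", grants, denies))
print(authorize("dave", ["users", "users2"], "select", grants, denies))
```

Here `carol` belongs only to 'users' and is accepted through the database-level grant, while `dave` also belongs to 'users2' and is denied by the table-level deny entry.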