[
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arina Ielchiieva updated DRILL-540:
-----------------------------------
Reviewer: Vitalii Diravka
> Allow querying hive views in Drill
> ----------------------------------
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Hive
> Reporter: Ramana Inukonda Nagaraj
> Assignee: Igor Guzenko
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
> # Drill persists it's views metadata in file with suffix .view.drill using
> json format. For example:
> {noformat}
> {
> "name" : "view_from_calcite_1_4",
> "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
> "fields" : [ {
> "name" : "*",
> "type" : "ANY",
> "isNullable" : true
> } ],
> "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a
> subquery.
> 2. In Apache Hive metadata about views is stored in similar way to
> tables. Below is example from metastore.TBLS :
>
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID
> |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT |
> -------|------------|------|-----------------|------|----------|------|----------|--------------|-------------------------------------------|
> 2 |1542111078 |1 |0 |mapr |0 |2 |cview
> |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
> 3. So in Hive metastore views are considered as tables of special type.
> And main benefit is that we also have expanded SQL definition of views (just
> like in view.drill files). Also reading of the metadata is already
> implemented in Drill with help of thrift Metastore API.
> 4. To enable querying of Hive views we'll reuse existing code for Drill
> views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for
> _*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which
> is actually model for data persisted in .view.drill files_) and then based on
> this instance return new _*DrillViewTable*_. Using this approach drill will
> handle hive views the same way as if it was initially defined in Drill and
> persisted in .view.drill file.
> 5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion
> functionality will be extracted and used for both (table and view) fields
> type conversions.
> *Security implications*
> Consider simple example case where we have users,
> {code:java}
> user0 user1 user2
> \ /
> group12
> {code}
> and sample db where object names contains user or group who should access
> them
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standart and
> Strorage Based authorization. For SQL Standart authorization permissions
> were granted using SQL:
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage based authorization permissions were granted using commands:
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12
> /user/hive/warehouse/db_all.db/tbl_group12{code}
> Then the following table shows us results of queries for both authorization
> models.
> *SQL
> Standart Storage Based Authorization*
> ||SQL||user0||user1||user2|| ||user0||user1||user2||
> |SHOW TABLES IN hive.db_all;| all| all| all| |Accessibe tables + all
> views|Accessibe tables + all views|Accessibe tables + all views|
> |SELECT * FROM hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)|
> (x)| (x)|
> |SELECT * FROM hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)|
> (x)| (x)|
> |SELECT * FROM hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)|
> (/)| (/)|
> |SELECT * FROM hive.db_all.vw_group12;| (x)| (/)| (/)| | (x)|
> (/)| (/)|
> |SELECT * FROM INFORMATION_SCHEMA.`TABLES`
> WHERE TABLE_SCHEMA = 'hive.db_all';| all| all| all| |Accessibe tables +
> all views|Accessibe tables + all views|Accessibe tables + all views|
> |DESCRIBE hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)|
> (x)| (x)|
> |DESCRIBE hive.db_all.vw_user0;| (/)| (x) | (x)| | (/)|
> (/)| (/)|
> |DESCRIBE hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)|
> (/) | (/)|
> |DESCRIBE hive.db_all.vw_group12;| (x)| (/)| (/)| | (/)|
> (/)| (/)|
> (!) *Warning:* Because views in Hive aren't present as physical files and
> access can't be granted using file system commands, then access to Hive views
> for *Storage Based Authorization* is based on the underlying tables used in
> view definition. For current example views were defined as selection over
> appropriate tables.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)