[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813007#comment-16813007 ]
Igor Guzenko commented on DRILL-540: ------------------------------------ Hi [~bbevens], Sounds very good, thank you. > Allow querying hive views in Drill > ---------------------------------- > > Key: DRILL-540 > URL: https://issues.apache.org/jira/browse/DRILL-540 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive > Reporter: Ramana Inukonda Nagaraj > Assignee: Igor Guzenko > Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > Currently hive views cannot be queried from drill. > This Jira aims to add support for Hive views in Drill. > *Implementation details:* > # Drill persists it's views metadata in file with suffix .view.drill using > json format. For example: > {noformat} > { > "name" : "view_from_calcite_1_4", > "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0", > "fields" : [ { > "name" : "*", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "tmp" ] > } > {noformat} > Later Drill parses the metadata and uses it to treat view names in SQL as a > subquery. > 2. In Apache Hive metadata about views is stored in similar way to > tables. Below is example from metastore.TBLS : > > {noformat} > TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID > |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT | > -------|------------|------|-----------------|------|----------|------|----------|--------------|-------------------------------------------| > 2 |1542111078 |1 |0 |mapr |0 |2 |cview > |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` | > {noformat} > 3. So in Hive metastore views are considered as tables of special type. > And main benefit is that we also have expanded SQL definition of views (just > like in view.drill files). Also reading of the metadata is already > implemented in Drill with help of thrift Metastore API. > 4. To enable querying of Hive views we'll reuse existing code for Drill > views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for > _*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which > is actually model for data persisted in .view.drill files_) and then based on > this instance return new _*DrillViewTable*_. Using this approach drill will > handle hive views the same way as if it was initially defined in Drill and > persisted in .view.drill file. > 5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ > we'll reuse existing code from _*DrillHiveTable*_, so the conversion > functionality will be extracted and used for both (table and view) fields > type conversions. > *Security implications* > Consider simple example case where we have users, > {code:java} > user0 user1 user2 > \ / > group12 > {code} > and sample db where object names contains user or group who should access > them > {code:java} > db_all > tbl_user0 > vw_user0 > tbl_group12 > vw_group12 > {code} > There are two Hive authorization modes supported by Drill - SQL Standart and > Strorage Based authorization. For SQL Standart authorization permissions > were granted using SQL: > {code:java} > SET ROLE admin; > GRANT SELECT ON db_all.tbl_user0 TO USER user0; > GRANT SELECT ON db_all.vw_user0 TO USER user0; > CREATE ROLE group12; > GRANT ROLE group12 TO USER user1; > GRANT ROLE group12 TO USER user2; > GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; > GRANT SELECT ON db_all.vw_group12 TO ROLE group12; > {code} > And for Storage based authorization permissions were granted using commands: > {code:java} > hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 > hadoop fs -chown user1:group12 > /user/hive/warehouse/db_all.db/tbl_group12{code} > Then the following table shows us results of queries for both authorization > models. > > *SQL Standart | Storage Based > Authorization* > ||SQL||user0||user1||user2|| ||user0||user1||user2|| > |*Queries executed using Drill :*| | | | | | | | > |SHOW TABLES IN hive.db_all;| all| all| all| |Accessibe tables + all > views|Accessibe tables + all views|Accessibe tables + all views| > |SELECT * FROM hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)| > |SELECT * FROM hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)| > |SELECT * FROM hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| > (/)| (/)| > |SELECT * FROM hive.db_all.vw_group12;| (x)| (/)| (/)| | (x)| > (/)| (/)| > |SELECT * FROM INFORMATION_SCHEMA.`TABLES` > WHERE TABLE_SCHEMA = 'hive.db_all';| all| all| all| |Accessibe tables + > all views|Accessibe tables + all views|Accessibe tables + all views| > |DESCRIBE hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)| > |DESCRIBE hive.db_all.vw_user0;| (/)| (x) | (x)| | (/)| > (/)| (/)| > |DESCRIBE hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| > (/) | (/)| > |DESCRIBE hive.db_all.vw_group12;| (x)| (/)| (/)| | (/)| > (/)| (/)| > |*Queries executed using Beeline and Hive CLI:*| | | | | | | | > |SHOW TABLES IN db_all;| all| all| all| | all| all| > all| > |SELECT * FROM hive.db_all.tbl_user0;| (/)| (x) | (x)| | (/)| > (x)| (x)| > |SELECT * FROM db_all.vw_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)| > |SELECT * FROM db_all.tbl_group12;| (x)| (/)| (/)| | (x)| > (/)| (/)| > |SELECT * FROM db_all.vw_group12;| (x)| (/)| (/)| | (x)| > (/)| (/)| > | | | | | | | | | > |DESCRIBE db_all.tbl_user0;| (/)| (x) | (x)| | (/)| (x)| > (x)| > |DESCRIBE db_all.vw_user0;| (/)| (x)| (x)| | (/)| (/)| > (/)| > |DESCRIBE db_all.tbl_group12;| (x)| (/)| (/)| | (x)| > (/)| (/)| > |DESCRIBE db_all.vw_group12;| (x) | (/)| (/)| | (/)| > (/)| (/)| > (!) *Warning:* Because views in Hive aren't present as physical files and > access can't be granted using file system commands, then access to Hive views > for *Storage Based Authorization* is based on the underlying tables used in > view definition. For current example views were defined as selection over > appropriate tables. > *For documentation* > Remove note from > [https://drill.apache.org/docs/querying-the-information-schema/] that states: > {noformat} > NOTE > Currently, Drill only supports querying Drill views; Hive views are not yet > supported. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)