[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
-------------------------------
    Description: 
Currently hive views cannot be queried from drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE      |VIEW_EXPANDED_TEXT                         |
-------|------------|------|-----------------|------|----------|------|----------|--------------|-------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views we'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

*Security implications*

Consider simple example case where we have users, 
{code:java}
user0  user1 user2
           \ /
          group12
{code}
and  sample db where object names contains user or group who should access them 
     
{code:java}
db_all
    tbl_user0
    vw_user0
    tbl_group12
    vw_group12
{code}
There are two Hive authorization modes supported  by Drill - SQL Standart and 
Strorage Based  authorization. For SQL Standart authorization permissions were 
granted using SQL: 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage based authorization permissions were granted using commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
 Then the following table shows us results of queries for both authorization 
models. 

                                                                                
                        *SQL Standart     |            Storage Based 
Authorization*
||SQL||user0||user1||user2||   ||user0||user1||user2||
|*Queries executed using Drill :*| | | | | | | |
|SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessibe tables + all 
views|Accessibe tables + all views|Accessibe tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|       
 (x)|         (x)|
|SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|        
(x)|         (x)|
|SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|     
   (/)|         (/)|
|SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|      
  (/)|         (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES`
 WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessibe tables + 
all views|Accessibe tables + all views|Accessibe tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
(/)|         (/)|
|DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/) |         (/)|
|DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (/)|         
(/)|         (/)|
|*Queries executed using Beeline and Hive CLI:*| | | | | | | |
|SHOW TABLES IN db_all;|   all|   all|   all| |        all|          all|       
   all|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|      
   (x)|         (x)|
|SELECT * FROM db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|SELECT * FROM db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/)|         (/)|
|SELECT * FROM db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/)|         (/)|
| | | | | | | | |
|DESCRIBE db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|         (x)|   
      (x)|
|DESCRIBE db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         (/)|     
    (/)|
|DESCRIBE db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         (/)|  
       (/)|
|DESCRIBE db_all.vw_group12;|   (x) |   (/)|   (/)| |        (/)|         (/)|  
       (/)|

 (!)  *Warning:*  Because views in Hive aren't present as physical files and 
access can't be granted using file system commands, then access to Hive views 
for *Storage Based Authorization* is based on the underlying tables used in 
view definition. For current example views were defined as selection over 
appropriate tables. 

*For documentation*
 Remove note from 
[https://drill.apache.org/docs/querying-the-information-schema/] that states:
{noformat}
NOTE

Currently, Drill only supports querying Drill views; Hive views are not yet 
supported.
{noformat}

  was:
Currently hive views cannot be queried from drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists it's views metadata in file with suffix .view.drill using 
json format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses the metadata and uses it to treat view names in SQL as a 
subquery.

      2. In Apache Hive metadata about views is stored in similar way to 
tables. Below is example from metastore.TBLS :

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE      |VIEW_EXPANDED_TEXT                         |
-------|------------|------|-----------------|------|----------|------|----------|--------------|-------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in Hive metastore views are considered as tables of special type. 
And main benefit is that we also have expanded SQL definition of views (just 
like in view.drill files). Also reading of the metadata is already implemented 
in Drill with help of thrift Metastore API.

      4. To enable querying of Hive views we'll reuse existing code for Drill 
views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
_*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which is 
actually model for data persisted in .view.drill files_) and then based on this 
instance return new _*DrillViewTable*_. Using this approach drill will handle 
hive views the same way as if it was initially defined in Drill and persisted 
in .view.drill file. 

     5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
functionality will be extracted and used for both (table and view) fields type 
conversions. 

*Security implications*

Consider simple example case where we have users, 
{code:java}
user0  user1 user2
           \ /
          group12
{code}
and  sample db where object names contains user or group who should access them 
     
{code:java}
db_all
    tbl_user0
    vw_user0
    tbl_group12
    vw_group12
{code}
There are two Hive authorization modes supported  by Drill - SQL Standart and 
Strorage Based  authorization. For SQL Standart authorization permissions were 
granted using SQL: 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage based authorization permissions were granted using commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
 Then the following table shows us results of queries for both authorization 
models. 

                                                                    *SQL 
Standart                    Storage Based Authorization*
||SQL||user0||user1||user2||   ||user0||user1||user2||
|*Queries executed using Drill :*| | | | | | | |
|SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessibe tables + all 
views|Accessibe tables + all views|Accessibe tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|       
 (x)|         (x)|
|SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|        
(x)|         (x)|
|SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|     
   (/)|         (/)|
|SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|      
  (/)|         (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES`
 WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessibe tables + 
all views|Accessibe tables + all views|Accessibe tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
(/)|         (/)|
|DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/) |         (/)|
|DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (/)|         
(/)|         (/)|
|*Queries executed using Beeline and Hive CLI:*| | | | | | | |
|SHOW TABLES IN db_all;|   all|   all|   all| |        all|          all|       
   all|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|      
   (x)|         (x)|
|SELECT * FROM db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|SELECT * FROM db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/)|         (/)|
|SELECT * FROM db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/)|         (/)|
| | | | | | | | |
|DESCRIBE db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|         (x)|   
      (x)|
|DESCRIBE db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         (/)|     
    (/)|
|DESCRIBE db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         (/)|  
       (/)|
|DESCRIBE db_all.vw_group12;|   (x) |   (/)|   (/)| |        (/)|         (/)|  
       (/)|

 (!)  *Warning:*  Because views in Hive aren't present as physical files and 
access can't be granted using file system commands, then access to Hive views 
for *Storage Based Authorization* is based on the underlying tables used in 
view definition. For current example views were defined as selection over 
appropriate tables. 

*For documentation*
Remove note from https://drill.apache.org/docs/querying-the-information-schema/ 
that states:
{noformat}
NOTE

Currently, Drill only supports querying Drill views; Hive views are not yet 
supported.
{noformat}





> Allow querying hive views in Drill
> ----------------------------------
>
>                 Key: DRILL-540
>                 URL: https://issues.apache.org/jira/browse/DRILL-540
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Hive
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Igor Guzenko
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists it's views metadata in file with suffix .view.drill using 
> json format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive metadata about views is stored in similar way to 
> tables. Below is example from metastore.TBLS :
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE      |VIEW_EXPANDED_TEXT                         |
> -------|------------|------|-----------------|------|----------|------|----------|--------------|-------------------------------------------|
> 2      |1542111078  |1     |0                |mapr  |0         |2     |cview  
>    |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in Hive metastore views are considered as tables of special type. 
> And main benefit is that we also have expanded SQL definition of views (just 
> like in view.drill files). Also reading of the metadata is already 
> implemented in Drill with help of thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse existing code for Drill 
> views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which 
> is actually model for data persisted in .view.drill files_) and then based on 
> this instance return new _*DrillViewTable*_. Using this approach drill will 
> handle hive views the same way as if it was initially defined in Drill and 
> persisted in .view.drill file. 
>      5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both (table and view) fields 
> type conversions. 
> *Security implications*
> Consider simple example case where we have users, 
> {code:java}
> user0  user1 user2
>            \ /
>           group12
> {code}
> and  sample db where object names contains user or group who should access 
> them      
> {code:java}
> db_all
>     tbl_user0
>     vw_user0
>     tbl_group12
>     vw_group12
> {code}
> There are two Hive authorization modes supported  by Drill - SQL Standart and 
> Strorage Based  authorization. For SQL Standart authorization permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage based authorization permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows us results of queries for both authorization 
> models. 
>                                                                               
>                           *SQL Standart     |            Storage Based 
> Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill :*| | | | | | | |
> |SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessibe tables + all 
> views|Accessibe tables + all views|Accessibe tables + all views|
> |SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|     
>    (x)|         (x)|
> |SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|      
>   (x)|         (x)|
> |SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|   
>      (/)|         (/)|
> |SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|    
>     (/)|         (/)|
> |SELECT * FROM INFORMATION_SCHEMA.`TABLES`
>  WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessibe tables + 
> all views|Accessibe tables + all views|Accessibe tables + all views|
> |DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
> (x)|         (x)|
> |DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
> (/)|         (/)|
> |DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|        
>  (/) |         (/)|
> |DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (/)|         
> (/)|         (/)|
> |*Queries executed using Beeline and Hive CLI:*| | | | | | | |
> |SHOW TABLES IN db_all;|   all|   all|   all| |        all|          all|     
>      all|
> |SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|    
>      (x)|         (x)|
> |SELECT * FROM db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         
> (x)|         (x)|
> |SELECT * FROM db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|        
>  (/)|         (/)|
> |SELECT * FROM db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|         
> (/)|         (/)|
> | | | | | | | | |
> |DESCRIBE db_all.tbl_user0;|   (/)|   (x) |   (x)| |        (/)|         (x)| 
>         (x)|
> |DESCRIBE db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|         (/)|   
>       (/)|
> |DESCRIBE db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
> (/)|         (/)|
> |DESCRIBE db_all.vw_group12;|   (x) |   (/)|   (/)| |        (/)|         
> (/)|         (/)|
>  (!)  *Warning:*  Because views in Hive aren't present as physical files and 
> access can't be granted using file system commands, then access to Hive views 
> for *Storage Based Authorization* is based on the underlying tables used in 
> view definition. For current example views were defined as selection over 
> appropriate tables. 
> *For documentation*
>  Remove note from 
> [https://drill.apache.org/docs/querying-the-information-schema/] that states:
> {noformat}
> NOTE
> Currently, Drill only supports querying Drill views; Hive views are not yet 
> supported.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to