[ 
https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-12998:
-------------------------------------
    Description: 
When a user connects via the Spark Thrift server to execute SQL, predicate 
pushdown (PPD) is not enabled for ORC. The query ends up creating a 
MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA is 
to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so that 
users can benefit from PPD even when connecting through the Spark Thrift server.
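
The proposed conversion can be pictured with a small toy model. This is only a 
sketch, not Spark code: the two classes below borrow the names 
MetastoreRelation and OrcRelation, but their fields, the function 
convert_to_orc_relation and its "enabled" switch are hypothetical and exist 
only for illustration. The one real identifier is the Hive ORC SerDe class 
name, which the sketch assumes is how an ORC-backed table would be recognized.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hive's ORC SerDe class name; assumed here to be the marker of an ORC table.
ORC_SERDE = "org.apache.hadoop.hive.ql.io.orc.OrcSerde"


@dataclass(frozen=True)
class MetastoreRelation:
    """Toy stand-in for a Hive metastore table (hypothetical fields)."""
    database: str
    table: str
    serde: str
    location: str


@dataclass(frozen=True)
class OrcRelation:
    """Toy stand-in for a data-source relation that can push filters down."""
    input_paths: Tuple[str, ...]


def convert_to_orc_relation(
    relation: Union[MetastoreRelation, OrcRelation], enabled: bool = True
) -> Union[MetastoreRelation, OrcRelation]:
    """Swap an ORC-backed MetastoreRelation for an OrcRelation.

    Anything that is not an ORC-backed metastore table is returned unchanged,
    as is everything when the (hypothetical) switch is off.
    """
    if (
        enabled
        and isinstance(relation, MetastoreRelation)
        and relation.serde == ORC_SERDE
    ):
        return OrcRelation(input_paths=(relation.location,))
    return relation


lineitem = MetastoreRelation(
    database="tpch_1000",
    table="lineitem",
    serde=ORC_SERDE,
    location="hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem",
)
print(convert_to_orc_relation(lineitem))
```

The point of the sketch is that the rewrite is a per-relation decision made 
while resolving the table, so every client, including one connected through 
the Thrift server, would see the same converted relation.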

For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem 
where l_shipdate = '1990-04-18'", the current plan is:

    == Physical Plan ==
    TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])
    +- Exchange SinglePartition, None
       +- WholeStageCodegen
          :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])
          :     +- Project
          :        +- Filter (l_shipdate#11 = 1990-04-18)
          :           +- INPUT
          +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None

It would be better to use OrcRelation so that the scan applies PPD with ORC, 
which reduces the runtime by a large margin. The resulting plan:
 
    == Physical Plan ==
    TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])
    +- Exchange SinglePartition, None
       +- WholeStageCodegen
          :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])
          :     +- Project
          :        +- Filter (_col10#64 = 1990-04-18)
          :           +- INPUT
          +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]
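
To illustrate why a pushed-down EqualTo filter can cut the work so much, here 
is a toy model of min/max-based skipping. It is a sketch under assumptions, 
not the ORC reader: the Stripe class, its fields and the sample data are made 
up, and real ORC files keep richer statistics at several levels. Dates are 
ISO-formatted strings, so plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical chunk of column values with its min/max statistics."""
    min_value: str
    max_value: str
    rows: List[str]


def might_contain(stripe: Stripe, value: str) -> bool:
    """True if the statistics cannot rule out `value` for this stripe."""
    return stripe.min_value <= value <= stripe.max_value


def count_equal(stripes: List[Stripe], value: str, pushdown: bool) -> Tuple[int, int]:
    """Count rows equal to `value`; also report how many rows were read."""
    matched = 0
    rows_read = 0
    for stripe in stripes:
        # With pushdown, the filter is checked against the statistics first
        # and stripes that cannot match are skipped without reading rows.
        if pushdown and not might_contain(stripe, value):
            continue
        rows_read += len(stripe.rows)
        matched += sum(1 for row in stripe.rows if row == value)
    return matched, rows_read


stripes = [
    Stripe("1990-01-01", "1990-03-31", ["1990-01-01", "1990-02-15", "1990-03-31"]),
    Stripe("1990-04-01", "1990-06-30", ["1990-04-01", "1990-04-18", "1990-06-30"]),
    Stripe("1990-07-01", "1990-09-30", ["1990-07-01", "1990-08-20", "1990-09-30"]),
]

print(count_equal(stripes, "1990-04-18", pushdown=False))
print(count_equal(stripes, "1990-04-18", pushdown=True))
```

Both calls find the same single matching row, but the pushdown variant reads 
3 rows instead of 9 because two of the three stripes are ruled out by their 
statistics alone.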

  was:When a user connects via spark-thrift server to execute SQL, it does not 
enable PPD with ORC. It ends up creating MetastoreRelation which does not have 
ORC PPD.  Purpose of this JIRA is to convert MetastoreRelation to OrcRelation 
in HiveMetastoreCatalog, so that users can benefit from PPD even when 
connecting to spark-thrift server.


> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
>                 Key: SPARK-12998
>                 URL: https://issues.apache.org/jira/browse/SPARK-12998
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
