[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

patcharee (JIRA) Fri, 06 Nov 2015 01:07:52 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993398#comment-14993398
 ]


patcharee commented on SPARK-11087:
-----------------------------------

Hi [~zzhan], the problem actually happens when I generate the ORC file with the 
"saveAsTable()" method (because I need my ORC file to be accessible through 
Hive). See below:

hive> create external table peopletable(name string, address string, phone string)
      partitioned by(age int) stored as orc
      location '/user/patcharee/peopletable';

In the Spark shell (local mode):

sql("set hive.exec.dynamic.partition.mode=nonstrict")
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
case class Person(name: String, age: Int, address: String, phone: String)
val records = (1 to 100).map { i => Person(s"name_$i", i, s"address_$i", s"phone_$i") }
sc.parallelize(records).toDF().write.format("orc").mode("Append").partitionBy("age").saveAsTable("peopletable")
val people = sqlContext.read.format("orc").load("peopletable")
people.registerTempTable("people")
sqlContext.sql("SELECT * FROM people WHERE age = 20 and name = 'name_20'").count

It is true that if the ORC file is written with the "save()" method, the pushdown 
predicate is generated. It is not generated when the file is written with the 
"saveAsTable()" method.
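
For comparison, the "save()" path mentioned above could be reproduced roughly as 
follows. This is only a minimal sketch, not a fix: the directory 
/tmp/people_orc_save and the temp table name people_by_path are made up for 
illustration, and it assumes a Spark 1.5.x spark-shell built with Hive support, 
so that the predefined sqlContext is a HiveContext. Whether the pushdown 
predicate appears still has to be checked in the debug log.

```scala
// Paste into spark-shell, where `sc` and `sqlContext` are already defined.
import sqlContext.implicits._

sqlContext.setConf("spark.sql.orc.filterPushdown", "true")

case class Person(name: String, age: Int, address: String, phone: String)
val records = (1 to 100).map { i => Person(s"name_$i", i, s"address_$i", s"phone_$i") }

// Hypothetical scratch directory (assumption, not from the original report).
val orcPath = "/tmp/people_orc_save"

// Write with save() to a plain path instead of saveAsTable().
val df = sc.parallelize(records).toDF()
df.write.format("orc").mode("overwrite").partitionBy("age").save(orcPath)

// Read the directory back and run the same query as in the report.
val people = sqlContext.read.format("orc").load(orcPath)
people.registerTempTable("people_by_path")

val query = sqlContext.sql("SELECT * FROM people_by_path WHERE age = 20 AND name = 'name_20'")
query.explain(true)          // prints the logical and physical plans
val matched = query.count()  // number of rows that satisfy the filter
```

Running the same query against the table written by "saveAsTable()" and 
comparing the two debug logs should show where the behavior diverges.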

[~zzhan] can you please suggest how to fix this?

> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11087
>                 URL: https://issues.apache.org/jira/browse/SPARK-11087
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>            Reporter: patcharee
>            Priority: Minor
>
> I have an external Hive table stored as partitioned ORC files (see the table 
> schema below). I tried to query the table with a where clause:
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")
> But in the log file, with debug logging enabled, no ORC pushdown predicate 
> was generated. 
> Unfortunately my table was not sorted when I inserted the data, but I 
> still expected the ORC pushdown predicate to be generated because of the 
> where clause.
> Table schema
> ================================
> hive> describe formatted 4D;
> OK
> # col_name                    data_type               comment             
>                
> date                  int                                         
> hh                    int                                         
> x                     int                                         
> y                     int                                         
> height                float                                       
> u                     float                                       
> v                     float                                       
> w                     float                                       
> ph                    float                                       
> phb                   float                                       
> t                     float                                       
> p                     float                                       
> pb                    float                                       
> qvapor                float                                       
> qgraup                float                                       
> qnice                 float                                       
> qnrain                float                                       
> tke_pbl               float                                       
> el_pbl                float                                       
> qcloud                float                                       
>                
> # Partition Information                
> # col_name                    data_type               comment             
>                
> zone                  int                                         
> z                     int                                         
> year                  int                                         
> month                 int                                         
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                patcharee                
> CreateTime:           Thu Jul 09 16:46:54 CEST 2015    
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D       
>  
> Table Type:           EXTERNAL_TABLE           
> Table Parameters:              
>       EXTERNAL                TRUE                
>       comment                 this table is imported from rwf_data/*/wrf/*
>       last_modified_by        patcharee           
>       last_modified_time      1439806692          
>       orc.compress            ZLIB                
>       transient_lastDdlTime   1439806692          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde        
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
>  
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.388 seconds, Fetched: 58 row(s)
> ================================
> Data was inserted into this table by another Spark job:
> df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")



