[ 
https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586116#comment-15586116
 ] 

Michael Allman commented on SPARK-17983:
----------------------------------------

Speaking strictly from the POV of Parquet predicate pushdown, I don't see how 
we can get away from doing that in anything but a case-sensitive manner, at 
least not if it's part of planning (optimization). Pushing down a filter with a 
wrong-case column name simply doesn't work. The same can be said of projection 
pushdown, though I believe that happens as part of execution.
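To make the case-sensitivity point concrete, here is a toy sketch (plain Python, not Spark or Parquet internals; all names are hypothetical) of why a pushed-down filter must resolve the column name with the exact case recorded in the file, while the metastore-facing side would need a case-insensitive lookup:

```python
# Column names as recorded in the Parquet file footer (mixed case,
# as written by the reproduction below).
parquet_columns = ["normalCol", "partCol1", "partCol2"]

def resolve_exact(name):
    """Exact match, as a pushed-down Parquet filter requires."""
    return next((c for c in parquet_columns if c == name), None)

def resolve_case_insensitive(name):
    """Case-insensitive match, as the lowercased metastore schema would need."""
    return next((c for c in parquet_columns if c.lower() == name.lower()), None)

# The Hive metastore lowercases names, so planning sees "normalcol".
# An exact lookup misses the column, so the pushed-down filter matches no rows.
print(resolve_exact("normalcol"))             # None
print(resolve_case_insensitive("normalcol"))  # normalCol
```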

> Can't filter over mixed case parquet columns of converted Hive tables
> ---------------------------------------------------------------------
>
>                 Key: SPARK-17983
>                 URL: https://issues.apache.org/jira/browse/SPARK-17983
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Eric Liang
>            Priority: Critical
>
> We should probably revive https://github.com/apache/spark/pull/14750 in order 
> to fix this issue and related classes of issues.
> The only other alternatives are (1) reconciling on-disk schemas with 
> metastore schema at planning time, which seems pretty messy, and (2) fixing 
> all the datasources to support case-insensitive matching, which also has 
> issues.
> Reproduction:
> {code}
>   private def setupPartitionedTable(tableName: String, dir: File): Unit = {
>     spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
>       .partitionBy("partCol1", "partCol2")
>       .mode("overwrite")
>       .parquet(dir.getAbsolutePath)
>     spark.sql(s"""
>       |create external table $tableName (normalCol long)
>       |partitioned by (partCol1 int, partCol2 int)
>       |stored as parquet
>       |location "${dir.getAbsolutePath}"""".stripMargin)
>     spark.sql(s"msck repair table $tableName")
>   }
>   test("filter by mixed case col") {
>     withTable("test") {
>       withTempDir { dir =>
>         setupPartitionedTable("test", dir)
>         val df = spark.sql("select * from test where normalCol = 3")
>         assert(df.count() == 1)
>       }
>     }
>   }
> {code}
> cc [~cloud_fan]
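Alternative (1) from the issue description, reconciling the on-disk schema with the metastore schema at planning time, can be sketched as follows. This is a hedged toy illustration in plain Python, not Spark code; the function name and shapes are hypothetical. It also shows one reason the case-insensitive approach has issues: the mapping is ambiguous when two on-disk columns differ only by case.

```python
def reconcile(metastore_cols, parquet_cols):
    """Map each (lowercased) metastore column to the on-disk Parquet column
    whose name matches case-insensitively. Raise if a match is missing or
    ambiguous (two on-disk columns differing only by case)."""
    by_lower = {}
    for c in parquet_cols:
        by_lower.setdefault(c.lower(), []).append(c)
    resolved = {}
    for m in metastore_cols:
        candidates = by_lower.get(m.lower(), [])
        if len(candidates) != 1:
            raise ValueError(f"cannot reconcile column {m!r}: {candidates}")
        resolved[m] = candidates[0]
    return resolved

print(reconcile(["normalcol", "partcol1"], ["normalCol", "partCol1", "partCol2"]))
# {'normalcol': 'normalCol', 'partcol1': 'partCol1'}
```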



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
