[
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593110#comment-16593110
]
yucai commented on SPARK-25207:
-------------------------------
[~dongjoon], sorry for the confusion.
This bug was filed against the master branch, which already includes SPARK-25132 and
-SPARK-24716-.
So master does not actually have the issue shown below.
{code:java}
scala> sql("select * from t").show // Parquet returns NULL for `ID` because it has `id`.
+----+
|  ID|
+----+
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
+----+
scala> sql("select * from t where id > 0").show // `NULL > 0` is `false`.
+---+
| ID|
+---+
+---+
{code}
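To make the two outputs above concrete, here is a minimal plain-Scala model of the behavior (not Spark code). It assumes an exact-name lookup of the requested column against the file's field names, and models SQL NULL as `None`; the object and method names are hypothetical.

```scala
// Hypothetical model, not Spark code: rows are maps keyed by the file's field names.
object NullColumnSketch {
  // Ten rows as written by spark.range(10): the Parquet field is lower-case `id`.
  val fileRows: Seq[Map[String, Long]] = (0L until 10L).map(i => Map("id" -> i))

  // Exact-name lookup: a requested field that is missing from the file reads as NULL (None).
  def readColumn(requested: String): Seq[Option[Long]] =
    fileRows.map(_.get(requested))

  // SQL-style `col > 0`: a NULL value never satisfies the predicate.
  def greaterThanZero(col: Seq[Option[Long]]): Seq[Option[Long]] =
    col.filter(_.exists(_ > 0))

  def main(args: Array[String]): Unit = {
    val upper = readColumn("ID")
    println(upper.forall(_.isEmpty))                  // true: every value is NULL
    println(greaterThanZero(upper).size)              // 0: no row passes `NULL > 0`
    println(greaterThanZero(readColumn("id")).size)   // 9: ids 1 through 9
  }
}
```

With the lower-case name the lookup finds the data, which is why the NULL result only appears when the requested name and the file's name differ in case.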
> Case-insensitive field resolution for filter pushdown when reading Parquet
> --------------------------------------------------------------------------
>
> Key: SPARK-25207
> URL: https://issues.apache.org/jira/browse/SPARK-25207
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: yucai
> Priority: Major
> Labels: Parquet
> Attachments: image.png
>
>
> Currently, filter pushdown does not work if the Parquet schema and the Hive metastore
> schema use different letter cases, even when spark.sql.caseSensitive is false.
> For example:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain // Filters are pushed with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>    +- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: struct<ID:bigint>
> scala> sql("select * from t").show // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}
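As a rough illustration of what case-insensitive field resolution for pushdown could look like, here is a minimal plain-Scala sketch. It is not the actual Spark patch: `resolve` and its parameters are hypothetical names, and returning `None` for an ambiguous match (so that no filter is pushed for that column) is an assumption made for this example.

```scala
import java.util.Locale

// Hypothetical helper, not Spark's implementation.
object CaseInsensitiveResolveSketch {
  // Maps a column name from the table schema to the matching Parquet field name.
  // Returns None when the field cannot be resolved; a caller would then skip pushdown.
  def resolve(parquetFields: Seq[String], requested: String, caseSensitive: Boolean): Option[String] =
    if (caseSensitive) {
      // Case-sensitive mode: only an exact name match counts.
      parquetFields.find(_ == requested)
    } else {
      val key = requested.toLowerCase(Locale.ROOT)
      parquetFields.filter(_.toLowerCase(Locale.ROOT) == key) match {
        case Seq(single) => Some(single) // exactly one candidate: use the file's own spelling
        case _           => None         // no match, or ambiguous (e.g. both `id` and `ID`)
      }
    }
}
```

For the table in the description, `resolve(Seq("id"), "ID", caseSensitive = false)` yields `Some("id")`, so a filter could be built against the name the file actually uses, while the case-sensitive call yields `None`.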