[ https://issues.apache.org/jira/browse/SPARK-35316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-35316:
--------------------------------
    Description: 
It will not push down filters for In/InSet predicates:
{code:scala}
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
spark.read.parquet("/tmp/parquet/t1").where("id = 1L or id = 2L or id = 4L").explain
{code}
{noformat}
== Physical Plan ==
*(1) Filter cast(id#5 as bigint) IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>

== Physical Plan ==
*(1) Filter (((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))
+- *(1) ColumnarToRow
   +- FileScan parquet [id#7] Batched: true, DataFilters: [(((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [Or(Or(EqualTo(id,1),EqualTo(id,2)),EqualTo(id,4))], ReadSchema: struct<id:int>
{noformat}

  was:
In/InSet is missing this optimization:
{code:scala}
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
spark.read.parquet("/tmp/parquet/t1").where("id = 1L or id = 2L or id = 4L").explain
{code}
{noformat}
== Physical Plan ==
*(1) Filter cast(id#5 as bigint) IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>

== Physical Plan ==
*(1) Filter (((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))
+- *(1) ColumnarToRow
   +- FileScan parquet [id#7] Batched: true, DataFilters: [(((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [Or(Or(EqualTo(id,1),EqualTo(id,2)),EqualTo(id,4))], ReadSchema: struct<id:int>
{noformat}


> UnwrapCastInBinaryComparison support In/InSet predicate
> -------------------------------------------------------
>
>                 Key: SPARK-35316
>                 URL: https://issues.apache.org/jira/browse/SPARK-35316
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> It will not push down filters for In/InSet predicates:
> {code:scala}
> spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
> spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
> spark.read.parquet("/tmp/parquet/t1").where("id = 1L or id = 2L or id = 4L").explain
> {code}
> {noformat}
> == Physical Plan ==
> *(1) Filter cast(id#5 as bigint) IN (1,2,4)
> +- *(1) ColumnarToRow
>    +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
>
> == Physical Plan ==
> *(1) Filter (((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))
> +- *(1) ColumnarToRow
>    +- FileScan parquet [id#7] Batched: true, DataFilters: [(((id#7 = 1) OR (id#7 = 2)) OR (id#7 = 4))], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [Or(Or(EqualTo(id,1),EqualTo(id,2)),EqualTo(id,4))], ReadSchema: struct<id:int>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
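The requested improvement is conceptually the same value-level check the rule already performs for binary comparisons: when every literal in the IN list fits the narrower column type, the cast on the column can be dropped and the literals narrowed instead, making the predicate pushable. A minimal sketch of that check in plain Scala (`unwrapInLiterals` is a hypothetical helper for illustration, not Spark's actual API):

```scala
// Sketch: unwrap `cast(intCol as bigint) IN (longLit, ...)` into
// `intCol IN (intLit, ...)`. If every Long literal fits in Int, the
// comparison can run in the narrower type, so the data source can
// receive a pushable In(intCol, ...) filter instead of nothing.
def unwrapInLiterals(lits: Seq[Long]): Option[Seq[Int]] =
  if (lits.forall(l => l >= Int.MinValue && l <= Int.MaxValue))
    Some(lits.map(_.toInt))
  else
    None // a literal overflows Int; the cast cannot simply be removed

// 1L, 2L, 4L from the report all fit in Int, so the cast is removable;
// a literal such as 3000000000L does not fit, and the rule must keep
// the original cast (or otherwise simplify, since an int column can
// never equal that value).
```

The overflow branch is the important design point: dropping the cast is only sound when narrowing the literals loses no information, which is why the check precedes the rewrite.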