[
https://issues.apache.org/jira/browse/SPARK-36027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Saurabh Chawla updated SPARK-36027:
-----------------------------------
Description:
In case of Filter having child as TypedFilter, Pushdown of Filters does not
take place
{code:java}
scala> def testUdfFunction(r: String): Boolean = {
| r.equals("hello")
| }
testUdfFunction: (r: String)Boolean
val df= spark.read.parquet("/testDir/testParquetSize/Parquetgzip/")
df: org.apache.spark.sql.DataFrame = [_1: string, _2: string ... 1 more field]
{code}
df.filter(x =>
testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.executedPlan
{code:java}
Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
+- *(1) Filter
$line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3184/[email protected]
+- *(1) ColumnarToRow
+- FileScan parquet [_1#0,_2#1,_3#2] Batched: true, DataFilters: [], Format:
Parquet, Location:
InMemoryFileIndex[file:/testDir/testParquetSize/Parquetgzip], PartitionFilters:
[], PushedFilters: [], ReadSchema: struct<_1:string,_2:string,_3:string>
{code}
df.filter(x =>
testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.optimizedPlan
{code:java}
Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
+- TypedFilter
$line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3191/320569017@37a2806c,
interface org.apache.spark.sql.Row, [StructField(_1,StringType,true),
StructField(_2,StringType,true), StructField(_3,StringType,true)],
createexternalrow(_1#0.toString, _2#1.toString, _3#2.toString,
StructField(_1,StringType,true), StructField(_2,StringType,true),
StructField(_3,StringType,true))
+- Relation[_1#0,_2#1,_3#2] parquet{code}
There is need to add this change to push down the filter for this scenario
was:
In case of Filter having child as TypedFilter, Pushdown of Filters does not
take place
{code:java}
scala> def testUdfFunction(r: String): Boolean = {
| r.equals("hello")
| }
testUdfFunction: (r: String)Boolean
val df= spark.read.parquet("/testDir/testParquetSize/Parquetgzip/")
df: org.apache.spark.sql.DataFrame = [_1: string, _2: string ... 1 more field]
{code}
df.filter(x =>
testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.executedPlan
{code:java}
Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
+- *(1) Filter
$line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3184/[email protected]
+- *(1) ColumnarToRow
+- FileScan parquet [_1#0,_2#1,_3#2] Batched: true, DataFilters: [], Format:
Parquet, Location:
InMemoryFileIndex[file:/Users/saurabh.chawla/testDir/testParquetSize/Parquetgzip],
PartitionFilters: [], PushedFilters: [], ReadSchema:
struct<_1:string,_2:string,_3:string>
{code}
df.filter(x =>
testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.optimizedPlan
{code:java}
Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
+- TypedFilter
$line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3191/320569017@37a2806c,
interface org.apache.spark.sql.Row, [StructField(_1,StringType,true),
StructField(_2,StringType,true), StructField(_3,StringType,true)],
createexternalrow(_1#0.toString, _2#1.toString, _3#2.toString,
StructField(_1,StringType,true), StructField(_2,StringType,true),
StructField(_3,StringType,true))
+- Relation[_1#0,_2#1,_3#2] parquet{code}
There is need to add this change to push down the filter for this scenario
> Push down Filter in case of filter having child as TypedFilter
> --------------------------------------------------------------
>
> Key: SPARK-36027
> URL: https://issues.apache.org/jira/browse/SPARK-36027
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Saurabh Chawla
> Priority: Major
>
> In case of Filter having child as TypedFilter, Pushdown of Filters does not
> take place
>
> {code:java}
> scala> def testUdfFunction(r: String): Boolean = {
> | r.equals("hello")
> | }
> testUdfFunction: (r: String)Boolean
> val df= spark.read.parquet("/testDir/testParquetSize/Parquetgzip/")
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: string ... 1 more field]
> {code}
>
> df.filter(x =>
> testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.executedPlan
>
> {code:java}
> Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
> +- *(1) Filter
> $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3184/[email protected]
> +- *(1) ColumnarToRow
> +- FileScan parquet [_1#0,_2#1,_3#2] Batched: true, DataFilters: [], Format:
> Parquet, Location:
> InMemoryFileIndex[file:/testDir/testParquetSize/Parquetgzip],
> PartitionFilters: [], PushedFilters: [], ReadSchema:
> struct<_1:string,_2:string,_3:string>
>
> {code}
> df.filter(x =>
> testUdfFunction(x.getAs("_1"))).filter("_2<='id103855'").queryExecution.optimizedPlan
> {code:java}
> Filter (isnotnull(_2#1) AND (_2#1 <= id103855))
> +- TypedFilter
> $line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$3191/320569017@37a2806c,
> interface org.apache.spark.sql.Row, [StructField(_1,StringType,true),
> StructField(_2,StringType,true), StructField(_3,StringType,true)],
> createexternalrow(_1#0.toString, _2#1.toString, _3#2.toString,
> StructField(_1,StringType,true), StructField(_2,StringType,true),
> StructField(_3,StringType,true))
> +- Relation[_1#0,_2#1,_3#2] parquet{code}
>
> There is need to add this change to push down the filter for this scenario
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]