[
https://issues.apache.org/jira/browse/SPARK-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-25784:
------------------------------------
Assignee: Apache Spark
> Infer filters from constraints after rewriting predicate subquery
> -----------------------------------------------------------------
>
> Key: SPARK-25784
> URL: https://issues.apache.org/jira/browse/SPARK-25784
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Apache Spark
> Priority: Major
>
> Benchmark:
> {code:scala}
> withTempView("t1", "t2") {
>   withTempDir { dir =>
>     spark.range(3000000)
>       .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
>       .coalesce(1)
>       .orderBy("c2")
>       .write
>       .mode("overwrite")
>       .option("parquet.block.size", 10485760)
>       .parquet(dir.getCanonicalPath)
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t2")
>     Seq("c1", "c2", "c3").foreach { column =>
>       val benchmark = new Benchmark(s"join key $column", 10)
>       Seq(false, true).foreach { inferFilters =>
>         benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
>           withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
>             sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
>           }
>         }
>       }
>       benchmark.run()
>     }
>   }
> }
> {code}
> Benchmark result:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c1:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false           2005 / 2163          0.0   200481431.0       1.0X
> Is infer filters true             190 /  207          0.0    18962935.7      10.6X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c2:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false           2368 / 2498          0.0   236803743.1       1.0X
> Is infer filters true            1234 / 1268          0.0   123443912.3       1.9X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c3:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false           2754 / 2907          0.0   275376009.7       1.0X
> Is infer filters true            2237 / 2255          0.0   223739457.8       1.2X
> {noformat}
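
One possible reading of the three cases, as a minimal plain-Scala sketch rather than Spark code: the IN subquery behaves like a left semi join on the key column, a NULL key can never match, and an inferred IS NOT NULL filter drops such rows on both sides before the join. The names `Row`, `rows` and `semiJoinCount` are hypothetical, `None` stands in for SQL NULL, and the row count is scaled down from the benchmark's 3000000; the sketch only counts rows and makes no claim about timing.

```scala
object InferFiltersSketch {
  // Hypothetical stand-in for the benchmark table: c1 is always NULL,
  // c2 is NULL for even ids, c3 is never NULL.
  final case class Row(c1: Option[Int], c2: Option[Long], c3: Option[Long])

  def rows(n: Long): Vector[Row] =
    (0L until n).map { id =>
      Row(None, if (id % 2 == 0) None else Some(id), Some(id))
    }.toVector

  // Left semi join on one key column.
  // Returns (rows fed into the join from both sides, rows of `left` that match).
  // With inferNotNull = true, NULL keys are filtered out before the join,
  // which is the assumed effect of the inferred filter.
  def semiJoinCount[K](
      left: Seq[Row],
      right: Seq[Row],
      key: Row => Option[K],
      inferNotNull: Boolean): (Int, Int) = {
    val l = if (inferNotNull) left.filter(r => key(r).isDefined) else left
    val r = if (inferNotNull) right.filter(x => key(x).isDefined) else right
    // Only non-NULL keys can ever be matched, so the lookup set skips None.
    val buildKeys: Set[K] = r.flatMap(x => key(x).toList).toSet
    val matched = l.count(x => key(x).exists(k => buildKeys.contains(k)))
    (l.size + r.size, matched)
  }
}
```

With `val t = InferFiltersSketch.rows(10)`, joining on `c1` feeds 20 rows into the join without the filter and 0 with it, `c2` goes from 20 to 10, and `c3` stays at 20; the match count is the same either way. That ordering is consistent with the relative speedups reported above (largest for `c1`, smallest for `c3`).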