Yuming Wang created SPARK-25784:
-----------------------------------

             Summary: Infer filters from constraints after rewriting predicate subquery
                 Key: SPARK-25784
                 URL: https://issues.apache.org/jira/browse/SPARK-25784
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yuming Wang

Benchmark:
{code:scala}
withTempView("t1", "t2") {
  withTempDir { dir =>
    spark.range(3000000)
      .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
      .coalesce(1)
      .orderBy("c2")
      .write
      .mode("overwrite")
      .option("parquet.block.size", 10485760)
      .parquet(dir.getCanonicalPath)

    spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
    spark.read.parquet(dir.getCanonicalPath).createTempView("t2")

    Seq("c1", "c2", "c3").foreach { column =>
      val benchmark = new Benchmark(s"join key $column", 10)
      Seq(false, true).foreach { inferFilters =>
        benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
          withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
            sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
          }
        }
      }
      benchmark.run()
    }
  }
}
{code}

Benchmark result:
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                         2005 / 2163          0.0   200481431.0       1.0X
Is infer filters true                           190 /  207          0.0    18962935.7      10.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                         2368 / 2498          0.0   236803743.1       1.0X
Is infer filters true                          1234 / 1268          0.0   123443912.3       1.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                         2754 / 2907          0.0   275376009.7       1.0X
Is infer filters true                          2237 / 2255          0.0   223739457.8       1.2X
{noformat}
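
A note on reading the result columns: the Rate(M/s) and Per Row(ns) values look odd (0.0 and hundreds of millions of ns) because they appear to be normalized by the second argument passed to `new Benchmark(...)`, which is 10 here, not by the 3,000,000 rows actually scanned. The sketch below is a rough cross-check under that assumption, not Spark's own code; the object and method names are hypothetical. It recovers the best time from the per-row figure and recomputes the Relative column as the ratio of best times.

```scala
object BenchmarkColumns {
  // Assumption: the benchmark normalizes by the second constructor
  // argument in the snippet above (10), not by the table's row count.
  val valuesPerIteration: Long = 10L

  /** Best time in milliseconds, recovered from the "Per Row(ns)" column. */
  def bestMs(perRowNs: Double): Double =
    perRowNs * valuesPerIteration / 1e6

  /** Rate in millions of values per second; tiny here, so it prints as 0.0. */
  def rateMPerSec(perRowNs: Double): Double =
    1000.0 / perRowNs

  /** Speed-up over the baseline case, rounded to one decimal place. */
  def relative(baselinePerRowNs: Double, perRowNs: Double): Double =
    math.round(baselinePerRowNs / perRowNs * 10) / 10.0

  def main(args: Array[String]): Unit = {
    // (join key, per-row ns with inference off, per-row ns with inference on)
    val rows = Seq(
      ("c1", 200481431.0, 18962935.7),
      ("c2", 236803743.1, 123443912.3),
      ("c3", 275376009.7, 223739457.8)
    )
    rows.foreach { case (key, off, on) =>
      println(s"join key $key: best ${math.round(bestMs(off))} ms -> " +
        s"${math.round(bestMs(on))} ms, relative ${relative(off, on)}X")
    }
  }
}
```

Read this way, the Best Time(ms) and Relative columns are the ones to compare across cases; the per-row figure is just the best time scaled by a constant.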