[
https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen updated SPARK-20700:
-------------------------------
Summary: InferFiltersFromConstraints stackoverflows for query (v2) (was:
Expression canonicalization hits stack overflow for query)
> InferFiltersFromConstraints stackoverflows for query (v2)
> ---------------------------------------------------------
>
> Key: SPARK-20700
> URL: https://issues.apache.org/jira/browse/SPARK-20700
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 2.2.0
> Reporter: Josh Rosen
>
> The following (complicated) query eventually fails with a stack overflow
> during optimization:
> {code}
> CREATE TEMPORARY VIEW table_5(varchar0002_col_1, smallint_col_2, float_col_3,
> int_col_4, string_col_5, timestamp_col_6, string_col_7) AS VALUES
> ('68', CAST(NULL AS SMALLINT), CAST(244.90413 AS FLOAT), -137, '571',
> TIMESTAMP('2015-01-14 00:00:00.0'), '947'),
> ('82', CAST(213 AS SMALLINT), CAST(53.184647 AS FLOAT), -724, '-278',
> TIMESTAMP('1999-08-15 00:00:00.0'), '437'),
> ('-7', CAST(-15 AS SMALLINT), CAST(NULL AS FLOAT), -890, '778',
> TIMESTAMP('1991-05-23 00:00:00.0'), '630'),
> ('22', CAST(676 AS SMALLINT), CAST(385.27386 AS FLOAT), CAST(NULL AS INT),
> '-10', TIMESTAMP('1996-09-29 00:00:00.0'), '641'),
> ('16', CAST(430 AS SMALLINT), CAST(187.23717 AS FLOAT), 989, CAST(NULL AS
> STRING), TIMESTAMP('2024-04-21 00:00:00.0'), '-234'),
> ('83', CAST(760 AS SMALLINT), CAST(-695.45386 AS FLOAT), -970, '330',
> CAST(NULL AS TIMESTAMP), '-740'),
> ('68', CAST(-930 AS SMALLINT), CAST(NULL AS FLOAT), -915, '-766', CAST(NULL
> AS TIMESTAMP), CAST(NULL AS STRING)),
> ('48', CAST(692 AS SMALLINT), CAST(-220.59615 AS FLOAT), 940, '-514',
> CAST(NULL AS TIMESTAMP), '181'),
> ('21', CAST(44 AS SMALLINT), CAST(NULL AS FLOAT), -175, '761',
> TIMESTAMP('2016-06-30 00:00:00.0'), '487'),
> ('50', CAST(953 AS SMALLINT), CAST(837.2948 AS FLOAT), 705, CAST(NULL AS
> STRING), CAST(NULL AS TIMESTAMP), '-62');
> CREATE VIEW bools(a, b) as values (1, true), (1, true), (1, null);
> SELECT
> AVG(-13) OVER (ORDER BY COUNT(t1.smallint_col_2) DESC ROWS 27 PRECEDING ) AS
> float_col,
> COUNT(t1.smallint_col_2) AS int_col
> FROM table_5 t1
> INNER JOIN (
> SELECT
> (MIN(-83) OVER (PARTITION BY t2.a ORDER BY t2.a, (t1.int_col_4) *
> (t1.int_col_4) ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING)) NOT IN (-222, 928)
> AS boolean_col,
> t2.a,
> (t1.int_col_4) * (t1.int_col_4) AS int_col
> FROM table_5 t1
> LEFT JOIN bools t2 ON (t2.a) = (t1.int_col_4)
> WHERE
> (t1.smallint_col_2) > (t1.smallint_col_2)
> GROUP BY
> t2.a,
> (t1.int_col_4) * (t1.int_col_4)
> HAVING
> ((t1.int_col_4) * (t1.int_col_4)) IN ((t1.int_col_4) * (t1.int_col_4),
> SUM(t1.int_col_4))
> ) t2 ON (((t2.int_col) = (t1.int_col_4)) AND ((t2.a) = (t1.int_col_4))) AND
> ((t2.a) = (t1.smallint_col_2));
> {code}
> (I haven't tried to minimize this failing case yet).
> Based on sampled jstacks from the driver, it looks like the query might be
> repeatedly inferring filters from constraints and then pruning those filters.
> Here's part of the stack at the point where it stackoverflows:
> {code}
> [... repeats ...]
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.immutable.List.flatMap(List.scala:344)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.orderCommutative(Canonicalize.scala:58)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.expressionReorder(Canonicalize.scala:63)
> at
> org.apache.spark.sql.catalyst.expressions.Canonicalize$.execute(Canonicalize.scala:36)
> at
> org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:158)
> - locked <0x00000007a298b940> (a
> org.apache.spark.sql.catalyst.expressions.Multiply)
> at
> org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:156)
> at
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
> at
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> [...]
> {code}
> I suspect this is similar to SPARK-17733, another bug where
> {{InferFiltersFromConstraints}}, so I'll cc [~jiangxb1987] and [~sameerag]
> who worked on that earlier fix.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]