Josh Rosen created SPARK-20700:
----------------------------------
Summary: Expression canonicalization hits stack overflow for query
Key: SPARK-20700
URL: https://issues.apache.org/jira/browse/SPARK-20700
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 2.2.0
Reporter: Josh Rosen
The following (complicated) query eventually fails with a stack overflow during
optimization:
{code}
CREATE TEMPORARY VIEW table_5(varchar0002_col_1, smallint_col_2, float_col_3,
int_col_4, string_col_5, timestamp_col_6, string_col_7) AS VALUES
('68', CAST(NULL AS SMALLINT), CAST(244.90413 AS FLOAT), -137, '571',
TIMESTAMP('2015-01-14 00:00:00.0'), '947'),
('82', CAST(213 AS SMALLINT), CAST(53.184647 AS FLOAT), -724, '-278',
TIMESTAMP('1999-08-15 00:00:00.0'), '437'),
('-7', CAST(-15 AS SMALLINT), CAST(NULL AS FLOAT), -890, '778',
TIMESTAMP('1991-05-23 00:00:00.0'), '630'),
('22', CAST(676 AS SMALLINT), CAST(385.27386 AS FLOAT), CAST(NULL AS INT),
'-10', TIMESTAMP('1996-09-29 00:00:00.0'), '641'),
('16', CAST(430 AS SMALLINT), CAST(187.23717 AS FLOAT), 989, CAST(NULL AS
STRING), TIMESTAMP('2024-04-21 00:00:00.0'), '-234'),
('83', CAST(760 AS SMALLINT), CAST(-695.45386 AS FLOAT), -970, '330',
CAST(NULL AS TIMESTAMP), '-740'),
('68', CAST(-930 AS SMALLINT), CAST(NULL AS FLOAT), -915, '-766', CAST(NULL
AS TIMESTAMP), CAST(NULL AS STRING)),
('48', CAST(692 AS SMALLINT), CAST(-220.59615 AS FLOAT), 940, '-514',
CAST(NULL AS TIMESTAMP), '181'),
('21', CAST(44 AS SMALLINT), CAST(NULL AS FLOAT), -175, '761',
TIMESTAMP('2016-06-30 00:00:00.0'), '487'),
('50', CAST(953 AS SMALLINT), CAST(837.2948 AS FLOAT), 705, CAST(NULL AS
STRING), CAST(NULL AS TIMESTAMP), '-62');
CREATE VIEW bools(a, b) as values (1, true), (1, true), (1, null);
SELECT
AVG(-13) OVER (ORDER BY COUNT(t1.smallint_col_2) DESC ROWS 27 PRECEDING ) AS
float_col,
COUNT(t1.smallint_col_2) AS int_col
FROM table_5 t1
INNER JOIN (
SELECT
(MIN(-83) OVER (PARTITION BY t2.a ORDER BY t2.a, (t1.int_col_4) *
(t1.int_col_4) ROWS BETWEEN CURRENT ROW AND 15 FOLLOWING)) NOT IN (-222, 928)
AS boolean_col,
t2.a,
(t1.int_col_4) * (t1.int_col_4) AS int_col
FROM table_5 t1
LEFT JOIN bools t2 ON (t2.a) = (t1.int_col_4)
WHERE
(t1.smallint_col_2) > (t1.smallint_col_2)
GROUP BY
t2.a,
(t1.int_col_4) * (t1.int_col_4)
HAVING
((t1.int_col_4) * (t1.int_col_4)) IN ((t1.int_col_4) * (t1.int_col_4),
SUM(t1.int_col_4))
) t2 ON (((t2.int_col) = (t1.int_col_4)) AND ((t2.a) = (t1.int_col_4))) AND
((t2.a) = (t1.smallint_col_2));
{code}
(I haven't tried to minimize this failing case yet.)
Based on sampled jstacks from the driver, it looks like the query might be
repeatedly inferring filters from constraints and then pruning those filters.
Here's part of the stack at the point where it stackoverflows:
{code}
[... repeats ...]
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$$anonfun$org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative$1.apply(Canonicalize.scala:50)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
[... the same eight frames repeat several more times ...]
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$.org$apache$spark$sql$catalyst$expressions$Canonicalize$$gatherCommutative(Canonicalize.scala:50)
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$.orderCommutative(Canonicalize.scala:58)
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$.expressionReorder(Canonicalize.scala:63)
        at org.apache.spark.sql.catalyst.expressions.Canonicalize$.execute(Canonicalize.scala:36)
        at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:158)
        - locked <0x00000007a298b940> (a org.apache.spark.sql.catalyst.expressions.Multiply)
        at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:156)
        at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
        at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:157)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[...]
{code}
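For illustration only, the recursion visible in the trace can be sketched as follows. This is a self-contained Scala sketch with an invented {{Expr}} ADT, not the actual Catalyst code; it only mirrors the shape of {{Canonicalize.gatherCommutative}}, which flattens nested commutative operations via non-tail recursion (hence the repeating {{flatMap}}/{{foreach}} frames above):

{code}
// Hypothetical stand-in for Catalyst expression trees (not Spark's classes).
sealed trait Expr
case class Multiply(left: Expr, right: Expr) extends Expr
case class Literal(value: Int) extends Expr

object RecursionSketch {
  // One JVM stack frame per level of nesting: if the optimizer keeps
  // producing ever-deeper (x * x) trees, this recursion eventually
  // exhausts the stack during canonicalization.
  def gatherCommutative(e: Expr): Seq[Expr] = e match {
    case Multiply(l, r) => Seq(l, r).flatMap(gatherCommutative)
    case other          => Seq(other)
  }

  def main(args: Array[String]): Unit = {
    // A deeply left-nested Multiply chain stands in for the blown-up expression.
    val deep = (1 to 1000000).foldLeft[Expr](Literal(0))((acc, _) => Multiply(acc, Literal(1)))
    try gatherCommutative(deep)
    catch { case _: StackOverflowError => println("StackOverflowError, as in the report") }
  }
}
{code}

If the optimizer really is re-inferring and re-pruning filters in a loop, the expression depth would grow each iteration, which would match this failure mode.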
I suspect this is similar to SPARK-17733, another bug involving
{{InferFiltersFromConstraints}}, so I'll cc [~jiangxb1987] and [~sameerag], who
worked on that earlier fix.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]