Chhavi Bansal created SPARK-51295:
-------------------------------------
Summary: \d failure in CLI client for Postgres v15 and above for
correlated subquery
Key: SPARK-51295
URL: https://issues.apache.org/jira/browse/SPARK-51295
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.1
Reporter: Chhavi Bansal
The Postgres CLI client (psql) v15 and above fires particular metadata queries when the user runs a
{code:java}
\d tablename {code}
command.
If the backend is Spark, one of these metadata queries fails:
{code:java}
SELECT a.attname,
       pg_catalog.format_type(a.atttypid, a.atttypmod),
       (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
        FROM pg_catalog.pg_attrdef d
        WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
       a.attnotnull,
       (SELECT c.collname
        FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
        WHERE c.oid = a.attcollation AND t.oid = a.atttypid
          AND a.attcollation <> t.typcollation) AS attcollation,
       ''::pg_catalog.char AS attidentity,
       ''::pg_catalog.char AS attgenerated
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '29916401' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum; {code}
The inner subquery
{code:java}
(SELECT c.collname
 FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
 WHERE c.oid = a.attcollation AND t.oid = a.atttypid
   AND a.attcollation <> t.typcollation) AS attcollation {code}
is the cause of the failure, for the following reason:
{code:java}
Reason:
[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY]
Unsupported subquery expression: Correlated scalar subqueries must be aggregated to return at most one row.; line 10 pos 2;
Project [attname#11, format_type(atttypid, atttypmod)#15002, scalarsubquery(attrelid, attnum, atthasdef)#15010, attnotnull#22, attcollation#14999, attidentity#15000, attgenerated#15001]
+- Sort [attnum#15 ASC NULLS FIRST], true
   +- Project [attname#11, if (isnull(atttypid#12)) null else format_type(knownnotnull(atttypid#12), cast(atttypmod#18 as string)) AS format_type(atttypid, atttypmod)#15002, scalar-subquery#14997 [attrelid#10 && attnum#15 && atthasdef#23] AS scalarsubquery(attrelid, attnum, atthasdef)#15010, attnotnull#22, scalar-subquery#14998 [attcollation#28 && atttypid#12 && attcollation#28] AS attcollation#14999, cast( as string) AS attidentity#15000, cast( as string) AS attgenerated#15001, attnum#15]
      :  :- Aggregate [first(if (isnull(adrelid#3704)) null else pg_get_expr(adbin#3706, knownnotnull(adrelid#3704)), false) AS first(pg_get_expr(adbin, adrelid))#15003]
      :  :  +- Filter (((adrelid#3704 = outer(attrelid#10)) AND (adnum#3705 = cast(outer(attnum#15) as int))) AND (outer(atthasdef#23) = true))
      :  :     +- SubqueryAlias d
      :  :        +- SubqueryAlias pg_attrdef
      :  :           +- View (`pg_attrdef`, [oid#3703,adrelid#3704,adnum#3705,adbin#3706,adsrc#3707])
      :  :              +- Project [oid#3703, adrelid#3704, adnum#3705, adbin#3706, adsrc#3707]
      :  :                 +- SubqueryAlias global_temp.pg_attrdef
      :  :                    +- View (`global_temp`.`pg_attrdef`, [oid#3703,adrelid#3704,adnum#3705,adbin#3706,adsrc#3707])
      :  :                       +- LogicalRDD [oid#3703, adrelid#3704, adnum#3705, adbin#3706, adsrc#3707], false
{code}
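For context, the restriction does not seem tied to the pg_catalog views themselves. The sketch below uses made-up temp views (t_attr, t_coll, purely illustrative, not part of the psql metadata query) and should hit the same analyzer check, since the correlated scalar subquery in the SELECT list has no aggregate:
{code:java}
-- Hypothetical minimal reproduction: t_attr / t_coll are illustrative names.
CREATE OR REPLACE TEMP VIEW t_attr AS
  SELECT * FROM VALUES (1, 10), (2, 20) AS t(id, collid);
CREATE OR REPLACE TEMP VIEW t_coll AS
  SELECT * FROM VALUES (10, 'c1'), (20, 'c2') AS t(oid, collname);

-- Correlated scalar subquery without an aggregate in the SELECT list;
-- expected to fail with
-- UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
SELECT a.id,
       (SELECT c.collname FROM t_coll c WHERE c.oid = a.collid) AS collname
FROM t_attr a;
{code}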
Since we can't control the metadata queries fired by the CLI, can this kind
of query, with a correlated scalar subquery in the SELECT list, be supported
by Spark?
The same query runs perfectly fine on vanilla Postgres.
What should be the way to handle this query?
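One possible workaround, applicable only where the metadata query can be rewritten or intercepted before it reaches Spark, is to wrap the selected column in an aggregate so the subquery provably returns at most one row; since pg_collation.oid is unique, max() (or first()) should not change the result. A sketch on the hypothetical views above:
{code:java}
-- Same shape as the failing attcollation subquery, but wrapped in max() so
-- Spark's analyzer accepts it. The equivalent rewrite on the real query would
-- use (SELECT max(c.collname) FROM pg_catalog.pg_collation c, ...).
SELECT a.id,
       (SELECT max(c.collname) FROM t_coll c WHERE c.oid = a.collid) AS collname
FROM t_attr a;
{code}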