Bruce Robbins created SPARK-43841:
-------------------------------------

             Summary: Non-existent column in projection of full outer join with USING results in StringIndexOutOfBoundsException
                 Key: SPARK-43841
                 URL: https://issues.apache.org/jira/browse/SPARK-43841
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Bruce Robbins

The following query throws a {{StringIndexOutOfBoundsException}}:
{noformat}
with v1 as (
  select * from values (1, 2) as (c1, c2)
),
v2 as (
  select * from values (2, 3) as (c1, c2)
)
select v1.c1, v1.c2, v2.c1, v2.c2, b
from v1
full outer join v2
using (c1);
{noformat}
The query should fail anyway, since {{b}} refers to a non-existent column. But it should fail with a helpful error message, not with a {{StringIndexOutOfBoundsException}}.

The issue seems to be in {{StringUtils#orderSuggestedIdentifiersBySimilarity}}. {{orderSuggestedIdentifiersBySimilarity}} assumes that a list of candidate attributes with a mix of prefixes will never have an attribute name with an empty prefix. But in this case it does ({{c1}} from the {{coalesce}} has no prefix, since it is not associated with any relation or subquery):
{noformat}
+- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
   +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
      +- Join FullOuter, (c1#5 = c1#7)
         :- SubqueryAlias v1
         :  +- CTERelationRef 0, true, [c1#5, c2#6]
         +- SubqueryAlias v2
            +- CTERelationRef 1, true, [c1#7, c2#8]
{noformat}
Because of this, {{orderSuggestedIdentifiersBySimilarity}} returns a sorted list of suggestions like this:
{noformat}
ArrayBuffer(.c1, v1.c2, v2.c2)
{noformat}
{{UnresolvedAttribute.parseAttributeName}} chokes on an attribute name that starts with a namespace separator ('.').
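
The failure mode described above can be illustrated with a small standalone sketch. This is not Spark's code: the function names ({{split_identifier}}, {{naive_suggestions}}, {{guarded_suggestions}}) are hypothetical, the language is Python rather than Scala, and the logic only assumes what the report states, namely that candidates are split into a prefix and a name, ordered by similarity to the unresolved name, and rejoined with a '.' separator. Under that assumption, an unqualified candidate such as {{c1}} comes back as {{.c1}}, while a variant that skips the separator for an empty prefix returns {{c1}}.

```python
from typing import List, Tuple


def split_identifier(ident: str) -> Tuple[str, str]:
    """Split 'v1.c2' into ('v1', 'c2'); an unqualified 'c1' gives ('', 'c1')."""
    prefix, _, name = ident.rpartition(".")
    return prefix, name


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def naive_suggestions(target: str, candidates: List[str]) -> List[str]:
    """Hypothetical model of the reported behavior: always rejoin with '.'."""
    parts = [split_identifier(c) for c in candidates]
    ordered = sorted(parts, key=lambda p: levenshtein(p[1], target))
    # An empty prefix yields a name with a leading separator, e.g. '.c1'.
    return [f"{prefix}.{name}" for prefix, name in ordered]


def guarded_suggestions(target: str, candidates: List[str]) -> List[str]:
    """Same ordering, but the separator is omitted when the prefix is empty."""
    parts = [split_identifier(c) for c in candidates]
    ordered = sorted(parts, key=lambda p: levenshtein(p[1], target))
    return [f"{prefix}.{name}" if prefix else name for prefix, name in ordered]


if __name__ == "__main__":
    # Candidates from the plan above: c1 (from the coalesce), v1.c2, v2.c2.
    candidates = ["c1", "v1.c2", "v2.c2"]
    print(naive_suggestions("b", candidates))
    print(guarded_suggestions("b", candidates))
```

With the candidates from the plan above, the first call reproduces the list shown in the report ({{.c1, v1.c2, v2.c2}}). Whether the actual fix belongs in the suggestion ordering or in the attribute-name parser is a question the sketch does not answer.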