[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653649#comment-17653649 ]
Martin Grund commented on SPARK-41812:
--------------------------------------

On the Spark side we eagerly resolve the column and return a full named expression.

{code:scala}
Column(addDataFrameIdToCol(resolve(colName)))

private[sql] def resolve(colName: String): NamedExpression = {
  val resolver = sparkSession.sessionState.analyzer.resolver
  queryExecution.analyzed.resolveQuoted(colName, resolver)
    .getOrElse(throw resolveException(colName, schema.fieldNames))
}
{code}

To avoid too many round trips, we should probably inject the DataFrame ID and column position properties into the metadata and perform the resolution later on the server.

> DataFrame.join: ambiguous column
> --------------------------------
>
>                 Key: SPARK-41812
>                 URL: https://issues.apache.org/jira/browse/SPARK-41812
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
>     df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.column.Column.eqNullSafe[4]>", line 1, in <module>
>         df1.join(df2, df1["value"] == df2["value"]).count()
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in count
>         pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, `value`].
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
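The metadata-injection idea in the comment can be sketched as follows. This is a minimal Python model, not actual Spark or Spark Connect code; the names `UnresolvedColumn`, `PlanNode`, and `resolve` are all illustrative stand-ins. The point is that when the client tags a column reference with the plan id of the DataFrame it came from, the server can disambiguate two same-named columns in a join instead of raising AMBIGUOUS_REFERENCE.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for a client-side column reference. In the real
# proposal, the plan id would travel in the expression's metadata.
@dataclass
class UnresolvedColumn:
    name: str
    plan_id: Optional[int] = None  # id of the DataFrame/plan the column came from

# Hypothetical stand-in for an analyzed plan node and its output schema.
@dataclass
class PlanNode:
    plan_id: int
    output: List[str]  # column names this plan produces

def resolve(col: UnresolvedColumn, inputs: List[PlanNode]) -> Tuple[int, str]:
    """Resolve `col` against the join inputs, preferring its plan-id tag."""
    candidates = [p for p in inputs if col.name in p.output]
    if col.plan_id is not None:
        # The tag pins the column to one side of the join.
        for p in candidates:
            if p.plan_id == col.plan_id:
                return (p.plan_id, col.name)
        raise ValueError(f"Column `{col.name}` not found in plan {col.plan_id}")
    if len(candidates) == 1:
        return (candidates[0].plan_id, col.name)
    if not candidates:
        raise ValueError(f"Column `{col.name}` not found")
    # Untagged and present on both sides: exactly the reported failure mode.
    raise ValueError(f"[AMBIGUOUS_REFERENCE] Reference `{col.name}` is ambiguous")
```

With two inputs that both output `value` (as in `df1.join(df2, df1["value"] == df2["value"])`), a tagged reference resolves to the intended side, while an untagged one reproduces the ambiguity error.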