Ruifeng Zheng created SPARK-42444:
-------------------------------------

             Summary: DataFrame.drop should handle multi columns properly
                 Key: SPARK-42444
                 URL: https://issues.apache.org/jira/browse/SPARK-42444
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng

{code:python}
from pyspark.sql import Row
df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
{code}

This works in 3.3.0:
{code}
+------+
|height|
+------+
|    85|
|    80|
+------+
{code}

but fails in 3.4.0:
{code:python}
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[1], line 4
      2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
      3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()

File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
   4911 jcols = [_to_java_column(c) for c in cols]
   4912 first_column, *remaining_columns = jcols
-> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
   4915 return DataFrame(jdf, self.sparkSession)

File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
    155 converted = convert_exception(e.java_exception)
    156 if not isinstance(converted, UnknownException):
    157     # Hide where the exception came from that shows a non-Pythonic
    158     # JVM exception message.
--> 159     raise converted from None
    160 else:
    161     raise

AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
{code}
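The traceback points at the likely cause: in 3.4.0, {{DataFrame.drop}} converts every string argument to a Column via {{_to_java_column}} before calling the JVM {{drop}}, so each name must resolve to exactly one column, and {{name}} matches both sides of the join. The plain-Python toy model below sketches that difference. It is not Spark's API: {{Col}}, {{AmbiguousReference}}, {{drop_by_name}} and {{drop_by_resolved_column}} are hypothetical names, and it assumes 3.3.0 dropped every column whose name matches a string, which is consistent with the 3.3.0 output above.

{code:python}
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Col:
    side: str  # which input DataFrame the column came from (hypothetical tag)
    name: str


class AmbiguousReference(Exception):
    """Hypothetical stand-in for Spark's AMBIGUOUS_REFERENCE AnalysisException."""


# Output columns of df1.join(df2, df1.name == df2.name): both `name` columns survive.
JOINED: List[Col] = [Col("df1", "age"), Col("df1", "name"), Col("df2", "height"), Col("df2", "name")]


def drop_by_name(cols: List[Col], *names: str) -> List[Col]:
    """Assumed 3.3.0 semantics: a string drops every column with that name."""
    return [c for c in cols if c.name not in names]


def resolve(cols: List[Col], name: str) -> List[Col]:
    """Resolve a name to at most one column; more than one match is an error."""
    matches = [c for c in cols if c.name == name]
    if len(matches) > 1:
        candidates = ", ".join(f"`{m.name}`" for m in matches)
        raise AmbiguousReference(f"Reference `{name}` is ambiguous, could be: [{candidates}].")
    return matches  # empty if nothing matches, so that name drops nothing


def drop_by_resolved_column(cols: List[Col], *names: str) -> List[Col]:
    """Assumed 3.4.0 semantics: each string is first resolved to a single column."""
    to_drop = set()
    for n in names:
        to_drop.update(resolve(cols, n))
    return [c for c in cols if c not in to_drop]


if __name__ == "__main__":
    print([c.name for c in drop_by_name(JOINED, "name", "age")])  # ['height']
    try:
        drop_by_resolved_column(JOINED, "name", "age")
    except AmbiguousReference as e:
        print(e)  # Reference `name` is ambiguous, could be: [`name`, `name`].
{code}

In this model, dropping a name that is unique in the joined output, such as {{age}} alone, gives the same result under both functions; only the duplicated join key triggers the ambiguity.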