[ https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-42258:
------------------------------------

    Assignee:     (was: Apache Spark)

> pyspark.sql.functions should not expose typing.cast
> ---------------------------------------------------
>
>                 Key: SPARK-42258
>                 URL: https://issues.apache.org/jira/browse/SPARK-42258
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.1
>            Reporter: Furcy Pin
>            Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This can lead to user errors that are hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema() {code}
> which executes without any problem, gives the following result:
>
> {code:java}
> root
>  |-- a: integer (nullable = false){code}
> This is because `f.cast` here is `typing.cast`, which returns its second
> argument unchanged at runtime. The correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
>
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false) {code}
> *Suggested solutions*
> Option 1: The names imported into the module `pyspark.sql.functions` could be
> given private aliases so that they are not exposed. For instance:
> {code:java}
> from typing import cast as _cast{code}
> Option 2: Import only `typing` and replace all occurrences of `cast` with
> `typing.cast`.
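
The behaviour described in the report can be reproduced without Spark. The following is a minimal sketch, not the PySpark source: `FakeColumn` and the module name `fake_functions` are hypothetical stand-ins introduced only for illustration. It shows that `typing.cast` hands back its second argument untouched, and that an underscore alias (Option 1) keeps the name `cast` out of a module's namespace.

```python
import types
from typing import cast


class FakeColumn:
    """Hypothetical stand-in for a Spark Column, so the sketch runs without Spark."""

    def __init__(self, name: str, dtype: str) -> None:
        self.name = name
        self.dtype = dtype


col = FakeColumn("a", "integer")

# typing.cast(typ, val) only informs static type checkers; at runtime it
# returns val unchanged and never inspects the first argument.
result = cast("STRING", col)
same_object = result is col

# Option 1 from the report: importing under a private alias means the
# module no longer has a public attribute named `cast`.
fake_functions = types.ModuleType("fake_functions")  # hypothetical module
exec("from typing import cast as _cast", fake_functions.__dict__)
exposes_cast = hasattr(fake_functions, "cast")

print(same_object, result.dtype, exposes_cast)
```

With the alias in place, a call such as `f.cast(...)` on the module would raise `AttributeError` instead of silently returning the column unchanged, which is the failure mode the reporter is asking for.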