[
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687757#comment-17687757
]
Hyukjin Kwon commented on SPARK-42258:
--------------------------------------
Good point. Are you interested in submitting a PR?
> pyspark.sql.functions should not expose typing.cast
> ---------------------------------------------------
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.1
> Reporter: Furcy Pin
> Priority: Minor
>
> In PySpark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This can lead to user errors that are hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema() {code}
> which executes without any problem, gives the following result:
>
>
> {code:java}
> root
> |-- a: integer (nullable = false){code}
> This is because `f.cast` here calls `typing.cast`, which at runtime simply
> returns its second argument unchanged. The correct syntax is:
> {code:python}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
>
> which indeed gives:
> {code:java}
> root
> |-- a: string (nullable = false) {code}
> *Suggested solutions*
> Option 1: The names imported into the module `pyspark.sql.functions` could be
> aliased with a leading underscore so they are not exposed. For instance:
> {code:python}
> from typing import cast as _cast{code}
> Option 2: Import only `typing` and replace all occurrences of `cast` with
> `typing.cast`.
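
The behaviour described in the quoted report can be reproduced without Spark. The following is a minimal sketch, not the actual `pyspark.sql.functions` source: the module names `exposed_functions` and `aliased_functions` and the helper `make_module` are hypothetical stand-ins, used only to show that `typing.cast` does nothing at runtime and that Option 1 (an underscore alias) keeps the public name `cast` out of a module's namespace.

```python
import types
import typing


def make_module(name: str, source: str) -> types.ModuleType:
    """Build a throwaway module whose body is `source` (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# typing.cast is a hint for static type checkers; at runtime it returns
# its second argument unchanged, whatever the first argument is.
value = typing.cast("STRING", 1)
print(type(value).__name__, value)

# Hypothetical module that imports `cast` directly, so the name leaks.
exposed = make_module("exposed_functions", "from typing import cast")

# Hypothetical module using Option 1: import under a private alias.
aliased = make_module("aliased_functions", "from typing import cast as _cast")

print(hasattr(exposed, "cast"))
print(hasattr(aliased, "cast"))
```

With the alias in place, an accidental `f.cast(...)` call against such a module would raise `AttributeError` instead of silently returning the column unchanged.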