Furcy Pin created SPARK-42258:
---------------------------------
Summary: pyspark.sql.functions should not expose typing.cast
Key: SPARK-42258
URL: https://issues.apache.org/jira/browse/SPARK-42258
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.1
Reporter: Furcy Pin
In PySpark, the `pyspark.sql.functions` module imports and exposes the function
`typing.cast`.
This can lead to user errors that are hard to spot.
*Example*
It took me a few minutes to understand why the following code:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as f
spark = SparkSession.builder.getOrCreate()
df = spark.sql("""SELECT 1 as a""")
df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()
{code}
which executes without any problem, gives the following result:
{code:java}
root
|-- a: integer (nullable = false){code}
This is because `f.cast` here calls `typing.cast`, which returns its second
argument unchanged at runtime. The correct syntax is:
{code:python}
df.withColumn("a", f.col("a").cast("STRING")).printSchema()
{code}
which indeed gives:
{code:java}
root
|-- a: string (nullable = false) {code}
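The silent no-op can be reproduced without Spark. The following is only a minimal sketch: `FakeColumn` is a hypothetical stand-in for `pyspark.sql.Column`, used here so the snippet runs with the standard library alone.

```python
from typing import cast


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not part of PySpark)."""

    def __init__(self, name: str) -> None:
        self.name = name


column = FakeColumn("a")

# typing.cast(typ, val) only informs static type checkers; at runtime it
# returns `val` untouched and never inspects `typ`, so "STRING" is ignored.
result = cast("STRING", column)

print(result is column)  # True: the very same object comes back
```

Because the column comes back unmodified, `withColumn` simply reassigns the original integer column, which is why no error is raised.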
*Suggestion of solution*
Option 1: The names imported into the module `pyspark.sql.functions` could be
given a private, underscore-prefixed alias so that they are not exposed as
part of the module. For instance:
{code:python}
from typing import cast as _cast
{code}
Option 2: Import only `typing` and replace all occurrences of `cast` with
`typing.cast`.
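A rough sketch of how each import style affects what a module exposes. The module names and the `make_module` helper below are hypothetical and exist only for this illustration; they are not part of PySpark.

```python
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Build a throwaway module by executing `source` in its namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Current behaviour: the bare import leaks `cast` into the namespace.
current = make_module("current_style", "from typing import cast\n")
# Option 1: a private alias keeps the public name `cast` unused.
option1 = make_module("option1_style", "from typing import cast as _cast\n")
# Option 2: only the `typing` module itself is bound in the namespace.
option2 = make_module("option2_style", "import typing\n")

print(hasattr(current, "cast"))  # True
print(hasattr(option1, "cast"))  # False
print(hasattr(option2, "cast"))  # False
```

With either option, a call such as `f.cast(...)` would raise an `AttributeError` instead of silently returning its argument. Option 2 still exposes the name `typing`, but `f.typing.cast` is much less likely to be typed by accident.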