[ https://issues.apache.org/jira/browse/SPARK-47854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-47854:
-----------------------------------
Labels: pull-request-available (was: )
> [PYTHON] Avoid shadowing python built-ins in python function variable naming
> ----------------------------------------------------------------------------
>
> Key: SPARK-47854
> URL: https://issues.apache.org/jira/browse/SPARK-47854
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.4.1, 3.5.0, 3.5.1, 3.3.4
> Reporter: Liu Cao
> Priority: Major
> Labels: pull-request-available
>
> Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider
> renaming certain function parameters in Python. Otherwise, we may need
> to wait another 4 years to do so.
> Example
> [https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]
> There are 8 uses of `len` and 35 uses of `str` as parameter names, both of which
> are Python built-ins. Shadowing `str` is particularly risky, because code like
> the following would be nonsensical:
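> {code:python}
> from pyspark.sql.functions import lit
>
> def foo(str: "ColumnOrName", bar: "ColumnOrName"):
>     # `str` now names the argument, not the built-in type, so it cannot be
>     # used as a type here: isinstance() raises TypeError at runtime
>     bar = lit(bar) if isinstance(bar, str) else bar
> {code}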
>
> Obviously, this would be a breaking change for user code that calls these
> functions with keyword arguments. If we rename `str` to `src` or `col`, existing
> code that passes it by keyword would break:
> {code:python}
> # breaks: `str` is no longer a parameter name
> foo(str="x", bar="y")
> # okay: the first argument is passed positionally
> foo("x", bar="y")
> {code}
> Is this change a possibility for 4.0? Or do we think the breaking change for
> keyword-argument callers outweighs the benefit?
>
>
>
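Both concerns raised in the issue can be reproduced without PySpark in a short standalone sketch. Everything below is hypothetical and illustrative: `foo`, `foo_builtins`, `foo_renamed`, the `src` name, and the `accepts_old_kwarg` decorator are not PySpark APIs, and the decorator is only one possible way (an assumption, not an agreed plan) to soften the keyword-argument break during a deprecation period. It shows that a parameter named `str` breaks `isinstance(..., str)` in the function body, that the built-in stays reachable as `builtins.str`, and how an old keyword name could be accepted with a warning after a rename.

```python
import builtins
import functools
import warnings


def foo(str, bar):
    # Inside this body `str` is the argument, not the built-in type,
    # so isinstance() raises TypeError ("arg 2 must be a type ...").
    return isinstance(bar, str)


def foo_builtins(str, bar):
    # Workaround without renaming: reach the built-in via the builtins module.
    return isinstance(bar, builtins.str)


def accepts_old_kwarg(old, new):
    """Hypothetical decorator: map a deprecated keyword name to its replacement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"got values for both '{old}' and '{new}'")
                warnings.warn(
                    f"'{old}' is deprecated; use '{new}' instead",
                    FutureWarning,
                    stacklevel=2,
                )
                # Forward the value under the new parameter name.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@accepts_old_kwarg(old="str", new="src")
def foo_renamed(src, bar):
    # After the rename, `str` refers to the built-in type again.
    return isinstance(bar, str)
```

With the decorator, `foo_renamed(str="x", bar="y")` still returns `True` but emits a `FutureWarning`; without it, Python would reject the old keyword with a `TypeError` for an unexpected keyword argument. The trade-off is an extra wrapper on every affected function and a deprecation window that must pass before the old name can be dropped.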
--
This message was sent by Atlassian Jira
(v8.20.10#820010)