[
https://issues.apache.org/jira/browse/SPARK-47854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liu Cao updated SPARK-47854:
----------------------------
Description:
Given that Spark 4.0.0 is upcoming, I wonder if we should at least consider
renaming certain variables in the Python functions. Otherwise, we may need to
wait another 4 years to do so.
Example
[https://github.com/apache/spark/blob/e6b7950f553cff5adc02b8b5195e79cffff3c97c/python/pyspark/sql/functions/builtin.py#L12768]
There are 8 uses of `len` and 35 uses of `str` as variable names, both of which
are Python built-ins. Shadowing `str` is somewhat dangerous because code like
the following no longer makes sense: inside the function, `str` refers to the
argument rather than the built-in type.
{code:python}
def foo(str: "ColumnOrName", bar: "ColumnOrName"):
    # str is now a variable, so it cannot be used as a type here
    bar = lit(bar) if isinstance(bar, str) else bar
{code}
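To make the failure concrete, here is a minimal, self-contained sketch. The
`Column` class and `lit` function below are hypothetical stand-ins (not
PySpark's real implementations) so that the snippet runs without Spark, and
`foo_renamed` and `shadowing_error` are illustrative names only.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, name):
        self.name = name


def lit(value):
    """Hypothetical stand-in for pyspark.sql.functions.lit."""
    return Column(value)


def foo(str, bar):
    # Inside this function `str` is the first argument, not the built-in type,
    # so isinstance() receives a plain value as its second argument.
    return lit(bar) if isinstance(bar, str) else bar


def foo_renamed(col, bar):
    # With the parameter renamed, `str` resolves to the built-in type again.
    return lit(bar) if isinstance(bar, str) else bar


def shadowing_error():
    """Call the shadowed version and return the exception it raises, if any."""
    try:
        foo("x", "y")
    except TypeError as exc:
        return exc
    return None
```

Calling `foo("x", "y")` fails with a `TypeError` because the second argument to
`isinstance` is the string `"x"`, while `foo_renamed("x", "y")` wraps `"y"` as
expected.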
Obviously, this would be a breaking change for user code that calls the
function with keyword arguments. If we rename `str` to `src` or `col`, old code
calling `foo(str="x", bar="y")` would break, though `foo("x", bar="y")` would
be fine.
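The compatibility impact can be sketched in plain Python. `foo_old` and
`foo_new` are hypothetical before and after signatures, assuming the parameter
is renamed to `col`; they are not actual PySpark functions.

```python
def foo_old(str, bar):
    # Signature before the rename: first parameter is called `str`.
    return (str, bar)


def foo_new(col, bar):
    # Signature after the (assumed) rename of `str` to `col`.
    return (col, bar)


def call_with_old_keyword():
    """Call the renamed function the old way and return the resulting error."""
    try:
        return foo_new(str="x", bar="y")
    except TypeError as exc:
        return exc
```

Positional callers such as `foo_new("x", bar="y")` are unaffected, while
`foo_new(str="x", bar="y")` raises a `TypeError` for the unexpected keyword
argument.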
Is this change a possibility? Or do we think the benefit is not large enough to
justify breaking keyword-argument callers?
> [PYTHON] Avoid shadowing python built-ins in python function variable naming
> ----------------------------------------------------------------------------
>
> Key: SPARK-47854
> URL: https://issues.apache.org/jira/browse/SPARK-47854
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.4.1, 3.5.0, 3.5.1, 3.3.4
> Reporter: Liu Cao
> Priority: Major