[ 
https://issues.apache.org/jira/browse/SPARK-55821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Huang updated SPARK-55821:
---------------------------------
    Description: 
The serializer classes in `pyspark.sql.pandas.serializers` accept many 
positional arguments in their `__init__` methods, making call sites error-prone 
and hard to read.

For example, `ArrowStreamPandasUDFSerializer.__init__` takes 12 parameters, 
`ApplyInPandasWithStateSerializer.__init__` takes 7 parameters, etc. When these 
are called with positional arguments, it is very easy to mix up the order.

We should enforce keyword-only arguments (by placing a bare `*` separator after 
`self`) in serializer `__init__` methods, so that every call site names its 
arguments and a positional call raises a `TypeError` instead of silently 
binding values to the wrong parameters.
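
As a minimal sketch of the proposed pattern (the class name and parameter 
names below are illustrative assumptions, not the actual PySpark signatures), 
a bare `*` after `self` makes every following parameter keyword-only:

```python
class ExampleSerializer:
    """Hypothetical stand-in for a serializer; not an actual PySpark class."""

    # Everything after the bare `*` can only be passed by keyword.
    def __init__(self, *, timezone, safecheck, assign_cols_by_name=True):
        self.timezone = timezone
        self.safecheck = safecheck
        self.assign_cols_by_name = assign_cols_by_name


# Keyword call: each value is visibly tied to its parameter.
ser = ExampleSerializer(timezone="UTC", safecheck=True)

# Positional call: rejected by Python itself with a TypeError.
positional_error = None
try:
    ExampleSerializer("UTC", True)
except TypeError as exc:
    positional_error = str(exc)
```

With this signature, swapping two arguments by accident is no longer possible 
at the call site, because the order of keyword arguments does not matter.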

All call sites in `worker.py` and within `serializers.py` (subclass 
`super().__init__` calls) must also be updated to use keyword arguments.
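
The subclass case could look roughly like the following sketch. The class 
names and parameters are hypothetical and only illustrate how a 
`super().__init__` call would forward arguments by name once the parent 
constructor is keyword-only:

```python
class BaseSerializer:
    """Hypothetical parent class with a keyword-only constructor."""

    def __init__(self, *, timezone, safecheck):
        self.timezone = timezone
        self.safecheck = safecheck


class ChildSerializer(BaseSerializer):
    """Hypothetical subclass that adds one parameter of its own."""

    def __init__(self, *, timezone, safecheck, state_schema):
        # Forward by name; super().__init__(timezone, safecheck) would now
        # raise a TypeError because the parent accepts keywords only.
        super().__init__(timezone=timezone, safecheck=safecheck)
        self.state_schema = state_schema


child = ChildSerializer(timezone="UTC", safecheck=False, state_schema="id INT")
```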

  was:
The serializer classes in `pyspark.sql.pandas.serializers` accept many 
positional arguments in their `__init__` methods, making call sites error-prone 
and hard to read.

For example, `ArrowStreamPandasUDFSerializer.__init__` takes 12 parameters, 
`ApplyInPandasWithStateSerializer.__init__` takes 7 parameters, etc. When these 
are called with positional arguments, it is very easy to mix up the order.

We should enforce keyword-only arguments (using `*` separator after `self`) in 
serializer `__init__` methods to improve readability and prevent positional 
argument mistakes.

Classes to update:
- `ArrowStreamPandasSerializer`
- `ArrowStreamPandasUDFSerializer`
- `ArrowStreamArrowUDFSerializer`
- `ArrowBatchUDFSerializer`
- `ArrowStreamPandasUDTFSerializer`
- `ArrowStreamAggPandasUDFSerializer`
- `GroupPandasUDFSerializer`
- `CogroupPandasUDFSerializer`
- `ApplyInPandasWithStateSerializer`
- `TransformWithStateInPandasSerializer`
- `TransformWithStateInPandasInitStateSerializer`
- `ArrowStreamGroupUDFSerializer`
- `CogroupArrowUDFSerializer`

All call sites in `worker.py` and within `serializers.py` (subclass 
`super().__init__` calls) must also be updated to use keyword arguments.


> [PYTHON] Enforce keyword-only arguments in serializer __init__ methods
> ----------------------------------------------------------------------
>
>                 Key: SPARK-55821
>                 URL: https://issues.apache.org/jira/browse/SPARK-55821
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Yicong Huang
>            Priority: Minor



