Re: [pyspark] dataframe map_partition

peng yu Thu, 07 Mar 2019 12:13:55 -0800

I'm looking for a mapPartition(pandas_udf) for a pyspark.sql.DataFrame.

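```
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Proposed API: PandasUDFType.MAP and DataFrame.mapPartition do not exist
# in PySpark 2.4; this shows the desired usage, not working code.
@pandas_udf(df.schema, PandasUDFType.MAP)
def do_nothing(pandas_df):
    # Would be called once per partition, with that partition as a pandas DataFrame.
    return pandas_df

new_df = df.mapPartition(do_nothing)
```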
pandas_udf only supports SCALAR or GROUPED_MAP. Why not support a plain MAP type as well?
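For reference, a pandas-based map-partition can be approximated with the existing RDD-level `mapPartitions`. This is a minimal sketch, assuming pandas is installed on the executors and that the pandas function returns columns in the same order as the target schema; `pandas_partition_adapter` and `map_partitions_with_pandas` are hypothetical helper names, not PySpark APIs.

```python
import pandas as pd


def pandas_partition_adapter(pandas_fn, columns):
    """Hypothetical helper: turn a pandas_df -> pandas_df function into a
    function suitable for RDD.mapPartitions (called once per partition)."""
    def run(rows):
        # Collect every row of this partition into a single pandas DataFrame.
        pdf = pd.DataFrame([tuple(r) for r in rows], columns=columns)
        result = pandas_fn(pdf)
        # Yield plain tuples; Spark rebuilds Rows from the schema passed below.
        for rec in result.itertuples(index=False, name=None):
            yield rec
    return run


def map_partitions_with_pandas(spark, df, pandas_fn, schema=None):
    """Hypothetical helper: apply pandas_fn once per partition of df.

    Assumes pandas_fn's output columns match `schema` in order
    (defaults to df.schema, i.e. the function keeps the input layout).
    """
    schema = schema if schema is not None else df.schema
    run = pandas_partition_adapter(pandas_fn, df.columns)
    return spark.createDataFrame(df.rdd.mapPartitions(run), schema)
```

Usage would look like `new_df = map_partitions_with_pandas(spark, df, lambda pdf: pdf)`, the equivalent of `do_nothing` above. The trade-off is that rows travel through the pickled `Row` path rather than Arrow, so this is typically slower than a vectorized pandas_udf. Later releases (Spark 3.0+) added `DataFrame.mapInPandas(func, schema)`, where `func` receives an iterator of pandas DataFrames for each partition, which covers this request directly.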

On Thu, Mar 7, 2019 at 2:57 PM Sean Owen <sro...@gmail.com> wrote:

> Are you looking for @pandas_udf in Python? Or just mapPartitions? Those
> exist already.
>
> On Thu, Mar 7, 2019, 1:43 PM peng yu <yupb...@gmail.com> wrote:
>
>> There is a nice map_partition function in R, `dapply`, so that the user can
>> pass each partition (as an R data.frame) to a UDF.
>>
>> I'm wondering why we don't have that in Python.
>>
>> I'm trying to get a map_partition function with pandas_udf support.
>>
>> thanks!
>>
>

