[jira] [Commented] (SPARK-34348) applyInPandas doesn't seem to work with StructType output schema

Raman Srinivasan (Jira) Wed, 03 Feb 2021 14:51:06 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-34348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278404#comment-17278404
 ]


Raman Srinivasan commented on SPARK-34348:
------------------------------------------

My mistake: df.schema.add(...) modified the schema of the original DataFrame in place.

> applyInPandas doesn't seem to work with StructType output schema 
> -----------------------------------------------------------------
>
>                 Key: SPARK-34348
>                 URL: https://issues.apache.org/jira/browse/SPARK-34348
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Raman Srinivasan
>            Priority: Major
>
>  
> {code:java}
> df = spark.createDataFrame(
>     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
>     ("id", "v"))
> def subtract_mean(pdf):
>     # pdf is a pandas.DataFrame
>     pdf['count'] = pdf.shape[0]
>     return pdf
> {code}
>  
>  
> Using a DDL-formatted string for output schema works fine:
> {code:java}
> df.groupby("id").applyInPandas(subtract_mean, schema="id long, v double, count int").show()
> +---+----+-----+
> | id|   v|count|
> +---+----+-----+
> |  1| 1.0|    2|
> |  1| 2.0|    2|
> |  2| 3.0|    3|
> |  2| 5.0|    3|
> |  2|10.0|    3|
> +---+----+-----+
> {code}
>  
>  
> But using a StructType schema (appending an integer count column) fails:
> {code:java}
> from pyspark.sql.types import IntegerType, StructField
> df.groupby("id").applyInPandas(subtract_mean, 
> schema=df.schema.add(StructField('count', IntegerType(), False))).show()
> AnalysisException: Cannot resolve column name "count" among (id, v);
> {code}
> It appears to be looking for the new return field in the input schema?
> As a workaround, is there a toDDL method I can use to get the current schema 
> as a DDL string to which I can append the new return fields?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
