[
https://issues.apache.org/jira/browse/SPARK-23761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-23761.
----------------------------------
Resolution: Cannot Reproduce
> Dataframe filter(udf) followed by groupby in pyspark throws a casting error
> ---------------------------------------------------------------------------
>
> Key: SPARK-23761
> URL: https://issues.apache.org/jira/browse/SPARK-23761
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.6.0
> Environment: pyspark 1.6.0
> Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
> CentOS 6.7
> Reporter: Dhaniram Kshirsagar
> Priority: Major
>
> In PySpark, calling filter() with a UDF on a DataFrame and then calling
> groupby() throws the following exception:
> # Snippet of error observed in pyspark
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o56.filter.
> : java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to
> org.apache.spark.sql.catalyst.plans.logical.Aggregate{code}
> This looks similar to https://issues.apache.org/jira/browse/SPARK-12981, but
> I am not sure whether it is the same issue.
>
> Here is a gist with the PySpark steps to reproduce this issue:
> [https://gist.github.com/dhaniram-kshirsagar/d72545620b6a05d145a1a6bece797b6d]
>
>
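For readers without access to the gist, the shape of the reported pipeline (a UDF-based filter followed by a groupby) can be sketched roughly as below. This is not the reporter's reproduction: the column names, the sample rows, and the helpers `is_adult` and `filter_then_group` are all hypothetical, and the sketch assumes a `SparkSession` (Spark 2.0 or later) rather than the 1.6.0 `SQLContext` from the report.

```python
def is_adult(age):
    """Plain-Python predicate (hypothetical) that is wrapped as a UDF below."""
    return age is not None and age >= 18


def filter_then_group(spark):
    """Apply a UDF-based filter, then a groupBy aggregation.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported lazily so the predicate above is usable without pyspark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    # Hypothetical sample data; the gist's actual data is not reproduced here.
    df = spark.createDataFrame(
        [("alice", "NY", 34), ("bob", "NY", 15), ("carol", "SF", 52)],
        ["name", "city", "age"],
    )
    is_adult_udf = udf(is_adult, BooleanType())

    # filter(with UDF) followed by groupBy: the pattern named in the issue title.
    return df.filter(is_adult_udf(col("age"))).groupBy("city").count()
```

Given a local session, for example one created with `SparkSession.builder.master("local[1]").getOrCreate()`, calling `filter_then_group(spark).show()` runs the filter and the aggregation in a single plan, which is the combination the issue title describes.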