[ https://issues.apache.org/jira/browse/SPARK-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131933#comment-14131933 ]
Nicholas Chammas commented on SPARK-3500:
-----------------------------------------
[~davies] - PySpark doesn't seem to support {{distinct(N)}} even on a regular
RDD. Should it?
{code}
>>> sc.parallelize([1,2,3]).distinct(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: distinct() takes exactly 1 argument (2 given)
{code}
This sounds like a separate problem. Maybe it should be tracked in its own
JIRA issue?
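For context on what {{distinct(N)}} would be expected to do: as I understand it, the Scala API treats N as the number of partitions for the deduplicated result. It pairs each element with a placeholder, reduces by key into N partitions, and then drops the placeholder. The sketch below imitates that in plain Python, without Spark. The name {{distinct_into_partitions}} is hypothetical and is not part of PySpark; this only illustrates the semantics and is not the actual implementation.

```python
def distinct_into_partitions(items, num_partitions):
    """Deduplicate items and spread the survivors over num_partitions buckets.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    # Step 1: pair every element with a placeholder value.
    pairs = [(x, None) for x in items]

    # Step 2: route each key to a bucket by its hash. Keeping only the first
    # value seen per key plays the role of a reduce-by-key that discards
    # duplicates.
    buckets = [dict() for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].setdefault(key, value)

    # Step 3: drop the placeholders and keep only the keys.
    return [list(bucket) for bucket in buckets]


partitions = distinct_into_partitions([1, 2, 3, 2, 1], 2)
```

Here {{partitions}} holds two lists, and each of the values 1, 2 and 3 appears exactly once across them. Equal elements always hash to the same bucket, which is why deduplicating per bucket is enough.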
Also, could we edit the title of this JIRA issue to read something like
"SchemaRDDs are missing these methods: ..."? The problem is not limited to
SchemaRDDs created by jsonRDD().
> SchemaRDD from jsonRDD() has not coalesce() method
> --------------------------------------------------
>
> Key: SPARK-3500
> URL: https://issues.apache.org/jira/browse/SPARK-3500
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.1.0
> Reporter: Davies Liu
> Assignee: Davies Liu
> Priority: Critical
>
> {code}
> >>> sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}',
> ... '{"foo":"baz"}'])).coalesce(1)
> Py4JError: An error occurred while calling o94.coalesce. Trace:
> py4j.Py4JException: Method coalesce([class java.lang.Integer, class
> java.lang.Boolean]) does not exist
> {code}
> repartition() and distinct(N) are also missing.