[
https://issues.apache.org/jira/browse/SPARK-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131752#comment-14131752
]
Nicholas Chammas commented on SPARK-3500:
-----------------------------------------
Hmm, you _could_ perhaps consider this a missing feature, though since all base
RDD operations should also be valid SchemaRDD operations (right?), it
definitely feels like a bug. And it's not limited to SchemaRDDs created by
{{jsonRDD}}, as the title suggests; SchemaRDDs created by {{inferSchema}} fail
the same way.
It looks like {{repartition}} is missing, too.
{code}
from pyspark.sql import SQLContext, Row
# Assumes sc is an existing SparkContext, e.g. the one the pyspark shell provides.
sqlContext = SQLContext(sc)
a = sc.parallelize([Row(field1=1, field2="row1")])
sqlContext.inferSchema(a).coalesce(1) # Method coalesce does not exist
sqlContext.inferSchema(a).repartition(1) # Method repartition does not exist
{code}
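To check which other base RDD operations break on a SchemaRDD, a small probe along these lines might help. It is only a sketch: {{find_failing_calls}} and {{FakeSchemaRDD}} are hypothetical names introduced here, and the stand-in class exists just so the snippet runs without Spark.

```python
def find_failing_calls(obj, calls):
    """Try each method call on obj and collect the ones that raise.

    `calls` maps a method name to the tuple of positional arguments to pass.
    Returns a dict of method name -> "ExceptionType: message".
    """
    failures = {}
    for name, args in sorted(calls.items()):
        try:
            getattr(obj, name)(*args)
        except Exception as exc:  # a Py4JError should land here as well
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return failures


# Hypothetical stand-in (not a Spark class): count() works,
# while coalesce() and repartition() fail at call time.
class FakeSchemaRDD(object):
    def count(self):
        return 1

    def coalesce(self, n):
        raise RuntimeError("Method coalesce does not exist")

    def repartition(self, n):
        raise RuntimeError("Method repartition does not exist")


failures = find_failing_calls(
    FakeSchemaRDD(),
    {"count": (), "coalesce": (1,), "repartition": (1,)},
)
for name in sorted(failures):
    print(name, "->", failures[name])
```

In a live pyspark shell, passing {{sqlContext.inferSchema(a)}} in place of the stand-in should report the same two methods, assuming the failure is raised at call time as in the trace quoted in the issue.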
> SchemaRDD from jsonRDD() has not coalesce() method
> --------------------------------------------------
>
> Key: SPARK-3500
> URL: https://issues.apache.org/jira/browse/SPARK-3500
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.1.0
> Reporter: Davies Liu
> Assignee: Davies Liu
> Priority: Critical
>
> {code}
> >>> sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}',
> ... '{"foo":"baz"}'])).coalesce(1)
> Py4JError: An error occurred while calling o94.coalesce. Trace:
> py4j.Py4JException: Method coalesce([class java.lang.Integer, class
> java.lang.Boolean]) does not exist
> {code}