[
https://issues.apache.org/jira/browse/SPARK-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-4533.
-------------------------------
Resolution: Invalid
AFAIK this is not a bug, since calling {{keyBy()}} on a SchemaRDD returns a
non-schema RDD. Therefore, I'm marking this issue as "Invalid" and closing it.
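
A minimal sketch of why the check fires, in plain Python with no Spark
dependency. The two classes below are toy stand-ins, not PySpark's real
implementation; their bodies and the ValueError exception type are assumptions
made for illustration. Only the method names and the error message come from
the report.

```python
class RDD(object):
    """Toy stand-in for a plain RDD: wraps a Python list (hypothetical)."""

    def __init__(self, items):
        self.items = list(items)

    def keyBy(self, f):
        # Always builds a plain RDD of (key, item) pairs, even when called
        # on a subclass instance, so any schema information is lost.
        return RDD((f(x), x) for x in self.items)

    def subtract(self, other):
        return RDD(x for x in self.items if x not in other.items)


class SchemaRDD(RDD):
    """Toy stand-in for a SchemaRDD that only accepts its own type."""

    def subtract(self, other):
        # Strict class check, mirroring the validation described in the issue.
        if other.__class__ is SchemaRDD:
            return SchemaRDD(x for x in self.items if x not in other.items)
        raise ValueError("Can only subtract another SchemaRDD")


a = SchemaRDD([1, 2, 3])
keyed = a.keyBy(lambda row: None)   # plain RDD, no longer a SchemaRDD

try:
    a.subtract(keyed)
    message = None
except ValueError as e:
    message = str(e)

same_type = a.subtract(SchemaRDD([2]))  # allowed: both sides are SchemaRDDs
```

In this model the subtraction fails only because the right-hand side stopped
being a SchemaRDD after keyBy(), which is the behavior the resolution
describes.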
> SchemaRDD API error: Can only subtract another SchemaRDD
> --------------------------------------------------------
>
> Key: SPARK-4533
> URL: https://issues.apache.org/jira/browse/SPARK-4533
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.1.0
> Environment: JDK6/7
> Reporter: Shawn Guo
> Priority: Minor
>
> There are two unexpected validations in the SchemaRDD APIs below.
> subtract(self, other, numPartitions=None)
> "Can only subtract another SchemaRDD"
> intersection(self, other)
> "Can only intersect with another SchemaRDD"
> "Can only subtract another SchemaRDD" is thrown when a SchemaRDD subtracts
> any other type of RDD.
> Reproduce Steps:
> A = SchemaRDD
> B = SchemaRDD
> A_APX= A.keyBy(lambda line: None)
> B_APX=B.keyBy(lambda line: None)
> {color:red}
> CROSSED = A_APX.join(B_APX).map(lambda line: line[1]).filter(filter
> condition).map(lambda line: line[0])
> {color}
> C=A.subtract(CROSSED) {color:red}#ERROR:Can only subtract another
> SchemaRDD{color}