Shawn Guo created SPARK-4533:
--------------------------------
Summary: Can only subtract another SchemaRDD
Key: SPARK-4533
URL: https://issues.apache.org/jira/browse/SPARK-4533
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.1.0
Environment: JDK6/7
Reporter: Shawn Guo
Priority: Minor
There are two unexpected validations in the SchemaRDD APIs below.
subtract(self, other, numPartitions=None)
"Can only subtract another SchemaRDD"
intersection(self, other)
"Can only intersect with another SchemaRDD"
"Can only subtract another SchemaRDD" is raised when a SchemaRDD subtracts
any other type of RDD.
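The check described above can be modeled in plain Python, without Spark. This is only a toy sketch, not the PySpark source: `ToyRDD` and `ToySchemaRDD` are hypothetical stand-ins, the list-based set semantics are simplified, and the exception type (`ValueError`) is an assumption. Only the two messages are taken from this report.

```python
class ToyRDD(object):
    """Hypothetical stand-in for a plain RDD; rows are kept in a list."""

    def __init__(self, rows):
        self.rows = list(rows)

    def subtract(self, other, numPartitions=None):
        # Simplified semantics: keep rows that do not appear in `other`.
        return ToyRDD([r for r in self.rows if r not in other.rows])

    def intersection(self, other):
        # Simplified semantics: keep rows that also appear in `other`.
        return ToyRDD([r for r in self.rows if r in other.rows])


class ToySchemaRDD(ToyRDD):
    """Hypothetical stand-in for SchemaRDD with the strict type checks."""

    def subtract(self, other, numPartitions=None):
        # The validation reported here: a plain RDD is rejected.
        if not isinstance(other, ToySchemaRDD):
            raise ValueError("Can only subtract another SchemaRDD")
        return ToySchemaRDD(ToyRDD.subtract(self, other, numPartitions).rows)

    def intersection(self, other):
        if not isinstance(other, ToySchemaRDD):
            raise ValueError("Can only intersect with another SchemaRDD")
        return ToySchemaRDD(ToyRDD.intersection(self, other).rows)


a = ToySchemaRDD([1, 2, 3])
plain = ToyRDD([2])

# Subtracting a plain RDD trips the validation.
try:
    a.subtract(plain)
    error = None
except ValueError as exc:
    error = str(exc)

# Subtracting another schema-carrying RDD works.
ok = a.subtract(ToySchemaRDD([2]))
```

In this model the parent class could handle the plain-RDD case, which is why the report calls the validation unexpected.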
Steps to reproduce:
A = SchemaRDD
B = SchemaRDD
A_APX = A.keyBy(lambda line: None)
B_APX = B.keyBy(lambda line: None)
{color:red}
CROSSED = A_APX.join(B_APX).map(lambda line: line[1]).filter(filter condition).map(lambda line: line[0])
{color}
C = A.subtract(CROSSED) {color:red}# ERROR: Can only subtract another SchemaRDD{color}
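To make the data flow of these steps concrete, here is a plain-Python model that uses lists instead of RDDs. The sample rows, the `condition` function, and the helpers `key_by` and `join` are invented for illustration and are not part of Spark. The sketch shows that keying both sides by `None` and joining yields every (A row, B row) pair, and that the final `map` leaves plain rows from A rather than a SchemaRDD.

```python
from itertools import product

# Invented sample rows standing in for the contents of A and B.
A = [("alice", 1), ("bob", 2), ("carol", 3)]
B = [("x", 2), ("y", 3)]


def key_by(rows, f):
    # Mirrors keyBy: pair each row with the key computed by f.
    return [(f(r), r) for r in rows]


def join(left, right):
    # Mirrors an inner join on the key: (key, (left_value, right_value)).
    return [(k1, (v1, v2))
            for (k1, v1), (k2, v2) in product(left, right)
            if k1 == k2]


def condition(pair):
    # Hypothetical filter condition: the numeric fields match.
    a_row, b_row = pair
    return a_row[1] == b_row[1]


A_APX = key_by(A, lambda line: None)
B_APX = key_by(B, lambda line: None)

# Every key is None, so the join is a full cross product of A and B.
joined = join(A_APX, B_APX)
pairs = [line[1] for line in joined]
CROSSED = [line[0] for line in pairs if condition(line)]

# What A.subtract(CROSSED) would be expected to return.
C = [row for row in A if row not in CROSSED]
```

With real RDDs, `CROSSED` comes out of `map`, so it is a plain RDD even though its rows originate from A, and that is what triggers the error on the last step.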