[ https://issues.apache.org/jira/browse/SPARK-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002275#comment-14002275 ]
Matei Zaharia commented on SPARK-1857:
--------------------------------------
The problem is that running an action on one RDD from inside another RDD's
operation is not currently supported. For example, you couldn't do
a.map(_ => m.count()) either. The error message could probably be improved.
I'd also like to see lookup() support being called within an operation in the
future, but the current architecture doesn't allow it.
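Since actions have to be launched from the driver, one common way to restructure both failing patterns, `a.map(_ => m.count())` and `a.map(m.lookup(_)(0))`, is to run the action on the driver first and let the closure capture only its result. The sketch below is illustrative, not an official fix: it assumes a local-mode Spark 1.3+ build (on 0.9 you would also need `import org.apache.spark.SparkContext._` for the pair-RDD methods), assumes `m` is small enough to collect to the driver, and uses a hypothetical object name `NestedActionWorkaround`. Note that `collectAsMap()` keeps only one value per key, so unlike `lookup()` it will not return every value for a duplicated key.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NestedActionWorkaround {
  /** Returns (counts, looked-up values) without nesting any RDD action in a closure. */
  def run(sc: SparkContext): (Array[Long], Array[Int]) = {
    val a = sc.parallelize(Array(11))
    val m = sc.parallelize(Array((11, 21)))

    // Instead of a.map(_ => m.count()): run the action once on the driver,
    // then close over the resulting plain Long.
    val n = m.count()
    val counts = a.map(_ => n).collect()

    // Instead of a.map(m.lookup(_)(0)): collect the small pair RDD into a
    // driver-side Map and broadcast it, so each task reads a local copy.
    val table = sc.broadcast(m.collectAsMap())
    val looked = a.map(k => table.value(k)).collect()

    (counts, looked)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("nested-action-workaround"))
    try {
      val (counts, looked) = run(sc)
      println(counts.mkString(",")) // prints 1
      println(looked.mkString(",")) // prints 21
    } finally {
      sc.stop()
    }
  }
}
```

Broadcasting the map ships it to each executor once rather than serializing it into every task closure, which matters once the lookup table grows beyond a few entries.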
> map() with lookup() causes exception
> ------------------------------------
>
> Key: SPARK-1857
> URL: https://issues.apache.org/jira/browse/SPARK-1857
> Project: Spark
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Michael Malak
>
> Calling lookup() on one RDD inside map() on another RDD throws an exception:
> {noformat}
> val a = sc.parallelize(Array(11))
> val m = sc.parallelize(Array((11,21)))
> a.map(m.lookup(_)(0)).collect
> 14/05/14 15:03:35 ERROR Executor: Exception in task ID 23
> scala.MatchError: null
> at org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:551)
> {noformat}
> A workaround is:
> {noformat}
> a.map((_,0)).join(m).map(_._2._2).collect
> {noformat}
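The join-based workaround replaces the per-element lookup with a keyed join, which Spark can execute as a single job. Spelled out step by step, it behaves roughly as in the sketch below; the object and method names (`JoinLookup`, `lookupAll`, `lookupAllOpt`) are hypothetical, and the code assumes a local-mode Spark 1.3+ build. Because `join` is an inner join, elements of `a` with no matching key in `m` are silently dropped; `leftOuterJoin` keeps them as `None` instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinLookup {
  /** Looks up each key of `a` in `m` via an inner join (unmatched keys are dropped). */
  def lookupAll(a: RDD[Int], m: RDD[(Int, Int)]): RDD[Int] =
    a.map(k => (k, 0)) // (11, 0): key each element with a dummy value
      .join(m)         // (11, (0, 21)): pair it with m's value for that key
      .map(_._2._2)    // 21: keep only the value that came from m

  /** Same lookup, but keeps unmatched keys as None instead of dropping them. */
  def lookupAllOpt(a: RDD[Int], m: RDD[(Int, Int)]): RDD[(Int, Option[Int])] =
    a.map(k => (k, 0)).leftOuterJoin(m).mapValues(_._2)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-lookup"))
    try {
      val a = sc.parallelize(Seq(11, 12))
      val m = sc.parallelize(Seq((11, 21)))
      // prints 21 (key 12 has no match in m and is dropped)
      println(lookupAll(a, m).collect().mkString(","))
      // prints (11,Some(21)),(12,None)
      println(lookupAllOpt(a, m).collect().sortBy(_._1).mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Unlike the broadcast approach, the join does not require `m` to fit on the driver, at the cost of a shuffle.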