[ https://issues.apache.org/jira/browse/SPARK-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivaram Venkataraman updated SPARK-6822:
-----------------------------------------
Target Version/s: 1.5.0 (was: 1.4.0)
> lapplyPartition passes empty list to function
> ---------------------------------------------
>
> Key: SPARK-6822
> URL: https://issues.apache.org/jira/browse/SPARK-6822
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 1.4.0
> Reporter: Shivaram Venkataraman
>
> I have an RDD containing two elements, as expected and as confirmed by a
> collect. When I call lapplyPartition on it with a function that prints its
> arguments to stderr, the function is called three times: the first two calls
> receive the expected arguments and the third receives an empty list. I was
> wondering whether that's a bug or whether there are conditions under which
> that's possible. I apologize that I don't have a simple test case ready yet.
> I ran into this potential bug while developing a separate package, plyrmr.
> If you are willing to install it, the test case is very simple. The RDD that
> triggers this problem is the result of a join, but I couldn't replicate the
> problem using a plain vanilla join.
> Example from Antonio on the SparkR JIRA: I don't have time to try any harder
> to reproduce this without plyrmr. For the record, this is the example:
> {code}
> library(plyrmr)
> plyrmr.options(backend = "spark")
> df1 = mtcars[1:4,]
> df2 = mtcars[3:6,]
> w = as.data.frame(gapply(merge(input(df1), input(df2)), identity))
> {code}
> In most cases gapply is implemented with lapplyPartition, merge with a join,
> and as.data.frame with a collect. The join takes an arbitrary argument of 4
> partitions. If I turn that down to 2L, the problem disappears. I will check
> in a version with a workaround in place, but a debugging statement will leave
> a record in stderr whenever the workaround kicks in, so that we can track it.
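One possible reading of the report, offered only as an assumption, is that spreading two elements over four partitions leaves some partitions empty, so a per-partition function may be handed an empty list. The sketch below uses base R only and does not call SparkR or plyrmr. The names `partition_keys` and `skip_empty` are hypothetical, and the modulo bucketing is a stand-in, not SparkR's actual partitioner. It shows how empty partitions can arise and how a guard in the spirit of the workaround described above might skip them while leaving a note on stderr.

```r
# Hypothetical stand-in for a hash partitioner: place each integer key in one
# of `num_partitions` buckets. This is NOT SparkR's partitioning logic.
partition_keys <- function(keys, num_partitions) {
  buckets <- replicate(num_partitions, list(), simplify = FALSE)
  for (k in keys) {
    idx <- (k %% num_partitions) + 1L
    buckets[[idx]] <- c(buckets[[idx]], list(k))
  }
  buckets
}

# Hypothetical guard: wrap a per-partition function so that an empty partition
# is skipped and a debugging note is written to stderr via message().
skip_empty <- function(f) {
  function(part) {
    if (length(part) == 0L) {
      message("workaround: skipping empty partition")
      return(list())
    }
    f(part)
  }
}

# Two elements spread over four partitions: some buckets stay empty.
parts <- partition_keys(c(3L, 4L), num_partitions = 4L)
sizes <- vapply(parts, length, integer(1))

# Apply a per-partition function through the guard, one call per partition.
double_all <- function(part) lapply(part, function(x) x * 2L)
results <- lapply(parts, skip_empty(double_all))
```

With fewer partitions than elements the guard would rarely fire, which would be consistent with the problem disappearing at 2L, though the report does not confirm this as the cause.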