[ 
https://issues.apache.org/jira/browse/SPARK-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-6822:
-----------------------------------------
    Target Version/s: 1.5.0  (was: 1.4.0)

> lapplyPartition passes empty list to function
> ---------------------------------------------
>
>                 Key: SPARK-6822
>                 URL: https://issues.apache.org/jira/browse/SPARK-6822
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.4.0
>            Reporter: Shivaram Venkataraman
>
> I have an RDD containing two elements, as expected and as confirmed by a 
> collect. When I call lapplyPartition on it with a function that prints its 
> arguments to stderr, the function gets called three times: the first two 
> times with the expected arguments and the third time with an empty list as 
> its argument. I was wondering whether that's a bug or whether there are 
> conditions under which that's possible. I apologize that I don't have a 
> simple test case ready yet. I ran into this potential bug while developing 
> a separate package, plyrmr. If you are willing to install it, the test case 
> is very simple. The RDD that triggers this problem is the result of a join, 
> but I couldn't replicate the problem using a plain vanilla join.
> Example from Antonio on SparkR JIRA: I don't have time to try any harder to 
> reproduce this without plyrmr. For the record, this is the example:
> {code}
> library(plyrmr)
> plyrmr.options(backend = "spark")
> df1 = mtcars[1:4,]
> df2 = mtcars[3:6,]
> w = as.data.frame(gapply(merge(input(df1), input(df2)), identity))
> {code}
> The gapply is implemented with a lapplyPartition in most cases, the merge 
> with a join, and as.data.frame with a collect. The join has an arbitrary 
> argument of 4 partitions. If I turn that down to 2L, the problem disappears. 
> I will check in a version with a workaround in place, but a debugging 
> statement will leave a record in stderr whenever the workaround kicks in, 
> so that we can track it.
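
The workaround described above might look roughly like the base R sketch below. It is only an illustration under stated assumptions, not the plyrmr code: skip_empty_partitions and count_elements are hypothetical names, a plain list of lists stands in for a 4-partition RDD, and base lapply stands in for lapplyPartition. Whether empty partitions are what actually triggers the third call is not established in the report; the sketch only shows how a guard could skip an empty-list argument and leave a record in stderr.

```r
# Hypothetical helper (not part of SparkR or plyrmr): wraps a per-partition
# function so that an empty partition is logged to stderr and skipped.
skip_empty_partitions <- function(f) {
  function(part) {
    if (length(part) == 0) {
      # message() writes to stderr, leaving a trace each time the guard fires
      message("workaround: skipping empty partition")
      return(list())
    }
    f(part)
  }
}

# Assumed stand-in for a 4-partition RDD holding two elements:
# two of the four partitions are empty.
partitions <- list(list(1), list(), list(2), list())

# Example per-partition function: returns the element count of the partition.
count_elements <- function(part) list(length(part))

# Base lapply stands in for lapplyPartition: the function is called once per
# partition and receives that partition's contents as a list.
results <- lapply(partitions, skip_empty_partitions(count_elements))

# Flatten the per-partition results, roughly as a collect would.
collected <- unlist(results)
```

With the guard in place, the wrapped function never sees an empty list, and each skipped partition produces one line on stderr that can be counted later.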



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
