This is expected if, for example, your RDD is the result of random
sampling, or if the underlying source is not consistent: an RDD that is
not cached is recomputed on every action. You haven't shown any code,
so it is hard to say more.
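
To illustrate the recomputation effect, here is a toy sketch in plain
Python. It is not Spark code and not your pipeline: `LazyDataset` and
`sampled_pipeline` are hypothetical names invented for this example.
`count()` re-runs the computation every time unless the result has been
materialized first, which is roughly what happens when an action is
called on an uncached RDD whose lineage contains a nondeterministic step.

```python
import random


class LazyDataset:
    """Toy stand-in for an RDD: without cache(), every action re-runs the lineage."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that rebuilds the data
        self._cached = None

    def cache(self):
        # Materialize once; later actions reuse this snapshot.
        self._cached = self._compute()
        return self

    def count(self):
        data = self._cached if self._cached is not None else self._compute()
        return len(data)


rng = random.Random()  # deliberately unseeded


def sampled_pipeline():
    # Keep roughly half of the records; the exact subset changes on every run.
    return [x for x in range(100_000) if rng.random() < 0.5]


unstable = LazyDataset(sampled_pipeline)
unstable_counts = [unstable.count() for _ in range(3)]

stable = LazyDataset(sampled_pipeline).cache()
stable_counts = [stable.count() for _ in range(3)]

print(unstable_counts)  # typically three different values near 50000
print(stable_counts)    # the same value three times
```

In Spark the usual analogue is to persist the RDD and then run an
action, keeping in mind that evicted partitions can still be recomputed.
If the counts stop changing after that, some step upstream is probably
nondeterministic.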

On Fri, May 22, 2015 at 3:34 PM, Niklas Wilcke
<1wil...@informatik.uni-hamburg.de> wrote:
> Hi,
>
> I have noticed some strange behavior of Spark core in combination with
> MLlib. Running my pipeline results in an RDD.
> Calling count() on this RDD results in 160055.
> Calling count() directly afterwards results in 160044 and so on.
> The RDD seems to be unstable.
>
> How can that be? Do you maybe have an explanation or guidance for
> further investigation? I have been investigating for 3 days now and
> can't isolate the bug.
>
> Unfortunately, I can't provide a minimal working example that uses only
> Spark. At the moment I am trying to reproduce the bug using only the
> Spark API so that I can hand it over to someone more experienced.
>
> I noticed this behavior while investigating SPARK-5480. Trying to
> build a graph and calculate the transitive closure on such an unstable
> RDD results in an IndexOutOfBoundsException -1.
>
> My first suspicion is that
> org.apache.spark.mllib.rdd.RDDFunctions.sliding causes the problems.
> Replacing my algorithm which uses the sliding window solves the problem.
>
> The bug only occurs on large data sets. On small ones the pipeline works
> fine. That makes it hard to investigate because every run takes several
> minutes. Also, generated data does not produce the bug.
>
> I haven't opened a Jira ticket yet because I can't tell how to reproduce it.
>
> I'm running Spark 1.3.1 in standalone mode with HDFS on a 10 node cluster.
>
> Thanks for your advice,
> Niklas
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

