
2020-01-14 Thread Dylan Hogg

Re: Analyzing consecutive elements

2015-10-22 Thread Dylan Hogg
Hi Sampo,

You could try zipWithIndex followed by a self join with shifted index
values like this:

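// Sample (key, value) data, deliberately out of order
val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val rdd = sc.parallelize(arr)
val sorted = rdd.sortByKey(true)

// Key each element by its position in the sorted RDD: (index, element)
val zipped = sorted.zipWithIndex.map(x => (x._2, x._1))
// Shift the index down by one and self-join, so index i pairs element i with element i + 1
val pairs = zipped.join(zipped.map(x => (x._1 - 1, x._2))).sortBy(_._1)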

This produces the consecutive elements as pairs in the RDD for further
processing.
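
To see what the shifted-index join yields without starting Spark, here is a
rough sketch of the same logic on plain Scala collections. The Vector and Map
are only stand-ins for the RDDs (my assumption for illustration), but the
pairing logic is the same as in the snippet above:

```scala
// Plain Scala collections stand in for the RDDs; no Spark required.
val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val sorted = arr.sortBy(_._1).toVector

// Key each element by its position: (0,(1,A)), (1,(3,B)), ...
val zipped = sorted.zipWithIndex.map { case (elem, idx) => (idx.toLong, elem) }

// Shift every index down by one, so the element at position i + 1 is keyed by i.
val shifted = zipped.map { case (idx, elem) => (idx - 1, elem) }.toMap

// Inner join on the index: position i is paired with its successor.
// The last element has no successor, so it drops out of the result.
val pairs = zipped.collect {
  case (idx, elem) if shifted.contains(idx) => (idx, (elem, shifted(idx)))
}

pairs.foreach(println)
// (0,((1,A),(3,B)))
// (1,((3,B),(7,C)))
// (2,((7,C),(8,D)))
// (3,((8,D),(9,E)))
```

Each record is (index, (element, next element)), so mapping over the values
gives the [(A,B), (B,C), (C,D), (D,E)] shape asked for.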

There are probably more efficient ways to do this, but if your dataset
isn't too big it should work for you.
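
One possible alternative, offered as a sketch rather than a recommendation:
MLlib has a sliding helper in org.apache.spark.mllib.rdd.RDDFunctions that
returns fixed-size windows over an RDD and so avoids the join. It is marked as
a developer API, so it may change between releases. The app name and the local
master below are assumptions for a standalone run; in spark-shell you would
reuse the existing sc instead of creating a context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.RDDFunctions._ // adds sliding() to RDDs

// Assumed standalone setup; skip these two lines in spark-shell and use its sc.
val conf = new SparkConf().setAppName("sliding-pairs").setMaster("local[2]")
val sc = new SparkContext(conf)

val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val sorted = sc.parallelize(arr).sortByKey(true)

// sliding(2) yields every window of two consecutive elements as an Array,
// in RDD order; the pattern match turns each window into a pair.
val pairs = sorted.sliding(2).map { case Array(a, b) => (a, b) }

// Collect before stopping the context so the result stays available.
val result = pairs.collect().toList
result.foreach(println)

sc.stop()
```

This needs the spark-mllib artifact on the classpath in addition to spark-core.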


On 22 October 2015 at 17:35, Sampo Niskanen wrote:

> Hi,
>
> I have analytics data with timestamps on each element.  I'd like to
> analyze consecutive elements using Spark, but haven't figured out how to do
> this.
>
> Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E]
> to an RDD [(A,B), (B,C), (C,D), (D,E)].  (Or some other way to analyze
> time-related elements.)
>
> How can this be achieved?
>
> *Sampo Niskanen*
> *Lead developer / Wellmo*
> sampo.niska...@wellmo.com
> +358 40 820 5291