Hi, I have an RDD<byte[]> that needs to be sorted lexicographically and then processed partition by partition. The partitions should be ranged blocks that preserve the sorted order, with each partition containing sequential, non-overlapping keys.
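A minimal, Spark-free Java sketch of the target layout follows. It assumes that "lexicographically" means unsigned byte-by-byte comparison. Java's `byte` is signed, so a naive comparison would place 0x80 to 0xFF before 0x00. The sketch sorts the keys and cuts the sorted list into contiguous index ranges, so each partition holds a run of consecutive keys and no two partitions overlap. The names `RangeBlocks`, `rangePartition` and `show` are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RangeBlocks {

    // Assumption: lexicographic order = unsigned byte-by-byte comparison,
    // with a shorter key sorting before any longer key it is a prefix of.
    public static final Comparator<byte[]> UNSIGNED_LEX = Arrays::compareUnsigned;

    // Sorts the keys, then splits them into numPartitions contiguous blocks.
    public static List<List<byte[]>> rangePartition(List<byte[]> keys, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(UNSIGNED_LEX);

        int n = sorted.size();
        List<List<byte[]>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            // Partition p covers the half-open index range [start, end). The
            // ranges tile the sorted list, so partitions never overlap.
            int start = (int) ((long) p * n / numPartitions);
            int end = (int) ((long) (p + 1) * n / numPartitions);
            partitions.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return partitions;
    }

    // Renders partitions of single-byte keys as "(1,2,3),(4,5,6)".
    public static String show(List<List<byte[]>> partitions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partitions.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('(');
            List<byte[]> part = partitions.get(i);
            for (int j = 0; j < part.size(); j++) {
                if (j > 0) sb.append(',');
                sb.append(Byte.toUnsignedInt(part.get(j)[0]));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Keys deliberately supplied out of order.
        List<byte[]> keys = new ArrayList<>();
        for (int k : new int[] {4, 1, 6, 3, 5, 2}) {
            keys.add(new byte[] {(byte) k});
        }
        System.out.println(show(rangePartition(keys, 2))); // (1,2,3),(4,5,6)
        System.out.println(show(rangePartition(keys, 3))); // (1,2),(3,4),(5,6)
    }
}
```

In Spark itself, `JavaPairRDD.sortByKey(comparator, ascending, numPartitions)` may be worth trying after mapping each `byte[]` to a key/value pair. Sorting by key range-partitions the data through a `RangePartitioner`. Its boundaries are chosen by sampling, so partition sizes are only roughly equal rather than split exactly as in this sketch. The comparator passed to Spark must also be `Serializable`.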
Given the keys (1,2,3,4,5,6):

1. Correct:
   - 2 partitions: (1,2,3),(4,5,6)
   - 3 partitions: (1,2),(3,4),(5,6)
2. Incorrect, because the ranges overlap even though each partition is sorted:
   - 2 partitions: (1,3,5),(2,4,6)
   - 3 partitions: (1,3),(2,5),(4,6)

Is this possible with Spark?

Cheers,
-Kristoffer

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org