Could you share a use case? That would help. Batches of RDDs from streams will have a temporal ordering, in terms of when they are processed in a typical application, but maybe you could shuffle the way batch iterations work.
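
To make "shuffle within a batch" concrete, here is a rough sketch in plain Python, not Spark code. It assumes each batch can be treated as a list of records, and the names `shuffle_batch` and `shuffle_stream` are made up for illustration. The idea is to tag every record with a random key and sort by that key, which gives a random but well-defined order inside each batch while the batches themselves still arrive in temporal order. In Spark Streaming the same step would presumably be applied to each batch RDD through `DStream.transform`, for example by sorting on a random key, but treat that as an assumption to check against your Spark version.

```python
import random


def shuffle_batch(records, seed=None):
    """Return the records of one batch in a random order.

    Hypothetical helper: pairs each record with a random key,
    sorts by that key, then drops the key again.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), record) for record in records]
    keyed.sort(key=lambda pair: pair[0])  # order is decided by the random key only
    return [record for _, record in keyed]


def shuffle_stream(batches, seed=None):
    """Yield each batch shuffled internally; batch order itself is unchanged."""
    rng = random.Random(seed)
    for batch in batches:
        # Derive a fresh integer seed per batch so batches are shuffled independently.
        yield shuffle_batch(batch, seed=rng.getrandbits(32))


if __name__ == "__main__":
    stream = [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
    for shuffled in shuffle_stream(stream, seed=42):
        print(shuffled)
```

Sorting on a random key costs more than an in-place shuffle, but it is the variant that carries over to a distributed collection where elements have no positions to swap.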

> On Nov 3, 2014, at 11:59 AM, Josh J <joshjd...@gmail.com> wrote:
>
> When I'm outputting the RDDs to an external source, I would like the RDDs to
> be outputted in a random shuffle so that even the order is random. So far
> what I understood is that the RDDs do have a type of order, in that the order
> for spark streaming RDDs would be the order in which spark streaming read the
> tuples from source (e.g. ordered by roughly when the producer sent the tuple
> in addition to any latency)
>
>> On Mon, Nov 3, 2014 at 8:48 AM, Sean Owen <so...@cloudera.com> wrote:
>> I think the answer will be the same in streaming as in the core. You
>> want a random permutation of an RDD? in general RDDs don't have
>> ordering at all -- excepting when you sort for example -- so a
>> permutation doesn't make sense. Do you just want a well-defined but
>> random ordering of the data? Do you just want to (re-)assign elements
>> randomly to partitions?
>>
>> On Mon, Nov 3, 2014 at 4:33 PM, Josh J <joshjd...@gmail.com> wrote:
>> > Hi,
>> >
>> > Is there a nice or optimal method to randomly shuffle spark streaming RDDs?
>> >
>> > Thanks,
>> > Josh