Could you describe your use case? That would help.

Batches of RDDs from a stream are going to be processed in temporal order in
a typical application, but maybe you could shuffle the data within each batch
iteration.
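A minimal plain-Python sketch of that idea: shuffle the records inside each batch while leaving the batches themselves in arrival order. This is not Spark code. Batches are modeled as lists, and `shuffle_batch` / `shuffle_within_batches` are hypothetical helpers. In Spark Streaming the per-batch step would presumably run on each batch's RDD (e.g. via `DStream.transform`, or in `foreachRDD` before writing out) by keying each record with a random number and sorting on that key, for instance with `RDD.sortBy`.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_batch(records: Sequence[T], rng: random.Random) -> List[T]:
    """Return one micro-batch's records in a random order.

    Each record is paired with an independent random sort key and the batch
    is sorted on that key alone, which yields a uniformly random permutation
    (equal float keys are vanishingly unlikely).
    """
    keyed = [(rng.random(), r) for r in records]
    keyed.sort(key=lambda pair: pair[0])  # compare the random keys only, never records
    return [r for _, r in keyed]


def shuffle_within_batches(batches: Sequence[Sequence[T]], seed: int) -> List[List[T]]:
    """Shuffle inside every batch while keeping the batches in arrival order."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [shuffle_batch(batch, rng) for batch in batches]


if __name__ == "__main__":
    # Hypothetical micro-batches, in the order the stream delivered them.
    batches = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]
    for i, batch in enumerate(shuffle_within_batches(batches, seed=7)):
        print(i, batch)
```

Batch 0 still comes out before batch 1, so the temporal ordering between batches is kept; only the order inside a batch is randomized.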

> On Nov 3, 2014, at 11:59 AM, Josh J <joshjd...@gmail.com> wrote:
> 
> When I'm outputting the RDDs to an external source, I would like the RDDs to
> be output in a random shuffle so that even the order is random. So far, my
> understanding is that the RDDs do have a kind of order: for Spark Streaming
> RDDs it is the order in which Spark Streaming read the tuples from the
> source (roughly, when the producer sent each tuple, plus any latency).
> 
>> On Mon, Nov 3, 2014 at 8:48 AM, Sean Owen <so...@cloudera.com> wrote:
>> I think the answer will be the same in streaming as in the core. You
>> want a random permutation of an RDD? In general, RDDs don't have any
>> ordering at all -- except when you sort, for example -- so a
>> permutation doesn't make sense. Do you just want a well-defined but
>> random ordering of the data? Or do you just want to (re-)assign elements
>> randomly to partitions?
>> 
>> On Mon, Nov 3, 2014 at 4:33 PM, Josh J <joshjd...@gmail.com> wrote:
>> > Hi,
>> >
>> > Is there a nice or optimal method to randomly shuffle Spark Streaming RDDs?
>> >
>> > Thanks,
>> > Josh
> 
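
Sean's second reading, randomly (re-)assigning elements to partitions, can be sketched in plain Python as below. Again this is not Spark code: partitions are modeled as lists and `assign_randomly` is a hypothetical helper. In Spark, one assumed way to get the same effect is to key each element with a random integer in `[0, n)` and partition the pair RDD on that key with `partitionBy`; `repartition(n)` also spreads elements across partitions, but as far as I know it aims for balance rather than a uniformly random per-element assignment.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def assign_randomly(records: Sequence[T], num_partitions: int, seed: int) -> List[List[T]]:
    """Send each record to a partition chosen uniformly at random.

    Mirrors keying each element with a random integer in [0, num_partitions)
    and partitioning on that key. Partition sizes are random as well, so
    they will generally not be exactly equal.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    partitions: List[List[T]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[rng.randrange(num_partitions)].append(record)
    return partitions


if __name__ == "__main__":
    for i, part in enumerate(assign_randomly(list(range(12)), 3, seed=1)):
        print(i, part)
```

Records inside each partition keep their relative input order here, so this alone randomizes placement, not the order within a partition.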
