Partitioning, by itself, is a property of an RDD, so it is essentially no different in streaming, where each batch is one RDD. You can call partitionBy on the RDD (it has to be a key-value RDD) and pass your custom partitioner to it.
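Something along these lines might work as a starting point. It is only a rough PySpark sketch, not tested against your job: the tag names, the "pin the big tags, hash the rest" scheme, and the helper names (make_tag_partitioner, partition_sizes, repartition_by_tag) are all made up for illustration. It assumes your records are keyed as (tag, record) pairs.

```python
import zlib
from collections import Counter

# Hypothetical high-volume tags that each get a dedicated partition.
PINNED_TAGS = ["video", "music"]


def make_tag_partitioner(num_partitions, pinned_tags=PINNED_TAGS):
    """Return a function that maps a tag (the record key) to a partition index."""
    if num_partitions <= len(pinned_tags):
        raise ValueError("need at least one partition beyond the pinned tags")
    pinned = {tag: i for i, tag in enumerate(pinned_tags)}
    spill = num_partitions - len(pinned)

    def partition_for(tag):
        if tag in pinned:
            return pinned[tag]
        # crc32 is stable across Python processes, unlike the built-in
        # hash() of a str, which can differ from one worker to another.
        return len(pinned) + zlib.crc32(tag.encode("utf-8")) % spill

    return partition_for


def partition_sizes(keys, partition_for):
    """Count how many of the given keys land in each partition."""
    return Counter(partition_for(k) for k in keys)


def repartition_by_tag(rdd, num_partitions):
    """Apply the custom partitioner to a pair RDD of (tag, record)."""
    return rdd.partitionBy(num_partitions, make_tag_partitioner(num_partitions))
```

For a DStream you would apply this to each batch, e.g. stream.transform(lambda rdd: repartition_by_tag(rdd, 8)), where 8 is an arbitrary partition count. Running partition_sizes over a sample of your tags gives a quick count of how many records each partition would receive.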
One thing you should consider is how balanced your partitions are, i.e. your partitioning scheme should not skew too much data into a single partition, otherwise the task handling that partition ends up doing most of the work.

Best
Ayan

On Wed, Aug 12, 2015 at 9:06 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> How does partitioning in spark work when it comes to streaming? What's the
> best way to partition a time series data grouped by a certain tag like
> categories of product video, music etc.
>

--
Best Regards,
Ayan Guha