Hi, I am relatively new to Spark and am using the updateStateByKey() operation to maintain state in my Spark Streaming application. The input data comes from a Kafka topic.
1. How are DStreams partitioned?
2. How does partitioning work with mapWithState() or updateStateByKey()?
3. In updateStateByKey(), are the old state and the new values for a given key processed on the same node?
4. How frequent is the shuffle for updateStateByKey()? The state I have to maintain contains ~100,000 keys, and I want to avoid a shuffle every time I update the state. Any tips on how to do that?

Warm Regards
Soumitra
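For context, here is a minimal sketch of the kind of update function I mean, assuming PySpark's updateStateByKey() and a simple per-key event count; the names and the counting logic are just illustrative, not my actual job:

```python
def update_func(new_values, last_state):
    """Called once per key per batch by updateStateByKey():
    merges this batch's values into the running count for that key.
    Illustrative only -- assumes values are 1s emitted per event."""
    return sum(new_values) + (last_state or 0)

# In the streaming job it would be wired up roughly like
# (kafka_stream is hypothetical):
#   counts = kafka_stream.map(lambda rec: (rec[0], 1)) \
#                        .updateStateByKey(update_func)
```

The question is whether calling this for ~100,000 keys forces a full shuffle of the state RDD on every batch, or whether the state stays co-partitioned with the incoming keys.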