BTW, this is more of a user-list question than a dev-list one. The dev list is for Spark developers.
On Tue, Jul 14, 2015 at 4:23 PM, Tathagata Das <t...@databricks.com> wrote:

> 1. When you set ssc.checkpoint(checkpointDir), Spark Streaming
> periodically saves the state RDD (which is a snapshot of all the state
> data) to HDFS using RDD checkpointing. In fact, a streaming app with
> updateStateByKey will not start until you set a checkpoint directory.
>
> 2. The updateStateByKey performance is largely independent of which
> source is being used, receiver-based or direct Kafka. The absolute
> performance obviously depends on a LOT of variables: size of the
> cluster, parallelization, etc. The key thing is that you must ensure
> sufficient parallelization at every stage: receiving, shuffles
> (updateStateByKey included), and output.
>
> Some more discussion in my talk:
> https://www.youtube.com/watch?v=d5UJonrruHk
>
> On Tue, Jul 14, 2015 at 4:11 PM, swetha <swethakasire...@gmail.com> wrote:
>
>> Hi TD,
>>
>> I have a question regarding sessionization using updateStateByKey. If near
>> real time state needs to be maintained in a Streaming application, what
>> happens when the number of RDDs to maintain the state becomes very large?
>> Does it automatically get saved to HDFS and reloaded when needed, or do I
>> have to use code like ssc.checkpoint(checkpointDir)? Also, how is the
>> performance if I use both DStream checkpointing for maintaining the state
>> and the Kafka direct approach for exactly-once semantics?
>>
>> Thanks,
>> Swetha
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Does-RDD-checkpointing-store-the-entire-state-in-HDFS-tp7368p13227.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
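
To make the sessionization point in the quoted thread a bit more concrete, here is a rough sketch of the kind of update function one might pass to updateStateByKey. It is plain Python so it can be run without a cluster. The session tuple layout, the 30-minute timeout, and the names update_session, events, checkpoint_dir and num_partitions are all assumptions for illustration, not something from the thread. The wiring into Spark is shown only in comments and assumes the PySpark streaming API.

```python
from typing import List, Optional, Tuple

# Hypothetical session state: (first_event_ts, last_event_ts, event_count).
Session = Tuple[int, int, int]

# Assumed inactivity timeout in seconds; not a value from the thread.
SESSION_TIMEOUT_SECS = 30 * 60


def update_session(new_timestamps: List[int],
                   state: Optional[Session],
                   now: int) -> Optional[Session]:
    """Fold one batch of event timestamps for a key into its session state.

    Returning None asks updateStateByKey to drop the key, which keeps the
    state RDD from growing without bound.
    """
    if new_timestamps:
        batch_first = min(new_timestamps)
        batch_last = max(new_timestamps)
        if state is None:
            # First events seen for this key: open a new session.
            return (batch_first, batch_last, len(new_timestamps))
        # Extend the existing session with this batch.
        return (state[0], max(state[1], batch_last),
                state[2] + len(new_timestamps))
    if state is not None and now - state[1] > SESSION_TIMEOUT_SECS:
        # No new events and the session has been idle too long: expire it.
        return None
    # No new events, session still within the timeout: keep it unchanged.
    return state


# Hypothetical wiring into PySpark Streaming (needs a Spark install; not run here).
# `events` is assumed to be a DStream of (user_id, timestamp) pairs.
#
#   ssc = StreamingContext(sc, 10)
#   ssc.checkpoint(checkpoint_dir)   # required before updateStateByKey
#   sessions = events.updateStateByKey(
#       lambda vals, st: update_session(vals, st, int(time.time())),
#       numPartitions=num_partitions)
```

Expiring idle keys by returning None is one way to keep the checkpointed state snapshot small, and the numPartitions argument is where the shuffle parallelism for the state update would be set.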