Size of arbitrary state managed via DStream updateStateByKey

2015-04-01 Thread Vinoth Chandar
Hi all, As I understand from docs and talks, the streaming state is kept in memory as an RDD (optionally checkpointable to disk). SPARK-2629 hints that this in-memory structure is not indexed efficiently? I am wondering how my performance would be if the streaming state does not fit in memory (say 100GB
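The snippet below is a pure-Python model of what one micro-batch of `updateStateByKey` does. It is not Spark code. The Spark Streaming programming guide states that the update function is applied to every existing key in every batch, whether or not that key received new data, and that returning `None` removes the key. The function names `update_state_by_key` and `running_count` are hypothetical and chosen only for illustration. The model shows why a large state (for example, millions of mostly idle keys) costs a full pass per batch.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
S = TypeVar("S")


def update_state_by_key(
    state: Dict[K, S],
    batch: Iterable[Tuple[K, V]],
    update: Callable[[List[V], Optional[S]], Optional[S]],
) -> Tuple[Dict[K, S], int]:
    """Hypothetical model of one updateStateByKey micro-batch.

    Returns the new state and the number of times `update` was called.
    """
    # Group this batch's new values by key.
    new_values: Dict[K, List[V]] = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)

    next_state: Dict[K, S] = {}
    calls = 0
    # Visit every key that has state OR new data, including idle keys
    # that received nothing this batch (they get an empty value list).
    for key in set(state) | set(new_values):
        calls += 1
        result = update(new_values.get(key, []), state.get(key))
        if result is not None:  # returning None drops the key from state
            next_state[key] = result
    return next_state, calls


def running_count(values: List[int], current: Optional[int]) -> Optional[int]:
    """Example update function: keep a running sum per key."""
    return (current or 0) + sum(values)


if __name__ == "__main__":
    # 100,000 keys of existing state, but only one key gets new data.
    state = {i: 1 for i in range(100_000)}
    state, calls = update_state_by_key(state, [(0, 5)], running_count)
    print(calls, state[0])
```

Even though the batch touches a single key, the model calls the update function once per key in the state. Under this model, per-batch work grows with total state size rather than with the volume of new data.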

Re: Size of arbitrary state managed via DStream updateStateByKey

2015-04-01 Thread Vinoth Chandar
Thanks for confirming!

On Wed, Apr 1, 2015 at 12:33 PM, Tathagata Das t...@databricks.com wrote:
> In the current state yes there will be performance issues. It can be done much more efficiently and we are working on doing that.
> TD
> On Wed, Apr 1, 2015 at 7:49 AM, Vinoth Chandar

Re: Size of arbitrary state managed via DStream updateStateByKey

2015-04-01 Thread Tathagata Das
In the current state, yes, there will be performance issues. It can be done much more efficiently, and we are working on doing that.

TD

On Wed, Apr 1, 2015 at 7:49 AM, Vinoth Chandar vin...@uber.com wrote:
> Hi all, As I understand from docs and talks, the streaming state is in memory as RDD