Re: Iterator of KeyValueGroupedDataset.flatMapGroupsWithState function

2018-10-31 Thread Tathagata Das
It is okay to collect the iterator. That will not break Spark. However, collecting it requires memory in the executor, so you may cause OOMs if a group has a LOT of new data.

On Wed, Oct 31, 2018 at 3:44 AM Antonio Murgia - antonio.murg...@studio.unibo.it wrote:
> Hi all,
>
> I'm currently
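
To illustrate the memory trade-off mentioned in the reply, here is a minimal sketch in plain Scala with no Spark dependency. The `Event` type, the `IteratorHandling` object, and both function names are hypothetical and do not come from the thread. The sketch only contrasts collecting a group's Iterator[V] with consuming it in a single pass, which is the choice you face inside the function passed to flatMapGroupsWithState.

```scala
// Hypothetical record type, for illustration only (not from the thread).
final case class Event(user: String, amount: Long)

object IteratorHandling {
  // Option 1: collect the group's values into a List.
  // Simple and allows several passes over the data, but the whole
  // group is held in memory at once, which is where OOMs can come from.
  def totalCollected(values: Iterator[Event]): Long = {
    val all = values.toList
    all.map(_.amount).sum
  }

  // Option 2: consume the iterator in a single pass.
  // Only the running total is retained, so memory use does not grow
  // with the number of new values in the group.
  // `previous` stands in for whatever was stored in the group state.
  def totalStreaming(values: Iterator[Event], previous: Long): Long =
    values.foldLeft(previous)((acc, e) => acc + e.amount)
}
```

In both variants the iterator is exhausted afterwards, since a Scala Iterator can only be traversed once. In a real job, `previous` would presumably be read from the GroupState[S] argument (for example via `getOption`) and the new total written back with `update`.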

Iterator of KeyValueGroupedDataset.flatMapGroupsWithState function

2018-10-31 Thread Antonio Murgia - antonio.murg...@studio.unibo.it
Hi all, I'm currently developing a Spark Structured Streaming job and I'm performing flatMapGroupsWithState. I'm concerned about the laziness of the Iterator[V] that is passed to my custom function (func: (K, Iterator[V], GroupState[S]) => Iterator[U]). Is it ok to collect that iterator (with