It is okay to collect the iterator. That will not break Spark. However,
the collected values are held in executor memory all at once, so you may
hit OOMs if a single group receives a lot of new data in one trigger.
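
For illustration, here is a minimal sketch of a function with the shape
(K, Iterator[V], GroupState[S]) => Iterator[U] that materializes the
iterator because it needs to sort the values. The Event and Summary types,
the summarize name, and the use of a running count as state are
hypothetical, made up for this example; only GroupState and its
getOption/update methods come from Spark.

```scala
import org.apache.spark.sql.streaming.GroupState

// Hypothetical input and output types, for illustration only.
case class Event(user: String, value: Long)
case class Summary(user: String, seenSoFar: Long, batchMedian: Long)

// Shape: (K, Iterator[V], GroupState[S]) => Iterator[U]
def summarize(
    user: String,
    events: Iterator[Event],
    state: GroupState[Long]): Iterator[Summary] = {
  // Materialize the new rows for this key. After toVector, every new value
  // for the group is in executor memory at the same time.
  val values = events.map(_.value).toVector.sorted

  if (values.isEmpty) {
    // Nothing new for this key (for example, a timeout invocation).
    Iterator.empty
  } else {
    // Running count of rows seen for this key across triggers.
    val seenSoFar = state.getOption.getOrElse(0L) + values.size
    state.update(seenSoFar)
    // Upper median of this trigger's values; needs the sorted collection.
    Iterator.single(Summary(user, seenSoFar, values(values.size / 2)))
  }
}
```

Assuming a streaming Dataset[Event] named events and spark.implicits._ in
scope for the encoders, this could be wired up roughly as
events.groupByKey(_.user).flatMapGroupsWithState(OutputMode.Update,
GroupStateTimeout.NoTimeout)(summarize). If a key can receive a very large
number of rows per trigger, a single-pass fold over the iterator avoids
holding them all, at the cost of ruling out logic such as sorting.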

On Wed, Oct 31, 2018 at 3:44 AM Antonio Murgia
<antonio.murg...@studio.unibo.it> wrote:

> Hi all,
>
> I'm currently developing a Spark Structured Streaming job and I'm
> performing flatMapGroupsWithState.
>
> I'm concerned about the laziness of the Iterator[V] that is passed to my
> custom function (func: (K, Iterator[V], GroupState[S]) => Iterator[U]).
>
> Is it ok to collect that iterator (with a toList, for example)? I have
> logic that is practically impossible to perform on an Iterator, but I do
> not want to break Spark's lazy chain, obviously.
>
>
> Thank you in advance.
>
>
> #A.M.
>
