Hi, I need to run a map() followed by a mapPartitions() on my input DataFrame. As a side effect of the map(), a partition-local variable should be updated, and that variable is then used in the subsequent mapPartitions(). I can't use a broadcast variable, because it is shared between all partitions on the same executor.
Where can I define this variable? I could run a single mapPartitions() that defines the variable, iterates over the input (just as the map() would), collects the results into an ArrayList, and then uses the list's iterator (together with the updated partition-local variable) as the input to the transformation that the original mapPartitions() performed. However, this feels less efficient than running map() followed by mapPartitions(), because I have to keep the ArrayList (which is essentially all of the partition's data) in memory.

Thanks,
Zsolt
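
For concreteness, the single-mapPartitions() workaround described above could be sketched roughly as follows. This is only an illustration in plain Java with no Spark dependency. The class name, the `PartitionState` holder, the doubling step and the running sum are all hypothetical stand-ins for the real map() logic and the real partition-local variable. It assumes the function passed to mapPartitions() takes the partition's Iterator and returns an Iterator, so the body of `processPartition` is what would go inside that function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PartitionLocalSketch {

    // Hypothetical partition-local state; here it is just a running sum.
    static final class PartitionState {
        long sum = 0;
    }

    // Stand-in for the body of the single mapPartitions() call:
    // receives one partition's iterator and returns the output iterator.
    static Iterator<String> processPartition(Iterator<Integer> input) {
        // Created once per call, so each partition gets its own instance.
        PartitionState state = new PartitionState();
        List<Integer> mapped = new ArrayList<>();

        // Phase 1: what map() would have done, updating the state as a side effect.
        // The whole partition is buffered here, which is the memory cost in question.
        while (input.hasNext()) {
            int doubled = input.next() * 2;
            state.sum += doubled;
            mapped.add(doubled);
        }

        // Phase 2: what the original mapPartitions() would have done,
        // reading the buffered values lazily together with the final state.
        final Iterator<Integer> buffered = mapped.iterator();
        final long total = state.sum;
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return buffered.hasNext();
            }

            @Override
            public String next() {
                return buffered.next() + "/" + total;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> out = processPartition(Arrays.asList(1, 2, 3).iterator());
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

The buffering in phase 1 is needed only because phase 2 depends on the final value of the state. If the second step could work with the state as it stands at each element, the list could be dropped and the input iterator wrapped lazily instead.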