Any comment on this one?

On Nov 16, 2016, at 12:59 PM, "Zsolt Tóth" <toth.zsolt....@gmail.com> wrote:
> Hi,
>
> I need to run a map() and a mapPartitions() on my input DF. As a
> side effect of the map(), a partition-local variable should be updated,
> which is then used in the mapPartitions() afterwards.
> I can't use a Broadcast variable, because it's shared between partitions on
> the same executor.
>
> Where can I define this variable?
> I could run a single mapPartitions() that defines the variable, iterates
> over the input (just as the map() would do), collects the result into an
> ArrayList, and then uses the list's iterator (and the updated
> partition-local variable) as the input of the transformation that the
> original mapPartitions() did.
>
> It feels, however, that this is not as optimal as running
> map()+mapPartitions(), because I need to store the ArrayList (which is
> basically the whole data in the partition) in memory.
>
> Thanks,
> Zsolt
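
For reference, here is a minimal sketch of one way the ArrayList in the quoted message might be avoided. It is plain Java with no Spark dependency, so it only illustrates the iterator handling that could go inside a single mapPartitions() function. The names PartitionState, lazyMap and processPartition are hypothetical, and so are the running sum (standing in for the partition-local variable) and the doubling (standing in for the map() logic). The idea is to wrap the partition's iterator lazily, so the state is updated as each element is pulled and nothing is buffered.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical stand-in for the partition-local variable: a running sum.
final class PartitionState {
    long runningSum = 0;
}

final class PartitionLocalSketch {

    // Applies `mapper` to each element only when the consumer asks for it,
    // so the whole partition is never held in memory at once.
    static <T, R> Iterator<R> lazyMap(Iterator<T> input, Function<T, R> mapper) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public R next() { return mapper.apply(input.next()); }
        };
    }

    // Sketch of the body of a single mapPartitions() function. A fresh state
    // object is created per call, i.e. once per partition.
    static Iterator<String> processPartition(Iterator<Integer> partition) {
        PartitionState state = new PartitionState();

        // Stage 1 (the former map()): transform the element and update the
        // partition-local state as a side effect.
        Iterator<Integer> mapped = lazyMap(partition, x -> {
            state.runningSum += x;
            return x * 2;
        });

        // Stage 2 (the former mapPartitions() logic): read the state as it
        // stands after the elements consumed so far.
        return lazyMap(mapped, y -> y + ":" + state.runningSum);
    }

    public static void main(String[] args) {
        Iterator<String> out = processPartition(Arrays.asList(1, 2, 3).iterator());
        while (out.hasNext()) {
            System.out.println(out.next()); // prints 2:1, then 4:3, then 6:6
        }
    }
}
```

This only helps if the second stage can work with the state as of the elements seen so far. If it needs the final value of the variable after the whole partition has gone through the map() step, then some form of buffering, or a second pass over the data, seems unavoidable.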