>> If you want remote DoF values in parallel, you have to sync or serialize them.

> If I understood correctly, this serialization should be done once,
> up front, before iterating through the elements.

Ideally, this serialization is *never* done.  But if you're writing
GUI code you're probably stuck calling those methods from a single
process, right?

> As in introduction_ex4, it seems that the parallelization is
> confined to "solve", which returns only after the job is all done.

Nope, the assembly is parallel too.  When a range is set up from
mesh.active_local_elements_begin() to
mesh.active_local_elements_end(), the "local" part of that means
you're only iterating over elements which are local to the current
processor, so the assembly loop never serializes anything.  Later in
the assembly, the "add_matrix" and "add_vector" calls are essentially
the first halves of parallel actions - they queue up values to be
accumulated when the solver closes the parallel matrix and vector
later.
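
If it helps to see the "first half / second half" idea in isolation,
here is a toy model of it in plain C++.  To be clear, this is not
libMesh code and not how libMesh implements anything: ToyParallelVector
and assemble_local are hypothetical names made up for this sketch, the
"processors" are just loop iterations in one process, and the mesh is
an assumed periodic 1D chain where element e joins nodes e and e+1.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy stand-in for a distributed vector (NOT a libMesh class). Each simulated
// rank owns a contiguous slice and caches contributions aimed at other ranks.
struct ToyParallelVector {
  std::size_t n_ranks, n_per_rank;
  std::vector<std::vector<double>> owned;              // owned[rank][local index]
  std::vector<std::map<std::size_t, double>> pending;  // pending[rank][global index]

  ToyParallelVector(std::size_t ranks, std::size_t per_rank)
    : n_ranks(ranks), n_per_rank(per_rank),
      owned(ranks, std::vector<double>(per_rank, 0.0)),
      pending(ranks) {}

  std::size_t size() const { return n_ranks * n_per_rank; }

  // Which simulated rank owns global entry g.
  std::size_t owner(std::size_t g) const {
    assert(g < size());
    return g / n_per_rank;
  }

  // "First half": a rank adds a value during assembly. Entries it owns are
  // updated immediately; entries owned elsewhere are only queued.
  void add(std::size_t rank, std::size_t g, double value) {
    if (owner(g) == rank)
      owned[rank][g % n_per_rank] += value;
    else
      pending[rank][g] += value;
  }

  // "Second half": deliver every queued contribution to its owning rank.
  void close() {
    for (auto & queue : pending) {
      for (const auto & entry : queue)
        owned[owner(entry.first)][entry.first % n_per_rank] += entry.second;
      queue.clear();
    }
  }

  // Read the value currently stored by the owner of entry g.
  double get(std::size_t g) const { return owned[owner(g)][g % n_per_rank]; }
};

// Each simulated rank loops over its own elements only and adds 0.5 to both
// nodes of each element. close() is deliberately NOT called here.
ToyParallelVector assemble_local(std::size_t n_ranks, std::size_t n_per_rank) {
  ToyParallelVector v(n_ranks, n_per_rank);
  const std::size_t n = v.size();
  for (std::size_t rank = 0; rank < n_ranks; ++rank)
    for (std::size_t e = rank * n_per_rank; e < (rank + 1) * n_per_rank; ++e) {
      v.add(rank, e, 0.5);            // left node of element e
      v.add(rank, (e + 1) % n, 0.5);  // right node, possibly owned elsewhere
    }
  return v;
}
```

With assemble_local(2, 3), entries 0 and 3 sit on a boundary between
the two simulated ranks, so they read 0.5 until close() is called and
1.0 afterwards; every other entry is already 1.0.  Nothing in the
element loop ever needed the whole vector on one rank.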

> I should then serialize to the master node just after the "solve".
> Is that right?

If you need to, then that's when to do it.

