Anders Logg wrote:
> On Thu, Aug 21, 2008 at 11:14:09AM +0200, Niclas Jansson wrote:
>> Anders Logg wrote:
>>> On Thu, Aug 21, 2008 at 09:10:03AM +0200, Niclas Jansson wrote:
>>>> Anders Logg wrote:
>>>>> On Wed, Aug 20, 2008 at 06:17:30PM +0200, Niclas Jansson wrote:
>>>>>
>>>>>>>> Stage 2 seems to involve a lot of communication, with small messages.
>>>>>>>> I think it would be more efficient if the stage were reorganized such
>>>>>>>> that all messages could be exchanged "at once", in a couple of larger
>>>>>>>> messages.
>>>>>>> That would be nice. I'm very open to suggestions.
>>>>>> If I understand the {T, S, F} overlap correctly, a facet could be globally
>>>>>> identified by the value of F(facet).
>>>>> No, F(facet) would be the local number of the facet in subdomain S(facet).
>>>>>
>>>>>> If so, one suggestion is to buffer N_i and F(facet) in 0...p-1 buffers
>>>>>> (one for each processor) and exchange these during stage 2.
>>>>>>
>>>>>> -- stage 1
>>>>>>
>>>>>>   for each facet  f \in T
>>>>>>     j = S_i(f)
>>>>>>     if j > i
>>>>>>
>>>>>>         -- calculate dof N_i
>>>>>>
>>>>>>         buffer[S_i(f)].add(N_i)
>>>>>>         buffer[S_i(f)].add(F_i(f))
>>>>>>     end
>>>>>>   end
>>>>>>
>>>>>>
>>>>>> -- stage 2
>>>>>>
>>>>>> -- Exchange shared dofs with fancy MPI_Allgatherv or a lookalike
>>>>>> -- MPI_SendRecv loop.
>>>>>>
>>>>>>    for j = 1 to j = (num processors - 1)
>>>>>>       src = (rank - j + num processors) % num processors
>>>>>>       dest = (rank + j) % num processors
>>>>>>
>>>>>>       MPI_SendRecv(dest, buffer[dest], src, recv_buffer)
>>>>>>
>>>>>>       for i = 0 to sizeof(recv_buffer), i += 2
>>>>>>          --update facet recv_buff(i+1) with dof value in  recv_buff(i)
>>>>>>       end
>>>>>>
>>>>>>    end
>>>>> I didn't look at this in detail (yet). Is it still valid with the
>>>>> above interpretation of F(facet)?
>>>>>
>>>> Yes, I think so.
>>> I think I understand your point, but I don't understand the details
>>> of your code.
>> If j > i, the processor is responsible for creating M_i for the shared
>> facet. The newly created M_i is placed in the send buffer for the
>> subdomain S_i(f), together with the local facet number in that subdomain.
>>
>> So the send buffers contain tuples {M_i, F_i(f)}. Since there is one
>> buffer for each subdomain, one can be sure that F_i(f) is valid on the
>> receiving processor.
>>
>> Instead of iterating over all processors and facets in stage 2, each
>> processor receives a set of tuples (for all shared facets) from each
>> processor. These can then be used to identify the local facet (since
>> F_i(f) is the local facet number) and to assign the dofs, which, if I
>> understand everything correctly, are obtained from M_i.
>>
>> One modification to the above algorithm: I think it is easier if the
>> tuples are stored as {F_i(f), M_i}, since M_i could be a long list of
>> dofs. The update loop would then be something similar to
>>
>>   for i = 0 to size of recv_buff, i += (number of dofs on each facet + 1)
>>      local facet f = recv_buff(i)
>>      for each dof on f, loop counter j
>>         assign recv_buff((i+1) + j) to facet dof j
>>      end
>>   end
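>>
>> To make this concrete, a rough (untested) C++/MPI sketch of the stage 2
>> exchange could look as below. It assumes the stage 1 buffers already
>> hold the packed {F_i(f), M_i} tuples as unsigned ints, that tuple_size
>> is 1 + (number of dofs per facet), and that update_facet_dofs() is only
>> a placeholder for whatever routine writes the received dofs into the
>> dof map. Buffer sizes are exchanged first so the receive buffer can be
>> allocated.
>>
>>   #include <cstddef>
>>   #include <mpi.h>
>>   #include <vector>
>>
>>   // Placeholder (not an existing function): writes the received dofs
>>   // for the facet with local number local_facet into the dof map
>>   void update_facet_dofs(unsigned int local_facet,
>>                          const unsigned int* dofs, std::size_t num_dofs);
>>
>>   // buffer[p] holds the packed {F_i(f), M_i} tuples destined for
>>   // process p, tuple_size is 1 + (number of dofs per facet)
>>   void exchange_shared_dofs(std::vector<std::vector<unsigned int> >& buffer,
>>                             std::size_t tuple_size)
>>   {
>>     int rank, num_processes;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
>>
>>     for (int j = 1; j < num_processes; ++j)
>>     {
>>       const int src  = (rank - j + num_processes) % num_processes;
>>       const int dest = (rank + j) % num_processes;
>>
>>       // Exchange sizes first so the receive buffer can be allocated
>>       unsigned int send_size = buffer[dest].size();
>>       unsigned int recv_size = 0;
>>       MPI_Sendrecv(&send_size, 1, MPI_UNSIGNED, dest, 0,
>>                    &recv_size, 1, MPI_UNSIGNED, src, 0,
>>                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>
>>       // Exchange the packed tuples themselves
>>       std::vector<unsigned int> recv_buffer(recv_size);
>>       MPI_Sendrecv(buffer[dest].data(), send_size, MPI_UNSIGNED, dest, 1,
>>                    recv_buffer.data(), recv_size, MPI_UNSIGNED, src, 1,
>>                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>
>>       // Unpack: each tuple starts with the local facet number F_i(f)
>>       for (std::size_t i = 0; i < recv_buffer.size(); i += tuple_size)
>>         update_facet_dofs(recv_buffer[i], &recv_buffer[i + 1],
>>                           tuple_size - 1);
>>     }
>>   }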
>>
>>> The mapping N_i is an auxiliary global-to-global mapping, which maps
>>> the global dofs on a local mesh to global dofs on the global mesh. It
>>> has a meaning only on each local mesh. What we want to communicate is
>>> the stuff in M_i.
>> I see, then it should be M_i in the outlined code.
>>
>> Niclas
> 
> Sounds very good.
> 
> Where do we start?
> 
> I guess one good place to start would be to get input/partitioning
> working and you seem to have that working already. We should be able
> to read in a mesh, partition it (with ParMetis for now) and construct
> the MeshFunctions S_i and F_i.
> 
> Once that is in place, we can start hacking on DofMapBuilder::build().
> 
> Could you outline what you have in terms of input/partitioning and
> then we can start applying those patches.
> 
> --
> Anders
> 

Parallel mesh parsing: the entire mesh is never represented on a single 
processor. It is a two-stage process: first the coordinates are loaded 
and partitioned with a geometric partitioner. In the second stage each 
processor loads the cells corresponding to its assigned coordinates, and 
finally the mesh is partitioned with a graph partitioner.
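
For the first stage, the call into ParMETIS could look roughly like the 
sketch below (untested; it assumes the ParMETIS 3.x interface, with 
vtxdist[p] giving the global index of the first vertex stored on process 
p and xyz holding the gdim coordinates of each local vertex):

  #include <mpi.h>
  #include <parmetis.h>
  #include <vector>

  // Stage 1: geometric (space-filling curve) partitioning of the
  // vertices read by this process
  void partition_vertices(std::vector<idxtype>& vtxdist,
                          std::vector<float>& xyz, int gdim,
                          std::vector<idxtype>& part, MPI_Comm comm)
  {
    // One partition number per local vertex
    part.resize(xyz.size() / gdim);

    ParMETIS_V3_PartGeom(&vtxdist[0], &gdim, &xyz[0], &part[0], &comm);
  }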

Partitioning is done with the distributed geometric partitioner and the 
mesh-to-dual graph partitioner in ParMETIS.
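
For the second stage, here is a rough (untested) sketch of the 
mesh-to-dual graph partitioning call, again assuming the ParMETIS 3.x 
interface, with elmdist describing how the cells are distributed across 
processes and eptr/eind holding the vertices of each local cell in CSR 
form:

  #include <cstddef>
  #include <mpi.h>
  #include <parmetis.h>
  #include <vector>

  // Stage 2: partition the cells via the dual graph of the mesh
  void partition_cells(std::vector<idxtype>& elmdist,
                       std::vector<idxtype>& eptr,
                       std::vector<idxtype>& eind,
                       int num_processes,
                       std::vector<idxtype>& cell_part, MPI_Comm comm)
  {
    int wgtflag = 0;       // no cell weights
    int numflag = 0;       // C-style (0-based) numbering
    int ncon = 1;          // one balance constraint
    int ncommonnodes = 3;  // cells sharing a facet (3 vertices for tets)
                           // are neighbours in the dual graph
    int nparts = num_processes;
    int options[3] = {0, 0, 0};
    int edgecut = 0;

    std::vector<float> tpwgts(ncon*nparts, 1.0f/nparts);
    std::vector<float> ubvec(ncon, 1.05f);

    cell_part.resize(eptr.size() - 1);  // one entry per local cell

    ParMETIS_V3_PartMeshKway(&elmdist[0], &eptr[0], &eind[0],
                             NULL /* elmwgt */, &wgtflag, &numflag,
                             &ncon, &ncommonnodes, &nparts,
                             &tpwgts[0], &ubvec[0], options, &edgecut,
                             &cell_part[0], &comm);
  }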

Mesh distribution, or more correctly redistribution (since the mesh is 
always distributed), moves vertices between processors and constructs a 
new mesh with the MeshEditor. Since processors share vertices in the 
overlap, I use the concept of ghosted vertices to decide which processor 
should be responsible for redistributing a vertex.
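
As an illustration of the kind of ownership rule I mean (just one 
possible convention, not necessarily exactly what the code does): a 
vertex that lives on several processors is redistributed only by the 
lowest-ranked processor that has it, and the others keep it as a ghost.

  #include <map>
  #include <set>

  // shared[v] = ranks (other than our own) that also have vertex v in
  // their overlap; vertices not in the map are purely local
  bool responsible_for_vertex(int rank, int global_vertex,
                              const std::map<int, std::set<int> >& shared)
  {
    std::map<int, std::set<int> >::const_iterator it =
        shared.find(global_vertex);
    if (it == shared.end())
      return true;                      // not shared: we redistribute it
    return rank < *it->second.begin();  // shared: lowest rank does it
  }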


Niclas
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
