Anders Logg wrote:
> On Wed, Aug 20, 2008 at 05:09:41PM +0200, Niclas Jansson wrote:
>> Anders Logg wrote:
>>> On Wed, Aug 20, 2008 at 09:39:30AM +0200, Niclas Jansson wrote:
>>>> Anders Logg wrote:
>>>>> On Mon, Aug 18, 2008 at 11:05:30AM +0200, Niclas Jansson wrote:
>>>>>
>>>>>> Anders Logg wrote:
>>>>>>
>>>>>>>>> I think it looks good.
>>>>>>>>>
>>>>>>>>> As far as I understand, you build a global numbering of all mesh
>>>>>>>>> entities (which may be different from the local numbering on each
>>>>>>>>> processor), and then the (global parallel) local-to-global mapping
>>>>>>>>> follows from tabulate_dofs just as usual.
>>>>>>>>>
>>>>>>>>> So, the difference is that you build a global numbering of the mesh
>>>>>>>>> entities, and we wanted to build a global numbering of the dofs. The
>>>>>>>>> only advantage I can see with our approach is that it may use less
>>>>>>>>> memory, since we don't need to store an extra numbering scheme for all
>>>>>>>>> mesh entities, but this is not a big deal.
>>>>>>>>>
>>>>>>>>> A few questions:
>>>>>>>>>
>>>>>>>>> 1. Is the above interpretation correct?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> Another disadvantage of the global numbering scheme is the mesh
>>>>>>>> connectivity calculations (mesh.init in MeshRenumber).
>>>>>>>>
>>>>>>>>
>>>>>>> Why is this a problem? As far as I understand, there are always two
>>>>>>> different numberings of mesh entities, one local (same as we have
>>>>>>> now) and one global. The local can be computed as usual and then the
>>>>>>> global can be reconstructed from the local + the overlap.
>>>>>>>
>>>>>>> (overlap = how the local pieces fit together)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Iterating over the local + overlap requires some mesh connectivity,
>>>>>> which is costly to generate.
>>>>>>
>>>>> What's your point? Are you arguing against a global numbering scheme
>>>>> for the mesh entities? I thought this is what you have implemented.
>>>>>
>>>>>
>>>> I'm not sure if the global numbering scheme is the best approach. It
>>>> worked well for simple dof_maps / elements, with a low renumbering time.
>>>> But for a more general implementation, renumbering starts to take too
>>>> much time.
>>> ok. So do you suggest we implement the other strategy instead, building a
>>> global dof map from local dof maps?
>> Yes, it's probably more efficient. The only problem with algorithm 5 (in
>> my opinion) is the communication pattern in stage 0 and stage 2.
>>
>> Parallel efficiency in stage 0 would probably be low due to the pipeline-style
>> offset calculation; it should be easy to fix with MPI_(Ex)Scan.
>
> The plan is to use MPI_Scan for this. The "offset += " is just my
> notation for the same operation. I wasn't aware of MPI_Scan at the
> time.
>
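For what it's worth, a minimal MPI_Exscan sketch of the stage 0 offset
computation could look like this (local_size is just a stand-in for the
number of dofs owned by the process):

#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);

  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Stand-in for the number of dofs owned by this process
  unsigned int local_size = 100;

  // Exclusive prefix sum: offset = sum of local_size over all lower ranks
  unsigned int offset = 0;
  MPI_Exscan(&local_size, &offset, 1, MPI_UNSIGNED, MPI_SUM, MPI_COMM_WORLD);

  // MPI_Exscan leaves the receive buffer undefined on rank 0
  if (rank == 0)
    offset = 0;

  MPI_Finalize();
  return 0;
}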
>> Stage 2 seems to involve a lot of communication, with small messages.
>> I think it would be more efficient if the stage were reorganized such
>> that all messages could be exchanged "at once", in a couple of larger
>> messages.
>
> That would be nice. I'm very open to suggestions.
>
> --
> Anders
>
If I understand the {T, S, F} overlap correctly, a facet could be globally
identified by the value of F(facet).
If so, one suggestion is to buffer N_i and F(facet) in p buffers, indexed
0...p-1 (one per processor), and exchange these during stage 2:
-- stage 1
for each facet f \in T
    j = S_i(f)
    if j > i
        -- calculate dof N_i for facet f
        buffer[j].add(N_i)
        buffer[j].add(F_i(f))
    end
end

-- stage 2
-- Exchange shared dofs with a fancy MPI_Allgatherv or a lookalike
-- MPI_Sendrecv loop.
for j = 1 to num processors - 1
    src  = (rank - j + num processors) % num processors
    dest = (rank + j) % num processors
    MPI_Sendrecv(dest, buffer[dest], src, recv_buffer)
    for i = 0 to size(recv_buffer) - 1, i += 2
        -- update facet recv_buffer(i+1) with the dof value in recv_buffer(i)
    end
end
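
In C++, the stage 2 loop could look roughly like the sketch below. The
interleaved (dof, facet) layout follows the pseudocode above, while
exchange_shared_dofs and the final dof map update are only placeholders:

#include <mpi.h>
#include <vector>
#include <cstddef>

// Sketch only: buffer[p] holds interleaved (dof, facet) pairs destined
// for process p, as filled during stage 1 above
void exchange_shared_dofs(std::vector<std::vector<unsigned int> >& buffer)
{
  int rank = 0, num_processes = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &num_processes);

  for (int j = 1; j < num_processes; ++j)
  {
    const int src  = (rank - j + num_processes) % num_processes;
    const int dest = (rank + j) % num_processes;

    // Exchange message sizes first so the receive buffer can be sized
    unsigned int send_size = buffer[dest].size();
    unsigned int recv_size = 0;
    MPI_Sendrecv(&send_size, 1, MPI_UNSIGNED, dest, 0,
                 &recv_size, 1, MPI_UNSIGNED, src, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Exchange all (dof, facet) pairs for this pair of processes at once
    std::vector<unsigned int> recv_buffer(recv_size);
    MPI_Sendrecv(buffer[dest].data(), send_size, MPI_UNSIGNED, dest, 1,
                 recv_buffer.data(), recv_size, MPI_UNSIGNED, src, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    for (std::size_t i = 0; i + 1 < recv_buffer.size(); i += 2)
    {
      const unsigned int dof   = recv_buffer[i];
      const unsigned int facet = recv_buffer[i + 1];
      // ... update the local dof map entry for 'facet' with 'dof' ...
      (void) dof;   // silence unused-variable warnings in this sketch
      (void) facet;
    }
  }
}

Exchanging the sizes first keeps the whole stage to two messages per
processor pair instead of many small ones.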
Niclas
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev