Yujie,

Funny you should ask: Derek and I (Cody) are working on a talk for the
upcoming SIAM conference illustrating the various tradeoffs of a hybrid
threaded-MPI scheme in libMesh.  For the last two weeks I've been running
various combinations of threads and MPI processes to solve a variety of
problems on our large parallel system.  The libMesh developers have done a
good job of wrapping the TBB APIs, so you can embed those functions in your
code and they still work whether you have TBB or not.  Check out the Threads
namespace for these functions.  There is a fair amount of work involved, since
you'll need to turn your hotspots into functors and call them with constructs
like parallel_for, but it may have a big payoff for your codes.  I wish I had
a better example for you than this high-level explanation, but our threaded
loops are buried in our own custom framework and wouldn't stand well as
examples on their own.

Hope this helps,
Cody

On Feb 11, 2010, at 4:00 PM, Yujie wrote:

> Dear Derek,
> 
> Thank you for your comments.
> I am wondering whether multithreading can significantly reduce the time cost
> of MeshCommunication. If it can, does the current libmesh support
> multithreading together with MPI? Is there any example of it? Thanks a lot.
> 
> Regards,
> Yujie
> 
> On Thu, Feb 11, 2010 at 4:38 PM, Derek Gaston <[email protected]> wrote:
> 
>> This plot looks reasonable to me.  For a normal linear problem... I
>> wouldn't expect to be able to scale a 350,000 dof problem (that only takes
>> 30 seconds in serial) beyond about 10 CPUs... there's just too much
>> communication involved (which is what you're seeing).
>> 
>> I'm not saying that we couldn't do something better in MeshCommunication
>> (or that there might be a bug)... I'm just trying to say that you shouldn't
>> expect this problem to scale all that well.  Try a bigger (or harder)
>> problem and I bet that plot changes.
>> 
>> Derek
>> 
>> On Feb 11, 2010, at 3:23 PM, Yujie wrote:
>> 
>> Dear John,
>> 
>> Please check the attached figure. It is better to show the problem.
>> I will test it according to your advice. I am just wondering whether it is
>> reasonable.
>> Thanks a lot.
>> 
>> Regards,
>> Yujie
>> 
>> On Thu, Feb 11, 2010 at 4:19 PM, John Peterson <
>> [email protected]> wrote:
>> 
>>> Yujie,
>>> 
>>> I don't understand the numbers you posted, not least of which because
>>> the columns don't line up in my email...
>>> 
>>> MeshCommunication is a class that does a lot of different things; is
>>> that what you are referring to in the second column?
>>> 
>>> How about attaching some actual PerfLog output?  How about adding some
>>> additional logging to parallel_sort if that part is getting slower,
>>> and seeing which part of it takes the longest?
>>> 
>>> --
>>> John
>>> 
>> 
>> <forlibmesh.jpg>
>> 
>> 
>> 
> ------------------------------------------------------------------------------
> SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
> Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
> http://p.sf.net/sfu/solaris-dev2dev
> _______________________________________________
> Libmesh-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/libmesh-users

