On Thu, Nov 06, 2008 at 03:04:13PM -0500, Jeff Squyres wrote:
>
> For the web archives: this same question was posted and answered on the
> users list.  See this thread:
>
>    http://www.open-mpi.org/community/lists/users/2008/11/7222.php
Good thread... one omission is the possible replacement of the sleep(1)
with sched_yield() to get some overlap with other system activity.

As a general rule, a tight test loop should be aware of the maximum and
minimum times the test can take to become true.  Retesting the flag sooner
than the minimum time invites system contention; waiting longer than the
maximum time wastes resources.  The loop should also know whether the state
of the object being tested will change without local CPU activity.  If the
CPU running the test loop is the same CPU/core that will finish the
transaction, then a sched_yield() is a very good thing.  Knowing whether the
test itself impacts the system matters too (example: cache line contention
or a system call).

MPI is interesting because on some hardware a lot of the work is done in
user space, so a sleep() or sched_yield() gets no MPI work done.  Other
transports move data with system calls (example: TCP/IP), where yielding
gives the system an opportunity to work any I/O queue or interrupt that
might be pending.

To the point...

>> vladimir marjanovic wrote:
>>>
>>> In order to overlap communication and computation

Communication requires work in the form of some {small, medium, large}
interaction with a processor.  Work is work, so strict overlap is not
possible.  The problem is really scheduling for minimum conflict, which is
hard to solve in the general case since scheduling is work too.  Thus
sched_yield() may help; see the sketch appended below.

> On Nov 6, 2008, at 1:00 PM, vladimir marjanovic wrote:
>>> I am a new user of Open MPI; I've used MPICH before.
>>> I've tried on the users list but they couldn't help me.
>>>
>>> There is a performance bug with the following scenario:
>>>
>>> proc_B: MPI_Isend(...,proc_A,..,&request)
>>>         do{
>>>            sleep(1);
>>>            MPI_Test(..,&flag,&request);
>>>            count++;
>>>         }while(!flag);
>>>
>>> proc_A: MPI_Recv(...,proc_B);
>>>
>>> For a message size of 8 MB, proc_B calls MPI_Test 88 times.  That means
>>> the point-to-point communication costs 88 seconds.
>>> Btw, bandwidth isn't the problem (interconnection network: InfiniBand).
>>>
>>> Obviously, there is a problem with the progress of asynchronous
>>> messages.  In order to overlap communication and computation I don't
>>> want to use MPI_Wait.  The message is probably being decomposed into
>>> chunks, and the chunk size is probably defined by an environment
>>> variable.
>>>
>>> How can I advance the message more aggressively, or can I control the
>>> chunk size?
>>> Thank you very much
>>>
>>> Vladimir
>
> --
> Jeff Squyres
> Cisco Systems

-- 
   T o m  M i t c h e l l
   Found me a new hat, now what?
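
For concreteness, here is a minimal sketch of the scenario above with the
sleep(1) replaced by sched_yield().  It is only an illustration of the
suggestion, not a drop-in fix: the rank assignment (rank 1 as proc_B,
rank 0 as proc_A), the tag, and the 8 MB buffer are assumptions chosen to
mirror the report, and how quickly the send completes still depends on the
transport and on how often the MPI progress engine gets to run.

    /* Sketch: the reported loop with sleep(1) replaced by sched_yield().
     * Each MPI_Test call gives the MPI progress engine a chance to run;
     * sched_yield() cedes the core to other runnable work instead of
     * parking the process for a full second between tests. */
    #include <mpi.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        enum { N = 8 * 1024 * 1024 };      /* 8 MB message, as in the report */
        static char buf[N];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {                   /* "proc_B": non-blocking send */
            MPI_Request request;
            int flag = 0, count = 0;

            MPI_Isend(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &request);
            do {
                sched_yield();             /* yield the core, not a whole second */
                MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
                count++;
            } while (!flag);
            printf("send completed after %d tests\n", count);
        } else if (rank == 0) {            /* "proc_A": matching receive */
            MPI_Recv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

Run with two ranks (e.g. mpirun -np 2 ./a.out).  On a transport that only
makes progress inside MPI calls, the loop simply spins through MPI_Test more
often; on a kernel-assisted transport such as TCP, the yield also gives the
OS a chance to service any pending I/O queue or interrupt.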