On Jan 29, 6:05 pm, sonu <[email protected]> wrote:
> But one MUST get the same output (when I use "double precision" for
> output), irrespective of the number of nodes.
A reasonable expectation on your side. A few points to look at (I can
only help you search for the answer):

1. Compiler flags: these have the biggest effect, and they are one place
   where you do have control; try changing the compile-time options with
   a bit of trial and error. Are you using any compiler optimisation?
   Drop all -Ox flags for now and check. Such things should be last on
   the list, though.

2. A related issue is the language. Fortran or C? Search for IEEE 754
   floating-point arithmetic and read up on it; it may help. In
   particular, floating-point addition is not associative, so combining
   partial results in a different order can legitimately change the last
   digits.

3. Another point could be the compiler itself. Are you using the GCC
   suite or Intel? I am not sure compilers for x86 architectures are as
   good as those for other systems (Sun/IBM); my information here may be
   outdated. x86 machines running Linux are certainly prevalent, and at
   times one has to use what one gets.

4. A genuine programming error. A little about your programming model
   would help. For example, is the final value a sum or an average of
   partial sums/answers from different nodes? It is easy to make
   mistakes in the calculations before distributing the work and after
   gathering the results.

> I think mpi (message passing interface) has bugs? Can you tell me how
> mpi works? Because in parallel run, libraries of mpi are used.

Have you tried the examples/test cases that come with MPI? The MPI book
has examples too. It would be a good idea to check with something
simpler, like the calculation of pi (I think this example is in the
book). If those give sound results, then there is nothing wrong with MPI
(your installation, that is). The internal workings of the library are
unlikely to help you at this stage.

Good luck
Chetan

--
l...@iitd - http://tinyurl.com/ycueutm
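P.S. To make the IEEE 754 point concrete: floating-point addition is not
associative, so the same sum combined in different groupings (which is
exactly what happens when the data is split over a different number of
nodes and reduced) can differ in the last bits. Here is a minimal
illustrative sketch in plain Python, no MPI required; the data and the
"rank" splitting are made up for the example:

```python
# Hypothetical sketch (not the poster's code): mimic how a reduction
# over a different number of ranks changes the grouping of partial sums,
# which can change the last bits of a double-precision result.
import random

random.seed(42)  # fixed data so the demonstration is repeatable
values = [random.uniform(-1.0, 1.0) for _ in range(100000)]

def reduce_as_if(vals, nranks):
    """Sum 'vals' as if scattered over 'nranks' ranks: each rank sums
    its own chunk, then the partial sums are added together.
    Assumes nranks divides len(vals) evenly."""
    chunk = len(vals) // nranks
    parts = [sum(vals[r * chunk:(r + 1) * chunk]) for r in range(nranks)]
    return sum(parts)

serial = reduce_as_if(values, 1)    # "1-node" run
parallel = reduce_as_if(values, 4)  # "4-node" run

print(serial == parallel)           # may well be False
print(abs(serial - parallel))       # tiny, but possibly nonzero
```

If the two results differ only in the last few digits, that is rounding,
not an MPI bug; a real MPI_Reduce is similarly free to combine the
contributions in an implementation-defined order.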
