Hello,

If you mean the current version in the ompi-tests/ibm svn repository, I can 
confirm that it passes the topology/dimscreate test without errors. :)

The difference between the patches is as follows: the patch from Andreas only 
generated a table of prime numbers up to sqrt(freeprocs), while my patch still 
produces prime numbers up to freeprocs. For the factorization we really need 
all prime factors up to freeprocs, because freeprocs itself may be prime; the 
standard sqrt optimization was simply introduced in the wrong place. :)
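
To make that concrete, here is a minimal sketch of the idea (not the literal 
code from patch 0002; the getprimes signature is the one from dims_create.c): 
the prime table still runs all the way up to num, and the sqrt cutoff only 
limits the trial divisions per candidate.

    #include <stdlib.h>

    /* Sketch: collect all primes <= num by trial division.
       The table must run up to num (freeprocs itself may be prime),
       but each candidate k only needs to be tested against primes p
       with p*p <= k -- that is where the sqrt cutoff belongs. */
    static int getprimes_sketch(int num, int *nprimes, int **pprimes)
    {
        int *primes = malloc((num / 2 + 1) * sizeof(int));
        int count = 0;

        if (NULL == primes) {
            return -1;
        }
        for (int k = 2; k <= num; ++k) {
            int is_prime = 1;
            /* p <= k/p is p*p <= k, written as a division to avoid overflow */
            for (int i = 0; i < count && primes[i] <= k / primes[i]; ++i) {
                if (0 == k % primes[i]) {
                    is_prime = 0;
                    break;
                }
            }
            if (is_prime) {
                primes[count++] = k;
            }
        }
        *nprimes = count;
        *pprimes = primes;
        return 0;
    }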

You are right about #3: it is a better approximation for the upper bound, and 
the proof is something to be read under the Christmas tree. ;)
I just have to rethink whether the ceil() is really necessary in the code, as 
I am not sure about the rounding behaviour of the floating-point calculation 
here... :P
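
For concreteness, the table sizing I have in mind looks roughly like this (a 
sketch only; the 1.2762 constant is the one from the Dusart bound cited in my 
mail below, and the extra +1 slot is just a safety margin against 
floating-point rounding, not something taken from the patch):

    #include <math.h>

    /* Upper bound on pi(n) from Dusart 1999:
       pi(x) < x/ln(x) * (1 + 1.2762/ln(x))  for x > 1.
       ceil() plus one spare slot as slack against rounding. */
    static int prime_table_size(int n)
    {
        double ln_n;

        if (n < 2) {
            return 1;
        }
        ln_n = log((double) n);
        return (int) ceil((double) n / ln_n * (1.0 + 1.2762 / ln_n)) + 1;
    }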

Regarding your questions:
1.) I don't think we have to cache prime numbers, as MPI_Dims_create will not 
be called often enough for the factorization to matter. Anybody who needs 
faster factorization would use their own - even more optimized - code. If you 
find some free time besides Open MPI, try some of the harder problems at 
http://projecteuler.net. But please don't get frustrated by the assembler 
solutions. ;)

2.) Interesting idea: using the approximation from the cited paper, we would 
only need around 400 MB to store all primes in the int32 range. There is still 
potential for applying compression techniques. ^^
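
To spell out the arithmetic behind that estimate, using the same bound:
\pi(2^31) < 2^31/\ln(2^31) * (1 + 1.2762/\ln(2^31)) ≈ 1.06e8 primes,
which at 4 bytes per prime is roughly 420 MB.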

Regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



----- Original Message -----
From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
To: "Open MPI Developers" <de...@open-mpi.org>
Sent: Monday, February 10, 2014 20:12:08
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Nice!  Can you verify that it passes the ibm test?  I didn't look closely, and 
to be honest, I'm not sure why the previous improvement broke the IBM test 
because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).

I think patch 1 is a no-brainer.  I'm not sure about #2, because I'm not sure 
how it's different from the previous one, nor did I have time to investigate 
why the previous one broke the IBM test.  #3 seems like a good idea, too; I 
didn't check the paper, but I assume it's some kind of proof about the upper 
limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers?  (if so, it'll have to be done in a 
thread-safe way)

2. Should we just generate prime numbers and hard-code them into a table that 
is compiled into the code?  We would only need primes up to the sqrt of 
2 billion (i.e., signed int), right?  I don't know how many that is -- if it's 
small enough, perhaps this is the easiest solution.



On Feb 10, 2014, at 1:30 PM, Christoph Niethammer <nietham...@hlrs.de> wrote:

> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and 
> r30540) which were reverted.
> 
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 
> 1.) freeprocs is initialized to nnodes, and the subsequent divisions of 
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check whether nnodes > 0 in the 
> MPI_PARAM_CHECK section at the beginning, instead of the following check 
> (see patch 0001):
> 
> 99        if (freeprocs < 1) {
> 100          return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                        FUNC_NAME);
> 102       }
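> 
> Roughly, the moved check would then look something like this (just a sketch 
> of the idea, not the literal hunk from patch 0001):
> 
>     if (MPI_PARAM_CHECK) {
>         /* ... existing argument checks ... */
>         if (1 > nnodes) {
>             return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
>                                           FUNC_NAME);
>         }
>     }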
> 
> 
> 2.) I rewrote the algorithm in getprimes(int num, int *nprimes, int 
> **pprimes) to stop the trial division at sqrt(n),
> which makes more sense mathematically (a composite number n always has a 
> prime factor not exceeding \sqrt{n}) - and it should produce the right 
> result. ;)
> (see patch 0002)
> Here are the improvements:
> Here the improvements:
> 
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> 
> 
> 3.) The memory allocated for the list of prime numbers can be reduced by up 
> to a factor of ~6 for 1 million nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/\ln(x) * (1 + 1.2762/\ln(x))   for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1 million nodes, as 
> reported by tcmalloc/pprof on a test program - but it may add up on fatter 
> nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>     0.3 -18.8% -18.8%      0.3 -18.8% getprimes2
>     0.0  -0.0% -18.8%     -1.6 100.0% __libc_start_main
>     0.0  -0.0% -18.8%     -1.6 100.0% main
>    -1.9 118.8% 100.0%     -1.9 118.8% getprimes
> 
> Find attached patch for it in 0003.
> 
> 
> If there are no issues I would like to commit this to trunk for further 
> testing (+ CMR for 1.7.5?) at the end of this week.
> 
> Best regards
> Christoph
> 
> [1] 
> http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> 
> 
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> Attachments:
> <0001-Move-parameter-check-into-appropriate-code-section-a.patch>
> <0002-Speeding-up-detection-of-prime-numbers-using-the-fac.patch>
> <0003-Reduce-memory-usage-by-a-better-approximation-for-th.patch>


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
