Hello,
Minor update as a little bug and an unused variable were left over in the patch.
I'll commit this + the parameter check change later.
Anybody volunteering for review of a cmr for 1.7.5. :)
Ah and some restults for MPI_Dims_create(1000000, 3, {})
original: 8.110628 sec
optimized-primes: 0.048702 sec
optimized-factorization: 0.000013 sec
Regards
Christoph
--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer
----- Ursprüngliche Mail -----
Von: "Andreas Schäfer" <gent...@gmx.de>
An: "Open MPI Developers" <de...@open-mpi.org>
Gesendet: Dienstag, 11. Februar 2014 12:16:53
Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
Hi,
ah, that's clever indeed!
Best
-Andreas
On 12:02 Tue 11 Feb , Christoph Niethammer wrote:
> Hello,
>
> After rethinking Jeff's comments about caching prime numbers I came to the
> conclusion that we can omit the prime numbers at all and go directly for the
> factorization. :D
> We then only need at most log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128
> byte of memory for the factors.
>
> Computational costs reduce as well as the factorization itself is done by a
> loop with at most \sqrt(num) / 2 iterations - which is the same as in the
> original prime number detection loop.
> I think this is the cleanest way which reduces also source code size. ;)
>
> Find attache patch against the trunk.
>
> Best regards
> Christoph
>
> --
>
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
>
>
>
> ----- Ursprüngliche Mail -----
> Von: "Andreas Schäfer" <gent...@gmx.de>
> An: "Open MPI Developers" <de...@open-mpi.org>
> Gesendet: Dienstag, 11. Februar 2014 06:24:56
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
>
> OK, so we were thinking the same thing :-) The optimization you
> mention below is the same I've used in my updated patch.
>
>
> On 02:29 Tue 11 Feb , Christoph Niethammer wrote:
> > As mentioned in my former mail I did not touch the factorization
> > code.
> > But to figure out if a number n is *not* a prime number it is sufficient to
> > check up to \sqrt(n).
> > Proof:
> > let n = p*q with q > \sqrt{n}
> > --> p < \sqrt(n)
> > So we have already found factor p before reaching \sqrt(n) and by this n is
> > no prime any more and no need for further checks. ;)
> >
> >
> > The mentioned factorization may indeed include one factor which is larger
> > than \sqrt(n). :)
> >
> > Proof that at least one prime factor can be larger than \sqrt(n) example:
> > 6 = 2*3
> > \sqrt(6) = 2.4494897427832... < 3 Q.E.D.
> >
> >
> > Proof that no more than one factor can be larger than \sqrt(n):
> > let n = \prod_{i=0}^K p_i with p_i \in N and K > 2
> > and assume w.l.o.g. p_0 > \sqrt(n) and p_1 > \sqrt(n)
> > --> 1 > \prod_{i=2}^K p_i
> > which is a contradiction as all p_i \in N. Q.E.D.
> >
> >
> > So your idea is still applicable with not much effort and we only need
> > prime factors up to sqrt(n) in the factorizer code for an additional
> > optimization. :)
> >
> > First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'}
> > p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime.
> > No complication with counts IMHO. I leave this without patch as it is
> > already 2:30 in the morning. :P
> >
> > Regards
> > Christoph
> >
> > --
> >
> > Christoph Niethammer
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstrasse 19
> > 70569 Stuttgart
> >
> > Tel: ++49(0)711-685-87203
> > email: nietham...@hlrs.de
> > http://www.hlrs.de/people/niethammer
> >
> > ----- Ursprüngliche Mail -----
> > Von: "Andreas Schäfer" <gent...@gmx.de>
> > An: "Open MPI Developers" <de...@open-mpi.org>
> > Gesendet: Montag, 10. Februar 2014 23:24:24
> > Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> >
> > Christoph-
> >
> > your patch has the same problem as my original patch: indeed there may
> > be a prime factor p of n with p > sqrt(n). What's important is that
> > there may only be at most one. I've submitted an updated patch (see my
> > previous mail) which catches this special case.
> >
> > Best
> > -Andreas
> >
> >
> > On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> > > Hello,
> > >
> > > I noticed some effort in improving the scalability of
> > > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > > Unfortunately there were some issues with the first attempt (r30539 and
> > > r30540) which were reverted.
> > >
> > > So I decided to give it a short review based on r30606
> > > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > >
> > >
> > > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of
> > > freeprocs have all positive integers as divisor.
> > > So IMHO it would make more sense to check if nnodes > 0 in the
> > > MPI_PARAM_CHECK section at the begin instead of the following (see patch
> > > 0001):
> > >
> > > 99 if (freeprocs < 1) {
> > > 100 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD,
> > > MPI_ERR_DIMS,
> > > 101 FUNC_NAME);
> > > 102 }
> > >
> > >
> > > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int
> > > *nprimes, int **pprimes)
> > > which makes mathematically more sens (as the largest prime factor of any
> > > number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> > > (see patch 0002)
> > > Here the improvements:
> > >
> > > module load mpi/openmpi/trunk-gnu.4.7.3
> > > $ ./mpi-dims-old 1000000
> > > time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> > > module swap mpi/openmpi/trunk-gnu.4.7.3
> > > mpi/openmpi/trunk-gnu.4.7.3-testing
> > > $ ./mpi-dims-new 1000000
> > > time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> > >
> > >
> > > 3.) Memory allocation for the list of prime numbers may be reduced up to
> > > a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> > > \pi(x) < x/ln(x)(1+1.2762/ln(x)) for x > 1
> > > Unfortunately this saves us only 1.6 MB per process for 1mio nodes as
> > > reported by tcmalloc/pprof on a test program - but it may sum up with
> > > fatter nodes. :P
> > >
> > > $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> > > (pprof) top
> > > Total: -1.6 MB
> > > 0.3 -18.8% -18.8% 0.3 -18.8% getprimes2
> > > 0.0 -0.0% -18.8% -1.6 100.0% __libc_start_main
> > > 0.0 -0.0% -18.8% -1.6 100.0% main
> > > -1.9 118.8% 100.0% -1.9 118.8% getprimes
> > >
> > > Find attached patch for it in 0003.
> > >
> > >
> > > If there are no issues I would like to commit this to trunk for further
> > > testing (+cmr for 1.7.5?) end of this week.
> > >
> > > Best regards
> > > Christoph
> > >
> > > [1]
> > > http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> > >
> > >
> > >
> > > --
> > >
> > > Christoph Niethammer
> > > High Performance Computing Center Stuttgart (HLRS)
> > > Nobelstrasse 19
> > > 70569 Stuttgart
> > >
> > > Tel: ++49(0)711-685-87203
> > > email: nietham...@hlrs.de
> > > http://www.hlrs.de/people/niethammer
> >
> > > From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
> > > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > > Date: Mon, 10 Feb 2014 16:44:03 +0100
> > > Subject: [PATCH 1/3] Move parameter check into appropriate code section
> > > at the
> > > begin.
> > >
> > > ---
> > > ompi/mpi/c/dims_create.c | 11 ++++++-----
> > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > > index d2c3858..3d0792f 100644
> > > --- a/ompi/mpi/c/dims_create.c
> > > +++ b/ompi/mpi/c/dims_create.c
> > > @@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> > > return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
> > > MPI_ERR_DIMS, FUNC_NAME);
> > > }
> > > +
> > > + if (1 > nnodes) {
> > > + return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
> > > + MPI_ERR_DIMS, FUNC_NAME);
> > > + }
> > > }
> > >
> > > /* Get # of free-to-be-assigned processes and # of free dimensions */
> > > @@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> > > FUNC_NAME);
> > > }
> > >
> > > - if (freeprocs < 1) {
> > > - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> > > - FUNC_NAME);
> > > - }
> > > - else if (freeprocs == 1) {
> > > + if (freeprocs == 1) {
> > > for (i = 0; i < ndims; ++i, ++dims) {
> > > if (*dims == 0) {
> > > *dims = 1;
> > > --
> > > 1.8.3.2
> > >
> >
> > > From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
> > > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > > Date: Mon, 10 Feb 2014 18:50:51 +0100
> > > Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
> > > that the largest prime factor of any number n cannot exceed \sqrt{n}.
> > >
> > > ---
> > > ompi/mpi/c/dims_create.c | 9 ++++++---
> > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > > index 3d0792f..1c1c381 100644
> > > --- a/ompi/mpi/c/dims_create.c
> > > +++ b/ompi/mpi/c/dims_create.c
> > > @@ -5,7 +5,7 @@
> > > * Copyright (c) 2004-2005 The University of Tennessee and The University
> > > * of Tennessee Research Foundation. All rights
> > > * reserved.
> > > - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
> > > + * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart,
> > > * University of Stuttgart. All rights reserved.
> > > * Copyright (c) 2004-2005 The Regents of the University of California.
> > > * All rights reserved.
> > > @@ -20,6 +20,8 @@
> > >
> > > #include "ompi_config.h"
> > >
> > > +#include <math.h>
> > > +
> > > #include "ompi/mpi/c/bindings.h"
> > > #include "ompi/runtime/params.h"
> > > #include "ompi/communicator/communicator.h"
> > > @@ -293,13 +295,14 @@ getprimes(int num, int *pnprime, int **pprimes) {
> > > primes[i++] = 2;
> > >
> > > for (n = 3; n <= num; n += 2) {
> > > - for (j = 1; j < i; ++j) {
> > > + double nsqrt = sqrt((double) n);
> > > + for(j = 0; (double) primes[j] <= nsqrt; j++) {
> > > if ((n % primes[j]) == 0) {
> > > break;
> > > }
> > > }
> > >
> > > - if (j == i) {
> > > + if (primes[j] > nsqrt) {
> > > if (i >= size) {
> > > return MPI_ERR_DIMS;
> > > }
> > > --
> > > 1.8.3.2
> > >
> >
> > > From a012206cfbf7b8b22cef4cc5b811384e5e3ac32c Mon Sep 17 00:00:00 2001
> > > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > > Date: Mon, 10 Feb 2014 19:02:03 +0100
> > > Subject: [PATCH 3/3] Reduce memory usage by a better approximation for the
> > > upper limit of the number of primes.
> > >
> > > ---
> > > ompi/mpi/c/dims_create.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > > index 1c1c381..8dd3144 100644
> > > --- a/ompi/mpi/c/dims_create.c
> > > +++ b/ompi/mpi/c/dims_create.c
> > > @@ -281,9 +281,17 @@ getprimes(int num, int *pnprime, int **pprimes) {
> > > int n;
> > > int size;
> > > int *primes;
> > > + double lognum;
> > >
> > > - /* Allocate the array of primes */
> > > - size = (num / 2) + 1;
> > > + /* Allocate the array of primes
> > > + * see Pierre Dusart, Math. Comp. 68 (1999), no. 225, 411???415.*/
> > > + lognum = log(num);
> > > + if(num > 1) {
> > > + size = ceil(num/lognum * (1+1.2762/lognum));
> > > + }
> > > + else {
> > > + size = 1;
> > > + }
> > > primes = (int *) malloc((unsigned) size * sizeof(int));
> > > if (NULL == primes) {
> > > return MPI_ERR_NO_MEM;
> > > --
> > > 1.8.3.2
> > >
> >
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > --
> > ==========================================================
> > Andreas Schäfer
> > HPC and Grid Computing
> > Chair of Computer Science 3
> > Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> > +49 9131 85-27910
> > PGP/GPG key via keyserver
> > http://www.libgeodecomp.org
> > ==========================================================
> >
> > (\___/)
> > (+'.'+)
> > (")_(")
> > This is Bunny. Copy and paste Bunny into your
> > signature to help him gain world domination!
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> ==========================================================
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==========================================================
>
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> From 19af07aac40d9e2b9c827ab043643b1e6e28fe0c Mon Sep 17 00:00:00 2001
> From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> Date: Tue, 11 Feb 2014 11:58:30 +0100
> Subject: [PATCH] Omit usage of pre calculated prime numbers and factorize
> directly.
>
> ---
> ompi/mpi/c/dims_create.c | 159
> +++++++++++++++--------------------------------
> 1 file changed, 50 insertions(+), 109 deletions(-)
>
> diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> index 3d0792f..02a3413 100644
> --- a/ompi/mpi/c/dims_create.c
> +++ b/ompi/mpi/c/dims_create.c
> @@ -5,7 +5,7 @@
> * Copyright (c) 2004-2005 The University of Tennessee and The University
> * of Tennessee Research Foundation. All rights
> * reserved.
> - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
> + * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart,
> * University of Stuttgart. All rights reserved.
> * Copyright (c) 2004-2005 The Regents of the University of California.
> * All rights reserved.
> @@ -20,6 +20,8 @@
>
> #include "ompi_config.h"
>
> +#include <math.h>
> +
> #include "ompi/mpi/c/bindings.h"
> #include "ompi/runtime/params.h"
> #include "ompi/communicator/communicator.h"
> @@ -36,9 +38,8 @@
> static const char FUNC_NAME[] = "MPI_Dims_create";
>
> /* static functions */
> -static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int
> **pdims);
> -static int getfactors(int num, int nprime, int *primes, int **pcounts);
> -static int getprimes(int num, int *pnprime, int **pprimes);
> +static int assignnodes(int ndim, int nfactor, int *pfacts,int **pdims);
> +static int getfactors(int num, int *nfators, int **factors);
>
>
> /*
> @@ -50,8 +51,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> int i;
> int freeprocs;
> int freedims;
> - int nprimes;
> - int *primes;
> + int nfactors;
> int *factors;
> int *procs;
> int *p;
> @@ -109,20 +109,14 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> return MPI_SUCCESS;
> }
>
> - /* Compute the relevant prime numbers for factoring */
> - if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) {
> - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
> - FUNC_NAME);
> - }
> -
> /* Factor the number of free processes */
> - if (MPI_SUCCESS != (err = getfactors(freeprocs, nprimes, primes,
> &factors))) {
> + if (MPI_SUCCESS != (err = getfactors(freeprocs, &nfactors, &factors))) {
> return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
> FUNC_NAME);
> }
>
> /* Assign free processes to free dimensions */
> - if (MPI_SUCCESS != (err = assignnodes(freedims, nprimes, primes,
> factors, &procs))) {
> + if (MPI_SUCCESS != (err = assignnodes(freedims, nfactors, factors,
> &procs))) {
> return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
> FUNC_NAME);
> }
> @@ -135,7 +129,6 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> }
> }
>
> - free((char *) primes);
> free((char *) factors);
> free((char *) procs);
>
> @@ -154,12 +147,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> * Accepts: - # of dimensions
> * - # of prime factors
> * - array of prime factors
> - * - array of factor counts
> * - ptr to array of dimensions (returned value)
> * Returns: - 0 or ERROR
> */
> static int
> -assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
> +assignnodes(int ndim, int nfactor, int *pfacts, int **pdims)
> {
> int *bins;
> int i, j;
> @@ -185,17 +177,15 @@ assignnodes(int ndim, int nfactor, int *pfacts, int
> *counts, int **pdims)
>
> /* Loop assigning factors from the highest to the lowest */
> for (j = nfactor - 1; j >= 0; --j) {
> - f = pfacts[j];
> - for (n = counts[j]; n > 0; --n) {
> - /* Assign a factor to the smallest bin */
> - pmin = bins;
> - for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
> - if (*p < *pmin) {
> - pmin = p;
> - }
> + f = pfacts[j];
> + /* Assign a factor to the smallest bin */
> + pmin = bins;
> + for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
> + if (*p < *pmin) {
> + pmin = p;
> }
> - *pmin *= f;
> }
> + *pmin *= f;
> }
>
> /* Sort dimensions in decreasing order (O(n^2) for now) */
> @@ -217,96 +207,47 @@ assignnodes(int ndim, int nfactor, int *pfacts, int
> *counts, int **pdims)
> *
> * Function: - factorize a number
> * Accepts: - number
> - * - # of primes
> - * - array of primes
> - * - ptr to array of counts (returned value)
> - * Returns: - 0 or ERROR
> + * - # prime factors
> + * - array of prime factors
> + * Returns: - MPI_SUCCESS or ERROR
> */
> static int
> -getfactors(int num, int nprime, int *primes, int **pcounts)
> -{
> - int *counts;
> +getfactors(int num, int *nfactors, int **factors) {
> + int size;
> + int n;
> + int d;
> int i;
> - int *p;
> - int *c;
> -
> - if (0 >= nprime) {
> - return MPI_ERR_INTERN;
> - }
> + double lognum;
>
> - /* Allocate the factor counts array */
> - counts = (int *) malloc((unsigned) nprime * sizeof(int));
> - if (NULL == counts) {
> - return MPI_ERR_NO_MEM;
> + if(num < 2) {
> + (*nfactors) = 0;
> + (*factors) = NULL;
> + return MPI_SUCCESS;
> }
> -
> - *pcounts = counts;
> -
> - /* Loop over the prime numbers */
> - i = nprime - 1;
> - p = primes + i;
> - c = counts + i;
> -
> - for (; i >= 0; --i, --p, --c) {
> - *c = 0;
> - while ((num % *p) == 0) {
> - ++(*c);
> - num /= *p;
> + /* Allocate the array of prime factors which cannot exceed log_2(num)
> entries */
> + n = num;
> + size = ceil(log(num) / log(2));
> + *factors = (int *) malloc((unsigned) size * sizeof(int));
> +
> + i = 0;
> + /* determine all occurences of factor 2 */
> + while((num % 2) == 0) {
> + num /= 2;
> + (*factors)[i++] = 2;
> + }
> + /* determine all occurences of uneven prime numbers up to sqrt(num) */
> + d = 3;
> + int sqrtnum = ceil(sqrt(num));
> + for(d = 3; (num > 1) && (d < sqrtnum); d += 2) {
> + while((num % d) == 0) {
> + num /= d;
> + (*factors)[i++] = d;
> }
> }
> -
> - if (1 != num) {
> - return MPI_ERR_INTERN;
> + /* as we looped only up to sqrt(num) one factor > sqrt(num) may be left
> over */
> + if(num != 1) {
> + (*factors)[i++] = num;
> }
> -
> + (*nfactors) = i;
> return MPI_SUCCESS;
> }
> -
> -/*
> - * getprimes
> - *
> - * Function: - get primes smaller than number
> - * - array of primes dynamically allocated
> - * Accepts: - number
> - * - ptr # of primes (returned value)
> - * - ptr array of primes (returned values)
> - * Returns: - 0 or ERROR
> - */
> -static int
> -getprimes(int num, int *pnprime, int **pprimes) {
> -
> - int i, j;
> - int n;
> - int size;
> - int *primes;
> -
> - /* Allocate the array of primes */
> - size = (num / 2) + 1;
> - primes = (int *) malloc((unsigned) size * sizeof(int));
> - if (NULL == primes) {
> - return MPI_ERR_NO_MEM;
> - }
> - *pprimes = primes;
> -
> - /* Find the prime numbers */
> - i = 0;
> - primes[i++] = 2;
> -
> - for (n = 3; n <= num; n += 2) {
> - for (j = 1; j < i; ++j) {
> - if ((n % primes[j]) == 0) {
> - break;
> - }
> - }
> -
> - if (j == i) {
> - if (i >= size) {
> - return MPI_ERR_DIMS;
> - }
> - primes[i++] = n;
> - }
> - }
> -
> - *pnprime = i;
> - return MPI_SUCCESS;
> -}
> --
> 1.8.3.2
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
From 0862ff8ff43105f75562cbef31a4162b894191a5 Mon Sep 17 00:00:00 2001
From: Christoph Niethammer <christoph.nietham...@hlrs.de>
Date: Tue, 11 Feb 2014 15:11:12 +0100
Subject: [PATCH] Omit usage of pre calculated prime numbers and factorize
directly.
---
ompi/mpi/c/dims_create.c | 159 +++++++++++++++--------------------------------
1 file changed, 50 insertions(+), 109 deletions(-)
diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 3d0792f..92eb652 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -5,7 +5,7 @@
* Copyright (c) 2004-2005 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
+ * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart,
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
@@ -20,6 +20,8 @@
#include "ompi_config.h"
+#include <math.h>
+
#include "ompi/mpi/c/bindings.h"
#include "ompi/runtime/params.h"
#include "ompi/communicator/communicator.h"
@@ -36,9 +38,8 @@
static const char FUNC_NAME[] = "MPI_Dims_create";
/* static functions */
-static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims);
-static int getfactors(int num, int nprime, int *primes, int **pcounts);
-static int getprimes(int num, int *pnprime, int **pprimes);
+static int assignnodes(int ndim, int nfactor, int *pfacts,int **pdims);
+static int getfactors(int num, int *nfators, int **factors);
/*
@@ -50,8 +51,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
int i;
int freeprocs;
int freedims;
- int nprimes;
- int *primes;
+ int nfactors;
int *factors;
int *procs;
int *p;
@@ -109,20 +109,14 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
return MPI_SUCCESS;
}
- /* Compute the relevant prime numbers for factoring */
- if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) {
- return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
- FUNC_NAME);
- }
-
/* Factor the number of free processes */
- if (MPI_SUCCESS != (err = getfactors(freeprocs, nprimes, primes, &factors))) {
+ if (MPI_SUCCESS != (err = getfactors(freeprocs, &nfactors, &factors))) {
return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
FUNC_NAME);
}
/* Assign free processes to free dimensions */
- if (MPI_SUCCESS != (err = assignnodes(freedims, nprimes, primes, factors, &procs))) {
+ if (MPI_SUCCESS != (err = assignnodes(freedims, nfactors, factors, &procs))) {
return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
FUNC_NAME);
}
@@ -135,7 +129,6 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
}
}
- free((char *) primes);
free((char *) factors);
free((char *) procs);
@@ -154,12 +147,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
* Accepts: - # of dimensions
* - # of prime factors
* - array of prime factors
- * - array of factor counts
* - ptr to array of dimensions (returned value)
* Returns: - 0 or ERROR
*/
static int
-assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
+assignnodes(int ndim, int nfactor, int *pfacts, int **pdims)
{
int *bins;
int i, j;
@@ -185,17 +177,15 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
/* Loop assigning factors from the highest to the lowest */
for (j = nfactor - 1; j >= 0; --j) {
- f = pfacts[j];
- for (n = counts[j]; n > 0; --n) {
- /* Assign a factor to the smallest bin */
- pmin = bins;
- for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
- if (*p < *pmin) {
- pmin = p;
- }
+ f = pfacts[j];
+ /* Assign a factor to the smallest bin */
+ pmin = bins;
+ for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
+ if (*p < *pmin) {
+ pmin = p;
}
- *pmin *= f;
}
+ *pmin *= f;
}
/* Sort dimensions in decreasing order (O(n^2) for now) */
@@ -217,96 +207,47 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
*
* Function: - factorize a number
* Accepts: - number
- * - # of primes
- * - array of primes
- * - ptr to array of counts (returned value)
- * Returns: - 0 or ERROR
+ * - # prime factors
+ * - array of prime factors
+ * Returns: - MPI_SUCCESS or ERROR
*/
static int
-getfactors(int num, int nprime, int *primes, int **pcounts)
-{
- int *counts;
+getfactors(int num, int *nfactors, int **factors) {
+ int size;
+ int n;
+ int d;
int i;
- int *p;
- int *c;
-
- if (0 >= nprime) {
- return MPI_ERR_INTERN;
- }
+ int sqrtnum;
- /* Allocate the factor counts array */
- counts = (int *) malloc((unsigned) nprime * sizeof(int));
- if (NULL == counts) {
- return MPI_ERR_NO_MEM;
+ if(num < 2) {
+ (*nfactors) = 0;
+ (*factors) = NULL;
+ return MPI_SUCCESS;
}
-
- *pcounts = counts;
-
- /* Loop over the prime numbers */
- i = nprime - 1;
- p = primes + i;
- c = counts + i;
-
- for (; i >= 0; --i, --p, --c) {
- *c = 0;
- while ((num % *p) == 0) {
- ++(*c);
- num /= *p;
+ /* Allocate the array of prime factors which cannot exceed log_2(num) entries */
+ n = num;
+ sqrtnum = ceil(sqrt(num));
+ size = ceil(log(num) / log(2));
+ *factors = (int *) malloc((unsigned) size * sizeof(int));
+
+ i = 0;
+ /* determine all occurences of factor 2 */
+ while((num % 2) == 0) {
+ num /= 2;
+ (*factors)[i++] = 2;
+ }
+ /* determine all occurences of uneven prime numbers up to sqrt(num) */
+ d = 3;
+ for(d = 3; (num > 1) && (d < sqrtnum); d += 2) {
+ while((num % d) == 0) {
+ num /= d;
+ (*factors)[i++] = d;
}
}
-
- if (1 != num) {
- return MPI_ERR_INTERN;
+ /* as we looped only up to sqrt(num) one factor > sqrt(num) may be left over */
+ if(num != 1) {
+ (*factors)[i++] = num;
}
-
+ (*nfactors) = i;
return MPI_SUCCESS;
}
-
-/*
- * getprimes
- *
- * Function: - get primes smaller than number
- * - array of primes dynamically allocated
- * Accepts: - number
- * - ptr # of primes (returned value)
- * - ptr array of primes (returned values)
- * Returns: - 0 or ERROR
- */
-static int
-getprimes(int num, int *pnprime, int **pprimes) {
-
- int i, j;
- int n;
- int size;
- int *primes;
-
- /* Allocate the array of primes */
- size = (num / 2) + 1;
- primes = (int *) malloc((unsigned) size * sizeof(int));
- if (NULL == primes) {
- return MPI_ERR_NO_MEM;
- }
- *pprimes = primes;
-
- /* Find the prime numbers */
- i = 0;
- primes[i++] = 2;
-
- for (n = 3; n <= num; n += 2) {
- for (j = 1; j < i; ++j) {
- if ((n % primes[j]) == 0) {
- break;
- }
- }
-
- if (j == i) {
- if (i >= size) {
- return MPI_ERR_DIMS;
- }
- primes[i++] = n;
- }
- }
-
- *pnprime = i;
- return MPI_SUCCESS;
-}
--
1.8.3.2