We could not use most of the OpenMP patches recently submitted because
they were not properly implemented.  If you want to correct your patches
and resubmit, post your updates here.  Ideally you would resubmit your
patch against ImageMagick 6.4.1-2 and type 'make check' to ensure that the
hundreds of regression tests pass.  In addition you should submit an image
and a few command lines that exercise your OpenMP-ed method to prove your
parallelization is correctly implemented.

One problem we encountered in the submitted patches is the use of
AcquireImagePixels(), GetImagePixels(), and SetImagePixels().  These are not
thread safe.  See http://www.imagemagick.org/script/architecture.php#threads
for details.  Even if you use a cache view (e.g. AcquireCacheViewPixels()),
which is thread safe, you need to open a cache view for each thread, which
most likely negates any benefit of using threads.  You could declare the
cache view private, but that essentially serializes the outer (y) loop.
Instead, the most promising speed-up might come from parallelizing the inner
(x) loop:

  for (y=0; y < (long) blur_image->rows; y++)
  {
    p=AcquireImagePixels(edge_image,0,y,edge_image->columns,1,exception);
    q=GetImagePixels(blur_image,0,y,blur_image->columns,1);
    if ((p == (const PixelPacket *) NULL) || (q == (PixelPacket *) NULL))
      break;
    indexes=GetIndexes(image);
    blur_indexes=GetIndexes(blur_image);
    #pragma omp parallel for
    for (x=0; x < (long) blur_image->columns; x++)
    {

The standard x loop looks something like this:

    for (x=0; x < (long) blur_image->columns; x++)
    {
      q->red=f(p->red);
      q->green=f(p->green);
      q->blue=f(p->blue);
      p++;
      q++;
    }

Under OpenMP the loop iterations run concurrently and in no particular
order, so the shared p++ and q++ increments above become a data race.  To
parallelize safely (and ensure a speed-up), you must reference p and q
relative to the loop variable:

    #pragma omp parallel for
    for (x=0; x < (long) blur_image->columns; x++)
    {
      (q+x)->red=f((p+x)->red);
      (q+x)->green=f((p+x)->green);
      (q+x)->blue=f((p+x)->blue);
    }

or

    #pragma omp parallel for
    for (x=0; x < (long) blur_image->columns; x++)
    {
      q[x].red=f(p[x].red);
      q[x].green=f(p[x].green);
      q[x].blue=f(p[x].blue);
    }

if you prefer.

ImageMagick typically declares local variables at the beginning of a method:

  double alpha;
  ...
  for (y=0; ...
    for (x=0; ...
      alpha=f(p)+f(q);

Rather than declaring these private (e.g. #pragma omp parallel for
private(alpha)), simply declare them inside the x loop and they will be
allocated on each thread's stack:

    #pragma omp parallel for
    for (x=0; x < (long) blur_image->columns; x++)
    {
      double
        alpha;

      alpha=f(p[x],q[x]);
      q[x].red=f(alpha);
      q[x].green=f(alpha);
      q[x].blue=f(alpha);
    }

As this discussion suggests, we are far from experts on OpenMP.  If you can
show that parallelizing the outer (y) loop using cache views can speed
up an algorithm, we'd like to see it.  Keep in mind that you must close each
cache view (i.e. CloseCacheView()) that you open (i.e. OpenCacheView()).
A few of the submitted patches opened a cache view with no matching close,
which is a memory leak.
_______________________________________________
Magick-users mailing list
[email protected]
http://studio.imagemagick.org/mailman/listinfo/magick-users