We could not use most of the recently submitted OpenMP patches because they were not properly implemented. If you want to correct your patches and resubmit, post your updates here. Ideally you would resubmit your patch against ImageMagick 6.4.1-2 and run 'make check' to ensure that the hundreds of regression tests still pass. In addition, you should submit an image and a few command lines that exercise your OpenMP-enabled method, to prove your parallelization is correctly implemented.
One problem we encountered in the submitted patches is the use of AcquireImagePixels(), GetImagePixels(), and SetImagePixels(). These are not thread safe. See http://www.imagemagick.org/script/architecture.php#threads for details. Even if you use a cache view (e.g. AcquireCacheViewPixels()), which is thread safe, you need to open a cache view for each thread, which most likely negates any benefit of using threads. You could declare the cache view private, but that essentially serializes the outer (y) loop. Instead, the most promising speed-up might come from parallelizing the inner (x) loop:

  for (y=0; y < (long) blur_image->rows; y++)
  {
    p=AcquireImagePixels(edge_image,0,y,edge_image->columns,1,exception);
    q=GetImagePixels(blur_image,0,y,blur_image->columns,1);
    if ((p == (const PixelPacket *) NULL) || (q == (PixelPacket *) NULL))
      break;
    indexes=GetIndexes(image);
    blur_indexes=GetIndexes(blur_image);
    #pragma omp parallel for
    for (x=0; x < (long) blur_image->columns; x++)
    {
      ...
    }
  }

The standard x loop looks something like this:

  for (x=0; x < (long) blur_image->columns; x++)
  {
    q->red=f(p->red);
    q->green=f(p->green);
    q->blue=f(p->blue);
    p++;
    q++;
  }

To ensure a speed-up, you must reference p and q relative to the loop variable rather than advancing shared pointers:

  #pragma omp parallel for
  for (x=0; x < (long) blur_image->columns; x++)
  {
    (q+x)->red=f((p+x)->red);
    (q+x)->green=f((p+x)->green);
    (q+x)->blue=f((p+x)->blue);
  }

or, if you prefer:

  #pragma omp parallel for
  for (x=0; x < (long) blur_image->columns; x++)
  {
    q[x].red=f(p[x].red);
    q[x].green=f(p[x].green);
    q[x].blue=f(p[x].blue);
  }

ImageMagick typically declares local variables at the beginning of a method:

  double alpha;
  ...
  for (y=0; ...
    for (x=0; ...
      alpha=f(p)+f(q);

Rather than declaring these private (e.g. #pragma omp parallel for private(alpha)), just declare them inside the x loop and they will be allocated on the stack of each thread:

  #pragma omp parallel for
  for (x=0; x < (long) blur_image->columns; x++)
  {
    double alpha;

    alpha=f(p[x],q[x]);
    q[x].red=f(alpha);
    q[x].green=f(alpha);
    q[x].blue=f(alpha);
  }

As this discussion suggests, we are far from experts on OpenMP. If you can show that parallelizing the outer (y) loop using cache views can speed up an algorithm, we'd like to see it. Keep in mind that you must close a cache view (i.e. CloseCacheView()) for each cache view you open (i.e. OpenCacheView()). We received a few patches that opened a cache view with no matching close, which is a memory leak.

_______________________________________________
Magick-users mailing list
[email protected]
http://studio.imagemagick.org/mailman/listinfo/magick-users
