This is a long message, so let me start with the punchline:  *I have a 
lot of CUDA code that harnesses a user's GPU to accelerate very tedious 
image processing operations, potentially by 200x.  I am ready to 
donate this code to the GIMP project.  This code can be run on Windows 
or Linux, and probably Mac, too.*  *It only works on NVIDIA cards, but 
can detect at runtime whether the user has acceptable hardware, and 
disables itself if not.*

Hi all, I'm new here.  I work on real-time image processing applications 
that must run at 60-240Hz, which is typically too fast for doing things 
like convolutions on large images.  However, the new fad is to use CUDA 
to harness the parallel computing power of your graphics card to do 
computations, instead of just rendering graphics.  The speed ups are 

For instance, I implemented a basic convolution algorithm (blurring), 
which operates on a 4096x4096 image with a 15x15 kernel/PSF.  On my CPU 
it took *27 seconds* (AMD Athlon X3 440).  When running the identical 
algorithm in CUDA, it finishes in *0.1 to 0.25 seconds*, a speedup of 
roughly 110x to 250x (NVIDIA GTX 460).  Which end of that range you 
land on depends on whether the image already resides in GPU device 
memory, or needs to be copied in and out on each operation.
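
To make the workload concrete: a 4096x4096 image with a 15x15 kernel is 
about 3.8 billion multiply-adds (4096 * 4096 * 225), and every output 
pixel can be computed independently, which is why it maps so well onto 
a GPU.  Below is a minimal CPU sketch of that per-pixel work in C++.  It 
is not my CUDA code, the name convolve2d_zero is made up for 
illustration, and it assumes a row-major float image with 0s outside 
the image edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference 2D convolution on a row-major grayscale image.
// Pixels outside the image are treated as 0.
// `convolve2d_zero` is a hypothetical name used only for this sketch.
std::vector<float> convolve2d_zero(const std::vector<float>& img,
                                   int rows, int cols,
                                   const std::vector<float>& psf,
                                   int k)  // psf is k x k, k must be odd
{
    assert(k % 2 == 1);
    const int r = k / 2;
    std::vector<float> out(static_cast<std::size_t>(rows) * cols, 0.0f);

    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            // Each (y, x) is independent of every other output pixel.
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    const int sy = y - ky;  // true convolution flips the kernel
                    const int sx = x - kx;
                    if (sy < 0 || sy >= rows || sx < 0 || sx >= cols)
                        continue;           // zero outside the image
                    acc += img[sy * cols + sx] * psf[(ky + r) * k + (kx + r)];
                }
            }
            out[y * cols + x] = acc;
        }
    }
    return out;
}
```

In a GPU version, the two outer loops are what get spread across 
threads; the inner loops stay essentially the same.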

Any operation that resembles convolution, such as edge detection, 
blurring, or morphology, is highly parallelizable and ideal for GPU 
acceleration.  *I have a lot of this code already written for grayscale 
images, and it can be donated to the GIMP project.*  I would be 
interested in expanding the code to work on color images (though I 
suspect that simply running it three times, once per channel, would 
not be ideal), and I don't think it will be that hard to integrate into 
the existing GIMP project (only a couple of extra libraries need to be 
added for a user's computer to benefit from it).

Additionally, CUDA comes with convenient functions for determining 
whether a user has a CUDA-enabled GPU, so the code can fall back to 
regular CPU operations if they don't have one.  It can determine how 
many cards they have, select the fastest one, and adjust the function 
calls to accommodate older GPUs.  Therefore, I believe the code can safely 
be integrated and dynamically enable itself only if it can be used.

My solution works for any image size (within the limit of GPU memory), 
but the kernel/PSF size must be odd and no larger than 25x25.  That's 
not to say larger kernel sizes can't be done in CUDA, but my solution is 
"elegant" only up to that size, due to having a limited amount of 
shared memory.  I believe it will still work up to a 61x61 kernel but 
with substantial slowdown (though, probably still much faster than 
CPU).  Beyond that, I believe a different algorithm is needed.
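
The size rules above could be encoded as a small check that an 
integration layer runs before choosing a code path.  This is only a 
sketch: the names KernelPath, classify_kernel and kernel_radius are 
hypothetical, and the 61x61 bound is my estimate rather than something I 
have measured.

```cpp
#include <cassert>

// Which path a k x k kernel would take, per the limits described above.
enum class KernelPath { Fast, Slow, Unsupported };

// Hypothetical helper: odd sizes up to 25 use the shared-memory path,
// odd sizes up to 61 are expected to work but more slowly, and anything
// else (even sizes, or larger than 61) needs a different algorithm.
KernelPath classify_kernel(int k) {
    if (k < 1 || k % 2 == 0) return KernelPath::Unsupported;
    if (k <= 25) return KernelPath::Fast;
    if (k <= 61) return KernelPath::Slow;
    return KernelPath::Unsupported;
}

// Half-width of an odd kernel, e.g. 12 for a 25x25 kernel.
int kernel_radius(int k) {
    assert(k % 2 == 1);
    return k / 2;
}
```

A caller would fall back to the CPU implementation whenever this 
returns Unsupported.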

I have implemented basic convolution (which assumes 0s outside the edge 
of the image), bilateral filter (which is blurring without destroying 
edges), and most of the basic binary morphological operations 
(kernel-based erode, dilate, opening, closing).  I believe it would be 
possible to develop a morphology plugin that lets you start with a 
binary image, click buttons for erode, dilate, opening, etc., and 
have it respond immediately.  This would allow someone to start with an 
image, and try lots of different combinations of morphological 
operations to determine if their problem can be solved with morphology 
(which usually requires a long and complex sequence of morph ops).
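
For anyone unfamiliar with the morphology operations, here is a minimal 
CPU sketch in C++ of what the four of them compute on a binary image.  
It is not my CUDA implementation: the names Mask, morph, erode, dilate, 
opening and closing are chosen just for this example, and treating 
pixels outside the image as 0 is an assumption carried over from the 
convolution case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>;  // row-major, each entry 0 or 1

// Shared loop for erode and dilate with a k x k structuring element `se`.
// Erode:  output is 1 only if every "on" element of se lands on a 1 pixel.
// Dilate: output is 1 if any "on" element of the reflected se lands on a 1.
// Assumption: pixels outside the image count as 0.
Mask morph(const Mask& img, int rows, int cols,
           const Mask& se, int k, bool is_erode)
{
    assert(k % 2 == 1);
    const int r = k / 2;
    const int sgn = is_erode ? 1 : -1;  // dilation uses the reflected element
    Mask out(img.size(), 0);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            bool all = true, any = false;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    if (!se[(ky + r) * k + (kx + r)]) continue;
                    const int sy = y + sgn * ky;
                    const int sx = x + sgn * kx;
                    const bool inside =
                        sy >= 0 && sy < rows && sx >= 0 && sx < cols;
                    const bool on = inside && img[sy * cols + sx] != 0;
                    all = all && on;
                    any = any || on;
                }
            }
            out[y * cols + x] = (is_erode ? all : any) ? 1 : 0;
        }
    }
    return out;
}

Mask erode(const Mask& m, int rows, int cols, const Mask& se, int k) {
    return morph(m, rows, cols, se, k, true);
}
Mask dilate(const Mask& m, int rows, int cols, const Mask& se, int k) {
    return morph(m, rows, cols, se, k, false);
}
// Opening removes features smaller than se; closing fills small gaps.
Mask opening(const Mask& m, int rows, int cols, const Mask& se, int k) {
    return dilate(erode(m, rows, cols, se, k), rows, cols, se, k);
}
Mask closing(const Mask& m, int rows, int cols, const Mask& se, int k) {
    return erode(dilate(m, rows, cols, se, k), rows, cols, se, k);
}
```

Chaining these calls is the "sequence of morph ops" I mean; each one is 
a whole-image pass, which is why doing them interactively on the CPU is 
slow for large images.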

Unfortunately, I don't have much time to become a GIMP developer, but I 
feel like I can still contribute.  I will happily develop the algorithms 
to be run on the GPU, as they will probably benefit my job, too (I'm 
open to suggestions for functions that operate on the whole image but 
treat each pixel independently).  And I can help with linking to the 
CUDA libraries, which NVIDIA claims can be done quickly by someone with 
no CUDA experience.

Please let me know if anyone is interested in working with me on this.

Gimp-developer mailing list
