Yet another patch.

This one is a case of "add code to gain speed".

By adding a special case for "the kernel being tested
is within the borders of the image", you gain speed by not
performing a conditional inside the loop, which allows
gcc 4.1's autovec engine to turn your loop into a series
of SSE2 instructions.

Get all that? :)
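
In case not, here is a minimal sketch of the idea on a single image row.
It is not the gabor.c code: the function names, MAX_KSIZE, and the plain
left-to-right indexing are assumptions made up for illustration. The
interior case drops the per-tap bounds test and splits the work into a
multiply loop and a sum loop; the edge case falls back to the checked
loop. Whether a given compiler actually vectorizes the interior loops
depends on its version and flags (for gcc, something like
-O2 -ftree-vectorize -msse2).

```c
#include <stddef.h>

/* Hypothetical upper bound on the kernel length, for this sketch only. */
#define MAX_KSIZE 64

/* Reference version: one bounds test per kernel tap. */
double conv_at_checked(const double *row, int width,
                       const double *kernel, int ksize, int x)
{
    int half = ksize / 2;
    double sum = 0.0;
    for (int k = 0; k < ksize; k++) {
        int idx = x - half + k;          /* image column under tap k */
        if (idx >= 0 && idx < width)     /* skip taps that fall off the row */
            sum += kernel[k] * row[idx];
    }
    return sum;
}

/* Split version: test once, outside the loop, whether the whole kernel
 * lies inside the row. If so, run two branch-free loops. */
double conv_at_split(const double *row, int width,
                     const double *kernel, int ksize, int x)
{
    int half = ksize / 2;
    if (ksize <= MAX_KSIZE && x >= half && x + half < width) {
        double tmp[MAX_KSIZE];
        const double *src = row + (x - half);
        double sum = 0.0;
        /* Element-wise multiply: no conditional, no loop-carried
         * dependency, so it is a candidate for the vectorizer. */
        for (int k = 0; k < ksize; k++)
            tmp[k] = kernel[k] * src[k];
        /* Separate accumulation pass over the products. */
        for (int k = 0; k < ksize; k++)
            sum += tmp[k];
        return sum;
    }
    /* Kernel touches an edge: keep the per-tap test. */
    return conv_at_checked(row, width, kernel, ksize, x);
}
```

Both functions add the products in the same order, so they should return
identical results for every x; only the interior positions take the
branch-free path.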

Anyways, this brings the run time down to ~1.3 on my machine.

The 70-* and 80-* patches have equivalents for convolutions
two and four; I'm just not getting them quite right yet...

I'm also attaching my testing program, so that others can
check my work.

Julia Longtin <[EMAIL PROTECTED]>
--- ../../dev2/gift/ChangeLog   2006-08-16 20:33:10.000000000 +0000
+++ ChangeLog   2006-08-16 20:48:08.000000000 +0000
@@ -1,3 +1,9 @@
+2006-08-16    <[EMAIL PROTECTED]>
+
+       * FeatureExtraction/gabor.c
+       move one if to the outside of a for loop, for conditions where the kernal dosent touch the edge of the frame.
+       break the for loop for said conditions into two pieces, so that gcc4.1's autovec engine can SSE2 translate.
+
 2006-08-14    <[EMAIL PROTECTED]>
 
        * FeatureExtraction/gabor.c
--- ../../dev2/gift/FeatureExtraction/gabor.c   2006-08-16 20:33:16.000000000 +0000
+++ FeatureExtraction/gabor.c   2006-08-16 20:39:16.000000000 +0000
@@ -89,6 +89,7 @@
        double conv[65536]; /* take advantage of our fixed image size. 65536 == width*height */
        double * target_conv;
        double * target_image;
+        double temparray[kernal_size[2]];
 
        for (i = 0; i < width*height; i++)
          {
@@ -101,12 +102,22 @@
        for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
                target_image=&image[(width*height)-(y*width+x+kernal_size[filter_scale]/2)];
+               if ((x>=kernal_size[filter_scale]/2) && ((x+kernal_size[filter_scale]/2)<width))
+                 {
+                   for (k = 0; k < kernal_size[filter_scale]; k++)
+                     temparray[k]= target_kernal[k]*target_image[k];
+                   for (k = 0; k < kernal_size[filter_scale]; k++)
+                     conv[y*width+x] += temparray[k];
+                 }
+                else
+                 {
                for (k=0; k < kernal_size[filter_scale]; k++) {
                        if ((x+kernal_size[filter_scale]/2 >= k) && (x+kernal_size[filter_scale]/2 < width+k)) {
                                conv[y*width + x] +=
                                        target_kernal[k]*target_image[k];
                        }
                }
+                 }
        }
        }
 
@@ -130,12 +141,22 @@
        for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
                target_image=&image[(width*height)-(y*width+x+kernal_size[filter_scale]/2)];
+               if ((x>=kernal_size[filter_scale]/2) && ((x+kernal_size[filter_scale]/2)<width))
+                 {
+                   for (k = 0; k < kernal_size[filter_scale]; k++)
+                     temparray[k]= target_kernal[k]*target_image[k];
+                   for (k = 0; k < kernal_size[filter_scale]; k++)
+                     conv[y*width+x] += temparray[k];
+                 }
+                else
+                 {
                for (k=0; k < kernal_size[filter_scale]; k++) {
                        if ((x+kernal_size[filter_scale]/2 >= k) && (x+kernal_size[filter_scale]/2 < width+k)) {
                                conv[y*width + x] +=
                                        target_kernal[k]*target_image[k];
                        }
                }
+                 }
        }
        }
 
_______________________________________________
help-GIFT mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-gift