Hello,

First, I have to say the idea is smart! I am sure we'll get something
out of it. There have been two optimisations since 0.8.17, and because
of the second one the improvement needs to be measured again.
The second optimisation was very simple: move the "if (g > 2)" out of
the inner loop, and test "if (g <= 2)" once, right after reading the
gradient.

I read your patch. Just one comment: try to use the same type naming
conventions as the surrounding code, for example "Uint32" here.

I applied your idea to the current code. My problem is that it's not
slow enough on my computer to notice anything. I ran a test on an
active 128x128 map. Here is the CPU usage graph (there is nothing above
40%, so it's cropped):

before:
40.0 % |
38.5 % | *
35.0 % | **
33.5 % | **
30.0 % | *******
28.5 % | ************************
25.0 % | *******************************
23.5 % | *****************************
20.0 % | ***
18.5 % | *
15.0 % |
13.5 % |
10.0 % |
 8.5 % |
 5.0 % |
 3.5 % |
 0.0 % |

after:
40.0 % |
38.5 % | *
35.0 % | *
33.5 % | ****
30.0 % | ****
28.5 % | ***************
25.0 % | ***************************
23.5 % | ***************************************
20.0 % | *****
18.5 % | *
15.0 % |
13.5 % |
10.0 % |
 8.5 % |
 5.0 % |
 3.5 % |
 0.0 % |

Maybe it's 2.5 points of CPU usage faster. Relative to the 25% peak,
that's 10% faster.

On a 256x256 map, I can't see any change (again). Maybe my test case is
not active enough. But, to me, that proves that the resource gradients
are more than fast enough. I tried a 512x512 map too, without any
problem.

I will try to find a good 256x256 test case, tune your optimisation,
and then commit it. So, thank you for the idea!

Starting from your idea, I tried another one: depending on the
direction from which a gradient cell receives its update, the update
only needs to be propagated in 3 or 5 directions, which is only 4 on
average. But I couldn't see any improvement.

Nuage

simon schuler wrote:
> Hi
>
> [EMAIL PROTECTED] wrote:
>
>> Very interested, but could you also try to first compute using your
>> algorithm then using the old one?
>> Indeed, there is a high probability the
>> speed difference is partially due to the gradient already being cached by
>> updateGlobalGradientSmallOld (thus the slowness) when
>> updateGlobalGradientSmall is called (thus the fastness).
>>
>
> I tested it both ways and it's almost the same result.
> I've made a patch against 0.8.17. It's attached.
> note: i have only compile-tested it with 0.8.17 because I can't compile
> the whole release. But on 0.8.15 it worked perfectly.
> I hope it doesn't interfere with nuage's new optimization...
>
> Simon
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> glob2-devel mailing list
> [email protected]
> http://lists.nongnu.org/mailman/listinfo/glob2-devel
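
For reference, here is a minimal sketch of the second optimisation
described above (testing the gradient value once per cell instead of
once per neighbour). It is not the actual map code: GradientMap,
relaxNeighbour and the two propagate functions are hypothetical names,
the map is bounded instead of wrapping, and the value convention
(0 = obstacle, 1 = no information) is an assumption made for the
example. Uint8 and Uint32 are declared locally as stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Local stand-ins for the SDL-style type names (assumption for this sketch).
typedef std::uint8_t Uint8;
typedef std::uint32_t Uint32;

// Hypothetical map: 0 = obstacle, 1 = no information yet,
// larger values = closer to a resource.
struct GradientMap
{
	Uint32 size;
	std::vector<Uint8> cells;

	explicit GradientMap(Uint32 n) : size(n), cells(n * n, 1) {}
	Uint8 &at(Uint32 x, Uint32 y) { return cells[y * size + x]; }
};

static const int deltas[8][2] = {
	{-1, -1}, {0, -1}, {1, -1}, {-1, 0}, {1, 0}, {-1, 1}, {0, 1}, {1, 1}};

// Raise neighbour d of cell pos to g - 1 if that improves it, then queue it.
static void relaxNeighbour(GradientMap &map, std::deque<Uint32> &queue,
	Uint32 pos, int d, Uint8 g)
{
	int nx = int(pos % map.size) + deltas[d][0];
	int ny = int(pos / map.size) + deltas[d][1];
	if (nx < 0 || ny < 0 || nx >= int(map.size) || ny >= int(map.size))
		return; // no wrap-around in this sketch
	Uint8 &n = map.at(Uint32(nx), Uint32(ny));
	if (n != 0 && n < g - 1)
	{
		n = Uint8(g - 1);
		queue.push_back(Uint32(ny) * map.size + Uint32(nx));
	}
}

// Before: the test on g sits inside the neighbour loop.
void propagateTestInside(GradientMap &map, std::deque<Uint32> queue)
{
	while (!queue.empty())
	{
		Uint32 pos = queue.front();
		queue.pop_front();
		assert(pos < map.cells.size());
		Uint8 g = map.cells[pos];
		for (int d = 0; d < 8; d++)
			if (g > 2) // loop-invariant, yet evaluated for every neighbour
				relaxNeighbour(map, queue, pos, d, g);
	}
}

// After: test once, right after reading the gradient.
void propagateTestHoisted(GradientMap &map, std::deque<Uint32> queue)
{
	while (!queue.empty())
	{
		Uint32 pos = queue.front();
		queue.pop_front();
		assert(pos < map.cells.size());
		Uint8 g = map.cells[pos];
		if (g <= 2)
			continue; // neighbours could only receive the baseline value
		for (int d = 0; d < 8; d++)
			relaxNeighbour(map, queue, pos, d, g);
	}
}
```

Both functions should produce the same gradient; the second one only
skips the neighbour loop earlier. An optimising compiler may already
hoist such a test by itself, which could be one reason why the gain is
hard to measure.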
