On 03.03.2005, at 20:09, GSR - FR wrote:
With your idea, calculating the full 3000*3000 with a depth of 3 is like calculating 9000*9000 (81 million pixels in RGB, 243*10^6 bytes plus overhead), and in time it should take 9 times as long as the 3000*3000 non-adaptive version, plus the scale operation. To avoid absurd memory usage, the code will have to be more complex than just rendering big and then scaling down. It could sample multiple planes and average them (9 stacked tiles, each with a small offset for the gradient sampling).
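As a sanity check on the figures above (assuming 8-bit RGB, i.e. 3 bytes per pixel and no alpha or tile overhead), the arithmetic works out like this:

```python
# 3000*3000 output at supersampling depth 3 -> 9000*9000 intermediate image.
side = 3000 * 3                 # 9000 pixels per side
pixels = side * side            # sample count of the big intermediate
bytes_rgb = pixels * 3          # 3 bytes per 8-bit RGB pixel

print(pixels)                   # 81 million pixels
print(bytes_rgb)                # 243 * 10^6 bytes, plus overhead
```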
Huh? In my eyes the code would be *very* simple. Instead of allocating an image 9 times the size of the sampled version, I'd rather allocate the sample tiles for the work area per thread. So, to stay with this example, rendering 1 tile needs the memory for the 1 permanent tile plus 9 temporary ones that are reused after supersampling.
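The per-tile scheme described above can be sketched as follows. This is only an illustration, not GIMP code: `render` is a hypothetical callback that evaluates the gradient at fractional pixel coordinates, and the tile size and factor are made up. The point is that memory stays bounded by one tile's worth of samples, never a full-size supersampled image.

```python
def render_tile_supersampled(render, x0, y0, tile=4, factor=3):
    """Render one tile*tile output tile, averaging factor*factor
    sub-samples per pixel (hypothetical sketch of the idea)."""
    out = []
    for py in range(tile):
        row = []
        for px in range(tile):
            acc = 0.0
            # Sample the centre of each sub-pixel and average.
            for sy in range(factor):
                for sx in range(factor):
                    x = x0 + px + (sx + 0.5) / factor
                    y = y0 + py + (sy + 0.5) / factor
                    acc += render(x, y)
            row.append(acc / (factor * factor))
        out.append(row)
    return out

# For a linear black-to-white gradient the supersampled average lands
# exactly on the pixel centre, as expected.
tile = render_tile_supersampled(lambda x, y: x, 0.0, 0.0, tile=2, factor=3)
print(tile[0])  # [0.5, 1.5]
```

Since each tile depends only on its own coordinates, independent tiles can be handed to separate threads, which is the per-thread allocation mentioned above.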
The current adaptive code is not parallel, but the algorithm, at the logic level, is parallelizable in tiles, or in groups of tiles so as not to waste so much work at the edges.
I don't see how, but that would be a good start.
So I did some rough tests: 2000*2000 with adaptive vs 6000*6000 without adaptive (9000*9000 was too much for my computer, so I tried 2000 and 6000: the same 1:3 ratio, and still big). Small with adaptive was 10.3 sec and big without adaptive was 9.6 sec, for a linear black-to-white gradient from one corner to another or from side to side.
Your idea does not seem to be always faster: it does not approach the magical 10x "order of magnitude" in many cases, only about 3x in extreme ones, and it is a big memory hog if done naively. The only cases in which it is faster are when adaptive has to calculate all the samples anyway, so the test overhead is a complete waste.
Apart from your machine obviously being completely different from mine, your comparison is neither fair nor even close to correct. Although memory bandwidth is plentiful and caches are big nowadays, an approach using an order of magnitude more memory in some non-efficient way will lose hands down against a less efficient algorithm on a much smaller work area.
It avoids checks. When you want oversampling, the adaptive version is faster than full sampling in many cases; otherwise it would have been silly to design and code it in the first place.
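The adaptive idea being defended here is that flat regions are detected cheaply and only high-contrast regions get subdivided. A minimal one-dimensional sketch, with a made-up threshold and recursion depth (not the actual gradient code):

```python
def adaptive_sample(f, a, b, threshold=0.05, depth=3):
    """Average f over [a, b], subdividing only where the endpoint
    samples disagree (hypothetical illustration of adaptivity)."""
    fa, fb = f(a), f(b)
    if depth == 0 or abs(fa - fb) <= threshold:
        return (fa + fb) / 2.0          # flat enough: cheap average
    m = (a + b) / 2.0
    # The samples disagree: refine both halves recursively.
    left = adaptive_sample(f, a, m, threshold, depth - 1)
    right = adaptive_sample(f, m, b, threshold, depth - 1)
    return (left + right) / 2.0

# A flat region costs only two evaluations, regardless of depth;
# that is where adaptive beats full supersampling.
print(adaptive_sample(lambda x: 1.0, 0.0, 1.0))
```

Full supersampling would spend the same fixed cost everywhere; adaptive trades a per-region comparison for skipping most of that work in flat areas, which is exactly why the comparison test is pure overhead only when the whole image is high-contrast.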
So please, apples to apples and oranges to oranges.
If I weren't so short of time I would simply remove gobs of crufty, incomprehensible code and reuse the current code for a parallel full-supersampling approach, simply to prove you wrong. On top of this it should be pretty simple to re-add the adaptiveness for another large speed gain.