2019-04-09 4:54 GMT+02:00, Song, Ruiling <ruiling.s...@intel.com>:
>> > +kernel void vert_sum(__global uint4 *ii,
>> > +                     int width,
>> > +                     int height)
>> > +{
>> > +    int x = get_global_id(0);
>> > +    uint4 sum = 0;
>> > +    for (int i = 0; i < height; i++) {
>> > +        ii[i * width + x] += sum;
>> > +        sum = ii[i * width + x];
>>
>> This looks like it might be able to overflow in extreme cases?
>>
>> 3840 * 2160 * (1 - 0)^2 * 255 * 255 = 539,343,360,000, which is a
>> long way out of range for a 32-bit int.  That requires impossible
>> input (all pixels differing by the most extreme value), but
>> something like a chequerboard might be of the same order?

> Yes, this is a dilemma for me.  The filter is generally expensive to
> compute.  To fix the overflow, we would have to use 64-bit integers
> for the integral image, and I think most GPUs are not good at 64-bit
> integer calculation.  Maybe we can try that later; for now I would
> prefer to stay with 32-bit integers.
Can the overflow be detected at runtime?
Could the user choose between 32- and 64-bit calculation?

Carl Eugen
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
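A host-side check along the lines Carl Eugen suggests could be done up front from the frame dimensions alone, since the worst case is known before the kernel runs. The helper below is a hypothetical sketch (the name `integral_needs_64bit` and the assumption of 8-bit samples are not from the patch); the caller would use its result to select a 32-bit or 64-bit variant of the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side check: can the 32-bit integral image
 * overflow for this frame size?  Assumes 8-bit samples, so the
 * maximum squared difference per pixel is 255 * 255.  If this
 * returns true, the caller would dispatch a 64-bit (ulong-based)
 * variant of the kernel instead of the uint4 one. */
static bool integral_needs_64bit(int width, int height)
{
    uint64_t worst = (uint64_t)width * (uint64_t)height * 255 * 255;
    return worst > UINT32_MAX;
}
```

For example, `integral_needs_64bit(3840, 2160)` is true, while a small 256x256 frame stays within 32-bit range even in the worst case. Note this is conservative: it flags any frame that *could* overflow, not frames whose actual content does, so detecting overflow from real pixel data would still require extra work inside the kernel.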