+    for (i = 0; i < nb_pixel / 4; i++) {
+
+        int *dx_cur = dxdy + 8 * i;
+        int *dy_cur = dxdy + 8 * i + 4;
+
+        call_horiz(ctx, 1, src_dptr, src_width, src_height, src_pitch,
+                   integ_img, dx_cur, dy_cur, pixel_size);
+
+        call_vert(ctx, 1, src_width, src_height, integ_img, pixel_size);
+
+        call_weight(ctx, 1, src_dptr, src_width, src_height, src_pitch, integ_img, (float*)s->sum, (float*)s->weight, p, dx_cur, dy_cur, pixel_size);
+    }
+
+    call_average(ctx, 1, src_dptr, src_width, src_height, src_pitch, (float*)s->sum, (float*)s->weight,
+                 dst_dptr, dst_width, dst_height, dst_pitch, pixel_size);
My immediate thought when seeing that block is "move this all to the CUDA side", but you're calling all those with different block layouts?
I don't understand the algorithm well enough, so I guess this is necessary.

How well does it perform? All those jumps between C and CUDA code come with overhead.
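
To put a rough number on that overhead, here is a back-of-the-envelope count of host-side launches per frame for the quoted loop. It is only a sketch: launches_per_frame is a hypothetical helper, not something in the patch, and it assumes each call_* wrapper issues exactly one kernel launch, which I have not verified.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch. Assumes every call_*
 * wrapper in the quoted loop issues exactly one kernel launch. */
size_t launches_per_frame(size_t nb_pixel)
{
    /* The loop runs nb_pixel / 4 times, i.e. four offsets per pass. */
    size_t passes = nb_pixel / 4;

    /* Three launches per pass (horiz, vert, weight), then one final
     * launch for the averaging step after the loop. */
    return 3 * passes + 1;
}
```

With nb_pixel = 48, for example, that comes to 12 passes and 37 launches per frame under this assumption, so the per-launch cost is multiplied accordingly.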
Some other nits:

I'm not a fan of functions just called "init", "uninit" and so on. It's not wrong, given they're static, but it's usually nicer to give all functions a common prefix. "cunlmeans_" or something like that.
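
A minimal sketch of what the prefixed naming could look like. DemoContext and its fields are made up for illustration and do not match the real context struct in the patch; in the filter itself the functions would stay static.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the filter's private context; the real
 * struct in the patch has different fields. */
typedef struct DemoContext {
    float *sum;
    int    ready;
} DemoContext;

/* Same role as a bare "init", but the common prefix makes the function
 * easy to find in backtraces and grep output. */
int cunlmeans_init(DemoContext *s, size_t n)
{
    s->sum = calloc(n, sizeof(*s->sum));
    if (!s->sum)
        return -1;
    s->ready = 1;
    return 0;
}

/* Counterpart to cunlmeans_init: releases the buffer and resets state. */
void cunlmeans_uninit(DemoContext *s)
{
    free(s->sum);
    s->sum   = NULL;
    s->ready = 0;
}
```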
What's up with that if(!s->initialised) block in filter_frame? I would have thought it's logically impossible that it gets that far without init being called?
Otherwise, the filter looks fine to me.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".