+    for (i = 0; i < nb_pixel / 4; i++) {
+
+        int *dx_cur = dxdy + 8 * i;
+        int *dy_cur = dxdy + 8 * i + 4;
+
+        call_horiz(ctx, 1, src_dptr, src_width, src_height, src_pitch,
+                   integ_img, dx_cur, dy_cur, pixel_size);
+
+        call_vert(ctx, 1, src_width, src_height, integ_img, pixel_size);
+
+        call_weight(ctx, 1, src_dptr, src_width, src_height, src_pitch, integ_img,
+                    (float*)s->sum, (float*)s->weight, p, dx_cur, dy_cur, pixel_size);
+    }
+
+    call_average(ctx, 1, src_dptr, src_width, src_height, src_pitch,
+                 (float*)s->sum, (float*)s->weight,
+                 dst_dptr, dst_width, dst_height, dst_pitch, pixel_size);

My immediate thought when seeing that block is "move this all to the CUDA side", but you're calling all those with different block layouts?

I don't understand the algorithm well enough, so I guess this is necessary.

How well does it perform? Every one of those jumps between C and CUDA code adds launch overhead.
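
To put a rough number on it, here is a back-of-the-envelope count of the launches per frame for the quoted loop. It assumes each call_* wrapper issues exactly one kernel launch, which I have not verified; count_launches is a made-up helper, not something from the patch.

```c
/*
 * Rough launch count for the quoted loop.
 * Assumption: call_horiz, call_vert, call_weight and call_average
 * each issue exactly one kernel launch.
 * count_launches is a hypothetical helper, not part of the patch.
 */
static int count_launches(int nb_pixel)
{
    const int launches_per_iteration = 3;   /* horiz + vert + weight */
    const int iterations = nb_pixel / 4;    /* same bound as the loop */

    /* one extra launch for the final call_average */
    return iterations * launches_per_iteration + 1;
}
```

So if nb_pixel were 48, that would be 12 iterations and 37 launches for a single frame, under the assumption above.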


Some other nits:
I'm not a fan of functions just called "init", "uninit" and so on. It's not wrong, given they're static, but it's usually nicer to give all functions a common prefix, "cunlmeans_" or something like that.
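
A minimal sketch of the naming I have in mind. The context struct here is made up for illustration (CUNLMeansSketch is hypothetical, not the filter's real private struct), and the function bodies are placeholders.

```c
/* Hypothetical stand-in for the filter's private context. */
typedef struct CUNLMeansSketch {
    int initialised;
} CUNLMeansSketch;

/* Prefixed instead of a bare "init". */
static int cunlmeans_init(CUNLMeansSketch *s)
{
    s->initialised = 1;
    return 0;
}

/* Prefixed instead of a bare "uninit". */
static void cunlmeans_uninit(CUNLMeansSketch *s)
{
    s->initialised = 0;
}
```

The functions stay static, so nothing changes at link time. The prefix just makes them easier to tell apart from every other filter's "init" when grepping or reading a backtrace.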

What's up with that if (!s->initialised) block in filter_frame? I would have thought it's logically impossible to get that far without init having been called.



Otherwise, the filter looks fine to me.
