On Thursday, 28 April 2016 at 06:22:18 UTC, Relja Ljubobratovic wrote:

Can you share with us some of your experience working on the image and video processing modules in the app, such as the filters here:
http://www.infognition.com/VideoEnhancer/filters.html

If I may ask, was that part implemented in D, C++, or was some 3rd party library used?

Thanks!

The filters listed there are third-party plugins originally created for VirtualDub ( http://virtualdub.org/ ) by different people, in C++. We made just 2-3 of them, like the motion-based temporal denoiser (Film Dirt Cleaner) and the Intelligent Brightness filter for automatic brightness/contrast correction. Our most interesting and distinctive piece of tech is our Super Resolution engine for video upsizing; it's not in that list, it's built into the app (and also available separately as plugins for some other hosts).

All this image processing is written in C++ and works directly with raw image bytes, no special libraries involved. When video processing starts, our filters usually launch a bunch of worker threads, and these threads work in parallel, each on its own part of the video frame (usually divided into horizontal stripes). Inside, they often work block-wise, and we have a bunch of template classes for different blocks (RGB or monochrome) parameterized by pixel data type and often by block size, so the size is often known at compile time and the compiler can unroll the loops properly. When doing motion search we use our vector class parameterized by precision, so we have vectors of different precisions (low-res pixel, high-res pixel, half-pixel, quarter-pixel etc.), and the type system makes sure I don't add or mix vectors of different precision and don't pass a half-pixel-precise vector to a block reading routine that expects quarter-pixel-precise coordinates. Where it makes sense and is possible, we use SIMD classes like F32vec4 and/or SIMD intrinsics for pixel operations.
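To make that concrete, here is a minimal C++ sketch of both ideas. All names here (Block, MVec, Precision, readBlock) are hypothetical illustrations, not our actual classes: a pixel block with a compile-time size the compiler can unroll, and a motion vector whose precision lives in its type so mixing precisions is a compile error.

    #include <cstdint>

    // Block parameterized by pixel type and size; with N known at
    // compile time, the compiler can fully unroll these loops.
    template <typename Pixel, int N>
    struct Block {
        Pixel data[N][N];

        int sad(const Block& other) const {  // sum of absolute differences
            int sum = 0;
            for (int y = 0; y < N; ++y)
                for (int x = 0; x < N; ++x)
                    sum += (data[y][x] > other.data[y][x])
                             ? data[y][x] - other.data[y][x]
                             : other.data[y][x] - data[y][x];
            return sum;
        }
    };

    // Precision is part of the vector's type: adding vectors of
    // different precision, or passing a half-pel vector where
    // quarter-pel coordinates are expected, fails to compile.
    enum class Precision { LowResPel, Pel, HalfPel, QuarterPel };

    template <Precision P>
    struct MVec {
        int x, y;
        MVec operator+(MVec o) const { return {x + o.x, y + o.y}; }
    };

    // Conversions between precisions are explicit and intentional:
    inline MVec<Precision::QuarterPel> toQuarterPel(MVec<Precision::HalfPel> v) {
        return {v.x * 2, v.y * 2};
    }

    // A block-reading routine can then state exactly what it expects:
    template <typename Pixel, int N>
    Block<Pixel, N> readBlock(const Pixel* frame, int stride,
                              MVec<Precision::QuarterPel> at);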

Video Enhancer allows chaining several VD filters and our SR rescaler instances into a pipeline, and this pipeline is also parallelized: when the first filter finishes with frame X, it can immediately start working on frame X+1 while the next filter is still working on frame X. Previously this was organized as a chain of DirectShow filters with a special Parallelizer filter inserted between the video processing ones; this Parallelizer had a frame queue inside and separate receiving and sending threads, allowing the connected filters to work in parallel.

In version 2 it's trickier, since we need to be able to seek to different positions in the video, and some filters may request a few frames before and after the current one, so a sequential pipeline doesn't suffice anymore. Now we build a virtual chain inside one big DirectShow filter; each node in that chain has its own worker thread, and the nodes communicate via message passing. In the end we have one big DirectShow filter, about 11K lines of C++, that does Super Resolution resizing, invokes VirtualDub plugins (imitating VirtualDub for them), performs colorspace conversions where necessary, and organizes it all into a pipeline that is pull-based inside but behaves as a push-based DirectShow filter outside.
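For illustration, here is a minimal sketch of the queue-between-stages idea in C++ (hypothetical names, not the real filter code). It shows only the strictly sequential case, like the old Parallelizer: each stage owns a thread and a bounded queue, so stage N can start on frame X+1 while stage N+1 still works on frame X. Seeking and multi-frame lookahead, which forced the message-passing design in version 2, can't be expressed with plain queues like these.

    #include <condition_variable>
    #include <cstdint>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Frame { std::vector<std::uint8_t> pixels; std::int64_t number; };

    class FrameQueue {  // bounded queue between two stages
        std::queue<Frame> q_;
        std::mutex m_;
        std::condition_variable cv_;
        size_t cap_;
    public:
        explicit FrameQueue(size_t cap) : cap_(cap) {}
        void push(Frame f) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return q_.size() < cap_; });
            q_.push(std::move(f));
            cv_.notify_all();
        }
        Frame pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty(); });
            Frame f = std::move(q_.front());
            q_.pop();
            cv_.notify_all();
            return f;
        }
    };

    // One pipeline stage: pull a frame, filter it, push it downstream.
    void runStage(FrameQueue& in, FrameQueue& out,
                  std::function<void(Frame&)> filter) {
        for (;;) {
            Frame f = in.pop();
            if (f.number < 0) { out.push(std::move(f)); break; }  // end-of-stream marker
            filter(f);
            out.push(std::move(f));
        }
    }

Stages are then connected by creating one FrameQueue per link and launching std::thread(runStage, std::ref(qIn), std::ref(qOut), filter) for each node.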

So the D part uses COM to build and run a DirectShow graph with all the readers, splitters, codecs and, of course, our big video processing DirectShow filter. It talks to that filter via COM and some callbacks, but doesn't do much with the video frames themselves apart from copying.
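For reference, the standard DirectShow graph-building sequence looks roughly like the following; our D code does the equivalent of these COM calls through its own bindings. Shown in C++ to match the other sketches, with error handling, the callbacks, and our filter's insertion into the graph omitted; "input.avi" is just a placeholder file name.

    #include <dshow.h>
    #pragma comment(lib, "strmiids.lib")

    int main() {
        CoInitialize(nullptr);

        IGraphBuilder* graph = nullptr;
        CoCreateInstance(CLSID_FilterGraph, nullptr, CLSCTX_INPROC_SERVER,
                         IID_IGraphBuilder, reinterpret_cast<void**>(&graph));

        // A real host adds its own processing filter to the graph here
        // (IGraphBuilder::AddFilter) before rendering the file.
        graph->RenderFile(L"input.avi", nullptr);  // builds readers/splitters/decoders

        IMediaControl* control = nullptr;
        graph->QueryInterface(IID_IMediaControl,
                              reinterpret_cast<void**>(&control));
        control->Run();  // start streaming

        // ... wait for completion via IMediaEvent, then tear down ...
        control->Release();
        graph->Release();
        CoUninitialize();
    }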

Btw, if you're interested in an image processing app in pure D, I've got one too:
http://www.infognition.com/blogsort/
(sources: https://bitbucket.org/infognition/bsort )
