Just a correction: I messed up the paper reference, and I should actually have referred to this one:
https://arxiv.org/pdf/1602.00763

The one I mentioned previously seems to be an extension of SORT that also uses appearance information, which is **NOT** what the filter I implemented does. An embarrassing mistake. I will fix it in the code and the PR (but it will stay in the mailing list archive forever :-( ). In any case, I probably need to spend more time reading papers.

Also, I hope it's clear that this work is a POC, and I am not suggesting that it be mainlined, at least not in the short term.

On 2/20/25 14:06, Leandro Santiago wrote:
> [insert meme here]
>
> (this will be a long e-mail)
>
> Dear FFmpeg devs,
>
> In the past few days I've been experimenting with hacking on FFmpeg using Rust.
>
> As I am becoming more familiar with libavfilter, and it is not a dependency of any of the other libav* libraries, I decided it was a good candidate.
>
> It's also convenient because I use the FFmpeg libraries heavily in a commercial product, and one of the features I've been working on involves basic multi-object tracking.
>
> In my case, the tracking algorithm does not need to be "perfect": I have to trade quality of the result for performance, running on the CPU only, so most of the algorithms out there that need a GPU are out of my reach.
>
> So, as a first experiment, I decided to write a filter called `track_sort` that implements the 2016 paper SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC, also known as SORT [1].
>
> The filter already works well on top of the `master` branch, but the code itself is at a very early stage and far from "production ready", so please do not read it assuming it's in its final form. It's ugly and needs lots of refactoring.
>
> I've created a PR on Forgejo [4] to make it easier for others to follow the progress, although I use gitlab.com as my main forge.
>
> Here is a description of the filter:
>
> - It only performs object tracking; the object detection needs to be performed elsewhere.
>   It consumes the detection boxes generated by `dnn_detect`, which means that the quality of the tracking is closely related to the quality of the detection.
>
> - SORT is a simple algorithm that uses spatial data only, and it is not able to handle cases such as object occlusion. As I mentioned earlier, it's good enough for my use case.
>
> - The filter works with its default options, so you can use it without any arguments. In this mode, it will try to track every object in the available boxes. You can change this behaviour by specifying the list of labels to track, for example: `track_sort=labels=person|dog|cat`. These labels come from the ML model used by the detection filter. The filter also has the options `threshold`, `min_hits` and `max_age`, which control how the tracking algorithm works, but the default values should work well in most cases.
>
> - The filter adds the tracking information as labels in a new frame side data entry of type `AV_FRAME_DATA_DETECTION_BBOXES`. It **WILL NOT** override the side data from `dnn_detect`, meaning that the frame will have two side data entries of this type. I've created a PR that makes it possible to fetch such an entry [2].
>
> - The labels in the detection boxes have the format "track:<track_num>:<track_age>", which is not the final format. I did it this way as a quick hack to get some visual information when drawing the boxes and labels with the `drawtext` and `drawbox` filters. I believe this can be improved by putting the tracking information in the metadata of the `AVDetectionBBox`es, but that would mean breaking the API and ABI, so this is still an open question.
>
> What has not been done so far:
>
> I had quite a few goals in this task:
>
> - 1: get a working and efficient implementation of the SORT algorithm.
> - 2: start learning Rust again (it's been ~5 years since I last used it)
> - 3: learn more about the libavfilter codebase
> - 4: evaluate whether Rust could work as a second language for hacking on FFmpeg.
>
> Results:
>
> - 1: I managed to reuse lots of high-quality code available on crates.io (the Rust package registry), which spared me from writing hairy, math-heavy code. I personally suck at maths, especially linear algebra. The paper and the reference implementation [3] were enough, although I do not understand all of the math magic. For instance, I reused an existing crate for Kalman filters, which I would otherwise probably have had to implement by hand: the alternative in C would probably be the implementation that OpenCV offers, and I am aware that it's not practical to make OpenCV a dependency of FFmpeg.
>
> - 2: yay! Back to Rust!
>
> - 3: I've learned more not only about avfilter, but also a bit about other components.
>
> - 4: I have more notes on that below, but it feels to me that Rust is a natural candidate for new code in large C codebases, as it integrates quite well, with some warts. I also don't know whether the FFmpeg community has discussed using Rust in the codebase in the past; if not, why not now?
>
> Some notes on using Rust:
>
> In general I enjoyed using Rust in the project, and if you have a look at the code, you'll notice that I am not reusing any of the nice C macros that make writing new filters a lot easier. That means the Rust code looks like the macro-expanded C versions, and that's a lot of ugly boilerplate.
>
> There were some reasons for that. One is that I am still learning Rust macros and wanted to focus on getting stuff done for now. The second is that Rust has a much more powerful macro system than C does, and avoiding macros for now lets me feel all the pain of writing the code by hand.
> Such pain, I believe, can help a set of Rust macros "emerge" from the codebase, rather than someone designing upfront a set of macros that will probably look like the C ones, which might not be "rusty" enough. And I don't think it's good practice to design APIs before having some implementation (looking at you, C++ committee).
>
> I've been developing on Manjaro Linux and, for now, building FFmpeg statically with `--disable-stripping --enable-debug=3 --disable-optimizations`, and the Rust code in `Debug` mode. That means slow code and static builds, which are easy to debug and profile.
>
> Debugging is easy: I can simply use GDB, and it just works with mixed Rust and C code. I still don't have pretty-printers for the Rust part, but this is probably an issue with my setup.
>
> Profiling also works well. Even with the Rust code in Debug mode, profiling with Hotspot/perf shows that the tracking code is very efficient (you can barely see it in the flamegraph!).
>
> Memory management is a breeze, as the standard library has generic versions of many useful containers, such as vectors and B-trees. Its algorithms also make transforming and filtering data very convenient and type-safe.
>
> You get support for unit tests for free. No hassle, no complex setup: simply write unit tests anywhere and run them with `cargo test`.
>
> It feels very good to get the code working without being afraid of things going badly (in code that is not `unsafe`, of course!).
>
> WARTS
>
> I did not implement any wrapper on top of the avfilter private API (yay `bindgen`!), so it's used directly from the Rust code. That forces you to write `unsafe` code for any interaction with the libav* APIs. Nevertheless, even in unsafe code, working with non-owned data is very convenient, as you can turn almost anything into slices, which give you lots of convenient algorithms (map, filter, zip, etc.).
>
> Working with C pointers is very painful and ugly.
> Especially `**` and `***`: Rust is very verbose when using them on the Rust side (they become things like `&*mut *mut *mut`, which are not really easy to reason about). Rust also does not have the `->` operator, forcing you to write things like `(*foo).bar`, which is simply ugly.
>
> Interacting with the C API is also not trivial, as in Rust one must be explicit about ownership and lifetimes, something that is done implicitly (and often wrongly) in C.
>
> Struct members in Rust must always be explicitly initialized, even in global static variables, whose members C implicitly initializes to zero.
>
> C unions: luckily Rust supports them, but reading from them is always unsafe.
>
> `bindgen` does not generate bindings for `static av_always_inline blah()` functions, as those are... inlined, so whenever I needed one of them, I simply had to reimplement it in Rust.
>
> In general, my impression is that Rust code is more verbose than C in "dangerous" code, but way less verbose in safe code, thanks to the compiler checks.
>
> WHY? WHY? WHY?????
>
> OK, why am I, who never really took part in the FFmpeg community, now apparently showing up and throwing Rust in your faces? Am I saying you folks should rewrite FFmpeg in Rust? I know that the Rust community in particular has recently been involved in a lot of conflicts around large C codebases, and it's not my intention to tell you what to do or not to do. I recognize I have no authority in this group for that, and I am essentially just an FFmpeg user.
>
> My intention, first of all, was to get some stuff I needed done. I'm working on a commercial product, and developing in Rust was the quickest way I could get it done (considering my requirements). I've really enjoyed working on this project, and I believe what I learned can be useful to the FFmpeg community as a whole.
>
> Demo time
>
> Requirements: Cargo/Rust installed. I am using `1.84.0`, the latest stable, via `rustup`.
>
> You'll need OpenVINO, HarfBuzz and FreeType installed.
>
> First of all, check out the code from the PR at [4] and compile FFmpeg with:
>
> ```sh
> ./configure --disable-stripping --enable-debug=3 \
>     --disable-optimizations --enable-libopenvino --enable-libharfbuzz \
>     --enable-libfreetype --enable-openssl
> cargo build && make
> ```
>
> I added an `--enable-rust` flag to the PR, but at the moment it does nothing :-)
>
> Next, download a pre-trained YOLOv4 model and the associated files to perform the object detection:
>
> ```sh
> pip install openvino-dev tensorflow
> omz_downloader --name yolo-v4-tiny-tf
> omz_converter --name yolo-v4-tiny-tf
> wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/refs/heads/master/data/dataset_classes/coco_80cl.txt
> ```
>
> Here we'll use a video from the MOT Challenge 2016 [5], which is the one shown in the original SORT paper. You can play it with the command:
>
> ```sh
> ./ffplay https://motchallenge.net/sequenceVideos/MOT16-06-raw.webm -vf \
>   'dnn_detect=dnn_backend=openvino:model=public/yolo-v4-tiny-tf/FP16/yolo-v4-tiny-tf.xml:input=image_input:confidence=0.1:model_type=yolov4:anchors=81&82&135&169&344&319:labels=coco_80cl.txt:async=0:nb_classes=80,track_sort=labels=person,drawbox=box_source=side_data_detection_bboxes:color=red:skip=1,drawtext=text_source=side_data_detection_bboxes:fontcolor=yellow:bordercolor=yellow:fontsize=20:fontfile=DroidSans-Bold.ttf:skip=1'
> ```
>
> The `dnn_detect` options were taken from the YOLOv4 model documentation at [6].
>
> Please also note that I passed the extra option `skip=1` to both the `drawtext` and `drawbox` filters. This makes them render the box information from `track_sort` instead of the one from `dnn_detect`. More at [2].
>
> I also recorded a video showing the filter in action [7].
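
A side note on the labels drawn in this demo: the temporary "track:<track_num>:<track_age>" format and the `labels=person` option could be handled roughly as in the Rust sketch below. This is a minimal, hypothetical sketch, not the code from the PR: the function names are made up for illustration, an empty `labels` option is assumed to mean "track everything" (mirroring the default behaviour described above), and the 64-byte buffer assumes the size of `AVDetectionBBox.detect_label` (`AV_DETECTION_BBOX_LABEL_NAME_MAX_SIZE` in libavutil), including the terminating NUL.

```rust
// Hypothetical helpers for the temporary "track:<track_num>:<track_age>"
// labels; names and behaviour are illustrative assumptions, not the PR's code.

/// Assumed size of `AVDetectionBBox.detect_label` in libavutil
/// (`AV_DETECTION_BBOX_LABEL_NAME_MAX_SIZE`), including the trailing NUL.
pub const LABEL_MAX: usize = 64;

/// Builds the temporary "track:<track_num>:<track_age>" label.
pub fn format_track_label(track_num: u32, track_age: u32) -> String {
    format!("track:{track_num}:{track_age}")
}

/// Parses a "track:<track_num>:<track_age>" label; returns None if malformed.
pub fn parse_track_label(label: &str) -> Option<(u32, u32)> {
    let rest = label.strip_prefix("track:")?;
    let (num, age) = rest.split_once(':')?;
    Some((num.parse::<u32>().ok()?, age.parse::<u32>().ok()?))
}

/// Copies `label` into a zeroed, NUL-terminated C-style buffer, truncating
/// at a byte boundary if it is too long (similar to `av_strlcpy()` in C).
pub fn to_c_label(label: &str) -> [u8; LABEL_MAX] {
    let mut buf = [0u8; LABEL_MAX];
    let n = label.len().min(LABEL_MAX - 1); // always leave room for the NUL
    buf[..n].copy_from_slice(&label.as_bytes()[..n]);
    buf
}

/// Decides whether a detection should be tracked, given a `labels` option
/// such as "person|dog|cat". Assumption: an empty option tracks everything.
pub fn should_track(labels_opt: &str, detect_label: &str) -> bool {
    labels_opt.is_empty() || labels_opt.split('|').any(|l| l == detect_label)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn label_round_trip() {
        let label = format_track_label(3, 12);
        assert_eq!(label, "track:3:12");
        assert_eq!(parse_track_label(&label), Some((3, 12)));
        assert_eq!(parse_track_label("person"), None);

        let c = to_c_label(&label);
        assert_eq!(&c[..10], b"track:3:12");
        assert_eq!(c[10], 0);
    }

    #[test]
    fn label_filter() {
        assert!(should_track("person|dog|cat", "dog"));
        assert!(!should_track("person|dog|cat", "car"));
        assert!(should_track("", "car"));
    }
}
```

Since this is plain safe Rust, `cargo test` runs the test module with no extra setup; only the final copy into the C struct would need `unsafe`.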
>
> Cheers,
>
> Leandro
>
>
> [1] https://arxiv.org/pdf/1703.07402
> [2] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/10
> [3] https://github.com/abewley/sort
> [4] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/11
> [5] https://motchallenge.net/vis/MOT16-06
> [6] https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/yolo-v4-tiny-tf/README.md
> [7] https://youtu.be/U_y4-NnaINg

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel