Just a correction: I mixed up the referenced paper, and I should actually 
have referred to this one:

https://arxiv.org/pdf/1602.00763

The one I mentioned previously seems to be an extension of SORT that adds 
appearance information, which is **NOT** what the filter I implemented 
does.

An embarrassing mistake, and I will fix it in the code and PR (but it will 
stay forever in the mailing list :-().

In any case, I probably need to spend more time reading papers.

Also, I hope it's clear that this work is a POC, and I am not suggesting that 
it be mainlined, at least not in the short term.

On 2/20/25 14:06, Leandro Santiago wrote:
> [insert meme here]
>
> (this will be a long e-mail)
>
> Dear FFmpeg devs,
>
> in the past few days I've been experimenting with hacking on FFmpeg using Rust.
>
> As I am becoming more familiar with libavfilter, and it is not a 
> dependency of any of the other libav* libs, I decided it was a good 
> candidate.
>
> It's also convenient as I use the FFmpeg libs heavily in a commercial product, 
> and one of the features I've been working on involves basic multi-object 
> tracking.
>
> In my case, it does not need to be a "perfect" tracking algorithm, as I need 
> to trade quality of the result for performance when running on the CPU 
> only, so most of the algorithms out there that need a GPU are out of my 
> reach.
>
> I then decided to write, as a first experiment, a filter called `track_sort` 
> that implements the 2016 paper SIMPLE ONLINE AND REALTIME TRACKING WITH A 
> DEEP ASSOCIATION METRIC, also known as SORT [1].
>
> The filter already works well based on the `master` branch, but the code 
> itself is in very early stages and far from being "production ready", so 
> please do not read the code assuming it's in its final form. It's ugly and 
> needs lots of refactoring.
>
> I've created a PR on forgejo [4] to make it easier for others to track 
> progress, although I use gitlab.com as my main forge.
>
> Here is a description of the filter:
>
> - It performs only object tracking, needing the object detection to be 
> performed elsewhere. It feeds from the detection boxes generated by 
> `dnn_detect`. That means that the quality of the tracking is closely 
> related to the quality of the detection.
>
> - SORT is a simple algorithm that uses spatial data only, and it is not able 
> to handle cases such as object occlusion. It's good enough for my use case, 
> as I mentioned earlier.
>
> - The filter works with the default options, so you can pass it without any 
> arguments. In this mode, it will try to track any objects from the boxes 
> available. You can change this behaviour by specifying the list of labels to 
> track, for example: `track_sort=labels=person|dog|cat`. Such labels come from 
> the ML model you used in the detection filter. It also has the options 
> `threshold`, `min_hits` and `max_age`, which control how the tracking 
> algorithm works, but the default values should work well in most cases.
>
> - The filter will add the tracking information as a label in a new frame side 
> data entry of type `AV_FRAME_DATA_DETECTION_BBOXES`. It **WILL NOT** override 
> the side data from `dnn_detect`, meaning that the frame will have two side 
> data entries of this type. I've created a PR that makes it possible to fetch 
> such an entry [2].
>
> - The labels in the detection boxes have the format 
> "track:<track_num>:<track_age>", and this is not the final format. I did it 
> this way as a quick hack to have some visual information when drawing the 
> boxes and labels with the `drawtext` and `drawbox` filters. I believe this 
> can be improved by putting the tracking information as metadata of the 
> `AVDetectionBBox`es, but this would break the API and ABI, so this is still 
> an open question.
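
A minimal sketch of how a label in the "track:<track_num>:<track_age>" format 
could be built and parsed on the consumer side. The `TrackLabel` type, its 
method names and the `u32` field types are assumptions made for illustration; 
they are not taken from the filter's code.

```rust
/// Tracking data carried in a label of the form
/// "track:<track_num>:<track_age>". Hypothetical type, not from the PR.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct TrackLabel {
    pub track_num: u32,
    pub track_age: u32,
}

impl TrackLabel {
    /// Render the label in the format described above.
    pub fn to_label(&self) -> String {
        format!("track:{}:{}", self.track_num, self.track_age)
    }

    /// Parse a label; returns `None` for anything that does not match
    /// the format (for example a plain class label such as "person").
    pub fn parse(label: &str) -> Option<Self> {
        let mut parts = label.split(':');
        if parts.next()? != "track" {
            return None;
        }
        let track_num = parts.next()?.parse().ok()?;
        let track_age = parts.next()?.parse().ok()?;
        // Reject trailing fields such as "track:1:2:3".
        if parts.next().is_some() {
            return None;
        }
        Some(TrackLabel { track_num, track_age })
    }
}
```

Returning `Option` keeps the caller honest about labels that come from 
`dnn_detect` rather than from the tracker.
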
>
> What has not been done so far:
>
> I had quite a few goals in this task:
>
> - 1: get a working and efficient implementation of the SORT algorithm.
> - 2: start learning Rust again (it's been ~5 years since I used it)
> - 3: learn more about the libavfilter codebase
> - 4: evaluate whether Rust could work as a second language for hacking FFmpeg.
>
> Results:
>
> - 1: I managed to reuse lots of high quality code, available on crates (the 
> repository of Rust packages), saving me from having to write hairy, 
> math-heavy code. I personally suck at maths, especially linear algebra. Using 
> the paper and the reference implementation [3] was enough, although I do not 
> understand all the math magic. For instance, I reused an existing crate for 
> Kalman filters that I would otherwise probably have needed to implement by 
> hand, as the alternative in C would probably be using the implementation 
> that OpenCV offers. And I am aware that it's not practical to make OpenCV a 
> dependency of FFmpeg.
>
> - 2: yay! Back to Rust!
>
> - 3: I've learned more not only about avfilter, but a bit about other 
> components as well.
>
> - 4: I have more notes on that later, but it feels to me that Rust is a 
> natural candidate for new code in large C codebases, as it integrates quite 
> well, with some warts. I also have no idea whether the FFmpeg community has 
> discussed Rust in the codebase in the past and, if not, why not now?
>
> Some notes on using Rust:
>
> In general I enjoyed using Rust in the project, and if you have a look at the 
> code, you'll notice that I am not reusing any of the nice C macros that make 
> a lot of stuff easier when writing new filters. That means that the Rust code 
> looks like the expanded macro versions from C. And that's a lot of 
> boilerplate and ugly code.
>
> There were some reasons for that: One is that I am still learning Rust 
> macros, and wanted to focus on getting stuff done for now. The second is that 
> Rust has a much more powerful macro system than C does, and avoiding macros 
> now allows me to feel all the pain of writing the manual code. Such pain, I 
> believe, can help a set of Rust macros to "emerge" from the codebase, rather 
> than having someone design a set of macros that will probably look like the 
> C ones, which might not be "rusty" enough. And I don't find it good practice 
> to design APIs before having some implementation (looking at you, C++ 
> committee).
>
> I've been developing on Manjaro Linux and for now building FFmpeg statically 
> with `--disable-stripping --enable-debug=3 --disable-optimizations` and the 
> Rust code in `Debug` mode. That means slow code and static builds, which are 
> easy to debug and profile.
>
> Debugging is easy, as I can simply use GDB and it simply works with the Rust 
> and C code mixed. I still don't have a pretty-printer for the Rust part, but 
> this is probably an issue with my setup.
>
> Profiling also works well. Even though the Rust code is in Debug mode, 
> profiling with Hotspot/Perf shows that the tracking code is very efficient 
> (you almost cannot see it in the flamegraph!).
>
> Memory management is a breeze, as the standard library has generic versions 
> of many useful containers, such as Vectors and BTrees. The algorithms there 
> also make transforming and filtering very convenient and type safe.
>
> You get support for unit tests for free. No hassle, no complex setup. Simply 
> write unit tests anywhere and run them with `cargo test`.
>
> It feels very good to get the code to work and not be afraid of things 
> going badly (in the code which is not unsafe, of course!).
>
> WARTS
>
> I did not implement any wrapper on top of the avfilter private API (yay 
> `bindgen`!), so it's used directly in the Rust code. It forces you to write 
> the code as `unsafe` for any interaction with the libav* API. Nevertheless, 
> even in unsafe code, working with non-owned data is very convenient, as you 
> can turn almost anything into slices, which provide you with lots of 
> convenient algorithms (map, filter, zip, etc.).
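
To illustrate the point about slices, here is a small sketch of viewing a 
C-style "pointer plus count" pair as a borrowed slice with 
`std::slice::from_raw_parts`. The `Scores` struct and the `scores_above` 
function are hypothetical and are not part of any libav* API.

```rust
use std::slice;

/// Hypothetical C-style record: a pointer to `nb_items` floats that are
/// owned by someone else (for example, by the C side).
#[repr(C)]
pub struct Scores {
    pub items: *const f32,
    pub nb_items: usize,
}

/// Collect every score strictly greater than `min`.
///
/// # Safety
/// `s` must point to a valid `Scores`, and `items` must either be null or
/// point to at least `nb_items` readable floats for the duration of the call.
pub unsafe fn scores_above(s: *const Scores, min: f32) -> Vec<f32> {
    let s = unsafe { &*s };
    // Borrow the foreign memory as a slice; nothing is copied here.
    let items: &[f32] = if s.items.is_null() || s.nb_items == 0 {
        &[]
    } else {
        unsafe { slice::from_raw_parts(s.items, s.nb_items) }
    };
    // From here on the usual safe iterator adaptors are available.
    items.iter().copied().filter(|v| *v > min).collect()
}
```

Only the two small `unsafe` blocks deal with raw pointers; the filtering 
itself is ordinary safe Rust.
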
>
> Working with C pointers is very painful and ugly, especially `**` and 
> `***`. Rust is very verbose when using them on the Rust side (they become 
> things like `&*mut *mut *mut`, not really easy to reason about). Rust also 
> does not have the `->` operator, forcing you to do stuff like `(*foo).bar`, 
> which is simply ugly.
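
For readers who have not seen it, a short sketch of what this looks like in 
practice. `Foo`, `get_bar` and `set_bar` are hypothetical names used only to 
show the syntax next to its C equivalent.

```rust
/// Hypothetical C struct, for illustration only.
#[repr(C)]
pub struct Foo {
    pub bar: i32,
}

/// C equivalent: `return foo->bar;`
///
/// # Safety
/// `foo` must point to a valid `Foo`.
pub unsafe fn get_bar(foo: *const Foo) -> i32 {
    unsafe { (*foo).bar }
}

/// C equivalent: `(*pp)->bar = value;`
///
/// # Safety
/// `pp` must point to a valid pointer to a valid, writable `Foo`.
pub unsafe fn set_bar(pp: *mut *mut Foo, value: i32) {
    // Each level of indirection needs its own explicit `*`.
    unsafe { (**pp).bar = value };
}
```
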
>
> Interacting with the C API is also not trivial, as in Rust one must be 
> explicit about ownership and lifetimes, something which is done implicitly 
> (and often wrongly) in C.
>
> Struct members in Rust must always be explicitly initialized, even for 
> global static variables, which C initializes with zero implicitly.
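
A small sketch of the difference, using a hypothetical `Counters` struct (not 
taken from the filter). In C, `static Counters defaults;` would be 
zero-initialized implicitly; in Rust every field is spelled out, or the 
all-zero value is requested explicitly.

```rust
use std::os::raw::c_int;

/// Hypothetical plain-data struct, for illustration only.
#[repr(C)]
pub struct Counters {
    pub hits: c_int,
    pub age: c_int,
    pub score: f32,
}

/// A global must name every field, even when they are all zero.
pub static DEFAULTS: Counters = Counters {
    hits: 0,
    age: 0,
    score: 0.0,
};

/// Alternative for large plain-data structs: ask for the all-zero value.
pub fn zeroed_counters() -> Counters {
    // SAFETY: the all-zero bit pattern is a valid value for `c_int` and
    // `f32`. This would NOT be sound for fields such as references.
    unsafe { std::mem::zeroed() }
}
```
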
>
> C unions. Luckily Rust supports them, but they are always unsafe.
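
As a sketch of what "always unsafe" means here: building a union value is 
safe, but reading a field requires an `unsafe` block, because the compiler 
cannot know which field was last written. `IntOrFloat` and `float_bits` are 
hypothetical names used for illustration.

```rust
/// Hypothetical C-compatible union of a 32-bit integer and a float.
#[repr(C)]
#[derive(Clone, Copy)]
pub union IntOrFloat {
    pub i: u32,
    pub f: f32,
}

/// Reinterpret the bits of a float as an unsigned integer.
pub fn float_bits(value: f32) -> u32 {
    // Constructing the union is safe.
    let u = IntOrFloat { f: value };
    // SAFETY: both fields are 4 bytes wide and every bit pattern is a
    // valid `u32`, so reading `i` after writing `f` is well defined.
    unsafe { u.i }
}
```
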
>
> `bindgen` does not generate wrappers for `static av_always_inline blah()` 
> functions, as those are... inlined, so when I needed to use those, I had 
> to simply reimplement them in Rust.
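
As an example of what such a reimplementation can look like, here is a sketch 
of a clamping helper modelled on the behaviour of `av_clip` from libavutil. 
Which inline helpers the filter actually needed is not stated above, so the 
choice of function and the name `clip` are assumptions.

```rust
use std::os::raw::c_int;

/// Hand-written Rust counterpart of a small C inline helper: clamp `a`
/// into the range `[amin, amax]`. The caller is expected to pass
/// `amin <= amax`, as with the C version.
#[inline]
pub fn clip(a: c_int, amin: c_int, amax: c_int) -> c_int {
    if a < amin {
        amin
    } else if a > amax {
        amax
    } else {
        a
    }
}
```

The downside of this approach is that the Rust copy has to be kept in sync 
with the C original by hand.
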
>
> In general my impression is that Rust code is more verbose than C in 
> "dangerous" code, but way less verbose in safe code, due to the compiler 
> checks.
>
> WHY? WHY? WHY?????
>
> OK, why do I, who never really took part in the FFmpeg community, now 
> apparently come throwing Rust in your faces? Am I saying you folks should 
> rewrite FFmpeg in Rust? I know that the Rust community especially has been 
> involved recently in a lot of conflicts involving large C codebases, and it's 
> not my intention to tell you what to do or not to do. I recognize having no 
> authority in this group for that and I am essentially just an FFmpeg user.
>
> My intention, first of all, was to get some stuff I needed done. I'm working 
> on a commercial product, and developing in Rust was the quickest way I could 
> get it done (considering my requirements). I've enjoyed working on this 
> project a lot, and I believe what I learned can be useful for the FFmpeg 
> community as a whole.
>
> Demo time
>
> Requirements: Cargo/Rust installed. I am using `1.84.0`, the latest stable, 
> via `rustup`.
>
> You'll need openvino, harfbuzz and freetype installed.
>
> First of all, check out the code from the PR at [4] and compile FFmpeg with:
>
> ```sh
> ./configure --disable-stripping --enable-debug=3 \
>   --disable-optimizations --enable-libopenvino --enable-libharfbuzz \
>   --enable-libfreetype --enable-openssl
> cargo build && make
> ```
>
> I added a `--enable-rust` flag to the PR, but at the moment it does nothing 
> :-)
>
> Next you should download a pre-trained YOLO4 model and associated files, to 
> perform the object detection:
>
> ```sh
> pip install openvino-dev tensorflow
> omz_downloader --name yolo-v4-tiny-tf
> omz_converter --name yolo-v4-tiny-tf
> wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/refs/heads/master/data/dataset_classes/coco_80cl.txt
> ```
>
> Here we'll use a video from MOT Challenge 2016, [5] which is the one shown in 
> the original SORT paper. You can use it with the command:
>
> ```sh
> ./ffplay https://motchallenge.net/sequenceVideos/MOT16-06-raw.webm -vf \
> 'dnn_detect=dnn_backend=openvino:model=public/yolo-v4-tiny-tf/FP16/yolo-v4-tiny-tf.xml:input=image_input:confidence=0.1:model_type=yolov4:anchors=81&82&135&169&344&319:labels=coco_80cl.txt:async=0:nb_classes=80,track_sort=labels=person,drawbox=box_source=side_data_detection_bboxes:color=red:skip=1,drawtext=text_source=side_data_detection_bboxes:fontcolor=yellow:bordercolor=yellow:fontsize=20:fontfile=DroidSans-Bold.ttf:skip=1'
> ```
>
> The `dnn_detect` options were obtained from the YOLO4 model at [6].
>
> Please also notice that I passed the extra option `skip=1` to both the 
> `drawtext` and the `drawbox` filters. This is to make them render the box 
> information from `track_sort`, instead of the one from `dnn_detect`. More at 
> [2].
>
> I also recorded a video showing the filter in action [7].
>
> Cheers,
>
> Leandro
>
>
> [1] https://arxiv.org/pdf/1703.07402
> [2] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/10
> [3] https://github.com/abewley/sort
> [4] https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/11
> [5] https://motchallenge.net/vis/MOT16-06
> [6] https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/yolo-v4-tiny-tf/README.md
> [7] https://youtu.be/U_y4-NnaINg
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
