On Tue, Dec 8, 2020 at 7:14 PM Brian Matherly <brian.mathe...@yahoo.com>
wrote:

>
> > As a reminder, frame threading is when
> > abs(consumer.get_int("real_time")) > 1. There are some problems with it.
> > 1. sometimes there is a crash
> > 2. image artifacts due to race conditions
>
> I think that the main reason for this is that we try to allow each
> service instance to be running get_image() in multiple threads at the
> same time.
>
> I would suggest the following strategy:
>
> 1) Design each service to use a lock (mutex) around the entire
> get_image() function. I think this would solve most crash and artifact
> issues. It would also simplify the code design in some cases.
>
>
I think that would help a lot for those 2 issues, but it will sacrifice a
lot of parallelism. Take a simple use case of scaling a video. Today, with
frame threading, it can scale multiple frames in parallel. With your
proposal it would not. Only different operations that are not waiting on a
common operation would run in parallel.
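For reference, a minimal, hypothetical sketch of point 1 (names and structure are mine, not MLT's): one pthread mutex per service instance serializes the whole get_image() call, so only one thread mutates that instance's state at a time.

```c
/* Hypothetical sketch of point 1 (not actual MLT code): one mutex per
 * service instance serializes the whole get_image() call. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;  /* guards all mutable per-instance state */
    int frames_done;       /* stand-in for that state */
} my_service;

static int my_service_get_image(my_service *self, unsigned char **image)
{
    pthread_mutex_lock(&self->lock);
    /* ... compute the image using the instance state ... */
    self->frames_done++;
    *image = 0; /* placeholder result */
    pthread_mutex_unlock(&self->lock);
    return 0;
}
```

With this shape, frame threads can still run different service instances in parallel, but two threads can never be inside the same instance's get_image() at once - which is exactly the parallelism trade-off being discussed.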


> 2) Optimize each service so that the get_image() function can complete
> in one frame duration. For 60fps, that would be 16ms. This optimization


Of course, that is a nice goal, but you know it is not possible for all
workloads when you do not control the hardware. A budget laptop CPU is
simply not going to be able to handle 4K60 in real time on most operations.

> can be done using OpenMP, SIMD, or slice threading - whichever is the
> best solution for each particular situation.
>
> 3) Set the frame threading count to whatever is needed to overcome the
> delay through the stackup of services. For example, if there are 3
> filters and one transition, then use 4 frame threads for real-time
> performance.
>
>
This can change per frame, but it is probably the least important of your
points.


> > 3. slowness due to accessing slow-to-seek sources in non-sequential order
>
> With my proposal above, I think we could serialize the access to the
> producer. As long as the image can be produced within one frame duration
> (on average). When the consumer is assigning threads, it would need some
> way to wait until the producer from the previous frame has provided an
> image. This would add one additional frame thread to the stackup above.
>
>
Perhaps, but that seems like even more work, and what I suggested may
already be too much to invest in an 8-bit pipeline!
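Still, the serialization idea quoted above could be sketched roughly like this (hypothetical names, not MLT API): frame threads take turns at the producer so it is always read in sequential frame order, and a thread blocks until the previous frame has been served.

```c
/* Hypothetical sketch (names are mine, not MLT API): frame threads take
 * turns at the producer so it is always read in sequential frame order. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  turn;
    int next_frame;  /* frame number the producer will serve next */
} producer_gate;

static void gate_acquire(producer_gate *g, int frame)
{
    pthread_mutex_lock(&g->lock);
    while (g->next_frame != frame)   /* block until the prior frame is done */
        pthread_cond_wait(&g->turn, &g->lock);
}

static void gate_release(producer_gate *g)
{
    g->next_frame++;                 /* hand the producer to the next frame */
    pthread_cond_broadcast(&g->turn);
    pthread_mutex_unlock(&g->lock);
}
```

Each frame thread would call gate_acquire(g, frame) before touching the producer and gate_release(g) after - that wait is the extra frame thread in the stackup described above.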

> 4. inefficient usage of CPU cache memory
> > 5. increased memory usage
> >
> > #1 and #2 have improved significantly over the years through diligence
> > and hard work, but it still occurs sometimes because the framework
> > permits so many combinations of things. I think the most recent move
> > to use thread-local frei0r instances (context) helped, but that also
> > adds some inefficiency.
> >
> > For #1, in Shotcut, if an export job with frame-threading fails,
> > Shotcut automatically restarts the job without frame-threading. And
> > for all of the above reasons, Shotcut no longer defaults to enable
> > frame-threading. I should also point out that due to the
> > inefficiencies, Shotcut also limits the number of threads to 4 (and
> > automatically computes a lower number on systems with <= 4 CPU threads).
> >
> > There is no solution today for #3. In the avformat producer, we could
> > cache decoded frames such that an out-of-order frame request will get
> > a decoded frame. Today, the producer caches converted video frames.
> > Converted frames are frames whose pixel format and colorspace have
> > been converted to something available in MLT, based upon the
> > get_image request, so that requesting the same frame requires no
> > re-conversion. I do not want to cache both because that is memory
> > hungry. I do not want to convert every decoded frame to reach the
> > requested frame because that will make it slower. I might change this
> > to cache unconverted decoded frames.
> >
> > #4 and #5 rather come with the territory especially #5 when frames
> > hold a whole uncompressed image.
> >
> > I want to expand the usage of slice threading. I have a branch ready
> > to merge that improves mlt_slices in frei0r, but you still need to opt
> > into it on a per-plugin basis since some of them are incompatible.
> > Basically, this simple change automatically adjusts the number of
> > slices until each slice has the same height.
> >
> > I think it will be easy to add slice processing to all cases of pixel
> > format and colorspace conversion using swscale as proven by the
> > existing code for mlt_image_yuv422 in the avformat producer.
> > Slice-based scaling is still not reliable in swscale, and I might add
> > a zimg scaler, which supports tiles.
> >
> I have also wondered if we could simply reduce the number of conversions
> in the stackup by strategically adding more pixel format support to a
> few specific filters for common use cases. Maybe worth some investigation.
>

That can help too.
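On the mlt_slices point quoted earlier, the equal-height adjustment could look roughly like this (a hypothetical function, not the actual patch): step the slice count down from the requested value until the image height divides evenly.

```c
/* Hypothetical sketch of the equal-height adjustment described above:
 * step the slice count down from the requested value until the image
 * height divides evenly, so every slice gets the same height. */
static int pick_slice_count(int height, int requested)
{
    int n = requested < height ? requested : height;
    while (n > 1 && height % n != 0)
        n--;
    return n;
}
```

For a 1080-line image, a request for 8 slices stays at 8 (135 lines each), while a request for 7 drops to 6 (180 lines each).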


>
> > Of course, there are still more places in MLT where one can apply
> > mlt_slices and convert floating point code to integer. And both frei0r
> > and MLT can use OpenMP and more SIMD. But all that for an 8-bit
> > pipeline? I am not so sure.
> >
>
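To make the OpenMP/integer point concrete, here is a toy example (not MLT code) of the kind of integer-only 8-bit pixel loop that OpenMP can parallelize; without -fopenmp the pragma is simply ignored and the code stays correct.

```c
/* Toy example (not MLT code) of an integer-only 8-bit pixel loop that
 * OpenMP can parallelize; the pragma is ignored without -fopenmp. */
#include <stdint.h>

static void apply_gain(uint8_t *plane, int n, int gain_num, int gain_den)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int v = plane[i] * gain_num / gain_den;  /* integer math, no floats */
        plane[i] = v > 255 ? 255 : (uint8_t)v;   /* clamp to 8-bit range */
    }
}
```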
_______________________________________________
Mlt-devel mailing list
Mlt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlt-devel
