On Tue, Dec 8, 2020 at 7:14 PM Brian Matherly <brian.mathe...@yahoo.com> wrote:
> > As a reminder, frame threading is when
> > abs(consumer.get_int("real_time")) > 1. There are some problems with it.
> > 1. sometimes there is a crash
> > 2. image artifacts due to race conditions
>
> I think that the main reason for this is that we try to allow each
> service instance to be running get_image() in multiple threads at the
> same time.
>
> I would suggest the following strategy:
>
> 1) Design each service to use a lock (mutex) around the entire
> get_image() function. I think this would solve most crash and artifact
> issues. It would also simplify the code design in some cases.

I think that would help a lot for those two issues, but it would sacrifice
a lot of parallelism. Take the simple use case of scaling a video. Today,
with frame threading, multiple frames can be scaled in parallel. With your
proposal they could not. Only different operations that are not waiting on
a common operation would run in parallel.

> 2) Optimize each service so that the get_image() function can complete
> in one frame duration. For 60fps, that would be 16ms. This optimization
> can be done using OpenMP, SIMD, or slice threading - whichever is the
> best solution for each particular situation.

Of course, that is a nice goal, but it is not possible for all workloads
when you do not control the hardware. A budget laptop CPU is simply not
going to be able to handle 4K60 in real time for most operations.

> 3) Set the frame threading count to whatever is needed to overcome the
> delay through the stackup of services. For example, if there are 3
> filters and one transition, then use 4 frame threads for real-time
> performance.

This can change per frame, but it is probably the least important of your
points.

> > 3. slowness due to accessing slow-to-seek sources in non-sequential order
>
> With my proposal above, I think we could serialize the access to the
> producer, as long as the image can be produced within one frame duration
> (on average).
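For reference, a minimal sketch of what point 1 would look like: one mutex per service instance so only one frame thread runs get_image() on that instance at a time. The struct and function names here are illustrative only, not the real MLT API.

```c
/* Illustrative sketch only - not the actual MLT service API. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;  /* serializes get_image() for this instance */
    int frames_processed;  /* example of state shared across frame threads */
} service_private;

int service_get_image(service_private *self)
{
    pthread_mutex_lock(&self->lock);
    /* ... decode, filter, or scale the image here ... */
    self->frames_processed++;  /* shared state is safe under the lock */
    pthread_mutex_unlock(&self->lock);
    return 0;
}
```

The cost is exactly the loss of parallelism described above: two frame threads asking the same instance for different frames now run one after the other.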
> When the consumer is assigning threads, it would need some
> way to wait until the producer from the previous frame has provided an
> image. This would add one additional frame thread to the stackup above.

Perhaps, but that seems like even more work, and what I suggested may
already be too much to invest into an 8-bit pipeline!

> > 4. inefficient usage of CPU cache memory
> > 5. increased memory usage
> >
> > #1 and #2 have improved significantly over the years through diligence
> > and hard work, but they still occur sometimes because the framework
> > permits so many combinations of things. I think the most recent move
> > to use thread-local frei0r instances (contexts) helped, but that also
> > adds some inefficiency.
> >
> > For #1, in Shotcut, if an export job with frame-threading fails,
> > Shotcut automatically restarts the job without frame-threading. And
> > for all of the above reasons, Shotcut no longer enables frame-threading
> > by default. I should also point out that, due to the inefficiencies,
> > Shotcut also limits the number of threads to 4 (and automatically
> > computes a lower number on systems with <= 4 CPU threads).
> >
> > There is no solution today for #3. In the avformat producer, we could
> > cache decoded frames so that an out-of-order frame request still gets
> > a decoded frame. Today, the producer caches converted video frames.
> > Converted frames are frames whose pixel format and colorspace have
> > been converted to something available in MLT, based upon the get_image
> > request, so that requesting the same frame requires no re-conversion.
> > I do not want to cache both because that is memory hungry. I do not
> > want to convert every decoded frame on the way to the requested frame
> > because that would make it slower. I might change this to cache
> > unconverted decoded frames instead.
> >
> > #4 and #5 rather come with the territory, especially #5 when frames
> > hold a whole uncompressed image.
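To make the decoded-frame cache idea above concrete, here is a rough sketch of a small fixed-size cache keyed by frame position with round-robin eviction, so an out-of-order request can be served without re-seeking. All types and names are hypothetical, not the avformat producer's actual cache.

```c
/* Hypothetical sketch of a decoded-frame cache - not the real MLT code. */
#include <stddef.h>

#define CACHE_SIZE 4

typedef struct {
    int position;       /* frame position; -1 marks an empty slot */
    const void *frame;  /* decoded (unconverted) frame data */
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_SIZE];
    int next;           /* round-robin eviction index */
} frame_cache;

void cache_init(frame_cache *c)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        c->entries[i].position = -1;
    c->next = 0;
}

void cache_put(frame_cache *c, int position, const void *frame)
{
    c->entries[c->next].position = position;  /* overwrite oldest slot */
    c->entries[c->next].frame = frame;
    c->next = (c->next + 1) % CACHE_SIZE;
}

const void *cache_get(const frame_cache *c, int position)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        if (c->entries[i].position == position)
            return c->entries[i].frame;
    return NULL;  /* miss: caller must seek and decode */
}
```

The memory-hunger trade-off mentioned above is visible here: each occupied slot pins a whole uncompressed frame.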
> > I want to expand the usage of slice threading. I have a branch ready
> > to merge that improves mlt_slices in frei0r, but you still need to opt
> > into it on a per-plugin basis since some of them are incompatible.
> > Basically, this simple change automatically adjusts the number of
> > slices until each slice has the same height.
> >
> > I think it will be easy to add slice processing to all cases of pixel
> > format and colorspace conversion using swscale, as proven by the
> > existing code for mlt_image_yuv422 in the avformat producer.
> > Slice-based scaling is still not reliable in swscale, and I might add
> > a zimg scaler, which supports tiles.
>
> I have also wondered if we could simply reduce the number of conversions
> in the stackup by strategically adding more pixel format support to a
> few specific filters for common use cases. Maybe worth some investigation.

That can help too.

> > Of course, there are still more places in MLT where one can apply
> > mlt_slices and convert floating-point code to integer. And both frei0r
> > and MLT can use OpenMP and more SIMD. But all that for an 8-bit
> > pipeline? I am not so sure.
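For anyone curious what "adjusts the number of slices until each slice has the same height" could look like, here is a hedged sketch of that behavior: step the requested count down until it divides the image height evenly. This mirrors the description above, not the actual mlt_slices code.

```c
/* Sketch of the described behavior - not the actual mlt_slices change.
 * Reduce the desired slice count until every slice gets the same height. */
static int adjust_slice_count(int height, int desired)
{
    if (desired < 1)
        desired = 1;
    while (desired > 1 && height % desired != 0)
        desired--;  /* shrink until slices divide the height evenly */
    return desired;
}
```

For example, 1080 lines with 7 requested slices would fall back to 6 slices of 180 lines each, while a prime height degrades to a single slice.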
_______________________________________________
Mlt-devel mailing list
Mlt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlt-devel