Cool, thanks, this is all very helpful.

I'm seeing about a 25% performance gain from adding threads=4 compared to the 
default. I'm using v0.9.0, as installed by `apt-get install melt` on 
Ubuntu 14.04.3. I'm not sure how much it has come along in the last couple of 
months, but I'm certainly a little behind the curve here.
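
For anyone finding this thread later, here is a rough Python sketch of how I
read the two settings discussed in the quoted messages below. It is not
Shotcut's actual code: suggested_real_time() is a hypothetical helper that
only reproduces the rule of thumb Dan describes (1 on a dual core, 3 on a quad
core, at most 4 beyond that), and melt_args() is a hypothetical helper that
builds a melt command line. The file names, the avformat consumer target and
vcodec=libx264 are assumptions for illustration. A negative real_time value
asks MLT for parallel processing without frame dropping.

```python
import os


def suggested_real_time(logical_cpus):
    """Rule of thumb from the quoted message, not Shotcut's real code:
    1 on a dual core, 3 on a quad core, never more than 4."""
    return max(1, min(logical_cpus - 1, 4))


def melt_args(project, output, threads, real_time):
    """Build a melt argument list (hypothetical helper).

    Properties are written as name=value, not name:value.
    """
    return [
        "melt", project,
        "-consumer", f"avformat:{output}",
        "vcodec=libx264",            # assumed codec, for illustration only
        f"threads={threads}",        # encoder threads
        # Negative value: parallel image processing, no frame dropping.
        f"real_time={-abs(real_time)}",
    ]


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    args = melt_args("project.mlt", "out.mp4", 4, suggested_real_time(cpus))
    print(" ".join(args))
```

Every property here is written as name=value; the name:value form was the
typo mentioned in the quoted reply.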

---------- 
jeffrey k eliasen - technologist, philosopher, agent of change
blog <http://jeff.jke.net/> | linkedin 
<http://www.linkedin.com/pub/jeffrey-eliasen/3/a83/b76> | google+ 
<http://plus.google.com/+JeffreyEliasen> | facebook 
<http://facebook.com/jeffrey.eliasen> | twitter 
<http://twitter.com/jeffreyeliasen>
> On Jun 17, 2016, at 07:02, Dan Dennedy <d...@dennedy.org> wrote:
> 
> Let me add to the thread count info. Many encoders such as x264 and x265 
> configure themselves by logical CPU count if you do not specify "threads" or 
> set it to 0. They often actually use more threads than cpu count; they know 
> what they are doing, but it is based on the assumption they need the vast 
> majority of cpu utilization. I do not yet have a comprehensive accounting of 
> which codecs require an explicit threads property to use more than one and 
> which ones cannot be more than 1, but there is some start to capturing this 
> knowledge in Shotcut source code here:
> https://github.com/mltframework/shotcut/blob/master/src/docks/encodedock.cpp#L565
> 
> The performance gain of MLT's parallel image processing (abs(real_time) > 1) 
> varies considerably depending on the composition and services involved. It is 
> well short of linear scaling. Some effects are not parallel-safe and block 
> major portions from concurrent access, and that creates bottlenecks. Slowly 
> over time this situation is improving in MLT and frei0r. Based on my testing, 
> as a general rule, I see little gain from setting abs(real_time) above 4 on a 
> system with 8 logical processors. Sure, I can create some scenarios where 
> going over that still shows some benefit, but they are exceptions to the 
> rule. In Shotcut, I made a heuristic to set this:
> https://github.com/mltframework/shotcut/blob/master/src/mltcontroller.cpp#L718
> So, on a dual core it will still be 1, on a quad core it will be 3, and on 
> anything more than that only 4. The rest can go to decoders and encoders, 
> many of which are multi-threaded now.
> MLT's parallel image processing (also known as frame-threading) removes the 
> need for each effect to be touched to use SIMD assembler or OpenMP or even to 
> be slice-friendly. That makes it easy to get some parallelism going for 
> nearly all effects, but it is the least performant approach because it is not 
> friendly to the CPU RAM caches and has overhead for locks. 
> 
> 
> On Fri, Jun 17, 2016 at 6:23 AM Brian Matherly <c...@brianmatherly.com> wrote:
> Using "real_time" and "threads" options should not change the final result, 
> unless you happen to expose a bug. It should not matter whether you use one, 
> the other, or both in combination.
> 
> In my own experience, if I have an EDL that only has a few 
> filters/transitions and I am encoding to H.264, I only need to set 
> "threads=4" and I can pretty much saturate my CPU. Basically, one core ends 
> up doing all the decoding and MLT processing and the rest of the cores get 
> used up for encoding.
> 
> ~Brian
> 
> 
> From: jeffrey k eliasen <j...@jke.net>
> To: Brian Matherly <c...@brianmatherly.com>
> Cc: "mlt-devel@lists.sourceforge.net" <mlt-devel@lists.sourceforge.net>
> Sent: Friday, June 17, 2016 3:28 AM
> 
> Subject: Re: [Mlt-devel] Optimizing `melt` in a CPU-intensive environment
> 
> OK, that's exactly what I was looking for, thanks!
> 
> Looks like I also had a typo in an earlier reply, using ':' instead of '=' to 
> denote a property value; that took longer than it should have to spot.
> 
> Finally, you mention the real_time and threads options can be combined, but 
> does this change the final result in any way vs. using just one or the other 
> (assuming I'm not dropping frames)?
> 
> 

_______________________________________________
Mlt-devel mailing list
Mlt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlt-devel
