On Tue, 2003-12-16 at 12:33, Andrew Stevens wrote:
> Hi all,
> 
> First off a bit of background to the multi-threading in the current stable 
> branch:
> 
> - Parallelism is primarily frame-by-frame.  This means that the final phases 
> of the encoding lock on completion of the reference frame (prediction and DCT 
> transform) and the predecessor (bit allocation).   If you have a really fast 
> CPU that motion estimates and DCT's very fast you will get lower 
> parallelisation.  If you use -R 0 you will get very little parallelism *at 
> all*.   Certainly not enough to make -M 3 sensible.

Yet again, good to know.

This line (generally, a triple loop for 0-3 M, 0-1 I and 0-2 R):

Produces these encoding times for approximately 1010 frames. The columns
are real time, user time, and the change in real time against the -M 0
baseline. Real vs. user time gives a bit of a view as to how busy the
CPUs were; optimal should be 1m real time, 2m user time, right? Average
system time was 3.0s, +/- 0.2s, for all tests:

(options on each call were:
 -f 8 -g 9 -G 18 -v 0 -E -10 -K kvcd -4 2 -2 1 -F 1 < rawstream.yuv
)
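
The loop itself is not reproduced in this message. As a rough sketch
only, the 24 combinations could be generated like this. The encoder name
(mpeg2enc), the -o output option and the output file names are
assumptions for illustration, not the command that was actually run:

```python
from itertools import product

# Fixed options quoted in the message above.
COMMON = "-f 8 -g 9 -G 18 -v 0 -E -10 -K kvcd -4 2 -2 1 -F 1".split()


def build_commands(encoder="mpeg2enc", source="rawstream.yuv"):
    """Return (label, shell command) pairs for every -M/-I/-R combination.

    `encoder` and the output naming are hypothetical; only the -M, -I, -R
    ranges and the COMMON options come from the message.
    """
    commands = []
    for m, i, r in product(range(4), range(2), range(3)):
        label = f"-M {m} -I {i} -R {r}"
        out = f"test-{m}-{i}-{r}.m2v"  # hypothetical output file name
        argv = [encoder, "-M", str(m), "-I", str(i), "-R", str(r),
                *COMMON, "-o", out]
        # Prefix with `time` so the shell reports real/user/sys per run.
        commands.append((label, "time " + " ".join(argv) + f" < {source}"))
    return commands


if __name__ == "__main__":
    for _label, command in build_commands():
        print(command)
```

Printing the commands rather than running them keeps the sketch safe to
try; the output can be piped to a shell once the names are adjusted.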

-M 0 -I 0 -R 0: 1m  6.082s      0m 50.050s      baselines
-M 0 -I 0 -R 1: 1m 16.545s      0m 58.980s      ..
-M 0 -I 0 -R 2: 1m 34.511s      1m 17.045s      ..
-M 0 -I 1 -R 0: 2m  7.344s      1m 49.495s      ..
-M 0 -I 1 -R 1: 1m 59.665s      1m 42.215s      ..
-M 0 -I 1 -R 2: 2m 30.990s      2m 30.990s      ..

-M 1 -I 0 -R 0: 1m  5.713s      0m 49.800s      -0.35s
-M 1 -I 0 -R 1: 1m 15.305s      0m 58.975s      -1.2s
-M 1 -I 0 -R 2: 1m 34.057s      1m 17.090s      -0.5s
-M 1 -I 1 -R 0: 2m  5.928s      1m 49.700s      -1.3s
-M 1 -I 1 -R 1: 1m 59.019s      1m 41.955s      -0.6s
-M 1 -I 1 -R 2: 2m 49.149s      2m 31.440s      +19.2s

-M 2 -I 0 -R 0: 1m  0.503s      0m 25.930s      -5.5s
-M 2 -I 0 -R 1: 0m 53.418s      0m 58.950s      -23s
-M 2 -I 0 -R 2: 1m  7.418s      1m 18.145s      -27s
-M 2 -I 1 -R 0: 1m 54.534s      1m 50.060s      -13s
-M 2 -I 1 -R 1: 1m 15.489s      0m  1.040s -- uhm...?
-M 2 -I 1 -R 2: 1m 54.720s      1m 16.720s      -36s

-M 3 -I 0 -R 0: 0m 57.533s      0m 50.610s      -8.5s
-M 3 -I 0 -R 1: 0m 51.541s      0m 40.265s      -25s
-M 3 -I 0 -R 2: 1m  5.996s      0m 54.325s      -29s
-M 3 -I 1 -R 0: 1m 50.570s      1m 49.715s      -17s
-M 3 -I 1 -R 1: 1m 14.462s      1m  8.530s      -45s
-M 3 -I 1 -R 2: 1m 36.192s      0m 52.145s      -54s
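
For anyone wanting to redo the arithmetic on the table, here is a small
sketch that turns the "Xm Y.YYYs" strings into seconds and derives the
change against a baseline plus the user/real ratio. The function names
are made up for this example:

```python
import re


def parse_time(text):
    """Convert a string such as '1m  6.082s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*([\d.]+)s?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarise(real, user, baseline_real):
    """Return the real-time change vs. baseline and the user/real ratio."""
    real_s = parse_time(real)
    user_s = parse_time(user)
    return {
        "delta_s": round(real_s - parse_time(baseline_real), 3),
        "cpu_ratio": round(user_s / real_s, 2),
    }


if __name__ == "__main__":
    # -M 3 -I 0 -R 0 against the -M 0 -I 0 -R 0 baseline.
    print(summarise("0m 57.533s", "0m 50.610s", "1m  6.082s"))
```

A user/real ratio near 2.0 would mean both CPUs were kept busy for the
whole run; a ratio near 1.0 means roughly one CPU's worth of work.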

Interestingly, and I think this has to do with the I/O buffering, -M 0
is slower than -M 1 by a small fraction in every test except -I 1 -R 2.
And as Steven Shultz had suggested, -I 1 is a bad, bad idea. It never
improved performance, and in fact made it quite a bit worse (the man
page is right :). (Of course, -M 1 will be at least two processes, and
since I have a real dual-CPU system, that makes sense; it may not hold
true on a single CPU.)

Also, in -I 1 mode, encoding with one B frame is a touch faster than
encoding with none, but two B frames are slower than one. I find this
interesting. I would have expected a single B frame to take a bit longer
than none at all, and that is the case with -I 0, but not with -I 1.
Any ideas on that one?

In the end, -M 3 is not much faster at -I 0 -R 0, but it flies along at
-I 0 -R 2 compared to baseline. It gets fair gains at -I 0 -R 1 too, and
that run finishes another 14 seconds sooner than -I 0 -R 2 for the same
frameset. So, does this boil down to the fastest being -M 3 -I 0 -R 1?

The numbers on -M 3 -I 1 -R 2 show a 54 second improvement over the
same test with -M 0, but it still takes almost twice as long as
-M 3 -I 0 -R 1 (1m 36s against 52s). The file size of 3-1-2 is
13,807,067 and the file size of 3-0-1 is 13,402,673. So the 3-0-1 file
is smaller and is encoded faster, and viewing them now, the quality is
at least on par (3-0-1 looked a tad better).
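
As a quick check on that comparison, a sketch using only the real times
and file sizes quoted above (the variable and function names are just
for this example):

```python
def relative_change(new, old):
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100.0


# Real times in seconds, taken from the table.
time_312 = 1 * 60 + 36.192   # -M 3 -I 1 -R 2
time_301 = 0 * 60 + 51.541   # -M 3 -I 0 -R 1

# Output file sizes as quoted in the text.
size_312 = 13_807_067
size_301 = 13_402_673

extra_time_pct = relative_change(time_312, time_301)
size_saving_pct = -relative_change(size_301, size_312)

if __name__ == "__main__":
    print(f"3-1-2 takes {extra_time_pct:.1f}% longer than 3-0-1")
    print(f"3-0-1 output is {size_saving_pct:.2f}% smaller than 3-1-2")
```

On these figures 3-1-2 comes out roughly 87% slower than 3-0-1, and the
3-0-1 file is about 2.9% smaller.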

> - There is also a parallel read-ahead thread but this rarely soaks much CPU on 
> modern CPUs.
> 
> The MPEG_DEVEL branch encoder stripes all encoding phases to allow much more 
> scalable parallelisation.  You might want to give it a go - I'd be interested 
> in the results!

I'd love to, but I couldn't find it in CVS. I found everything else in
the SF CVS branch, but not mjpegtools itself.

> N.b. in a 'realistic' scenario you're running the multiplexer and audio 
> encoding in parallel with the encoder and video filters communicating via 
> pipes and named FIFO's.   This setup usually saturate a modern dual machine 

No multiplexing and no audio encoding here (AC3 pass-through and
multiplexing of DVD streams is done after the video encoding completes).
There is the overhead of decoding the original MPEG2 stream into YUV,
but that's about all that transcode (which I'm using) otherwise pushes
into the pipe. I avoided even that on this run by feeding in a file that
was already decoded (pgmtoy4m output).

> cheers,
> 
>       Andrew
> PS
> I'm away on vacation for a couple of weeks from friday so there'll be a bit of 
> pause in answering emails / posts from then ;-)

_______________________________________________
Mjpeg-users mailing list
https://lists.sourceforge.net/lists/listinfo/mjpeg-users