Hi Steven,  Trent,

> But what about bit allocation?  You need to know how big the last GOP was
> to figure out how many bits you can use for the next GOP.

Actually, this is not such a big deal provided the GOPs are well separated.  
Simplifying a little, you just need to ensure that the decoder buffer is at 
least as full at the end of each 'chunk' as you assumed it to be when 
starting to encode its successor.
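
To make the buffer condition concrete, here is a minimal sketch (names and layout are mine, not mpeg2enc's actual code) of tracking decoder (VBV-style) buffer fullness across a chunk and checking the splice condition:

```cpp
#include <vector>

// Sketch (illustrative names): track decoder buffer fullness across a
// chunk.  Each frame interval the channel delivers bits_per_frame into
// the buffer; decoding frame i then drains frame_bits[i] out of it.
double fullness_after_chunk(double start_fullness,
                            double bits_per_frame,
                            const std::vector<double>& frame_bits)
{
    double fullness = start_fullness;
    for (double b : frame_bits) {
        fullness += bits_per_frame;  // channel fills the buffer
        fullness -= b;               // decoder removes the frame's bits
    }
    return fullness;
}

// Two independently encoded chunks splice cleanly if the first ends
// with at least the fullness assumed when encoding the second.
bool splice_ok(double end_fullness, double assumed_start_fullness)
{
    return end_fullness >= assumed_start_fullness;
}
```

Real VBV accounting also has to respect the buffer's upper bound, but the splice condition itself is just this one inequality.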

However, this idea came to mind more as a sneaky way of doing accurately sized 
single-pass encoding: work on multiple 'segments' spread across the video 
sequence so you get a good statistical sample of how your total 
bit-consumption is going relative to your target.  This is rotten for 
parallelism though, because you have two more or less totally uncorrelated 
memory footprints.  For DVD, 'segments' would kind of naturally correlate 
with 'chapters' at the authoring level.

In the MPEG_DEVEL branch the encoding of each frame (apart from the bit-packed 
coding and bit allocation, which is only a small fraction of the CPU load) is 
simply striped across the available CPUs.  This has a nice side effect of 
reducing each CPU's working set too, as it only deals with a fraction of a 
frame.
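
The striping itself is conceptually simple; a minimal sketch (plain std::thread, illustrative names, not the actual MPEG_DEVEL code) of handing disjoint macroblock-row stripes to workers:

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Sketch (not mpeg2enc's actual code): stripe the macroblock rows of a
// frame across the available CPUs.  Each worker touches only its own
// stripe, so each CPU's working set is a fraction of the frame.
void for_each_stripe(int mb_rows, int num_cpus,
                     const std::function<void(int first, int last)>& work)
{
    std::vector<std::thread> workers;
    int rows_per_cpu = (mb_rows + num_cpus - 1) / num_cpus;
    for (int c = 0; c < num_cpus; ++c) {
        int first = c * rows_per_cpu;
        int last = std::min(mb_rows, first + rows_per_cpu);
        if (first >= last) break;
        workers.emplace_back(work, first, last);  // one thread per stripe
    }
    for (auto& t : workers) t.join();
}
```

Since the stripes are disjoint, the workers need no locking on the pixel data; only the bit-packing stage has to serialise.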

Having said all that I'll probably simply do a simple two-pass encoding mode 
first (much simpler frame feeding!).


> > Of course, Andrew would be much better suited to discuss mpeg2enc's
> > memory access patterns during encoding, which depending on how it
> > does go about accessing memory can better make use of the 256k of
> > cache, or cause the 256k of cache to be constantly thrashed in and
> > out.
>
> I seem to recall that one of the biggest performance bottlenecks of
> mpeg2enc is the way it accesses memory.  It runs each step of the encoding
> process on an entire frame at a time.  It's much more cache friendly to run
> every stage of the encoding process on a single macroblock before moving on
> to the next macroblock.

The single-macroblock approach has been implemented for quite some time now 
(since the move to C++ roughly).  In rather basic English speed improved 
by... bugger all.  I was *most* surprised, it could well be that the story is 
rather different on multi-CPU machines.  At least I like to hope the work 
wasn't wasted ;-)

Actually, the memory footprint of encoding is much larger than you'd think.  
Remember each 16x16 int16_t difference macroblock gets generated from nastily 
unaligned 16x16 or 16x8 uint8_t predictors and a 16x16 uint8_t picture 
macroblock.  The difference is then DCT-ed in place into 4 8x8 int16_t DCT 
blocks, which are then quantised into 4 8x8 int16_t quantised DCT blocks.
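
As a back-of-envelope check, the per-macroblock working set from the buffers just described adds up like this (sizes only; the names are illustrative, not mpeg2enc's):

```cpp
#include <cstdint>

// Rough per-macroblock working set from the buffers described above.
int macroblock_working_set_bytes()
{
    int picture_mb = 16 * 16 * sizeof(uint8_t);   // 256: source macroblock
    int predictor  = 16 * 16 * sizeof(uint8_t);   // 256: unaligned predictor
    int diff_mb    = 16 * 16 * sizeof(int16_t);   // 512: difference, DCT-ed in place
    int quantised  = 4 * 8 * 8 * sizeof(int16_t); // 512: quantised DCT blocks
    return picture_mb + predictor + diff_mb + quantised;
}
```

That is already 1.5K per macroblock before you count the reference frames the predictors are fetched from, which is where the real cache pressure comes from.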

Where mpeg2enc could speed up is:

- DCT blocks are kept in 'correct' rather than transposed form.  This is a 
waste, as by transposing the quantiser matrices and the scan sequence you can 
skip the transposition entirely.

- Each quantised DCT block is separately stored.  Nice for debugging, poor for 
memory performance ;-)

- DCT is not combined with quantisation where this is possible.

- Motion estimation (probably wastefully) computes a lot of variances that 
could probably better be replaced by SAD for fast encoding modes.

- The current GOP sizing approach is wasteful.   Frame type should only be 
decided once the best encoding modes (Intra, various inter motion prediction 
modes) are known.  Basically, you turn a B/P frame into an I frame if you've 
reached your GOP length limit or it has enough Intra coded blocks that it is 
more compact that way.   Unfortunately, the current allocation algorithm 
still has a few 'left over' elements that need to know the GOP size in 
advance, and these need replacing before this can be fixed.   I'm currently 
working on bit-allocation (basically, a two-pass / look-ahead mode plus the 
above improvement).

A similar approach can be used for deciding B/P frame selection, but this is 
expensive in CPU as you basically have to encode each potential B frame's 
reference frame twice.  I'm playing around with ideas for trying B frames out 
and, if they don't seem worthwhile, turning them off and then periodically 
checking whether it might make sense to turn them on again.


Andrew



_______________________________________________
Mjpeg-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mjpeg-users
