On Thu, Aug 16, 2007 at 11:30:37PM +0200, Marco Gerards wrote:
> Michael Niedermayer <[EMAIL PROTECTED]> writes:
>
> Hi,
>
> > On Thu, Aug 16, 2007 at 07:37:24PM +0200, Marco Gerards wrote:
> > [...]
> >> > [...]
> >> >
> >> > dirac_arith.h:
> >> > [...]
> >> >> typedef struct dirac_arith_state {
> >> >> /* Arithmetic decoding. */
> >> >> unsigned int low;
> >> >> unsigned int range;
> >> >> unsigned int code;
> >> >> unsigned int bits_left;
> >> >> int carry;
> >> >> unsigned int contexts[ARITH_CONTEXT_COUNT];
> >> >>
> >> >> GetBitContext *gb;
> >> >> PutBitContext *pb;
> >> >
> >> > GetBitContext gb;
> >> > PutBitContext pb;
> >> > would avoid a pointer dereference if these 2 are used in the innermost
> >> > loops, though maybe their use can be avoided by reading/writing 8bit
> >> > at a time similar to how our cabac reader and range coder works
> >>
> >> Can I just copy the GetBitContext, PutBitContext at initialization
> >> time, or is there a more efficient way to do this?
> >
> > no, its ok
>
> Do you mean it is ok to copy them?yes its ok to copy for the arith coder [...] > > [...] > >> > > >> > this is VERY inefficient, most of what is done here does not need to be > >> > done > >> > per pixel or can be precalculated > >> > >> Absolutely! gprof agrees with you ;-) > >> > >> Actually, this comment applies to all code above, I assume. > >> > >> time seconds seconds calls ms/call ms/call name > >> 40.85 2.41 2.41 88939320 0.00 0.00 upconvert > >> 29.49 4.15 1.74 50 34.80 113.59 dirac_decode_frame > >> 10.34 4.76 0.61 417867 0.00 0.00 motion_comp_block1ref > >> 6.78 5.16 0.40 540 0.74 0.74 dirac_subband_idwt_53 > >> 4.24 5.41 0.25 8689089 0.00 0.00 coeff_unpack > >> 4.07 5.65 0.24 8707719 0.00 0.00 dirac_arith_read_uint > >> > >> Luca and I talked about if I would concentrate on further > >> optimizations or on the encoder and he said he preferred having me > >> working on the encoder for now, because I know Dirac quite well by now > >> (Luca, please tell me if I am wrong in any way). > >> > >> So there is quite some room for optimizations. In the specification, > >> the used pseudo code looped over the pixels. My current code at least > >> doesn't do that. But I agree, the MC code is where a lot of > >> optimizations can be made: > >> > >> - The qpel/eightpel interpolation code is very inefficient and perhaps > >> has to be merged with motion_comp_block1ref and > >> motion_comp_block2ref. In that case it can be optimized. We > >> especially have to take care of border cases to avoid clipping, more > >> or less like I did for halfpel interpolation. > > > > simply making upconvert() work not with 1 pixel but with a block of pixels > > should make it alot more efficient > > I moved this into motion_comp_block1ref/motion_comp_block2ref. This > works quite well and boosts the speed of the decoder by 50% and makes > further optimizations easier. Hopefully you do not have problems with > this approach. iam fine with it [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws. -- Plato
signature.asc
Description: Digital signature
_______________________________________________ FFmpeg-soc mailing list [email protected] http://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-soc
