Thilo Borgmann wrote:
> Michael Niedermayer wrote:
>> On Wed, Oct 21, 2009 at 12:33:21PM +0200, Thilo Borgmann wrote:
>>> Michael Niedermayer wrote:
>>>> On Tue, Oct 20, 2009 at 03:00:40PM +0200, thilo.borgmann wrote:
>>>>> Author: thilo.borgmann
>>>>> Date: Tue Oct 20 15:00:40 2009
>>>>> New Revision: 5419
>>>>>
>>>>> Log:
>>>>> Splits reading of block data and decoding of block data.
>>>>> Introduces ALSBlockData struct.
>>>> You are missing the "why" part, that should be explained in the commit
>>>> message
>>> Yes, sorry.
>>>
>>>> also this needs a benchmark as there are many additional dereferences
>>>> added
>>> It is a necessary evil to support MCC. If the "old" way were faster
>>> for non-MCC files, would that be reason to have both, a split read &
>>> decode function pair and an all-in-one function?
>> I think a benchmark is useful to judge if we should spend time thinking
>> about alternatives to the many dereferences or not
>>
>
> The combined (old) function:
>
> 848450 dezicycles in combined, 1 runs, 0 skips
> 436625 dezicycles in combined, 2 runs, 0 skips
> 422562 dezicycles in combined, 4 runs, 0 skips
> 251822 dezicycles in combined, 8 runs, 0 skips
> 275631 dezicycles in combined, 16 runs, 0 skips
> 244726 dezicycles in combined, 32 runs, 0 skips
> 206217 dezicycles in combined, 64 runs, 0 skips
> 179422 dezicycles in combined, 119 runs, 9 skips
> 179422 dezicycles in combined, 119 runs, 137 skips
>
> The separate (new) functions:
>
> 984100 dezicycles in separate, 1 runs, 0 skips
> 499555 dezicycles in separate, 2 runs, 0 skips
> 534420 dezicycles in separate, 4 runs, 0 skips
> 369905 dezicycles in separate, 8 runs, 0 skips
> 340817 dezicycles in separate, 16 runs, 0 skips
> 280026 dezicycles in separate, 32 runs, 0 skips
> 263883 dezicycles in separate, 64 runs, 0 skips
> 231872 dezicycles in separate, 119 runs, 9 skips
> 231872 dezicycles in separate, 119 runs, 137 skips
>
> This is roughly a 30% difference (231872 vs. 179422 dezicycles), which
> makes me want to try these alternatives.
>
> What comes to mind would be to use local copies, thus dereferencing
> the fields of *bd just twice: once at the top and once at the bottom of
> the function.
>
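
To make the local-copy idea concrete, here is a minimal sketch in C. It is
not code from the patch: the struct BlockDataSketch, its fields, and both
function names are hypothetical stand-ins for ALSBlockData and the decode
function. Variant A touches bd-> on every loop iteration; variant B reads
the fields once at the top and writes the changed one back at the bottom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ALSBlockData; the field names are assumptions. */
typedef struct BlockDataSketch {
    unsigned int block_length; /* number of samples in the block      */
    unsigned int shift_lsbs;   /* left shift per sample, must be < 32 */
    uint32_t     checksum;     /* running sum, updated by the decoder */
    int32_t     *raw_samples;  /* buffer with block_length entries    */
} BlockDataSketch;

/* Variant A: every field access inside the loop goes through bd->. */
static void shift_samples_deref(BlockDataSketch *bd)
{
    assert(bd && bd->raw_samples);

    for (unsigned int i = 0; i < bd->block_length; i++) {
        /* shift in unsigned arithmetic to avoid signed-shift UB */
        bd->raw_samples[i] =
            (int32_t)((uint32_t)bd->raw_samples[i] << bd->shift_lsbs);
        bd->checksum += (uint32_t)bd->raw_samples[i];
    }
}

/* Variant B: read the fields once at the top, write back at the bottom. */
static void shift_samples_local(BlockDataSketch *bd)
{
    assert(bd && bd->raw_samples);

    /* top: one round of dereferences into local copies */
    const unsigned int block_length = bd->block_length;
    const unsigned int shift_lsbs   = bd->shift_lsbs;
    uint32_t           checksum     = bd->checksum;
    int32_t           *raw_samples  = bd->raw_samples;

    for (unsigned int i = 0; i < block_length; i++) {
        raw_samples[i] = (int32_t)((uint32_t)raw_samples[i] << shift_lsbs);
        checksum += (uint32_t)raw_samples[i];
    }

    /* bottom: write the modified field back to the struct */
    bd->checksum = checksum;
}
```

Both variants should produce identical results; only the number of loads
through bd differs. Whether that shows up in a benchmark depends on how
well the compiler already keeps the fields in registers, since stores
through raw_samples may or may not be assumed to alias *bd.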
I tested using local copies instead of dereferencing:

10823450 dezicycles in local copies, 1 runs, 0 skips
6122845 dezicycles in local copies, 2 runs, 0 skips
4420565 dezicycles in local copies, 4 runs, 0 skips
3557323 dezicycles in local copies, 8 runs, 0 skips
2553006 dezicycles in local copies, 16 runs, 0 skips
2554690 dezicycles in local copies, 32 runs, 0 skips
2424406 dezicycles in local copies, 64 runs, 0 skips
2535575 dezicycles in local copies, 128 runs, 0 skips
2242664 dezicycles in local copies, 256 runs, 0 skips

69085900 dezicycles in dereferences, 1 runs, 0 skips
35455330 dezicycles in dereferences, 2 runs, 0 skips
19061607 dezicycles in dereferences, 4 runs, 0 skips
10732197 dezicycles in dereferences, 8 runs, 0 skips
6036062 dezicycles in dereferences, 16 runs, 0 skips
3893601 dezicycles in dereferences, 32 runs, 0 skips
3105304 dezicycles in dereferences, 64 runs, 0 skips
2732319 dezicycles in dereferences, 128 runs, 0 skips
2333672 dezicycles in dereferences, 256 runs, 0 skips

That's a 4% gain, so I think local copies don't pay off...
Other alternatives?

-Thilo
_______________________________________________
FFmpeg-soc mailing list
FFmpeg-soc@mplayerhq.hu
https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-soc