On Friday 20 April 2007 10:39, you wrote:

> The primary conversion we do isn't planar -> packed (this is a fallback
> for when the video is obscured), but from planar to another custom
> planar format.  It would be good to get ARM assembly for the fallback
> path, but most of the problem when using packed lies in having to
> transfer the much larger amount of data over the bus.

It is only a matter of definition :) Whether you call it packed or planar,
this YUV420 format is not YV12. So it still needs a conversion, which is done
purely by reordering bytes and is not much different from the conversion to
packed YUY2 (except that it requires less space and bandwidth).

> There's one optimisation that could be done for the YUV420 conversion
> (the custom planar format that Hailstorm takes), which removes a branch,
> ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
> pixel), and unrolls a loop by half.  Might be interesting to see what
> effect this has, but I think it'll still be rather small.

My main performance concern is exactly this 'omapCopyPlanarDataYUV420'
function. My experience optimizing the Nokia 770 video output code shows
that the effect of optimization can be really huge (a 1.5x improvement on the
Nokia 770 for unscaled YV12 -> YUY2 conversion, going from a simple loop in C
to optimized assembly code; I provided a link to the relevant code in my
previous post). But the N800 code can probably be improved even more, because
it currently contains an unnecessary branch in the inner loop, and branches
are expensive on long-pipeline CPUs. Such color format conversion should
perform comparably to memcpy if done right (it runs at about half of memcpy
speed on the Nokia 770 for unscaled YV12 -> YUY2 conversion).
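
To illustrate the kind of loop being discussed, here is a minimal C sketch of
an unscaled YV12 -> YUY2 conversion with no conditional inside the inner loop
body. It is neither the Nokia 770 code referred to above nor the xserver's
'omapCopyPlanarDataYUV420'; the function name yv12_to_yuy2 and its parameters
are made up for illustration, and it assumes even width and height.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the xserver or MPlayer sources):
 * convert one YV12 frame to packed YUY2. Assumes width and height are even.
 * YV12: full-size Y plane plus V and U planes subsampled 2x2.
 * YUY2: bytes Y0 U Y1 V for every horizontal pair of pixels.
 */
static void yv12_to_yuy2(uint8_t *dst, size_t dst_stride,
                         const uint8_t *y, size_t y_stride,
                         const uint8_t *u, const uint8_t *v, size_t uv_stride,
                         size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++) {
        const uint8_t *yrow = y + row * y_stride;
        /* Two luma rows share one chroma row (4:2:0 subsampling). */
        const uint8_t *urow = u + (row / 2) * uv_stride;
        const uint8_t *vrow = v + (row / 2) * uv_stride;
        uint8_t *out = dst + row * dst_stride;

        /* No conditional in the body: one pixel pair in, four bytes out. */
        for (size_t x = 0; x < width / 2; x++) {
            out[4 * x + 0] = yrow[2 * x];
            out[4 * x + 1] = urow[x];
            out[4 * x + 2] = yrow[2 * x + 1];
            out[4 * x + 3] = vrow[x];
        }
    }
}
```

Each iteration reads two luma bytes and one byte from each chroma plane and
writes four output bytes. An assembly version could combine these into wider
loads and stores, which this sketch does not attempt.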

But only benchmarks can be real proof; premature speculation is useless and
even harmful. Do you remember the times when nobody at Nokia believed that
the ARM core could be good for video decoding on the 770? ;-)

Testing with Nokia_N800.avi video on N800:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  29,525s VO:  15,029s A:   0,453s Sys:  59,919s =  104,925s
BENCHMARK%: VC: 28,1390% VO: 14,3232% A:  0,4313% Sys: 57,1065% = 100,0000%
BENCHMARKn: disp: 2511 (23,93 fps)  drop: 0 (0%)  total: 2511 (23,93 fps)

Enabling direct rendering (avoids an extra memcpy in mplayer, but requires
disabling the OSD menu):
#  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi

BENCHMARKs: VC:  29,826s VO:  12,365s A:   0,437s Sys:  60,555s =  103,182s
BENCHMARK%: VC: 28,9058% VO: 11,9833% A:  0,4236% Sys: 58,6873% = 100,0000%
BENCHMARKn: disp: 2504 (24,27 fps)  drop: 0 (0%)  total: 2504 (24,27 fps)

Testing the same video on Nokia 770:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  44,982s VO:   7,998s A:   0,884s Sys:  47,936s =  101,801s
BENCHMARK%: VC: 44,1862% VO:  7,8568% A:  0,8688% Sys: 47,0882% = 100,0000%
BENCHMARKn: disp: 2502 (24,58 fps)  drop: 0 (0%)  total: 2502 (24,58 fps)


So the Nokia 770, with a slower CPU, slower memory and a less efficient
output format (16bpp vs. 12bpp), still requires less time for video output
than the N800 (7,998s vs. 12,365s). Graphics bus performance is not a factor
here, as the transfer is an asynchronous operation and is fast enough. Surely
the N800 also has some extra overhead because of interprocess communication
with the xserver, but it looks like YV12 -> YUV420 conversion is quite a
bottleneck here too.
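
To put these totals on a per-frame basis, the VO time can be divided by the
number of displayed frames. The helper below is only an illustration (the
name vo_ms_per_frame is mine); the inputs are the VO and disp values from the
benchmark output above.

```c
/* Average video-output cost per displayed frame, in milliseconds.
 * Illustrative helper; not part of mplayer. */
static double vo_ms_per_frame(double vo_seconds, unsigned frames)
{
    return vo_seconds * 1000.0 / frames;
}
```

With the numbers above, vo_ms_per_frame(12.365, 2504) gives roughly 4.9 ms
per frame for the N800 with direct rendering, and
vo_ms_per_frame(7.998, 2502) gives roughly 3.2 ms per frame for the 770.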

It should be noted that while the Nokia_N800.avi video has a low resolution
and the N800 has no problems decoding and displaying it, our goal is higher
resolution videos such as 640x480. Moving to higher resolutions will increase
the color format conversion overhead. As can be seen from these benchmarks,
video output on the N800 already takes a significant amount of time compared
with the time needed for decoding (29,826s for decoding, 12,365s for video
output).
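
As a rough illustration of how much data has to be written per frame at
640x480, the frame size follows directly from the bits per pixel (12 for the
4:2:0 formats, 16 for YUY2). The helper name frame_bytes is made up for this
example.

```c
#include <stddef.h>

/* Bytes in one frame at the given bits per pixel.
 * Illustrative helper: 12 bpp for 4:2:0 planar data, 16 bpp for YUY2. */
static size_t frame_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}
```

For 640x480 this gives 460800 bytes per frame at 12bpp and 614400 bytes per
frame at 16bpp, and the conversion has to touch that much output for every
displayed frame.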

I can write assembly-optimized code for YV12 -> YUV420 conversion. Is there
any chance that such an optimization could also be integrated into the
xserver in one of the next firmware updates, if it really provides a
significant performance improvement?

The N800 is almost able to play VGA resolution videos properly; it only needs
a bit more optimization. Color format conversion performance for video output
is one of the important things that can be improved.

> > So for any performance optimizations experiments which result in
> > immediate video performance improvement, either direct framebuffer access
> > should be used again or it would be very nice if xserver could provide
> > direct access to framebuffer (video planes) in yuy2 and that custom
> > yuv420 format in one of the next firmware updates. The xserver itself
> > should not do any excess memory copy operations as they degrade
> > performance (and it does such copy for yuy2 at least).
>
> 'Direct framebuffer access'?  As in, just hand you a pointer to a
> framebuffer somewhere and let you write straight to it?  As this would
> require a firmware update anyway, I don't really see how this would
> improve matters too much, and I really don't want to write any more
> Maemo-specific extensions (I've been working very hard to kill XSP).

Direct framebuffer access would eliminate the need for the extra memcpy while
still allowing the OSD menu and subtitles to be used, and would make
everything much easier (currently this is how MPlayer works on the Nokia 770).
You can compare the benchmark results with direct rendering enabled and
disabled above. It saves ~3 seconds of CPU time when playing the
Nokia_N800.avi video.

Direct rendering makes it possible to use Xv buffers and decode video
in-place. But unfortunately, as the data in these buffers is used as
reference frames for decoding the next frames, the buffers must not be
modified. All of this makes implementing OSD and subtitles tricky.

Having direct access to the framebuffer eliminates the need for this direct
rendering technique and saves us from the complexities associated with it.

> > Also I'm curious about that yuv420 format. From the comments in your
> > code, it looks like it is different from what is described in Epson docs.
> > That seems a bit weird.
>
> Which Epson docs?

The one mentioned by Frantisek. Well, it was just the comment for the
'omapCopyPlanarDataYUV420' function that was wrong and misleading,
never mind :-) Now everything is clear.
