Re: N800 Video playback
Hi,

The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big?

Siarhei's calculations were correct, so, yes.

What is limiting the bandwidth: the OMAP interface, the LCD controller itself, or was it a design issue?

a) and c). It's just not stable at higher frequencies.

Just curious - is there any word out about the N810 regarding this particular issue? (As previously mentioned, my personal killer app for Maemo is full screen 800x480 video @ 30 fps. Will it be possible?)

Thanks!
Hanno
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
On Thursday 03 May 2007 10:21, Frantisek Dufka wrote:

Siarhei Siamashka wrote:

If decoding time for each frame will never exceed 28-29ms (which is a tough limitation, cpu usage is not uniform), video playback without dropping any frames will be possible even with tearsync enabled.

Would a double or multiple buffering help with this?

Yes, most likely it will. N800 has 800x480 virtual size for framebuffer and a new enhanced screen update ioctl. Now it should be possible (I have not tried it yet, but will have some results very soon) to specify output position and size for the rectangle as it gets displayed on the screen.

    struct omapfb_update_window {
        __u32 x, y;
        __u32 width, height;
        __u32 format;
        __u32 out_x, out_y;
        __u32 out_width, out_height;
        __u32 reserved[8];
    };

This theoretically allows us to use some kind of double buffering: we can split the framebuffer into two 400x480 parts, and while one part is being displayed, the other one can be freely filled with the data for the next frame. This will effectively remove the need for OMAPFB_SYNC_GFX, improving peak framerate. But this solution will require support for arbitrary downscaling in YUV420 format for each video frame to fit a 400x480 box. The quality will also be reduced a bit, but on the other hand, the graphics bus should have no performance problems with sending 400x480 through it. If the virtual framebuffer size could be extended to 800x960, this would allow us to use double buffering without sacrificing resolution.

Anyway, I'll try to fix the MPlayer framebuffer output module to work properly with the latest version of N800 firmware and implement this form of double buffering. It should provide the fastest video output performance that is possible.

Regarding Nokia 770, it now uses an 800x600 framebuffer virtual size (some extra waste of RAM?). Anyway, if the hwa742 kernel driver could be extended to support this improved screen update API and respect the 'out_x' and 'out_y' arguments, we could have four video pages in framebuffer memory for 400x240 pixel-doubled video output. That could allow a very efficient double buffering implementation for the accelerated Nokia 770 SDL project, if it ever gets off the ground :)

Does mplayer use different threads for displaying and decoding and decode frames in advance?

No, it doesn't have any extra threads now. But video playback on Nokia 770 is already parallel, splitting tasks between the following pieces of hardware, each working simultaneously:

1. ARM core (demuxing and decoding video into framebuffer)
2. DMA + graphics controller (screen update transferring data from framebuffer into video memory and performing YUV-RGB conversion on the fly)
3. C55x DSP core (mp3 audio decoding and playback)

There is not much point in creating many threads on ARM, as we only have a single ARM core and splitting work into several threads will not accelerate overall performance. Threads could be useful for doing something extra while waiting for other hardware components to finish their work (waiting for screen update, for example), but decoding ahead will also require storing the decoded data somewhere. The only place for storing frames decoded ahead would be some extra space in framebuffer memory, otherwise we would lose some performance on moving this data to the framebuffer later (and increase battery power consumption). As framebuffer space is limited, we would not be able to store many frames ahead, and decoding cpu usage most likely varies not between frames but more like between different scenes (a complicated action scene will make us run out of the decode-ahead buffer pretty fast).

Anyway, this may be worth trying later; there even exists a thread-based MPlayer fork: http://mplayerxp.sourceforge.net/
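The split-framebuffer idea from this message can be sketched roughly as follows. This is only an illustration under assumptions: the struct layout is copied from the message, `uint32_t` stands in for `__u32` so that it builds without kernel headers, `page_update_window` and `next_page` are hypothetical helper names, and the sketch assumes the driver stretches the 400-pixel-wide source page to the `out_*` rectangle (which the message itself describes as untested). No ioctl is issued here.

```c
#include <stdint.h>
#include <string.h>

/* Layout as quoted in the message; uint32_t replaces __u32 so the
 * sketch compiles outside the kernel tree. */
struct omapfb_update_window {
    uint32_t x, y;
    uint32_t width, height;
    uint32_t format;
    uint32_t out_x, out_y;
    uint32_t out_width, out_height;
    uint32_t reserved[8];
};

enum { FB_WIDTH = 800, FB_HEIGHT = 480, PAGE_WIDTH = FB_WIDTH / 2 };

/* Hypothetical helper: describe an update that reads page 0 or 1
 * (the left or right 400x480 half of the framebuffer) and asks for
 * it to be shown on the whole 800x480 screen. */
struct omapfb_update_window page_update_window(unsigned page, uint32_t format)
{
    struct omapfb_update_window w;

    memset(&w, 0, sizeof(w));
    w.x = (page & 1u) * PAGE_WIDTH;
    w.y = 0;
    w.width = PAGE_WIDTH;
    w.height = FB_HEIGHT;
    w.format = format;
    w.out_x = 0;
    w.out_y = 0;
    w.out_width = FB_WIDTH;
    w.out_height = FB_HEIGHT;
    return w;
}

/* Page that can be filled with the next frame while `shown` is
 * being transferred to the screen. */
unsigned next_page(unsigned shown)
{
    return (shown ^ 1u) & 1u;
}
```

A player loop would decode into `next_page(shown)`, request an update with the matching window, and then swap the two roles.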
Re: N800 Video playback
On Friday 04 May 2007 10:49, Daniel Stone wrote:

On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote:

Well, found what's the matter and added explanation at bugzilla: https://maemo.org/bugzilla/show_bug.cgi?id=1281

The workaround can be easily added to MPlayer, so that it will never call XvShmPutImage with top left image corner at an odd line. I'm going to release an updated MPlayer package (maybe even a bit later today), it is really fast on N800 with the optimized xserver :)

Aha, that will indeed cause a fallback (x, y, width and height should all be aligned to 4px).

Could you clarify this information? The code from kernel framebuffer driver (blizzard.c) suggests that only width should be 4px aligned:

    switch (color_mode) {
    case OMAPFB_COLOR_YUV420:
        /* Embedded window with different color mode */
        bpp = 12;
        /* X, Y, height must be aligned at 2, width at 4 pixels */
        x &= ~1;
        y &= ~1;
        height = yspan = height & ~1;
        width = width & ~3;
        break;

Does xserver introduce additional limitations?
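For illustration, the alignment rule stated in the quoted blizzard.c comment (x, y and height at 2 pixels, width at 4) can be expressed as a small standalone check. The type and function names below are hypothetical, and the sketch only mirrors the kernel-side rule; whether the X server demands stricter 4px alignment is exactly the open question in this message.

```c
#include <stdint.h>

/* Hypothetical rectangle type for this sketch. */
struct yuv420_rect {
    uint32_t x, y, width, height;
};

/* Round a rectangle down to the constraints in the quoted comment:
 * x, y and height aligned at 2 pixels, width at 4 pixels. */
struct yuv420_rect align_yuv420_rect(struct yuv420_rect r)
{
    r.x &= ~1u;
    r.y &= ~1u;
    r.height &= ~1u;
    r.width &= ~3u;
    return r;
}

/* Nonzero if the rectangle already satisfies those constraints,
 * i.e. aligning it would not change anything. */
int yuv420_rect_is_aligned(struct yuv420_rect r)
{
    return (r.x & 1u) == 0 && (r.y & 1u) == 0 &&
           (r.height & 1u) == 0 && (r.width & 3u) == 0;
}
```

A player could run such a check before calling XvShmPutImage and nudge the picture by one line when the top left corner lands on an odd line.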
Re: N800 Video playback
On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote:

On Thursday 03 May 2007 08:48, Siarhei Siamashka wrote:

The only thing which is unclear here is that Hailstorm does not need to downscale video in this situation. The bug can be reproduced with 512x288 video which just needs upscaling to 800x450. Also even standard Nokia_N800.avi video with proper aspect ratio causes a huge performance regression and tearing. Please give this #1281 issue another look. It looks like a bug in xserver, but not a hardware limitation.

I can probably try to work around it by requesting not a 512x288 buffer from Xv, but something like 512x308, using only the 512x288 part of it and artificially adding black bands above and below. After that, Xv can be asked to expand it to 800x480 to get the expected result. But if it is a bug in xserver, it would be better to get it fixed, preferably before the next firmware update :)

Well, found what's the matter and added explanation at bugzilla: https://maemo.org/bugzilla/show_bug.cgi?id=1281

The workaround can be easily added to MPlayer, so that it will never call XvShmPutImage with top left image corner at an odd line. I'm going to release an updated MPlayer package (maybe even a bit later today), it is really fast on N800 with the optimized xserver :)

Aha, that will indeed cause a fallback (x, y, width and height should all be aligned to 4px).

Cheers,
Daniel
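The letterbox workaround mentioned in the quoted text (requesting 512x308 instead of 512x288 and scaling that to 800x480) comes down to a little arithmetic, sketched below. The function names are hypothetical, and rounding the top band down to an even row count is an assumption made here so that the picture starts on an even line.

```c
/* Smallest even buffer height such that a src_w-wide buffer scaled
 * to dst_w x dst_h keeps the picture's aspect ratio. */
unsigned padded_source_height(unsigned src_w, unsigned dst_w, unsigned dst_h)
{
    unsigned h = (src_w * dst_h + dst_w - 1) / dst_w; /* ceiling division */

    return (h + 1u) & ~1u; /* round up to an even number of lines */
}

/* Rows of black above the picture inside the padded buffer, kept
 * even so the picture itself starts on an even line.
 * Assumes picture_h <= padded_h. */
unsigned top_band_rows(unsigned padded_h, unsigned picture_h)
{
    return ((padded_h - picture_h) / 2u) & ~1u;
}
```

For the 512x288 clip on an 800x480 screen this gives a 308-line buffer with 10 black rows above the picture, matching the "something like 512x308" estimate in the message.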
Re: N800 Video playback
Siarhei Siamashka wrote: If decoding time for each frame will never exceed 28-29ms (which is a tough limitation, cpu usage is not uniform), video playback without dropping any frames will be possible even with tearsync enabled. Would a double or multiple buffering help with this? Does mplayer use different threads for displaying and decoding and decode frames in advance?
Re: N800 Video playback
On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:

Looks like I have to reply to myself.

On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:

Applied and build without problems for me.

Thanks a lot for building the package and putting it for download, everything seems to be fine, but more details will follow below.

[snip]

Anyway, the new xserver package works really well. If we do some tests with the standard Nokia_N800.avi video clip, we get the following results with the patched xserver:

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC: 29,764s VO: 7,666s A: 0,468s Sys: 64,635s = 102,534s
BENCHMARK%: VC: 29,0287% VO: 7,4767% A: 0,4565% Sys: 63,0381% = 100,%
BENCHMARKn: disp: 2504 (24,42 fps) drop: 0 (0%) total: 2504 (24,42 fps)

# mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
BENCHMARKs: VC: 30,266s VO: 5,490s A: 0,467s Sys: 66,286s = 102,509s
BENCHMARK%: VC: 29,5255% VO: 5,3554% A: 0,4560% Sys: 64,6631% = 100,%
BENCHMARKn: disp: 2501 (24,40 fps) drop: 0 (0%) total: 2501 (24,40 fps)

Results with unpatched xserver and some more explanations can be found in [3]. Yes, now N800 is faster than Nokia 770 for video output performance at last :)

Well, still not everything is so good until the following bug gets fixed: https://maemo.org/bugzilla/show_bug.cgi?id=1281

The patch for optimized Xv performance will not help to watch widescreen video which triggers this tearing bug. If you see tearing on the screen, you should know that the YUV420 color format conversion optimization patch does not get used at all and xserver most likely uses a slow nonoptimized YUV422 fallback code with software scaling.

Fixing this bug is critical for video playback performance. I hope it will be solved in the next version of N800 firmware too. But if we get some patch to solve this problem for testing earlier, that would be nice too.

Video output overhead on N800 is really at least halved. Of course, video output takes only some fraction of time in video player. So overall performance improvement for Nokia_N800.avi playback is approximately 20% but not 250%-300% which can be observed for 'omapCopyPlanarDataYUV420' function alone.

Before anybody noticed, correcting myself :) This 'omapCopyPlanarDataYUV420' has a 2.5x-3x improvement, which is equal to 150%-200% in percent. Elementary arithmetic is tough when you are tired
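The correction at the end of this message is easy to get wrong, so here it is as a tiny sketch: a k-times speedup corresponds to a (k - 1) * 100 percent improvement. The second helper recomputes the frame rate from the benchmark totals quoted above. Both function names are made up for this illustration.

```c
/* A k-times speedup expressed as a percentage improvement. */
double speedup_to_percent(double factor)
{
    return (factor - 1.0) * 100.0;
}

/* Average frame rate over a benchmark run. */
double frames_per_second(unsigned long frames, double seconds)
{
    return (double)frames / seconds;
}
```

With the numbers from the message, 2.5x and 3x map to 150% and 200%, and 2504 frames in 102,534 seconds give the reported 24,42 fps.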
Re: N800 Video playback
Kalle Vahlman wrote:

I put the deb up at: http://iki.fi/zuh/xserver-xomap_1.1.99.3-0.zuh2_armel.deb until I get it to the repository. This version also has the composite extension enabled, but AFAIK it does not depend on the libs or change server behaviour if composite is not specifically used. The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp -O2', but as I had troubles with the SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not guaranteeing it at the moment ;)

I also succeeded in making the deb: http://fanoush.wz.cz/maemo/xserver-xomap_1.1.99.3-0osso31_armel.deb

This one is compiled as thumb (except the ASM code) and no special CPU flags so it can be verified if there is any slowdown. Thumb mode saves approx. 300kb of executable size. It seems to be used by default in firmware images.

Kalle, did it link properly for you? With the patch the final Xomap link did not add the ASM code, I had to do it by hand. I didn't find proper place in Makefile for it to be added to libomap.a, the place patched by Siarhei was ignored by the build process for me.

Frantisek
Re: N800 Video playback
On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote:

On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:

Results with unpatched xserver and some more explanations can be found in [3]. Yes, now N800 is faster than Nokia 770 for video output performance at last :)

Well, still not everything is so good until the following bug gets fixed: https://maemo.org/bugzilla/show_bug.cgi?id=1281

The patch for optimized Xv performance will not help to watch widescreen video which triggers this tearing bug. If you see tearing on the screen, you should know that the YUV420 color format conversion optimization patch does not get used at all and xserver most likely uses a slow nonoptimized YUV422 fallback code with software scaling.

Indeed. And the reason the code is there is because Hailstorm can only downscale at fixed ratios (half and one-quarter), and even then, it locked up when we tried. Similarly, the display controller's downscaling didn't work, either. So we can optimise the fallback path, but you'll still be screwed by sending 16bpp (instead of 12bpp) through RFBI.

Fixing this bug is critical for video playback performance. I hope it will be solved in the next version of N800 firmware too. But if we get some patch to solve this problem for testing earlier, that would be nice too.

The only patch is optimising that function, really. Even if we did work out a way to make Hailstorm happy, you can still only scale at those exact multiples, which doesn't make it a viable general solution.

Cheers,
Daniel
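To put a number on the 16bpp versus 12bpp remark above, the per-frame payload can be computed directly. This is a minimal sketch with a hypothetical function name; it only counts pixel data and ignores any bus or protocol overhead.

```c
/* Bytes of pixel data in one frame at the given bits per pixel. */
unsigned long frame_bytes(unsigned width, unsigned height, unsigned bpp)
{
    return (unsigned long)width * height * bpp / 8u;
}
```

For a full 800x480 screen this is 576000 bytes at 12bpp (YUV420) against 768000 bytes at 16bpp, so the fallback path pushes one third more data through RFBI for every frame.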
Re: N800 Video playback
On Tue, May 01, 2007 at 08:49:20PM +0300, ext Siarhei Siamashka wrote: On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote: For testing, I fabricated some video with gstreamer: which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some reason 320x240 and 352x288 refused to play with: X11 error: BadValue (integer parameter out of range for operation) MPlayer interrupted by signal 6 in module: flip_page while gstreamer did play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died in the same way. This X11 error on video playback start and also sometimes on switching fullscreen/windowed mode is a known problem [1] reported in this mailing list. If MPlayer dies on start, usually trying to start it again succeeds. So these 320x240 and 352x288 videos could be played as well if you were a bit more persistent :) Resizing is a bit tricky. Most video hardware lets you use the hardware to clip, so if you move it beyond the edge of the screen, it just happily ignores anything beyond the hardware's bounds. Unfortunately for us, attempting to move a video surface off-screen (even by just a few pixels) triggers a hardware lockup. Given that we can't display the frame at all, we send BadValue (there are a couple of other conditions where this is possible, but this is the main one). I don't see the point in returning Success when no video is drawn at all. So, I guess you could hack mplayer's error handler to just ignore BadValues from Xv(Shm)PutImage, unless you get more than five or ten in a row, say. As Daniel replied in one of the followup messages, it is most likely some race condition. The question is which code is a suspect. Is it MPlayer Xv video output code that has been around for ages and worked fine on different systems or relatively new Xv extension code from N800 xserver? In addition, a previous revision of N800 firmware had a serious bug [2] related to video playback. 
It should be noted that MPlayer needed only about 1 minute to freeze on the initial N800 firmware. So the problem could be identified much more easily if MPlayer was included in the standard set of tests done by Nokia QA staff before each new IT OS release. Surely, Nokia is only interested in a properly working xvimagesink for the software included in IT OS by default. But testing with more client applications can improve overall xserver quality.

Bear in mind that, as you've hinted at, the only part of the Xv code which is custom is the _output_ code. We're using the standard X server implementation (as used by tens of millions of people) for the protocol decode and standard semantics, the standard KDrive layer for extended stuff (as used by god-knows-how-many embedded and consumer devices), and then the only part we have to play is taking frames and putting them on the screen. Due to some restrictions (as above), we have to deliberately error out on some operations. But errors like that tend to say 'you've hit a hardware restriction, I can't do this', rather than 'you hit one of the many random return BadValues we put in this weird code just to confuse people'.

Also, bear in mind that a lot of the initial instability was due to the DSP. The video was actually rather stable when you played without sound, although now the situation is somewhat reversed with the DSP being pretty steady now, and the new YUV420 code having complicated semantics.

I have also submitted this patch to maemo bugzilla, hopefully it (or its modification) can get included into the next version of N800 firmware: https://maemo.org/bugzilla/show_bug.cgi?id=1278

I'll merge it with some changes.

Cheers,
Daniel
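The suggestion earlier in this message (ignore BadValue from Xv(Shm)PutImage unless several arrive in a row) can be sketched as a small policy object that an X error handler would consult. Everything here is hypothetical apart from the numeric value 2, which is the X11 protocol's BadValue code; the threshold of five is simply the lower of the two numbers mentioned in the message, and no Xlib calls are made.

```c
/* BadValue as defined by the X11 protocol; redefined locally so the
 * sketch builds without the X headers. */
enum { X_BAD_VALUE = 2 };

/* Threshold taken from the "five or ten in a row" remark. */
enum { MAX_CONSECUTIVE_BAD_VALUES = 5 };

/* Hypothetical state kept by the player between PutImage calls. */
struct putimage_error_policy {
    int consecutive_bad_values;
};

/* Call after every frame that was put on screen without an error. */
void policy_note_success(struct putimage_error_policy *p)
{
    p->consecutive_bad_values = 0;
}

/* Call from the error handler. Returns nonzero when the error should
 * be treated as fatal, zero when the frame can simply be skipped. */
int policy_note_error(struct putimage_error_policy *p, int error_code)
{
    if (error_code != X_BAD_VALUE)
        return 1;
    p->consecutive_bad_values++;
    return p->consecutive_bad_values > MAX_CONSECUTIVE_BAD_VALUES;
}
```

Resetting the counter on success means occasional rejected frames during window moves are tolerated, while a persistent failure still stops playback.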
Re: N800 Video playback
On Tue, May 01, 2007 at 11:51:50AM +0300, ext Siarhei Siamashka wrote: On Monday 30 April 2007 17:49, Daniel Stone wrote: Indeed. Unfortunately this is slightly misleading in that it only shows the raw write speed. RFBI can't deal with the sorts of speeds that your hyper-optimised version is pumping out, e.g. So it's mainly just about cutting the latency into the critical path to low enough that it makes no difference. The 'framebuffer' is just the ordinary system memory, converting color format and copying data to framebuffer will be done with the same performance as simulated in this test. RFBI performance is only critical for asynchronous DMA data transfer to LCD controller which does not introduce any overhead and is performed at the same time as ARM core is doing some other work (decoding the next frame). RFBI performance matters only if data transfer to LCD is still not complete at the time when the next frame is already decoded and is ready to be displayed. When playing video, ARM core and LCD controller are almost always working at the same time performing different tasks in parallel. I think I had already explained these details in [1] Right. My point is that the numbers you're showing -- while very good, don't get me wrong -- won't necessarily have a huge direct impact on video playback. Particularly if you want to avoid tearing. So now the results of the tests are consistent - when doing video output, most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' function. Well, either that, or just waiting for RFBI transfers to complete. Optimizing it using 'yv12_to_yuv420_line_armv6' will definitely provide a huge effect, video output overhead when using Xv will be at least halved providing more cpu resources for video decoding. Yes, this is one good aspect. I don't have any tips, per se. Once I get it all integrated it'll be in git, but for now, the only public source is the packages. OK, thanks. It may take some time though. 
I'm still using old scratchbox with mistral SDK here (did not have enough free time to upgrade yet). Until I clean up my scratchbox mess, I can only provide some patch without testing, if anybody courageous can try to build it :)

I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

Well, anyway, everything worked perfectly and I could play 640x480 video on N800 with the following statistics:

VIDEO: [DIVX] 640x480 12bpp 23.976 fps 886.7 kbps (108.2 kbyte/s)
...
BENCHMARKs: VC: 87,757s VO: 8,712s A: 1,314s Sys: 3,835s = 101,618s
BENCHMARK%: VC: 86,3592% VO: 8,5736% A: 1,2932% Sys: 3,7740% = 100,%
BENCHMARKn: disp: 2044 (20,11 fps) drop: 355 (14%) total: 2399 (23,61 fps)

As you see, mplayer took 8.712 seconds to display 2044 VGA resolution frames. If we do the necessary calculations, that's 72 million pixels per second, quite close to the 'yv12_to_yuv420_line_armv6' capabilities limit, so this function is the only major contributor to video output time. Video output took much less time than decoding, so it proves that video output overhead can be reduced to minimum (in this test tearsync was not used though).

I'd be curious to see the results from this with tearsync _enabled_? i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX ioctl before you start writing to memory again. This is basically the limiter for us at this stage.

When tearsync comes into action, everything gets a bit more complicated. I'm still investigating its impact on video playback performance.

'Not good'. :)

Thanks again for your work.

Cheers,
Daniel
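The "72 million pixels per second" figure quoted in this message follows directly from the benchmark output: 2044 displayed frames of 640x480 pixels in 8,712 seconds of video output time. A minimal sketch of that calculation, with a hypothetical function name:

```c
/* Average pixel throughput of the video output path. */
double pixels_per_second(unsigned width, unsigned height,
                         unsigned long frames, double seconds)
{
    return (double)width * (double)height * (double)frames / seconds;
}
```

Plugging in the quoted numbers gives roughly 72.07 million pixels per second.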
Re: Documenting maemo pearls (was Re: N800 Video playback)
On 5/2/07, Quim Gil [EMAIL PROTECTED] wrote: On Tue, 2007-05-01 at 03:29 -0300, ext Gustavo Sverzut Barbieri wrote: Daniel, Siarhei, Eero: I always find your mails to provide great deal of tech information about N800. What a coincidence, me too. ;) However we do not have a central place with these information, it would be great if you guys setup a wiki page with tech details about drivers, optimizations and weakness of current implementations so others could base work on. Indeed. But knowing about the day to day of these busy guys I kind of understand why things they write instantly in an email can't be easily reproduced by themselves in a more formal way. I know, and problem is that we're not always sure of some things, some effects are collateral, some are expected... that wastes our time and when you're finished, often you're so tired you won't document it, just archive the excerpt you want, without any context... you'll know it when you need. But we do want to have all these pearls available in a structured way in maemo.org. Easing web publishing is a step, partially covered now by the Midgard CMS integration. Providing an appropriate content structure is a next step (I'm responsible of). Having that doc manager in place will definitely help, as well, as making sure that every relevant component in our architecture is officially covered by someone of the team (still working on this). Until then we will keep getting busy developers really sensitive to openness and dialog, finding some spare time to answer questions and fill indirectly the gaps in our documentation. Quim, while formal documents as those maemo.org provides are cool, it consumes a lot of resources... doing simple but correct/consistent wiki is good enough. Maybe we could setup a techday that we'd meet on IRC and document some topics on Wiki. It would be great to get some people with deep knowledge on hw issues, like Daniel, Siarhei and Eero... 
I could help with writing and organization, as I never dig on hw that much (but I'll need to do so really soon).

...

Said that, there is nothing stopping anyone from collecting these pearls in the maemo.org wiki. ;)

Sure

--
Gustavo Sverzut Barbieri
--
Jabber: [EMAIL PROTECTED] MSN: [EMAIL PROTECTED] ICQ#: 17249123 Skype: gsbarbieri Mobile: +55 (81) 9927 0010
Re: Documenting maemo pearls (was Re: N800 Video playback)
Hi, On Wed, May 02, 2007 at 10:05:13AM -0300, ext Gustavo Sverzut Barbieri wrote: On 5/2/07, Quim Gil [EMAIL PROTECTED] wrote: On Tue, 2007-05-01 at 03:29 -0300, ext Gustavo Sverzut Barbieri wrote: Daniel, Siarhei, Eero: I always find your mails to provide great deal of tech information about N800. What a coincidence, me too. ;) However we do not have a central place with these information, it would be great if you guys setup a wiki page with tech details about drivers, optimizations and weakness of current implementations so others could base work on. Indeed. But knowing about the day to day of these busy guys I kind of understand why things they write instantly in an email can't be easily reproduced by themselves in a more formal way. I know, and problem is that we're not always sure of some things, some effects are collateral, some are expected... that wastes our time and when you're finished, often you're so tired you won't document it, just archive the excerpt you want, without any context... you'll know it when you need. If there's anything you want to know directly, just ask on the list. I tend to deal with email when I'm not actively coding/building/etc, which is how I justify it. A wiki would require me to sit down for a while and really think about stuff, and I don't really have huge blocks of time available to me. But yeah, always happy to answer direct questions. But we do want to have all these pearls available in a structured way in maemo.org. Easing web publishing is a step, partially covered now by the Midgard CMS integration. Providing an appropriate content structure is a next step (I'm responsible of). Having that doc manager in place will definitely help, as well, as making sure that every relevant component in our architecture is officially covered by someone of the team (still working on this). 
Until then we will keep getting busy developers really sensitive to openness and dialog, finding some spare time to answer questions and fill indirectly the gaps in our documentation.

Quim, while formal documents as those maemo.org provides are cool, it consumes a lot of resources... doing simple but correct/consistent wiki is good enough. Maybe we could setup a techday that we'd meet on IRC and document some topics on Wiki. It would be great to get some people with deep knowledge on hw issues, like Daniel, Siarhei and Eero... I could help with writing and organization, as I never dig on hw that much (but I'll need to do so really soon).

If you can manage the timezones, that would probably be okay. America/Europe is doable if you guys get up early, just as long as no-one from Asia-Pacific wants to join in ...

Cheers,
Daniel
Re: Documenting maemo pearls (was Re: N800 Video playback)
Daniel Stone wrote:

If there's anything you want to know directly, just ask on the list. I tend to deal with email when I'm not actively coding/building/etc, which is how I justify it. A wiki would require me to sit down for a while and really think about stuff, and I don't really have huge blocks of time available to me. But yeah, always happy to answer direct questions.

The disadvantage is that it becomes lost in the list archive. Even when you do search the archive, it is hard to know the proper keywords, and it is very likely your brilliant answer will not be found. Many times I am 100% sure the answer is in the list since I remember someone answered it some time ago, but even then it is hard or impossible to find.

Gustavo Sverzut Barbieri wrote:

Quim, while formal documents as those maemo.org provides are cool, it consumes a lot of resources... doing simple but correct/consistent wiki is good enough. Maybe we could setup a techday that we'd meet on IRC and document some topics on Wiki. It would be great to get some people with deep knowledge on hw issues, like Daniel, Siarhei and Eero... I could help with writing and organization, as I never dig on hw that much (but I'll need to do so really soon).

If you can manage the timezones, that would probably be okay. America/Europe is doable if you guys get up early, just as long as no-one from Asia-Pacific wants to join in ...

This techday is a good idea. Sadly, it depends on people being available at that time, and the people providing the most interesting answers are probably the busiest ones. I tend to avoid IRC because it is a big waste of time. There are a few gems to be found in the archives too (thanks Marius G. ;-) but 98% is just babble and FAQs repeated again and again. However, I would try to join such a techday on IRC (not that I expect my presence to be useful to others). It would be nice to have such tech days regularly, preferably with a few topics set in advance. But still, I don't know how realistic it is to achieve this, and whether a wiki or the mailing list is not better suited for this after all.

Frantisek
Re: N800 Video playback
On Wednesday 02 May 2007 12:54, Daniel Stone wrote:

The 'framebuffer' is just the ordinary system memory, converting color format and copying data to framebuffer will be done with the same performance as simulated in this test. RFBI performance is only critical for asynchronous DMA data transfer to LCD controller which does not introduce any overhead and is performed at the same time as ARM core is doing some other work (decoding the next frame). RFBI performance matters only if data transfer to LCD is still not complete at the time when the next frame is already decoded and is ready to be displayed. When playing video, ARM core and LCD controller are almost always working at the same time performing different tasks in parallel. I think I had already explained these details in [1]

Right. My point is that the numbers you're showing -- while very good, don't get me wrong -- won't necessarily have a huge direct impact on video playback. Particularly if you want to avoid tearing.

I have no idea what other proof would be enough for you. You already got all the numbers, and even benchmarks with patched xserver. They all confirm video output performance improvement.

So now the results of the tests are consistent - when doing video output, most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' function.

Well, either that, or just waiting for RFBI transfers to complete.

You need to wait a bit before displaying the next frame anyway, and the period between frames for 30 fps video usually eclipses transfer completion time. If you want some numbers, a 640x480 YUV420 (12bpp) screen update now takes 25ms without the tearsync flag enabled (OMAPFB_FORMAT_FLAG_TEARSYNC for the OMAPFB_UPDATE_WINDOW ioctl) and 25-42ms with tearsync. For 30 fps video, the period between screen updates is normally 33ms.
For playing video, we initiate RFBI transfer, wait till it completes, perform YV12-YUV420 color format conversion (which should take less than 4ms for 640x480 considering benchmark results), wait till it is time to display the next frame and start RFBI transfer again. For 30 fps video 25ms+4ms is less than 33ms, so without tearsync enabled, any 640x480 video should play fine (considering video output performance). With tearsync enabled, we should add the time needed for performing vertical sync in LCD controller, which breaks our nice numbers. Worst case (17ms wait for retrace + 25ms for actual data transfer) takes more time than the 33ms between frames. We can be saved if the LCD controller internal refresh rate is really 60Hz; in this case video playback will automagically synchronize to the LCD refresh rate and each frame processing will be done exactly within 2 LCD refresh cycles (by the time we want to display a video frame, the next vertical retrace will be near and we will not lose much time waiting for it). If decoding time for each frame will never exceed 28-29ms (which is a tough limitation, cpu usage is not uniform), video playback without dropping any frames will be possible even with tearsync enabled. That's what I'm investigating now. In any case, getting ideal 24 fps playback will be a bit easier.

I hope all these explanations are clear now. And this is not just a theory, but already confirmed by some experiments and practical tests.

I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

Thanks, that is what I would consider 'additional tips and tricks' :) It is good to know that maemo 3.x development can be also done with older scratchbox (I have 0.9.8.8 installed now), I'll try it without upgrading scratchbox then.

Well, anyway, everything worked perfectly and I could play 640x480 video on N800 with the following statistics:

VIDEO: [DIVX] 640x480 12bpp 23.976 fps 886.7 kbps (108.2 kbyte/s)
...
BENCHMARKs: VC: 87,757s VO: 8,712s A: 1,314s Sys: 3,835s = 101,618s BENCHMARK%: VC: 86,3592% VO: 8,5736% A: 1,2932% Sys: 3,7740% = 100,% BENCHMARKn: disp: 2044 (20,11 fps) drop: 355 (14%) total: 2399 (23,61 fps) As you see, mplayer took 8.712 seconds to display 2044 VGA resolution frames. If we do the necessary calculations, that's 72 million pixels per second, quite close to the limit of 'yv12_to_yuv420_line_armv6' capabilities, so this function is the only major contributor to video output time. Video output took much less time than decoding, so it proves that video output overhead can be reduced to a minimum (in this test tearsync was not used though). I'd be curious to see the results from this with tearsync _enabled_? i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX ioctl before you start writing to memory again. This is basically the limiter for us at this stage. That's exactly how MPlayer works. It always waits on OMAPFB_SYNC_GFX before filling framebuffer with the data for the next frame. Not issuing OMAPFB_SYNC_GFX would introduce *artificial* tearing not related to sync with LCD
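For anyone who wants to re-check the arithmetic in this message, here is a minimal sketch in C. The function names are made up for illustration; the inputs are only the figures quoted above (2044 frames of 640x480 in 8.712s, 25ms transfer, about 4ms conversion, up to 17ms waiting for retrace), and the model simply adds the stages up, which is an assumption and not how the driver schedules work.

```c
#include <assert.h>

/* Pixels pushed per second for a given frame count, frame size and elapsed time. */
double pixels_per_second(long frames, int width, int height, double seconds)
{
    assert(seconds > 0.0);
    return (double)frames * width * height / seconds;
}

/* Returns 1 if transfer + conversion + vsync wait fit into one frame period.
 * This is a simple additive model, assumed here only for illustration. */
int fits_frame_period(double transfer_ms, double convert_ms,
                      double vsync_wait_ms, double fps)
{
    assert(fps > 0.0);
    double period_ms = 1000.0 / fps;   /* about 33.3 ms at 30 fps */
    return transfer_ms + convert_ms + vsync_wait_ms <= period_ms;
}
```

With the numbers from the benchmark, pixels_per_second(2044, 640, 480, 8.712) comes out at roughly 72 million. fits_frame_period(25, 4, 0, 30) holds, while adding the worst-case 17ms retrace wait does not fit into a 30 fps frame period.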
RE: Documenting maemo pearls (was Re: N800 Video playback)
Don't kill the messenger! But yeah, always happy to answer direct questions. Disadvantage is that it becomes lost in the list archive. This is an old problem communication science solved centuries ago: generally you have those generating information and those collecting it. Asking the sources to organize information is many times as useless as asking the documenters to generate new data. I keep thinking the right approach in our case is: - maemo.org should provide the right infrastructure to document easily (getting there). - the maemo team should make sure that all the essential information reaches the official documentation (still a while to get there). - the maemo community could help organizing themselves in wiki-based collaboration and pointing essential information missing in the official documentation (up to you, tell us where we can help). I keep insisting in a clear separation between official and community documentation. Don't get me wrong, I think the quality and usefulness of community docs can match and outsmart official documentation, in maemo and in any software project (in fact in *any* type of project). But think on the zillions of newcomers we want to welcome: most of them are looking for a single, comprehensive and reliable source of information, structured in a way that makes sense in order to find what I'm looking for. These are elements required in good quality official documentation, while these same elements can kill community workflow (generally quite spontaneous) if not handled properly. Quim ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
On Wednesday 02 May 2007 12:39, Daniel Stone wrote: On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote: On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote: Results with unpatched xserver and some more explanations can be found in [3]. Yes, now N800 is faster than Nokia 770 for video output performance at last :) Well, still not everything is so good until the following bug gets fixed: https://maemo.org/bugzilla/show_bug.cgi?id=1281 The patch for optimized Xv performance will not help to watch widescreen video which triggers this tearing bug. If you see tearing on the screen, you should know that the YUV420 color format conversion optimization patch does not get used at all and xserver most likely uses a slow nonoptimized YUV422 fallback code with software scaling. Indeed. And the reason the code is there is because Hailstorm can only downscale at fixed ratios (half and one-quarter), and even then, it locked up when we tried. Similarly, the display controller's downscaling didn't work, either. So we can optimise the fallback path, but you'll still be screwed by sending 16bpp (instead of 12bpp) through RFBI. The only thing which is unclear here is that Hailstorm does not need to downscale video in this situation. The bug can be reproduced with 512x288 video which just needs upscaling to 800x450. Also even standard Nokia_N800.avi video with proper aspect ratio causes a huge performance regression and tearing. Please give this #1281 issue another look. It looks like a bug in xserver, but not a hardware limitation. I can probably try to workaround it by requesting not 512x288 buffer from Xv, but something like 512x308, use only 512x288 part of it and artificially add black bands above and below. After that, Xv can be asked to expand it to 800x480 to get expected result But if it is a bug in xserver, it would be better to get it fixed, preferably before the next firmware update :) Fixing this bug is critical for video playback performance. 
I hope it will be solved in the next version of N800 firmware too. But if we get some patch to solve this problem for testing earlier, that would be nice too. The only patch is optimising that function, really. Even if we did work out a way to make Hailstorm happy, you can still only scale at those exact multiples, which doesn't make it a viable general solution. I will do an optimized software YV12-YUV420 JIT scaler a bit later (next weekend?). It will be only a minor modification of the YV12-YUV422 scaler which already exists and works fine. If it can be useful for xserver, it might be added there at any time.
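The letterboxing workaround mentioned above (request a taller buffer, use only the middle part, let Xv expand it to the full screen) boils down to a small calculation. The sketch below is only an illustration with hypothetical function names; rounding the height up to an even number is my assumption, made because YV12 chroma is subsampled vertically by two.

```c
#include <assert.h>

/* Smallest even buffer height such that scaling src_w x height to
 * out_w x out_h keeps (approximately) the picture aspect ratio. */
int padded_source_height(int src_w, int out_w, int out_h)
{
    assert(src_w > 0 && out_w > 0 && out_h > 0);
    /* ceil(out_h * src_w / out_w) in integer arithmetic */
    int h = (out_h * src_w + out_w - 1) / out_w;
    return (h + 1) & ~1;              /* round up to an even number of lines */
}

/* Height of each black band above and below the real picture. */
int band_height(int src_h, int padded_h)
{
    assert(padded_h >= src_h);
    return (padded_h - src_h) / 2;
}
```

For 512x288 video on the 800x480 screen this gives a 512x308 buffer with 10 black lines above and below the picture, which matches the 'something like 512x308' estimate.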
Re: N800 Video playback
2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]: On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote: OK, here is this untested a patch for xserver to add ARMv6 optimized YUV420 color format conversion. Theoretically it should compile (I did not try to build xserver myself though) and work. If it refuses to compile, fixing the patch should be not too difficult. Applied and build without problems for me. Thanks a lot for building the package and putting it for download, everything seems to be fine, but more details will follow below. For testing, I fabricated some video with gstreamer: which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some reason 320x240 and 352x288 refused to play with: X11 error: BadValue (integer parameter out of range for operation) MPlayer interrupted by signal 6 in module: flip_page while gstreamer did play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died in the same way. This X11 error on video playback start and also sometimes on switching fullscreen/windowed mode is a known problem [1] reported in this mailing list. If MPlayer dies on start, usually trying to start it again succeeds. So these 320x240 and 352x288 videos could be played as well if you were a bit more persistent :) No, it's actually 100% reproducable in this situation (yes, I tried a number of . You see, I didn't have the window manager running. It breaks with the N800 video too. Running with the window manager does make it runnable, but it also changes the window size which I wanted to avoid. My mplayer is compiled from the svn trunk of the garage project, with some additional cflags I use (so maybe those were the problem...). Do you have a set of cflags settings which work better than the default set? Can you share this information? If by default set you mean what the default options in the toolchain is, then yes (as there are none AFAIK ;). If you mean the default options for mplayer, I don't know if they add any value. 
I like to use my hardware well ;) so I tend to compile everything with VFP enabled and optimized for the processor: CFLAGS='-mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp -O2' Now, wheter it works better than thumb code is debatable, as optimizing code size might be more beneficial than having fast floats. But at least I was happy with the results we got from our testing, detailed in http://syslog.movial.fi/archives/46-N800-VFP-or-not-to-VFP.html I doubt they will do much good for mplayer, as I assume most critical operations will be highly optimized already by hand and not left entirely for the compiler... If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you should restrict to 512x384 resolution or lower and keep bitrate reasonable. The results for these 'insane' videos you have posted are somewhat weird, a complete statistics would require also a number of frames dropped, otherwise we don't know how much work was done by the player. Probably missing audio track resulted in MPlayer not being able to provide a proper report. Yeah, I guess the fabricated videos weren't that good. Have to do some more testing with real videos... Yes, now N800 is faster than Nokia 770 for video output performance at last :) This is _very_ cool indeed :) -- Kalle Vahlman, [EMAIL PROTECTED] Powered by http://movial.fi Interesting stuff at http://syslog.movial.fi ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
On 4/30/07, Siarhei Siamashka [EMAIL PROTECTED] wrote: On Friday 27 April 2007 04:43, Daniel Stone wrote: [...] Daniel, Siarhei, Eero: I always find your mails to provide great deal of tech information about N800. However we do not have a central place with these information, it would be great if you guys setup a wiki page with tech details about drivers, optimizations and weakness of current implementations so others could base work on. I see that Eero has a how to at: http://maemo.org/platform/docs/howtos/howto_performance_test_process.html Other docs, describing best fetch size, which instructions that usually are cheap are bad implemented/slow on omap2420, etc... Tools would be great. I see Oprofile kernel was suggested to Siarhei, so it would be great to have it for download on this wiki page as well. Thank you all for your great work! Keep it coming :-) -- Gustavo Sverzut Barbieri -- Jabber: [EMAIL PROTECTED] MSN: [EMAIL PROTECTED] ICQ#: 17249123 Skype: gsbarbieri Mobile: +55 (81) 9927 0010 ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]: On Monday 30 April 2007 17:49, Daniel Stone wrote: It's completely safe to upgrade from a deb if it's not broken. If you set up a standard Maemo build environment and run apt-get source xorg-server and apt-get build-dep xorg-server, it should work just fine, in theory. I don't have any tips, per se. Once I get it all integrated it'll be in git, but for now, the only public source is the packages. OK, thanks. It may take some time though. I'm still using old scratchbox with mistral SDK here (did not have enough free time to upgrade yet). Until I clean up my scratchbox mess, I can only provide some patch without testing, if anybody courageous can try to build it :) Given that I fear not the perils of building a X server with nonstandard options[1], I shall be more than happy to conduct such adventurous acts :) And unless Mr. Kulve has objections, the results could be installed from a repository as well. [1] http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html -- Kalle Vahlman, [EMAIL PROTECTED] Powered by http://movial.fi Interesting stuff at http://syslog.movial.fi ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote: 2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]: OK, thanks. It may take some time though. I'm still using old scratchbox with mistral SDK here (did not have enough free time to upgrade yet). Until I clean up my scratchbox mess, I can only provide some patch without testing, if anybody courageous can try to build it :) Given that I fear not the perils of building a X server with nonstandard options[1], I shall be more than happy to conduct such adventurous acts :) And unless Mr. Kulve has objections, the results could be installed from a repository as well. [1] http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html OK, here is an untested patch for xserver that adds ARMv6 optimized YUV420 color format conversion. Theoretically it should compile (I did not try to build xserver myself though) and work. If it refuses to compile, fixing the patch should not be too difficult. In the worst case only video playback may be broken. But if everything works as expected, video output performance should become a lot better. Video output performance can be tested with mplayer using the -benchmark option: the 'VO:' stat shows how much time was used for video output, and the 'VC:' stat shows how much time was used for video decoding. The built-in video player should also become faster. I don't know if this improvement can be 'scientifically' benchmarked, but it should drop fewer frames on high resolution video playback. If any of you can build the xserver package with this patch, please put it up for download somewhere or send it directly to me. Thanks. 
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am
--- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am 2007-03-05 16:17:32.0 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am 2007-05-01 15:04:43.0 +0300
@@ -1,5 +1,5 @@
 if XV
-XV_SRCS = omap_video.c
+XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h
 endif
 if DEBUG
@@ -34,4 +34,4 @@
 $(TSLIB_FLAG) \
 $(DYNSYMS)
-EXTRA_DIST = omap_video.c
+EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h 1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h 2007-05-01 15:06:13.0 +0300
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission. The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose. It is provided as is without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka [EMAIL PROTECTED]
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+
+#ifndef _OMAP_COLORCONV_H_
+#define _OMAP_COLORCONV_H_
+
+#include <stdint.h>
+
+/**
+ * Convert a line of pixels from YV12 to YUV420 color format
+ * @param dst - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned
+ * @param src_y - pointer to Y plane, it should be 16-bit aligned
+ * @param src_c - pointer to chroma plane (U for even lines, V for odd lines)
+ * @param w - number of pixels to convert (should be multiple of 4)
+ */
+void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w);
+
+#endif
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S 1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S 2007-05-01 15:06:36.0 +0300
@@ -0,0 +1,244 @@
+/*
+ * Copyright © 2007 Siarhei
Re: N800 Video playback
2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]: On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote: 2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]: OK, thanks. It may take some time though. I'm still using old scratchbox with mistral SDK here (did not have enough free time to upgrade yet). Until I clean up my scratchbox mess, I can only provide some patch without testing, if anybody courageous can try to build it :) Given that I fear not the perils of building a X server with nonstandard options[1], I shall be more than happy to conduct such adventurous acts :) And unless Mr. Kulve has objections, the results could be installed from a repository as well. [1] http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html OK, here is this untested a patch for xserver to add ARMv6 optimized YUV420 color format conversion. Theoretically it should compile (I did not try to build xserver myself though) and work. If it refuses to compile, fixing the patch should be not too difficult. Applied and build without problems for me. For testing, I fabricated some video with gstreamer: gst-launch-0.10 videotestsrc num-buffers=300 \ ! video/x-raw-yuv, width=640, height=480 \ ! ffenc_mpeg4 ! avimux \ ! filesink location=640x480.avi which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some reason 320x240 and 352x288 refused to play with: X11 error: BadValue (integer parameter out of range for operation) MPlayer interrupted by signal 6 in module: flip_page while gstreamer did play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died in the same way. My mplayer is compiled from the svn trunk of the garage project, with some additional cflags I use (so maybe those were the problem...). Anyway, then I shut down af-base-apps and matchbox (to avoid scaling the video) and ran mplayer -benchmark file. In the worst case only video playback may be broked. But if everything works as expected, video output performance should become a lot better. 
Video output performance can be tested by mplayer using -benchmark option, 'VO:' stat shows how much time was used for video output, 'VC:' stat shows how much time was used for video decoding. There's something fishy in the decoding or something as the color bars in the test video were broken (yellow and cyan to be precise), but that seemed to be the case in a vanilla image too so nothing to do with this patch. I could not see any other glitches in the output. But on to the results: VIDEO: [DX50] 640x480 24bpp 30.000 fps 1597.6 kbps (195.0 kbyte/s) Original: V: 10.0 300/300 44% 74% 0.0% 0 0 0% BENCHMARKs: VC: 4.387s VO: 7.436s A: 0.000s Sys: 0.482s = 12.305s BENCHMARK%: VC: 35.6503% VO: 60.4311% A: 0.% Sys: 3.9185% = 100.% Patched: V: 10.0 300/300 42% 72% 0.0% 0 0 0% BENCHMARKs: VC: 4.213s VO: 7.265s A: 0.000s Sys: 0.381s = 11.859s BENCHMARK%: VC: 35.5296% VO: 61.2604% A: 0.% Sys: 3.2100% = 100.% --- VIDEO: [DX50] 800x480 24bpp 30.000 fps 1976.5 kbps (241.3 kbyte/s) Original: V: 10.0 300/300 54% 114% 0.0% 0 0 0% BENCHMARKs: VC: 5.466s VO: 11.456s A: 0.000s Sys: 0.366s = 17.287s BENCHMARK%: VC: 31.6179% VO: 66.2677% A: 0.% Sys: 2.1144% = 100.% Patched: V: 10.0 300/300 53% 70% 0.0% 0 0 0% BENCHMARKs: VC: 5.346s VO: 7.043s A: 0.000s Sys: 0.449s = 12.838s BENCHMARK%: VC: 41.6414% VO: 54.8602% A: 0.% Sys: 3.4984% = 100.% There is a clear drop in amount of time used to output the videos for 800x480 (the numbers were stable trough multiple runs). So I gather from the 10s benchmark time that we didn't get to real time yet, but close to it? And of course this is just video, audio decoding should be considered for real video playback performance measurement. If any of you can build xserver package with this patch, please put it for download somewhere or send directly to me. I put the deb up at: http://iki.fi/zuh/xserver-xomap_1.1.99.3-0.zuh2_armel.deb until I get it to the repository. 
This version also has the composite extension enabled, but AFAIK it does not depend on the libs or change server behaviour if composite is not specifically used. The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp -O2', but as I had troubles with the SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not guaranteeing it at the moment ;) -- Kalle Vahlman, [EMAIL PROTECTED] Powered by http://movial.fi Interesting stuff at http://syslog.movial.fi ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
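On the "did we get to real time" question: a rough check is to compare the total benchmark time with the nominal duration of the clip. The sketch below uses hypothetical helper names and assumes the total time reported by mplayer -benchmark covers all frames of the clip with none dropped, which seems to be the case for the 300/300 runs above.

```c
#include <assert.h>

/* Average frames per second achieved over the whole benchmark run. */
double benchmark_fps(int frames, double total_seconds)
{
    assert(frames >= 0 && total_seconds > 0.0);
    return frames / total_seconds;
}

/* Returns 1 if the run finished within the clip's nominal duration. */
int is_realtime(int frames, double nominal_fps, double total_seconds)
{
    assert(nominal_fps > 0.0);
    return total_seconds <= frames / nominal_fps;
}
```

For the 800x480 clip (300 frames, nominally 10 seconds at 30 fps) the patched run at 12.838s corresponds to about 23.4 fps and the original run at 17.287s to about 17.4 fps, so neither is real time yet, and audio decoding would come on top of that.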
Re: N800 Video playback
2007/5/1, Kalle Vahlman [EMAIL PROTECTED]: The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp -O2', but as I had troubles with the SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not guaranteeing it at the moment ;) Actually it seems that I had added the env var to the rules file, so it *is* built with those options. I can produce a build without them if need be (it does affect performance in my experience, so if one wants to see the impact of that patch on a more normal version...). -- Kalle Vahlman, [EMAIL PROTECTED] Powered by http://movial.fi Interesting stuff at http://syslog.movial.fi
Re: N800 Video playback
On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote: OK, here is this untested a patch for xserver to add ARMv6 optimized YUV420 color format conversion. Theoretically it should compile (I did not try to build xserver myself though) and work. If it refuses to compile, fixing the patch should be not too difficult. Applied and build without problems for me. Thanks a lot for building the package and putting it for download, everything seems to be fine, but more details will follow below. For testing, I fabricated some video with gstreamer: which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some reason 320x240 and 352x288 refused to play with: X11 error: BadValue (integer parameter out of range for operation) MPlayer interrupted by signal 6 in module: flip_page while gstreamer did play them just fine. Also the Nokia_N800.avi and NokiaN93.avi died in the same way. This X11 error on video playback start and also sometimes on switching fullscreen/windowed mode is a known problem [1] reported in this mailing list. If MPlayer dies on start, usually trying to start it again succeeds. So these 320x240 and 352x288 videos could be played as well if you were a bit more persistent :) As Daniel replied in one of the followup messages, it is most likely some race condition. The question is which code is a suspect. Is it MPlayer Xv video output code that has been around for ages and worked fine on different systems or relatively new Xv extension code from N800 xserver? In addition, a previous revision of N800 firmware had a serious bug [2] related to video playback. It should be noted, that MPlayer needed only about 1 minute to freeze on the initial N800 firmware. So the problem could be identified much more easily if MPlayer was included in the standard set of tests done by Nokia QA staff before each new IT OS release. Surely, Nokia is only interested in a properly working xvimagesink for the software included in IT OS by default. 
But testing with more client applications can improve overall xserver quality. With all that said, I don't know if MPlayer Xv code is bugfree, it wasn't me who developed it. My mplayer is compiled from the svn trunk of the garage project, with some additional cflags I use (so maybe those were the problem...). Do you have a set of cflags settings which work better than the default set? Can you share this information? There's something fishy in the decoding or something as the color bars in the test video were broken (yellow and cyan to be precise), but that seemed to be the case in a vanilla image too so nothing to do with this patch. I could not see any other glitches in the output. But on to the results: VIDEO: [DX50] 640x480 24bpp 30.000 fps 1597.6 kbps (195.0 kbyte/s) [snip] VIDEO: [DX50] 800x480 24bpp 30.000 fps 1976.5 kbps (241.3 kbyte/s) [snip] There is a clear drop in amount of time used to output the videos for 800x480 (the numbers were stable trough multiple runs). So I gather from the 10s benchmark time that we didn't get to real time yet, but close to it? And of course this is just video, audio decoding should be considered for real video playback performance measurement. These videos are way too heavy for N800 to decode and play in realtime. We may expect playback for videos up to 640x480 resolution with 1000kbps bitrate and 24fps. This is probably current realistic limit which can be achieved. Some minor variations to these parameters are possible (for example we can get 30fps, but should also reduce resolution or bitrate, etc.). If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you should restrict to 512x384 resolution or lower and keep bitrate reasonable. The results for these 'insane' videos you have posted are somewhat weird, a complete statistics would require also a number of frames dropped, otherwise we don't know how much work was done by the player. 
Probably missing audio track resulted in MPlayer not being able to provide a proper report. Don't know. Also it is strange that you did not see any improvement at all for 640x480 video, are you sure you really tested it with the patched xserver? Anyway, the new xserver package works really good. If we do some tests with the standard Nokia_N800.avi video clip, we get the following results with the patched xserver: # mplayer -benchmark -quiet -noaspect Nokia_N800.avi BENCHMARKs: VC: 29,764s VO: 7,666s A: 0,468s Sys: 64,635s = 102,534s BENCHMARK%: VC: 29,0287% VO: 7,4767% A: 0,4565% Sys: 63,0381% = 100,% BENCHMARKn: disp: 2504 (24,42 fps) drop: 0 (0%) total: 2504 (24,42 fps) # mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi BENCHMARKs: VC: 30,266s VO: 5,490s A: 0,467s Sys: 66,286s = 102,509s BENCHMARK%: VC: 29,5255% VO: 5,3554% A: 0,4560% Sys: 64,6631% = 100,% BENCHMARKn: disp: 2501 (24,40 fps) drop: 0 (0%) total: 2501 (24,40 fps) Results with unpatched xserver and some more
Re: N800 Video playback
Frantisek Dufka wrote:

[sbox-SDK_ARMEL: ~/x/xorg-server-1.1.99.3] patch -p1 < ../xomap_yuv420patch.diff
patching file hw/kdrive/omap/Makefile.am
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 34.
2 out of 2 hunks FAILED -- saving rejects to file hw/kdrive/omap/Makefile.am.rej
patching file hw/kdrive/omap/omap_colorconv.h
patching file hw/kdrive/omap/omap_colorconv.S
patching file hw/kdrive/omap/omap_video.c
Hunk #1 FAILED at 39.
Hunk #2 FAILED at 468.
Hunk #3 FAILED at 491.
3 out of 3 hunks FAILED -- saving rejects to file hw/kdrive/omap/omap_video.c.rej

Sorry, my fault, mystery solved. I saved the attachment in Thunderbird on Windows XP, then moved it to Ubuntu inside VMware. The problem was caused by DOS CR+LF line endings, which patch doesn't like. After recoding to Unix linefeeds it applies cleanly. I use Windows a lot, so it is strange that this never happened to me before.
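In case somebody else hits the same CR+LF problem without a conversion tool at hand, the recoding step is simple enough to sketch in C. The helper name below is made up for this example; it works on a buffer already read into memory and leaves lone CR characters alone.

```c
#include <assert.h>
#include <stddef.h>

/* Converts CR+LF pairs to LF in place and returns the new length.
 * A CR that is not followed by LF is kept unchanged. */
size_t crlf_to_lf(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;            /* drop the CR, the LF is copied next */
        buf[out++] = buf[i];
    }
    return out;
}
```

Reading the patch file into a buffer, running it through this function and writing out the first 'returned length' bytes should give a file that patch accepts, assuming line endings were the only damage.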
Re: N800 Video playback
On 4/30/07, Daniel Stone [EMAIL PROTECTED] wrote: There are two important optimizations in this code: 1. Cache prefetch with PLD instruction (added in '_armv5' version) which boosts performance to 70 megapixels per second. Inner loop is unrolled to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so such unrolling is convenient). This is the most important improvement. You can try using __builtin_prefetch() from C code to do the same optimization. Ah, sounds useful. From what Dan Amelang's been saying on xorg@, gcc should coalesce four 32-bit reads into one 128-bit read, but this sounds promising as well. To expand on this: I was referring to fact that gcc is pretty smart about using ldmia/stdmia instructions to cluster sequential reads/writes. I see that Siarhei is already using this technique in his assembler code, so nothing new here. Dan ___ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers
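As an illustration of the __builtin_prefetch() suggestion, here is a minimal sketch of a copy loop that works in 32-byte chunks and asks the cache to fetch data two lines ahead. It is not the code from the patch: the function name, the prefetch distance of two cache lines and the plain byte copy are all assumptions chosen to keep the example short. __builtin_prefetch itself is a GCC builtin that acts only as a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32   /* cache line size quoted in the message */

/* Copies n bytes, hinting the cache to load source data two lines ahead. */
void copy_with_prefetch(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + CACHE_LINE <= n; i += CACHE_LINE) {
        /* Only form the prefetch address while it is still inside the buffer. */
        if (i + 2 * CACHE_LINE < n)
            __builtin_prefetch(src + i + 2 * CACHE_LINE);
        for (size_t j = 0; j < CACHE_LINE; j++)
            dst[i + j] = src[i + j];
    }
    for (; i < n; i++)          /* tail shorter than one cache line */
        dst[i] = src[i];
}
```

Whether this gets anywhere near the hand-written PLD version depends on what the compiler makes of the inner loop, so it would have to be measured on the device.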
Re: N800 Video playback
On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote: about using ldmia/stdmia instructions to cluster sequential that was supposed to be ldmia/sdmia, sorry. Dan
Re: N800 Video playback
On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote: On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote: about using ldmia/stdmia instructions to cluster sequential that was supposed to be ldmia/sdmia, sorry. Gah, ldmia/stmia, final answer. Dan
Re: N800 Video playback
On Friday 27 April 2007 04:43, Daniel Stone wrote: I'll make a really optimized version of YV12 - YUV420 convertor on this weekend (removing branch is good, but I feel that it can be improved more) and will try to use it on Nokia 770, any extra video performance improvement will be useful there. I hope that the framebuffer driver on Nokia 770 supports YUV420 color format properly. I don't think Tornado supports YUV420, but I can check in the specs tomorrow. My better C version basically does two macroblocks at a time, ensuring all 32-bit writes (which _really_ helps over 16-bit writes, believe me). This eliminates the branch, since your surface is guaranteed to be word-aligned, so if you do all 32-bit writes, you can just drop the branch as you know every write will be aligned. This will be really fast. Optimized YV12 - YUV420 convertor is done. The sources can be found here: https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a test program ('test_colorconv') which can ensure that everything works correctly and fast: ~ $ ./test_colorconv test: 'yv12_to_yuv420_xomap', time=7.332s, speed=32.878MP/s, memwritespeed=43.838MB/s test: 'yv12_to_yuv420_xomap_nobranch', time=5.679s, speed=42.448MP/s, memwritespeed=56.597MB/s test: 'yv12_to_yuv420_line_arm_', time=4.706s, speed=51.223MP/s, memwritespeed=68.297MB/s test: 'yv12_to_yuv420_line_armv5_', time=3.356s, speed=71.824MP/s, memwritespeed=95.765MB/s test: 'yv12_to_yuv420_line_armv6_', time=2.826s, speed=85.298MP/s, memwritespeed=113.731MB/s ARMv6 optimized YV12-YUV420 convertor is about 2.5x faster than current code used in N800 xserver. So it should provide a nice improvement for video :) I doubt that your better C version can beat it or even get any close. There are two important optimizations in this code: 1. 
Cache prefetch with PLD instruction (added in '_armv5' version) which boosts performance to 70 megapixels per second. Inner loop is unrolled to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so such unrolling is convenient). This is the most important improvement. You can try using __builtin_prefetch() from C code to do the same optimization.
2. The use of ARMv6 instruction REV16 to swap bytes in the high and low 16-bit halves of a register. This optimization was added in '_armv6' version and boosted performance even more, to 85 megapixels per second. This optimization is highly unlikely, probably impossible, for a C version.

I was a bit wrong about YUV420 format in my previous post. Suppose we have a planar YV12 image with the following data.
Y plane: Y1 Y2 Y3 Y4 ...
U plane: U1 __ U2 __ ...
Normal YUV420 (according to pictures in Epson docs) would be the following:
U1 Y1 Y2 U2 Y3 Y4 ...
But it appears (most likely because of the 16-bit interface and some endian differences between ARM and the Epson chip) that each pair of bytes is swapped and we actually get the following somewhat weird layout:
Y1 U1 U2 Y2 Y4 Y3 ...
To do this byteswapping, ARMv6 instruction REV16 is very handy.

The assembly sources for ARMv6 code look a bit messy because instruction reordering was needed to schedule them correctly and avoid ARM11 pipeline interlocks, which negatively affect performance. Now this code is really fast with very few or no interlocks in the inner loop. And gcc does not do a good job optimizing code on ARM, so a C implementation would also be at a disadvantage here. By the way, the benchmarks posted in my previous message should be discarded. I did not initialize source buffers that time and it looks like the ARM11 cpu has some 'cheat' which allows treating empty data pages in some special way and avoids reading from memory. So the numbers posted in the previous benchmark were higher than usual. Now it is corrected. As for the other possible Xv optimizations.
You mentioned that fallback code is not important at all. But imagine 640x480 video playback in windowed mode. Decoding it will require quite a lot of resources, but additionally scaling it down using a slow fallback code will be a finishing blow. In addition, a solution (fast JIT accelerated YV12-YUY2 scaler) for this problem already exists. I can also modify this scaler to support YV12-YUV420 scaling. An interesting thing here is that this scaler could be also used by xserver to solve graphics bus bandwidth issues. Imagine that we have some high resolution video with high framerate which exceeds graphics bus capabilities. In this case this video can be downscaled in software using JIT scaler to lower resolution before sending data to LCD controller. What do you think? Sure. Unfortunately my job has other functions than to make video decoding really, really fast, so I'm happy to merge, review, offer feedback, and help you out where I can be useful, but I can't throw much time at this myself. That's fine. Now I'm waiting for
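A minimal sketch of the byte order described in this message (Y1 U1 U2 Y2 Y4 Y3 for every four pixels), written as a plain C loop. It only illustrates the layout; it is not the code from 'arm_colorconv.S'. The name pack_line_swapped and its arguments are made up for this example, and the choice of which chroma line accompanies which luma line is left to the caller, since the post shows only one chroma plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from arm_colorconv.S): pack one line of luma
 * and one line of horizontally subsampled chroma into the byte order
 * quoted in the post, Y1 U1 U2 Y2 Y4 Y3 for every four pixels.
 */
static void pack_line_swapped(const uint8_t *luma, const uint8_t *chroma,
                              uint8_t *dst, size_t width)
{
    assert(width % 4 == 0);            /* the sketch handles whole groups only */
    for (size_t x = 0; x < width; x += 4) {
        dst[0] = luma[x];              /* Y1 */
        dst[1] = chroma[x / 2];        /* U1 */
        dst[2] = chroma[x / 2 + 1];    /* U2 */
        dst[3] = luma[x + 1];          /* Y2 */
        dst[4] = luma[x + 3];          /* Y4 */
        dst[5] = luma[x + 2];          /* Y3 */
        dst += 6;                      /* 6 output bytes per 4 pixels = 12 bpp */
    }
}
```

A byte-at-a-time loop like this is the slow baseline; the point of the optimized versions discussed in the thread is to produce the same bytes with wider loads and stores.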
Re: N800 Video playback
Hi, On Mon, Apr 30, 2007 at 02:27:49PM +0300, ext Siarhei Siamashka wrote: On Friday 27 April 2007 04:43, Daniel Stone wrote: I don't think Tornado supports YUV420, but I can check in the specs tomorrow. My better C version basically does two macroblocks at a time, ensuring all 32-bit writes (which _really_ helps over 16-bit writes, believe me). This eliminates the branch, since your surface is guaranteed to be word-aligned, so if you do all 32-bit writes, you can just drop the branch as you know every write will be aligned. This will be really fast. Optimized YV12 - YUV420 convertor is done. The sources can be found here: https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a test program ('test_colorconv') which can ensure that everything works correctly and fast: ~ $ ./test_colorconv [results follow] ARMv6 optimized YV12-YUV420 convertor is about 2.5x faster than current code used in N800 xserver. So it should provide a nice improvement for video :) Indeed. Unfortunately this is slightly misleading in that it only shows the raw write speed. RFBI can't deal with the sorts of speeds that your hyper-optimised version is pumping out, e.g. So it's mainly just about cutting the latency into the critical path to low enough that it makes no difference. I doubt that your better C version can beat it or even get any close. Of course not. There are two important optimizations in this code: 1. Cache prefetch with PLD instruction (added in '_armv5' version) which boosts performance to 70 megapixels per second. Inner loop is unrolled to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so such unrolling is convenient). This is the most important improvement. You can try using __builtin_prefetch() from C code to do the same optimization. Ah, sounds useful. 
From what Dan Amelang's been saying on xorg@, gcc should coalesce four 32-bit reads into one 128-bit read, but this sounds promising as well. 2. The use of ARMv6 instruction REV16 to do bytes swapping for high and low 16-bit register parts, this optimization was added in '_armv6' version and boosted performance even more to 85 megapixels per second. This optimization is highly unlikely probably impossible for C version at all. Sounds useful. I was a bit wrong about YUV420 format in my previous post. Suppose we have planar YV12 image with the following data. Y plane: Y1 Y2 Y3 Y4 ... U plane: U1 __ U2 __ ... Normal YUV420 (according to pictures in Epson docs) would be the following: U1 Y1 Y2 U2 Y3 Y4 ... But appears (most likely because of 16-bit interface and some endian differences between ARM and Epson chip) that each pair of bytes is swapped and we actually get the following somewhat weird layout: Y1 U1 U2 Y2 Y4 Y3 ... Right, hence the comment in the code is correct. ;) As for the other possible Xv optimizations. You mentioned that fallback code is not important at all. But imagine 640x480 video playback in windowed mode. Decoding it will require quite a lot of resources, but additionally scaling it down using a slow fallback code will be a finishing blow. In addition, a solution (fast JIT accelerated YV12-YUY2 scaler) for this problem already exists. I can also modify this scaler to support YV12-YUV420 scaling. An interesting thing here is that this scaler could be also used by xserver to solve graphics bus bandwidth issues. Imagine that we have some high resolution video with high framerate which exceeds graphics bus capabilities. In this case this video can be downscaled in software using JIT scaler to lower resolution before sending data to LCD controller. What do you think? IMO this is a policy issue, and X is 'mechanism, not policy'. If you want to adapt the scaler, I'm more than happy to include it, but I'm not about to start doing automatic scaling. 
IOW, 'ask a stupid question, get a stupid answer'.

> That's fine. Now I'm waiting for further instructions :) Should I try to prepare a complete patch for xserver? I'm really interested in getting this optimization into xserver as it would help to play high resolution videos. If you have any extra questions about the code or anything else (for example I wonder what free license would be appropriate for it), don't hesitate to contact me.

If you wanted to prepare a complete patch for the server, that would be great, as I don't have time to get to it right now (trying to finish off the merge with upstream, among others). As for the license, just the standard MIT boilerplate in hw/kdrive/omap/* is fine, but replace Nokia Corporation/Daniel Stone with Siarhei Siamashka, obviously.

> I did not try to build xserver sources yet as I did not have enough time for that and xserver requires quite a number of build dependencies. Can you share some tips and tricks about maemo
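For readers without an ARMv6 reference at hand, the result that REV16 produces (the two bytes inside each 16-bit half of a word are exchanged) can be written in portable C as below. This is only a sketch of the operation; the function names are invented here, and whether a given compiler turns the expression into a single REV16 instruction is not something this example claims.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of a 32-bit word,
 * i.e. 0xAABBCCDD becomes 0xBBAADDCC (the effect of ARMv6 REV16). */
static inline uint32_t swap_bytes_in_halfwords(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* Apply the swap in place to a buffer of 32-bit words. */
static void swap_buffer(uint32_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        words[i] = swap_bytes_in_halfwords(words[i]);
}
```

Applying the swap twice gives back the original word, which makes it easy to sanity-check a conversion routine built on it.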
Re: N800 Video playback
On Friday 20 April 2007 10:39, you wrote: The primary conversion we do isn't planar - packed (this is a fallback for when the video is obscured), but from planar to another custom planar format. It would be good to get ARM assembly for the fallback path, but most of the problem when using packed lies in having to transfer the much larger amount of data over the bus. It is only a problem of definition :) Whatever it is, packed or planar, this YUV420 format is not YV12. So it still needs conversion which is performed by only reordering bytes and is not much different from packed YUY2 (except that it requires less space and bandwidth). There's one optimisation that could be done for the YUV420 conversion (the custom planar format that Hailstorm takes), which removes a branch, ensures 32-bit writes always (instead of one 32-bit and one 16-bit per pixel), and unrolls a loop by half. Might be interesting to see what effect this has, but I think it'll still be rather small. My main performance concern is exactly about this 'omapCopyPlanarDataYUV420' function. My experience from Nokia 770 video output code optimization shows that optimization effect can be really huge (it was 1.5x improvement on Nokia 770 for unscaled YV12 - YUY2 conversion going from a simple loop in C to optimized assembly code, I provided a link to the relevant code in my previous post). But N800 code can be probably improved more because now it contains unnecessary branch in the inner loop and branches are expensive on long pipeline CPUs. Such color format conversion performance should be comparable to that of memcpy if done right (it is about half memcpy speed on Nokia 770 for unscaled YV12 - YUY2 conversion). But only benchmarks can be a real proof, any premature speculations are useless and even harmful. Do you remember the times when nobody from Nokia believed that ARM core could be good for video decoding on 770? 
;-)

Testing with Nokia_N800.avi video on N800:

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC: 29,525s VO: 15,029s A: 0,453s Sys: 59,919s = 104,925s
BENCHMARK%: VC: 28,1390% VO: 14,3232% A: 0,4313% Sys: 57,1065% = 100,%
BENCHMARKn: disp: 2511 (23,93 fps) drop: 0 (0%) total: 2511 (23,93 fps)

Enabling direct rendering (avoids extra memcpy in mplayer, but requires to disable OSD menu):

# mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
BENCHMARKs: VC: 29,826s VO: 12,365s A: 0,437s Sys: 60,555s = 103,182s
BENCHMARK%: VC: 28,9058% VO: 11,9833% A: 0,4236% Sys: 58,6873% = 100,%
BENCHMARKn: disp: 2504 (24,27 fps) drop: 0 (0%) total: 2504 (24,27 fps)

Testing the same video on Nokia 770:

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC: 44,982s VO: 7,998s A: 0,884s Sys: 47,936s = 101,801s
BENCHMARK%: VC: 44,1862% VO: 7,8568% A: 0,8688% Sys: 47,0882% = 100,%
BENCHMARKn: disp: 2502 (24,58 fps) drop: 0 (0%) total: 2502 (24,58 fps)

So Nokia 770, having slower CPU, slower memory and using less efficient output format (16bpp vs. 12bpp), still requires less time for video output than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here as it is asynchronous operation and it is fast enough. Surely N800 also has some extra overhead because of interprocess communication with xserver, but looks like YV12 -> YUV420 conversion is quite a bottleneck here too.

It should be noted that while Nokia_N800.avi video has low resolution and N800 has no problems decoding and displaying it, our goal is higher resolution videos such as 640x480. Getting to higher resolutions will increase color format conversion overhead. As it can be seen from these benchmarks, video output on N800 takes quite a significant time when compared with time needed for decoding (29,826s for decoding, 12,365s for video output).

I can make an assembly optimized code for YV12 -> YUV420 conversion.
Is there any chance that such optimization could be also integrated into xserver in one of the next firmware updates if it really provides a significant performance improvement? N800 is almost able to play VGA resolution videos properly, it only needs a bit more optimizations. Color format conversion performance for video output is one of the important things that can be improved. So for any performance optimizations experiments which result in immediate video performance improvement, either direct framebuffer access should be used again or it would be very nice if xserver could provide direct access to framebuffer (video planes) in yuy2 and that custom yuv420 format in one of the next firmware updates. The xserver itself should not do any excess memory copy operations as they degrade performance (and it does such copy for yuy2 at least). 'Direct framebuffer access'? As in, just hand you a pointer to a framebuffer somewhere and let you write straight to it? As this would require a firmware update anyway, I don't really see how
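To put the benchmark totals from this message on a per-frame footing, the VO time can be divided by the number of displayed frames. The small helper below is only an illustration (its name is invented here); the inputs are the figures quoted above: 12,365s over 2504 frames on the N800 with direct rendering, and 7,998s over 2502 frames on the Nokia 770. That works out to roughly 4.9 ms of video output time per frame on the N800 against roughly 3.2 ms on the 770.

```c
#include <assert.h>

/* Average time spent per frame, in milliseconds, given a total in
 * seconds (for example the VO column of mplayer -benchmark). */
static double ms_per_frame(double total_seconds, int frames)
{
    assert(frames > 0);
    return total_seconds * 1000.0 / frames;
}
```

For example, ms_per_frame(12.365, 2504) and ms_per_frame(7.998, 2502) give the two per-frame figures mentioned above.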
Re: N800 Video playback
On Tue, Apr 24, 2007 at 09:46:52AM +0300, ext Siarhei Siamashka wrote: On Friday 20 April 2007 10:39, you wrote: There's one optimisation that could be done for the YUV420 conversion (the custom planar format that Hailstorm takes), which removes a branch, ensures 32-bit writes always (instead of one 32-bit and one 16-bit per pixel), and unrolls a loop by half. Might be interesting to see what effect this has, but I think it'll still be rather small. My main performance concern is exactly about this 'omapCopyPlanarDataYUV420' function. My experience from Nokia 770 video output code optimization shows that optimization effect can be really huge (it was 1.5x improvement on Nokia 770 for unscaled YV12 - YUY2 conversion going from a simple loop in C to optimized assembly code, I provided a link to the relevant code in my previous post). But N800 code can be probably improved more because now it contains unnecessary branch in the inner loop and branches are expensive on long pipeline CPUs. Such color format conversion performance should be comparable to that of memcpy if done right (it is about half memcpy speed on Nokia 770 for unscaled YV12 - YUY2 conversion). Right, the branch is a problem, and as I said, the branch can be avoided and the writes optimised to be three 32-bit writes for two macroblocks, instead of two 32-bit writes and two 16-bit writes. However, I don't think the lessons from the 770 are necessarily _directly_ applicable to the N800: on the 770, our bottleneck is decoding speed. The bottleneck on the N800 is exactly the opposite: video output. But only benchmarks can be a real proof, any premature speculations are useless and even harmful. Do you remember the times when nobody from Nokia believed that ARM core could be good for video decoding on 770? ;-) Actually, I don't, since I've always mainly worked on the N800. 
;) But still, if there's dedicated hardware we can use to remove load from the ARM and let it get on with tasks, and it can perform to an adequate level, there's no reason to avoid it. So Nokia 770, having slower CPU, slower memory and using less efficient output format (16bpp vs. 12bpp), still requires less time for video output than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here as it is asynchronous operation and it is fast enough. Surely N800 also has some extra overhead because of interprocess communication with xserver, but looks like YV12 - YUV420 conversion is quite a bottleneck here too. Bear in mind that, unless you explicitly disable it (the Xv attribute is something like XV_OMAP_VSYNC), the X server _will_ flush all pending writes before the next frame is put through. Else you get tearing, because you can be halfway through an update, and writing the next frame to the framebuffer, so which frame is being picked up, changes halfway through. Try forcing XV_OMAP_VSYNC (or whatever it is) to 0, and comparing the results. I can make an assembly optimized code for YV12 - YUV420 conversion. Is there any chance that such optimization could be also integrated into xserver in one of the next firmware updates if it really provides a significant performance improvement? Yeah, if there's measurable benefit, I'll include it. N800 is almost able to play VGA resolution videos properly, it only needs a bit more optimizations. Color format conversion performance for video output is one of the important things that can be improved. I don't believe it's on the critical path. The optimisation I mentioned before will bring us up to the point where any improvement that we can make in that conversion will be eclipsed by the time taken to send it over the bus, I believe. But I can't prove that. Which Epson docs? The one mentioned by Frantisek. 
> Well, it was just the comment for the 'omapCopyPlanarDataYUV420' function being wrong and misleading, never mind :-) Now everything is clear.

Hmm, is it? Because, unless I was _really_ tired at the time I wrote it (which is entirely possible), that's what the code does, and it works, so ...

Cheers,
Daniel
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
Re: N800 Video playback
On Monday 19 March 2007 22:34, you wrote:
[snip]
> Again, if there are any particular questions I can answer, don't be subtle: ask me straight up. If I can answer them (some things I can't necessarily say, some things I don't necessarily know), I will.

Thanks, here we go, and sorry for the long delay with this answer.

First, thanks for the Xv update which makes it really usable now; MPlayer now uses Xv video output on N800 by default. But there are still some problems. Using unmodified upstream MPlayer code for Xv (N800 with 3.2007.10-7 firmware at the moment) does not work well. It has at least two problems:

1. Lockups which look like cycling two sequential frames, very similar or the same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 Also keypresses are not very responsive. A fix (or workaround) required changing XFlush to XSync in screen update code, now it looks a lot better.

2. Switching windowed/fullscreen mode generally makes mplayer terminate with the following error messages:

X11 error: BadValue (integer parameter out of range for operation)
Xlib: unexpected async reply (sequence 0x5db)!

A workaround to make this problem less frequent was a code addition which prevents screen updates until we get an Expose event notification.

All these Xv patches for MPlayer code can be viewed here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&diff_format=h&view=rev&rev=166

I really don't know much about X11 programming and have only started learning it, so your help with some advice may be very useful. It looks like the MPlayer X11/Xv output code is a big mess with many tricks and workarounds added to work on different systems over time. Maybe it contains some bugs which get triggered on N800 only, but apparently this code is used for other systems without any problems. Can you try experimenting a bit with MPlayer (upstream release) yourself to check how it works with N800 xserver? Maybe it can reveal some xserver bugs which need to be fixed?
Also if MPlayer has some apparently bad X11 code, preparing a clean patch and submitting it upstream may be a good idea.

One more strange thing with Xv on N800 can be reproduced by trying to watch the standard N800 demo video in MPlayer. It has an old familiar tearing line in the bottom part of the screen and the performance is very poor. The same file plays fine in the standard video player. The only difference is that mplayer respects video aspect ratio (this video is not precisely 15:9 but slightly off) and shows some small black bands above and below the picture, while the default video player scales it to fit the whole screen. Disabling aspect ratio in mplayer with -noaspect option also 'fixes' this problem. Using benchmark option we get the following numbers:

# mplayer -benchmark -quiet Nokia_N800.avi
[...]
BENCHMARKs: VC: 33,271s VO: 66,768s A: 0,490s Sys: 5,703s = 106,232s
BENCHMARK%: VC: 31,3189% VO: 62,8517% A: 0,4614% Sys: 5,3681% = 100,%
BENCHMARKn: disp: 1732 (16,30 fps) drop: 778 (30%) total: 2510 (23,63 fps)

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
[...]
BENCHMARKs: VC: 32,226s VO: 14,350s A: 0,456s Sys: 55,699s = 102,731s
BENCHMARK%: VC: 31,3694% VO: 13,9687% A: 0,4439% Sys: 54,2180% = 100,%
BENCHMARKn: disp: 2501 (24,35 fps) drop: 0 (0%) total: 2501 (24,35 fps)

So when showing video with proper aspect ratio, we get tearing back and more than 4x slowdown in video output code (66,768s vs. 14,350s). This all results in 30% of frames dropped.

These were the 'usability' problems with Xv. Now we get to performance related issues. As YV12 is not natively supported by hardware, some color format conversion and byte shuffling in video output code is unavoidable. It is a good idea to optimize this code if we need good performance for high resolution video playback.
Color format conversion can be optimized using assembly, for example maemo port of mplayer has a patch for assembly optimized yv12 -> yuy2 (yuv420p -> yuyv422) nonscaled conversion which provides a very noticeable ~50% improvement on Nokia 770:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&rev=129&view=rev

Also here is a JIT accelerated scaler for yv12 -> yuy2 (yuv420p -> yuyv422) conversion, it is very fast and supports pixel interpolation (good for image quality):
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

I have seen your code in xserver which does the same job for downscaling, but in nonoptimized C and with much higher impact on quality. Using the JIT scaler there can improve both image quality and performance a lot. My only concern is about instruction cache coherency. As ARM requires an explicit instruction cache flush for self-modifying or dynamically generated code, I wonder if using just mmap is safe (does it flush cache for allocated region of memory?). Maybe maemo kernel hackers/developers can help with this information?

It should be noted, that all this assembly
Re: N800 Video playback
Hi, On Fri, Apr 20, 2007 at 09:41:45AM +0300, ext Siarhei Siamashka wrote: 1. Lockups which look like cycling two sequential frames, very similar or the same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 Also keypresses are not very responsive. A fix (or workaround) required changing XFlush to XSync in screen update code, now it looks a lot better. I assume this is basically just a race condition, and it doesn't trigger on other systems, because they're a lot quicker. 2. Switching windowed/fullscreen mode generally makes mplayer terminate with the following error messages: X11 error: BadValue (integer parameter out of range for operation) Xlib: unexpected async reply (sequence 0x5db)! A workaround to make this problem less frequent was a code addition which prevents screen updates until we get Expose even notification. Ditto. I really don't know much about X11 programming and only started to learning it, so your help with some advice may be very useful. I mainly lurk on the server side, however. Looks like MPlayer code X11/Xv output code is a big mess with many tricks and workarounds added to work on different systems over time. Maybe it contains some bugs which get triggered on N800 only, but apparently this code is used for other systems without any problems. Can you try experimenting a bit with MPlayer (upstream release) yourself to check how it works with N800 xserver? Maybe it can reveal some xserver bugs which need to be fixed? Also if MPlayer has some apparently bad X11 code, preparing a clean patch and submitting it upstream maybe a good idea. Unfortunately, I don't have the time to do this. Sorry. One more strange thing with Xv on N800 can be reproduced by trying to watch standard N800 demo video in MPlayer. It has an old familiar tearing line in the bottom part of the screen and the performance is very poor. The same file plays fine in the standard video player. 
The only difference is that mplayer respects video aspect ratio (this video is not precisely 15:9 but slightly off) and shows some small black bands above and below picture and default video player scales it to fit the whole screen. Disabling aspect ratio in mplayer with -noaspect option also 'fixes' this problem. Using benchmark option we get the following numbers: # mplayer -benchmark -quiet Nokia_N800.avi [...] BENCHMARKs: VC: 33,271s VO: 66,768s A: 0,490s Sys: 5,703s = 106,232s BENCHMARK%: VC: 31,3189% VO: 62,8517% A: 0,4614% Sys: 5,3681% = 100,% BENCHMARKn: disp: 1732 (16,30 fps) drop: 778 (30%) total: 2510 (23,63 fps) # mplayer -benchmark -quiet -noaspect Nokia_N800.avi [...] BENCHMARKs: VC: 32,226s VO: 14,350s A: 0,456s Sys: 55,699s = 102,731s BENCHMARK%: VC: 31,3694% VO: 13,9687% A: 0,4439% Sys: 54,2180% = 100,% BENCHMARKn: disp: 2501 (24,35 fps) drop: 0 (0%) total: 2501 (24,35 fps) So when showing video with proper aspect ratio, we get tearing back and more than 4x slowdown in video output code (66,768s vs. 14,350s). This all results in 30% of frames dropped. Okay, I'll take a look at this. My guess is that the scaling we're seeing prevents us from using the LCD controller's overlay, possibly because it's done in software. These were the 'usability' problems with Xv. Now we get to performance related issues. As YV12 is not natively supported by hardware, some color format conversion and bytes shuffling in video output code is unavoidable. It is a good idea to optimize this code if we need a good performance for high resolution video playback. 
Color format conversion can be optimized using assembly, for example maemo port of mplayer has a patch for assembly optimized yv12- yuy2 (yuv420p - yuyv422) nonscaled conversion which provides a very noticeable ~50% improvement on Nokia 770: https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayerrev=129view=rev Also here is a JIT accelerated scaler for yv12- yuy2 (yuv420p - yuyv422) conversion, it is very fast and supports pixels interpolation (good for image quality) : https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer The primary conversion we do isn't planar - packed (this is a fallback for when the video is obscured), but from planar to another custom planar format. It would be good to get ARM assembly for the fallback path, but most of the problem when using packed lies in having to transfer the much larger amount of data over the bus. There's one optimisation that could be done for the YUV420 conversion (the custom planar format that Hailstorm takes), which removes a branch, ensures 32-bit writes always (instead of one 32-bit and one 16-bit per pixel), and unrolls a loop by half. Might be interesting to see what effect this has, but I think it'll still be rather small. I have seen your code in xserver which does the same job for downscaling, but in nonoptimized C and with much higher impact on quality.
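The remark above about packed formats needing 'the much larger amount of data over the bus' can be made concrete with a little arithmetic. The helper below is a sketch with an invented name; the 800x480 size, the 30 fps target and the 16bpp vs. 12bpp figures all come from this thread. At 800x480 and 30 fps, 16 bits per pixel needs 23,040,000 bytes per second while 12 bits per pixel needs 17,280,000, which is 25% less data (put the other way, the 16-bit stream is about 33% larger).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to push every frame over the graphics bus. */
static uint64_t bus_bytes_per_second(uint32_t width, uint32_t height,
                                     uint32_t bits_per_pixel, uint32_t fps)
{
    uint64_t bits_per_frame = (uint64_t)width * height * bits_per_pixel;
    assert(bits_per_frame % 8 == 0);   /* whole bytes per frame only */
    return bits_per_frame / 8 * fps;
}
```

The actual throughput of the N800 bus is not stated in this part of the thread, so the sketch only compares the two formats with each other.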
Re: N800 Video playback
Daniel Stone wrote:
> Which Epson docs?

fanoush.wz.cz/maemo/S1D13745A01SpecRev1.0.gm.zip

Got it from Epson Electronics, like the one mentioned here:
http://maemo.org/pipermail/maemo-developers/2006-December/006638.html
Re: N800 Video playback
Siarhei Siamashka wrote:
> I have seen your code in xserver which does the same job for downscaling, but in nonoptimized C and with much higher impact on quality. Using JIT scaler there can improve both image quality and performance a lot. The only my concern is about instruction cache coherency. As ARM requires explicit instructions cache flush for self-modifying or dynamically generated code, I wonder if using just mmap is safe (does it flush cache for allocated region of memory?). Maybe maemo kernel hackers/developers can help with this information?

ARM Linux supports flushing the icache through the cacheflush syscall; qemu has this function:

static inline void flush_icache_range(unsigned long start, unsigned long stop)
{
    register unsigned long _beg __asm ("a1") = start;
    register unsigned long _end __asm ("a2") = stop;
    register unsigned long _flg __asm ("a3") = 0;
    __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
}

For reference, see the kernel sources arch/arm/kernel/traps.c and include/asm-arm/unistd.h
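As a complement to the qemu snippet, here is a sketch of how a JIT might install generated code using only facilities that a current GCC or Clang toolchain provides, rather than a hand-written swi. The helper name install_generated_code is invented for this example. It assumes a POSIX system with anonymous mmap, and it relies on the compiler builtin __builtin___clear_cache, which on ARM Linux is generally implemented through the same cacheflush call mentioned above. The sketch only installs the bytes; it deliberately does not jump to them.

```c
#define _DEFAULT_SOURCE                /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: copy freshly generated machine code into an
 * anonymous mapping, make it executable, and flush the instruction
 * cache for that range. Returns NULL on failure.
 */
static void *install_generated_code(const void *code, size_t size)
{
    assert(code != NULL && size > 0);

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, size);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }

    /* Make sure the CPU does not execute stale instruction cache lines. */
    __builtin___clear_cache((char *)mem, (char *)mem + size);
    return mem;
}
```

So mmap on its own only hands out memory; the explicit flush after the code has been written is the step that addresses the coherency concern raised in the quoted question.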
Re: N800 Video playback
On Tuesday 20 March 2007 15:03, Klaus Rotter wrote:
> > On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
> > > The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big?
> >
> > Siarhei's calculations were correct, so, yes.
>
> Bad... the N770 interface wasn't the fastest either. So we have even more of a slowdown.

There is one important thing to note. Screen updates are asynchronous and are performed simultaneously with CPU doing some other useful things at the same time. Screen updates do not introduce any overhead or affect performance (at least I did not notice any such effect). So insanely boosting graphics bus performance will not provide any improvements at all once it is capable of sustaining an acceptable framerate. And what is acceptable depends on applications. Video may require higher framerate, but it is only movies with both high resolution and high framerate that may exceed graphics bus capabilities. In this case video will still be played (if cpu is fast enough to decode it, that's another story) but with some frames skipped, and many people will not even notice any problems. Quite a lot of people are even satisfied with 15fps transcoded video, so getting maybe 20-25fps (random guess) on some videos instead of 30fps is not so bad.

Tearing at the bottom is most likely caused by screen update time being longer than two LCD refresh cycles. With tearsync enabled, both screen update and refresh cycle start at the same time, refresh is faster, so we still see the previous frame on the screen. When the first refresh cycle completes, screen buffer is slightly less than half updated at that moment. The second LCD refresh cycle starts displaying the data from the new image, while screen buffer still continues to get updated, but not fast enough to complete before this second LCD refresh cycle catches up not too far from the bottom part of the screen.
If the screen update was faster than two refresh cycles, there would be no tearing visible. Screen update only needs to be 15-20% faster to achieve this. If improving graphics bus performance does not work, I wonder if it is possible to reduce LCD refresh rate instead? Anyway, I think it is better to believe Daniel and wait for the new firmware update :)

> On the N770 there was the feature (with SDL games) of doubling the pixels by hardware with a X-server extension. Will this feature be available in the new kernel / X11 server for the N800? It would be great if it would use the same API.

Doubling pixels will definitely reduce the load on the graphics bus so that its bandwidth should no longer be an issue.
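The tearing argument in this message can be turned into a small model. The assumptions here are mine: the update and the LCD scan both move down the screen at constant speed, they start together (tearsync), and the function names are invented for illustration. If one refresh takes t_refresh and a full update takes t_update, then at time t the update has reached the fraction t / t_update of the screen height, and the second refresh scan has reached (t - t_refresh) / t_refresh. Setting the two equal gives a catch-up point at the fraction t_refresh / (t_update - t_refresh), which lies on the screen (below 1) only when t_update is more than two refresh cycles, matching the condition stated above.

```c
#include <assert.h>

/* Non-zero if the second refresh scan overtakes the update before it
 * reaches the bottom, i.e. the update takes longer than two refreshes. */
static int tearing_visible(double t_refresh, double t_update)
{
    assert(t_refresh > 0.0 && t_update > 0.0);
    return t_update > 2.0 * t_refresh;
}

/* Screen height fraction (0 = top, 1 = bottom) where the second scan
 * catches up with the update. Only meaningful when the update is
 * slower than one refresh cycle. */
static double tear_line_position(double t_refresh, double t_update)
{
    assert(t_refresh > 0.0 && t_update > t_refresh);
    return t_refresh / (t_update - t_refresh);
}
```

For instance, an update that takes 2.2 refresh periods puts the tear line at about 0.83 of the screen height in this model, which agrees with a line appearing near the bottom.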
Re: N800 Video playback
Daniel Stone wrote:
> On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
> > Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might be caused by inefficient framebuffer driver implementation in its initial revision. But if it is a hardware issue, getting normal video playback at native framerate may be troublesome.
> [...]
> Unfortunately, it's a hardware issue. What we can do is get the LCD

The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big? What is limiting the bandwidth: the OMAP interface, the LCD controller itself, or was it a design issue?

-Klaus
--
Klaus Rotter * klaus at rotters dot de * www.rotters.de
Re: N800 Video playback
On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
> Daniel Stone wrote:
> > On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
> > > Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might be caused by inefficient framebuffer driver implementation in its initial revision. But if it is a hardware issue, getting normal video playback at native framerate may be troublesome.
> > [...]
> > Unfortunately, it's a hardware issue. What we can do is get the LCD
>
> The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big?

Siarhei's calculations were correct, so, yes.

> What is limiting the bandwidth: the OMAP interface, the LCD controller itself, or was it a design issue?

a) and c). It's just not stable at higher frequencies.

Cheers,
Daniel
Re: N800 Video playback
Daniel Stone wrote:
> On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
> > The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big?
>
> Siarhei's calculations were correct, so, yes.

Bad... the N770 interface wasn't the fastest either. So we have even more of a slowdown.

On the N770 there was the feature (with SDL games) of doubling the pixels by hardware with an X-server extension. Will this feature be available in the new kernel / X11 server for the N800? It would be great if it would use the same API.

--
Klaus Rotter * klaus at rotters dot de * www.rotters.de
Re: N800 Video playback
On Tue, Mar 20, 2007 at 02:03:16PM +0100, ext Klaus Rotter wrote:
> Daniel Stone wrote:
> > On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
> > > The memory bandwidth to the N800 LCD framebuffer is 3 times slower than the bandwidth in the N770? Is it really _that_ big?
> >
> > Siarhei's calculations were correct, so, yes.
>
> Bad... the N770 interface wasn't the fastest either. So we have even more of a slowdown.
>
> On the N770 there was the feature (with SDL games) of doubling the pixels by hardware with an X-server extension. Will this feature be available in the new kernel / X11 server for the N800? It would be great if it would use the same API.

Yes, pixel doubling has been fixed, and still uses the XSP API for now. Future releases (long-term, as I haven't implemented this yet) will use the standard XRandR API.

Cheers,
Daniel
Re: N800 Video playback
On Sun, Mar 18, 2007 at 07:57:36PM +0200, Siarhei Siamashka wrote:
> I did some tests with the framebuffer when trying to find a way to reduce tearing effect in MPlayer. Here are the results.
> [snip]

This is a very interesting post. Thanks!

Marius Gedminas
--
... Another nationwide organization's computer system crashed twice in less than a year. The cause of each crash was a computer virus
    -- Paul Mungo, Bryan Glough _Approaching_Zero_
(in 1986 computer crashes were something out of the ordinary. Win95 anyone?)
Re: N800 Video playback
Siarhei Siamashka wrote:
> Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might be caused by inefficient framebuffer driver implementation in its initial revision. But if it is a hardware issue, getting normal video playback at native framerate may be troublesome.

It would be a major disappointment if this turns out to be a hardware issue...

Regards,
Hanno
Re: N800 Video playback
Hi,

On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
> If we look at the framebuffer API, there are two ioctls important for screen updates and tearing synchronization, if I understand them correctly now: [...]

You do indeed understand them correctly.

> Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might be caused by inefficient framebuffer driver implementation in its initial revision. But if it is a hardware issue, getting normal video playback at native framerate may be troublesome. Performing software downscaling of video before sending data to the graphics chip may be a solution, but it sacrifices image quality. Switching to 12bit YUV format from 16bit will save ~33% of bus bandwidth, but it can't compensate for a 3x performance regression and may not be enough for 30 fps fullscreen video playback.

Unfortunately, it's a hardware issue. What we can do is get the LCD controller to perform colourspace conversion from a custom planar format ('YUV420') and the scaling as well. Unfortunately this isn't a colourkey, but only a simple rectangle, so the semantics are actually quite complex. But it works well enough that we've shipped an X server and kernel with this support.

We've tried jacking the RFBI frequency up a bit, and the most we could get was a ~10% improvement, with a loss in stability: anything above that would kill your device quick smart, whereas this one only crashed it every day or so.

> As Daniel explained, the next firmware will bring a big improvement in this area. I'm not sure whether it is worth releasing the next version of MPlayer before that, since it will still be far from perfect on N800.

I'd hold your breath, to be honest.

> A preview of the next kernel for beta testing might reduce time needed to get MPlayer fully working on N800, but I'm not demanding or expecting anything. It is just a matter of time anyway and I'm not so impatient :)

Unfortunately, again, it's not my call: there are various processes to get things released (legal, in particular), and I can't really pre-empt those.

> I would be grateful for any comments and corrections. Some things are not yet clear to me, and figuring them out myself is just a waste of time that could be spent on something more useful. Even a small hint may save a huge amount of time.

Anything in particular? I thought my last mails on the subject would've been reasonably exhaustive.

> PS. The last 'inefficient' period of time was when I was struggling with gstreamer API (with no prior experience with it) to get MP3 playback in MPlayer working on DSP for a few months. Looks like the history repeats. Once again, I'm not demanding anything, it is just a matter of 'optimizing' development and spending scarce amounts of spare time more efficiently. I know that Nokia developers are too busy with their primary work, and really appreciate what they are doing. So consider this as a polite request for a favour (not necessary to fulfil right now or fulfil at all).

Again, if there are any particular questions I can answer, don't be subtle: ask me straight up. If I can answer them (some things I can't necessarily say, some things I don't necessarily know), I will.

Cheers,
Daniel
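For a rough sense of the numbers in the exchange above, the sketch below computes the raw bus traffic for fullscreen 800x480 video at 30 fps in a 16-bit format and in 12-bit planar YUV420. It is back-of-the-envelope arithmetic, not a measurement: the helper names frame_bandwidth and saving_percent are made up for this example, and the calculation ignores bus protocol overhead. One detail worth noting is that going from 16 to 12 bits per pixel reduces traffic by 25% relative to the 16-bit baseline; the ~33% figure corresponds to the 16-bit format needing a third more data than the 12-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to push full frames over the graphics bus.
 * bits_per_pixel: 16 for a packed 16-bit format, 12 for planar YUV420. */
static uint64_t frame_bandwidth(uint32_t width, uint32_t height,
                                uint32_t bits_per_pixel, uint32_t fps)
{
    assert(bits_per_pixel > 0 && fps > 0);
    return (uint64_t)width * height * bits_per_pixel / 8 * fps;
}

/* Percentage of traffic saved when moving from old_bpp to new_bpp,
 * measured against the old format (integer division, rounds down). */
static unsigned saving_percent(uint32_t old_bpp, uint32_t new_bpp)
{
    assert(old_bpp > 0 && new_bpp <= old_bpp);
    return (old_bpp - new_bpp) * 100 / old_bpp;
}
```

With these assumptions, frame_bandwidth(800, 480, 16, 30) gives 23,040,000 bytes/s and frame_bandwidth(800, 480, 12, 30) gives 17,280,000 bytes/s. The format change alone therefore falls well short of offsetting a 3x slower bus, which matches the conclusion in the thread.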