Re: N800 Video playback

2007-10-18 Thread Hanno Zulla
Hi,

 The memory bandwidth to the N800 LCD framebuffer is 3 times slower than
 the bandwidth in the N770? Is it really _that_ big?
 
 Siarhei's calculations were correct, so, yes.
 
 What is limiting the bandwidth: the OMAP interface, the LCD controller
 itself, or was it a design issue?
 
 a) and c).  It's just not stable at higher frequencies.

Just curious - is there any word out about the N810 regarding this
particular issue?

(As previously mentioned, my personal killer app for Maemo is full
screen 800x480 video @ 30 fps. Will it be possible?)

Thanks!

Hanno
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers


Re: N800 Video playback

2007-05-06 Thread Siarhei Siamashka
On Thursday 03 May 2007 10:21, Frantisek Dufka wrote:

 Siarhei Siamashka wrote:
  If decoding time for
  each frame will never exceed 28-29ms (which is a tough limitation, cpu
  usage is not uniform), video playback without dropping any frames will be
  possible even with tearsync enabled.

 Would a double or multiple buffering help with this? 

Yes, most likely it will. N800 has 800x480 virtual size for framebuffer and a
new enhanced screen update ioctl. Now it should be possible (did not try yet, 
but will have some results very soon) to specify output position and size for
the rectangle as it gets displayed on the screen.

struct omapfb_update_window {
__u32 x, y;
__u32 width, height;
__u32 format;
__u32 out_x, out_y;
__u32 out_width, out_height;
__u32 reserved[8];
};

This theoretically allows us to use some kind of double buffering: we can
split the framebuffer into two 400x480 parts, and while one part is being
displayed, the other one can be freely filled with the data for the next frame.
This will effectively remove the need for OMAPFB_SYNC_GFX, improving
the peak framerate.
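
A minimal sketch of how the two halves could be described with the structure quoted above. The structure is copied locally with fixed-width types; `half_page_update` and `back_page` are hypothetical helper names, and stretching each half to the full 800x480 output is an assumption about how the output rectangle would be used, not something tested here.

```c
#include <stdint.h>

/* Local copy of the structure quoted above, with fixed-width types
 * standing in for the kernel's __u32. */
struct update_window {
    uint32_t x, y;
    uint32_t width, height;
    uint32_t format;
    uint32_t out_x, out_y;
    uint32_t out_width, out_height;
    uint32_t reserved[8];
};

enum { FB_WIDTH = 800, FB_HEIGHT = 480, HALF_WIDTH = FB_WIDTH / 2 };

/* Describe the update for one half of the framebuffer.
 * Page 0 is the left 400x480 half, page 1 the right one.
 * Assumption: the output rectangle is the whole screen. */
static struct update_window half_page_update(unsigned page, uint32_t format)
{
    struct update_window w = {0};
    w.x = (page & 1u) * HALF_WIDTH;
    w.y = 0;
    w.width = HALF_WIDTH;
    w.height = FB_HEIGHT;
    w.format = format;
    w.out_x = 0;
    w.out_y = 0;
    w.out_width = FB_WIDTH;
    w.out_height = FB_HEIGHT;
    return w;
}

/* The page the decoder may write to while `displayed` is on screen. */
static unsigned back_page(unsigned displayed)
{
    return (displayed & 1u) ^ 1u;
}
```

The decoder would fill `back_page(current)` while the update for `current` is in flight, then swap the two.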

But this solution will require support for arbitrary downscaling in YUV420
format for each video frame to fit 400x480 box. The quality will be also
reduced a bit, but on the other hand, graphics bus should have no 
performance problems with sending 400x480 through it.

If virtual framebuffer size could be extended to 800x960, this would allow us
to use doublebuffering without sacrificing resolution. Anyway, I'll try to
fix MPlayer framebuffer output module to properly work with the latest
version of N800 firmware and implement this form of doublebuffering. It 
should provide the fastest video output performance that is possible.

Regarding Nokia 770, it now uses an 800x600 framebuffer virtual size (some
extra waste of RAM?). Anyway, if the hwa742 kernel driver could be extended to
support this improved screen update API and respect the 'out_x' and 'out_y'
arguments, we could have four video pages in framebuffer memory for
400x240 pixel-doubled video output. That would allow implementing very
efficient double buffering for the accelerated Nokia 770 SDL project, if it ever
gets off the ground :)

 Does mplayer use different threads for displaying and decoding and decode
 frames in advance? 

No, it doesn't have any extra threads now. But video playback on Nokia 770 
is already parallel, splitting tasks between the following pieces of hardware
each working simultaneously:
1. ARM core (demuxing and decoding video into framebuffer)
2. DMA + graphics controller (screen update transferring data from framebuffer
into video memory and performing YUV-RGB conversion on the fly)
3. C55x DSP core (mp3 audio decoding and playback)

There is not much point in creating many threads on ARM, as we only have a
single ARM core and splitting work into several threads will not accelerate
overall performance. Threads could be useful for doing something extra while
waiting for other hardware components to finish their work (waiting for screen
update for example), but decoding ahead will also require storing the decoded
data somewhere. This place for storing decoded ahead frames could be only
some extra space in framebuffer memory, otherwise we would lose some
performance on moving this data to framebuffer later (and increasing battery
power consumption). As framebuffer space is limited, we would not be able to
store many frames ahead, and decoding cpu usage most likely varies not 
between frames but more like between different scenes (complicated action
scene will make us run out of decode ahead buffer pretty fast). Anyway,
this may be worth trying later; there even exists a thread-based
MPlayer fork: http://mplayerxp.sourceforge.net/


Re: N800 Video playback

2007-05-06 Thread Siarhei Siamashka
On Friday 04 May 2007 10:49, Daniel Stone wrote:
 On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote:
  Well, found what's the matter and added explanation at bugzilla:
  https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
  The workaround can be easily added to MPlayer, so that it will
  never call XvShmPutImage with top left image corner at an odd line.
  I'm going to release an updated MPlayer package (maybe even
  a bit later today), it is really fast on N800 with the optimized xserver
  :)

 Aha, that will indeed cause a fallback (x, y, width and height should
 all be aligned to 4px).

Could you clarify this information? The code from kernel framebuffer 
driver (blizzard.c) suggests that only width should be 4px aligned:

    switch (color_mode) {
    case OMAPFB_COLOR_YUV420:
        /* Embedded window with different color mode */
        bpp = 12;
        /* X, Y, height must be aligned at 2, width at 4 pixels */
        x &= ~1;
        y &= ~1;
        height = yspan = height & ~1;
        width = width & ~3;
        break;

Does xserver introduce additional limitations?
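
For illustration, the alignment described in the quoted driver comment (x, y and height to 2 pixels, width to 4 pixels) can be sketched as a pair of helpers. The `rect` type and both function names are hypothetical; only the masks come from the driver comment.

```c
#include <stdint.h>

struct rect { uint32_t x, y, width, height; };

/* Round a YUV420 update rectangle down to the alignment described in the
 * quoted driver comment: x, y and height to 2 pixels, width to 4 pixels. */
static struct rect align_yuv420(struct rect r)
{
    r.x &= ~(uint32_t)1;
    r.y &= ~(uint32_t)1;
    r.height &= ~(uint32_t)1;
    r.width &= ~(uint32_t)3;
    return r;
}

/* Nonzero if the rectangle already satisfies that alignment. */
static int is_yuv420_aligned(struct rect r)
{
    return (r.x & 1u) == 0 && (r.y & 1u) == 0 &&
           (r.height & 1u) == 0 && (r.width & 3u) == 0;
}
```

A client could call `is_yuv420_aligned` before submitting a frame to see whether the kernel would silently adjust the rectangle.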


Re: N800 Video playback

2007-05-04 Thread Daniel Stone
On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote:
 On Thursday 03 May 2007 08:48, Siarhei Siamashka wrote:
  The only thing which is unclear here is that Hailstorm does not need to
  downscale video in this situation. The bug can be reproduced with 512x288
  video which just needs upscaling to 800x450. Also even standard
  Nokia_N800.avi video with proper aspect ratio causes a huge
  performance regression and tearing.
 
  Please give this #1281 issue another look. It looks like a bug in xserver,
  but not a hardware limitation. I can probably try to workaround it by
  requesting not 512x288 buffer from Xv, but something like 512x308, use
  only 512x288 part of it and artificially add black bands above and below.
  After that, Xv can be asked to expand it to 800x480 to get the expected result.
  But if it is a bug in xserver, it would be better to get it fixed,
  preferably before the next firmware update :)
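
The padding arithmetic behind the proposed workaround can be sketched as follows. Both helper names are hypothetical, and the code assumes the padded buffer is at least as tall as the video. Keeping the top band even is an extra precaution, since an odd top line is what this thread later identifies as the trigger for the slow fallback.

```c
#include <stdint.h>

/* Height a buffer of `width` pixels needs so that it has the same aspect
 * ratio as the screen, rounded up to an even number of lines. */
static uint32_t padded_height(uint32_t width, uint32_t screen_w,
                              uint32_t screen_h)
{
    uint32_t h = (width * screen_h + screen_w - 1u) / screen_w; /* ceiling */
    return (h + 1u) & ~(uint32_t)1;                           /* make even */
}

/* Lines of black above the picture, kept even; the rest of the padding
 * goes below. Assumes padded_h >= video_h. */
static uint32_t top_band(uint32_t padded_h, uint32_t video_h)
{
    return ((padded_h - video_h) / 2u) & ~(uint32_t)1;
}
```

For a 512x288 video on the 800x480 screen this gives a 512x308 buffer with 10 black lines above and below the picture, which matches the "something like 512x308" estimate.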
 
 Well, found what's the matter and added explanation at bugzilla:
 https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
 The workaround can be easily added to MPlayer, so that it will 
 never call XvShmPutImage with top left image corner at an odd line. 
 I'm going to release an updated MPlayer package (maybe even 
 a bit later today), it is really fast on N800 with the optimized xserver :)

Aha, that will indeed cause a fallback (x, y, width and height should
all be aligned to 4px).

Cheers,
Daniel




Re: N800 Video playback

2007-05-03 Thread Frantisek Dufka

Siarhei Siamashka wrote:

 If decoding time for
 each frame will never exceed 28-29ms (which is a tough limitation, cpu
 usage is not uniform), video playback without dropping any frames will be
 possible even with tearsync enabled.


Would a double or multiple buffering help with this? Does mplayer use 
different threads for displaying and decoding and decode frames in advance?



Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka
On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:

Looks like I have to reply to myself.

 On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  Applied and build without problems for me.

 Thanks a lot for building the package and putting it for download,
 everything seems to be fine, but more details will follow below.

[snip]

 Anyway, the new xserver package works really good. If we do some tests with
 the standard Nokia_N800.avi video clip, we get the following results with
 the patched xserver:

 #  mplayer -benchmark -quiet -noaspect Nokia_N800.avi
 BENCHMARKs: VC:  29,764s VO:   7,666s A:   0,468s Sys:  64,635s =  102,534s
 BENCHMARK%: VC: 29,0287% VO:  7,4767% A:  0,4565% Sys: 63,0381% = 100,%
 BENCHMARKn: disp: 2504 (24,42 fps)  drop: 0 (0%)  total: 2504 (24,42 fps)

 #  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
 BENCHMARKs: VC:  30,266s VO:   5,490s A:   0,467s Sys:  66,286s =  102,509s
 BENCHMARK%: VC: 29,5255% VO:  5,3554% A:  0,4560% Sys: 64,6631% = 100,%
 BENCHMARKn: disp: 2501 (24,40 fps)  drop: 0 (0%)  total: 2501 (24,40 fps)

 Results with unpatched xserver and some more explanations can be found in
 [3]. 
 Yes, now N800 is faster than Nokia 770 for video output performance at 
 last :)

Well, still not everything is so good until the following bug gets fixed:
https://maemo.org/bugzilla/show_bug.cgi?id=1281

The patch for optimized Xv performance will not help when watching widescreen
video that triggers this tearing bug. If you see tearing on the screen, you
should know that the YUV420 color format conversion optimization patch
is not used at all and xserver most likely uses a slow non-optimized
YUV422 fallback code path with software scaling.

Fixing this bug is critical for video playback performance. I hope it will be
solved in the next version of N800 firmware too. But if we get some patch to
solve this problem for testing earlier, that would be nice too.

 Video output overhead on N800 is really at least halved. Of course, video
 output takes only some fraction of time in video player. So overall
 performance improvement for Nokia_N800.avi playback is approximately 20%
 but not 250%-300% which can be observed for  'omapCopyPlanarDataYUV420'
 function alone.

Correcting myself before anybody notices :) This 'omapCopyPlanarDataYUV420'
function shows a 2.5x-3x improvement, which equals 150%-200% in percentage
terms. Elementary arithmetic is tough when you are tired.
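
The conversion from a speedup factor to a percentage improvement is a one-liner. The function name below is hypothetical.

```c
/* Percentage improvement corresponding to an N-times speedup:
 * 1x means no change (0%), 2x means 100% faster, and so on. */
static double speedup_to_percent(double factor)
{
    return (factor - 1.0) * 100.0;
}
```

With this, a 2.5x speedup is a 150% improvement and a 3x speedup is a 200% improvement.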


Re: N800 Video playback

2007-05-02 Thread Frantisek Dufka

Kalle Vahlman wrote:


I put the deb up at:

 http://iki.fi/zuh/xserver-xomap_1.1.99.3-0.zuh2_armel.deb

until I get it to the repository. This version also has the composite
extension enabled, but AFAIK it does not depend on the libs or change
server behaviour if composite is not specifically used.

The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp
-mfloat-abi=softfp -O2', but as I had troubles with the
SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not
guaranteeing it at the moment ;)



I also succeeded in making the deb:
http://fanoush.wz.cz/maemo/xserver-xomap_1.1.99.3-0osso31_armel.deb

This one is compiled as thumb (except the ASM code) with no special CPU
flags, so it can be checked whether there is any slowdown. Thumb mode saves
approx. 300kb of executable size. It seems to be used by default in
firmware images.


Kalle, did it link properly for you? With the patch, the final Xomap link
did not add the ASM code; I had to do it by hand. I didn't find the proper
place in the Makefile for it to be added to libomap.a; the place patched by
Siarhei was ignored by the build process for me.


Frantisek



Re: N800 Video playback

2007-05-02 Thread Daniel Stone
On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote:
 On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:
  Results with unpatched xserver and some more explanations can be found in
  [3]. 
  Yes, now N800 is faster than Nokia 770 for video output performance at 
  last :)
 
 Well, still not everything is so good until the following bug gets fixed:
 https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
 The patch for optimized Xv performance will not help to watch widescreen
 video which triggers this tearing bug. If you see tearing on the screen, you
 should know that the YUV420 color format conversion optimization patch 
 does not  get used at all and xserver most likely uses a slow nonoptimized
 YUV422 fallback code with software scaling.

Indeed.  And the reason the code is there is because Hailstorm can only
downscale at fixed ratios (half and one-quarter), and even then, it
locked up when we tried.  Similarly, the display controller's
downscaling didn't work, either.  So we can optimise the fallback path,
but you'll still be screwed by sending 16bpp (instead of 12bpp) through
RFBI.
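
To put the 16bpp versus 12bpp difference into numbers, here is a small sketch that computes the bytes sent per frame. The function name is hypothetical, and a full-screen 800x480 frame is assumed for the example figures.

```c
#include <stdint.h>

/* Bytes needed to send one frame at a given number of bits per pixel. */
static uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp / 8u;
}
```

For 800x480 this gives 576000 bytes at 12bpp and 768000 bytes at 16bpp, so the fallback path pushes one third more data through RFBI for every frame.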

 Fixing this bug is critical for video playback performance. I hope it will be
 solved in the next version of N800 firmware too. But if we get some patch to
 solve this problem for testing earlier, that would be nice too.

The only patch is optimising that function, really.  Even if we did work
out a way to make Hailstorm happy, you can still only scale at those
exact multiples, which doesn't make it a viable general solution.

Cheers,
Daniel




Re: N800 Video playback

2007-05-02 Thread Daniel Stone
On Tue, May 01, 2007 at 08:49:20PM +0300, ext Siarhei Siamashka wrote:
 On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  For testing, I fabricated some video with gstreamer:
 
  which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some
  reason 320x240 and 352x288 refused to play with:
 
  X11 error: BadValue (integer parameter out of range for operation)
  MPlayer interrupted by signal 6 in module: flip_page while gstreamer did
  play them just fine. Also the Nokia_N800.avi and  NokiaN93.avi died in the
  same way. 
 
 This X11 error on video playback start and also sometimes on switching
 fullscreen/windowed mode is a known problem [1] reported in this mailing list.
 
 If MPlayer dies on start, usually trying to start it again succeeds. So these
 320x240 and 352x288 videos could be played as well if you were a bit more
 persistent :)

Resizing is a bit tricky.  Most video hardware lets you use the hardware
to clip, so if you move it beyond the edge of the screen, it just
happily ignores anything beyond the hardware's bounds.  Unfortunately
for us, attempting to move a video surface off-screen (even by just a
few pixels) triggers a hardware lockup.

Given that we can't display the frame at all, we send BadValue (there
are a couple of other conditions where this is possible, but this is the
main one).  I don't see the point in returning Success when no video is
drawn at all.  So, I guess you could hack mplayer's error handler to
just ignore BadValues from Xv(Shm)PutImage, unless you get more than
five or ten in a row, say.
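
A rough sketch of the "ignore unless several in a row" idea. The struct and function names are hypothetical, and the code is kept free of X11 headers so that it stands alone. In a real player the check would presumably be called from a handler installed with XSetErrorHandler, using the `error_code` field of the XErrorEvent. Since X errors arrive asynchronously, deciding when a frame counts as displayed is left open here.

```c
#include <stdbool.h>

/* Same numeric value as BadValue in <X11/X.h>; redefined here so the
 * sketch compiles without the X11 headers. */
#define ERR_BAD_VALUE 2

/* Hypothetical helper state: how many BadValue errors arrived in a row. */
struct badvalue_filter {
    unsigned consecutive;
    unsigned limit;        /* e.g. 5 or 10, as suggested above */
};

/* Call after a frame is known to have been displayed. */
static void filter_frame_ok(struct badvalue_filter *f)
{
    f->consecutive = 0;
}

/* Call from the X error handler. Returns true if the error should be
 * ignored, false if the player should treat it as fatal. */
static bool filter_should_ignore(struct badvalue_filter *f, int error_code)
{
    if (error_code != ERR_BAD_VALUE)
        return false;                   /* only BadValue is tolerated */
    f->consecutive++;
    return f->consecutive <= f->limit;  /* fatal once the limit is exceeded */
}
```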

 As Daniel replied in one of the followup messages, it is most likely some race
 condition. The question is which code is a suspect. Is it MPlayer Xv video
 output code that has been around for ages and worked fine on different systems
 or relatively new Xv extension code from N800 xserver? In addition, a previous
 revision of N800 firmware had a serious bug [2] related to video playback. It
 should be noted, that MPlayer needed only about 1 minute to freeze on the
 initial N800 firmware. So the problem could be identified much more easily
 if MPlayer was included in the standard set of tests done by Nokia QA staff
 before each new IT OS release. Surely, Nokia is only interested in a
 properly working xvimagesink for the software included in IT OS by default.
 But testing with more client applications can improve overall xserver quality.

Bear in mind that, as you've hinted at, the only part of the Xv code
which is custom is the _output_ code.  We're using the standard X server
implementation (as used by tens of millions of people) for the protocol
decode and standard semantics, the standard KDrive layer for extended
stuff (as used by god-knows-how-many embedded and consumer devices), and
then the only part we have to play is taking frames and putting them on
the screen.

Due to some restrictions (as above), we have to deliberately error out
on some operations.  But errors like that tend to say 'you've hit a
hardware restriction, I can't do this', rather than 'you hit one of the
many random return BadValues we put in this weird code just to confuse
people'.

Also, bear in mind that a lot of the initial instability was due to the
DSP.  The video was actually rather stable when you played without
sound, although the situation is now somewhat reversed, with the DSP
being pretty steady and the new YUV420 code having complicated
semantics.

 I have also submitted this patch to maemo bugzilla, hopefully it (or its
 modification) can get included into the next version of N800 firmware:
 https://maemo.org/bugzilla/show_bug.cgi?id=1278

I'll merge it with some changes.

Cheers,
Daniel




Re: N800 Video playback

2007-05-02 Thread Daniel Stone
On Tue, May 01, 2007 at 11:51:50AM +0300, ext Siarhei Siamashka wrote:
 On Monday 30 April 2007 17:49, Daniel Stone wrote:
  Indeed.  Unfortunately this is slightly misleading in that it only shows
  the raw write speed.  RFBI can't deal with the sorts of speeds that your
  hyper-optimised version is pumping out, e.g.  So it's mainly just about
  cutting the latency into the critical path to low enough that it makes
  no difference.
 
 The 'framebuffer' is just the ordinary system memory, converting color format 
 and copying data to framebuffer will be done with the same performance as 
 simulated in this test. RFBI performance is only critical for asynchronous
 DMA data transfer to LCD controller which does not introduce any overhead 
 and is performed at the same time as ARM core is doing some other work
 (decoding the next frame). RFBI performance matters only if data transfer to
 LCD is still not complete at the time when the next frame is already decoded
 and is ready to be displayed. When playing video, ARM core and LCD controller
 are almost always working at the same time performing different tasks in
 parallel. I think I had already explained these details in [1]

Right.  My point is that the numbers you're showing -- while very good,
don't get me wrong -- won't necessarily have a huge direct impact on
video playback.  Particularly if you want to avoid tearing.

 So now the results of the tests are consistent - when doing video output, most
 of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420' function.

Well, either that, or just waiting for RFBI transfers to complete.

 Optimizing it using 'yv12_to_yuv420_line_armv6' will definitely provide a huge
 effect, video output overhead when using Xv will be at least halved providing
 more cpu resources for video decoding.

Yes, this is one good aspect.

  I don't have any tips, per se.  Once I get it all integrated it'll be in
  git, but for now, the only public source is the packages.
 
 OK, thanks. It may take some time though. I'm still using old scratchbox
 with mistral SDK here (did not have enough free time to upgrade yet). Until I
 clean up my scratchbox mess, I can only provide some patch without testing, if
 anybody courageous can try to build it :)

I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

 Well, anyway, everything worked perfectly and I could play 640x480 video 
 on N800 with the following statistics:
 
 VIDEO:  [DIVX]  640x480  12bpp  23.976 fps  886.7 kbps (108.2 kbyte/s)
 ...
 BENCHMARKs: VC:  87,757s VO:   8,712s A:   1,314s Sys:   3,835s =  101,618s
 BENCHMARK%: VC: 86,3592% VO:  8,5736% A:  1,2932% Sys:  3,7740% = 100,%
 BENCHMARKn: disp: 2044 (20,11 fps)  drop: 355 (14%)  total: 2399 (23,61 fps)
 
 As you see, mplayer took 8.712 seconds to display 2044 VGA resolution frames. 
 If we do the necessary calculations, that's 72 millions pixels per second,
 quite close to 'yv12_to_yuv420_line_armv6' capabilities limit, so this
 function is the only major contributor to video output time. Video output
 took much less time than decoding, so it proves that video output 
 overhead can be reduced to minimum (in this test tearsync was not used
 though).
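
The "72 million pixels per second" figure follows directly from the benchmark line above (2044 frames of 640x480 in 8.712 s of VO time). A sketch of the calculation, with a hypothetical function name:

```c
/* Pixels written per second by the video output stage. */
static double pixels_per_second(unsigned frames, unsigned width,
                                unsigned height, double vo_seconds)
{
    return (double)frames * width * height / vo_seconds;
}
```

Calling `pixels_per_second(2044, 640, 480, 8.712)` yields roughly 72.07 million pixels per second.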

I'd be curious to see the results from this with tearsync _enabled_,
i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX
ioctl before you start writing to memory again.  This is basically the
limiter for us at this stage.

 When tearsync comes into action, everything gets a bit more complicated. I'm
 still investigating its impact on video playback performance.

'Not good'. :)

Thanks again for your work.

Cheers,
Daniel




Re: Documenting maemo pearls (was Re: N800 Video playback)

2007-05-02 Thread Gustavo Sverzut Barbieri

On 5/2/07, Quim Gil [EMAIL PROTECTED] wrote:

On Tue, 2007-05-01 at 03:29 -0300, ext Gustavo Sverzut Barbieri wrote:
 Daniel, Siarhei, Eero: I always find your mails to provide great deal
 of tech information about N800.

What a coincidence, me too.  ;)

 However we do not have a central place
 with these information, it would be great if you guys setup a wiki
 page with tech details about drivers, optimizations and weakness of
 current implementations so others could base work on.

Indeed. But knowing about the day to day of these busy guys I kind of
understand why things they write instantly in an email can't be easily
reproduced by themselves in a more formal way.


I know, and the problem is that we're not always sure of some things: some
effects are collateral, some are expected... That wastes our time, and
when you're finished you're often so tired that you won't document it;
you just archive the excerpt you want, without any context... you'll know
it when you need it.



But we do want to have all these pearls available in a structured way in
maemo.org. Easing web publishing is a step, partially covered now by the
Midgard CMS integration. Providing an appropriate content structure is a
next step (I'm responsible of). Having that doc manager in place will
definitely help, as well, as making sure that every relevant component
in our architecture is officially covered by someone of the team (still
working on this).

Until then we will keep getting busy developers really sensitive to
openness and dialog, finding some spare time to answer questions and
fill indirectly the gaps in our documentation.


Quim, while formal documents like those maemo.org provides are cool,
they consume a lot of resources... a simple but correct and consistent
wiki is good enough. Maybe we could set up a techday where we'd meet
on IRC and document some topics on the wiki. It would be great to get some
people with deep knowledge of hw issues, like Daniel, Siarhei and
Eero... I could help with writing and organization, as I have never dug
into hw that much (but I'll need to do so really soon).



... Said that, there is nothing stopping anyone from collecting these
pearls in the maemo.org wiki.  ;)


Sure


--
Gustavo Sverzut Barbieri


Re: Documenting maemo pearls (was Re: N800 Video playback)

2007-05-02 Thread Daniel Stone
Hi,

On Wed, May 02, 2007 at 10:05:13AM -0300, ext Gustavo Sverzut Barbieri wrote:
 On 5/2/07, Quim Gil [EMAIL PROTECTED] wrote:
 On Tue, 2007-05-01 at 03:29 -0300, ext Gustavo Sverzut Barbieri wrote:
  Daniel, Siarhei, Eero: I always find your mails to provide great deal
  of tech information about N800.
 
 What a coincidence, me too.  ;)
 
  However we do not have a central place
  with these information, it would be great if you guys setup a wiki
  page with tech details about drivers, optimizations and weakness of
  current implementations so others could base work on.
 
 Indeed. But knowing about the day to day of these busy guys I kind of
 understand why things they write instantly in an email can't be easily
 reproduced by themselves in a more formal way.
 
 I know, and problem is that we're not always sure of some things, some
 effects are collateral, some are expected... that wastes our time and
 when you're finished, often you're so tired you won't document it,
 just archive the excerpt you want, without any context... you'll know
 it when you need.

If there's anything you want to know directly, just ask on the list.  I
tend to deal with email when I'm not actively coding/building/etc, which
is how I justify it.  A wiki would require me to sit down for a while
and really think about stuff, and I don't really have huge blocks of
time available to me.

But yeah, always happy to answer direct questions.

 But we do want to have all these pearls available in a structured way in
 maemo.org. Easing web publishing is a step, partially covered now by the
 Midgard CMS integration. Providing an appropriate content structure is a
 next step (I'm responsible of). Having that doc manager in place will
 definitely help, as well, as making sure that every relevant component
 in our architecture is officially covered by someone of the team (still
 working on this).
 
 Until then we will keep getting busy developers really sensitive to
 openness and dialog, finding some spare time to answer questions and
 fill indirectly the gaps in our documentation.
 
 Quim, while formal documents as those maemo.org provides are cool,
 it consumes a lot of resources... doing simple but correct/consistent
 wiki is good enough. Maybe we could setup a techday that we'd meet
 on IRC and document some topics on Wiki. It would be great to get some
 people with deep knowledge on hw issues, like Daniel, Siarhei and
 Eero... I could help with writing and organization, as I never dig on
 hw that much (but I'll need to do so really soon).

If you can manage the timezones, that would probably be okay.
America/Europe is doable if you guys get up early, just as long as
no-one from Asia-Pacific wants to join in ...

Cheers,
Daniel




Re: Documenting maemo pearls (was Re: N800 Video playback)

2007-05-02 Thread Frantisek Dufka

Daniel Stone wrote:


If there's anything you want to know directly, just ask on the list.  I
tend to deal with email when I'm not actively coding/building/etc, which
is how I justify it.  A wiki would require me to sit down for a while
and really think about stuff, and I don't really have huge blocks of
time available to me.

But yeah, always happy to answer direct questions.


The disadvantage is that it gets lost in the list archive. Even when you
search the archive, it is hard to know the proper keywords, and it is very
likely your brilliant answer will not be found. Many times I am 100%
sure the answer is in the list, since I remember someone answering it some
time ago, but even then it is hard or impossible to find.


Gustavo Sverzut Barbieri wrote:

Quim, while formal documents as those maemo.org provides are cool,
it consumes a lot of resources... doing simple but correct/consistent
wiki is good enough. Maybe we could setup a techday that we'd meet
on IRC and document some topics on Wiki. It would be great to get some
people with deep knowledge on hw issues, like Daniel, Siarhei and
Eero... I could help with writing and organization, as I never dig on
hw that much (but I'll need to do so really soon).


If you can manage the timezones, that would probably be okay.
America/Europe is doable if you guys get up early, just as long as
no-one from Asia-Pacific wants to join in ...



This techday is a good idea. Sadly, it depends on people being available at
that time, and the people providing the most interesting answers
are probably the busiest ones. I tend to avoid IRC because it is a big waste
of time. There are a few gems to be found in the archives too (thanks Marius G.
;-) but 98% is just babble and FAQs repeated again and again. However, I
would try to join such a techday on IRC (not that I expect my presence to
be useful to others). It would be nice to have such tech days regularly,
preferably with a few topics set in advance. But still, I don't know how
realistic it is to achieve this, or whether a wiki or the mailing list is not
better suited for this after all.


Frantisek


Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka
On Wednesday 02 May 2007 12:54, Daniel Stone wrote:
  The 'framebuffer' is just the ordinary system memory, converting color
  format and copying data to framebuffer will be done with the same
  performance as simulated in this test. RFBI performance is only critical
  for asynchronous DMA data transfer to LCD controller which does not
  introduce any overhead and is performed at the same time as ARM core is
  doing some other work (decoding the next frame). RFBI performance matters
  only if data transfer to LCD is still not complete at the time when the
  next frame is already decoded and is ready to be displayed. When playing
  video, ARM core and LCD controller are almost always working at the same
  time performing different tasks in parallel. I think I had already
  explained these details in [1]

 Right.  My point is that the numbers you're showing -- while very good,
 don't get me wrong -- won't necessarily have a huge direct impact on
 video playback.  Particularly if you want to avoid tearing.

I have no idea what other proof would be enough for you. You already got all
the numbers, and even benchmarks with patched xserver. They all confirm
video output performance improvement.

  So now the results of the tests are consistent - when doing video output,
  most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420'
  function.

 Well, either that, or just waiting for RFBI transfers to complete.

You need to wait a bit before displaying the next frame anyway, and
the period between frames for 30 fps video usually eclipses transfer
completion time. If you want some numbers, a 640x480 YUV420 (12bpp)
screen update now takes 25ms without the tearsync flag enabled
(OMAPFB_FORMAT_FLAG_TEARSYNC for OMAPFB_UPDATE_WINDOW
ioctl) and 25-42ms with tearsync. For 30 fps video, the period between
screen updates is normally 33ms. For playing video, we
initiate an RFBI transfer, wait till it completes, perform YV12-YUV420 color
format conversion (which should take less than 4ms for 640x480
considering benchmark results), wait till it is time to display the next
frame and start an RFBI transfer again. For 30 fps video 25ms+4ms is less
than 33ms, so without tearsync enabled, any 640x480 video should play
fine (considering video output performance). With tearsync enabled, we
should add the time needed for performing vertical sync in the LCD controller,
which breaks our nice numbers. The worst case (17ms wait for retrace + 25ms
for actual data transfer) takes more time than the 33ms between frames.
We can be saved if the LCD controller internal refresh rate is really 60Hz;
in this case video playback will automagically synchronize to the LCD refresh
rate and each frame will be processed within exactly 2 LCD refresh
cycles (by the time we want to display a video frame, the next vertical retrace
will be near and we will not lose much time waiting for it). If decoding time for
each frame never exceeds 28-29ms (which is a tough limitation, as cpu
usage is not uniform), video playback without dropping any frames will be
possible even with tearsync enabled. That's what I'm investigating now.
In any case, getting ideal 24 fps playback will be a bit easier.
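
The timing budget above can be checked with a few lines of C. This is only a
sketch of the arithmetic, using the figures quoted in this message (25ms
transfer, under 4ms conversion, up to 17ms retrace wait). The function and
parameter names are made up for illustration and are not part of MPlayer or
the omapfb driver.

```c
#include <stdbool.h>

/* Time available per frame, in milliseconds, at the given frame rate. */
double frame_period_ms(double fps)
{
    return 1000.0 / fps;
}

/* Hypothetical helper: does one frame's video output (retrace wait,
 * RFBI transfer and color conversion, all in milliseconds) fit into
 * the period between two frames? */
bool output_fits(double fps, double transfer_ms,
                 double convert_ms, double vsync_wait_ms)
{
    return vsync_wait_ms + transfer_ms + convert_ms <= frame_period_ms(fps);
}
```

With these figures, output_fits(30.0, 25.0, 4.0, 0.0) holds (29ms against a
33.3ms period), while adding the 17ms worst-case retrace wait does not fit.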

I hope all these explanations are clear now. And this is not just a theory,
but already confirmed by some experiments and practical tests.

 I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

Thanks, that is what I would consider 'additional tips and tricks' :)

It is good to know that maemo 3.x development can be also done with 
older scratchbox (I have 0.9.8.8 installed now), I'll try it without upgrading
scratchbox then.

  Well, anyway, everything worked perfectly and I could play 640x480 video
  on N800 with the following statistics:
 
  VIDEO:  [DIVX]  640x480  12bpp  23.976 fps  886.7 kbps (108.2 kbyte/s)
  ...
  BENCHMARKs: VC:  87,757s VO:   8,712s A:   1,314s Sys:   3,835s =  101,618s
  BENCHMARK%: VC: 86,3592% VO:  8,5736% A:  1,2932% Sys:  3,7740% = 100,%
  BENCHMARKn: disp: 2044 (20,11 fps)  drop: 355 (14%)  total: 2399 (23,61 fps)
 
  As you see, mplayer took 8.712 seconds to display 2044 VGA resolution
  frames. If we do the necessary calculations, that's 72 million pixels
  per second, quite close to 'yv12_to_yuv420_line_armv6' capabilities
  limit, so this function is the only major contributor to video output
  time. Video output took much less time than decoding, so it proves that
  video output overhead can be reduced to minimum (in this test tearsync
  was not used though).
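
The 72 million pixels per second figure follows directly from the benchmark
line quoted above. A minimal sketch of that calculation, with a hypothetical
function name that does not come from MPlayer:

```c
#include <stdint.h>

/* Pixels pushed through video output per second, given the number of
 * displayed frames, the frame size and the total 'VO:' time in seconds. */
double vo_pixels_per_second(uint32_t frames, uint32_t width,
                            uint32_t height, double vo_seconds)
{
    uint64_t pixels = (uint64_t)frames * width * height;
    return (double)pixels / vo_seconds;
}
```

Calling vo_pixels_per_second(2044, 640, 480, 8.712) gives a little over
72 million, which matches the number stated in the message.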

 I'd be curious to see the results from this with tearsync _enabled_?
 i.e., after your OMAPFB_UPDATE_WINDOW call, issue an OMAPFB_SYNC_GFX
 ioctl before you start writing to memory again.  This is basically the
 limiter for us at this stage.

That's exactly how MPlayer works. It always waits on OMAPFB_SYNC_GFX 
before filling framebuffer with the data for the next frame. Not issuing
OMAPFB_SYNC_GFX would introduce *artificial* tearing not related to sync
with LCD 

RE: Documenting maemo pearls (was Re: N800 Video playback)

2007-05-02 Thread quim.gil
Don't kill the messenger! 

 But yeah, always happy to answer direct questions.

Disadvantage is that it becomes lost in the list archive.

This is an old problem communication science solved centuries ago:
generally you have those generating information and those collecting it.
Asking the sources to organize information is many times as useless as
asking the documenters to generate new data.

I keep thinking the right approach in our case is:

- maemo.org should provide the right infrastructure to document easily
(getting there).

- the maemo team should make sure that all the essential information
reaches the official documentation (still a while to get there).

- the maemo community could help by organizing itself in wiki-based
collaboration and pointing out essential information missing in the official
documentation (up to you, tell us where we can help).

I keep insisting on a clear separation between official and community
documentation. Don't get me wrong, I think the quality and usefulness of
community docs can match and outsmart official documentation, in maemo
and in any software project (in fact in *any* type of project). But
think of the zillions of newcomers we want to welcome: most of them are
looking for a single, comprehensive and reliable source of information,
structured in a way that makes it easy to find what they are looking
for. These are elements required in good quality official documentation,
while these same elements can kill community workflow (generally quite
spontaneous) if not handled properly.

Quim
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka
On Wednesday 02 May 2007 12:39, Daniel Stone wrote:
 On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote:
  On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:
   Results with unpatched xserver and some more explanations can be found
   in [3].
   Yes, now N800 is faster than Nokia 770 for video output performance at
   last :)
 
  Well, still not everything is so good until the following bug gets fixed:
  https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
  The patch for optimized Xv performance will not help to watch widescreen
  video which triggers this tearing bug. If you see tearing on the screen,
  you should know that the YUV420 color format conversion optimization
  patch does not  get used at all and xserver most likely uses a slow
  nonoptimized YUV422 fallback code with software scaling.

 Indeed.  And the reason the code is there is because Hailstorm can only
 downscale at fixed ratios (half and one-quarter), and even then, it
 locked up when we tried.  Similarly, the display controller's
 downscaling didn't work, either.  So we can optimise the fallback path,
 but you'll still be screwed by sending 16bpp (instead of 12bpp) through
 RFBI.

The only thing which is unclear here is that Hailstorm does not need to
downscale video in this situation. The bug can be reproduced with 512x288
video which just needs upscaling to 800x450. Also even standard 
Nokia_N800.avi video with proper aspect ratio causes a huge 
performance regression and tearing.

Please give this #1281 issue another look. It looks like a bug in xserver,
not a hardware limitation. I can probably try to work around it by
requesting not a 512x288 buffer from Xv, but something like 512x308, using
only the 512x288 part of it and artificially adding black bands above and below.
After that, Xv can be asked to expand it to 800x480 to get the expected result.
But if it is a bug in xserver, it would be better to get it fixed, preferably
before the next firmware update :)
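
The workaround described above amounts to padding the source picture to the
aspect ratio of the screen. A rough sketch of how the padded height could be
computed follows. The helper names are hypothetical (this is not xserver or
MPlayer code), and rounding up to an even number of lines is an assumption
made here; it happens to reproduce the 512x308 figure mentioned above.

```c
/* Hypothetical helper: compute the padded buffer height needed so that a
 * src_w x src_h picture, letterboxed with black bands, has the same aspect
 * ratio as a screen_w x screen_h display. The result is rounded up to an
 * even number of lines (an assumption, chosen because 4:2:0 chroma planes
 * have half the vertical resolution). */
int letterbox_height(int src_w, int src_h, int screen_w, int screen_h)
{
    /* ceil(src_w * screen_h / screen_w) using integer arithmetic */
    int h = (src_w * screen_h + screen_w - 1) / screen_w;

    if (h < src_h)
        h = src_h;      /* picture is already tall enough */
    if (h % 2 != 0)
        h++;            /* keep the height even */
    return h;
}

/* Lines of black to add above the picture; the rest goes below. */
int letterbox_top_band(int src_h, int padded_h)
{
    return (padded_h - src_h) / 2;
}
```

For a 512x288 video on the 800x480 screen this yields a 512x308 buffer with
10 black lines above and 10 below the picture.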

  Fixing this bug is critical for video playback performance. I hope it
  will be solved in the next version of N800 firmware too. But if we get
  some patch to solve this problem for testing earlier, that would be nice
  too.

 The only patch is optimising that function, really.  Even if we did work
 out a way to make Hailstorm happy, you can still only scale at those
 exact multiples, which doesn't make it a viable general solution.

I will do optimized software YV12-YUV420 JIT scaler a bit later (on next
weekend?). It will be only a minor modification of YV12-YUV422 scaler 
which already exists and works fine. If it can be useful for xserver, it might
be added there at any time.


Re: N800 Video playback

2007-05-02 Thread Kalle Vahlman

2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:

On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  OK, here is an untested patch for xserver to add ARMv6 optimized
  YUV420 color format conversion. Theoretically it should compile
  (I did not try to build xserver myself though) and work. If it refuses to
  compile, fixing the patch should not be too difficult.

 Applied and built without problems for me.

Thanks a lot for building the package and putting it for download, everything
seems to be fine, but more details will follow below.

 For testing, I fabricated some video with gstreamer:

 which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some
 reason 320x240 and 352x288 refused to play with:

 X11 error: BadValue (integer parameter out of range for operation)
 MPlayer interrupted by signal 6 in module: flip_page while gstreamer did
 play them just fine. Also the Nokia_N800.avi and  NokiaN93.avi died in the
 same way.

This X11 error on video playback start and also sometimes on switching
fullscreen/windowed mode is a known problem [1] reported in this mailing list.

If MPlayer dies on start, usually trying to start it again succeeds. So these
320x240 and 352x288 videos could be played as well if you were a bit more
persistent :)


No, it's actually 100% reproducible in this situation (yes, I tried a
number of times). You see, I didn't have the window manager running. It
breaks with the N800 video too.
Running with the window manager does make it runnable, but it also
changes the window size which I wanted to avoid.


 My mplayer is compiled from the svn
 trunk of the garage project, with some additional cflags I use (so
 maybe those were the problem...).

Do you have a set of cflags settings which work better than the default set?
Can you share this information?


If by default set you mean what the default options in the toolchain
are, then yes (as there are none AFAIK ;). If you mean the default
options for mplayer, I don't know if they add any value. I like to use
my hardware well ;) so I tend to compile everything with VFP enabled
and optimized for the processor:

CFLAGS='-mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp -O2'

Now, whether it works better than thumb code is debatable, as
optimizing code size might be more beneficial than having fast floats.
But at least I was happy with the results we got from our testing,
detailed in

 http://syslog.movial.fi/archives/46-N800-VFP-or-not-to-VFP.html

I doubt they will do much good for mplayer, as I assume most critical
operations will be highly optimized already by hand and not left
entirely for the compiler...


If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you
should restrict to 512x384 resolution or lower and keep bitrate reasonable.

The results for these 'insane' videos you have posted are somewhat weird;
complete statistics would also require the number of frames dropped, otherwise
we don't know how much work was done by the player. Probably the missing audio
track resulted in MPlayer not being able to provide a proper report.


Yeah, I guess the fabricated videos weren't that good. Have to do some
more testing with real videos...


Yes, now N800 is faster than Nokia 770 for video output performance at last :)


This is _very_ cool indeed :)

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi


Re: N800 Video playback

2007-05-01 Thread Gustavo Sverzut Barbieri

On 4/30/07, Siarhei Siamashka [EMAIL PROTECTED] wrote:

On Friday 27 April 2007 04:43, Daniel Stone wrote:


[...]

Daniel, Siarhei, Eero: I always find that your mails provide a great deal
of tech information about the N800. However, we do not have a central place
for this information. It would be great if you guys set up a wiki
page with tech details about drivers, optimizations and weaknesses of
current implementations so others could base their work on it.

I see that Eero has a how to at:
http://maemo.org/platform/docs/howtos/howto_performance_test_process.html

Other docs would help too: describing the best fetch size, which instructions
that are usually cheap are badly implemented or slow on omap2420, etc...

Tools would be great. I see Oprofile kernel was suggested to Siarhei,
so it would be great to have it for download on this wiki page as
well.

Thank you all for your great work! Keep it coming :-)

--
Gustavo Sverzut Barbieri
--
Jabber: [EMAIL PROTECTED]
  MSN: [EMAIL PROTECTED]
 ICQ#: 17249123
Skype: gsbarbieri
Mobile: +55 (81) 9927 0010


Re: N800 Video playback

2007-05-01 Thread Kalle Vahlman

2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:

On Monday 30 April 2007 17:49, Daniel Stone wrote:
 It's completely safe to upgrade from a deb if it's not broken.  If you
 set up a standard Maemo build environment and run apt-get source
 xorg-server and apt-get build-dep xorg-server, it should work just fine,
 in theory.

 I don't have any tips, per se.  Once I get it all integrated it'll be in
 git, but for now, the only public source is the packages.

OK, thanks. It may take some time though. I'm still using old scratchbox
with mistral SDK here (did not have enough free time to upgrade yet). Until I
clean up my scratchbox mess, I can only provide some patch without testing, if
anybody courageous can try to build it :)


Given that I fear not the perils of building a X server with
nonstandard options[1], I shall be more than happy to conduct such
adventurous acts :)

And unless Mr. Kulve has objections, the results could be installed
from a repository as well.

[1] 
http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi


Re: N800 Video playback

2007-05-01 Thread Siarhei Siamashka
On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote:
 2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:
  OK, thanks. It may take some time though. I'm still using old scratchbox
  with mistral SDK here (did not have enough free time to upgrade yet).
  Until I clean up my scratchbox mess, I can only provide some patch
  without testing, if anybody courageous can try to build it :)

 Given that I fear not the perils of building a X server with
 nonstandard options[1], I shall be more than happy to conduct such
 adventurous acts :)

 And unless Mr. Kulve has objections, the results could be installed
 from a repository as well.

 [1] 
 
http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

OK, here is an untested patch for xserver to add ARMv6 optimized 
YUV420 color format conversion. Theoretically it should compile
(I did not try to build xserver myself though) and work. If it refuses to
compile, fixing the patch should not be too difficult.

In the worst case only video playback may be broken. But if everything works
as expected, video output performance should become a lot better.

Video output performance can be tested by mplayer using -benchmark 
option, 'VO:' stat shows how much time was used for video output, 'VC:' stat
shows how much time was used for video decoding.

The built-in video player should also become faster. I don't know if this
improvement can be 'scientifically' benchmarked, but it should drop fewer
frames on high resolution video playback.

If any of you can build xserver package with this patch, please put it for
download somewhere or send directly to me.

Thanks.
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am
--- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am	2007-03-05 16:17:32.0 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am	2007-05-01 15:04:43.0 +0300
@@ -1,5 +1,5 @@
 if XV
-XV_SRCS = omap_video.c
+XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h
 endif
 
 if DEBUG
@@ -34,4 +34,4 @@
 	$(TSLIB_FLAG)		\
 	$(DYNSYMS)
 
-EXTRA_DIST = omap_video.c
+EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h	1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h	2007-05-01 15:06:13.0 +0300
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission.  The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose.  It is provided as is without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka [EMAIL PROTECTED]
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+
+#ifndef _OMAP_COLORCONV_H_
+#define _OMAP_COLORCONV_H_
+
+#include <stdint.h>
+
+/**
+ * Convert a line of pixels from YV12 to YUV420 color format
+ * @param dst   - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned
+ * @param src_y - pointer to Y plane, it should be 16-bit aligned
+ * @param src_c - pointer to chroma plane (U for even lines, V for odd lines)
+ * @param w - number of pixels to convert (should be multiple of 4)
+ */
+void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w);
+
+#endif
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S	1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S	2007-05-01 15:06:36.0 +0300
@@ -0,0 +1,244 @@
+/*
+ * Copyright © 2007 Siarhei 

Re: N800 Video playback

2007-05-01 Thread Kalle Vahlman

2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:

On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote:
 2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:
  OK, thanks. It may take some time though. I'm still using old scratchbox
  with mistral SDK here (did not have enough free time to upgrade yet).
  Until I clean up my scratchbox mess, I can only provide some patch
  without testing, if anybody courageous can try to build it :)

 Given that I fear not the perils of building a X server with
 nonstandard options[1], I shall be more than happy to conduct such
 adventurous acts :)

 And unless Mr. Kulve has objections, the results could be installed
 from a repository as well.

 [1]

http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

OK, here is an untested patch for xserver to add ARMv6 optimized
YUV420 color format conversion. Theoretically it should compile
(I did not try to build xserver myself though) and work. If it refuses to
compile, fixing the patch should not be too difficult.


Applied and built without problems for me.

For testing, I fabricated some video with gstreamer:

gst-launch-0.10 videotestsrc num-buffers=300 \
   ! video/x-raw-yuv, width=640, height=480 \
   ! ffenc_mpeg4 ! avimux \
   ! filesink location=640x480.avi

which resulted in 640x480 and 800x480 videos at 30 fps. For some
reason 320x240 and 352x288 refused to play with:

X11 error: BadValue (integer parameter out of range for operation)
MPlayer interrupted by signal 6 in module: flip_page

while gstreamer did play them just fine. Also the Nokia_N800.avi and
NokiaN93.avi died in the same way. My mplayer is compiled from the svn
trunk of the garage project, with some additional cflags I use (so
maybe those were the problem...).

Anyway, then I shut down af-base-apps and matchbox (to avoid scaling
the video) and ran mplayer -benchmark file.


In the worst case only video playback may be broken. But if everything works
as expected, video output performance should become a lot better.

Video output performance can be tested by mplayer using -benchmark
option, 'VO:' stat shows how much time was used for video output, 'VC:' stat
shows how much time was used for video decoding.


There's something fishy in the decoding or something as the color bars
in the test video were broken (yellow and cyan to be precise), but
that seemed to be the case in a vanilla image too so nothing to do
with this patch. I could not see any other glitches in the output.

But on to the results:

VIDEO:  [DX50]  640x480  24bpp  30.000 fps  1597.6 kbps (195.0 kbyte/s)

Original:
V:  10.0 300/300 44% 74%  0.0% 0 0 0%
BENCHMARKs: VC:   4.387s VO:   7.436s A:   0.000s Sys:   0.482s =   12.305s
BENCHMARK%: VC: 35.6503% VO: 60.4311% A:  0.% Sys:  3.9185% = 100.%


Patched:
V:  10.0 300/300 42% 72%  0.0% 0 0 0%
BENCHMARKs: VC:   4.213s VO:   7.265s A:   0.000s Sys:   0.381s =   11.859s
BENCHMARK%: VC: 35.5296% VO: 61.2604% A:  0.% Sys:  3.2100% = 100.%

---

VIDEO:  [DX50]  800x480  24bpp  30.000 fps  1976.5 kbps (241.3 kbyte/s)

Original:
V:  10.0 300/300 54% 114%  0.0% 0 0 0%
BENCHMARKs: VC:   5.466s VO:  11.456s A:   0.000s Sys:   0.366s =   17.287s
BENCHMARK%: VC: 31.6179% VO: 66.2677% A:  0.% Sys:  2.1144% = 100.%

Patched:
V:  10.0 300/300 53% 70%  0.0% 0 0 0%
BENCHMARKs: VC:   5.346s VO:   7.043s A:   0.000s Sys:   0.449s =   12.838s
BENCHMARK%: VC: 41.6414% VO: 54.8602% A:  0.% Sys:  3.4984% = 100.%

There is a clear drop in the amount of time used to output the videos for
800x480 (the numbers were stable through multiple runs).

So I gather from the 10s benchmark time that we didn't get to real
time yet, but close to it? And of course this is just video, audio
decoding should be considered for real video playback performance
measurement.
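
To put the numbers above in per-frame terms, here is a small sketch of the
arithmetic. It assumes that with a video-only clip MPlayer's -benchmark mode
runs as fast as it can, so the total time can be compared with the clip
duration (300 frames at 30 fps = 10 seconds). The function names are
invented for this illustration.

```c
/* Average time spent per frame, in milliseconds. */
double per_frame_ms(double seconds, int frames)
{
    return seconds * 1000.0 / frames;
}

/* Returns nonzero if the total processing time does not exceed the
 * clip duration (frames / fps), i.e. the clip could play in real time. */
int is_realtime(double total_seconds, int frames, double fps)
{
    return total_seconds <= frames / fps;
}
```

For the 800x480 clip, per_frame_ms(11.456, 300) is about 38ms of video
output per frame before the patch and per_frame_ms(7.043, 300) about 23ms
after it, while is_realtime(12.838, 300, 30.0) is still false.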


If any of you can build xserver package with this patch, please put it for
download somewhere or send directly to me.


I put the deb up at:

 http://iki.fi/zuh/xserver-xomap_1.1.99.3-0.zuh2_armel.deb

until I get it to the repository. This version also has the composite
extension enabled, but AFAIK it does not depend on the libs or change
server behaviour if composite is not specifically used.

The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp
-mfloat-abi=softfp -O2', but as I had troubles with the
SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not
guaranteeing it at the moment ;)

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi


Re: N800 Video playback

2007-05-01 Thread Kalle Vahlman

2007/5/1, Kalle Vahlman [EMAIL PROTECTED]:

The server *should* be compiled with '-mcpu=arm1136j-s -mfpu=vfp
-mfloat-abi=softfp -O2', but as I had troubles with the
SBOX_EXTRA_COMPILER_ARGS env var being honored some time ago I'm not
guaranteeing it at the moment ;)


Actually seems that I had added the env var to the rules file so it
*is* built with those options.

I can produce a build without them if need be (it does affect
performance in my experience, so if one wants to see the impact of
that patch on a more normal version...).

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi


Re: N800 Video playback

2007-05-01 Thread Siarhei Siamashka
On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  OK, here is an untested patch for xserver to add ARMv6 optimized
  YUV420 color format conversion. Theoretically it should compile
  (I did not try to build xserver myself though) and work. If it refuses to
  compile, fixing the patch should not be too difficult.

 Applied and built without problems for me.

Thanks a lot for building the package and putting it for download, everything
seems to be fine, but more details will follow below.

 For testing, I fabricated some video with gstreamer:

 which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some
 reason 320x240 and 352x288 refused to play with:

 X11 error: BadValue (integer parameter out of range for operation)
 MPlayer interrupted by signal 6 in module: flip_page while gstreamer did
 play them just fine. Also the Nokia_N800.avi and  NokiaN93.avi died in the
 same way. 

This X11 error on video playback start and also sometimes on switching
fullscreen/windowed mode is a known problem [1] reported in this mailing list.

If MPlayer dies on start, usually trying to start it again succeeds. So these
320x240 and 352x288 videos could be played as well if you were a bit more
persistent :)

As Daniel replied in one of the followup messages, it is most likely some race
condition. The question is which code is a suspect. Is it MPlayer Xv video
output code that has been around for ages and worked fine on different systems
or relatively new Xv extension code from N800 xserver? In addition, a previous
revision of N800 firmware had a serious bug [2] related to video playback. It
should be noted, that MPlayer needed only about 1 minute to freeze on the
initial N800 firmware. So the problem could be identified much more easily
if MPlayer was included in the standard set of tests done by Nokia QA staff
before each new IT OS release. Surely, Nokia is only interested in a
properly working xvimagesink for the software included in IT OS by default.
But testing with more client applications can improve overall xserver quality.

With all that said, I don't know if MPlayer Xv code is bugfree, it wasn't me
who developed it.

 My mplayer is compiled from the svn
 trunk of the garage project, with some additional cflags I use (so
 maybe those were the problem...).

Do you have a set of cflags settings which work better than the default set?
Can you share this information?

 There's something fishy in the decoding or something as the color bars
 in the test video were broken (yellow and cyan to be precise), but
 that seemed to be the case in a vanilla image too so nothing to do
 with this patch. I could not see any other glitches in the output.

 But on to the results:

 VIDEO:  [DX50]  640x480  24bpp  30.000 fps  1597.6 kbps (195.0 kbyte/s)
[snip]
 VIDEO:  [DX50]  800x480  24bpp  30.000 fps  1976.5 kbps (241.3 kbyte/s)
[snip]
 There is a clear drop in the amount of time used to output the videos for
 800x480 (the numbers were stable through multiple runs).

 So I gather from the 10s benchmark time that we didn't get to real
 time yet, but close to it? And of course this is just video, audio
 decoding should be considered for real video playback performance
 measurement.

These videos are way too heavy for N800 to decode and play in realtime. We
may expect playback for videos up to 640x480 resolution with 1000kbps 
bitrate and 24fps. This is probably current realistic limit which can be
achieved. Some minor variations to these parameters are possible (for example
we can get 30fps, but should also reduce resolution or bitrate, etc.).

If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you
should restrict to 512x384 resolution or lower and keep bitrate reasonable.

The results for these 'insane' videos you have posted are somewhat weird;
complete statistics would also require the number of frames dropped, otherwise
we don't know how much work was done by the player. Probably the missing audio
track resulted in MPlayer not being able to provide a proper report. Don't
know. Also it is strange that you did not see any improvement at all for
640x480 video, are you sure you really tested it with the patched xserver?

Anyway, the new xserver package works really well. If we do some tests with
the standard Nokia_N800.avi video clip, we get the following results with the
patched xserver:

#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC:  29,764s VO:   7,666s A:   0,468s Sys:  64,635s =  102,534s
BENCHMARK%: VC: 29,0287% VO:  7,4767% A:  0,4565% Sys: 63,0381% = 100,%
BENCHMARKn: disp: 2504 (24,42 fps)  drop: 0 (0%)  total: 2504 (24,42 fps)

#  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
BENCHMARKs: VC:  30,266s VO:   5,490s A:   0,467s Sys:  66,286s =  102,509s
BENCHMARK%: VC: 29,5255% VO:  5,3554% A:  0,4560% Sys: 64,6631% = 100,%
BENCHMARKn: disp: 2501 (24,40 fps)  drop: 0 (0%)  total: 2501 (24,40 fps)

Results with unpatched xserver and some more 

Re: N800 Video playback

2007-05-01 Thread Frantisek Dufka

Frantisek Dufka wrote:

[sbox-SDK_ARMEL: ~/x/xorg-server-1.1.99.3]  patch -p1 < ../xomap_yuv420patch.diff

patching file hw/kdrive/omap/Makefile.am
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 34.
2 out of 2 hunks FAILED -- saving rejects to file 
hw/kdrive/omap/Makefile.am.rej

patching file hw/kdrive/omap/omap_colorconv.h
patching file hw/kdrive/omap/omap_colorconv.S
patching file hw/kdrive/omap/omap_video.c
Hunk #1 FAILED at 39.
Hunk #2 FAILED at 468.
Hunk #3 FAILED at 491.
3 out of 3 hunks FAILED -- saving rejects to file 
hw/kdrive/omap/omap_video.c.rej





Sorry, my fault, mystery solved. I saved the attachment in Thunderbird in 
Windows XP, then moved it to Ubuntu inside VMware. The problem was caused 
by DOS CR+LF line endings, which patch doesn't like. Recoded to unix 
linefeeds and now it applies cleanly. I'm using Windows a lot, it is 
strange this never happened to me before.
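
For anyone hitting the same problem, the recoding step can be sketched in C
as follows. This is just an illustration of what a CR+LF to LF conversion
does; the function name is made up here, and an existing conversion tool
would do the same job.

```c
#include <stddef.h>

/* Convert DOS (CR+LF) line endings to Unix (LF) in place.
 * Returns the new length of the buffer contents. Only a CR that is
 * immediately followed by LF is dropped; a lone CR is kept. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;   /* skip the CR, the LF is copied on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

The caller reads the patch file into a buffer, runs crlf_to_lf() over it and
writes back only the returned number of bytes.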



Re: N800 Video playback

2007-05-01 Thread Daniel Amelang

On 4/30/07, Daniel Stone [EMAIL PROTECTED] wrote:


 There are two important optimizations in this code:
 1. Cache prefetch with PLD instruction (added in '_armv5' version) which
 boosts performance to 70 megapixels per second. Inner loop is unrolled
 to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
 such unrolling is convenient). This is the most important improvement.
 You can try using __builtin_prefetch() from C code to do the same
 optimization.

Ah, sounds useful.  From what Dan Amelang's been saying on xorg@, gcc
should coalesce four 32-bit reads into one 128-bit read, but this sounds
promising as well.


To expand on this: I was referring to the fact that gcc is pretty smart
about using ldmia/stdmia instructions to cluster sequential
reads/writes. I see that Siarhei is already using this technique in
his assembler code, so nothing new here.
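
The __builtin_prefetch() suggestion quoted above could look roughly like
this in C. It is a sketch only: a plain word copy stands in for the real
color conversion, the function name is hypothetical, and whether the hint
actually helps depends on the compiler and the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: copy n 32-bit words, 8 words (32 bytes, the cache line
 * size mentioned in this thread) per iteration, hinting the CPU to start
 * fetching the next source chunk while the current one is processed.
 * __builtin_prefetch is a GCC/Clang builtin; it is only a hint and never
 * changes the result. */
void copy_words_prefetch(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        if (i + 8 < n)
            __builtin_prefetch(src + i + 8);   /* next 32-byte chunk */
        for (size_t k = 0; k < 8; k++)
            dst[i + k] = src[i + k];
    }
    for (; i < n; i++)                         /* leftover words */
        dst[i] = src[i];
}
```

The inner 8-word loop is a candidate for the ldmia/stmia clustering
discussed here, though what the compiler actually emits is not guaranteed.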

Dan


Re: N800 Video playback

2007-05-01 Thread Daniel Amelang

On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote:


about using ldmia/stdmia instructions to cluster sequential


that was supposed to be ldmia/sdmia, sorry.

Dan


Re: N800 Video playback

2007-05-01 Thread Daniel Amelang

On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote:

On 5/1/07, Daniel Amelang [EMAIL PROTECTED] wrote:

 about using ldmia/stdmia instructions to cluster sequential

that was supposed to be ldmia/sdmia, sorry.


Gah, ldmia/stmia, final answer.

Dan


Re: N800 Video playback

2007-04-30 Thread Siarhei Siamashka
On Friday 27 April 2007 04:43, Daniel Stone wrote:

  I'll make a really optimized version of YV12 - YUV420 convertor on this
  weekend (removing branch is good, but I feel that it can be improved
  more) and will try to use it on Nokia 770, any extra video performance
  improvement will be useful there. I hope that the framebuffer driver on
  Nokia 770 supports YUV420 color format properly.

 I don't think Tornado supports YUV420, but I can check in the specs
 tomorrow.  My better C version basically does two macroblocks at a time,
 ensuring all 32-bit writes (which _really_ helps over 16-bit writes,
 believe me).  This eliminates the branch, since your surface is
 guaranteed to be word-aligned, so if you do all 32-bit writes, you can
 just drop the branch as you know every write will be aligned.

 This will be really fast.

The optimized YV12 -> YUV420 converter is done. The sources can be found here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a
test program ('test_colorconv') which can ensure that everything works
correctly and fast:

~ $ ./test_colorconv
test: 'yv12_to_yuv420_xomap', 
time=7.332s, speed=32.878MP/s, memwritespeed=43.838MB/s

test: 'yv12_to_yuv420_xomap_nobranch', 
time=5.679s, speed=42.448MP/s, memwritespeed=56.597MB/s

test: 'yv12_to_yuv420_line_arm_', 
time=4.706s, speed=51.223MP/s, memwritespeed=68.297MB/s

test: 'yv12_to_yuv420_line_armv5_', 
time=3.356s, speed=71.824MP/s, memwritespeed=95.765MB/s

test: 'yv12_to_yuv420_line_armv6_', 
time=2.826s, speed=85.298MP/s, memwritespeed=113.731MB/s

The ARMv6-optimized YV12 -> YUV420 converter is about 2.5x faster
than the current code used in the N800 xserver. So it should provide a nice
improvement for video :)

I doubt that your better C version can beat it or even get anywhere close.
There are two important optimizations in this code:
1. Cache prefetch with the PLD instruction (added in the '_armv5' version),
which boosts performance to 70 megapixels per second. The inner loop is
unrolled to process 32 pixels per iteration (cache line size is 32 bytes on
ARM, so such unrolling is convenient). This is the most important improvement.
You can try using __builtin_prefetch() from C code to do the same
optimization.
2. The use of the ARMv6 instruction REV16 to swap bytes within the high and
low 16-bit halves of a register. This optimization was added in the '_armv6'
version and boosted performance even more, to 85 megapixels per second. It
is highly unlikely, probably impossible, to get this from a C version.
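
The prefetch idea from point 1 can be tried from C roughly as follows. This
is a minimal sketch, not the code from 'arm_colorconv.S': the function name
copy_with_prefetch, the plain byte copy standing in for the real conversion,
and the prefetch distance of two cache lines are all assumptions made for
illustration. __builtin_prefetch() is a GCC builtin; whether it turns into
PLD depends on the target architecture and compiler flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache line size in bytes, as stated in the message. */
#define CACHE_LINE 32

/*
 * Copy n bytes in cache-line-sized chunks while hinting the cache to
 * fetch a later source line. The distance of two lines ahead is a guess
 * and would need tuning on real hardware.
 */
void copy_with_prefetch(uint8_t *dst, const uint8_t *src, size_t n)
{
    const size_t ahead = 2 * CACHE_LINE;
    size_t i = 0;

    assert(n % CACHE_LINE == 0);

    /* Main part: the prefetched address is still inside the buffer. */
    for (; i + ahead < n; i += CACHE_LINE) {
        __builtin_prefetch(src + i + ahead, 0 /* read */, 0 /* no reuse */);
        memcpy(dst + i, src + i, CACHE_LINE);
    }
    /* Tail: nothing left to prefetch, just finish the copy. */
    for (; i < n; i += CACHE_LINE)
        memcpy(dst + i, src + i, CACHE_LINE);
}
```

Splitting the loop keeps the prefetch address inside the source buffer
without putting a branch into the main loop body.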

I was a bit wrong about YUV420 format in my previous post.

Suppose we have planar YV12 image with the following data.
Y plane: Y1 Y2 Y3 Y4 ...
U plane: U1 __ U2 __ ...

Normal YUV420 (according to pictures in Epson docs)  would be the following:
U1 Y1 Y2 U2 Y3 Y4 ...

But it appears (most likely because of the 16-bit interface and some endian
differences between ARM and the Epson chip) that each pair of bytes is
swapped, and we actually get the following somewhat weird layout:
Y1 U1 U2 Y2 Y4 Y3 ...

To do this byteswapping, ARMv6 instruction REV16 is very handy.
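
The byte-pair swap described above can be written in portable C as follows.
This is only an illustration of the layout, assuming four luma bytes and two
chroma bytes per group; the helper names are made up here, and rev16() just
mimics in C what the ARMv6 REV16 instruction does to a 32-bit register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of x, like ARMv6 REV16. */
static inline uint32_t rev16(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* Interleave one line as "U1 Y1 Y2 U2 Y3 Y4 ..." (the documented layout). */
void pack_yuv420_line(uint8_t *out, const uint8_t *y, const uint8_t *c,
                      size_t pixels)
{
    assert(pixels % 4 == 0);
    for (size_t i = 0; i < pixels / 4; i++) {
        out[6 * i + 0] = c[2 * i];
        out[6 * i + 1] = y[4 * i];
        out[6 * i + 2] = y[4 * i + 1];
        out[6 * i + 3] = c[2 * i + 1];
        out[6 * i + 4] = y[4 * i + 2];
        out[6 * i + 5] = y[4 * i + 3];
    }
}

/* Swap every pair of bytes in place, giving "Y1 U1 U2 Y2 Y4 Y3 ...". */
void swap_byte_pairs(uint8_t *buf, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint8_t t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}
```

Packing first and swapping afterwards is done here only to show the two
layouts side by side; an optimized converter would produce the swapped
order directly.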

The assembly sources for the ARMv6 code look a bit messy because
instruction reordering was needed to schedule them correctly and avoid
ARM11 pipeline interlocks, which negatively affect performance. Now this
code is really fast, with few or no interlocks in the inner loop. And
gcc does not do a good job optimizing code on ARM, so a C implementation
would also be at a disadvantage here.

By the way, the benchmarks posted in my previous message should be
discarded. I did not initialize the source buffers that time, and it looks
like the ARM11 CPU has some 'cheat' which allows treating empty data pages in
some special way and avoiding reads from memory. So the numbers posted in the
previous benchmark were higher than usual. Now it is corrected.

As for the other possible Xv optimizations: you mentioned that the fallback
code is not important at all. But imagine 640x480 video playback in windowed
mode. Decoding it will require quite a lot of resources, and additionally
scaling it down with slow fallback code will be the finishing blow. In
addition, a solution (a fast JIT-accelerated YV12 -> YUY2 scaler) for this
problem already exists. I can also modify this scaler to support
YV12 -> YUV420 scaling. An interesting thing here is that this scaler
could also be used by the xserver to solve graphics bus bandwidth
issues. Imagine that we have some high-resolution video with a high
framerate which exceeds the graphics bus capabilities. In this case
the video can be downscaled in software using the JIT scaler to a lower
resolution before sending the data to the LCD controller. What do you think?
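
As a rough worked example of the bandwidth argument, the sketch below
computes how many bytes per second a video stream pushes over the graphics
bus. The function names and the bus limit passed to needs_downscale() are
hypothetical; the real capacity of the N800 bus is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to send every frame to the LCD controller. */
uint64_t stream_bytes_per_sec(uint32_t width, uint32_t height,
                              uint32_t bits_per_pixel, uint32_t fps)
{
    assert(bits_per_pixel > 0 && fps > 0);
    return (uint64_t)width * height * bits_per_pixel / 8 * fps;
}

/* Nonzero if the stream exceeds the given (assumed) bus capacity. */
int needs_downscale(uint32_t width, uint32_t height,
                    uint32_t bits_per_pixel, uint32_t fps,
                    uint64_t bus_bytes_per_sec)
{
    return stream_bytes_per_sec(width, height, bits_per_pixel, fps)
           > bus_bytes_per_sec;
}
```

For 640x480 at 30 fps this gives 13,824,000 bytes per second in the 12bpp
YUV420 format and 18,432,000 bytes per second in a 16bpp format such as YUY2.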

 Sure.  Unfortunately my job has other functions than to make video
 decoding really, really fast, so I'm happy to merge, review, offer
 feedback, and help you out where I can be useful, but I can't throw much
 time at this myself.

That's fine. Now I'm waiting for 

Re: N800 Video playback

2007-04-30 Thread Daniel Stone
Hi,

On Mon, Apr 30, 2007 at 02:27:49PM +0300, ext Siarhei Siamashka wrote:
 On Friday 27 April 2007 04:43, Daniel Stone wrote:
  I don't think Tornado supports YUV420, but I can check in the specs
  tomorrow.  My better C version basically does two macroblocks at a time,
  ensuring all 32-bit writes (which _really_ helps over 16-bit writes,
  believe me).  This eliminates the branch, since your surface is
  guaranteed to be word-aligned, so if you do all 32-bit writes, you can
  just drop the branch as you know every write will be aligned.
 
  This will be really fast.
 
 Optimized YV12 - YUV420 convertor is done. The sources can be found here:
 https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
 
 Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a
 test program ('test_colorconv') which can ensure that everything works
 correctly and fast:
 
 ~ $ ./test_colorconv
 [results follow]
 
 ARMv6 optimized YV12-YUV420 convertor is about 2.5x faster
 than current code used in N800 xserver. So it should provide a nice
 improvement for video :)

Indeed.  Unfortunately this is slightly misleading in that it only shows
the raw write speed.  RFBI, for example, can't deal with the sorts of speeds
that your hyper-optimised version is pumping out.  So it's mainly just about
cutting the latency in the critical path until it's low enough that it makes
no difference.

 I doubt that your better C version can beat it or even get any close.

Of course not.

 There are two important optimizations in this code:
 1. Cache prefetch with PLD instruction (added in '_armv5' version) which
 boosts performance to 70 megapixels per second. Inner loop is unrolled
 to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
 such unrolling is convenient). This is the most important improvement.
 You can try using __builtin_prefetch() from C code to do the same
 optimization.

Ah, sounds useful.  From what Dan Amelang's been saying on xorg@, gcc
should coalesce four 32-bit reads into one 128-bit read, but this sounds
promising as well.

 2. The use of ARMv6 instruction REV16 to do bytes swapping for high and low
 16-bit register parts, this optimization was added in '_armv6' version and
 boosted performance even more to 85 megapixels per second. This 
 optimization is highly unlikely probably impossible for C version at all.

Sounds useful.

 I was a bit wrong about YUV420 format in my previous post.
 
 Suppose we have planar YV12 image with the following data.
 Y plane: Y1 Y2 Y3 Y4 ...
 U plane: U1 __ U2 __ ...
 
 Normal YUV420 (according to pictures in Epson docs)  would be the following:
 U1 Y1 Y2 U2 Y3 Y4 ...
 
 But appears (most likely because of 16-bit interface and some endian
 differences between ARM and Epson chip) that each pair of bytes is 
 swapped and we actually get the following somewhat weird layout:
 Y1 U1 U2 Y2 Y4 Y3 ...

Right, hence the comment in the code is correct. ;)

 As for the other possible Xv optimizations. You mentioned that fallback code
 is not important at all. But imagine 640x480 video playback in windowed 
 mode. Decoding it will require quite a lot of resources, but additionally
 scaling it down using a slow fallback code will be a finishing blow. In
 addition, a solution (fast JIT accelerated YV12-YUY2 scaler) for this 
 problem already exists. I can also modify this scaler to support
 YV12-YUV420 scaling. An interesting thing here is that this scaler
 could be also used by xserver to solve graphics bus bandwidth 
 issues. Imagine that we have some high resolution video with high 
 framerate which exceeds graphics bus capabilities. In this case
 this video can be downscaled in software using JIT scaler to lower 
 resolution before sending data to LCD controller. What do you think?

IMO this is a policy issue, and X is 'mechanism, not policy'.  If you
want to adapt the scaler, I'm more than happy to include it, but I'm not
about to start doing automatic scaling.

IOW, 'ask a stupid question, get a stupid answer'.

 That's fine. Now I'm waiting for further instructions :) Should I try to
 prepare a complete patch for xserver? I'm really interested in getting
 this optimization into xserver as it would help to play high resolution
 videos. If you have any extra questions about the code or anything 
 else (for example I wonder what free license would be appropriate
 for it), don't hesitate to contact me.

If you wanted to prepare a complete patch for the server, that would be
great, as I don't have time to get to it right now (trying to finish off
the merge with upstream, among others).  As for the license, just the
standard MIT boilerplate in hw/kdrive/omap/* is fine, but replace Nokia
Corporation/Daniel Stone with Siarhei Siamashka, obviously.

 I did not try to build xserver sources yet as I did not have enough time 
 for that and xserver requires quite a number of build dependencies. Can 
 you  share some tips and tricks about maemo 

Re: N800 Video playback

2007-04-24 Thread Siarhei Siamashka
On Friday 20 April 2007 10:39, you wrote:

 The primary conversion we do isn't planar -> packed (this is a fallback
 for when the video is obscured), but from planar to another custom
 planar format.  It would be good to get ARM assembly for the fallback
 path, but most of the problem when using packed lies in having to
 transfer the much larger amount of data over the bus.

It is only a problem of definition :) Whatever it is, packed or planar, this
YUV420 format is not YV12. So it still needs conversion which is 
performed by only reordering bytes and is not much different from 
packed YUY2 (except that it requires less space and bandwidth).

 There's one optimisation that could be done for the YUV420 conversion
 (the custom planar format that Hailstorm takes), which removes a branch,
 ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
 pixel), and unrolls a loop by half.  Might be interesting to see what
 effect this has, but I think it'll still be rather small.

My main performance concern is exactly about this 'omapCopyPlanarDataYUV420'
function. My experience from Nokia 770 video output code optimization shows
that the optimization effect can be really huge (it was a 1.5x improvement on
the Nokia 770 for unscaled YV12 -> YUY2 conversion, going from a simple loop in
C to optimized assembly code; I provided a link to the relevant code in my
previous post). But the N800 code can probably be improved more, because it now
contains an unnecessary branch in the inner loop, and branches are expensive on
CPUs with long pipelines. Such color format conversion performance should be
comparable to that of memcpy if done right (it is about half memcpy speed on
the Nokia 770 for unscaled YV12 -> YUY2 conversion).

But only benchmarks can be a real proof, any premature speculations are
useless and even harmful. Do you remember the times when nobody from 
Nokia believed that ARM core could be good for video decoding on 770? ;-)

Testing with Nokia_N800.avi video on N800:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  29,525s VO:  15,029s A:   0,453s Sys:  59,919s =  104,925s
BENCHMARK%: VC: 28,1390% VO: 14,3232% A:  0,4313% Sys: 57,1065% = 100,0000%
BENCHMARKn: disp: 2511 (23,93 fps)  drop: 0 (0%)  total: 2511 (23,93 fps)

Enabling direct rendering (avoids extra memcpy in mplayer, but requires to
disable OSD menu):
#  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi

BENCHMARKs: VC:  29,826s VO:  12,365s A:   0,437s Sys:  60,555s =  103,182s
BENCHMARK%: VC: 28,9058% VO: 11,9833% A:  0,4236% Sys: 58,6873% = 100,0000%
BENCHMARKn: disp: 2504 (24,27 fps)  drop: 0 (0%)  total: 2504 (24,27 fps)

Testing the same video on Nokia 770:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  44,982s VO:   7,998s A:   0,884s Sys:  47,936s =  101,801s
BENCHMARK%: VC: 44,1862% VO:  7,8568% A:  0,8688% Sys: 47,0882% = 100,0000%
BENCHMARKn: disp: 2502 (24,58 fps)  drop: 0 (0%)  total: 2502 (24,58 fps)


So the Nokia 770, having a slower CPU, slower memory and using a less efficient
output format (16bpp vs. 12bpp), still requires less time for video output
than the N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here,
as the transfer is an asynchronous operation and it is fast enough. Surely the
N800 also has some extra overhead because of interprocess communication with
the xserver, but it looks like YV12 -> YUV420 conversion is quite a bottleneck
here too.

It should be noted that while Nokia_N800.avi video has low resolution and 
N800 has no problems decoding and displaying it, our goal is higher resolution 
videos such as 640x480. Getting to higher resolutions will increase color
format conversion overhead. As it can be seen from these benchmarks, video
output on N800 takes quite a significant time when compared with time needed
for decoding (29,826s for decoding, 12,365s for video output).
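
To put the benchmark totals in per-frame terms, the helper below divides a
BENCHMARKs time by the number of displayed frames. It is plain arithmetic on
the figures above, and the function name is made up for this example.

```c
#include <assert.h>

/* Average cost of one frame in milliseconds. */
double ms_per_frame(double total_seconds, unsigned frames)
{
    assert(frames > 0);
    return total_seconds * 1000.0 / frames;
}
```

With the numbers from the direct rendering run, ms_per_frame(12.365, 2504)
gives about 4.9 ms of video output per frame on the N800, against about
11.9 ms for decoding (29.826 s over the same 2504 frames). The Nokia 770 run
works out to about 3.2 ms of video output per frame (7.998 s over 2502
frames).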

I can write assembly-optimized code for YV12 -> YUV420 conversion. Is there
any chance that such an optimization could also be integrated into the xserver
in one of the next firmware updates if it really provides a significant
performance improvement?

The N800 is almost able to play VGA resolution videos properly; it only needs a
bit more optimization. Color format conversion performance for video output
is one of the important things that can be improved.

  So for any performance optimizations experiments which result in
  immediate video performance improvement, either direct framebuffer access
  should be used again or it would be very nice if xserver could provide
  direct access to framebuffer (video planes) in yuy2 and that custom
  yuv420 format in one of the next firmware updates. The xserver itself
  should not do any excess memory copy operations as they degrade
  performance (and it does such copy for yuy2 at least).

 'Direct framebuffer access'?  As in, just hand you a pointer to a
 framebuffer somewhere and let you write straight to it?  As this would
 require a firmware update anyway, I don't really see how 

Re: N800 Video playback

2007-04-24 Thread Daniel Stone
On Tue, Apr 24, 2007 at 09:46:52AM +0300, ext Siarhei Siamashka wrote:
 On Friday 20 April 2007 10:39, you wrote:
  There's one optimisation that could be done for the YUV420 conversion
  (the custom planar format that Hailstorm takes), which removes a branch,
  ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
  pixel), and unrolls a loop by half.  Might be interesting to see what
  effect this has, but I think it'll still be rather small.
 
 My main performance concern is exactly about this 'omapCopyPlanarDataYUV420'
 function. My experience from Nokia 770 video output code optimization shows
 that optimization effect can be really huge (it was 1.5x improvement on Nokia
 770 for unscaled YV12 - YUY2 conversion going from a simple loop in C to
 optimized assembly code, I provided a link to the relevant code in my previous
 post). But N800 code can be probably improved more because now it contains
 unnecessary branch in the inner loop and branches are expensive on long
 pipeline CPUs. Such color format conversion performance should be
 comparable to that of memcpy if done right (it is about half memcpy speed on
 Nokia 770 for unscaled YV12 - YUY2 conversion).

Right, the branch is a problem, and as I said, the branch can be avoided
and the writes optimised to be three 32-bit writes for two macroblocks,
instead of two 32-bit writes and two 16-bit writes.
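
A sketch of what the three-word packing might look like is given below. It is
not the xserver code. It assumes that a "macroblock" here means one 6-byte
group covering four pixels, that the target layout is the byte-swapped one
discussed elsewhere in this thread (Y1 U1 U2 Y2 Y4 Y3), and that the CPU is
little-endian; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack 8 pixels (8 luma bytes, 4 chroma bytes) per iteration into three
 * 32-bit words. On a little-endian CPU the stores land in memory as
 *   Y1 U1 U2 Y2 | Y4 Y3 Y5 U3 | U4 Y6 Y8 Y7
 * so every write is a full, aligned 32-bit write and no branch is needed.
 */
void pack_line_32bit(uint32_t *dst, const uint8_t *y, const uint8_t *c,
                     size_t pixels)
{
    assert(pixels % 8 == 0);
    for (size_t i = 0; i < pixels; i += 8) {
        dst[0] = (uint32_t)y[0] | (uint32_t)c[0] << 8 |
                 (uint32_t)c[1] << 16 | (uint32_t)y[1] << 24;
        dst[1] = (uint32_t)y[3] | (uint32_t)y[2] << 8 |
                 (uint32_t)y[4] << 16 | (uint32_t)c[2] << 24;
        dst[2] = (uint32_t)c[3] | (uint32_t)y[5] << 8 |
                 (uint32_t)y[7] << 16 | (uint32_t)y[6] << 24;
        dst += 3;
        y += 8;
        c += 4;
    }
}
```

Handling two groups at once is what makes the 12 output bytes divide evenly
into 32-bit words; a single 6-byte group always leaves a 16-bit remainder.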

However, I don't think the lessons from the 770 are necessarily
_directly_ applicable to the N800: on the 770, our bottleneck is
decoding speed.  The bottleneck on the N800 is exactly the opposite:
video output.

 But only benchmarks can be a real proof, any premature speculations are
 useless and even harmful. Do you remember the times when nobody from 
 Nokia believed that ARM core could be good for video decoding on 770? ;-)

Actually, I don't, since I've always mainly worked on the N800. ;)  But
still, if there's dedicated hardware we can use to remove load from the
ARM and let it get on with tasks, and it can perform to an adequate
level, there's no reason to avoid it.

 So Nokia 770, having slower CPU, slower memory and using less efficient 
 output format (16bpp vs. 12bpp), still requires less time for video output
 than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here 
 as it is asynchronous operation and it is fast enough. Surely N800 also has
 some extra overhead because of interprocess communication with xserver, but
 looks like YV12 - YUV420 conversion is quite a bottleneck here too.

Bear in mind that, unless you explicitly disable it (the Xv attribute is
something like XV_OMAP_VSYNC), the X server _will_ flush all pending
writes before the next frame is put through.  Else you get tearing,
because you can be halfway through an update, and writing the next frame
to the framebuffer, so which frame is being picked up, changes halfway
through.

Try forcing XV_OMAP_VSYNC (or whatever it is) to 0, and comparing the
results.

 I can make an assembly optimized code for YV12 - YUV420 conversion. Is there
 any chance that such optimization could be also integrated into xserver in one
 of the next firmware updates if it really provides a significant performance
 improvement?

Yeah, if there's measurable benefit, I'll include it.

 N800 is almost able to play VGA resolution videos properly, it only needs a
 bit more optimizations. Color format conversion performance for video output
 is one of the important things that can be improved.

I don't believe it's on the critical path.  The optimisation I mentioned
before will bring us up to the point where any improvement that we can
make in that conversion will be eclipsed by the time taken to send it
over the bus, I believe.  But I can't prove that.

  Which Epson docs?
 
 The one mentioned by Frantisek. Well, it was just a comment 
 for 'omapCopyPlanarDataYUV420' function wrong and misleading, 
 nevermind :-) Now everything is clear.

Hmm, is it?  Because, unless I was _really_ tired at the time I wrote it
(which is entirely possible), that's what the code does, and it works,
so ...

Cheers,
Daniel




Re: N800 Video playback

2007-04-20 Thread Siarhei Siamashka
On Monday 19 March 2007 22:34, you wrote:

snip

 Again, if there are any particular questions I can answer, don't be
 subtle: ask me straight up.  If I can answer them (some things I can't
 necessarily say, some things I don't necessarily know), I will.

Thanks, here we go and sorry for a long delay with this answer.

First, thanks for the Xv update, which makes it really usable now; MPlayer now
uses Xv video output on the N800 by default. But there are still some problems.
Using unmodified upstream MPlayer code for Xv (N800 with 3.2007.10-7
firmware at the moment) does not work well. It has at least two problems:

1. Lockups which look like cycling two sequential frames, very similar or the 
same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 
Also keypresses are not very responsive. A fix (or workaround) required
changing XFlush to XSync in screen update code, now it looks a lot better.

2. Switching windowed/fullscreen mode generally makes mplayer terminate
with the following error messages:
X11 error: BadValue (integer parameter out of range for operation)
Xlib: unexpected async reply (sequence 0x5db)!
A workaround to make this problem less frequent was a code addition which
prevents screen updates until we get an Expose event notification.

All these Xv patches for MPlayer code can be viewed here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&diff_format=h&view=rev&rev=166

I really don't know much about X11 programming and have only started learning
it, so your help with some advice may be very useful. It looks like MPlayer's
X11/Xv output code is a big mess, with many tricks and workarounds added to
work on different systems over time. Maybe it contains some bugs which get
triggered on the N800 only, but apparently this code is used on other systems
without any problems. Can you try experimenting a bit with MPlayer (upstream
release) yourself to check how it works with the N800 xserver? Maybe it can
reveal some xserver bugs which need to be fixed? Also, if MPlayer has some
apparently bad X11 code, preparing a clean patch and submitting it upstream
may be a good idea.

One more strange thing with Xv on N800 can be reproduced by trying to watch
standard N800 demo video in MPlayer. It has an old familiar tearing line in
the bottom part of the screen and the performance is very poor. The same file
plays fine in the standard video player. The only difference is that mplayer
respects video aspect ratio (this video is not precisely 15:9 but slightly
off) and shows some small black bands above and below picture and 
default video player scales it to fit the whole screen. Disabling aspect ratio
in mplayer with -noaspect option also 'fixes' this problem.

Using benchmark option we get the following numbers:

# mplayer -benchmark -quiet Nokia_N800.avi
[...]
BENCHMARKs: VC:  33,271s VO:  66,768s A:   0,490s Sys:   5,703s =  106,232s
BENCHMARK%: VC: 31,3189% VO: 62,8517% A:  0,4614% Sys:  5,3681% = 100,0000%
BENCHMARKn: disp: 1732 (16,30 fps)  drop: 778 (30%)  total: 2510 (23,63 fps)

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
[...]
BENCHMARKs: VC:  32,226s VO:  14,350s A:   0,456s Sys:  55,699s =  102,731s
BENCHMARK%: VC: 31,3694% VO: 13,9687% A:  0,4439% Sys: 54,2180% = 100,0000%
BENCHMARKn: disp: 2501 (24,35 fps)  drop: 0 (0%)  total: 2501 (24,35 fps)

So when showing video with proper aspect ratio, we get tearing back and more
than 4x slowdown in video output code (66,768s vs. 14,350s). This all results
in 30% of frames dropped.

These were the 'usability' problems with Xv. Now we get to performance
related issues. As YV12 is not natively supported by the hardware, some
color format conversion and byte shuffling in the video output code is
unavoidable. It is a good idea to optimize this code if we need good
performance for high resolution video playback. Color format conversion
can be optimized using assembly; for example, the maemo port of mplayer
has a patch for assembly-optimized yv12 -> yuy2 (yuv420p -> yuyv422)
nonscaled conversion which provides a very noticeable ~50% improvement
on the Nokia 770:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&rev=129&view=rev

Also here is a JIT-accelerated scaler for yv12 -> yuy2 (yuv420p -> yuyv422)
conversion; it is very fast and supports pixel interpolation (good for image
quality):
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

I have seen your code in the xserver which does the same job for downscaling,
but in nonoptimized C and with a much higher impact on quality. Using the JIT
scaler there can improve both image quality and performance a lot. My only
concern is about instruction cache coherency. As ARM requires an explicit
instruction cache flush for self-modifying or dynamically generated code, I
wonder if using just mmap is safe (does it flush the cache for the allocated
region of memory?). Maybe maemo kernel hackers/developers can help with this
information?

It should be noted, that all this assembly 

Re: N800 Video playback

2007-04-20 Thread Daniel Stone
Hi,

On Fri, Apr 20, 2007 at 09:41:45AM +0300, ext Siarhei Siamashka wrote:
 1. Lockups which look like cycling two sequential frames, very similar or the 
 same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991 
 Also keypresses are not very responsive. A fix (or workaround) required
 changing XFlush to XSync in screen update code, now it looks a lot better.

I assume this is basically just a race condition, and it doesn't trigger
on other systems, because they're a lot quicker.
 
 2. Switching windowed/fullscreen mode generally makes mplayer terminate
 with the following error messages:
 X11 error: BadValue (integer parameter out of range for operation)
 Xlib: unexpected async reply (sequence 0x5db)!
 A workaround to make this problem less frequent was a code addition which
 prevents screen updates until we get Expose even notification.

Ditto.

 I really don't know much about X11 programming and only started to learning
 it, so your help with some advice may be very useful.

I mainly lurk on the server side, however.

 Looks like MPlayer code
 X11/Xv output code is a big mess with many tricks and workarounds added to
 work on different systems over time. Maybe it contains some bugs which get
 triggered on N800 only, but apparently this code is used for other systems
 without any problems. Can you try experimenting a bit with MPlayer (upstream
 release) yourself to check how it works with N800 xserver? Maybe it can reveal
 some xserver bugs which need to be fixed? Also if MPlayer has some apparently 
 bad X11 code, preparing a clean patch and submitting it upstream maybe a 
 good idea.

Unfortunately, I don't have the time to do this.  Sorry.

 One more strange thing with Xv on N800 can be reproduced by trying to watch
 standard N800 demo video in MPlayer. It has an old familiar tearing line in
 the bottom part of the screen and the performance is very poor. The same file
 plays fine in the standard video player. The only difference is that mplayer
 respects video aspect ratio (this video is not precisely 15:9 but slightly
 off) and shows some small black bands above and below picture and 
 default video player scales it to fit the whole screen. Disabling aspect ratio
 in mplayer with -noaspect option also 'fixes' this problem.
 
 Using benchmark option we get the following numbers:
 
 # mplayer -benchmark -quiet Nokia_N800.avi
 [...]
 BENCHMARKs: VC:  33,271s VO:  66,768s A:   0,490s Sys:   5,703s =  106,232s
 BENCHMARK%: VC: 31,3189% VO: 62,8517% A:  0,4614% Sys:  5,3681% = 100,0000%
 BENCHMARKn: disp: 1732 (16,30 fps)  drop: 778 (30%)  total: 2510 (23,63 fps)
 
 # mplayer -benchmark -quiet -noaspect Nokia_N800.avi
 [...]
 BENCHMARKs: VC:  32,226s VO:  14,350s A:   0,456s Sys:  55,699s =  102,731s
 BENCHMARK%: VC: 31,3694% VO: 13,9687% A:  0,4439% Sys: 54,2180% = 100,0000%
 BENCHMARKn: disp: 2501 (24,35 fps)  drop: 0 (0%)  total: 2501 (24,35 fps)
 
 So when showing video with proper aspect ratio, we get tearing back and more
 than 4x slowdown in video output code (66,768s vs. 14,350s). This all results
 in 30% of frames dropped.

Okay, I'll take a look at this.  My guess is that the scaling we're
seeing prevents us from using the LCD controller's overlay, possibly
because it's done in software.

 These were the 'usability' problems with Xv. Now we get to performance
 related issues. As YV12 is not natively supported by hardware, some 
 color format conversion and bytes shuffling in video output code is
 unavoidable. It is a good idea to optimize this code if we need a good
 performance for high resolution video playback. Color format conversion 
 can be optimized using assembly, for example maemo port of mplayer
 has a patch for assembly optimized yv12- yuy2 (yuv420p - yuyv422) 
 nonscaled conversion which provides a very noticeable ~50% improvement
 on Nokia 770:
 https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayer&rev=129&view=rev
 
 Also here is a JIT accelerated scaler for yv12- yuy2 (yuv420p - yuyv422)
 conversion, it is very fast and supports pixels interpolation (good for image
 quality) :
 https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

The primary conversion we do isn't planar -> packed (this is a fallback
for when the video is obscured), but from planar to another custom
planar format.  It would be good to get ARM assembly for the fallback
path, but most of the problem when using packed lies in having to
transfer the much larger amount of data over the bus.

There's one optimisation that could be done for the YUV420 conversion
(the custom planar format that Hailstorm takes), which removes a branch,
ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
pixel), and unrolls a loop by half.  Might be interesting to see what
effect this has, but I think it'll still be rather small.

 I have seen your code in xserver which does the same job for downscaling, but
 in nonoptimized C and with much higher impact on quality. 

Re: N800 Video playback

2007-04-20 Thread Frantisek Dufka

Daniel Stone wrote:



Which Epson docs?



fanoush.wz.cz/maemo/S1D13745A01SpecRev1.0.gm.zip
Got it from Epson Electronics like the one mentioned here
http://maemo.org/pipermail/maemo-developers/2006-December/006638.html



Re: N800 Video playback

2007-04-20 Thread Tao Huang

Siarhei Siamashka wrote:

I have seen your code in xserver which does the same job for downscaling, but
in nonoptimized C and with much higher impact on quality. Using JIT scaler
there can improve both image quality and performance a lot. The only my
concern is about instruction cache coherency. As ARM requires explicit
instructions cache flush for self modyfying or dynamically generated code, I
wonder if  using just mmap is safe (does it flush cache for allocated region
of  memory?). Maybe maemo kernel hackers/developers can help with this
information?
  

ARM Linux supports flushing the icache via the cacheflush syscall.

QEMU has this function:

static inline void flush_icache_range(unsigned long start, unsigned long stop)
{
    /* Pin the three arguments to registers a1-a3 (r0-r2). */
    register unsigned long _beg __asm ("a1") = start;
    register unsigned long _end __asm ("a2") = stop;
    register unsigned long _flg __asm ("a3") = 0;
    /* Software interrupt that invokes the ARM-private cacheflush call. */
    __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
}

For reference, see the kernel sources arch/arm/kernel/traps.c and
include/asm-arm/unistd.h.
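
For code built with GCC, the same flush can also be requested without a
hand-written swi. The sketch below is an illustration under stated
assumptions, not the JIT scaler's code: publish_code() is a hypothetical
helper that copies generated instructions into an anonymous mapping, makes it
executable with mprotect(), and calls the GCC builtin
__builtin___clear_cache(), which on ARM Linux is expected to end up in the
same cacheflush syscall.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy generated machine code into a fresh anonymous mapping, switch the
 * mapping to read+execute, and synchronise the instruction cache.
 * Returns the executable address, or NULL on failure.
 */
void *publish_code(const void *code, size_t len)
{
    void *buf;

    assert(code != NULL && len > 0);

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }

    /* mmap alone does not do this for data written afterwards. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

The important point is the order: the flush has to come after the last write
to the generated code and before the first jump into it.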




Re: N800 Video playback

2007-03-21 Thread Siarhei Siamashka
On Tuesday 20 March 2007 15:03, Klaus Rotter wrote:

  On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
  The memory bandwidth to the N800 LCD framebuffer is 3 times slower that
  the bandwidth in the N770? Is it really _that_ big?
 
  Siarhei's calculations were correct, so, yes.

 Bad... the N770 interface wasn't the fastest either. So we have even
 more of a slowdown.

There is one important thing to note. Screen updates are asynchronous and
are performed while the CPU is doing other useful things at
the same time. Screen updates do not introduce any overhead or affect
performance (at least I did not notice any such effect). So insanely boosting
graphics bus performance will not provide any improvement at all once it is
capable of sustaining an acceptable framerate. And what is acceptable depends
on the application. Video may require a higher framerate, but it is only movies
with both high resolution and high framerate that may exceed graphics bus
capabilities; in this case the video will still be played (if the CPU is fast
enough to decode it, but that's another story) with some frames skipped, and
many people will not even notice any problems. Quite a lot of people
are even satisfied with 15fps transcoded video, so getting maybe 20-25fps
(random guess) on some videos instead of 30fps is not so bad.

Tearing at the bottom is most likely caused by the screen update time being
longer than two LCD refresh cycles. With tearsync enabled, the screen update
and the refresh cycle start at the same time; the refresh is faster, so we
still see the previous frame on the screen. When the first refresh cycle
completes, the screen buffer is slightly less than half updated. The second
LCD refresh cycle starts displaying data from the new image while the screen
buffer is still being updated, but the update is not fast enough to complete
before this second refresh cycle catches up with it, not too far from the
bottom of the screen. If the screen update took less than two refresh cycles,
there would be no tearing visible. The screen update only needs to be 15-20%
faster to achieve this. If improving graphics bus performance does not work,
I wonder if it is possible to reduce the LCD refresh rate instead?
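
The reasoning above can be put into a small model. This is only a sketch
under simplifying assumptions: both the refresh and the update sweep the
screen top to bottom at constant speed and start together (tearsync). The
function name tear_position and the timings used below are made up for
illustration; the real N800 refresh period and update time are not given
here.

```c
/* Hypothetical model of where the tear line lands.
 *
 * refresh_ms: time for one LCD refresh pass over the whole screen
 * update_ms:  time for one full screen update over the graphics bus
 *
 * The second refresh pass starts at t = refresh_ms and is at position
 * (t - refresh_ms) / refresh_ms; the update is at position t / update_ms.
 * Setting the two equal gives the catch-up position
 *     refresh_ms / (update_ms - refresh_ms)
 * as a fraction of screen height (0 = top, 1 = bottom).
 *
 * Returns that fraction, or -1.0 when no tear is expected, i.e. when the
 * update takes no longer than two refresh cycles. */
static double tear_position(double refresh_ms, double update_ms)
{
    if (refresh_ms <= 0.0 || update_ms <= 0.0)
        return -1.0;            /* invalid input */
    if (update_ms <= 2.0 * refresh_ms)
        return -1.0;            /* update finishes before being caught */
    return refresh_ms / (update_ms - refresh_ms);
}
```

With a made-up 10 ms refresh and a 25 ms update, tear_position(10.0, 25.0)
is about 0.67, so the tear sits in the lower third of the screen. Making the
update 20% faster (20 ms) moves the catch-up point to the very bottom and the
function reports no tear, which matches the 15-20% estimate above.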

Anyway, I think it is better to believe Daniel and wait for the new
firmware update :)

 On the N770 there was the feature (with SDL games) of
 doubling the pixels in hardware with an X server extension. Will this
 feature be available in the new kernel / X11 server for the N800? It
 would be great if it used the same API.

Doubling pixels will definitely reduce the load on the graphics bus, so its
bandwidth should no longer be an issue.


Re: N800 Video playback

2007-03-20 Thread Klaus Rotter

Daniel Stone wrote:
 On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
 Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might
 be caused by inefficient framebuffer driver implementation in its initial
 revision. But if it is a hardware issue, getting normal video playback at
 native framerate may be troublesome. [...]

 Unfortunately, it's a hardware issue.  What we can do is get the LCD

The memory bandwidth to the N800 LCD framebuffer is 3 times slower than
the bandwidth in the N770? Is it really _that_ big?

What is limiting the bandwidth: the OMAP interface, the LCD controller
itself, or was it a design issue?


-Klaus

--
Klaus Rotter * klaus at rotters dot de * www.rotters.de


Re: N800 Video playback

2007-03-20 Thread Daniel Stone
On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
 Daniel Stone wrote:
 On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
 Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might 
 be caused by inefficient framebuffer driver implementation in its initial
 revision. But if it is a hardware issue, getting normal video playback at
 native framerate may be troublesome. [...]
 
 Unfortunately, it's a hardware issue.  What we can do is get the LCD
 
 The memory bandwidth to the N800 LCD framebuffer is 3 times slower than
 the bandwidth in the N770? Is it really _that_ big?

Siarhei's calculations were correct, so, yes.

 What is limiting the bandwidth: the OMAP interface, the LCD controller
 itself, or was it a design issue?

a) and c).  It's just not stable at higher frequencies.

Cheers,
Daniel




Re: N800 Video playback

2007-03-20 Thread Klaus Rotter

Daniel Stone wrote:
 On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
 The memory bandwidth to the N800 LCD framebuffer is 3 times slower than
 the bandwidth in the N770? Is it really _that_ big?

 Siarhei's calculations were correct, so, yes.

Bad... the N770 interface wasn't the fastest either. So we have an even
bigger slowdown. On the N770 there was the feature (with SDL games) of
doubling the pixels in hardware with an X server extension. Will this
feature be available in the new kernel / X11 server for the N800? It
would be great if it used the same API.


--
Klaus Rotter * klaus at rotters dot de * www.rotters.de


Re: N800 Video playback

2007-03-20 Thread Daniel Stone
On Tue, Mar 20, 2007 at 02:03:16PM +0100, ext Klaus Rotter wrote:
 Daniel Stone wrote:
 On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
 The memory bandwidth to the N800 LCD framebuffer is 3 times slower than
 the bandwidth in the N770? Is it really _that_ big?

 Siarhei's calculations were correct, so, yes.

 Bad... the N770 interface wasn't the fastest either. So we have an even
 bigger slowdown. On the N770 there was the feature (with SDL games) of
 doubling the pixels in hardware with an X server extension. Will this
 feature be available in the new kernel / X11 server for the N800? It
 would be great if it used the same API.

Yes, pixel doubling has been fixed, and still uses the XSP API for now.
Future releases (long-term, as I haven't implemented this yet) will use
the standard XRandR API.

Cheers,
Daniel




Re: N800 Video playback

2007-03-19 Thread Marius Gedminas
On Sun, Mar 18, 2007 at 07:57:36PM +0200, Siarhei Siamashka wrote:
 I did some tests with the framebuffer when trying to find a way to reduce
 tearing effect in MPlayer. Here are the results.
<snip>

This is a very interesting post.  Thanks!

Marius Gedminas
-- 
... Another nationwide organization's computer system crashed twice in less
than a year. The cause of each crash was a computer virus
-- Paul Mungo, Bryan Glough  _Approaching_Zero_
(in 1986 computer crashes were something out of the ordinary.  Win95 anyone?)




Re: N800 Video playback

2007-03-19 Thread Hanno Zulla
Siarhei Siamashka wrote:
 Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might 
 be caused by inefficient framebuffer driver implementation in its initial
 revision. But if it is a hardware issue, getting normal video playback at
 native framerate may be troublesome.

It would be a major disappointment if this turns out to be a hardware
issue...

Regards,

Hanno


Re: N800 Video playback

2007-03-19 Thread Daniel Stone
Hi,

On Sun, Mar 18, 2007 at 07:57:36PM +0200, ext Siarhei Siamashka wrote:
 If we look at the framebuffer API, there are two ioctls important for screen
 updates and tearing synchronization, if I understand them correctly now:
 
 [...]

You do indeed understand them correctly.

 Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might 
 be caused by inefficient framebuffer driver implementation in its initial
 revision. But if it is a hardware issue, getting normal video playback at
 native framerate may be troublesome. Performing software downscaling of 
 video before sending data to the graphics chip may be a solution, but it
 sacrifices image quality. Switching to 12bit YUV format from 16bit will save
 ~25% of bus bandwidth, but it can't compensate for the 3x performance
 regression and may not be enough for 30 fps fullscreen video playback.

Unfortunately, it's a hardware issue.  What we can do is get the LCD
controller to perform colourspace conversion from a custom planar format
('YUV420') and the scaling as well.  Unfortunately this isn't a
colourkey, but only a simple rectangle, so the semantics are actually
quite complex.  But it works well enough that we've shipped an X server
and kernel with this support.  We've tried jacking the RFBI frequency up
a bit, and the most we could get was a ~10% improvement, with a loss in
stability: anything above that would kill your device quick smart,
whereas this one only crashed it every day or so.
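
To put numbers on the 16-bit versus 12-bit discussion above, here is a small
sketch of the arithmetic. The 800x480 size, the 30 fps target and the two
pixel formats come from this thread; the helper names frame_bytes and
bytes_per_second are made up for illustration. It only computes how much data
has to cross the bus, since the actual N800 bus throughput is not stated
here.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for one frame at the given size and
 * average bits per pixel (16 for the 16-bit format, 12 for planar YUV420). */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * bits_per_pixel / 8;
}

/* Hypothetical helper: bytes per second that must be pushed over the
 * graphics bus to show every frame at the given framerate. */
static uint64_t bytes_per_second(uint32_t width, uint32_t height,
                                 uint32_t bits_per_pixel, uint32_t fps)
{
    return frame_bytes(width, height, bits_per_pixel) * fps;
}
```

For fullscreen 800x480 this gives 768000 bytes per frame at 16 bits and
576000 bytes at 12 bits, or 23040000 versus 17280000 bytes per second at
30 fps. The 12-bit frame is three quarters the size of the 16-bit one (put
the other way, the 16-bit frame is about 33% larger), which is clearly not
enough on its own to make up for a bus that is 3x slower.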

 As Daniel explained, the next firmware will bring a big improvement in this
 area. I'm not sure whether it is worth releasing the next version of MPlayer
 before that, since it will still be far from perfect on N800.

I'd hold your breath, to be honest.

 A preview of the next kernel for beta testing might reduce time needed to get
 MPlayer fully working on N800, but I'm not demanding or expecting anything. It
 is just a matter of time anyway and I'm not so impatient :)

Unfortunately, again, it's not my call: there are various processes to
get things released (legal, in particular), and I can't really pre-empt
those.

 I would be grateful for any comments and corrections. Some things are not 
 yet clear to me, figuring them out myself is just a waste of time that could
 be spent on something more useful. Even a small hint may save a huge 
 amount of time.

Anything in particular?  I thought my last mails on the subject would've
been reasonably exhaustive.

 PS. The last 'inefficient' period of time was when I was struggling with
 gstreamer API (with no prior experience with it) to get MP3 playback in
 MPlayer working on DSP for a few months. Looks like the history repeats. 
 Once again, I'm not demanding anything, it is just a matter of 'optimizing'
 development and spending scarce amounts of spare time more efficiently.
 I know that Nokia developers are too busy with their primary work, and
 really appreciate what they are doing. So consider this as a polite request 
 for a favour (not necessary to fulfil right now or fulfil at all).

Again, if there are any particular questions I can answer, don't be
subtle: ask me straight up.  If I can answer them (some things I can't
necessarily say, some things I don't necessarily know), I will.

Cheers,
Daniel

