from:"Siarhei Siamashka"

Re: Performance of floating point instructions

2010-03-10 Thread Siarhei Siamashka

On Wednesday 10 March 2010, Alberto Mardegan wrote:
 Alberto Mardegan wrote:
  Does one have any figure about how the performance of the FPU is,
  compared to integer operations?

 I added some profiling to the code, and I measured the time spent by a
 function which is operating on an array of points (whose coordinates are
 integers) and trasforming each of them into a geographic coordinates
 (latitude and longitude, floating point) and calculating the distance
 from the previous point.

 http://vcs.maemo.org/git?p=maemo-mapper;a=shortlog;h=refs/heads/gps_control
 map_path_calculate_distances() is in path.c,
 calculate_distance() is in utils.c,
 unit2latlon() is a pointer to unit2latlon_google() in tile_source.c


 The output (application compiled with -O0):

Using an optimized build (-O2 or -O3) may sometimes change the overall picture
quite dramatically. It makes almost no sense benchmarking -O0 code, because in
this case all the local variables are kept in memory and are read/written
before/after each operation. It's substantially different from normal code.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Performance of floating point instructions

2010-03-10 Thread Siarhei Siamashka

On Wednesday 10 March 2010, Laurent Desnogues wrote:
 On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan
  So, it seems that there's a huge improvements when switching from doubles
  to floats; although I wonder if it's because of the FPU or just because
  the amount of data passed around is smaller.
  On the other hand, the improvements obtained by enabling the fast FPU
  mode is rather small -- but that might be due to the fact that the FPU
  operations are not a major player in this piece of code.

 The fast mode only gains 1 or 2 cycles per FP instruction.
 The FPU on Cortex-A8 is not pipelined and the fast mode
 can't change that :-)

It's probably
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/ch16s07s01.html
vs.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/BCGEIHDJ.html

I wonder why the compiler does not use real NEON instructions with -ffast-math 
option, it should be quite useful even for scalar code.

something like:

vld1.32  {d0[0]}, [r0]
vadd.f32 d0, d0, d0
vst1.32  {d0[0]}, [r0]

instead of:

flds s0, [r0]
faddss0, s0, s0
fsts s0, [r0]

for:

*float_ptr = *float_ptr + *float_ptr;

At least NEON is pipelined and should be a lot faster on more complex code
examples where it can actually benefit from pipelining. On x86, SSE2 is used
quite nicely for floating point math.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Performance of floating point instructions

2010-03-10 Thread Siarhei Siamashka

On Wednesday 10 March 2010, Laurent Desnogues wrote:
 On Wed, Mar 10, 2010 at 8:54 PM, Siarhei Siamashka
 siarhei.siamas...@gmail.com wrote:
 [...]

  I wonder why the compiler does not use real NEON instructions with
  -ffast-math option, it should be quite useful even for scalar code.
 
  something like:
 
  vld1.32  {d0[0]}, [r0]
  vadd.f32 d0, d0, d0
  vst1.32  {d0[0]}, [r0]
 
  instead of:
 
  flds     s0, [r0]
  fadds    s0, s0, s0
  fsts     s0, [r0]
 
  for:
 
  *float_ptr = *float_ptr + *float_ptr;
 
  At least NEON is pipelined and should be a lot faster on more complex
  code examples where it can actually benefit from pipelining. On x86, SSE2
  is used quite nicely for floating point math.

 Even if fast-math is known to break some rules, it only
 breaks C rules IIRC. 

If that's the case, some other option would be handy. Or even a new custom
data type like float_neon (or any other name). Probably it is even possible
with C++ and operators overloading.

 OTOH, NEON FP has no support 
 for NaN and other nice things from IEEE754.

 Anyway you're perhaps looking for -mfpu=neon, no?

I lost my faith in gcc long ago :) So I'm not really looking for anything.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Performance of floating point instructions

2010-03-10 Thread Siarhei Siamashka

On Wednesday 10 March 2010, Laurent Desnogues wrote:
 Even if fast-math is known to break some rules, it only
 breaks C rules IIRC.  OTOH, NEON FP has no support
 for NaN and other nice things from IEEE754.

And just checked gcc man page to verify this stuff. 

-ffast-math
  Sets -fno-math-errno, -funsafe-math-optimizations,
  -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans
  and -fcx-limited-range.

-ffinite-math-only
  Allow optimizations for floating-point arithmetic that assume that arguments
  and results are not NaNs or +-Infs.

  This option is not turned on by any -O option since it can result in
  incorrect output for programs which depend on an exact implementation of
  IEEE or ISO rules/specifications for math functions. It may, however, yield
  faster code for programs that do not require the guarantees of these
  specifications.

So looks like -ffast-math already assumes no support for NaNs. Even if
there are other nice IEEE754 things preventing NEON from being used
with -ffast-math, an appropriate new option relaxing this requirement
makes sense to be invented.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Performance of floating point instructions

2010-03-10 Thread Siarhei Siamashka

On Wednesday 10 March 2010, Laurent GUERBY wrote:
 On Wed, 2010-03-10 at 21:54 +0200, Siarhei Siamashka wrote:
  I wonder why the compiler does not use real NEON instructions with
  -ffast-math option, it should be quite useful even for scalar code.
 
  something like:
 
  vld1.32  {d0[0]}, [r0]
  vadd.f32 d0, d0, d0
  vst1.32  {d0[0]}, [r0]
 
  instead of:
 
  flds s0, [r0]
  faddss0, s0, s0
  fsts s0, [r0]
 
  for:
 
  *float_ptr = *float_ptr + *float_ptr;
 
  At least NEON is pipelined and should be a lot faster on more complex
  code examples where it can actually benefit from pipelining. On x86, SSE2
  is used quite nicely for floating point math.

 Hi,

 Please open a report on http://gcc.gnu.org/bugzilla with your test
 sources and command line, at least GCC developpers will notice there's
 interest :).

This sounds reasonable :)

 GCC comes with some builtins for neon, they're defined in arm_neon.h
 see below.

This does not sound like a good idea. If the code has to be modified and
changed into something nonportable, there are way better options than
intrinsics.

Regarding the use of NEON instructions via C++ operator overloading. A test
program is attached.

# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -ffast-math
  -o neon_float neon_float.cpp

=== ieee754 floats ===

real0m3.396s
user0m3.391s
sys 0m0.000s

=== runfast floats ===

real0m2.285s
user0m2.273s
sys 0m0.008s

=== NEON C++ wrapper ===

real0m1.312s
user0m1.313s
sys 0m0.000s

But the quality of generated code is quite bad. That's also something to be
reported to gcc bugzilla :)

-- 
Best regards,
Siarhei Siamashka
#include stdio.h
#include arm_neon.h

#if 1
class fast_float
{
float32x2_t data;
public:
fast_float(float x) { data = vset_lane_f32(x, data, 0); }
fast_float(const fast_float x) { data = x.data; }
fast_float(const float32x2_t x) { data = x; }
operator float () { return vget_lane_f32(data, 0); }

friend fast_float operator+(const fast_float a, const fast_float b);
friend fast_float operator*(const fast_float a, const fast_float b);

const fast_float operator+=(fast_float a)
{
data = vadd_f32(data, a.data);
return *this;
}
};
fast_float operator+(const fast_float a, const fast_float b)
{
return vadd_f32(a.data, b.data);
}
fast_float operator*(const fast_float a, const fast_float b)
{
return vmul_f32(a.data, b.data);
}
#else
typedef float fast_float;
#endif

float f(float *a, float *b)
{
int i;
fast_float accumulator = 0;
for (i = 0; i  1024; i += 16)
{
accumulator += (fast_float)a[i + 0] * (fast_float)b[i + 0];
accumulator += (fast_float)a[i + 1] * (fast_float)b[i + 1];
accumulator += (fast_float)a[i + 2] * (fast_float)b[i + 2];
accumulator += (fast_float)a[i + 3] * (fast_float)b[i + 3];
accumulator += (fast_float)a[i + 4] * (fast_float)b[i + 4];
accumulator += (fast_float)a[i + 5] * (fast_float)b[i + 5];
accumulator += (fast_float)a[i + 6] * (fast_float)b[i + 6];
accumulator += (fast_float)a[i + 7] * (fast_float)b[i + 7];
accumulator += (fast_float)a[i + 8] * (fast_float)b[i + 8];
accumulator += (fast_float)a[i + 9] * (fast_float)b[i + 9];
accumulator += (fast_float)a[i + 10] * (fast_float)b[i + 10];
accumulator += (fast_float)a[i + 11] * (fast_float)b[i + 11];
accumulator += (fast_float)a[i + 12] * (fast_float)b[i + 12];
accumulator += (fast_float)a[i + 13] * (fast_float)b[i + 13];
accumulator += (fast_float)a[i + 14] * (fast_float)b[i + 14];
accumulator += (fast_float)a[i + 15] * (fast_float)b[i + 15];
}
return accumulator;
}

volatile float dummy;
float buf1[1024];
float buf2[1024];

int main()
{
int i;
int tmp;
__asm__ volatile(
fmrx   %[tmp], fpscr\n
orr%[tmp], %[tmp], #(1  24)\n /* flush-to-zero */
orr%[tmp], %[tmp], #(1  25)\n /* default NaN */
bic%[tmp], %[tmp], #((1  15) | (1  12) | (1  11) | (1  10) | (1  9) | (1  8))\n /* clear exception bits */
fmxr   fpscr, %[tmp]\n
: [tmp] =r (tmp)
  );
for (i = 0; i  1024; i++)
{
buf1[i] = buf2[i] = i % 16;
}
for (i = 0; i  10; i++)
{
dummy = f(buf1, buf2);
}
printf(%f\n, (double)dummy);
return 0;
}
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Problem with garage's project being refused

2009-03-02 Thread Siarhei Siamashka

On Mon, Mar 2, 2009 at 4:57 PM, Sarah Newman newm...@sonic.net wrote:
 On that note, what do we do if project is there but completely unused
 and we want to delete it?  I know I'm not the only one ;)

Probably first try contact the person who registered the project and
ask him(her) to transfer ownership to you. Maybe they don't have time
or have other reasons not to work on the project actively and will be
glad to know that somebody wants to take it over.

-- 
Best regards
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Screen orientation

2009-01-30 Thread Siarhei Siamashka

On Tuesday 27 January 2009, gary liquid wrote:
 the answer to the identification problem is by querying the xrandr x11
 extension library.

 http://www.xfree86.org/current/Xrandr.3.html

 Kamen correctly points out however that a user with the default
 installation of maemo does not need to query this interface, there is only
 1 possible default orientation: landscape.

 Future versions of maemo will hopefully have a fully working xrandr
 implementation and allow rotation to be queried and controlled in the
 default system.

 *note to nokians reading, PLEASE make sure this works and also confirm that
 XV rotates correctly as well ;)*

 Gary Birkett (lcuk in #maemo)

By the way, have you reported this XV rotation problem to the authors of the
unofficial rotation patch? What did they reply?

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: gprof-base Profiling: no time accumulated

2009-01-24 Thread Siarhei Siamashka

On Saturday 24 January 2009, Thomas Thrainer wrote:
 Hello,

 I'm trying to profile an application on the N810 with gprof. I compiled it
 with -pg, -O3  and -g and linked all the libraries I'm interested in
 statically (and compiled them also with -pg -O3 -g).

 The first couple of problems were that I got a floating point exception in
 scratchbox when starting the program, and a crash on the tablet. Using
 -fno- unit-at-a-time and -fno-omit-frame-pointer solves the problem on the
 tablet, it still doesn't run in scratchbox tough. By the way, this is
 related to -O3, non-optimized build run fine in both scratchbox and on the
 tablet.

 My problem is however, that profiling doesn't give any usable results. In
 the profile written to gmon.out there are all times 0.0. But the call graph
 and calling count for functions is correct.

 I tried to strace the application, and profiling seems to work normally.
 The profiling code sets up the profiling timer, and the SIGPROF signal is
 received regularly throughout the program run.
 So I suspect the profiling code, or more precisely the SIGPROF-handler, to
 not being able to get the currently executing function based on the stack.
 My program is not spending a lot of time in some library functions or such,
 most of the time it's usually in some user-functions (I know that based on
 profiling on my PC).

 Can anybody shed some light on this issue? To I have to link against some
 special version of glibc? Or is profiling with gprof broken on ARM's?

IIRC, there might be something wrong with the toolchain in the respect of 
support for -pg option.

On the other hand, do you really need to use gprof? Profiling on N810 can be
done with oprofile, which covers all the gprof functionality and provides a
lot more features. Please check the following page:
http://maemo.org/development/tools/doc/diablo/oprofile/

If you think that for your specific case gprof is better and doubt that
oprofile can handle it well, please describe what exactly you want to do.
I'll try to advice something.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Does anyone know the mechanism in Nokia's LCD driver?

2009-01-24 Thread Siarhei Siamashka

On Tuesday 13 January 2009, Frantisek Dufka wrote:
 Siarhei Siamashka wrote:
  XV should make a perfect backend for SDL, because it maps fine on SDL
  API (SDL_SetVideoMode/SDL_Flip/...). In general, XV is a good backend
  for anything that uses double-buffered or triple-buffered
  fullscreen/fullwindow blits. It is possible to get ~27.5 frames per
  second in 800x480 resolution for 16bpp rgb color format without
  tearing. With a lower resolution it is possible to go up to ~55 frames
  per second.

 Hmm, I hope there is some work done on this front (Xv or even openGLES
 based backend for SDL) for Fremantle.
 The alpha release has same old 1.2.8-23 SDL version though. Or is there some
 alternative to SDL for 2D graphics?

I was only talking about N800/N810 hardware, its kernel, xserver, SDL and how
to best use them on the current generation of internet tablets.

Fremantle is a completely different subject and I don't feel like discussing
it yet. It surely will bring new exciting features and challenges.

Regarding improved SDL or whatever is needed for 2D games, in any case it is
one of the things that the community can do with or without official Nokia
support.

 BTW, as for the external lcd controller status - I have just checked and
CONFIG_FB_OMAP_LCDC_EXTERNAL is disabled for both RX-51
 confugurations in Fremantle alpha kernel (unlike e.g. PowerVR stuff) so
 maybe we finally has directly mapped framebuffer in SDRAM ?

We can discuss this stuff after the actual HW is out :)

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Does anyone know the mechanism in Nokia's LCD driver?

2009-01-13 Thread Siarhei Siamashka

On Tue, Jan 13, 2009 at 10:56 AM, Frantisek Dufka duf...@seznam.cz wrote:
 BTW, there is kernel ioctl to set automatic refresh and the refresh rate
 can be tweaked in kernel source but the results are suboptimal. Maybe at
 least for Nokia 770 it would be possible to use tearsync flag for such
 automatic update and set the update timer to lcd refresh rate or lcd
 refresh rate/2 to get nice result. N8x0 cannot update whole screen in
 one lcd refresh cycle anyway so you'll always get tearing on the bottom.

N8x0 can update whole screen in two lcd refresh cycles (but admittedly
it just barely crosses this limit), which is enough to have no
tearing. But this is only true when running at OP 0 (400MHz). When
running at other operating points, RFBI is slower. It may be
interesting to experiment with better integration of omapfb kernel
driver with DVFS and try to always switch (at least temporarily) to OP
0 when performing screen updates.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Does anyone know the mechanism in Nokia's LCD driver?

2009-01-13 Thread Siarhei Siamashka

On Tue, Jan 13, 2009 at 12:23 PM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 On Tue, Jan 13, 2009 at 12:05 PM, Igor Stoppa igor.sto...@nokia.com wrote:
 Hi,
 On Tue, 2009-01-13 at 18:06 +0800, ext Huang Gao (Gmail) wrote:
 Hi, Igor Stoppa:
   Thank you for your reply!
   So can I understand that this hardware FB is not contained in SDRAM
 or SRAM, and LCD will refresh itself from this hardware FB by its controller
 automatically, without the help of OMAP DMA channel?

 I'm in no way a display guy but iirc there are 2 modes for refresh:
 -auto: whatever is written to the framebuffer goes through straight to
 the LCD
 -manual: the image needs to be flushed to the LCD

 If you are interested, you can check it from the kernel source files.

 Would the manual mode help to avoid tearing?

Yes, and it does help to avoid tearing. At least this works fine for
XV extension. But getting tearfree scrolling/panning in GTK
applications for example is a bit more challenging. I can provide a
more detailed explanation if anybody is interested.

XV should make a perfect backend for SDL, because it maps fine on SDL
API (SDL_SetVideoMode/SDL_Flip/...). In general, XV is a good backend
for anything that uses double-buffered or triple-buffered
fullscreen/fullwindow blits. It is possible to get ~27.5 frames per
second in 800x480 resolution for 16bpp rgb color format without
tearing. With a lower resolution it is possible to go up to ~55 frames
per second.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Projects Nokia should support (yours?)

2008-11-01 Thread Siarhei Siamashka

On Saturday 01 November 2008, Ryan Abel wrote:
 On Nov 1, 2008, at 5:47 AM, Simon Pickering wrote:
  Something with more mileage, though still not really a killer app,
  would be working on optimisation of the backend libs that all three
  media players use (therefore any media player would benefit). But this
  will have to wait until we see what the hardware is capable of really.

 Considering the hardware is already doing 720p with only NEON
 optimizations, well. . . .

There are always ways to make a powerful hardware run as fast as a snail ;)
Especially when excessively complex frameworks and layers are used, chances
of having something implemented inefficiently in between get a bit higher.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: ALSA sound driver for Nokia 770 and DSP programming

2008-09-26 Thread Siarhei Siamashka

On Friday 26 September 2008, Simon Pickering wrote:
   Recently I have been trying to make it running and seems
   like we have a very
   good chance to have it working nicely. It is also
   interesting, that the
   linux-omap guys seem to be developing a new driver [3] for
   AIC23 which may
   eventually become a better alternative.
   Very nice!

 Good stuff Siarhei :)

 Have you built a replacement DSP-kernel yet?

Yes, sure. You can easily build DSP kernel and demo_console DSP task from
the sources in dspgw-3.3-dsp.tar.bz2

All that is need to done is to run 'make' in 'tokliBIOS' (*), 'tinkernel' and 
'apps/demo_mod' subdirectories. You will get 'tinkernel.out' and 
'demo_console.o' binary files which are DSP kernel and DSP task respectively.
After that, have a look into dspgw-3.3-dynamic-demo-omap1.tar.bz2 archive for
the target directory layout and README file with the instructions how to test
it. Of course you can replace 'tinkernel.out' and 'demo_console.o' with the
files that you have compiled yourself. ARM side binaries also need to be
recompiled before you can run them (they were compiled for OABI and will
not run out of the box), but that's a minor issue. Only 'dsp_dld' compilation
may cause problems because it needs a more up to date version of flex than the
one that is part of OS2006 SDK.

(*) Actually you need to apply a patch to 'tokliBIOScfg.tcf' if you want to
use the generated 'tokliBIOScfg.cmd' instead of tinkernelcfg.cmd'. That was
actually the hardest part.

Now it is possible to experiment with configuring DSP kernel by changing
.tcf file and enabling different kernel features. Documentation which explains
its syntax is available in free DSP toolchain.

So DSP programming for 770 and other OMAP1 based devices should be perfectly
fine. But I can't say the same for N8x0 at the moment, because free DSP
toolchain from TI does not support compilation of DSP kernel for OMAP2
according to dspgateway documentation.

-- 
Best regards,
Siarhei Siamashka
--- dspgw-3.3-dsp.orig/tokliBIOS/tokliBIOScfg.tcf	2005-06-09 07:28:21.0 +0300
+++ dspgw-3.3-dsp/tokliBIOS/tokliBIOScfg.tcf	2008-09-27 03:00:40.0 +0300
@@ -47,6 +47,10 @@
 bios.MEM.MALLOCSEG = prog.get(DARAM);
 bios.MEM.BIOSOBJSEG = prog.get(DARAM);
 
+var extmem = bios.MEM.create(EXTMEM);
+extmem.base = 0x14000;
+extmem.len = 0x1000;
+extmem.createHeap = false;
 
 /*
  * CLK
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: ALSA sound driver for Nokia 770 and DSP programming

2008-09-26 Thread Siarhei Siamashka

On Friday 26 September 2008, Robert Schuster wrote:
 Hi,

 Siarhei Siamashka schrieb:
  Recently I have been trying to make it running and seems like we have a
  very good chance to have it working nicely. It is also interesting, that
  the linux-omap guys seem to be developing a new driver [3] for AIC23
  which may eventually become a better alternative.

 Very nice!

 I will try your patch in mamona which IMO provides the most easy way to
 test those things out. 

That's great, feedback is very much welcome. Though bugreports can wait until
I roll out the next revision of the patch ;)

 Mamona is based on OpenEmbedded and every other 
 distribution made from it bases its sound core on ALSA not gstreamer ...

Somehow I also like this approach better ...

 It is good that a new driver is in development, however I have doubts
 that we will be able to run 2.6.27+ on the N770 soon ;).

Sure, upgrading the kernel may end up in fixing one problem, but introducing a
lot more of them instead :) But we could try to backport the new driver to
2.6.16 once it is ready.

 Is the AIC23 also in the N8x0 devices?

In addition to the information already provided by Simon, I can recommend to
search linux-omap and alsa-devel mailing lists.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

ALSA sound driver for Nokia 770 and DSP programming

2008-09-25 Thread Siarhei Siamashka

Hi,

As has been discovered long ago [1] but eventually forgotten, Nokia 770 has
AIC23 audio hardware [2] which can be used not only from DSP side, but
from ARM as well. Moreover, OS2006 kernel sources even contain an ARM driver
for it, but this driver is disabled (that's understandable as the driver is
not in a very good shape and has quite a number of bugs).

Recently I have been trying to make it running and seems like we have a very
good chance to have it working nicely. It is also interesting, that the
linux-omap guys seem to be developing a new driver [3] for AIC23 which may
eventually become a better alternative.

Kernel patch is attached. It enables AIC32 driver, adds a hack to
power on/off code so that audio codec is permanently powered on (power 
on/off code is not reliable and needs to be reworked). Also it fixes a 
problem with audio stuttering on video playback in mplayer (the driver
had broken position reporting which is critical for proper audio/video
synchronization).

Here is some usage instruction (beware that standard disclaimer applies: 
you can use this patch at your own risk, this code is quite untested. If it
somehow manages to fry your device, you have been warned and I'm not
responsible for any breakages):

1. Disable esd daemon and DSP stuff in order to move it out of the way
(temporarily rename '/usr/bin/esd' and '/usr/sbin/dsp_dld' to something else)
2. Apply the attached patch to OS2006 kernel, compile and flash it to the
device
3. Compile and install alsa userspace library, I used alsa-lib-1.0.11.tar.bz2
4. Put attached 'asound.conf' into '/etc' directory on the device, it enables
dmix plugin for audio mixing and resampling
5. Compile and try some applications which use ALSA, I tested 'aplay' 
and 'mplayer'

The driver is semi-usable now, but a lot still needs to be done:
* proper power management to avoid excessive battery drain
* audio volume control
* switch between speaker/headphone
* audio quality is a bit crappy now, this needs to be fixed
* maybe some more fixes for bugs that are yet to be discovered...

DMA code is quite suspicious (especially the way it does channels linking) and
might be responsible for audio quality issues. Also sofware mixing/resampling
code in dmix plugin can benefit from ARM optimizations.


Now regarding why we may want it. Once if we get a good, low latency, fully
functional and reliable ALSA sound driver running on ARM, it gives maemo 
community a nice possibility to scrap all the proprietary DSP binaries. This
provides us with a new and shiny 252MHz C55x DSP core ready to be used by
something else :)

Free linux DSP toolchain from TI [4] supports generation of both DSP kernel
and DSP tasks for OMAP1 based devices which is sufficient for DSP development.
The toolchain license was supposed to permit open source development (with
noncommercial restriction), though the license text itself is a bit
questionable [5].

With DSP avalable for use and having no need to spend efforts on ensuring
compatibility and peaceful coexistence with proprietary binary codecs (free
and proprietary code does not mix well), it should be possible to turn 
Nokia 770 into quite a powerful media player.


1. http://lists.maemo.org/pipermail/maemo-developers/2006-June/022231.html
2. http://focus.ti.com/docs/prod/folders/print/tlv320aic23b.html
3. http://thread.gmane.org/gmane.linux.ports.arm.omap/11700/focus=11709
4. 
https://www-a.ti.com/downloads/sds_support/targetcontent/LinuxDspTools/index.html
5. http://www.gossamer-threads.com/lists/maemo/developers/30611

-- 
Best regards,
Siarhei Siamashka
diff --git a/arch/arm/mach-omap1/board-nokia770.c b/arch/arm/mach-omap1/board-nokia770.c
index 3862a77..90f113a 100644
--- a/arch/arm/mach-omap1/board-nokia770.c
+++ b/arch/arm/mach-omap1/board-nokia770.c
@@ -33,6 +33,8 @@
 #include asm/arch/aic23.h
 #include asm/arch/gpio.h
 #include asm/arch/lcd_mipid.h
+#include asm/arch/mcbsp.h
+#include asm/arch/omap-alsa.h
 
 extern void nokia770_ts_init(void);
 extern void nokia770_mmc_init(void);
@@ -67,6 +69,42 @@ static int nokia770_keymap[] = {
 	0
 };
 
+#define DEFAULT_BITPERSAMPLE 16
+
+static struct omap_mcbsp_reg_cfg mcbsp_regs = {
+.spcr2 = FREE | FRST | GRST | XRST | XINTM(3),
+.spcr1 = RINTM(3) | RRST,
+.rcr2 = RPHASE | RFRLEN2(OMAP_MCBSP_WORD_8) |
+RWDLEN2(OMAP_MCBSP_WORD_16) | RDATDLY(0),
+.rcr1 = RFRLEN1(OMAP_MCBSP_WORD_8) | RWDLEN1(OMAP_MCBSP_WORD_16),
+.xcr2 = XPHASE | XFRLEN2(OMAP_MCBSP_WORD_8) |
+XWDLEN2(OMAP_MCBSP_WORD_16) | XDATDLY(0) | XFIG,
+.xcr1 = XFRLEN1(OMAP_MCBSP_WORD_8) | XWDLEN1(OMAP_MCBSP_WORD_16),
+.srgr1 = FWID(DEFAULT_BITPERSAMPLE - 1),
+.srgr2 = GSYNC | CLKSP | FSGM | FPER(DEFAULT_BITPERSAMPLE * 2 - 1),
+/*.pcr0 = FSXM | FSRM | CLKXM | CLKRM | CLKXP | CLKRP,*/ /* mcbsp: master */
+.pcr0 = CLKXP | CLKRP,  /* mcbsp: slave */
+};
+
+static struct omap_alsa_codec_config alsa_config = {
+.name   = Nokia770 AIC23,
+.mcbsp_regs_alsa

Re: ALSA sound driver for Nokia 770 and DSP programming

2008-09-25 Thread Siarhei Siamashka

On Thursday 25 September 2008, Felipe Contreras wrote:
 On Thu, Sep 25, 2008 at 10:07 PM, Siarhei Siamashka
[...]
  Now regarding why we may want it. Once if we get a good, low latency,
  fully functional and reliable ALSA sound driver running on ARM, it gives
  maemo community a nice possibility to scrap all the proprietary DSP
  binaries. This provides us with a new and shiny 252MHz C55x DSP core
  ready to be used by something else :)
 
  Free linux DSP toolchain from TI [4] supports generation of both DSP
  kernel and DSP tasks for OMAP1 based devices which is sufficient for DSP
  development. The toolchain license was supposed to permit open source
  development (with noncommercial restriction), though the license text
  itself is a bit questionable [5].
 
  With DSP avalable for use and having no need to spend efforts on ensuring
  compatibility and peaceful coexistence with proprietary binary codecs
  (free and proprietary code does not mix well), it should be possible to
  turn Nokia 770 into quite a powerful media player.

 Great stuff!

 Do you plan to use the dsp-gateway or dsp-bridge?

Now as you mentioned that, it indeed makes sense to consider other
alternatives if they exist. Do you have any links to the information about 
dspgateway vs. dspbridge comparison (features/performance/reliability)?

Using dspgateway has a clear advantage that it is already included in the 
kernel. And dspgateway is more or less ok, though patching it a bit in order
to improve performance will be required.

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: DSP SBC encoder update

2008-07-10 Thread Siarhei Siamashka

On Thu, Jul 10, 2008 at 2:06 PM, Simon Pickering
[EMAIL PROTECTED] wrote:
 The change which has allowed it to encode an entire song rather than just
 a
 few seconds was to move the input and output buffers from SDRAM (OMAP main
 memory) to SRAM (DSP fast single access memory). There are probably other
 things which would benefit from being moved, the sbc-priv data (or parts
 thereof) for one. This structure is pretty big so I allocated it in SDRAM,
 but at least parts of it might be better off in faster local memory. This
 is
 something to look at.

 I looked at this yesterday evening (thanks to derf, crashanddie, and others
 for answering my C questions), trying to move some parts of the priv
 structure to SARAM (sorry for the SRAM typo above). Unfortunately just
 moving the bare minimum (the X array) won't happen as there's not enough
 SARAM (so dsp_dld tells me). I don't know where it's all gone, anyone have
 any ideas?

Do you use any buffers allocated by malloc? My guess is that malloc
does allocation of DARAM and SARAM memory.
In any case, memory returned by malloc should be not worse than the
memory buffer explicitly statically placed to EXTMEM.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: DSP SBC encoder update

2008-07-10 Thread Siarhei Siamashka

On Thu, Jul 10, 2008 at 6:57 PM, Simon Pickering
[EMAIL PROTECTED] wrote:
 No, I understood, I was just mentioning that there appear to be two
 heaps to chose from - presumably one is used by the DSP tasks (malloc is
 probably #defined as one of the CSL MEM* fns in the DSP Gateway task
 functions).

Maybe it is just clever/stupid enough to do the allocation
automatically. At least when I did some experiments with DSP before,
it was alocating DARAM memory. Surely, you might want to have better
control to put most performance critical data into DARAM, but malloc
is a standard C function and is more portable.

 Yes, accessing SDRAM memory is extremely slow. And if you access SDRAM
 memory using 16-bit accesses instead of 32-bit accesses, the overhead
 doubles. So if your data processing algorithm does not deal
 exclusively
 with 32-bit data accesses, you are better not to run it to process
 data
 in SDRAM memory. Copying data to a temporary buffer in DARAM or
 SARAM, processing it there and copying results back to SDRAM would be
 faster in this case.

 The X[] array data type is an int32, so even accessing 32bit from SDRAM
 is still slower than using a local buffer (depending on what you need to
 do with it of course).

It depends on how many times the data is accessed. For example, if you
have some algorithm that accesses this memory location 10 times, you
would have 2 SDRAM + 10 SRAM memory accesses by using
fetch/process/store pattern vs. just 10 SDRAM memory accesses if
working with this buffer directly in SDRAM. As SDRAM is an order of
magnitude slower (decimal order, not binary), you really want to avoid
dealing with SDRAM as much as possible.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: wpa_supplicant and cx3110x

2008-06-14 Thread Siarhei Siamashka

On Sunday 15 June 2008, Andrew Barr wrote:
 I am using an N800 running Angstrom[0] booting off of an SD card, using
 an initfs modified to multiboot as is documented on the wiki[1]. I am
 having some trouble getting wpa_supplicant to authenticate on my WPA-EAP
 (PEAP-MSCHAPv2/TKIP) home network. This works fine in the Internet
 Tablet OS provided power management is turned off (something is
 apparently wrong with 802.11 PSM on my AP). It appears to be off by
 default when the tablet is booted into Angstrom
 (/sys/devices/platform/wlan-omap/psm is '0'). The EAP auth seems to work
 ok (eap_state in 'wpa_cli stat' is SUCCESS) but the crypto gets hung up
 at 4WAY_HANDSHAKE and times out. I have observed this on my own home
 network and at least one other similar network at my school. I am using
 wpa_supplicant 0.6.3 linked with openssl (also tested 0.5.5 linked with
 gnutls) and the 'wext' driver. The cx3110x driver is the one shipped in
 the initfs for the latest official Nokia firmware.

 I can provide packet captures and similar debugging info, perhaps the
 maintainer of the driver can help me out with this? The last message I
 saw concerning this was from late 2007, talking about running
 wpa_supplicant under ITOS and this was not yet supported as cx3110x did
 not yet support WE-18 or later. This appears to no longer be the case,
 as /proc/net/wireless indicates WE-22 and no version mismatch messages
 are printed by iwconfig. Also, the output of 'iwlist wlan0 scan' has WPA
 info, which indicates to me that WPA via Wireless Extensions _should_
 work.

 Thanks for any help.

 [0] http://www.angstrom-distribution.org/
 [1] http://maemo.org/community/wiki/HowTo_EASILY_Boot_From_MMC_card/

IIRC the recommended way to get cx3110x support is to use cx3110x-devel
mailing list: https://garage.maemo.org/mailman/listinfo/cx3110x-devel

BTW, there is a set of community enhancements and patches for cx3110x
driver (Nokia 770 version) collected together by Rodrigo Vivi and posted
here: https://garage.maemo.org/pipermail/cx3110x-devel/2008-April/38.html

-- 
Best regards,
Siarhei Siamashka
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Hardware performance counters

2008-03-02 Thread Siarhei Siamashka

On 3 March 2008, Gustavo Sverzut Barbieri wrote:
 On Sun, Mar 2, 2008 at 2:17 PM, Vinod Hegde [EMAIL PROTECTED] wrote:
  Hi Everyone,
 
  How do I access the hardware performance counters available in OMAP2420.
  what are the header files that contain the implementation.
  I tried lots to figure out but in vein. thanks for any help

 Use oprofile, or see how they do it.

 http://blog.gustavobarbieri.com.br/2007/05/22/oprofile-and-maemo-n800/

 it's outdated, so you need to do your own kernel modules.

Up to date instructions are here, it is really easy to install:
http://maemo.org/development/tools/doc/oprofile/
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: WLAN Horrible Roaming Performance (N800, OS2008), Software or Hardware Problem ?

2008-02-22 Thread Siarhei Siamashka

On 22 February 2008, Frantisek Dufka wrote:
 Kalle Valo wrote:
  Also CPU usage is very high because of busyloop when waiting till
  DMA transfer is done. Tasklet, which executes the code can't be
  easily preempted, as far as I understand kernel documentation. Maybe
  it is possible to split tasklet into several parts, one of them
  could be responsible for initiating DMA transfer, the other could be
  activated on DMA transfer completion? This all is important for
  video streaming as any excessive CPU resources consumption by WLAN
  driver negatively impacts video playback performance.
 
  Sorry, I'm not familiar with OMAP 1710 McBSP, so I can't comment.

 I think you don't have to. From the code (sm_drv_spi_io.c) it looks like
 McBSP is setup to use dma transfer with callback when it finishes

 omap_request_dma(OMAP_DMA_MCBSP2_TX, McBSP TX, dma_tx_callback, ...
 omap_request_dma(OMAP_DMA_MCBSP2_RX, McBSP RX, dma_rx_callback, 

 and the dma_tx/rx_callback() code just sets variable
 spi_dma.dma__tx/rx_done to 1.

 But the code that sends/receives the frame busyloops for it like this

  omap_start_dma(spi_dma.dma_rx_ch);


  omap_start_dma(spi_dma.dma_tx_ch);


  while(!spi_dma.dma_rx_done) {


  udelay(5);


  }




  while(!spi_dma.dma_tx_done) {


  udelay(5);


  }




 spi_dma.dma_rx_done = 0;


  spi_dma.dma_tx_done = 0;




 So there is this nice dma architecture with callback used but the code
 still spins up the cpu waiting for the done flag instead of sleeping.

 So you need to be familiar with the driver and tell us if it is possible
 to sleep inside cx3110x_spi_dma_read and cx3110x_spi_dma_write. And one
 also needs to be familiar with kernel programming and waiting primitives
 to suggest how to sleep and wait for the callback (if possible in this
 context) and how to wake up the sleeping code from the dma callback.

A while ago I looked for various kernel docs to see what's happening in the
wlan driver and what can be done to reduce cpu load. My impression was that
tasklet can be only preempted by hardware interrupts, so it is impossible to
sleep in it and give cpu resources to userland applications. If that is true,
no matter if n800 driver looks nicer, it must end up busylooping too.

Though on Nokia 770 cpu usage is attributed to the application doing (for
example wget) and on N800 it is attributed to 'OMAP McSPI/0' process.

A solution that I tried to suggest might be to start DMA transfer, schedule
another tasklet to run after DMA transfer is done and exit from the first
tasklet. That another tasklet should get activated and do the rest of the job.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: SDL, tearing, X overhead and direct framebuffer gfx

2008-02-18 Thread Siarhei Siamashka

On Feb 17, 2008 9:56 PM, Tobias Oberstein [EMAIL PROTECTED] wrote:

[...]

 I've read a lot of bits on the web 'bout mplayer, vsync, omapfb etc.
 and tried to assemble a minimal example of using direct framebuffer
 access for gfx output.

 Q: I can't get the tearing away (only fixed at certain line positions).
 What am I doing wrong?

Transfer framebuffer-videoram must be fast enough to complete for the
period of two LCD refresh cycles, also see
http://lists.maemo.org/pipermail//maemo-developers/2007-March/009202.html
Using smaller source rectangle in the framebuffer will reduce data
transfer time and the tearing line at the bottom will disappear (using
'new' screen update ioctl which was introduced in N800 kernel, this
rectangle can be upscaled to fullscreen). You can calculate the
resolution which can be used without tearing either theoretically or
in an experimental way.

 I wondered if there would be any plans to make SDL run directly on
 framebuffer .. if not, I'd maybe give it a try.

 Q: Where can I find the sources to the OS2008 SDL?

AFAIK, SDL is used pretty much unmodified. My guess is that you can
get it here: http://repository.maemo.org/pool/chinook/free/source/

As for some practical solution on N800/N810, I think it makes sense
tweaking xserver to add support rgb color format in Xv and tweaking
SDL to use Xv for the emulation of setting arbitrary screen
resolutions (setting low resolution will eliminate tearing and will be
useful for games).

For those interested in the topic, documentation for the Epson LCD
controller used in N8x0 (S1D13745) is available here:
http://vdc.epson.com/index.php?option=com_docmantask=cat_viewgid=38Itemid=40
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: SDL, tearing, X overhead and direct framebuffer gfx

2008-02-18 Thread Siarhei Siamashka

On Feb 18, 2008 9:39 AM, Tapani Pälli [EMAIL PROTECTED] wrote:
  When using the supplied SDL library for doing timer-based frame
  rendering, there seems to be
  - heavy tearing

 Tearing unfortunately happens because there is no vsync available for
 framebuffer driver to use.

Could you please verify and confirm this information? Framebuffer
driver from OS2007 supported tearsync (via OMAPFB_FORMAT_FLAG_TEARSYNC
flag as Frantisek mentioned), and it was used at least for video.
Well, I have noticed some tearing in mplayer with OS2008 though.

  Q: I can't get the tearing away (only fixed at certain line positions).
  What am I doing wrong?
 
 
 Nothing, you cannot get away from tearing.

Still what about trying different LCD panel timings? For example,
reducing LCD refresh rate to something like 40Hz should allow 20 full
resolution fullscreen rgb updates per second with perfect tearsync. I
don't dare trying such experiments myself as I'm afraid to kill LCD
panel of my N800 :) Can any HW expert tell if it can be possible? Link
to LCD controller docs is available earlier in this thread.

On the other hand, reducing refresh rate may introduce problems for 25
and 30 fps video playback (for high resolution video only, when the
time to transfer frame data over graphics bus is larger than one LCD
refresh cycle).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: SDL, tearing, X overhead and direct framebuffer gfx

2008-02-18 Thread Siarhei Siamashka

On Feb 18, 2008 1:28 PM, Tapani Pälli [EMAIL PROTECTED] wrote:
  Could you please verify and confirm this information? Framebuffer
  driver from OS2007 supported tearsync (via OMAPFB_FORMAT_FLAG_TEARSYNC
  flag as Frantisek mentioned), and it was used at least for video.
  Well, I have noticed some tearing in mplayer with OS2008 though.
 
 This is what I've heard from our kernel team members, maybe they could
 share some more light to this. AFAIK the hardware itself does not offer
 sync.

Is it possible to invite kernel team members to join this discussion?
:) At least it would be nice if they had a look at this thread.

N800 hardware definitely supports tearsync. It worked fine in OS2007
(I'm not telling that OS2008 does not support it anymore, I just can't
check this till I get home). When I looked through xserver sources
last time, tearsync was used for video planes, but was disabled for
normal rgb updates. This can be easily explained. Video usually has
lower resolution than 800x480, requires less graphics bandwidth and it
is possible to display it with perfect tearsync. With tearsync enabled
for rgb updates, we get an ugly tearing line at a fixed location in
the bottom of screen when doing 800x480 rgb update (the first OS2008
firmware had this problem btw). Without tearsync flag set, we also get
tearing, but at random locations on screen, and it is less
noticeable/annoying.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: WLAN Horrible Roaming Performance (N800, OS2008), Software or Hardware Problem ?

2008-02-18 Thread Siarhei Siamashka

On Feb 14, 2008 8:43 AM, Kalle Valo [EMAIL PROTECTED] wrote:
[...]
  other users reported it too as Luca Olivetti pointed out. and it
  seems like the problem and fix is described here:
 
  http://internettablettalk.com/forums/showpost.php?p=134914postcount=15
 
  at least for the 770 the fix seems to exist,

 What I read from the link, someone had written a workaround to try
 again whenever the chip is responding. That would good a feature, but
 I would like to get more information about what's happening in this
 case.

I'm sorry. For some unknown reason, I thought that I notified you
about this problem long ago, but appears that we only discussed this
issue privately with Frantisek Dufka :(

I encountered this problem when I was checking what is the maximum
McBSP clock frequency that could be used reliably on Nokia 770 to
speed up WLAN performance. To do this stability test, I just put the
device on charger, established wlan connection and started a test
script which repeatedly executed wget to download a large file, piping
it to md5sum and verifying that the file always gets received
correctly. That's probably one of the most simple stress tests that
can be done :)

People on ITT, who are suffering from this disconnection problem are
running bittorent client software which probably stresses network to a
much higher extent.

Having kept this simple test running, I noticed that wlan network is
getting stuck eventually. Sometimes very soon and sometimes after a
long time. Checking dmesg log revealed the following lines:
[84936.145721] We haven't got a READY interrupt from WAKEUP. (firmware crashed?)
[84940.419342] TX dropped
[84940.419433] TX dropped

The symptoms are similar to what other people reported as
https://bugs.maemo.org/show_bug.cgi?id=329

Initially I thought that it was the effect of overclocking, but could
reproduce the problem even after going back to the standard frequency.

With a simple patch that just retries operation on such error,
wireless connection got stable. After a long test with the test
script, no problems were detected. The following lines could be
occasionally seen in dmesg log and it  proves that there were
potential connection drops encountered, but they all did not cause any
troubles in reality (MD5 of downloaded file was always OK):
[50559.494232] Dynamic PSM
[50559.494323] PSM timeout 1000 ms
[50622.038146] We haven't got a READY interrupt from WAKEUP. (firmware
crashed?)
[50622.038269] Try again...
[50622.038330] succeeded!!!

I'm attaching the same patch here. It is not very clean, but it does
the job (for Nokia 770).


And I have encountered other problems with WLAN driver that are yet to
be solved. For example, sometimes speed drops to ~30KB/s (that's still
an unresolved mystery to me). Also CPU usage is very high because of
busyloop when waiting till DMA transfer is done. Tasklet, which
executes the code can't be easily preempted, as far as I understand
kernel documentation. Maybe it is possible to split tasklet into
several parts, one of them could be responsible for initiating DMA
transfer, the other could be activated on DMA transfer completion?
This all is important for video streaming as any excessive CPU
resources consumption by WLAN driver negatively impacts video playback
performance.


n770_wlan_retry_on_wake_error.diff
Description: Binary data
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Frequencies scaling with OS2008

2007-12-31 Thread Siarhei Siamashka

On 30 December 2007, Frantisek Dufka wrote:
 Krischan Keitsch wrote:
  I was wondering if the device really needs to run at 300MHz (220MHz dsp)
  for mp3 playback? Is the max dsp power needed for such a task? Or would
  220MHz (177MHz dsp) or 165MHz (85MHz dsp) be sufficient? Would a lower
  dsp scaling save even more battery?

 Well, yes it looks a bit simplistic now. Even if you play audio decoded
 by ARM cpu (ogg, real audio) it seems to lock ARM core to 330 and dsp to
 220Mhz. I suppose it is because you need pcm dsp task running for audio
 output and any active dsp task locks it to 220Mhz (and thus cpu to 330).
 I wonder if it is just simple implementation that can be tuned in next
 firmwares or there is some fundamental problem (like changing dsp clock
 of already running dsp task may break it so it is hardcoded to 220).

ARM cpu frequency is apparently not locked at 330MHz when using pcm dsp task.

That's why it is faster to do MP3 decoding on ARM core with the 
current OS2008 firmware. Extra 70MHz of ARM core frequency are more 
than enough to handle MP3 and there are even some resources left to 
speed up video decoding. 

It is interesting if it is possible to lock ARM/DSP frequency at 400/166
instead of 330/220 when playing video. That would probably improve 
built-in player performance on some heavy bitrate/resolution videos.

Also it is interesting to know what is the difference in power consumption 
between 400/166 and 330/220 modes?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Frequencies scaling with OS2008

2007-12-31 Thread Siarhei Siamashka

On 31 December 2007, Frantisek Dufka wrote:
 Igor Stoppa wrote:
  Having the audio path open, but no dsp tack loaded (arm audio) sets the
  clock to 400MHz.

 Interesting, so, umm, there is way to play audio from ARM side directly?
 What I tried is to play BBC radio in home screen applet which activated
 only pcm2 task and arm clock dropped from 400 to 330. That lead me to
 conclusion that there is no way to output audio with arm clock at
 400Mhz. Why there are pcm tasks (used when streaming internet radio) if
 we could output audio from arm side without limiting ARM clock? Siarhei
 apparently used a way to output audio without activating DSP from
 mplayer, how?

I did not do anything special. ARM clock frequency just remains at 400MHz 
when using esd or sdl for audio output. I did some benchmarks and it 
became clear that it is now faster not to touch dsp mp3 task and just do
all the decoding on ARM core. In addition, my hack which used dspmp3sink 
from MPlayer, now has problems with audio/video synchronization in OS2008.
So looks like it is a good time to drop it. Using DSP for MP3 audio was a 
useful trick on Nokia 770 and OS2006, but right now everything is reversed 
for N800/N810 and OS2008.

Anyway, my guess also was that pcm dsp task is used in osso-esd, maybe it 
makes sense to check its sources more thoroughfully. Looks like we will
have a lot of new discoveries with OS2008 :)

As ARM core is quite fast in N8x0, probably it would make sense to try 
keeping DSP out of the way whenever possible (restrict it to 133MHz only, 
keep DSP tasks which are fast enough to run at this frequency, port 
all the other DSP tasks to ARM)? That is unless improvements to support 
more intelligent DSP clock frequency selection are still possible.

But for those interested in C55x development, Nokia 770 is still a very
interesting device as it runs DSP at full speed.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: DSP vs. ARM endianness and struct packing

2007-09-14 Thread Siarhei Siamashka

On 14 September 2007, Charles 'Buck' Krasic wrote:
 I did some experimentation a while back with DSP - ARM communication
 via mmap'ed memory, in my case I was working on using the DSP for rgb
 to yuv conversion. Another big gotcha to look out for is 64k
 boundaries. The DSP (at least in the 770) just can't naturally deal
 with object bigger than 64k, so you will get very bizarre results if
 you run into this limitation.

Isn't it more a limitation of a free dsp toolchain? I have seen a pdf 
where OMAP1710 was mentioned to have c55x rev 3.0 core which does not 
have this limitation:
http://www.ocpip.org/japanese/news/presentations/Japanese_JapanTI.pdf

Also when looking for various DSP related information, I found Texas
Instruments public ftp with the following interesting directory:
ftp://ftp.ti.com/pub/cs/v275/

It looks like a linux c55x dsp toolchain with a slightly updated version, and 
what is more interesting, it lists OMAP1710 as one of the supported targets.
I have also read about a rather scary thing such as silicon bugs :) Looks like 
silicon bugs are a lot more common in DSP than the bugs in general purpuse
cores. My guess is that TI is solving this problem by releasing toolchains
which are able to avoid generating problematic sequences of code. In this case
having a compiler that is aware of the target core (OMAP1710 and OMAP2420)
would be a really nice thing to have.

If a more recent toolchain proves to be useful, maybe it would make sense
asking TI to include it into a free linux dsp tools package? Or at least 
query about its status (whether it is ok to download and use that toolchain
from ftp or they put it there by some mistake).

Hope that this information might be useful for dsp hackers.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Python and GStreamer

2007-08-30 Thread Siarhei Siamashka

On 30 August 2007, Jesse Guardiani wrote:
 Koen Kooi wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Jesse Guardiani schreef:
  And the related question is: given an existing program that sends
  stuff out to ALSA and doesn't use gstreamer, how difficult is it
  generally to port it so that it works properly?
 
  No porting necessary, really. mplayer comes with a decoder called
  libmp3. It's not optimized for ARM or anything, and it compiles without
  any problems. We don't use it in Kagu for A2DP though because there is
  another decoder out there called ffmp3 which doesn't use floating point
  math, so it's a little more efficient on ARM.
 
  use libmad (-ac mad), that works great on arm and x86.

 Any cpu savings over ffmp3? The mplayer maintainers tell me that ffmp3
 is the best choice on ARM.

Just for the sake of correctness, I did not quite say that :) This is what 
I replied to you earlier when we were discussing A2DP performance issues:

Software MP3 (-ac ffmp3) and OGG/Vorbis (don't remember exact '-ac' option
for it, but you don't need it anyway as this decoder is used by default)
decoders are already enabled in mplayer. If we want the best MP3 decoding
performance, libmad (-ac mad) is the best choice, but it is an external
dependency and is not used right now. The worst option for software MP3
decoding on ARM is mp3lib (-ac mp3), it uses floating point math and is a lot
slower than other (fixed point) MP3 decoders even on N800 which supports
floating point math in hardware.

You can try to test all these decoders yourself to figure out which one 
works the best for you.


Just ffmp3 is a part of ffmpeg library and is bundled with mplayer by default
and libmad is an external dependency which may make packaging a bit more
complicated. That's the only reason why libmad support is not enabled in 
maemo build of mplayer yet.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: Java acceleration/Jazelle

2007-07-18 Thread Siarhei Siamashka

On Wednesday 18 July 2007 13:01, Simon Pickering wrote:

 Does anyone know whether there are there any good docs/books on ARM asm
 programming, telling people these sort of things? This is an interesting
 (and hopefully useful) learning experience, but can be really frustrating
 when I know what I want to do, and pretty much how to, but not quite! :)
 E.g. calling functions in linked libraries, how to call .s file functions
 from C, what is and isn't allowed in in-line asm, etc.

I would recommend checking the following documentation from ARM website:

http://www.arm.com/documentation/Instruction_Set/index.html
ARM v5TE Architecture Reference Manual for the detailed information about
the instruction set (up to ARMv5TE). Unfortunately it does not cover new ARMv6
instructions (I used Quick Reference Card to get some information about them).

http://www.arm.com/pdfs/aapcs.pdf
Procedure Call Standard for the ARM Architecture for the information about
calling conventions and arguments passing between asm and C and generally
about ABI.

http://www.arm.com/documentation/ARMProcessor_Cores/index.html
ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual for ARM11
pipeline description and instruction timings (useful when optimizing for 
N800).

ARM9EJ-S Revision r1p2 Technical Reference Manual for ARM9E pipeline
description and instruction timings (useful when optimizing for 770).

These four pdf files cover almost everything needed if you are interested in
assembly programming for Nokia 770 and N800. But surely ARM website 
provides many other interesting documents worth reading.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers

Re: MPlayer compilation problem (armv5te)

2007-05-25 Thread Siarhei Siamashka

On Friday 25 May 2007 11:42, Juuso Räsänen wrote:

 I have been trying to compile MPlayer myself under Maemo 3.1.
 I've used patches from:
 https://garage.maemo.org/frs/?group_id=54

 However, commands...

 [sbox-SDK_ARMEL: ~/MPlayer-1.0rc1-maemo.16]  ./configure
 [sbox-SDK_ARMEL: ~/MPlayer-1.0rc1-maemo.16]  make

 ...results after a while in errors like:

 cc -c -I. -I../libvo -I.. -Wdeclaration-after-statement -O4   -pipe
 -ffast-math -fomit-frame-pointer -I/usr/include -D_REENTRANT
 -I/usr/include/gstreamer-0.10 -I/usr/include/glib-2.0
 -I/usr/lib/glib-2.0/include -I/usr/include/libxml2-I/usr/include/SDL
 -D_REENTRANT  -I/usr/include/freetype2 -DMPG12PLAY -o idct_armv5te.o
 idct_armv5te.c idct_armv5te.c: In function `idct_row':
 idct_armv5te.c:125: warning: ISO C90 forbids mixed declarations and code
 {standard input}: Assembler messages:
 {standard input}:107: Error: selected processor does not support `smulbb
 fp,lr,r9' {standard input}:109: Error: selected processor does not support
 `smlabb r5,r0,ip,fp'

This error is caused by the use of armv5te instructions in inline assembly
while gcc is not ordered to support them (either with -march or -mcpu
options). 

In order to fix this problem and compile the package with the best settings
for N800, you can use:
CFLAGS=-mcpu=arm1136jf-s -mfpu=vfp -mfloat-abi=softfp -O3 -fomit-frame-pointer 
-ffast-math ./configure
make

Or just type 'make deb-n800' or 'make deb-n770' to build a .deb package. 
See 'debian/rules' file for the options used when building mplayer packages.

But I think that maemo developers mailing list is not the best place to
discuss such application specific issues which do not have any direct 
relation to maemo platform in general. It would probably make sense to 
use  'support' tracker at https://garage.maemo.org/projects/mplayer/
for this instead.

If you have any other questions or still need help, feel free to e-mail me
directly. I guess, discussing the programming techniques and optimizations 
used in mplayer which are potentially usable in other maemo applications is 
ok here,  but such boring stuff as compilation issues is unlikely to be
interesting to anyone else :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-05-06 Thread Siarhei Siamashka

On Thursday 03 May 2007 10:21, Frantisek Dufka wrote:

 Siarhei Siamashka wrote:
If decoding time for
  each frame will never exceed 28-29ms (which is a tough limitation, cpu
  usage is not uniform), video playback without dropping any frames will be
  possible even with tearsync enabled.

 Would a double or multiple buffering help with this? 

Yes, most likely it will. N800 has 800x480 virtual size for framebuffer and a
new enhanced screen update ioctl. Now it should be possible (did not try yet, 
but will have some results very soon) to specify output position and size for
the rectangle as it gets displayed on the screen.

struct omapfb_update_window {
__u32 x, y;
__u32 width, height;
__u32 format;
__u32 out_x, out_y;
__u32 out_width, out_height;
__u32 reserved[8];
};

This theoretically allows us to use some kind of double buffering, we can
split framebuffer into two 400x480 parts and while one part is being
displayed, another one can be freely filled with the data for the next frame.
This will effectively remove the need for OMAPFB_SYNC_GFX, improving
peak framerate.

But this solution will require support for arbitrary downscaling in YUV420
format for each video frame to fit 400x480 box. The quality will be also
reduced a bit, but on the other hand, graphics bus should have no 
performance problems with sending 400x480 through it.

If virtual framebuffer size could be extended to 800x960, this would allow us
to use doublebuffering without sacrificing resolution. Anyway, I'll try to
fix MPlayer framebuffer output module to properly work with the latest
version of N800 firmware and implement this form of doublebuffering. It 
should provide the fastest video output performance that is possible.

Regarding Nokia 770, now it uses 800x600 framebuffer virtual size (some
extra waste of RAM?). Anyway, if hwa742 kernel driver could be extended to
support this improved screen update API and respect 'out_x' and 'out_y'
arguments, we could have four video pages in framebufer memory for 
400x240 pixel doubled video output. It could allow to implement a very 
efficient double buffering for accelerated Nokia 770 SDL project if it ever
takes off the ground :)

 Does mplayer use different threads for displaying and decoding and decode
 frames in advance? 

No, it doesn't have any extra threads now. But video playback on Nokia 770 
is already parallel, splitting tasks between the following pieces of hardware
each working simultaneously:
1. ARM core (demuxing and decoding video into framebuffer)
2. DMA + graphics controller (screen update transferring data from framebuffer
into videomemory and performaing YUV-RGB conversion on the fly)
3. C55x DSP core (mp3 audio decoding and playback)

There is not much point in creating many threads on ARM, as we only have a
single ARM core and splitting work into several threads will not accelerate
overall performance. Threads could be useful for doing something extra while
waiting for other hardware components to finish their work (waiting for screen
update for example), but decoding ahead will also require storing the decoded
data somewhere. This place for storing decoded ahead frames could be only
some extra space in framebuffer memory, otherwise we would lose some
performance on moving this data to framebuffer later (and increasing battery
power consumption). As framebuffer space is limited, we would not be able to
store many frames ahead, and decoding cpu usage most likely varies not 
between frames but more like between different scenes (complicated action
scene will make us run out of decode ahead buffer pretty fast). Anyway,
probably this may be worth trying later, there even exists some threads based
MPlayer fork: http://mplayerxp.sourceforge.net/
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Nokia 770 Video playback

2007-05-06 Thread Siarhei Siamashka

On Monday 30 April 2007 14:27, Siarhei Siamashka wrote:

 I also tried to use YUV420 on Nokia 770, but it did not work well.
 According to Epson, this format should be supported by hardware. Also there
 is a constant OMAPFB_COLOR_YUV420 defined in omapfb.h in Nokia 770 kernel
 sources. But actually using YUV420 was not very successful. Full screen
 update 800x480 in YUV420 seems to deadlock Nokia 770. Playback of centered
 640x480 video in YUV420 format was a bit better, at least I could decipher
 what's on the screen. But anyway, it looked like an old broken TV :) Image
 was not fixed but floating up and down, there were mirrors, tearings, some
 color distortion, etc. After video playback finished, the screen remained
 in inconsistent state with a striped garbage displayed on it. Starting
 video playback with YUY2 output fixed it. But anyway, looks like YUV420 is
 not supported properly in the framebuffer driver from the latest OS2006
 kernel. That's bad, it could provide ~30% improvement in video output
 perfrmance for Nokia 770. Maybe upgrading framebuffer driver can fix this
 issue (and add tearsync support).

By doing a quick kernel framebuffer code review, looks like the problem may
reside in the following fragment from hwa742.c:

switch (color_mode) {
...
case OMAPFB_COLOR_YUV420:
bpp = 12;
conv = 0x09;
transl = 0x25;
break;
...
}

...
set_window_regs(x, y, x + w, y + h);

offset = (scr_width * y + x) * bpp / 8;

hwa742.int_ctrl-setup_plane(OMAPFB_PLANE_GFX,
OMAPFB_CHANNEL_OUT_LCD, offset, scr_width, 0, 0, w, h,
color_mode);

hwa742.extif-set_bits_per_cycle(16);

hwa742.int_ctrl-enable_plane(OMAPFB_PLANE_GFX, 1);
hwa742.extif-transfer_area(w, h, request_complete, req);


As far as understand it, this code notifies the graphics chip about what
screen area it is going to update and starts DMA transfer to fill it with
data. But a similar code from 'blizzard.c' also does width correction before
'transfer_area' by doing 'w = (w + 1) * 3 / 4;'. Looks like code from hwa742.c
attempts to transfer more data than the graphics chip expects for YUV420
format. This can explain vertical image drift observed in my previous
experiments (for 640x480 area starting at 0,80), also excess data may 
deadlock the graphics chip (for the test with 800x480 area starting at 0,0).

Also the 'offset' may be calculated wrong, see [1] for some bits of
information about YUV420 framebuffer layout on N800. Starting location 
should be probably always calculated assuming 16bpp framebuffer layout.

Now I wonder who can be considered upstream for this kernel framebuffer
driver? I guess reporting a bug at maemo bugzilla is pointless as there are
no official updates for Nokia 770 planned. I wonder why this driver is still
getting some updates in newer versions of linux omap kernel and what
hardware is used to verify that it works? Is it still tested on Nokia 770?

1. http://maemo.org/pipermail/maemo-developers/2007-May/010039.html
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-05-06 Thread Siarhei Siamashka

On Friday 04 May 2007 10:49, Daniel Stone wrote:
 On Thu, May 03, 2007 at 11:10:32PM +0300, ext Siarhei Siamashka wrote:
  Well, found what's the matter and added explanation at bugzilla:
  https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
  The workaround can be easily added to MPlayer, so that it will
  never call XvShmPutImage with top left image corner at an odd line.
  I'm going to release an updated MPlayer package (maybe even
  a bit later today), it is really fast on N800 with the optimized xserver
  :)

 Aha, that will indeed cause a fallback (x, y, width and height should
 all be aligned to 4px).

Could you clarify this information? The code from kernel framebuffer 
driver (blizzard.c) suggests that only width should be 4px aligned:

switch (color_mode) {
case OMAPFB_COLOR_YUV420:
/* Embedded window with different color mode */
bpp = 12;
/* X, Y, height must be aligned at 2, width at 4 pixels */
x = ~1;
y = ~1;
height = yspan = height  ~1;
width = width  ~3;
break;

Does xserver introduce additional limitations?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: Xsp pixel-doubling solutions for Nokia 770?

2007-05-06 Thread Siarhei Siamashka

On Thursday 03 May 2007 12:58, Eero Tamminen wrote:

 ext Siarhei Siamashka wrote:
  What problem with using framebuffer directly? Everything should be
  fine, you can get notifications from xserver when your window becomes
  obscured, so you can stop drawing. I suggest you to try MPlayer on
  Nokia 770 to check how it interacts with xserver. The worst thing that
  can happen is some garbage data left on screen on fast applications
  switching. That can happen because there is no support to synchronize
  access to framebuffer in a reliable way (application using framebuffer
  directly may get notification from the xserver about getting inactive
  too late and overwrite some other application window).
 
  Adding support to xserver for proper synchronization with direct
  framebuffer access applications should be quite possible. It already
  synchronizes access to framebuffer with DSP (Xsp API for registering
  DSP area). Almost all the necessary changes will probably have to be
  added at the same places in xserver which support interaction with
  DSP.

 AFAIK Xserver requests  waits DSP to stop updating the framebuffer
 before proceeding.  This works with HW, but you cannot expect it to
 work reliably with misc X clients as there are no guarantees about
 what they do. If client is not processing X events, the response would
 be waited forever and device freezes.  If X server has some timeout for
 the client reply, then the server and client can be updating the
 framebuffer at the same time and that was what we wanted to avoid
 in the first place.

Timeout is a perfectly valid solution in my opinion. It just requires that
xserver and some thin wrapper library used by misc clients (SDL) both 
interact correctly. Interface of this wrapper library should be designed
in such a way, that it is safe and hard to be misused (special timeout 
code which automatically terminates the process which refuses to give
framebuffer back to xserver).

I may provide some extra details about my vision of implementation 
details if anybody is interested.

  I guess it can't be helped and I will make an example application for
  using framebuffer directly and some kind of tutorial. Don't know when
  I will have enough free time to do this though.
 
  I'm pretty much confident that direct framebuffer access (also with
  pixel doubling support) is quite possible for SDL. I don't care much
  if you believe me or not :) Someone just needs to do the dirty work
  and implement all this stuff.

 Yes, it just cannot be done safely / reliably.  

I can't be completely sure, but I think it is possible to do safely/reliably.
At least it is worth trying in my opinion. The difference in our views is
that you see xserver as the only valid Nokia 770 citizen and everything
else looks like a very ugly hack to you. 

I see the problem from the completely different perspective. For many 
games xserver is irrelevant, they use SDL API and that is what they care
about (xserver is just an additional extra layer). Game developers would like
to have a fast and reliable SDL implementation which could make efficient use
of all the hardware features that can benefit games. If xserver can provide
all of this with some standard or nonstandard extensions, that's fine. We only
need to estimate the amount of development resources and time needed to 
do these modifications to xserver and SDL to make use of these features. As
games are not a primary target for Internet Tablets, I doubt that anything
like this will be officially done any time soon (at least before the Nokia 770
end of life). Am I wrong?

In this case tweaking SDL to use framebuffer directly may have a much
lower cost. Especially considering that you have already solved this
framebuffer sharing problem for DSP video playback. I did not suggest
anything completely new ;)

It is not quite related, but games also need a reliable and low latency method
to play sounds. Current esd daemon solution is not very good for games. Maybe
modifying SDL to deal with dsp tasks directly can provide some improvement.
Also it would be very nice if SDL_Mixer could use dsp codecs transparently
without any extra hacks to play mp3 music.

 But for hackers it's enough that it works when it works I guess. :-)

I'm not sure if I can consider myself a hacker :) Something that just works
is perfectly enough for a prototype. But a production system needs a reliable
solution, hence I'm trying to start discussing the implementation details.

SDL optimization for Nokia 770 might be an interesting task for some student
with lots of free time.

In any case, trying alternative solutions is a good thing, it drives the
progress, allows us to improve our skills and gain experience. We get 
the best solution (whatever it would be) and everyone benefits as a 
result :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: Xsp pixel-doubling solutions for Nokia 770?

2007-05-03 Thread Siarhei Siamashka


On 5/3/07, Eero Tamminen [EMAIL PROTECTED] wrote:

[...]


Same problem as using framebuffer directly.  How user switches
to another application?   How to invoke power menu properly etc.


What problem with using framebuffer directly? Everything should be
fine, you can get notifications from xserver when your window becomes
obscured, so you can stop drawing. I suggest you to try MPlayer on
Nokia 770 to check how it interacts with xserver. The worst thing that
can happen is some garbage data left on screen on fast applications
switching. That can happen because there is no support to synchronize
access to framebuffer in a reliable way (application using framebuffer
directly may get notification from the xserver about getting inactive
too late and overwrite some other application window).

Adding support to xserver for proper synchronization with direct
framebuffer access applications should be quite possible. It already
synchronizes access to framebuffer with DSP (Xsp API for registering
DSP area). Almost all the necessary changes will probably have to be
added at the same places in xserver which support interaction with
DSP.

I guess it can't be helped and I will make an example application for
using framebuffer directly and some kind of tutorial. Don't know when
I will have enough free time to do this though.

I'm pretty much confident that direct framebuffer access (also with
pixel doubling support) is quite possible for SDL. I don't care much
if you believe me or not :) Someone just needs to do the dirty work
and implement all this stuff.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka

On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:

Looks like I have to reply to myself.

 On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  Applied and build without problems for me.

 Thanks a lot for building the package and putting it for download,
 everything seems to be fine, but more details will follow below.

[snip]

 Anyway, the new xserver package works really good. If we do some tests with
 the standard Nokia_N800.avi video clip, we get the following results with
 the patched xserver:

 #  mplayer -benchmark -quiet -noaspect Nokia_N800.avi
 BENCHMARKs: VC:  29,764s VO:   7,666s A:   0,468s Sys:  64,635s =  102,534s
 BENCHMARK%: VC: 29,0287% VO:  7,4767% A:  0,4565% Sys: 63,0381% = 100,%
 BENCHMARKn: disp: 2504 (24,42 fps)  drop: 0 (0%)  total: 2504 (24,42 fps)

 #  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
 BENCHMARKs: VC:  30,266s VO:   5,490s A:   0,467s Sys:  66,286s =  102,509s
 BENCHMARK%: VC: 29,5255% VO:  5,3554% A:  0,4560% Sys: 64,6631% = 100,%
 BENCHMARKn: disp: 2501 (24,40 fps)  drop: 0 (0%)  total: 2501 (24,40 fps)

 Results with unpatched xserver and some more explanations can be found in
 [3]. 
 Yes, now N800 is faster than Nokia 770 for video output performance at 
 last :)

Well, still not everything is so good until the following bug gets fixed:
https://maemo.org/bugzilla/show_bug.cgi?id=1281

The patch for optimized Xv performance will not help to watch widescreen
video which triggers this tearing bug. If you see tearing on the screen, you
should know that the YUV420 color format conversion optimization patch 
does not  get used at all and xserver most likely uses a slow nonoptimized
YUV422 fallback code with software scaling.

Fixing this bug is critical for video playback performance. I hope it will be
solved in the next version of N800 firmware too. But it we get some patch to
solve this problem for testing earlir, that would be nice too.

 Video output overhead on N800 is really at least halved. Of course, video
 output takes only some fraction of time in video player. So overall
 performance improvement for Nokia_N800.avi playback is approximately 20%
 but not 250%-300% which can be observed for  'omapCopyPlanarDataYUV420'
 function alone.

Before anybody noticed, correcting myself :) This 'omapCopyPlanarDataYUV420'
has 2.5x-3x improvement which is equal to 150%-200% in percents. Elementary
arithmetics is tough when you are tired
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: Xsp pixel-doubling solutions for Nokia 770?

2007-05-02 Thread Siarhei Siamashka

On Monday 30 April 2007 12:26, Frantisek Dufka wrote:

 Daniel Stone wrote:
  Specifying that pixels must be exactly _doubled_ is a
  hack around both the performance issues and a lack of resolution
  independence.  Apparently an important one, if you happen to like SDL
  games, but a hack nonetheless.

 Yes limiting ourselves to doubling is bad. Why not to add custom ratio
 if N800 can do that.

 This all leads to request to have some more advanced gaming API. Sadly
 this is probably not what internet tablets are currently designed for.
 Gamers are big target group and this device is meant for entertainment
 so maybe extending target audience to gamers in not that bad idea.
 Gaming devices are moving online too so this is direct competition. Why
 to buy internet tablet if better Sony or Nintendo device in future will
 do this too plus gaming. Unfortunately gaming business has complicated
 rules similar in complexity to devices with GSM radio. BTW are internet
 tablets in same Nokia multimedia division as N-Gage?

Well, SDL is to some extent this advanced gaming API, its current
implementation for Nokia 770/N800 is just poor.

As for pixel doubling, a practical solution would be just to support 400x240
fullscreen resolution in SDL so that no extra hack would be required when
porting each game or emulator in particular. N800 hardware probably 
makes it possible to set any resolution up to 800x480, with all this available
using standard SDL API.

Having support for both pixel doubled and normal graphics in the same game 
may be useful, but it will require extra efforts when porting games, while low
resolution may already work out of the box without doing any tweaks to the
sources. Let's try the simple solution first.

The very first step would be to take Nokia 770 xserver and SDL sources and
tweak them until setting 400x240 fullscreen resolution works transparently for
any SDL applications. Anybody up to this task?

Also it would be a good idea to benchmark SDL, identify maemo or ARM
architecture related bottlenecks and try to fix them. Many older generation 
games worked perfectly on hardware way slower than Nokia 770. So 
Nokia 770 may be a good platform for mobile gaming if properly 
optimized (though I'm not sure about realtime games because of 
unsuitable controls). I could probably do these optimizations myself, but 
have quite a limited amount of free time available for free software
development.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: Xsp pixel-doubling solutions for Nokia 770?

2007-05-02 Thread Siarhei Siamashka

On Wednesday 02 May 2007 20:40, Daniel Stone wrote:
  For the use case which is being described here - namely always full
  screen applications which need exclusive access to the display at a
  lower resolution Why not do something like switch to another VT and do
  it directly on the framebuffer ? and then wrap this with something
  that makes sure you can always safely return to/from X - maybe
  something managed through systemUI or some such. This is a different
  approach but could prove simpler in the long run though I havn't
  thought long and hard about it so there could be some obvious
  downsides - More a random idea :)

 Egh, my eyes!  Dealing with input in particular could be a pain.

This is what works for MPlayer on Nokia 770. It creates x11 window just
to reserve some screen space and prevent other applications from using 
it. After that, it renders data directly to framebuffer and uses x11 for
input. It is not very clean, but it works. And it works fast. The same trick
can be probably done for SDL.

Here is a link to the old discussion in the mailing list with the initial
idea: http://maemo.org/pipermail/maemo-developers/2006-December/006646.html
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: Toolchain upgrade? (Was: Instructions cache flush on ARM)

2007-05-02 Thread Siarhei Siamashka

On Wednesday 02 May 2007 18:48, Eero Tamminen wrote:
 On x86 I prefer valgrind/cachegrind/callgrind/kcachegrind as
 that way one can browse the source code interactively with
 the profiling information.  Getting to know how the source
 really works is sometimes more useful than knowing the exact
 bottleneckedness percentage of some function.

Sure, I'm also using valgrind/cachegrind/callgrind/kcachegrind in my 
work quite often. It's a very nice tool.

But callgrind for statistics does not provide information about floating point
math and integer divisions, so real results on ARM may be really very
different.

Also cache behaviour on Nokia 770 arm926ej-s core is very different from 
cache on x86. Actually arm926ej-s  does not allocate cache line on write 
miss and all the x86 cpus do. This makes very big difference for the code
which does lots of writes to uncached memory. Cachegrind only simulates
write-allocate cache.

I created the following patch for simulating read-allocate behaviour 
in callgrind (for more precise arm926ej-s simulation):
http://ufo2000.sourceforge.net/files/vg-read-allocate-cache-patch.diff

Though arm1136jf-s core from N800 now supports write-allocate cache 
and this patch is not needed when optimizing for N800 :)

  Did anybody try installing newer toolchains in scratchbox and use them
  with maemo SDK? I just don't have much free time for these experiments
  and don't want to break my installation of scratchbox which works now
  (more or less acceptable)

 Installing new toolchains for Sbox shouldn't be a problem (if it's
 already available for it) and you can make a new Sbox target for each
 toolchain you want to test.

Thanks, I'll try that. In my preliminary tests, mplayer becomes a few percents
faster for mpeg4 decoding when switching to gcc 4.1.1 (tested a build compiled
with a crosscompiler outside scratchbox, with no audio/video output except for
SDL, so not really useful for end users, but fine for benchmarking with
gprof).

  Building packages with new toolchain would probably need to have
  libstdc++ linked statically for C++ applications to work on 770/N800, but
  otherwise everything should be fine.

 Actually, you cannot really build static binaries with Glibc.
 It links some stuff always dynamically (nss for example).
 I don't know whether this is a problem in practice though.

I'm not going to statically link with glibc, but only with libstdc++ (standard
c++ library). There are a few known tricks to make gcc link with libstdc++
statically, but dynamically with all the rest of libraries. One of them is
creating a symlink to libstdc++.a in some empty directory and specify this
directory with -L option in gcc command line. When gcc will start linking,
it will be fooled to link with a static libstdc++ library. But I guess just
killing libstdc++.so in scratchbox will do the the job. After that, the
compiler theoretically should create binaries which should run with no
problems on the device even for c++ applications.

  http://arm.com/documentation/ARMProcessor_Cores/index.html
  'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual'
  Chapter 4 'Unaligned and Mixed-Endian Data Access Support'

 Did you read the section on ARMv6 unaligned data access restrictions?
 Basically it doesn't work in all cases, the accesses are not atomic and
 have performance implications.

Did you also read Intel docs? Unaligned access has some restrictions
on x86 as well. Do you have an example of some practical case where 
hardware unaligned support from ARM11 would work worse than on x86?

The compiler should do the job aligning data for performance reasons (as it
does on x86 as well). But if you happen to have some unaligned data in 
memory anyway, just reading it with some minor unavoidable performance 
penalty will be faster than reading data one byte at a time and combining it
into a 32-bit or 16-bit value (instructions timings can be also found in this
Technical Reference Manual).

Enabling hardware unaligned access support should make explicit pointer
conversion hacks that are sometimes used in not very portable C code work
just like they do on x86. Which is a good thing in my opinion.

  As ARM11 core used in N800 is little endian, does have floating point
  unit and supports unaligned memory access in hardware (which only needs
  to be enabled). It probably doesn't have any serious portability issues
  to be aware of anymore and vast majority of software initially developed
  for x86 should be easy to compile and run on it even without doing any
  modifications.

 Compiler aligns everything correctly if your code is correct.
 I think non-aligned code is bug and performance issue.

In the real world such buggy code unfortunately exists. And it works fine on
x86 which is probably the most widely used platform for software development.

  Enabling unaligned memory support will make life much easier for
  developers unfamiliar with ARM platform. The number of applications for

Re: Xsp pixel-doubling solutions for Nokia 770?

2007-05-02 Thread Siarhei Siamashka

On Wednesday 02 May 2007 23:01, Arnim Sauerbier wrote:

 If the memcpy on 770 is something like 190MB/s, pushing 800x480 at 30fps
 would use only 12 percent of that bandwidth

I'm sorry, I was the source of this misleading information, I forgot that
you are a Nokia 770 user and mentioned some numbers from N800.

I measured the memory bandwidth as ~170-190MB/s for memcpy 
and ~410MB/s for memset on N800.

The same numbers on Nokia 770 are ~70-100MB/s for memcpy (depending on
relative source and destination buffers alignment) and ~120MB/s for memset
with the standard glibc functions.

These glibc functions have ARM assembly optimizations developed by Nicolas
Pitre from MontaVista Software, Inc., but according to comments found inside,
they were developed for XScale cpus. Such code is not so good for Nokia 770
and can be replaced with something better.

Using some arm926ej-s specific optimizations, it is possible to get
~100-110MB/s for memcpy and ~270MB/s for memset on Nokia 770. 

More details and a link to the necessary code can be found here:
https://maemo.org/bugzilla/show_bug.cgi?id=733 

Maybe it is time to try getting these optimized functions integrated into
glibc for use on Nokia 770? Surely they need to be tested a bit more 
first. But improving core system components (glibc, xserver, SDL, ..) may 
help to make Nokia 770 at least a bit faster and more competitive.

Any comments are surely welcome. I wonder if it would be possible to 
get a community improved firmware for Nokia 770 created (with bootmenu
included, improvements to the kernel, gstreamer ogg vorbis support out of  
the box, some performance optimizations and bugfixes) and become
available for download somewhere? Because of proprietary parts, probably
this firmware should be hosted by Nokia in the standard place where the 
user needs to enter serial number to download it? Of course it would be
unofficial and unsupported just like the hacker's edition.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka

On Wednesday 02 May 2007 12:54, Daniel Stone wrote:
  The 'framebuffer' is just the ordinary system memory, converting color
  format and copying data to framebuffer will be done with the same
  performance as simulated in this test. RFBI performance is only critical
  for asynchronous DMA data transfer to LCD controller which does not
  introduce any overhead and is performed at the same time as ARM core is
  doing some other work (decoding the next frame). RFBI performance matters
  only if data transfer to LCD is still not complete at the time when the
  next frame is already decoded and is ready to be displayed. When playing
  video, ARM core and LCD controller are almost always working at the same
  time performing different tasks in parallel. I think I had already
  explained these details in [1]

 Right.  My point is that the numbers you're showing -- while very good,
 don't get me wrong -- won't necessarily have a huge direct impact on
 video playback.  Particularly if you want to avoid tearing.

I have no idea what other proof would be enough for you. You already got all
the numbers, and even benchmarks with patched xserver. They all confirm
video output performance improvement.

  So now the results of the tests are consistent - when doing video output,
  most of ARM core cycles are spent in this 'omapCopyPlanarDataYUV420'
  function.

 Well, either that, or just waiting for RFBI transfers to complete.

You need to wait a bit before displaying the next frame anyway, and 
the period between frames for 30 fps video usually eclipses transfer
completion time. If you want some numbers, now 640x480 YUV420 (12bpp) 
screen update takes now 25ms without tearsync flag enabled 
(OMAPFB_FORMAT_FLAG_TEARSYNC for OMAPFB_UPDATE_WINDOW 
ioctl) and 25-42ms with tearsync. For 30 fps video, period between
performing screen updates is normally 33ms. For playing video, we
initiate RFBI transfer, wait till it completes, perform VY12-YUV420 color
format conversion (which should take less than 4ms for 640x480 
considering benmchmark results), wait till it is time to display the next
frame and start RFBI transfer again. For 30 fps video 25ms+4ms is less 
than 33ms, so without tearsync enabled, any 640x480 video should play
fine (considering video output performance). With tearsync enabled, we 
should add the time needed for performing vertical sync in LCD controller
which breaks our nice numbers. Worst case (17ms wait for retrace + 25ms
for actual data transfer) takes more time than 33ms between frames.
We can be saved if LCD controller internal refresh rate is really 60Hz,
it this case video playback will automagically synchronize to LCD refresh 
rate and each frame processing will be done exactly within 2 LCD refresh
cycles (by the time we want to display a video frame, the next vertical will
be near and we will not lose much time waiting for it). If decoding time for
each frame will never exceed 28-29ms (which is a tough limitation, cpu 
usage is not uniform), video playback without dropping any frames will be
possible even with tearsync enabled. That's what I'm investigating now.
In any case, getting ideal 24 fps playback will be a bit easier.

I hope all these explanations are clear now. And this is not just a theory,
but already confirmed by some experiments and practical tests.

 I'm still using Scratchbox 0.9.8.5 for day-to-day stuff ...

Thanks, that is what I would consider 'additional tips and tricks' :)

It is good to know that maemo 3.x development can be also done with 
older scratchbox (I have 0.9.8.8 installed now), I'll try it without upgrading
scratchbox then.

  Well, anyway, everything worked perfectly and I could play 640x480 video
  on N800 with the following statistics:
 
  VIDEO:  [DIVX]  640x480  12bpp  23.976 fps  886.7 kbps (108.2 kbyte/s)
  ...
  BENCHMARKs: VC:  87,757s VO:   8,712s A:   1,314s Sys:   3,835s = 
  101,618s BENCHMARK%: VC: 86,3592% VO:  8,5736% A:  1,2932% Sys:  3,7740%
  = 100,% BENCHMARKn: disp: 2044 (20,11 fps)  drop: 355 (14%)  total:
  2399 (23,61 fps)
 
  As you see, mplayer took 8.712 seconds to display 2044 VGA resolution
  frames. If we do the necessary calculations, that's 72 millions pixels
  per second, quite close to 'yv12_to_yuv420_line_armv6' capabilities
  limit, so this function is the only major contributor to video output
  time. Video output took much less time than decoding, so it proves that
  video output overhead can be reduced to minimum (in this test tearsync
  was not used though).

 I'd be curious to see the results from this with tearsync _enabled_?
 i.e., after your OMAPFB_UPDATE_WIDNOW call, issue an OMAPFB_SYNC_GFX
 ioctl before you start writing to memory again.  This is basically the
 limiter for us at this stage.

That's exactly how MPlayer works. It always waits on OMAPFB_SYNC_GFX 
before filling framebuffer with the data for the next frame. Not issuing
OMAPFB_SYNC_GFX would introduce *artificial* tearing not related to sync
with LCD

Re: N800 Video playback

2007-05-02 Thread Siarhei Siamashka

On Wednesday 02 May 2007 12:39, Daniel Stone wrote:
 On Wed, May 02, 2007 at 09:16:01AM +0300, ext Siarhei Siamashka wrote:
  On Tuesday 01 May 2007 20:49, Siarhei Siamashka wrote:
   Results with unpatched xserver and some more explanations can be found
   in [3].
   Yes, now N800 is faster than Nokia 770 for video output performance at
   last :)
 
  Well, still not everything is so good until the following bug gets fixed:
  https://maemo.org/bugzilla/show_bug.cgi?id=1281
 
  The patch for optimized Xv performance will not help to watch widescreen
  video which triggers this tearing bug. If you see tearing on the screen,
  you should know that the YUV420 color format conversion optimization
  patch does not  get used at all and xserver most likely uses a slow
  nonoptimized YUV422 fallback code with software scaling.

 Indeed.  And the reason the code is there is because Hailstorm can only
 downscale at fixed ratios (half and one-quarter), and even then, it
 locked up when we tried.  Similarly, the display controller's
 downscaling didn't work, either.  So we can optimise the fallback path,
 but you'll still be screwed by sending 16bpp (instead of 12bpp) through
 RFBI.

The only thing which is unclear here is that Hailstorm does not need to
downscale video in this situation. The bug can be reproduced with 512x288
video which just needs upscaling to 800x450. Also even standard 
Nokia_N800.avi video with proper aspect ratio causes a huge 
performance regression and tearing.

Please give this #1281 issue another look. It looks like a bug in xserver,
but not a hardware limitation. I can probably try to workaround it by
requesting not 512x288 buffer from Xv, but something like 512x308, use
only 512x288 part of it and artificially add black bands above and below.
After that, Xv can be asked to expand it to 800x480 to get expected result
But if it is a bug in xserver, it would be better to get it fixed, preferably
before the next firmware update :)

  Fixing this bug is critical for video playback performance. I hope it
  will be solved in the next version of N800 firmware too. But it we get
  some patch to solve this problem for testing earlir, that would be nice
  too.

 The only patch is optimising that function, really.  Even if we did work
 out a way to make Hailstorm happy, you can still only scale at those
 exact multiples, which doesn't make it a viable general solution.

I will do optimized software YV12-YUV420 JIT scaler a bit later (on next
weekend?). It will be only a minor modification of YV12-YUV422 scaler 
which already exists and works fine. If it can be useful for xserver, it might
be added there at any time.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-05-01 Thread Siarhei Siamashka

On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote:
 2007/5/1, Siarhei Siamashka [EMAIL PROTECTED]:
  OK, thanks. It may take some time though. I'm still using old scratchbox
  with mistral SDK here (did not have enough free time to upgrade yet).
  Until I clean up my scratchbox mess, I can only provide some patch
  without testing, if anybody courageous can try to build it :)

 Given that I fear not the perils of building a X server with
 nonstandard options[1], I shall be more than happy to conduct such
 adventurous acts :)

 And unless Mr. Kulve has objections, the results could be installed
 from a repository as well.

 [1] 
 
http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

OK, here is this untested a patch for xserver to add ARMv6 optimized 
YUV420 color format conversion. Theoretically it should compile
(I did not try to build xserver myself though) and work. If it refuses to
compile, fixing the patch should be not too difficult.

In the worst case only video playback may be broked. But if everything works
as expected, video output performance should become a lot better.

Video output performance can be tested by mplayer using -benchmark 
option, 'VO:' stat shows how much time was used for video output, 'VC:' stat
shows how much time was used for video decoding.

Built-in video player also should become faster. I don't know if this
improvement can be 'scientifically' benchmarked, but it should drop less
frames on high resolution video playback.

If any of you can build xserver package with this patch, please put it for
download somewhere or send directly to me.

Thanks.
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am
--- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am	2007-03-05 16:17:32.0 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am	2007-05-01 15:04:43.0 +0300
@@ -1,5 +1,5 @@
 if XV
-XV_SRCS = omap_video.c
+XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h
 endif
 
 if DEBUG
@@ -34,4 +34,4 @@
 	$(TSLIB_FLAG)		\
 	$(DYNSYMS)
 
-EXTRA_DIST = omap_video.c
+EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h	1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h	2007-05-01 15:06:13.0 +0300
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission.  The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose.  It is provided as is without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka [EMAIL PROTECTED]
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+
+#ifndef _OMAP_COLORCONV_H_
+#define _OMAP_COLORCONV_H_
+
+#include stdint.h
+
+/**
+ * Convert a line of pixels from YV12 to YUV420 color format
+ * @param dst   - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned
+ * @param src_y - pointer to Y plane, it should be 16-bit aligned
+ * @param src_c - pointer to chroma plane (U for even lines, V for odd lines)
+ * @param w - number of pixels to convert (should be multiple of 4)
+ */
+void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w);
+
+#endif
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S	1970-01-01 03:00:00.0 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S	2007-05-01 15:06:36.0 +0300
@@ -0,0 +1,244 @@
+/*
+ * Copyright © 2007 Siarhei

Re: N800 Video playback

2007-05-01 Thread Siarhei Siamashka

On Tuesday 01 May 2007 17:49, Kalle Vahlman wrote:
  OK, here is this untested a patch for xserver to add ARMv6 optimized
  YUV420 color format conversion. Theoretically it should compile
  (I did not try to build xserver myself though) and work. If it refuses to
  compile, fixing the patch should be not too difficult.

 Applied and build without problems for me.

Thanks a lot for building the package and putting it for download, everything
seems to be fine, but more details will follow below.

 For testing, I fabricated some video with gstreamer:

 which resulted in [EMAIL PROTECTED] and [EMAIL PROTECTED] videos. For some
 reason 320x240 and 352x288 refused to play with:

 X11 error: BadValue (integer parameter out of range for operation)
 MPlayer interrupted by signal 6 in module: flip_page while gstreamer did
 play them just fine. Also the Nokia_N800.avi and  NokiaN93.avi died in the
 same way. 

This X11 error on video playback start and also sometimes on switching
fullscreen/windowed mode is a known problem [1] reported in this mailing list.

If MPlayer dies on start, usually trying to start it again succeeds. So these
320x240 and 352x288 videos could be played as well if you were a bit more
persistent :)

As Daniel replied in one of the followup messages, it is most likely some race
condition. The question is which code is a suspect. Is it MPlayer Xv video
output code that has been around for ages and worked fine on different systems
or relatively new Xv extension code from N800 xserver? In addition, a previous
revision of N800 firmware had a serious bug [2] related to video playback. It
should be noted, that MPlayer needed only about 1 minute to freeze on the
initial N800 firmware. So the problem could be identified much more easily
if MPlayer was included in the standard set of tests done by Nokia QA staff
before each new IT OS release. Surely, Nokia is only interested in a
properly working xvimagesink for the software included in IT OS by default.
But testing with more client applications can improve overall xserver quality.

With all that said, I don't know if MPlayer Xv code is bugfree, it wasn't me
who developed it.

 My mplayer is compiled from the svn
 trunk of the garage project, with some additional cflags I use (so
 maybe those were the problem...).

Do you have a set of cflags settings which work better than the default set?
Can you share this information?

 There's something fishy in the decoding or something as the color bars
 in the test video were broken (yellow and cyan to be precise), but
 that seemed to be the case in a vanilla image too so nothing to do
 with this patch. I could not see any other glitches in the output.

 But on to the results:

 VIDEO:  [DX50]  640x480  24bpp  30.000 fps  1597.6 kbps (195.0 kbyte/s)
[snip]
 VIDEO:  [DX50]  800x480  24bpp  30.000 fps  1976.5 kbps (241.3 kbyte/s)
[snip]
 There is a clear drop in amount of time used to output the videos for
 800x480 (the numbers were stable trough multiple runs).

 So I gather from the 10s benchmark time that we didn't get to real
 time yet, but close to it? And of course this is just video, audio
 decoding should be considered for real video playback performance
 measurement.

These videos are way too heavy for N800 to decode and play in realtime. We
may expect playback for videos up to 640x480 resolution with 1000kbps 
bitrate and 24fps. This is probably current realistic limit which can be
achieved. Some minor variations to these parameters are possible (for example
we can get 30fps, but should also reduce resolution or bitrate, etc.).

If you want a guaranteed video playback with divx/xvid/mpeg4 codecs, you
should restrict to 512x384 resolution or lower and keep bitrate reasonable.

The results for these 'insane' videos you have posted are somewhat weird, a
complete statistics would  require also a number of frames dropped, otherwise
we don't know how much work was done by the player. Probably missing audio
track resulted in MPlayer not being able to provide a proper report. Don't
know. Also it is strange that you did not see any improvement at all for
640x480 video, are you sure you really tested it with the patched xserver?

Anyway, the new xserver package works really good. If we do some tests with
the standard Nokia_N800.avi video clip, we get the following results with the
patched xserver:

#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC:  29,764s VO:   7,666s A:   0,468s Sys:  64,635s =  102,534s
BENCHMARK%: VC: 29,0287% VO:  7,4767% A:  0,4565% Sys: 63,0381% = 100,%
BENCHMARKn: disp: 2504 (24,42 fps)  drop: 0 (0%)  total: 2504 (24,42 fps)

#  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
BENCHMARKs: VC:  30,266s VO:   5,490s A:   0,467s Sys:  66,286s =  102,509s
BENCHMARK%: VC: 29,5255% VO:  5,3554% A:  0,4560% Sys: 64,6631% = 100,%
BENCHMARKn: disp: 2501 (24,40 fps)  drop: 0 (0%)  total: 2501 (24,40 fps)

Results with unpatched xserver and some more

Re: N800 Video playback

2007-04-30 Thread Siarhei Siamashka

On Friday 27 April 2007 04:43, Daniel Stone wrote:

I'll make a really optimized version of YV12 - YUV420 convertor on this
weekend (removing branch is good, but I feel that it can be improved
more) and will try to use it on Nokia 770, any extra video performance
improvement will be useful there. I hope that the framebuffer driver on
Nokia 770 supports YUV420 color format properly.

I don't think Tornado supports YUV420, but I can check in the specs
tomorrow. My better C version basically does two macroblocks at a time,
ensuring all 32-bit writes (which _really_ helps over 16-bit writes,
believe me). This eliminates the branch, since your surface is
guaranteed to be word-aligned, so if you do all 32-bit writes, you can
just drop the branch as you know every write will be aligned.

This will be really fast.

Optimized YV12 - YUV420 convertor is done. The sources can be found here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a
test program ('test_colorconv') which can ensure that everything works
correctly and fast:

~ $ ./test_colorconv
test: 'yv12_to_yuv420_xomap',
time=7.332s, speed=32.878MP/s, memwritespeed=43.838MB/s

test: 'yv12_to_yuv420_xomap_nobranch',
time=5.679s, speed=42.448MP/s, memwritespeed=56.597MB/s

test: 'yv12_to_yuv420_line_arm_',
time=4.706s, speed=51.223MP/s, memwritespeed=68.297MB/s

test: 'yv12_to_yuv420_line_armv5_',
time=3.356s, speed=71.824MP/s, memwritespeed=95.765MB/s

test: 'yv12_to_yuv420_line_armv6_',
time=2.826s, speed=85.298MP/s, memwritespeed=113.731MB/s

ARMv6 optimized YV12-YUV420 convertor is about 2.5x faster
than current code used in N800 xserver. So it should provide a nice
improvement for video :)

I doubt that your better C version can beat it or even get any close. There
are two important optimizations in this code:
1. Cache prefetch with PLD instruction (added in '_armv5' version) which
boosts performance to 70 megapixels per second. Inner loop is unrolled
to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
such unrolling is convenient). This is the most important improvement.
You can try using __builtin_prefetch() from C code to do the same
optimization.
2. The use of ARMv6 instruction REV16 to do bytes swapping for high and low
16-bit register parts, this optimization was added in '_armv6' version and
boosted performance even more to 85 megapixels per second. This
optimization is highly unlikely probably impossible for C version at all.

I was a bit wrong about YUV420 format in my previous post.

Suppose we have planar YV12 image with the following data.
Y plane: Y1 Y2 Y3 Y4 ...
U plane: U1 __ U2 __ ...

Normal YUV420 (according to pictures in Epson docs) would be the following:
U1 Y1 Y2 U2 Y3 Y4 ...

But appears (most likely because of 16-bit interface and some endian
differences between ARM and Epson chip) that each pair of bytes is
swapped and we actually get the following somewhat weird layout:
Y1 U1 U2 Y2 Y4 Y3 ...

To do this byteswapping, ARMv6 instruction REV16 is very handy.

The assembly sources for ARMv6 code look a bit messy because
instruction reordering was needed to correctly schedule them and avoid
ARM11 pipeline interlocks which negatively affect performance. Now this
code is really fast with very little or no interlocks in the inner loop. And
gcc does not do a good job optimizing code on ARM, so C implementation
would be also at disadvantage here.

By the way, the benchmarks posted in my previous message should be
discarded. I did not initialize source buffers that time and looks like ARM11
cpu has some 'cheat' which allows treating empty data pages in some
special way and avoid reading from memory. So the numbers posted in the
previous benchmark were higher than usual. Now it is corrected.

As for the other possible Xv optimizations. You mentioned that fallback code
is not important at all. But imagine 640x480 video playback in windowed
mode. Decoding it will require quite a lot of resources, but additionally
scaling it down using a slow fallback code will be a finishing blow. In
addition, a solution (fast JIT accelerated YV12-YUY2 scaler) for this
problem already exists. I can also modify this scaler to support
YV12-YUV420 scaling. An interesting thing here is that this scaler
could be also used by xserver to solve graphics bus bandwidth
issues. Imagine that we have some high resolution video with high
framerate which exceeds graphics bus capabilities. In this case
this video can be downscaled in software using JIT scaler to lower
resolution before sending data to LCD controller. What do you think?

Sure. Unfortunately my job has other functions than to make video
decoding really, really fast, so I'm happy to merge, review, offer
feedback, and help you out where I can be useful, but I can't throw much
time at this myself.

That's fine. Now I'm waiting for

Toolchain upgrade? (Was: Instructions cache flush on ARM)

2007-04-30 Thread Siarhei Siamashka

On Tuesday 24 April 2007 10:56, you wrote:

  By the way, do you have any plans for upgrading toolchain? Either I'm
  extremely unlucky, or current toolchain is really very buggy.

 You can see the known issues from the GCC bugzilla.
 There are a few bugs in C++ support which have been fixed
 in gcc 3.4.6 (Maemo toolchain is 3.4.4) or 4.x.

But doesn't current maemo toolchain have lots of modifications to
backport EABI support which only officially appeared in gcc 4.x? 
These modifications might have introduced some additional instability.

  It does not support -pg option properly (for profiling with gprof),

 Hm.  The toolchain might not be built with -pg support.
 As to using gprof, that produces fairly unreliable results.
 I'd recommend building Oprofile kernel and latest oprofile
 user-space tools.

Maybe Oprofile is good, but  gprof is better than nothing and does not require
recompiling kernel.

  also I encountered at least one internal compiler error and a couple of
  invalid code generation bugs already.

 C++ code generation?  Or C? (GCC bugzilla mentions only C++
 code generation issues)

I have encountered the following problems on C code (MPlayer).

ICE:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22177

Definitely invalid code generation in inline asm (but the same bug 
apparently shows up in gcc 4.1.1 as well):
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31693

Invalid code generation suspected:
https://garage.maemo.org/tracker/index.php?func=detailaid=254group_id=54atid=269
https://garage.maemo.org/tracker/index.php?func=detailaid=763group_id=54atid=269

I did not investigate these two last problems thoroughfully (this might be
probably some bad code in MPlayer with 'undefined behaviour' which works
better on some compilers but breaks on the others), but they disappear when
compiling with gcc 4.1.1 crosscompiler (outside scratchbox using gentoo
crossdev).

 ICE you can get around by trying another optimization level
 (sometimes -Os or -O3 works where -O2 doesn't).

Well, I'm worried not about how to workaround ICE but about the overall
quality of the compiler. I wonder how many compiler related bugs are lurking
in maemo software but are not caught yet? But again, maybe I'm just unlucky
to get hit by more bugs than the others :)

Did anybody try installing newer toolchains in scratchbox and use them with
maemo SDK? I just don't have much free time for these experiments and 
don't want to break my installation of scratchbox which works now (more or
less acceptable)

Building packages with new toolchain would probably need to have libstdc++
linked statically for C++ applications to work on 770/N800, but otherwise
everything should be fine.

  One more question is about the kernel, ARM11 seems to support unaligned
  memory access in hardware, but this feature is not enabled on N800.

 What the seems, to support and feature enabled mean in
 the above clause?  Seems how?  And what is result? Enabled what?

seems is a standard disclaimer which means that I did not work with these
features myself, only read this information from docs and can't be sure if I
understood everything correctly :)

 ARM CPU is able to trap them?  Kernel could SIGBUS the co. processes?
 (as unaligned access has AFAIK undefined results on ARM, is often
 coding error and fixing those accesses on kernel side has definitive
 performance penalty)

http://arm.com/documentation/ARMProcessor_Cores/index.html
'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual'
Chapter 4 'Unaligned and Mixed-Endian Data Access Support'

As ARM11 core used in N800 is little endian, does have floating point unit and
supports unaligned memory access in hardware (which only needs to be enabled).
It probably doesn't have any serious portability issues to be aware of
anymore and vast majority of software initially developed for x86 should be
easy to compile and run on it even without doing any modifications.

Enabling unaligned memory support will make life much easier for developers
unfamiliar with ARM platform. The number of applications for N800 should 
grow up, as less newbee developers will be turned away frustrated by the
alignment bugs they have never heared about before.

But this will be to some extent a bad thing for Nokia 770, as it will result
in more applications usable on N800, but buggy on 770
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-04-24 Thread Siarhei Siamashka

On Friday 20 April 2007 10:39, you wrote:

 The primary conversion we do isn't planar - packed (this is a fallback
 for when the video is obscured), but from planar to another custom
 planar format.  It would be good to get ARM assembly for the fallback
 path, but most of the problem when using packed lies in having to
 transfer the much larger amount of data over the bus.

It is only a problem of definition :) Whatever it is, packed or planar, this
YUV420 format is not YV12. So it still needs conversion which is 
performed by only reordering bytes and is not much different from 
packed YUY2 (except that it requires less space and bandwidth).

 There's one optimisation that could be done for the YUV420 conversion
 (the custom planar format that Hailstorm takes), which removes a branch,
 ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
 pixel), and unrolls a loop by half.  Might be interesting to see what
 effect this has, but I think it'll still be rather small.

My main performance concern is exactly about this 'omapCopyPlanarDataYUV420'
function. My experience from Nokia 770 video output code optimization shows
that optimization effect can be really huge (it was 1.5x improvement on Nokia
770 for unscaled YV12 - YUY2 conversion going from a simple loop in C to
optimized assembly code, I provided a link to the relevant code in my previous
post). But N800 code can be probably improved more because now it contains
unnecessary branch in the inner loop and branches are expensive on long
pipeline CPUs. Such color format conversion performance should be
comparable to that of memcpy if done right (it is about half memcpy speed on
Nokia 770 for unscaled YV12 - YUY2 conversion).

But only benchmarks can be a real proof, any premature speculations are
useless and even harmful. Do you remember the times when nobody from 
Nokia believed that ARM core could be good for video decoding on 770? ;-)

Testing with Nokia_N800.avi video on N800:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  29,525s VO:  15,029s A:   0,453s Sys:  59,919s =  104,925s
BENCHMARK%: VC: 28,1390% VO: 14,3232% A:  0,4313% Sys: 57,1065% = 100,%
BENCHMARKn: disp: 2511 (23,93 fps)  drop: 0 (0%)  total: 2511 (23,93 fps)

Enabling direct rendering (avoids extra memcpy in mplayer, but requires to
disable OSD menu):
#  mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi

BENCHMARKs: VC:  29,826s VO:  12,365s A:   0,437s Sys:  60,555s =  103,182s
BENCHMARK%: VC: 28,9058% VO: 11,9833% A:  0,4236% Sys: 58,6873% = 100,%
BENCHMARKn: disp: 2504 (24,27 fps)  drop: 0 (0%)  total: 2504 (24,27 fps)

Testing the same video on Nokia 770:
#  mplayer -benchmark -quiet -noaspect Nokia_N800.avi

BENCHMARKs: VC:  44,982s VO:   7,998s A:   0,884s Sys:  47,936s =  101,801s
BENCHMARK%: VC: 44,1862% VO:  7,8568% A:  0,8688% Sys: 47,0882% = 100,%
BENCHMARKn: disp: 2502 (24,58 fps)  drop: 0 (0%)  total: 2502 (24,58 fps)


So Nokia 770, having slower CPU, slower memory and using less efficient 
output format (16bpp vs. 12bpp), still requires less time for video output
than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here 
as it is asynchronous operation and it is fast enough. Surely N800 also has
some extra overhead because of interprocess communication with xserver, but
looks like YV12 - YUV420 conversion is quite a bottleneck here too.

It should be noted that while Nokia_N800.avi video has low resolution and 
N800 has no problems decoding and displaying it, our goal is higher resolution 
videos such as 640x480. Getting to higher resolutions will increase color
format conversion overhead. As it can be seen from these benchmarks, video
output on N800 takes quite a significant time when compared with time needed
for decoding (29,826s for decoding, 12,365s for video output).

I can make an assembly optimized code for YV12 - YUV420 conversion. Is there
any chance that such optimization could be also integrated into xserver in one
of the next firmware updates if it really provides a significant performance
improvement?

N800 is almost able to play VGA resolution videos properly, it only needs a
bit more optimizations. Color format conversion performance for video output
is one of the important things that can be improved.

  So for any performance optimizations experiments which result in
  immediate video performance improvement, either direct framebuffer access
  should be used again or it would be very nice if xserver could provide
  direct access to framebuffer (video planes) in yuy2 and that custom
  yuv420 format in one of the next firmware updates. The xserver itself
  should not do any excess memory copy operations as they degrade
  performance (and it does such copy for yuy2 at least).

 'Direct framebuffer access'?  As in, just hand you a pointer to a
 framebuffer somewhere and let you write straight to it?  As this would
 require a firmware update anyway, I don't really see how

Re: Instructions cache flush on ARM (was: N800 Video playback)

2007-04-23 Thread Siarhei Siamashka

On Monday 23 April 2007 16:49, Guillem Jover wrote:

  You are right. gcc has function
  void __clear_cache (char *BEG, char *END)
  which should be more portable.

 It should, but it still has the problem of emitting an OABI syscall
 due to our old gcc.

 You could use syscall(2) and __ARM_NR_cacheflush instead.

Yes, but __clear_cache(char *BEG, char *END)  works fine with the 
current combination of gcc and kernel in maemo. So I guess it's the best 
option if portability is desired.

If you decide to drop support for old ABI from kernel without upgrading
gcc, that would be a bug in maemo platform :-)

By the way, do you have any plans for upgrading toolchain? Either I'm
extremely unlucky, or current toolchain is really very buggy. It does not
support -pg option properly (for profiling with gprof), also I encountered 
at least one internal compiler error and a couple of invalid code generation
bugs already.

One more question is about the kernel, ARM11 seems to support unaligned 
memory access in hardware, but this feature is not enabled on N800. Is it done
for consistency with Nokia 770?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Instructions cache flush on ARM (was: N800 Video playback)

2007-04-22 Thread Siarhei Siamashka

On Friday 20 April 2007 19:04, you wrote:

  I have seen your code in xserver which does the same job for downscaling,
  but in nonoptimized C and with much higher impact on quality. Using JIT
  scaler there can improve both image quality and performance a lot. The
  only my concern is about instruction cache coherency. As ARM requires
  explicit instructions cache flush for self modyfying or dynamically
  generated code, I wonder if  using just mmap is safe (does it flush cache
  for allocated region of  memory?). Maybe maemo kernel hackers/developers
  can help with this information?

 arm linux support flush icache by syscall cacheflush,

 qemu have this function:
 static inline void flush_icache_range(unsigned long start, unsigned long
 stop)
 {
 register unsigned long _beg __asm (a1) = start;
 register unsigned long _end __asm (a2) = stop;
 register unsigned long _flg __asm (a3) = 0;
 __asm __volatile__ (swi 0x9f0002 : : r (_beg), r (_end), r
 (_flg));
 }

 you can reference kernel source arch/arm/kernel/traps.c and
 include/asm-arm/unistd.h

Thanks, it works. But I'm worried about [1]. Looks like EABI has a new syscall
interface and this code from qemu uses old ABI. And from reading description 
at the wiki page, compatibility with old ABI can be disabled (and it makes
sense disabling it as this compatibility reduces performance a bit).

I wonder if there is a better portable solution (running on any ARM linux or
even better on any POSIX compatible system). It would be reasonable to 
assume that allocating memory with mmap implies that we are going to 
execute code from that area and instructions cache should be flushed for it:
mmap(0, some_buffer_size, PROT_READ | PROT_WRITE | PROT_EXEC, 
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

But I wonder if mmap requesting executable block of memory really does 
instructions cache flush in reality?

I just want to submit this ARM optimized scaler to upstream ffmpeg and want to
make it as portable as possible.

1. 
http://wiki.debian.org/ArmEabiPort#head-96054c6cb4209b4a589e645dd50ac0fe133b8ced
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-04-20 Thread Siarhei Siamashka

On Monday 19 March 2007 22:34, you wrote:

snip

Again, if there are any particular questions I can answer, don't be
subtle: ask me straight up. If I can answer them (some things I can't
necessarily say, some things I don't necessarily know), I will.

Thanks, here we go and sorry for a long delay with this answer.

First thanks for Xv update which makes it really usable now, MPlayer now uses
Xv video output on N800 by default. But there are still some problems. Using
unmodified upstream MPlayer code for Xv (N800 with 3.2007.10-7
firmware at the moment) does not work good. It has two at least problems:

1. Lockups which look like cycling two sequential frames, very similar or the
same problem as https://maemo.org/bugzilla/show_bug.cgi?id=991
Also keypresses are not very responsive. A fix (or workaround) required
changing XFlush to XSync in screen update code, now it looks a lot better.

2. Switching windowed/fullscreen mode generally makes mplayer terminate
with the following error messages:
X11 error: BadValue (integer parameter out of range for operation)
Xlib: unexpected async reply (sequence 0x5db)!
A workaround to make this problem less frequent was a code addition which
prevents screen updates until we get Expose even notification.

All these Xv patches for MPlayer code can be viewed here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayerdiff_format=hview=revrev=166

I really don't know much about X11 programming and only started to learning
it, so your help with some advice may be very useful. Looks like MPlayer code
X11/Xv output code is a big mess with many tricks and workarounds added to
work on different systems over time. Maybe it contains some bugs which get
triggered on N800 only, but apparently this code is used for other systems
without any problems. Can you try experimenting a bit with MPlayer (upstream
release) yourself to check how it works with N800 xserver? Maybe it can reveal
some xserver bugs which need to be fixed? Also if MPlayer has some apparently
bad X11 code, preparing a clean patch and submitting it upstream maybe a
good idea.

One more strange thing with Xv on N800 can be reproduced by trying to watch
standard N800 demo video in MPlayer. It has an old familiar tearing line in
the bottom part of the screen and the performance is very poor. The same file
plays fine in the standard video player. The only difference is that mplayer
respects video aspect ratio (this video is not precisely 15:9 but slightly
off) and shows some small black bands above and below picture and
default video player scales it to fit the whole screen. Disabling aspect ratio
in mplayer with -noaspect option also 'fixes' this problem.

Using benchmark option we get the following numbers:

# mplayer -benchmark -quiet Nokia_N800.avi
[...]
BENCHMARKs: VC: 33,271s VO: 66,768s A: 0,490s Sys: 5,703s = 106,232s
BENCHMARK%: VC: 31,3189% VO: 62,8517% A: 0,4614% Sys: 5,3681% = 100,%
BENCHMARKn: disp: 1732 (16,30 fps) drop: 778 (30%) total: 2510 (23,63 fps)

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
[...]
BENCHMARKs: VC: 32,226s VO: 14,350s A: 0,456s Sys: 55,699s = 102,731s
BENCHMARK%: VC: 31,3694% VO: 13,9687% A: 0,4439% Sys: 54,2180% = 100,%
BENCHMARKn: disp: 2501 (24,35 fps) drop: 0 (0%) total: 2501 (24,35 fps)

So when showing video with proper aspect ratio, we get tearing back and more
than 4x slowdown in video output code (66,768s vs. 14,350s). This all results
in 30% of frames dropped.

These were the 'usability' problems with Xv. Now we get to performance
related issues. As YV12 is not natively supported by hardware, some
color format conversion and bytes shuffling in video output code is
unavoidable. It is a good idea to optimize this code if we need a good
performance for high resolution video playback. Color format conversion
can be optimized using assembly, for example maemo port of mplayer
has a patch for assembly optimized yv12- yuy2 (yuv420p - yuyv422)
nonscaled conversion which provides a very noticeable ~50% improvement
on Nokia 770:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php?root=mplayerrev=129view=rev

Also here is a JIT accelerated scaler for yv12- yuy2 (yuv420p - yuyv422)
conversion, it is very fast and supports pixels interpolation (good for image
quality) :
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

I have seen your code in xserver which does the same job for downscaling, but
in nonoptimized C and with much higher impact on quality. Using JIT scaler
there can improve both image quality and performance a lot. The only my
concern is about instruction cache coherency. As ARM requires explicit
instructions cache flush for self modyfying or dynamically generated code, I
wonder if using just mmap is safe (does it flush cache for allocated region
of memory?). Maybe maemo kernel hackers/developers can help with this
information?

It should be noted, that all this assembly

Re: Support for frame buffer in N800

2007-03-21 Thread Siarhei Siamashka

On Wednesday 21 March 2007 09:58, Sampath Goud wrote:

 I want to know if there is frame buffer support in N800.
 I have written a simple application (drawing a pixel) on frame buffer and
 tried to execute it on N800 in root mode.
 But it prompts the message permission denied.
 Please let me know if there is support for frame buffer in N800. If there
 is support then how can I use it?

Yes, it is available. You should use /dev/fb0 if you want RGB color format
(/dev/fb1 and /dev/fb2 are used for video planes generally in YUV color
format).

But do you absolutely need to use framebuffer? Using framebuffer directly
introduces a lot of problems synchronizing access to it with xserver (and it
can't be done in a completely right way currently as far as I know).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: N800 Video playback

2007-03-21 Thread Siarhei Siamashka

On Tuesday 20 March 2007 15:03, Klaus Rotter wrote:

  On Tue, Mar 20, 2007 at 09:31:00AM +0100, ext Klaus Rotter wrote:
  The memory bandwidth to the N800 LCD framebuffer is 3 times slower that
  the bandwidth in the N770? Is it really _that_ big?
 
  Siarhei's calculations were correct, so, yes.

 Bad... the N770 interface wasn't the fasted either. So we have even a
 more slow down. 

There is one important thing to note. Screen updates are asynchronous and 
are performed simultaneously with CPU doing some other useful things at 
the same time. Screen updates do not introduce any overhead or affect 
performance (at least I did not notice any such effect). So insanely boosting
graphics bus performance will not provide any improvements at all once it is 
capable to sustain acceptable framerate. And what is acceptable depends
on applications. Video may require higher framerate, but it is both high
resolution and high framerate movies that may exceed graphics bus 
capabilities, in this case video will be still played (if cpu is fast enough
to decode it, that's another story) but with some frames skipped and 
many people will not even notice any problems. Quite a lot of people 
are even satistied with 15fps transcoded video, so getting maybe 20-25fps
(random guess) on some videos instead of 30fps is not so bad.

Tearing at the bottom is most likely caused by screen update time being 
longer than two LCD refresh cycles. With tearsync enabled, both screen 
update and refresh cycle start at the same time, refresh is faster, so we
still see the previous frame on the screen. When the first refresh cycle
completes, screen buffer is slightly less than half updated at that moment.
The second LCD refresh cycle starts displaying the data from the new image,
while screen buffer still continues to get updated, but not fast enough to
complete before this second LCD refresh cycle catches up not too far 
from the bottom part of the screen. If the screen update was faster than two
refresh cycles, there would be no tearing visible. Screen update only needs 
to be 15-20% faster to achieve this. If improving graphics bus performance
does  not work, I wonder if it is possible to to reduce LCD refresh rate
instead?

Anyway, I think it is better to believe Daniel and wait for the new
firmware update :)

 On the N770 there was the feature (with SDL games) of 
 doubling the pixels by hardware with a X-server extension. Will this
 feature be available in the new kernel / X11 server for the N800? It
 would be great if it would use the same API.

Doubling pixels will definitely reduce the load on the graphics bus so that
its bandwidth should become not an issue.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

N800 Video playback

2007-03-18 Thread Siarhei Siamashka

Hello All,

I did some tests with the framebuffer when trying to find a way to reduce
tearing effect in MPlayer. Here are the results.

I did a mistake when I assumed that screen updates are synchronous for video
planes. They are actually asynchronous just like with Nokia 770, but just a
lot slower so that it is not possible to keep framerate in real time. So we
get blocking when trying to display the next frame and the previous screen
update is still in process.

If we look at the framebuffer API. There are two ioctl important for screen
updates and tearing synchronization if I understand them correctly now:

* ioctl(fd, OMAPFB_UPDATE_WINDOW, update);

This ioctl initiates asynchronous screen update from a 'framebuffer' memory
(actually that's just a system memory) to a graphics chip. If a previous
screen update request is still incomplete, this call blocks and waits until it
is done. The structure 'update' can have OMAPFB_FORMAT_FLAG_TEARSYNC 
flag set which instructs the framebuffer driver to wait internally (ioctl call
is not blocked) and start data transfer on the start of the next LCD internal
screen refresh (aparently refresh rate is something ~60Hz, but the numbers
are below).

* ioctl(fd, OMAPFB_SYNC_GFX);

This ioctl call ensures that current screen update is done (blocking
until data transfer is complete).

Perforrming both these ioctls consequently we can benchmark screen update
performance. The results are the following.

On N800 (OS2007 2.2006.51-6), every YUV screen update (OMAPFB_COLOR_YUY422)
takes about 41ms without tearsync enabled and 41-58ms with tearsync. It does
not matter what video resolution we try to watch, the result is the same. So
the maximal screen update rate is about ~24fps without tearsync and ~20fps on
average with tearsync. That's not enough to watch 30 fps video and achieving
24 fps is theoretically possible, but very tricky and unrealistic. If tearsync
comes into action, watching full framerate videos is impossible now. Analyzing
the difference in screen update times with vsync, looks like full cycle of LCD
internal refresh takes ~17ms (that's ~60Hz, but as the precision is not good,
that may be something else, 50Hz for example). Nevertheless screen update on
N800 can't be completed for these 17ms and tearsync does not work perfectly
(most likely it can fill the screen up to the bottom horizontal line observed
on playing video).

If we try RGB screen updates, we can see that the time needed for screen
update gets lower for updating smaller screen regions). The numbers are 
the following (without tearsync enabled):
640x480: 33.7ms
400x230: 10.2ms
320x240: 8.5ms

Of course RGB screen updates are not very suitable for video as we would lose 
much more time doing YUV-RGB conversion.

If we benchmark the screen update performance on Nokia 770, the numbers are:
640x480: 11.1ms
320x240: 2.9ms (that's fullscreen playback with pixel doubling)

If we estimate bus performance on Nokia 770, it is ~55MB/s and is more than
enough to display 800x480 sized video frames with 30 fps. Adding a tearsync
would be a nice addition, as 11.1ms for 640x480 screen update time is lower
than 17ms LCD refresh cycle. And in the worst case of video sync when we get
11.1ms+17ms=28.1ms for a single frame, it will be still capable of displaying
35 fps at the very least. So any resolution video can be played with perfect
quality given enough cpu performance for video decoding (that's a real
bottleneck on Nokia 770).

Looks like graphics bus on N800 is 3x slower than on Nokia 770. It might 
be caused by inefficient framebuffer driver implementation in its initial
revision. But if it is a hardware issue, getting normal video playback at
native framerate may be troublesome. Performing software downscaling of 
video before sending data to the graphics chip may be a solution, but it
sacrifices image quality. Switching to 12bit YUV format from 16bit will save
~33% of bus bandwidth, but it can't compensate 3x performance regression 
and may be not enough for 30 fps fullscreen video playback.

Right now, I can workaround tearing somewhat, but some frames will have to 
be skipped, resulting in somewhat jerky playback (even for transcoded video
unless framerate is halved).

Apparently the same issue applies to emulators, games and other software.

As Daniel explained, the next firmware will bring a big improvement in this
area. I'm not sure whether it is worth to release the next version of MPlayer
before that, since it will still be far from perfect on N800.

A preview of the next kernel for beta testing might reduce time needed to get
MPlayer fully working on N800, but I'm not demanding or expecting anything. It
is just a matter of time anyway and I'm not so impatient :)

I would be grateful for any comments and corrections. Some things are not 
yet clear to me, figuring them out myself is just a waste of time that could
be spent on something more useful. Even a small hint may save a huge 
amount of time.

PS.

Re: DVD content playback possible or not? Re: Wishlist (was:Re: N800 and USB host mode)

2007-03-09 Thread Siarhei Siamashka

On Friday 09 March 2007 12:20, Daniel Stone wrote:

 On Fri, Mar 09, 2007 at 09:45:03AM +0100, ext Hanno Zulla wrote:
   Right now, the biggest bottleneck in video decoding is RFBI bandwidth
   (i.e. the bus between OMAP and the LCD controller we use), being too
   slow to push more than ~15fps through at 800x480.  Beefing up the
   processor-side decoding doesn't help.  We've been working on this and
   the next firmware update will give you significantly faster video (with
   a couple of caveats).
  
   So it's mostly just down to the large image display, which more or less
   suffers from the same problem.  I don't think it would give us much
   benefit at all.
 
  So from the hardware side, it is definitely no-matter-what-you-try
  impossible to play DVD video content on the N800, even if there was help
  from the DSP?

 Not really.  The next firmware release has gone to great lengths to
 improve video performance by doing scaling on the LCD controller, as
 well as the colourspace conversion.  I think you'll be pleasantly
 surprised. ;)

Thanks, that's a very good news. We all are looking forward for this firmware
update. By the way, is it possible to get an early access to the updated
kernels in the future for the purpose of testing and ensuring compatibility?

I wonder what improvements the new framebuffer driver will bring to us. As 
far as I understand the situation with the current firmware, the problem is 
in having to do planar-packed YUV conversion at ARM core and 
synchronous screen update for anything involving planes.

Graphics system in Nokia 770 could perform YUV screen updates 
asynchronously with DMA consuming only ~20% cpu resources for
640x480 24 fps video output (these ARM core resources were used for 
planar-packed color format conversion and scaling).

N800 is a bit different with a more complicated framebuffer driver with a
support for more hardware features (such as a very high quality hardware
scaler),  but its graphics chip does not seem to support planar YUV color
formats, so something else (ARM core?) should do the conversion wasting the
same ~20% of resources. By the way, did you consider trying to use DSP 
at least for unscaled planar-packed color format conversion? It should
provide some improvement at least theoretically.

And a few questions about the future frambuffer driver. I know that the pixel
doubling feature should be fixed in the next firmware. Will this driver also
support YUV color format for regular screen updates (without using planes)
just like N770? I would prefer some kind of stateless API that would not 
allow to screw up the device when something gets wrong (having some 
planes enabled at abnormal exit makes it impossible to work with the device
and requires a reboot).

And one more minor question is about YUV format constants in framebuffer.
OMAPFB_COLOR_YUV422 constant for N770 specifies the same color 
format as OMAPFB_COLOR_YUY422 for N800, why did you have to introduce 
a new constant?

All in all, while video output issues can be solved, CPU performance for video
decoding is still another bottleneck. It is even worse bottleneck than video
output as you can skip displaying of some frames, but you can't skip decoding.
The latest build of mplayer for maemo (mplayer_1.0rc1-maemo.10) accesses
framebuffer directly, so its video output performance is comparable to that of
N770. Unfortunately  while cpu usage for video output reduced greatly to a
reasonable level and is not a bottleneck anymore, video decoding performance
is still a bottleneck and N800 is only about 30% faster than N770 for video
(N800 handles 30fps videos in mplayer approximately the same as N770 handles
24fps videos). Surely, armv6 optimizations for video decoding can provide some
improvement, but we have a long way of incremental improvements ahead.

Did you try to do something about tearing in the next firmware? While I tried
to workaround it, nothing could eliminate it completely but only resulted in
some additional slowdown. So the latest build of mplayer has tearsync
completely disabled and is optimized for performance only. It goes without
saying that we will have to do something about it in the the future for sure.

Is IVA really unusable on N800? What kind of cpu does it have inside? If 
it is done by TI, we can probably suppose that it is TMS320C64x (at least I
have seen information that IVA2 is a lower clock and more power efficient
version of DaVinci which uses TMS320C64x).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Xvideo support for Nokia 770?

2007-02-07 Thread Siarhei Siamashka

Hello, 

It would be probably a good idea to discuss different possibilities for
improving multimedia support on 770/N800.

Now we have a fast JIT scaler that runs on ARM core, it solves all the 
video resolution related performance problems. I'm going to work on 
improving quality, performance and its inclusion into upstream ffmpeg 
library, this task is in my nearest plans:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-January/051209.html

As for the ways of improving multimedia support on Nokia 770, it may be 
done in the following ways (in no particular order):
1. Continue ffmpeg optimizations (motion compensation functions, finetune
idct, have a look at the possibilities to optimize codecs other than mpeg4 and
its variants)
2. Implement Xvideo extension support for Nokia 770 (using scaling done on 
ARM core)
3. Implement XvMC in some way (using C55x DSP for it as it is supposedly good
for IDCT and motion compensation stuff)
4. Improve GStreamer plugins (replacements for dspfbsink and dspmpeg4sink
running on ARM core, it could probably improve mpeg4 playback performance a
lot and allow using higher video bitrates and resolutions that are currently
available in MPlayer)
5. Try to relay color format conversion and scaling to DSP. If it works as
expected, video scaling can be done with almost zero overhead for ARM 
core. Theoretically the same trick could probably also work for GStreamer
if video output sink can provide its own buffer (::buffer_alloc). The first
step would be to try just doing nonscaled color format conversion. If it is
successful, some more advanced stuff can be tried such as JIT dynamic 
code generation on C55x.
6. Try porting vorbis decoder (tremor) to DSP
7. Try porting libmpeg2 to DSP. With audio decoding and scaling done on 
ARM core, it might improve overall mpeg2 playback performance, I wonder if
nonconverted DVD video playback is even theoretically possible on 
Nokia 770.

That's quite a big list and it contains some things that might be generally
nice to have, but have relatively low practical value and are actually not
worth efforts implementing :)

There are two issues that need to be solved for this all to become reality:
1. We need some way of applying community developed upgrades for core 
system components such as xserver and xlib (if we go after Xvideo support on
Nokia 770). They must be easy to install by end users, otherwise this all
development does not make much sense. It would be also nice to integrate 
these improvements into official firmware later, but I wonder if Nokia has
spare resources for doing this integration and its quality assurance.
2. Reliable information that is detailed enough for performing graphics and
audio output from DSP, see 
http://maemo.org/pipermail/maemo-developers/2007-February/007949.html
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] Fast 16bpp alpha blending (was: Improving Cairo performance on the N800)

2007-01-30 Thread Siarhei Siamashka

On Thursday 18 January 2007 13:46, Gustavo Sverzut Barbieri wrote:

  By the way, free software is really poorly optimized for ARM right now.
  For example, SDL is not optimized for ARM, xserver is probably not
  optimized as well, a lot of performance critical parts of code in various
  software are still only implemented in C for ARM while they have x86
  assembly optimizations long ago. Considering that Internet Tablets might
  have a tight competition  with x86 UMPC devices in the near future, ARM
  poweded devices are at some disadvantage now. Is this something that we
  should try to change? :-)

 Yes. Since at INdT we use a lot of SDL, GTK and in future Evas, we are
 interested in optimizing this.

 One thing that can be optimized is 16bpp operations. Moving SDL
 surfaces to be optimized, packing 16bpp RGB into one plane and 1
 byte-Alpha in another plane, we could use multiple store (stm) and
 improve things a bit.

 If we could achieve ~24fps blitting fullscreen 16bpp+Alpha, it would
 rock! :-) Right now we do 18fps, but we still need that function with
 separated planes + stm. I'll ask Lauro to send them as soon as we get
 it working.

 Anyone willing to help evas port to work with 16bpp+Alpha internally?
 Evas is a great canvas, can interoperate with Glib main loop easily
 and provides high level utilities, like text layout (pango-like),
 gradients, the concept of objects to animate and is scriptable really
 easy (with optimizations!).

Regarding 16bpp alpha blending, I did some optimization (not involving
assembly yet) for maemo build of ufo2000 [1]. The code which is currently 
used, is based on and extends RLE sprites from Allegro game programming
library [2]. The sources can be found here:
http://ufo2000.svn.sourceforge.net/viewvc/ufo2000/trunk/src/fpasprite/

The goal was to get support for drawing isometric tiles with the support of
alpha channel (for fire, smoke, explosions, window glass, ...) and adjustable
brigtness (for lighting effects on night missions simulation).

The code works as allegro addon library and allows loading sprites from
PNG files. It automagically detects presence or absence of alpha channel and
converts images into optimal format which allows fast blending (for alpha
channel) and store it in a compact form (for images without alpha channel).
When blitting sprite, brightness ranging from 0 - 255 is used as an additional
argument. The code uses C++ templates to support all the possible variants 
of bit depth and blending type (may be not a very good idea for the code
intended for submission into C library later :) ).

The trick used to speed up alpha blending was to store each pixel data in
a special 32-bit representation with R, G, B and alpha channel arranged in a
special way for better performance.

So imagine that we have 16-bit pixel in RGB565 format and alpha channel. 
We convert it into this 32-bit preprocessed data according to the following
algorithm:

uint32_t convert_pixel(uint16_t rgb565, int alpha)
{
uint32_t n = (alpha + 1) / 8, x = rgb565;
x = (x | (x  16))  0x7E0F81F;
return x | (n  5);
} 

Now if we need to do alpha blending (with some buffer in memory), we do the
following (d - destination pixel data buffer, s - buffer with preprocessed
32-bit pixel data, w - number of pixels to blend, n - brightness level (0 - 
32)):

uint16_t *draw_alpha_dark_line16(uint16_t *d, uint32_t *s, int w, uint32_t n)
{
 while (--w = 0) {
uint32_t x = *s++;
uint32_t y = (uint16_t)*d;
uint32_t result = (x  5)  0x3F;
x = ((x  0x7E0F81F) * n / 32)  0x7E0F81F;
y = (y | (y  16))  0x7E0F81F;
result = ((x - y) * result / 32 + y)  0x7E0F81F;
*d++ = (result | (result  16));
}
return d;
}

This code works quite fast (at the cost of some precision loss though). It 
is perfectly suitable for isometric tile based games and probably other
applications which only need lightning fast blending and do not need any 
extra operations with sprites (rotation for example). Removing brightness
level support makes the code even faster.

Using this code in ufo2000 allows it to keep reasonably high framerate (more
than 10 fps) even on complicated scenes full of fire and smoke animation for
example. I hope this information may be useful for other maemo game 
developers or anyone in need of fast 16bpp alpha blending code.

PS. Optimizing alpha blending using assembly can most likely improve 
performance even more :)

[1] http://ufo2000.sourceforge.net
[2] http://alleg.sourceforge.net
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Improving Cairo performance on the N800

2007-01-17 Thread Siarhei Siamashka

On Tuesday 16 January 2007 12:08, Zeeshan Ali wrote:

  Now, the recently announced Nokia N800 is different from the 770 in
  various ways that are interesting for Cairo performance. I've got my
  eye on the ARMv6 SIMD instructions and the PowerVR MBX accelerator.

Yeah! me too. The combined power of these two can make it possible
 to optimize a lot of nice free software out there for the N800 device.
  However! while former is fully documented and the documentation is
 available for general public, it doesn't have a lot to offer. ARMv6
 SIMD only operate on 32-bit words and hence i find it unlikely that it
 can be used to optimize double fp emulation in contrast to the intel
 wirelesss MMX, which provides a big bunch of 128-bit (CORRECTME: or
 was it 64- bit?) SIMD instructions. OTOH, these few SIMD instructions
 can still be used to optimize a lot of code but would it be a good
 idea for cairo if you need to convert the operand values to ints and
 the result(s) back to float?

Well, OMAP2420 seems to support floating point in hardware, so all this stuff
is probably not needed anymore :)

   I have already been thinking on utilizing ARMv6 before the N800 was
 release to public. My proposed plan of attack for the community (and
 also the Nokia employees) is simply the following:

 1. Patch GCC to provide ARMv6 intrinsics. (1 MM at most)
 2. Patch liboil [1] to utilize these intrinsics when compiled for
 ARMv6 target (1-3 MM)
 3. Make all the software utilize liboil wherever appropriate or ARMv6
 intrinsics directly if needed.

The 3rd step would ensure that you are optimizing your software for
 all the platforms for which liboil provides optimizations. OTOH! one
 can skip step#1 and write liboil implementations in assembly.

I already did a little progress on this and the result is two
 header files which provides inline functions abstracting the assembly
 instructions. I am attaching the headers. One of my friend was
 supposed to convert them to gcc intrinsics and patch gcc but i never
 got around to finish them. However I am attaching the headers so
 anyone can use it as a starter if he/she likes.

According to my tests, performance improvement from using such header 
files is minimal. They are easy to use, but the improvement is generally not
very good.

When I benchmarked idct performance, I also tested C implementaion with some
macros for fast armv5te 16-bit multiplication out of curiasity. Performance
improvement was only about 5%. While at the same time, handcrafted code
improves performance by as much as 50% (and still has potential for more
optimizations):
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-September/045837.html

The very similar minimal effect is obtained from using such macros in ffmpeg
mp3 decoder.
 
The explanation is simple. Compiler is not able to shedule instructions 
as good as human especially if it has some 'alien' parts of code inserted 
in the flow of its instructions via inline asm. For example, this multiply
instruction takes 1 cycle to execute, but the result has 1 extra cycle latency
(for ARM9, it is even higher for ARM11 and is equal to 2 cycles) and you can't
use it immediately in the next instruction. As gcc does not know about the
sheduling of such instructions when using just macros, it may try to use
the result immediately and suffer form 1 or more cycles penalty because of
pipeline interlock.

So if really good performance is required, nothing can beat handcrafted
assembly yet. Of course it makes sense to profile code and optimize only 
time critical relatively small leaf functions.

By the way, free software is really poorly optimized for ARM right now. For
example, SDL is not optimized for ARM, xserver is probably not optimized 
as well, a lot of performance critical parts of code in various software are
still only implemented in C for ARM while they have x86 assembly 
optimizations long ago. Considering that Internet Tablets might have a tight
competition  with x86 UMPC devices in the near future, ARM poweded devices 
are at some disadvantage now. Is this something that we should try to
change? :-)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Xvideo support for Nokia 770?

2007-01-17 Thread Siarhei Siamashka

On Wednesday 10 January 2007 01:51, Charles 'Buck' Krasic wrote:

 Siarhei Siamashka wrote:
  Actually I have been thinking about trying to implement Xvideo
  support on 770 for some time already. Now as N800 has Xvideo
  support, it would be nice to have it on 770 as well for better
  consistency and software compatibility.

 As you may recall, I was considering this back in August/September.
 I tried a few things, and reported some of my findings to this list.
 The code for all that is still available here:
 http://qstream.org/~krasic/770/dsp/

Yes, sure I remember. Thanks for doing these experiments and making 
the results available. It really helps to have more information around.

  I see the following possible options:
 
  1. Implement it just using ARM core and optimize it as much as
  possible (using dynamically generated code for scaling to get the
  best performance). Is quite a straightforward solution and only
  needs time to implement it.

 It is my impression that this might be the most attractive option.
 I noticed that TCPMP which seems to be the most performant player for
 the ARM uses this approach, and it is available under GPL, so it may
 be possible to adapt some of its code.

 In the long run, I would hope that integrating TCPMP scaling code into
 libswscale of the ffmpeg project might be the most elegant approach,
 since that seems to be the most performant/featureful/widel adopted
 open-source scaling code (but not yet on ARM).   For mplayer, it works
 out of the box, since libswcale actually originated from mplayer, and
 only recently migrated to ffmpeg.

I see, thanks for the information (I checked TCPMP sources some time ago, 
but was interested in runtime cpu capabilities detection code and did not look
at the scaler that time). Using TCPMP code may be an interesting option. But I
also still may try to make my own scaler implementation for two reasons:
1. TCPMP is covered by GPL license, and most parts of ffmpeg are LGPL, so
probably it makes sense making a clean room implementation of JIT powered
scaler for ARM under LGPL license
2. I'm worried about the performance. Knowing how the cache and write buffer
work on arm926 core, it is possible to tune generated code for it and get the
best performance possible. So the results can be better than for TCPMP.

I have just committed some initial assembly optimizations for unscaled
yuv420p - yuyv422 color format convertor to maemo mplayer SVN. It already
provides some performance improvement, for example on my test video file
(640x480 resolution, 24 fps) I get the following results now:

BENCHMARKs: VC: 114.526s VO:  21.055s A:   0.000s Sys:   1.582s =  137.163s
BENCHMARK%: VC: 83.4962% VO: 15.3503% A:  0.% Sys:  1.1535% = 100.%

We can compare it with the older results (decoding time was also 
improved a bit since that time because of recent assembly optimizations 
for dequantizer):
http://maemo.org/pipermail/maemo-developers/2006-December/006646.html

BENCHMARKs: VC: 121.282s VO:  31.538s A:   0.000s Sys:   1.577s =  154.397s
BENCHMARK%: VC: 78.5517% VO: 20.4267% A:  0.% Sys:  1.0216% = 100.%

Most of the speed improvement in color conversion and video output (VO: part)
is gained just from loop unrolling and avoiding using some extra instructions
as gcc does when compiling C code, but using STMD instruction to store 16
bytes at once at aligned location [1] provides at least 10% performance here.
If we estimate memory copy speed here with additional colorspace conversion
applied, it is about 70MB/s now for 640x480 24 fps video (though we need to
read a bit less data than write here, so it is a bit different from memcpy).
And I have observed peak memcpy performance about 110MB/s on Nokia 770. 
So this color convertor is quite close to memory bandwidth limit now. This
code can be optimized more by processing two image lines at once, so we can
get rid of some data read instructions and improve performance. Also
experimenting with prefetch reads may provide some improvement.

JIT generated code should have a bit worse performance, but not much. It we
decide to make 'nearest neghbour' scaling, the result should be probably as
fast as this nonscaled conversion. But I want to try some simplified variation
of bilinear scaling: each pixel in the destination buffer is either a copy of
some pixel in the source buffer or an average value of two pixels. This way it
should only introduce two extra instructions for each byte in output at
maximum: addition of two pixel color components and right shift.

  2. Try using dsp tasks that already exist on the device and are
  used for dspfbsink. But the sources of gst plugins contain code
  that limits video resolution for dspfbsink. I wonder if this check
  was introduced artificially or it is the limitation of DSP scaler
  and it can't handle anything larger than that. Also I wonder if
  existing video scaler DSP task can support direct rendering [2].

 I tried direct rendering

Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-14 Thread Siarhei Siamashka

On Sunday 14 January 2007 20:11, Frantisek Dufka wrote:

 Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 07:53:06PM +0200, Marius Gedminas wrote:
  On Sun, Jan 14, 2007 at 12:11:37AM +0200, Siarhei Siamashka wrote:
  Also Nokia 770 runs not at 220MHz as stated on your page, but at
  something closer to 250MHz as shown by this test code program
  (and confirmed to be actually 252MHz by somebody from Nokia
  on #maemo about half a year ago).
 
  So http://maemo.org/faq/faq.html#faq-N10129 is lying?

Well, if I were to create a conspiracy theory, I would suggest that it could
be done on purpose to make N800 look like a bigger improvement when 
comparing it to 770 ;-)

But most likely it is just a typo, a lot of new docs became available lately,
so they may contain some minor inaccuracies.

  The OMAP1710 page from Texas Instruments also claims 220 MHz is the
  maximum frequency:

 Check /proc/omap_clock on device, it says 252Mhz for both ARM and DSP core.

Hmm, interesting. Can anybody check /proc/omap_clock on N800 device?
I'm particularly curious about DSP clock frequency (as it can be actually
lower than on 770).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Cairo performance comparison, 770 / N800 / PXA-320

2007-01-13 Thread Siarhei Siamashka

On Saturday 13 January 2007 21:00, Kalle Vahlman wrote:

 We have all sorts of funny hardware at the office, so I thought I'd
 make a quick run of cairo-perf with the Cairo 1.3.10 snapshot and see
 how they relate to each other.

 There's some funny things I encountered in the results, and I hope
 people on both lists can offer insights on why.

 Details at

   http://syslog.movial.fi

 but let's just say that the results were predictable in general, with
 some surprises:

 N800 is naturally faster than 770, but I didn't expect the xlib
 backend to have so big differences between the two.

Maybe these devices were just running different linux kernels (task 
sheduler may be different) and xservers? So quite a lot of code could 
be different and these results can't be used to compare these cpus 
directly.

 For the cairo audience there's the question of the tessellation
 process, can it really be so fast on the PXA-320 or is there a bug
 somewhere that twists the results? What could be so good in PXA-320
 (or not-good on the other devices) that the results are so drastic?

What is the amount of cache on all these devices? If PXA-320 has 
more cache and all the necessary code/data for this test fit it but not on 
the competing device, that could explain the difference.

By the way, here you can take some code for benchmarking cpu clock frequency: 
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/testfreq.c?root=mplayerview=markup
It performs two test runs, the first run contains a loop with 10 add
instructions, the second run just contains the same loop but empty.
Substracting time of the second run from the time of the first run we get 
the time of executing these add instructions only. Number of such 
instructions executed per second can be used to measure cpu clock 
frequency. For getting best precision you may want to increase 
TESTS_COUNT define, it will result in a longer test time though.
This test program can show results a bit lower than the actual clock 
frequency (as we have a multitasking OS and other processes also 
take some time). But real cpu clock frequency can't be lower than the 
result benchmarked :) Even for superscalar cpus, these add 
instructions can't be run in parallel as each new instruction depends
on the result of the previous one (hmm, just thought that the last add 
instruction in a loop can be run in parallel with subs which decreases 
loop counter, maybe some additional tweak will be required).

Also Nokia 770 runs not at 220MHz as stated on your page, but at 
something closer to 250MHz as shown by this test code program 
(and confirmed to be actually 252MHz by somebody from Nokia 
on #maemo about half a year ago).

As for optimizing code for ARM (targeting Nokia 770), there are a few things
that are slow (maybe this list is still incomplete):
1. Floating point math is slow without vfp (cairo contains a lot of fp math)
2. Integer division is slow ('/' and '% operators) as ARM does not have
hardware instruction for it and much less efficient software implementation is
used.
3. write access to noncached memory is slow for read-allocate cache on arm926
core (data is not loaded into cache on write), see more details here:
http://maemo.org/pipermail/maemo-developers/2006-December/006579.html
I have some crude patch for valgrind (callgrind part) to simulate
read-allocate cache behaviour (instead of write-allocate as is simulated 
by default), it can show parts of code which have lots of cache misses. If
anybody is interested, I can try to clean it up and submit upstream:
http://ufo2000.xcomufo.com/maemo/vg-read-allocate-cache-patch.diff


I also had a quick look at cairo sources (without benchmarking it, just to
see general coding style). Some parts of code in it are not optimal. For
example this code chunk from cairo-path-stroke.c relies on integer division
(it is unlikely to cause severe performance decrease here, but may become 
a real problem for tight loops):
[cut]
for (i=start; i != stop; i = (i+1) % pen-num_vertices) {
tri[2] = f-point;
_translate_point (tri[2], pen-vertices[i].point);
_cairo_traps_tessellate_triangle (stroker-traps, tri);
tri[1] = tri[2];
}
[/cut]
If we go deeper into _cairo_traps_tessellate_triangle, we will notice the
following:
[cut]
memcpy (tsort, t, 3 * sizeof (cairo_point_t));
qsort (tsort, 3, sizeof (cairo_point_t), _compare_point_fixed_by_y);
[/cut]
There is unnecessary memcpy operation, also qsort is called for just three
elements! And such performance bottlenecks are quite easy to spot almost
everywhere. Most likely the code that is performance critical, is optimized a
lot better, but anyway at least this part deserved a comment such as 
/* I know that it is slow, but this code is not performance critical and I'm
too lazy to optimize it */ :-) 

Anyway, now I see no surprise that such huge improvements were possible
recently :-)

Also this code does

Re: [maemo-developers] Nokia's Linux-powered N800 Internet Tablet sneaks out early

2007-01-07 Thread Siarhei Siamashka

On Monday 08 January 2007 09:02, Komal Shah wrote:

 Ok. Let's fun begin.

 It looks like OMAP2420 only. As per the development track on
 linux.omap.com kernel mailing list the product may _NOT_ be using the
 IVA1.0 processor. Basically OMAP2420 from the multimedia point of view
 is a MPEG4 device. I hope you know that Nokia N93 also uses the
 OMAP2420 processor doing cool mpeg4 encode at VGA 30fps, but that's on
 IVA1.0 not on ARM. So, if N800 is not using IVA, then don't expect
 good multimedia experience like N93.

 BTW, I am slowly and steadily working on the IVA driver components,
 but we don't see whole lot of drivers on that right now.

Also the following line from dmesg log caugth my eye on #maemo irc channel:
[8044.496856] omapfb omapfb: s1d1374x: setting update mode to disabled

Does N800 also use Epson video chip just like N770? Could it be be some
upgraded version? Something like S1D13745 chip with hardware scaling support:
http://www.erd.epson.com/index.php?option=com_docmantask=cat_viewgid=38Itemid=40

OMAP2420 is supposed to have 2D/3D accelerator on chip. Is it disabled and
can't be used at all (using Epson chip instead for better compatibility with
existing software developed for N770)?

Also I wonder about DSP clock frequency for both N770 and N800. ARM core is
clocked at 250MHz in N770 and 330MHZ in N800. TI docs specify 220MHz ARM core
and 220MHz DSP core in OMAP1710, while OMAP2420 is specified to have 330MHz
for ARM core and 220MHz for DSP. Was Nokia 770 clocked at 250/250 or 250/220?
Is the clock frequency for N800 equal to standard 330/220?

As ARM core is supposed to have higher clock frequency than DSP and it
contains additional useful SIMD instructions, is it a better choice than DSP
for multimedia now (pretending that IVA does not exist)?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] maemo mplayer development and its possible future use on Nokia 770

2006-12-12 Thread Siarhei Siamashka

On Tuesday 12 December 2006 01:57, you wrote:

 My original goal of posting the previous message was an attempt to find a
 volunteer who would like to try developing such a frontend.

 I don't have that much time to devote to mplayer development myself. Up
 until this moment I even could not concentrate on solving some specific
 task but tried some bits with MP3 audio output, decoder improvements, GUI
 and user interface, fixing arm specific bugs, and now video output code
 with hardware YUV support. Also some kind of management work, integration
 of useful patches and support for users in the forums takes some time. I
 would like to concentrate on some task such as video decoder optimizations
 for ARM, but seeing that other parts are not in a quite good shape,
 distracts attention somewhat :-)

Well, after reading this part again today, looks like it sounds a bit
controversial, I'm sorry about it. Actually I'm not the only one who took part
in porting and improving mplayer for maemo. Ed Bartosh created deb 
packages and Josep Torra optimized mpeg decoder for Nokia 770. Also 
some patches were taken from AGAWA Koji's Zaurus port of mplayer. Not to
mention numerous upstream developers of mplayer and ffmpeg who developed 
this nice piece of software and were very helpful (especially Mans Rullgard
who developed initial version of armv5te optimized idct code). Also there were
many people who provided useful information and valuable comments.

Consider it just as my regret for not being able to contribute to mplayer
development as much as I should, but not a rant about 'nobody is helping' :-)
I think we still really need a proper credits tab in mplayer GUI.

But I expect that having more people contributing to maemo mplayer development
can provide some very nice results. So feel free to join.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] maemo mplayer development and its possible future use on Nokia 770

2006-12-11 Thread Siarhei Siamashka

On Monday 11 December 2006 11:26, Frantisek Dufka wrote:

 Yes, the result would look like video overlay works in windows or linux
 on PC - overlay draws over different windows when it shouldn't :-) We
 can live with that.

I thought it is actually not a problem but quite a good thing :-) Surely for
mplayer as a standalone video player, supporting keboard/initializing 
some window is important.

But if it just outputs video into some rectangular screen area (provided by
some other application) and is controlled via issuing commands through 
a pipe, it makes possible to develop some advanced frontends which use
mplayer as a video rendering engine. For example a twin of the standard 
Nokia 770 video player which simulates all its GUI controls could be created.

My original goal of posting the previous message was an attempt to find a
volunteer who would like to try developing such a frontend.

I don't have that much time to devote to mplayer development myself. Up until
this moment I even could not concentrate on solving some specific task but
tried some bits with MP3 audio output, decoder improvements, GUI and user 
interface, fixing arm specific bugs, and now video output code with hardware
YUV support. Also some kind of management work, integration of useful
patches and support for users in the forums takes some time. I would like to
concentrate on some task such as video decoder optimizations for ARM, but
seeing that other parts are not in a quite good shape, distracts attention
somewhat :-)

 As for framebuffer permissions, it may be better to 
 relax device permissions than to run mplayer as root.

The most right way to solve this issue is probably to add 'user' to 
'video' group. Alternative solutions involve messing with mplayer 
binary ownership and suid/sgid bits. I wonder what is possible to 
do automatically in the least intrusive way when installing mplayer 
package?

 Well, the conversion is done on the fly while the data is transferred to
 internal epson video buffer. I guess it would be hard to do planar YUV
 - RGB without additional memory. I still don't understand how it is
 done on the fly even in those packed formats since some color
 information (U,V) is common for more lines. Seems like tough task. There
 needs to be additional memory for remembering U,V parts from previous line.

YUV422 is a good format as it matches 16-bit RGB format quite well. Both of
them use 16 bits per pixel, and YUV422 encodes each pair of pixels into a
stride of 4 bytes (16-bit RGB encodes each pixel into 2 bytes, but you can
also treat it as 2 pixels in 4 bytes). So we can mix YUV422 and RGB data in
a framebuffer quite conveniently.

  Another interesting possibility is to relay video scaling and color
  conversion (planer - packed YUV) to DSP.

 I'm not sure, is there some math involved in this or it is just memory
 shuffling? I guess DSP would be really bad for memory shuffling. From
 previous discussions it looks like when you add DSP to the mix all kinds
 of bottlenecks appears. I wonder if gstreamer/dspfbsink could keep up
 with mplayer speed doing just conversion and video output.

Actually DSP may be a good choice for scaling, if you check the same
spru098.pdf you will find Pixel interpolation Hardware Extension part :-)
Also looks like dspfbsink uses DSP for scaling as it provides a mapped 
memory for planar YV12 data (or its variant) and accepts a command to 
do the rendering. I looked through xserver sources and gst plugins to dig 
for information and I think I got some impression about how they work, but 
I think this all deserves a separate post along with some additional
inquiries addressed to Nokia developers :-)

ARM can perform YV12-YUV422 conversion quite fast if properly optimized, I
even suspect that it can provide a throughoutput comparable to memcpy (as
memory controller/write buffer performance is a limiting factor here and some
data shuffling will not make much difference). The benchmarks in my previous
message use standard color conversion/scaling code from mplayer which is 
not optimized for ARM. But just color format conversion is a special case,
sometimes scaling is required and mplayer scaler is rather slow. Scaling
performed by mplayer was completely unusable for RGB target colorspace with
x11 driver, that's why maemo build of mplayer had fallback to SDL when
playback for scaled video was required. Now with the target colorspace YUV422,
it is slow but still usable and a bit better than SDL. If we want a fast
scaler for ARM, using JIT is a good option (and I have some experience in
developing JIT translator for x86).

Anyway, I hope that by using DSP for scaling and running it asynchronously, it
is possible to reduce ARM core usage to almost zero and keep all the resources
for video decoding. A related interesting observation is that screen update
ioctl does not seem to affect performance at all (commenting it out does not
improve performace and naturally we get no picture on the

[maemo-developers] maemo mplayer development and its possible future use on Nokia 770

2006-12-10 Thread Siarhei Siamashka

Hello All,

I have just uploaded a new build of mplayer (mplayer_1.0rc1-maemo.3) to
garage, which implements some experimental and not yet clean video output
method using hardware YUV colorspace and direct access to framebuffer. It's
not quite usable as framebuffer access is not allowed when running mplayer 
as ordinary user (framebuffer device is owned by root:video with 660
permissions). Also right now there are problems with keyboard input and other
applications may cause some flicker effect (for example clock applet or google
search applet overlap fullscreen video when thay are redrawn). Surely all
these problems can be fixed by implementing hybrid x11/framebuffer code 
where x11 is responsible for keyboard input and sets video mode so that no
other application draws over a screen area used by mplayer.

But having a plain framebuffer access creates an interesting possibility.
Looks like mplayer can coexist with other applicaitons nicely and if they
provide some rectangular area for video output, mplayer can be used from
them in slave mode (http://www.mplayerhq.hu/DOCS/tech/slave.txt). Mplayer
just needs to be extended to accept some command line option or slave option
to specify/change screen region that it will use for video playback. Also I 
noticed that there is some initiative to control video players using d-bus: 
http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2006-December/048067.html
So maybe someone can develop a more advanced gui frontend for mplayer with all
the eye candy and convenient gui controls. Let me know if you are interested
in this idea and need some help or more information from me.

Also thanks to Frantisek Dufka for Epson chip documentation, having full and
reliable information adds a certain level of confidence and encourages
development (I needed a confirmation that Epson chip supports only packed 
YUV formats and no planar formats are really available).

Next come current benchmark results (mplayer_1.0rc1-maemo.3) showing some
recent improvements, you can skip this part if you are not interested.

Tested with the following video file (its first 100 seconds):
VIDEO:  [DIV3]  640x480  24bpp  23.976 fps  779.1 kbps (95.1 kbyte/s)

Without hardware YUV colorspace support (-vo x11):
# mplayer -endpos 100 -benchmark -vo x11 -nosound -quiet
...
SwScaler: using unscaled yuv420p - bgr565 special converter
BENCHMARKs: VC: 122.215s VO:  90.458s A:   0.000s Sys:   1.769s =  214.442s
BENCHMARK%: VC: 56.9918% VO: 42.1831% A:  0.% Sys:  0.8250% = 100.%


Now using framebuffer output with YUV422 colorspace (-vo nokia770):
# mplayer -endpos 100 -benchmark -vo nokia770 -nosound -quiet testfile.avi
...
SwScaler: using unscaled yuv420p - yuyv422 special converter
BENCHMARKs: VC: 121.282s VO:  31.538s A:   0.000s Sys:   1.577s =  154.397s
BENCHMARK%: VC: 78.5517% VO: 20.4267% A:  0.% Sys:  1.0216% = 100.%

VC - is the raw time it took to decode this video fragment
VO - is the time it took to display it on screen (performing color conversion)
= - is a total time including decoding and displaying (preferably this number 
should stay below 100 seconds is we want to play this video in realtime)

So no playback for 640x480 videos yet, but we got a bit closer to it (and
lower bitrate/resolution videos should be supported better with less battery
power consumption).

Another interesting possibility is to relay video scaling and color conversion
(planer - packed YUV) to DSP. This method is used by dspfbsink gstreamer
plugin and another nice feature is that dsp tasks are accessibe for ordinary
user. But I'll post some more details and thoughts a bit later in another 
message.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] Optimized memory copying functions for Nokia 770 (final part)

2006-12-04 Thread Siarhei Siamashka

Hello All,

Here is an old link with some benchmarks and initial information:
http://maemo.org/pipermail/maemo-developers/2006-March/003269.html

Now for more completeness, memcpy equivalent is also available and 
the functions exist in two flavours (either gcc inline macros, or just
assembly code), all the sources are here:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/fastmem-arm9/?root=mplayer

The easiest way to try this code is just linking 'fastmem-arm9.S' with your
code, it will override glibc 'memcpy' and 'memset' functions with this
optimized implementation. But it will probably not affect code that is
contained in other shared libararies, for example SDL will still most likely
use functions from glibc. If you decide to try using gcc inline macros, it may 
be not safe, beware of compiler bugs, more details and testcases are here:
https://maemo.org/bugzilla/show_bug.cgi?id=733

Anyway, this code may be useful for various games, emulators or any 
software that may need to clear/initialize or copy large memory blocks 
fast. So those who are interested, may scavenge something useful there :)

At least adding a variation of this this code to allegro game programming
library for bitmaps blitting/clearing functions allowed to improve framerate
in ufo2000 quite a lot. Sure, that's because of nonoptimal full screen update
method which is not very fast and battery friendly anyway and should be
changed to screen updates only for the parts of screen that were changed. 
But sometimes you may have to update full screen anyway, for example 
when you have it filled with fire and smoke animation. So having fast
bitmaps blitting code and being able to just update full screen and have no 
problems with performance may be a good thing.

Technical explanation (at least my understanding of it) is the following.
Nokia 770 cpu has some small amount of write back cache, but it is not 
write allocate. That means if some memory block is already cached, write
operation is fast and data is stored immediately to cache. But if some 
memory block is not cached, it can get to cpu data cache only after read
operation, but not write (read allocate cache behaviour). If destination
buffer in not in cache, write to it will be performed directly to memory using 
write buffer. Transfers to memory are performed using blocks of 4, 16, or 32 
bytes and these blocks should be aligned. See '5.7 TCM write buffer'
and '6.2.2 Transfer size' from http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
So if you write to memory one byte at once, memory bandwidth is wasted (you 
get only one byte written per memory bus transfer operation, while you could 
easily get 4 bytes written instead). Here is the worst possible memcpy
implementation for example, if you benchmark it, you will get some
interesting numbers:

void memcpy_trivial(uint8_t *dst, uint8_t *src, int count)
{
while (--count = 0) *dst++ = *src++;
}

But the best performance is achieved when using 16 bytes transfers (aligned 
at 16 bytes boundary, otherwise it will be just split into some 4 byte
transfers). This can't be coded in C, and the use of assembly STM instruction
with 4 registers as operands is needed (or any number of registers that is
multiple of 4).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] WMA streaming

2006-11-28 Thread Siarhei Siamashka

On Thursday 23 November 2006 21:02, you wrote:

 On Monday 13 November 2006 22:08, Andrew Barr wrote:
  What are my options for Windows Media Audio streams on the 770? Most
  Internet radio streams (I mean simulcasts of real broadcasts in this
  sense) are offered in (at least) this format, which is supported by free
  software (ffmpeg) up to version 2, the latest version I've ever seen
  anywhere. However, no one seems to have had any luck with third-party
  codecs on this device, much less for streaming. Is this device powerful
  enough to handle decoding audio streams using C or ARM-assembly codecs
  (from ffmpeg)? It's likely that the bitstream would be 16 to 24 kbps. I
  understand free tools for the DSP aren't quite there yet, so that may not
  be an option.

 FFmpeg library contains WMA decoder and MPlayer supports it at least on
 x86 desktop PC. There are some problems with its support on ARM though.

 First and the most easy to fix is that WMA decoder seems to have some
 alignment problems, so it crashes on any attempt to play WMA files. This is
 quite easy to workaround by running 'echo 3  /proc/cpu/alignment' as root,
 see https://maemo.org/maemowiki/PortingFromX86ToARM and the links at it
 for more information.

 A major problem with WMA decoder is that it uses floating point math. And
 having no FPU in hardware, Nokia 770 does not seem to be able to decode
 128kbps WMA files to play them in realtime (sound is skippy). So it is not
 very useful unless somebody finds (or implements) a fixed point WMA
 decoder.

 However I also tested 20kbps WMA sample and it worked fine with CPU load
 at about 60%. So if anybody would like to have such low bitrate WMA
 files/streams supported, I can have a look at these alignment problems, fix
 them and release updated version of MPlayer for everyone to use.

Well, the latest maemo build of MPlayer now supports WMA audio (with that
extremely inefficient cpu usage and unability to play high bitrates). But it
can play internet radio streams, for example running the following works:

# mplayer mms://wm05.nm.cbc.ca/cbcr1-calgary-low

On the other hand, standard audio player seems to support WMA files playback
just fine :) Does it have problems with internet radio?  Or you just tried to
look for free alternatives? What was the reason for asking this WMA
streaming question in the first place?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Bluetooth headset?

2006-11-25 Thread Siarhei Siamashka

On Saturday 25 November 2006 19:05, George Farris wrote:

 On Sat, 2006-25-11 at 18:34 +0200, Stefan Kost wrote:
  Hi George,
 
  To make it a bit more clear: If there are plans, we would not be allowed
  to talk about it :(. So you'll have to wait. But be assured it if is
  technically doable its quite likly that we are going to support it. Now
  please let it rest.

 Interesting, this suggests you are somehow or other connected with
 Nokia, I wasn't aware of that.

Yes, I also noticed that only recently as there's Stefan's name in the
copyright statements from the now available sources of Nokia 770 
gstreamer plugins. Now the previous Stefan's posts look in a bit 
different light to me and that's a good thing.

I guess, we are just spoiled by the fact that many (almost all?) Nokia
developers here have '@nokia.com' in their e-mail address :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] WMA streaming

2006-11-23 Thread Siarhei Siamashka

On Monday 13 November 2006 22:08, Andrew Barr wrote:

 What are my options for Windows Media Audio streams on the 770? Most
 Internet radio streams (I mean simulcasts of real broadcasts in this sense)
 are offered in (at least) this format, which is supported by free software
 (ffmpeg) up to version 2, the latest version I've ever seen anywhere.
 However, no one seems to have had any luck with third-party codecs on this
 device, much less for streaming. Is this device powerful enough to handle
 decoding audio streams using C or ARM-assembly codecs (from ffmpeg)? It's
 likely that the bitstream would be 16 to 24 kbps. I understand free tools
 for the DSP aren't quite there yet, so that may not be an option.

FFmpeg library contains WMA decoder and MPlayer supports it at least on 
x86 desktop PC. There are some problems with its support on ARM though.

First and the most easy to fix is that WMA decoder seems to have some
alignment problems, so it crashes on any attempt to play WMA files. This is
quite easy to workaround by running 'echo 3  /proc/cpu/alignment' as root,
see https://maemo.org/maemowiki/PortingFromX86ToARM and the links at it 
for more information.

A major problem with WMA decoder is that it uses floating point math. And
having no FPU in hardware, Nokia 770 does not seem to be able to decode 
128kbps WMA files to play them in realtime (sound is skippy). So it is not
very useful unless somebody finds (or implements) a fixed point WMA decoder.

However I also tested 20kbps WMA sample and it worked fine with CPU load 
at about 60%. So if anybody would like to have such low bitrate WMA
files/streams supported, I can have a look at these alignment problems, fix 
them and release updated version of MPlayer for everyone to use.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Unresolved issues (Week 46)

2006-11-23 Thread Siarhei Siamashka

On Thursday 23 November 2006 22:19, Charles 'Buck' Krasic wrote:

 No, my dsp work was actually video related.I did reuse Siarhei
 Siamashka's mplayer code to decode/output mp3 directly, but that
 obviously doesn't help with speex.

Just a disclaimer before anybody starts bashing my ugly hack that allows to
use dspmp3sink for mp3 playback from mplayer :)

I know that it is improper use of gstreamer api for audio synchronization, but
anything that looked somewhat better (proper buffer timestamps, the use of
gstreamer pipeline clock instead of system time, ...) appeared to work even
worse in my tests. So the code that is currently used in mplayer is bad, but
the other options seemed to be even worse :(

Surely it was not the best experience to start getting familiar with gstreamer
and I'm surely not going to start looking for the one who is at fault here,
there is a high probability that I missed something important and it is me
after all ;) Anybody who can fix gstreamer based output module for mplayer 
is welcome to submit a patch.

The only excuse for keeping this bad code in maemo build of mplayer is that 
it provides some performance improvement and works quite acceptable
(audio/video sync is ok) most of the time.

Now as the sources of gstreamer plugins are available, they may provide 
some insights about how to use them better and what could be wrong.

 I'd suggest that the most practical approach for now would be to have
 an application that uses a speex dsp task to decode speex,  and then
 takes the output from that speex task and routes it to an existing
 gstreamer plugin for pcm output. This may be suboptimal, as the
 data will cross the dsp gateway boundary twice more than necessary,
 but it still might retain most of the benefit of offloading speex work
 to the dsp.I mean were talking something like 64KB/s of extra
 copying in the worst case (?), which I don't think will be a very
 significant cost even on the 770's OMAP processor.

 The marginal benefit of persuing a zero copy solution (direct from dsp
 to sound) just probably isn't work the effort.   Documentation for the
 software components of the 770 that use the dsp is virtually
 non-existent until now.   Aside from the mp3 decoder, I think all of
 the other stuff has been basically unavailable to developers outside
 of those working on Nokia's closed source multimedia applications.
 On the bright side, the gstreamer plugins for these various pieces has
 been made open source in maemo 2.1. I wouldn't hold my breath on
 the dsp side of these plugins ever becoming open source (although I
 would wholeheartedly welcome it!).

It would be very nice to have the sources (maybe some stripped down version)
of C55x stuff that is used by dsppcmsink as a template for implementing
thirdparty dsp based decoders. But maybe I'm asking for too much :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Maemo 2.0 device reboot

2006-10-18 Thread Siarhei Siamashka

On Wednesday 18 October 2006 16:26, Mike Frantzen wrote:

  On Tue, Oct 17, 2006 at 11:50:03AM +0200, Malix wrote:
   Hi, after upgrade to maemo 2.0 I have a problem. Some times my 770
   reboot. This happen some times when I'm using the browser and every
   time I try to use Gizmo. For now I never had problem with other
   programs. You think this is a software problem or hardware?
 
  My 770 reboots once every two--three days.  I think it's a software
  problem.  I could be wrong.

 I think it's a DSP problem when the microphone is enabled.  When writing
 some audio programs using GStreamer the DSP will eventually start
 sending back error codes, stop working, and eventually cause the whole
 n770 to reboot.  I had to do a lot of trial-and-error playing with how
 the GStreamer dsmpcmsrc pipelines are built and re-used to cut down the
 frequency of the problems.  Never could get it completely reliable
 though.  Once the DSP is wedged you have to reboot (if it hasn't
 spontaneously rebooted already).

I can confirm Nokia 770 GStreamer implementation problems. It is used in
MPlayer (in not a very clean way, but that's another story) in order to get
mp3 audio decoded by DSP and reduce load on ARM core. I also observed 
system instability supposedly caused by starting/stopping audio playback:
http://maemo.org/pipermail/maemo-developers/2006-August/005232.html

Also most likely GStreamer is responsible for MPlayer lockups when trying to
play relatively long flv files as observed here:
http://www.internettablettalk.com/forums/showpost.php?p=22855postcount=207

I know that IT2006 just switched to a new version of GStremer, and bugs can 
be expected. So I'm anticipating a bugfix release to check if some of the
problems got fixed
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Maemo 2.0 device reboot

2006-10-18 Thread Siarhei Siamashka

On Wednesday 18 October 2006 15:58, Amit Kucheria wrote:

 On Wed, 2006-10-18 at 13:00 +0300, ext Marius Gedminas wrote:
  On Tue, Oct 17, 2006 at 11:50:03AM +0200, Malix wrote:
   Hi, after upgrade to maemo 2.0 I have a problem. Some times my 770
   reboot. This happen some times when I'm using the browser and every
   time I try to use Gizmo. For now I never had problem with other
   programs. You think this is a software problem or hardware?
 
  My 770 reboots once every two--three days.  I think it's a software
  problem.  I could be wrong.

 As mentioned in the following link, please file bugs against the errant
 applications.
 http://maemo.org/maemowiki/ReportingRebootIssues?action=show

 I remember in your case it was FBReader - report it to them.

Well, user mode applications not running as root are not supposed to crash the
whole system  and cause it to reboot. Still this instability and reboots issue
seems like a problem with some core component of the system or even the
kernel. Maybe something like:
https://maemo.org/bugzilla/show_bug.cgi?id=677

Could anybody who still has maemo 1.x sdk installed and can recompile
memtester for i, check if the problem is reproducible with memtester in
IT2005?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] maemo-apps.org application catalog

2006-09-27 Thread Siarhei Siamashka


Hello All,

What do you think about current wiki based application catalog? In my
opinion, it is quite a large page already, slow to open and hard to
navigate. As I see it now, maemo-apps.org looks like it may become an
interesting resource for maemo community in the future. But it lacks
content, badly. Most of the applications submitted there were added by
'Frank' who is the admin there. Yesterday I edited
ApplicationCatalog2006 wiki page and added a notice with a suggestion
for application developers to submit their applications to maemo-apps
catalog as well, but this addition was reverted. I'm not suggesting to
drop current application catalog, it would be just stupid. But trying
some alternatives at the same time seems like a good idea to me.

Of course maemo-apps.org is a third party maintained resource with no
sources available. I would prefer something more open if it was
available (or at least maintained by Nokia).

Thoughts?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] RE: defective memory?

2006-09-19 Thread Siarhei Siamashka


On 9/19/06, Kimmo Hämäläinen [EMAIL PROTECTED] wrote:


Yes, it would need to be reproducible in several different devices. The
guy here that tried to reproduce it currently thinks that Siarhei's unit
is broken.


Yes, I also think that the probability of my device being broken is
quite high. A certain (small) fraction of other Nokia 770 owners are
probably having the same problem. Does it make the device completely
useless? Of course no, my device works almost fine, it only crashes
and reboots sometimes, I also has filesystem corruption several times
(now even switched mmc filesystem to ext3, don't know if it would help
much though). So the device can be surely used as a book reader,
internet browser and serve other tasks. Other (small) fraction of
users who got 'white screen of death' were surely less lucky.

What can be done about this if the defective memory problem gets
confirmed. I see three possible ways:
1. 'Ignorance is a bliss' - just do nothing, those who don't know
about the problem will not worry about it :) The device will just
crash or reboot occasionally, some more unlucky users having more
annoying crashes will complain in the forums providing some bad PR.
2. Distribute some diagnostics software that will help to identify
memory problems and repair/replace defective units, that will have
some expences, but will improve overall reliability and reduce the
number of negative publicity.
3. Add some (un)official support for working around bad memory regions
using technology something similar to BadRAM, in this case most of
such units will be completely usable.

In general, bad memory problem is quite common for x86 pc's, but there
is an excellent tool for memory diagnostics - memtest86. It helped me
quite a number of times, also I always advice everyone having
stability issues to run it first. I don't know how the reliability of
memory chips used in embedded devices compares to the reliability of
memory from normal desktop computers, but bad memory seems to be one
of the most frequently encountered hardware problems.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] RE: defective memory?

2006-09-19 Thread Siarhei Siamashka

On Wednesday 20 September 2006 01:12, Olivier ROLAND wrote:

 If your device is broken then mine is also.
 I don't think at all that we speak about (small) fraction because
 majority of users won't even notice the problem.
 My device seem stable until I stressed it. And stressed it is not a
 condition suffisante to make the problem happen.

That's exactly the point. The device is quite usable and most users will not
detect any difference on most common operations. It is a very good sign as 
looks like in order to get rock solid stability, we only need to allocate and
lock the problematic memory page early at boot time and do not let any 
applications use it.

 When I have time, I will make extensive test on my device to check
 exactly when the problem occur.

Please do it, now with the lastest version of the tester and 40MB tested
block, the coverage is almost 2/3 of physical memory. If that's a certain
location in memory, the chances that it can be easily detected are quite 
high. Please verify that the offset of faulty address within 1KB page is
reported to be always the same between different runs (it is equal to 1a5 
for me).

I'm trying to find a way to get a full physical address of that page. In my
last tests I managed to mmap '/dev/mem' (just using 'read' function
segfaults), but did not have enough time to experiment with it much yet.

 My doubt about small fraction are probably driven by the fact that I
 was hit by 'white screen of death' 4 weeks after buying the device.
 So I guess that during the reparation my 770 was checked (again) by the
 conventional Nokia diagnostic.
 I conclude that the conventional Nokia diagnostic doesn't detect the
 problem.

 To make things clear, I don't want to make negative publicity at all. I
 enjoy this device a lot and I've ported Streamtuner on it with lot of
 great feedback from users.

 My 2 cents.

I don't want to make negative publicity either.  My only goal now is to find
some reliable technical solution for both diagnostics and workaround of such
problems. After all, I have a good motivation for that :)

I'm grateful to Nokia as they are also trying to investigate the problem. I'm 
quite confident that we can come up with some solution, and it will have 
some positive effect for Nokia 770 community as a result. This is a new
device, software and tools for it are still being developed. We are all
learning and getting more experience.

 PS: I don't know what is the conventional Nokia diagnostic but as far
 as I know there is always a conventional XXX diagnostic in reparation
 centers.

By the way, when looking for additional information I found some Sharp
Zaurus community forum and asked what they use for hardware diagnostics 
in the hope that I could use the same tools. Somebody replied me that 
hardware diagnostics tools are built in Zaurus firmware and are accessible
from boot menu.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-18 Thread Siarhei Siamashka

On Monday 11 September 2006 00:34, Olivier ROLAND wrote:

  After playing with the device for some time, I got the same problem with
  lzma program this evening. And memtester also confirms that the memory is
  really defective :(
 
  # ./memtester 20
  memtester version 4.0.5 (32-bit)
  Copyright (C) 2005 Charles Cazabon.
  Licensed under the GNU General Public License version 2 (only).
 
  pagesize is 4096
  pagesizemask is 0xf000
  want 20MB (20971520 bytes)
  got  20MB (20971520 bytes), trying mlock ...locked.
  Loop 1:
Stuck Address   : testing   0FAILURE: possible bad address line at
  offset 0x0037e9a5.
  Skipping to next test...
Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset
  0x000fe9a4.
  FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4.
Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset
  0x000fe9a4.
Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset
  0x000fe9a4.
  ...

[...]

 Hum ... very interesting memtester give non reproductible result on my
 device.
 and now lzma test failed also ...
 Battery is low. We definitively need to investigate this a little more.
 The good news is that your device is probably not broken. (or mine is
 also ;-) )
 All this should definitively interest Nokia people ...

Well, for the last days I tested memory occasionally and observed problem also
with a fully recharged battery at least once :(

An interesting observation is that you need to gradually increase the size of
tested memory block. You need to start with testing 20MB first, then you can
try 25MB and so on up to 43MB. If you try to allocate and test a large block
of  memory too early, memtester will just get killed.

As for the failures, only the last two hex digits of faulty address always
contain 'a5' and it is a bit strange. I expected that offset within a page
would remain the same (I changed malloc to mmap in order to always allocate
memory buffer at a page boundary ) and unless pages have size equal to 256
bytes, it is inconsistent.

I also wanted to detect physical address of a faulty memory region. I tried to
open '/dev/mem', read it one page at a time and compare its content with the
data from a faulty page. Unfortunately this does not work on Nokia 770 and
segfaults on reading from '/dev/mem'. The same code works fine on
desktop x86 pc and has no problems identifying physical address for any 
page. Test programs were always run as root.

Any other ideas?
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-18 Thread Siarhei Siamashka

On Tuesday 19 September 2006 00:03, you wrote:

[...]

 An interesting observation is that you need to gradually increase the size
 of tested memory block. You need to start with testing 20MB first, then you
 can try 25MB and so on up to 43MB. If you try to allocate and test a large
 block of  memory too early, memtester will just get killed.

 As for the failures, only the last two hex digits of faulty address always
 contain 'a5' and it is a bit strange. I expected that offset within a page
 would remain the same (I changed malloc to mmap in order to always allocate
 memory buffer at a page boundary ) and unless pages have size equal to 256
 bytes, it is inconsistent.

A small update. As I checked manual [1], a minimal page size for arm926ej-s
cpu is in fact 1KB (tiny page). So inconsistency is now resolved.

I have patched memtester to gradually allocate memory starting from 20MB
to the size specified in a command line, so it is possible to check larger
blocks without any extra tricks, you can download this modified memtester
here: http://ufo2000.xcomufo.com/files/memtester-n770.tar.gz

If you are going to try it (and it may be a really good idea), it should be
run as root. The first argument is the size of memory block to be tested (in
megabytes), the second optional argument is the number of passes.

Here is a result of running it on my device:

Nokia770-26:/media/mmc1# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got  40MB (41943040 bytes), virtual address=0x40128000, trying 
mlock ...locked.
Loop 1/1:
  Stuck Address   : testing   0FAILURE: possible bad address line at 
offset 0x009899a5 (page offset 1a5).
Skipping to next test...
  Random Value: FAILURE: 0x3f770c1e != 0x3f77 at offset 0x004899a5 
(page offset 1a5).
FAILURE: 0xc50dee8d != 0xc50d at offset 0x004899a5 (page offset 1a5).
  Compare XOR : FAILURE: 0x0e119ff2 != 0x0e10 at offset 0x004899a5 
(page offset 1a5).
  Compare SUB : FAILURE: 0x7d558974 != 0x5ca0 at offset 0x004899a5 
(page offset 1a5).
  Compare MUL :   Compare DIV : ok
FAILURE: 0x7febf0e8 != 0x7feb at offset 0x004899a5 (page offset 1a5).
  Compare OR  : FAILURE: 0x7b69b068 != 0x7b69 at offset 0x004899a5 
(page offset 1a5).
  Compare AND :   Sequential Increment: ok
  Solid Bits  : testing   1FAILURE: 0x != 0x at offset 
0x004899a5 (page offset 1a5).
  Block Sequential: testing   1FAILURE: 0x01010101 != 0x0101 at offset 
0x004899a5 (page offset 1a5).
  Checkerboard: testing   0FAILURE: 0x != 0x at offset 
0x004899a5 (page offset 1a5).
  Bit Spread  : testing   0FAILURE: 0xfffa != 0x at offset 
0x004899a5 (page offset 1a5).
  Bit Flip: testing   0FAILURE: 0x0001 != 0x at offset 
0x004899a5 (page offset 1a5).
  Walking Ones: testing   0FAILURE: 0xfffe != 0x at offset 
0x004899a5 (page offset 1a5).
  Walking Zeroes  : testing   0FAILURE: 0x0001 != 0x at offset 
0x004899a5 (page offset 1a5).

So faulty address is always reported to have offset 1a5 within a page on 
every run. Now the next thing to do is to identify physical address for use
with BadRAM kernel patch.

 I also wanted to detect physical address of a faulty memory region. I tried
 to open '/dev/mem', read it one page at a time and compare its content with
 the data from a faulty page. Unfortunately this does not work on Nokia 770
 and segfaults on reading from '/dev/mem'. The same code works fine on
 desktop x86 pc and has no problems identifying physical address for any
 page. Test programs were always run as root.

I would really like to hear something from Nokia regarding this problem. There
may be a few other devices with faulty memory considering some browser crash
reports, reboots and instability for some people, a possible example can be
seen here (though the reporter did not run the memory test as adviced): 
https://garage.maemo.org/tracker/index.php?func=detailaid=84group_id=54atid=269

That's not a tragedy and software solution can probably resolve this problem. 
As you know, bad blocks are common for flash and jffs2 file system handles
this issue. RAM can be probably treated in a similar way by using something
like BadRAM kernel patch [2]

[1] http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
[2] http://rick.vanrein.org/linux/badram/
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] CPU Info

2006-09-11 Thread Siarhei Siamashka

On Monday 11 September 2006 23:51, [EMAIL PROTECTED] wrote:

 When executing: cat /proc/cpuinfo I see the following:
 Processor   : ARM926EJ-Sid(wb) rev 3 (v5l)
 BogoMIPS: 125.76
 Features: swp half thumb fastmult edsp java
 CPU implementer : 0x41
 CPU architecture: 5TEJ
 CPU variant : 0x0
 CPU part: 0x926
 CPU revision: 3
 Cache type  : write-back
 Cache clean : cp15 c7 ops
 Cache lockdown  : format C
 Cache format: Harvard
 I size  : 32768
 I assoc : 4
 I line length   : 32
 I sets  : 256
 D size  : 16384
 D assoc : 4
 D line length   : 32
 D sets  : 128

 Can anybody explain what the java feature in the Features mean?

Probably the explanation is here (on the first page):
http://www.arm.com/pdfs/DVI0035B_926_PO.pdf

There are lots of other interesting docs at http://www.arm.com by the way.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-10 Thread Siarhei Siamashka

On Sunday 10 September 2006 11:36, Olivier ROLAND wrote:

 Your test work fine on my device.
 I see that you run it from /media/mmc1so I guess you format your memory
 card with ext2.
 Mine still vfat so I can't. If you got same error when running from
 internal memory then your device is broken.

Thanks a lot for finding time and running the test.

Today in the morning I could not reproduce this bug. The device battery just
was recharged during night. As nothing else was changed (I checked uptime to
be sure that it did not reboot or something), I see three possible
explanations (may be wrong, I'm not hardware expert):
* page with the faulty memory bit was allocated to some other process
* cpu or memory chip was just overheated because of heavy use and the
bug disappeared as the temperature got back to normal
* maybe the bug is somewhat related to low battery charge level, maybe the
battery was unable to provide enough voltage or something for reliable
operation

I did some search and found this utility for testing memory on non-x86
hardware: http://pyropus.ca/software/memtester/
For those who are lazy to compile it, the binary is here:
http://ufo2000.xcomufo.com/files/memtester.gz

After playing with the device for some time, I got the same problem with lzma
program this evening. And memtester also confirms that the memory 
is really defective :(

# ./memtester 20
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 20MB (20971520 bytes)
got  20MB (20971520 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address   : testing   0FAILURE: possible bad address line at 
offset 0x0037e9a5.
Skipping to next test...
  Random Value: FAILURE: 0xdeb98374 != 0xdeb9 at offset 
0x000fe9a4.
FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4.
  Compare XOR : FAILURE: 0x50467c54 != 0x5046 at offset 
0x000fe9a4.
  Compare SUB : FAILURE: 0xb069e1c0 != 0xdc20 at offset 
0x000fe9a4.
...

By the way, I have seen some reports about random device reboots, maybe 
these people also suffer from defective memory problem. So maybe it is a 
good idea for everyone to test their memory. Though use it at your own risk, I
can't be sure that this test program is working correctly and always provides
valid results (I only found it today).

Well, as now the problem is identified, it is time to think how to solve it.

The first task is making a proper memory testing utility. As memtester needs
to allocate memory for testing and lots of memory is already taken by IT OS 
software and libraries, we can only test a small part of memory (only ~1/3 in
the test above). Maybe it is possible to patch kernel (or it already provides
such functionality) to allocate any physical memory page for us (relocating
its data to some other place if it is already occupied by some other process).
If it is possible, we would be able to check all the physical memory except
for probably the part occupied by the kernel itself.

The next task would be to make some way to use BadRAM kernel patch on 
Nokia 770: http://rick.vanrein.org/linux/badram/
Preferably physical addresses of the defective parts of memory should be
stored somewhere so that they survive reflashing (rd mode and other flags 
are stored in such a way, right?). If BadRAM patch becomes a part of 
standard Nokia 770 kernel, it can help to make use of the memory chips that
otherwise would have to be replaced. I wonder how much does Nokia 770 
memory chip cost?

By the way, maybe Nokia already has some utility for hardware diagnistics 
and it could become available for download? There would be no need to 
reinvent the wheel in this case.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] defective memory? (was: problem with dspmp3sink)

2006-09-09 Thread Siarhei Siamashka

Hello All,

I'm sorry for a long chunk of quoted text at the end of this message 
(it describes the sympthoms of the problem), but looks like I got an 
almost reliable proof that there is something wrong with the hardware 
of my device :(

I tried to find some software that could be used for benchmarking 
and LZMA SDK (http://www.7-zip.org/sdk.html) looked like an interesting 
option for doing it. But when run on Nokia 770, it sometimes works 
normally and sometimes fails with the following error message:

/media/mmc1 $ time ./lzma b -d19

LZMA 4.43 Copyright (c) 1999-2006 Igor Pavlov  2006-06-04

   CompressingDecompressing


Error: CRC Error
Command exited with non-zero status 1
real1m 37.14s
user1m 36.10s
sys 0m 0.74s


As you see, it failed internal test and was unable to decompress data back
correctly. LZMA is an advanced compression algorithm and uses quite a lot of
memory (it shows ~20MB memory usage in top with '-d19' option). If any of the
bits within this memory block has problems, it can affect data integrity and
cause incorrect compression or decompression. So probably lzma can be also
used as some kind of memory checker.

But it may be also some problem in LZMA code and not in my Nokia 770 hardware,
so I would like to ask somebody to run the same test and check if  the same
problem can be reproduced. You can download the sources of LZMA SDK using this
link: http://prdownloads.sourceforge.net/sevenzip/lzma443.tar.bz2?download
Decompress this archive, change to 'C/7zip/Compress/LZMA_Alone' directory and
run 'make -f makefile.gcc' to compile it. Alternatively you can use my
compiled binary: http://ufo2000.xcomufo.com/files/lzma.gz

Considering that this test program works fine in scratchbox with qemu, LZMA
SDK page mentions performance on ARM and the existence of LZMA debian 
package for  ARM, I think that software bug theory is not very relevant, but
it still needs to be confirmed.

I will wait for feedback in order to confirm if the problem really exists in
my hardware. But looks like it is a high probability that I will have to make
some kind of more advanced memory checker, try to identify faulty memory
physical address and experiment with badram kernel patch.

On Wednesday 23 August 2006 23:09, you wrote:

   Also I noticed that gstreamer is not very reliable, at least when using
   it from mplayer. It can freeze or reboot the device sometimes. That's
   not something that should be expected from high level API. If I detect
   some reliable pattern in reproducing these bugs, I'll report it to
   bugzilla for sure. But right now just using mplayer and lots of seeking
   in video can cause these bugs reasonably fast.

...

 Earlier I noticed problems with sound output getting blocked that could be
 fixed by bult-in audio or video player. When trying to play anything it
 first shows error message. After the second attempt either the sound got
 fixed or the device rebooted. I suspected that something could get wrong
 with dsp and standard  audio player is able to reset it. That was observed
 when using fdsrc element for feeding data to the decoder in mplayer. On
 stopping/resuming playback, probably partial audio frames could be feeded
 to mp3 decoder and that might result in its misbehaviour.

...

 Now only complete mp3 audio frames can be sent to dspmp3sink. Anyway, first
 everything was ok and I even suspected that I will not encounter any
 problems at all. But after a few hours I got several reboots. After the
 last reboot even wifi started working strange (could not connect using ssh,
 it just showed various errors). Turning the device off, waiting for a few
 minutes and turning it on again got everything back to normal. Now I
 suspect that it could probably be overheating or some other hardware
 problem (the device worked with wifi on and heavy cpu usage because of
 decoding video for a long time). I'll keep an eye on it and will report
 again if the problems keep showing up and if their source becomes more
 clear.

...

 I tried swap a long time ago on IT2005, that was done in order to make gcc
 work on Nokia 770 to try compiling something before I installed
 scratchbox :) Anyway, I did not like the stability as gcc started to fail
 with internal compiler errors. So I decided not to use swap as long as it is
 enough memory for what I need. 

 Also there was some swap related report about the problem with mplayer:
 http://www.internettablettalk.com/forums/showpost.php?p=20068postcount=96

 But maybe I should give swap another try on IT2006 and see if it helps to
 improve stability.

 By the way, I already asked this question in the mailing list long time
 ago, but are there any tools for hardware diagnostics on Nokia 770?
 Something like memtest86 could probably be very useful.

 Though availablility of hardware diagnostics tools could probably result in
 more devices getting returned for replacement with otherwise undetected
 problems and have negative

Re: [maemo-developers] automatic byte order check

2006-08-23 Thread Siarhei Siamashka

On Monday 21 August 2006 10:45, Detlef Schmicker wrote:

 I had a look at the vncviewer and saw, that it is working in the sandbox
 in connection with vino (gnome vnc server). On the device the CoRRE
 encoding does not work.

 Probably it is a byte order problem. The code has a lot of byte order
 (e.g. GUINT16_TO_BE). Is there a way to automaticaly warn critical
 points at compilation? Are there any tools?

I'm not completely sure if understood your post correctly, but cpu used in 
Nokia 770 is little endian (the same as x86). So it is unlikely to have
byte order or endian problems here.

But ARM is alignment sensitive, so you may have problems because of bad
alighment, I started making a page on wiki describing this issue (still very
incomplete): http://maemo.org/maemowiki/PortingFromX86ToARM

I also tried to search for tools that could identify alignment problems
automatically, but did not find anything useful. Probably the most easy 
way to make such tool is to modify valgrind to track alignment for each
memory access operation. But don't know, I ended up finding and fixing 
such problems in my code manually without the help of any tools  :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] problem with dspmp3sink (was: problem with gstreamer and dsppcm)

2006-08-23 Thread Siarhei Siamashka

On Monday 21 August 2006 18:34, Charles 'Buck' Krasic wrote:

 Just in case you have not done it already, enabling swap in your device
 can help a lot to prevent out-of-memory errors.Maybe this will help
 with mplayer/gstreamer stability.

 I personally suspect  a design flaw in the current Linux VM subsystem.
 I've observed that if an application allocates memory rapidly,  the
 kernel may fail to reclaim pages quickly enough from the page and buffer
 caches (they are only caches after all), so it actually denies the
 allocation request. For example, with zero swap, on a machine with
 1G of ram, and 500M of it pseudo-free (used by caches), I've seen
 moderate allocations fail--like when starting an application like
 firefox.Enabling even a small amount of swap seems to dramatically
 change this behaviour.

Thanks for the information, this is interesting. I tried swap a long time ago
on IT2005, that was done in order to make gcc work on Nokia 770 to try
compiling something before I installed scratchbox :) Anyway, I did not like
the stability as gcc started to fail with internal compiler errors. So I
decided not to use swap as long as it is enough memory for what I need.

Also there was some swap related report about the problem with mplayer:
http://www.internettablettalk.com/forums/showpost.php?p=20068postcount=96

But maybe I should give swap another try on IT2006 and see if it helps to
improve stability.

By the way, I already asked this question in the mailing list long time ago,
but are there any tools for hardware diagnostics on Nokia 770? Something 
like memtest86 could probably be very useful.

Though availablility of hardware diagnostics tools could probably result in
more devices getting returned for replacement with otherwise undetected
problems and have negative impact on Nokia profit (just joking).
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] dspmp3sink and gst_element_query_position() function

2006-08-04 Thread Siarhei Siamashka

Hello All,

I'm trying to get current playing position from dspmp3sink, but
gst_element_query position() function fails. The same code works 
fine for dsppcmsink.

Did anybody encounter such problem? What solution can be used? 
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] libxv status?

2006-07-26 Thread Siarhei Siamashka

On Tuesday 25 July 2006 09:01, you wrote:

  http://www.internettablettalk.com/forums/showthread.php?t=2405
 
  Actually mplayer works surprisingly fast and has performance not much
  inferior to default video player on Nokia 770 that is using DSP. And
  that all is even without hardware colorspace conversion support!
 
  So is it possible to have an accelerated version of libxv on Nokia 770
  that would support colorspace conversion and scaling (no matter whether
  using video controller capabilities or relay this task to DSP)? So that
  a more universal and well supported ARM core could deal only with video
  stream decoding.

 We don't have any plans to do this for the 770.

Thanks for your reply.

In order not to take much of your time, just a few more questions which
require only yes/no/maybe answers :)

Does Nokia 770 hardware really support YUV colorspace?

Is it technically possible (I'm not asking whether it is planned by Nokia now)
to have some simple API for YUV colorspaces support added probably as part of
libxsp?

Is Nokia interested in getting any assistance from the community (from me
for example) in improving performance and capabilities of the software
and libraries preinstalled on the device?

I'm interested in good video support and also game development (official
support for twice lower resolution in SDL using pixel doubling, support for
portrait/landscape screen orientation modes, background music playback
utilizing DSP core, virtual keyboard for X11 applications and SDL in
particular, etc.). So I don't mind contributing some of my free time to do
some work in order to get all this real.

Thanks.
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] SQLITE development in IT2006

2006-07-14 Thread Siarhei Siamashka

On Friday 14 July 2006 02:41, Steven Hill wrote:

 Can anyone tell me either:
 1. Where I can find an ARMEL version of libsqlite3-dev to use when
 developing for the Nokia 770 IT2006 version, OR
 2. Where I can find a port of the libsqlite0 package to the ARMEL IT2006
 version.

 Failing that, how do I install the libsqlite0 package to a Nokia 770
 device running IT2006? I know that the package is available at the
 repository, but can't figure out how to get at it using the package
 installer!!!

Until a solid repository with all widely used thirdparty libraries is set up,
I think the most easy way of using sqlite is just installing it from sources
in SDK and linking statically with application. You can always add
libsqlite3-dev dependency in the future any time you want :)
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] libxv status?

2006-07-11 Thread Siarhei Siamashka


Hello All,

When trying to compile mplayer, I have noticed that there is libxv
packaged in maemo sdk and also it can also be installed using apt-get on
the real device. It is useless at least for mplayer though:
  It seems there is no Xvideo support for your video card available.
  Run 'xvinfo' to verify its Xv support and read
  DOCS/HTML/en/video.html#xv!
  See 'mplayer -vo help' for other (non-xv) video out drivers.
  Try -vo x11

http://www.internettablettalk.com/forums/showthread.php?t=2405

Actually mplayer works surprisingly fast and has performance not much
inferior to default video player on Nokia 770 that is using DSP. And
that all is even without hardware colorspace conversion support!

So is it possible to have an accelerated version of libxv on Nokia 770
that would support colorspace conversion and scaling (no matter whether
using video controller capabilities or relay this task to DSP)? So that
a more universal and well supported ARM core could deal only with video
stream decoding.

Having a more capable video player with *.ogm, *.mkv formats support,
subtitles, wider variety of supported codec and other nice features
would be nice to have. ARM core may have higher power consumption, but
it seems to be fast enough for decoding video. Some people may want to
trade battery life for better video formats support and quality.

Also it is interesting to know about the future plans of Nokia team.
What can we expect to see improved in the near future (maybe after some
nice summer vacations you all deserved ;))? Also what help can the
community provide?

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Nokia 770 CPU speed

2006-06-30 Thread Siarhei Siamashka

On Friday 30 June 2006 09:24, Raju, Vanitha G wrote:

 Is there a utility to read the processor speed? Or if you have the
 register details which can be accessed to determine the speed - that
 would be of help.

Maybe this can be of any use:
http://maemo.org/pipermail/maemo-developers/2006-April/003610.html
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Game implementation on Hildon or SDL? And what about my project

2006-05-05 Thread Siarhei Siamashka


Daniel Monteiro wrote:


SDL would have a faster hardware access. On the other
hand , Maemo wold be better integrated with the
system, would not need the GameLauncher, etc.


I'm sorry to interfere, but is GameLauncher really needed?
SDL applications seem to run fine enough or I'm missing something?

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] New Nokia 770 software image available

2006-04-20 Thread Siarhei Siamashka


Disconnect wrote:

Will changelogs be provided later? (Not to threadjack this into the other 
conversation, but this is a great example of where nokia can better their 
interaction with the community in a fairly easy way. Even if some parts have 
to be left out or summarized as Browser stability, most of the changelogs 
should be safe enough.)


http://maemo.org/pipermail/maemo-developers/2006-April/003639.html

Might be a fix for this problem?

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

[maemo-developers] Nokia 770 CPU clock frequency question

2006-04-06 Thread Siarhei Siamashka


Hello all,

We have a discussion about Nokia 770 specs in some Russian speaking
forum. We figured out that there is different information about
Nokia 770 cpu clock frequency floating around.

Some sources report 220MHz:
http://maemo.org/maemowiki/Nokia_770_Hardware_Specification
The others report 250MHz:
http://en.wikipedia.org/wiki/Nokia_770

I have written a small test program that tries to measure cpu clock
frequency and it clearly shows 250MHz which confirms wikidepia
information: http://ufo2000.sourceforge.net/files/testfreq.tar.gz

# ./testfreq
CPU clock: 251.26MHz
Time spent for testing: 35.82 seconds

But there is some person who claims that he is providing Nokia 770
repairy and that he has an official service manual from Nokia. And
according to his information, Nokia 770 has 220MHz cpu and no other
frequency options are possible. So he claims that it is he who has
the precise information and I am just trying to deceive people :(

Well, I believe my own eyes, and if it performs as 250MHz cpu, it can't
have clock frequency 220MHz. Could anybody from Nokia clarify this
information?

Also as cpu frequency does not seem to be well defined which results in
contradictory information, might it be an indication that the next
hardware revisions of Nokia 770 might have a clock frequency 220MHz
(standard for OMAP1710) in order to reduce device cost for example 
(after software update improves performance to the sufficient level)?



___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Abuse on the ApplicationCatalog page

2006-04-05 Thread Siarhei Siamashka


Weinehall David (Nokia-M/Tampere) wrote:


On ons, 2006-04-05 at 01:15 -0700, ext Aaron Levinson wrote:

On Wed, 5 Apr 2006, Weinehall David (Nokia-M/Tampere) wrote:


[snip]
Also, whether or not this application belongs on the Application Catalog 
page, it's still appropriate to consult with the author before deleting an 
entry (something that is difficult to do if anyone can modify the Wiki 
without registering).  Doing otherwise, as the individual has done who has 
repeatedly deleted this entry, demonstrates a lack of common courtesy.


I can agree that we need some sort of rules about what stays and what
goes, and only non-anonymous access, but needing to contact the author
before removing something defeats the whole purpose of a wiki.


Maybe wiki is just not very suitable for keeping the list of
available applications? As this page grows, it gets harder to maintain.
Just imagine a full list of sourceforge projects on a single wiki page
for example :) It is an exaggeration, but I still hope that the list of
maemo compatible applications will eventually grow to the number that
this wiki will be unable to handle.


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Optimized memory copying functions for Nokia770

2006-03-24 Thread Siarhei Siamashka


Simon Pickering wrote:


I don't know what version flash image that c760 was running, but my c750
produces better results. I compiled your test program in two versions - arm4
and arm5 using the default toolchains built by bitbake + OpenEmbedded for
the Zaurus sl-5500 and c7x0 machines respectively.


I'll ask for this information.


Note that OpenZaurus uses significantly more up-to-date versions of
libraries, etc. than the standard Sharp images, so this probably accounts
for the difference in speed that you see below. If you want more info on the
actual lib versions and patches then let me know.

I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both
running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The
arm5 binary ran fine on the arm4 arch sl-5500, obviously there were no arm5
instructions included, but the times are slightly different - I don't know
whether this is to do with background processes or something to do with the
compiler (though I don't suppose the compiler really has much to do in this
case.)

Results as follows:

[results skipped]



Compiling for arm4 or arm5 should not make any difference as all the
code that is benchmarked is not generated by compiler anyway (standard
functions are in standard libraries, tested functions are implemented as
inline assembler). Slight times differences should be ignored and are
just random deviations (+-1MB/s does not change overall picture much).

From these results looks like that:

Optimized memset works good on all tested platforms, providing the
same or much better results. Memset performance is critical for clearing
bitmaps to some color and drawing rectangles, so optimizing it makes
sense.

The same code for memcpy works good for Nokia 770 and StrongARM, but
XScale needs different optimizations. Seems like reading memory is
important here, maybe prefetch (PLD instruction) could improve
performance. I tried using prefetch when writing optimized memcpy for
Nokia 770, but it did not have any effect at all. But prefetch requires
armv5, so it affects portability.

An interesting observation is standard memset vs. memcpy performance on
StrongARM. In spite of doing more work, memcpy is even faster :)

It would be interesting to make some tests on PXA270 too.

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Optimized memory copying functions for Nokia 770

2006-03-17 Thread Siarhei Siamashka


Siarhei Siamashka wrote:

 ...

It is strange that such 16-byte alignment trick was neither used in
uclibc nor in glibc until now. One more option is that this improvement
is only Nokia 770 specific and nobody else ever encountered it or had to
use. Well, do we really care anyway? ;)

Now I just really badly want to see the benchmark results from some
other cpu, preferably intel xscale :)


Just got report from running my test on Sharp Zaurus SL-C760:

--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 80.35MB/s
memset8() memory bandwidth: 83.55MB/s
memcpy() memory bandwidth (perfectly aligned): 45.29MB/s
memcpy16() memory bandwidth (perfectly aligned): 45.20MB/s
memcpy() memory bandwidth (16-bit aligned): 43.15MB/s
memcpy16() memory bandwidth (16-bit aligned): 38.27MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.960
memset8 time: 0.880
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 3.840
memset8 time: 3.670

So memory copying functions on Zaurus are already optimal for this
Zaurus and my implementation only causes performance degradation :)

There are two possibilities now:
1. This particular Zaurus has a much better memcpy implementation worth
   looking at
2. Optimizations for memcpy are very cpu dependant and good code for
   Nokia does not necessery work good for Zaurus and vice versa.

PS. Nokia seems to have a much faster memory than Zaurus by the way :)


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Optimized memory copying functions for Nokia 770

2006-03-14 Thread Siarhei Siamashka


Tomas Frydrych wrote:


There seems to be no source for the functions in the tarball.



Siarhei Siamashka wrote:

Hello All,

Here are the optimized memory copying functions for Nokia 770 
(memset is more than twice faster, memcpy improves about 10-40% 
depending on relative data blocks alignment).


http://ufo2000.sourceforge.net/files/fastmem-arm-20060312.tar.gz

...


Like Dirk already replied, the implementation is in macros in the .h
file. I'm sorry for not providing detailed instructions about using this
tarball. Here they are:

# wget http://ufo2000.sourceforge.net/files/fastmem-arm-20060312.tar.gz
# tar -xzf fastmem-arm-20060312.tar.gz
# cd fastmem-arm

Now compile and run the test(in scratchbox using sbrsh cpu
transparency method):

# gcc -O2 -o fastmem-arm-test fastmem-arm-test.c
# ./fastmem-arm-test

If you want to use this optimized code in your programs, just add
fastmem-arm.h file to your project and the following line into your
source files:

#include fastmem-arm.h

And now you can use these functions (which are provided as a set of
macros using inline assembler, so they are all contained within
fastmem-arm.h file which is their source), the most simple to use is
'memset8', it is a direct replacement for 'memset' and can be used
instead of it to provide a huge performance boost.

The functions are optimized for different alignments, for example:
uint16_t *memcpy16(uint16_t *dst, uint16_t *src, int count)

It copies only 16-bit buffers, but it still can be used for a fast copy
of 16-bit pixel data (as Nokia 770 uses 16-bit display). I can make
'memcpy8' function later, but expect its sources to grow about twice
and become much more complicated (because of more complicated handling
of leading/trailing bytes and 2 more relative alignment combinations).
It will take some time.

Hope this information helps. Still waiting for feedback :)


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Optimized memory copying functions for Nokia 770

2006-03-14 Thread Siarhei Siamashka


Tomas Frydrych wrote:


Like Dirk already replied, the implementation is in macros in the .h
file.


I see. That makes the comparison with memcpy somewhat unfair, since you
are not actually providing replacement functions, so this would only
make difference for -O3 type optimatisation (where you trade speed for
size); it would be interesting to see what the performance difference is
if you add the C prologue and epilogue.#


Memory bandwidth benchmarking is done on 2MB memory block, so prologue
and epilogue code does not introduce any noticeable difference.

I did not pay much attention on optimizing prologue/epilogue code yet,
it should make difference on smaller buffer sizes, but it is in a TODO
list.


BTW, you can instruct gcc to use inlined assembler version of its memcpy
and friends as well, I think -O3 includes this, but if I read
bits/string.h correctly in my sbox, there are no such inlined functions
on the arm though, so there is certainly value in doing this.


Well you got the source, so you can do your own benchmarks either with
-O2 or -O3 or even -O9 and post them here :)


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

Re: [maemo-developers] Optimized memory copying functions for Nokia 770

2006-03-14 Thread Siarhei Siamashka


Eero Tamminen wrote:

That makes the comparison with memcpy somewhat unfair, since you 
are not actually providing replacement functions, so this would 
only make difference for -O3 type optimatisation (where you trade 
speed for size); it would be interesting to see what the 
performance difference is if you add the C prologue and epilogue.#


One should also remember that inlining functions increases the code 
size.  On trivial sized test programs this is not an issue, but in 
real programs it is, especially with the RAM and cache sizes that ARM

 has.


Sometimes inlining makes sense, sometimes it does not. In my case
(blitting code for allegro game programming library) it does, just
quoting myself:

Also just improving glibc might not give the best results. Imagine a
 code for 16bpp bitmaps blitting. It contains a tight loop of copying
 pixels one line at a time. If we need to get the best performance 
possible, especially for small bitmaps with only a few horizontal 
pixels, extra overhead caused by a memcpy function call and also 
extra check for alignment (which is known to be 16-bit in this case) 
might make a noticeable difference. So directly inlining code from 
that 'memcpy16' macro will be better in this case.



By the way, I tried to search for asm optimized versions of memcpy for
ARM platforms. Did not do that before as my mistake was that I assumed
glibc memcpy/memset implementations to be already optimized as much as
posible.

Appears that there is fast memcpy implementation in uclibc and there are
also much more other implementations around. Seems like I tried to
reinvent the wheel. Too bad if it appears that spending the whole 2 days
on weekend was a useless waste of time :( Well, at least I did not try
to steal someone's else code and 'copyright' it.

As I told before, my observations show that it is better to align
writes on 16-byte boundaries at least on Nokia 770. The code I have
posted is a proof of concept code and it shows that it is faster than
default memset/memcpy on the device. I'm going to compare my code with
uclibc implementation, if uclibc is in fact faster or has the same
performance, I'll have to apologize for causing this mess and go away
ashamed.

In any case, performance of memcpy/memset on default Nokia 770 image is
far from optimal. And considering that the device is certainly not
overpowered, improvements in this area might probably help. Just checked
GTK sources, memcpy is used in a lot of places, don't know whether it
affects performance much though. Is it something worth investigating by
Nokia developers?


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers

96 matches

Mail list logo