Re: [maemo-developers] Improving Cairo performance on the N800
On Tuesday 16 January 2007 12:08, Zeeshan Ali wrote:

> > Now, the recently announced Nokia N800 is different from the 770 in various ways that are interesting for Cairo performance. I've got my eye on the ARMv6 SIMD instructions and the PowerVR MBX accelerator.
>
> Yeah! Me too. The combined power of these two can make it possible to optimize a lot of nice free software out there for the N800 device. However, while the former is fully documented and the documentation is available to the general public, it doesn't have a lot to offer. The ARMv6 SIMD instructions only operate on 32-bit words, and hence I find it unlikely that they can be used to optimize double-precision FP emulation, in contrast to Intel Wireless MMX, which provides a big bunch of 128-bit (CORRECTME: or was it 64-bit?) SIMD instructions. OTOH, these few SIMD instructions can still be used to optimize a lot of code, but would it be a good idea for cairo if you need to convert the operand values to ints and the result(s) back to float?

Well, OMAP2420 seems to support floating point in hardware, so all this stuff is probably not needed anymore :)

> I had already been thinking about utilizing ARMv6 before the N800 was released to the public. My proposed plan of attack for the community (and also the Nokia employees) is simply the following:
>
> 1. Patch GCC to provide ARMv6 intrinsics. (1 MM at most)
> 2. Patch liboil [1] to utilize these intrinsics when compiled for an ARMv6 target (1-3 MM)
> 3. Make all the software utilize liboil wherever appropriate, or ARMv6 intrinsics directly if needed.
>
> The 3rd step would ensure that you are optimizing your software for all the platforms for which liboil provides optimizations. OTOH, one can skip step #1 and write liboil implementations in assembly. I already made a little progress on this and the result is two header files which provide inline functions abstracting the assembly instructions. I am attaching the headers.
> One of my friends was supposed to convert them to gcc intrinsics and patch gcc, but I never got around to finishing them. However, I am attaching the headers so anyone can use them as a starter if he/she likes.

According to my tests, the performance improvement from using such header files is minimal. They are easy to use, but the improvement is generally not very good. When I benchmarked idct performance, I also tested a C implementation with some macros for fast armv5te 16-bit multiplication out of curiosity. The performance improvement was only about 5%, while handcrafted code improves performance by as much as 50% (and still has potential for more optimizations):
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2006-September/045837.html

A very similar minimal effect is obtained from using such macros in the ffmpeg mp3 decoder.

The explanation is simple. The compiler is not able to schedule instructions as well as a human, especially if it has some 'alien' parts of code inserted into the flow of its instructions via inline asm. For example, this multiply instruction takes 1 cycle to execute, but the result has 1 extra cycle of latency (on ARM9; it is even higher on ARM11, where it is equal to 2 cycles) and you can't use it immediately in the next instruction. As gcc does not know about the scheduling of such instructions when using just macros, it may try to use the result immediately and suffer a penalty of 1 or more cycles because of pipeline interlock.

So if really good performance is required, nothing can beat handcrafted assembly yet. Of course it makes sense to profile the code and optimize only time-critical, relatively small leaf functions.

By the way, free software is really poorly optimized for ARM right now. For example, SDL is not optimized for ARM, xserver is probably not optimized either, and a lot of performance-critical parts of code in various software are still only implemented in C for ARM, while they have had x86 assembly optimizations for a long time.
Considering that Internet Tablets might face tight competition from x86 UMPC devices in the near future, ARM-powered devices are at some disadvantage now. Is this something that we should try to change? :-)

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
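The interlock penalty described in the message above can be illustrated with a toy model. The sketch below is only an assumption-laden illustration in Python: it models a single-issue, in-order pipeline where every instruction issues in one cycle and some results become usable a fixed number of cycles later. The function name, the tuple format and the register names are made up for the example, and it is not a cycle-accurate model of ARM9 or ARM11.

```python
def count_cycles(program, latency):
    """Count cycles for a toy in-order, single-issue pipeline.

    program: list of (op, dest, sources) tuples (hypothetical format).
    latency: dict mapping op -> extra cycles before its result is usable.
    """
    ready_at = {}   # register -> first cycle at which its value can be read
    cycle = 0
    for op, dest, sources in program:
        # Interlock: wait until every source register is ready.
        start = max([cycle] + [ready_at.get(r, 0) for r in sources])
        cycle = start + 1                       # one issue cycle per instruction
        ready_at[dest] = cycle + latency.get(op, 0)
    return cycle


# Multiply result used immediately by the next instruction.
naive = [
    ("mul", "r0", ["r1", "r2"]),
    ("add", "r3", ["r0", "r4"]),
    ("add", "r5", ["r6", "r7"]),
]
# Same work, with the independent add moved between the multiply and its use.
scheduled = [naive[0], naive[2], naive[1]]

arm9_like = {"mul": 1}    # 1 extra cycle of result latency, as in the text
arm11_like = {"mul": 2}   # 2 extra cycles, as in the text

print(count_cycles(naive, arm9_like), count_cycles(scheduled, arm9_like))
print(count_cycles(naive, arm11_like), count_cycles(scheduled, arm11_like))
```

In this model the reordered sequence hides one cycle of multiply latency in both cases, which is the kind of scheduling a compiler cannot do when the multiply is an opaque inline asm macro.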
Re: [maemo-developers] Xvideo support for Nokia 770?
On Wednesday 10 January 2007 01:51, Charles 'Buck' Krasic wrote:

> Siarhei Siamashka wrote:
>
> > Actually I have been thinking about trying to implement Xvideo support on 770 for some time already. Now as N800 has Xvideo support, it would be nice to have it on 770 as well for better consistency and software compatibility.
>
> As you may recall, I was considering this back in August/September. I tried a few things, and reported some of my findings to this list. The code for all that is still available here:
> http://qstream.org/~krasic/770/dsp/

Yes, sure I remember. Thanks for doing these experiments and making the results available. It really helps to have more information around.

> > I see the following possible options:
> >
> > 1. Implement it just using the ARM core and optimize it as much as possible (using dynamically generated code for scaling to get the best performance). It is quite a straightforward solution and only needs time to implement.
>
> It is my impression that this might be the most attractive option. I noticed that TCPMP, which seems to be the most performant player for ARM, uses this approach, and it is available under the GPL, so it may be possible to adapt some of its code. In the long run, I would hope that integrating the TCPMP scaling code into libswscale of the ffmpeg project might be the most elegant approach, since that seems to be the most performant/featureful/widely adopted open-source scaling code (but not yet on ARM). For mplayer, it works out of the box, since libswscale actually originated from mplayer, and only recently migrated to ffmpeg.

I see, thanks for the information (I checked the TCPMP sources some time ago, but was interested in the runtime cpu capabilities detection code and did not look at the scaler at that time). Using TCPMP code may be an interesting option. But I may still try to make my own scaler implementation, for two reasons:

1. TCPMP is covered by the GPL license, and most parts of ffmpeg are LGPL, so it probably makes sense to make a clean-room implementation of a JIT-powered scaler for ARM under the LGPL license.
2. I'm worried about the performance. Knowing how the cache and write buffer work on the arm926 core, it is possible to tune the generated code for it and get the best performance possible. So the results can be better than for TCPMP.

I have just committed some initial assembly optimizations for the unscaled yuv420p -> yuyv422 color format convertor to maemo mplayer SVN. It already provides some performance improvement; for example, on my test video file (640x480 resolution, 24 fps) I get the following results now:

    BENCHMARKs: VC: 114.526s VO: 21.055s A: 0.000s Sys: 1.582s = 137.163s
    BENCHMARK%: VC: 83.4962% VO: 15.3503% A: 0.0000% Sys: 1.1535% = 100.0000%

We can compare it with the older results (decoding time has also improved a bit since then because of recent assembly optimizations for the dequantizer):
http://maemo.org/pipermail/maemo-developers/2006-December/006646.html

    BENCHMARKs: VC: 121.282s VO: 31.538s A: 0.000s Sys: 1.577s = 154.397s
    BENCHMARK%: VC: 78.5517% VO: 20.4267% A: 0.0000% Sys: 1.0216% = 100.0000%

Most of the speed improvement in color conversion and video output (the VO: part) is gained just from loop unrolling and from avoiding some extra instructions that gcc emits when compiling C code, but using the STMD instruction to store 16 bytes at once at an aligned location [1] provides at least 10% of the performance here. If we estimate the memory copy speed here with the additional colorspace conversion applied, it is about 70MB/s now for 640x480 24 fps video (though we need to read a bit less data than we write, so it is a bit different from memcpy). And I have observed peak memcpy performance of about 110MB/s on the Nokia 770. So this color convertor is quite close to the memory bandwidth limit now.
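The size of the gain can be read straight off the two benchmark runs quoted above. The short Python sketch below simply recomputes the totals and the relative time saved from those figures; the dictionary and function names are made up for the illustration.

```python
# Timings in seconds, copied from the two mplayer benchmark runs quoted above.
old = {"VC": 121.282, "VO": 31.538, "A": 0.000, "Sys": 1.577}
new = {"VC": 114.526, "VO": 21.055, "A": 0.000, "Sys": 1.582}


def total(run):
    """Sum of all benchmark components for one run."""
    return sum(run.values())


def reduction(before, after):
    """Relative time saved, as a fraction of the earlier time."""
    return (before - after) / before


vo_saved = reduction(old["VO"], new["VO"])
total_saved = reduction(total(old), total(new))

print(f"old total: {total(old):.3f}s, new total: {total(new):.3f}s")
print(f"VO time reduced by {vo_saved:.1%}")
print(f"total time reduced by {total_saved:.1%}")
```

By this arithmetic the video output part takes roughly a third less time than before, while the whole run is a little over a tenth faster, since decoding (VC) still dominates.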
This code can be optimized more by processing two image lines at once, so we can get rid of some data read instructions and improve performance. Experimenting with prefetch reads may also provide some improvement.

JIT generated code should have a bit worse performance, but not much. If we decide to do 'nearest neighbour' scaling, the result should probably be as fast as this nonscaled conversion. But I want to try a simplified variation of bilinear scaling: each pixel in the destination buffer is either a copy of some pixel in the source buffer or the average value of two pixels. This way it should introduce at most two extra instructions for each byte of output: an addition of two pixel color components and a right shift.

> > 2. Try using the dsp tasks that already exist on the device and are used for dspfbsink. But the sources of the gst plugins contain code that limits the video resolution for dspfbsink. I wonder if this check was introduced artificially or if it is a limitation of the DSP scaler, which can't handle anything larger than that. I also wonder if the existing video scaler DSP task can support direct rendering [2].
>
> I tried direct rendering in the
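The simplified bilinear scaling idea from this message (each destination pixel is either a copy of one source pixel or the average of two) can be sketched for a single row of 8-bit samples as follows. This is only an illustration in Python, not the JIT-generated ARM code discussed in the thread; the function name, the 16.16 fixed-point stepping and the thresholds used to choose between copying and averaging are all assumptions.

```python
def scale_row(src, dst_width):
    """Scale one row of 8-bit samples to dst_width samples (dst_width > 0).

    Each output sample is either a copy of one source sample or the
    average of two neighbouring source samples (one add, one right shift).
    """
    src_width = len(src)
    step = (src_width << 16) // dst_width   # 16.16 fixed-point source step
    pos = 0
    out = []
    for _ in range(dst_width):
        i = pos >> 16          # integer source index
        frac = pos & 0xFFFF    # fractional position between i and i + 1
        has_next = i + 1 < src_width
        if has_next and 0x4000 <= frac < 0xC000:
            # Assumed rule: near the midpoint, average the two neighbours.
            out.append((src[i] + src[i + 1]) >> 1)
        elif has_next and frac >= 0xC000:
            out.append(src[i + 1])   # closer to the next sample: copy it
        else:
            out.append(src[i])       # closer to this sample: copy it
        pos += step
    return out


print(scale_row([0, 100], 4))
print(scale_row([10, 20, 30, 40], 2))
```

When downscaling by an integer factor the fractional part is always zero, so the sketch degenerates to plain nearest-neighbour copying; the averaging path only matters for upscaling and non-integer ratios.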
[maemo-developers] Re: Discussion of a possible project - offline calendar project
On Tue, 2007-01-16 at 22:23 +0100, Patrick Ohly wrote:

> * the showstopper though were performance/timeout issues in the EDS-DBus libraries (see below)

Eek!

> Anyway, the problem is that after downloading 200 contacts into the Nokia 770 e_book_get_contacts() fails with a timeout error. I was able to work around that by using e_book_async_get_contacts(), only to find that now e_book_get_changes() suffers from the same problem. I suspect that it is a DBus method call which is expected to complete more quickly than it really does.

Yes, there is a timeout on DBus calls, which isn't that long. If you have a lot of contacts, EDS has to read every single one into memory, create a DBus message and send it back to the client (which gets copied a number of times with the current bus protocol). If you profile it you'll see that memcpy() is the bottleneck here: basically there is too much data to copy, and not enough memory bandwidth.

The solution is to always use the async methods unless you are coding a toy application. If you want to get all contacts, ideally create a book view -- this means you get informed of the contacts both asynchronously and incrementally (which is much nicer for system performance, as there is no multi-megabyte message to parse and copy over the bus). When you call get_changes, use the async version.

Have you had this working against eds-dbus? The e_book_get_changes() method was untested in the DBus port until now, and although I hoped it worked, I hadn't verified it. If it has been working, that is great news.

Feel free to mail me any further in-depth questions off-list,
Ross
--
Ross Burton
mail: [EMAIL PROTECTED]
jabber: [EMAIL PROTECTED]
www: http://www.burtonini.com./
PGP Fingerprint: 1A21 F5B0 D8D0 CFE3 81D4 E25A 2D09 E447 D0B4 33DF
Re: [maemo-developers] examining n800 kernel
Frantisek Dufka [EMAIL PROTECTED] writes:

> WI-FI seems to be the same chip as in (newer) N770 devices (?), with similar firmware blobs (3825.arm, 3826.arm), probably newer versions. Hopefully the speed will be better than those 500KB/s on the N770 thanks to the rest of the system.

The SPI bus is faster, so WLAN is a bit faster. I have managed to get 7 Mbit/s (I guess about 800 KB/s) TCP downstream with iperf.

--
Kalle Valo
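As a quick check of the estimate in this message, 7 Mbit/s can be converted to bytes per second directly. The Python sketch below assumes decimal megabits (10^6 bits) and ignores any protocol overhead; the function name is made up for the illustration.

```python
def mbit_to_bytes_per_second(mbit_per_s):
    """Convert a rate in decimal megabits per second to bytes per second."""
    return mbit_per_s * 1_000_000 / 8


rate = mbit_to_bytes_per_second(7)
print(f"{rate / 1000:.0f} kB/s")    # decimal kilobytes per second
print(f"{rate / 1024:.1f} KiB/s")   # binary kibibytes per second
```

Under these assumptions the result is 875 kB/s, or roughly 854 KiB/s, so "about 800 KB/s" is a slightly conservative guess and still well above the 500KB/s quoted for the N770.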
RE: [maemo-developers] examining n800 kernel
> > I've been looking at the n800 kernel source in the bora repository to figure out what the n800 is like compared to the n770. Here is a summary of some things I found. As I don't have the device, I may be wrong about something that could be easily verified.
> >
> > kernel is 2.6.18-omap1 - everybody probably knows that :-)
>
> (snip)
>
> > USB
> > Seems to be 2.0, capable of high speed mode (480 Mbit/s), chip is TUSB6010 by TI. No usb host mode is compiled in the kernel. Usb host mode support was also removed from initfs (usb booting), so this may look bad.
>
> Looking at the shiny brochure-level documentation for the TUSB6010, it looks like it has full support for USB-OTG, including a 5V charge pump for driving peripheral devices. Whether or not it's possible to write support for it with the available documentation is another matter. I really wish Nokia would step forward and do it, since it looks like it could be perfect for host mode.
>
> Thanks for doing the kernel writeup!
>
> Larry

Hi,

AFAIK there is no USB-OTG in the N800. The connector (HW) does not handle it, so no effort was spent in the kernel either.

Br,
--jakub
[maemo-developers] Re: maemo-developers Digest, Vol 21, Issue 57
> Yes, I also noticed the flurry of dev projects after the announcement. Interesting.

I would say logical, not interesting. It is one thing to invest your spare time in an open source project (I don't know about you guys, but I do other things for a living), but also investing 400 euros is maybe too much to ask of developers, who are the ones making the N800 sustainable as an appealing platform for non-developers. Also, consider that the new N800 is much better and more appealing than the 770, so many of us are now considering it seriously.

By the way, I'm not currently in any project (nor do I own an N800 at the moment), but I'm cooking up a project regarding gesture-based interfaces. If the initial research goes well, I'll let you know . . . and this will happen regardless of that rebate :-D

> Date: Tue, 16 Jan 2007 11:02:34 -0800
> From: Ty Hoffman [EMAIL PROTECTED]
> Subject: Re: [maemo-developers] n800 camera specs
> To: Michael Wiktowy [EMAIL PROTECTED]
> Cc: maemo developers mailing list maemo-developers@maemo.org
>
> Michael Wiktowy wrote:
>
> On 1/16/07, Matt Clark [EMAIL PROTECTED] wrote:
>
> I take nokia is going to refund the €300/$300 price difference for people that bought an n800 already but are going to be in the dev program?
>
> Wow, that's a wonderfully well developed sense of entitlement you've got there.
>
> Getting beyond the easily misinterpreted intentions of mailing list participants, he does have a very good idea. If Nokia just sends 500 worthy developers mail-in rebates for store-bought n800's then there is no issue of waiting anymore. It is probably the easiest thing to handle logistically on Nokia's side also.
> On another note, one beneficial thing that this delay has generated is a lot of developer activity on the 770 as devs fight to get noticed via project updates :] So as a simple user I say, Nokia, delay all you want ;]
>
> /Mike
>
> Yes, I also noticed the flurry of dev projects after the announcement. Interesting. Maybe the whole discount program is just a psychological experiment on motivation and reward, and there are no discounted units. Perhaps it's a first test for the 'bounty' program discussed earlier... (please don't misinterpret my intentions! I'm just kidding!)
>
> --Ty
>
> End of maemo-developers Digest, Vol 21, Issue 57
[maemo-developers] gtk+-2.0 uninstallable
Hi,

With the new maemo 3.0, libgtk2.0+-dev is not installable with apt-get. When I tried to apt-get install libgtk+-2.0, it showed an unmet dependency on libglib2.0-dev. It is looking for the pkg-config package, which is installed by osso-af-settings, and the pkg-config version is 0.15.0, which is the latest. Maemo requires those packages for building a basic hildon C program. How can I install these debs?

TIA
Saifa
RE: [maemo-developers] DSP programming
I've been trying to get started with DSP programming, following the instructions summarised in this post.

> I just took a brief look at what is required for programming for the DSP in the 770. It seems that building your own modules is possible with publicly available tools and documents. At least I got the demo_console from DSP gateway loaded and it seems to run OK. The following is what I did. Someone might be interested in testing the instructions and maybe polishing these into a proper HOWTO for the maemo wiki.
>
> snip
>
> 3. Get the avs_kernel.out from your 770
>
> The dynamically loaded DSP modules are linked against a dummy kernel object generated from the actual kernel. The DSP kernel is in the /etc/dsp directory. Just copy it to your host and adjust dspgw-3.3-dsp/apps/demo_mod/Makefile to use the 770 avs_kernel.out to generate the dummy_kernel.obj instead of using the default tinkernel.out. Now you should be able to build the DSP side of the demo console with a simple make.

After building the dsp side of the demos (demo_console or demo_fb), the next step should be to run coff_unresolve on the resultant .out file, to remove the dummy kernel which has been linked in. So something like the following:

    # create the dummy kernel
    gen_dummy_kernel avs_kernel.out -o avs_kernel.obj

    # Run make for the dsp demo in question
    # (compiles source and statically links in
    # the avs_kernel.obj dummy kernel)
    make

    # remove the dummy kernel
    coff_unresolve -s .tinkernel demo_console.out

Unfortunately, the code produced using this technique doesn't run. It results in the eventual error message:

    open: Device or resource busy

This is the same message as is received if you forget to run coff_unresolve on the demo_console.out dsp task at all.

What does work, however, is simply renaming the unlinked demo_console.obj file and placing that in the /lib/dsp/modules/ directory.

Have other people seen this problem? Has anyone else tried?
I'd specifically like to work out whether it's me making a mistake with coff_unresolve, as this step will certainly be necessary for dsp tasks which are built from more than one source file: these can't be linked together without the dummy kernel (avs_kernel.obj).

Thanks,
Simon

P.S. The above steps are documented in the dsp_dld_spec13.pdf file from dspgateway, in Chapter 5, Building a DSP dynamic task module.
Re: [maemo-developers] Help needed on setting up the development platform for NOKIA N770/800
> Following are my questions:
>
> 1. I am a newbie to Linux. I have SUSE LINUX running in VMWare Server on a Windows PC. I would like to know if there are detailed steps on how I can set up the Maemo development platform on SUSE LINUX.

http://www.maemo.org/platform/docs/howtos/Maemo_tutorial_bora.html#settingup

> 2. From the links on www.maemo.org, I understand that Maemo works on Debian. Do I have to change my linux distro to debian only?

No, Debian http://www.debian.org/ or Ubuntu http://www.ubuntu.com/ are recommended, but other fairly recent distributions should also work.

> 3. Please tell me which tools I have to install, in order.

The tools are listed in the tutorial (item 1).

> Help is appreciated

--
Raul Fernandes Herbster
Embedded and Pervasive Computing Laboratory - embedded.dee.ufcg.edu.br
Electrical Engineering Department - DEE - www.dee.ufcg.edu.br
Electrical Engineering and Informatics Center - CEEI
Federal University of Campina Grande - UFCG - www.ufcg.edu.br
Caixa Postal 10105
58109-970 Campina Grande - PB - Brasil