Hi all,

I'm happy to say I've got the DSP task working for more than 4s now; in fact
it even runs all the way to the end of the song, as expected ;).

You can download version 1.0.0 from here:
https://garage.maemo.org/projects/dsp-sbc/. This is for Diablo only.

This consists of a tarball containing the DSP task and command file, a
tweaked Bluez-utils which can use said DSP task for SBC encoding (so it will
just work with mplayer and the like) and an installation script which writes
some config data about the new task to the DSP dynamic loader conf file and
then extracts the tarball, installs the deb and tells you to reboot.

[Note to would-be DSP hackers: rather than rebooting, you can just run
"dsp_dld" in the terminal to restart the loader daemon, but make sure you've
made a symlink from /lib/dsp/dsp_dld_avs.conf -> /lib/dsp/dsp_dld.conf as
this is where it expects to find the conf file.]

If you want to go back to software encoding, rename the sbcenc.o file (in
/lib/dsp/modules) and it will automatically fall back to the original
software method (it falls back whenever the DSP fails, and renaming the task
will cause it to fail). I've not checked whether the fallback method is as
quick as the original code; I'd be interested to know, if anyone is bored.
I should add some logic using an env var or similar to switch method.
Does anyone have some example code I could use?
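
A minimal sketch of the env var switch, assuming the choice is made once
before the encoder is set up. The variable name SBC_ENCODE_METHOD, the enum
and the function name are all hypothetical; nothing in bluez-utils defines
them. It defaults to the DSP so that existing installs behave as they do now.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen for this example only. */
#define SBC_METHOD_ENV "SBC_ENCODE_METHOD"

enum sbc_method { SBC_METHOD_DSP, SBC_METHOD_SW };

/*
 * Pick the encoding method from the environment.
 * Only an explicit "sw" selects the software encoder; an unset variable
 * or any other value keeps the DSP path (which can still fall back to
 * software by itself if the DSP task fails).
 */
static enum sbc_method sbc_pick_method(void)
{
    const char *val = getenv(SBC_METHOD_ENV);

    if (val != NULL && strcmp(val, "sw") == 0)
        return SBC_METHOD_SW;
    return SBC_METHOD_DSP;
}
```

With something like this in place, "SBC_ENCODE_METHOD=sw mplayer song.mp3"
would skip the DSP without having to rename the task file.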

You still need to enable A2DP, either with johnx's a2dp deb, which can be
found here: http://www.internettablettalk.com/forums/showthread.php?t=13468
or manually (use the deb, it's far easier).

I should add that running DSP tasks will move the CPU frequency to 330MHz,
so this is probably not the answer to everyone's prayers with regard to
freeing the CPU to do Xvid decoding or the like. There is a kernel patch to
not force the CPU to 330MHz (the DSP runs slower) and I'll do some testing
to see if the DSP task can run in real-time at the lower DSP clock speed.
Then it will be significantly more useful. In the meantime, it may or may
not use less power this way; please let me know if you do any testing.



Next bit is for those interested in the gory details:

This is pretty much the same code I had running a week ago or thereabouts,
and it was only encoding ~4s of audio in real-time (using bulk transfers &
ioctls for sync). I tested the SW encoder and it would encode a test file
more slowly than the DSP method but would output more seconds worth of audio
when testing with mplayer, which made me wonder if the DSP was just cursed
(or perhaps something to do with the CPU speed being set to 330MHz when the
DSP is running...). The released code is from my mk2 branch:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/mk2/?root=dsp-sbc

The change which has allowed it to encode an entire song rather than just a
few seconds was to move the input and output buffers from SDRAM (OMAP main
memory) to SRAM (DSP fast single access memory). There are probably other
things which would benefit from being moved, the sbc->priv data (or parts
thereof) for one. This structure is pretty big so I allocated it in SDRAM,
but at least parts of it might be better off in faster local memory. This is
something to look at.

I tested the speed of the bulk transfers with a 29s au file: it took ~20s to
encode with the DSP and ~9s just to transfer the data, so the transfers are
pretty slow, as you can see. I then decided to convert the task to use
shared memory, with some polling and sleeping to synchronise (mk4 branch:
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/mk4/?root=dsp-sbc). The
mk4 code takes absolutely forever to run, though: the same test file which
takes ~20s with the bulk transfer method (mk2) takes ~45s using shared
memory. Unfortunately there appear to be no clocks available in the DSP
kernel, which makes benchmarking code quite tricky and also means you can't
sleep() between polling memory.
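
To put those timings in real-time terms: 29s of audio in ~20s is roughly
1.45x real time for mk2, while 29s in ~45s is roughly 0.64x for mk4, i.e.
slower than playback. Since the DSP side has no clock, one option is to time
everything from the ARM side. The helpers below are a sketch of that
arithmetic only; the function names are made up for this example.

```c
#include <time.h>

/* Seconds between two timestamps, e.g. taken on the ARM side
 * immediately before and after an encode run. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Greater than 1.0 means the encoder keeps up with playback;
 * less than 1.0 means it cannot run in real time. */
static double realtime_factor(double audio_seconds, double wall_seconds)
{
    return audio_seconds / wall_seconds;
}
```

The timestamps could come from clock_gettime(CLOCK_MONOTONIC, ...) wrapped
around the encode call, which may need -lrt on older glibc versions.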

So the DSP task sits in a tight polling loop (bad!) and the ARM sleeps for
1us and then polls the shared memory. The DSP manages ~650 loops before the
ARM presents it with input data; the DSP then processes it, and the ARM
sleeps for 1 loop (1us) before the DSP gives it back the encoded data, and
so on. There's something not right and I'm not sure what it might be. This
is not a good method for the task to use, but I am interested to know why
it's so slow, so I may do some more work on it eventually.
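
For reference, the ARM side of a sleep-and-poll handshake like this can be
sketched as below. This is not the mk4 code: the single flag word, its
values and the function name are assumptions made for the example, and the
real shared-memory layout may look quite different. One thing that may be
worth measuring is how long a 1us sleep really takes, since usleep() is
generally rounded up to the kernel's timer resolution and can be far longer
than the value requested.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Poll a flag word in shared memory until it equals `want`.
 * Sleeps `sleep_us` microseconds between polls and gives up after
 * `max_polls` attempts.
 * Returns the number of sleeps that were needed, or -1 on timeout.
 * `flag` is volatile because the other processor changes it behind
 * the compiler's back.
 */
static int poll_flag(volatile uint32_t *flag, uint32_t want,
                     unsigned int sleep_us, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        if (*flag == want)
            return (int)i;
        usleep(sleep_us);
    }
    return -1;
}
```

Logging the return value per frame would show whether the ARM really only
waits for one sleep each time, and the timeout avoids hanging forever if the
DSP task dies.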

Talking about a lack of clocks, the mk3 branch was my attempt to rewrite the
sbc conversion fns using DSP intrinsics, dual MACs, and the like. It doesn't
produce the correct output data (probably some issue with my Q15 arithmetic;
this was only the first hack at the code), and it also didn't improve the
speed of the code (with no clock fns it's hard to tell where the bottleneck
is), so I'm leaving it alone for the time being.
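
One way to chase a suspected Q15 problem is to compare the DSP output
against a plain C reference on the host. The sketch below is not taken from
mk3; it is ordinary portable-ish C for a rounding, saturating Q15 multiply
and a saturating add, with names invented for the example. Whether the
rounding matches what a given DSP intrinsic does is something to check
against the compiler documentation.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate into the Q15 range [-32768, 32767]. */
static int16_t q15_sat(int32_t x)
{
    if (x > 32767)
        return 32767;
    if (x < -32768)
        return -32768;
    return (int16_t)x;
}

/*
 * Q15 x Q15 -> Q15 multiply with round-to-nearest and saturation.
 * The 32-bit product is in Q30; adding 1 << 14 rounds it before the
 * shift back down to Q15. Assumes >> on a negative int32_t is an
 * arithmetic shift, which holds for gcc but is implementation-defined.
 */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;

    p += 1 << 14;
    return q15_sat(p >> 15);
}

/* Saturating Q15 add: the sum is formed in 32 bits, then clamped. */
static int16_t q15_add(int16_t a, int16_t b)
{
    return q15_sat((int32_t)a + (int32_t)b);
}
```

The classic trap is -1.0 * -1.0 (0x8000 * 0x8000), which overflows Q15 and
has to saturate to 0x7fff rather than wrap around to -1.0.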

Last but not least, even when running at 165MHz (or whatever the
conservative governor produces) the sw fallback code doesn't produce any
error messages when playing through mplayer, for example (no audio output
either, I hasten to add). The DSP code did produce errors before the move
from SDRAM to SRAM, namely the following:

alsa-play: write error: Broken pipe
alsa-play: trying to reset soundcard

I'm curious to know what the difference is, and if it means I've not
implemented some fn that I should have. Any ideas?

Sorry for the typically long and rambling email. I'm sure there are things
I've missed out, so please ask :)

Cheers,


Simon


_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers
