On Mon, Jan 28, 2013 at 7:28 AM, Patrick Strasser
<[email protected]> wrote:
> schrieb Albert Cahalan on 2013-01-28 04:40:
>> On Sun, Jan 27, 2013 at 9:22 PM, Patrick Strasser
>> <[email protected]> wrote:

> Right. Some may like it, for example when compiling on a 64-bit machine
> for a 32-bit machine. I have a powerful AMD64 desktop machine and an old
> eeePC 701, on which I do not want to compile. It cross-builds easily,
> with both running Debian testing...

True cross builds, or just down-level arch?

>> Via the configure script I have enabled:
>> CFLAGS="-Wall -W -Wstrict-prototypes -Wmissing-prototypes -Wstrict-aliasing"
>
> I guess these are good anyway.

These too, which I had been trying to add:
-fvisibility=hidden -fvisibility-inlines-hidden

All libraries should have these, except that the second one is only for C++.
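
A minimal sketch of what this looks like in source, assuming GCC or Clang and a build with -fvisibility=hidden. The macro DEMO_API and both function names are made up for illustration; they are not codec2 names.

```c
#include <assert.h>

/* Hypothetical export macro (assumption, not a codec2 name). */
#if defined(__GNUC__)
#define DEMO_API __attribute__((visibility("default")))
#else
#define DEMO_API
#endif

/* Internal helper: when built with -fvisibility=hidden this symbol is
   not exported from the shared library, so outside code cannot link to it. */
int demo_internal_scale(int x)
{
    return 2 * x;
}

/* Public entry point: explicitly marked as exported. */
DEMO_API int demo_encode(int x)
{
    assert(x >= 0);
    return demo_internal_scale(x) + 1;
}
```

A program such as c2sim that calls something like demo_internal_scale() directly would fail to link against the shared library once the flag is on, which is the difficulty described below.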

The c2sim code makes this difficult. It cheats, calling functions
in the library that are not really public. (no CODEC2_WIN32SUPPORT)

BTW, the unittest code has a similar problem that prevents
marking library functions static. This blocks the compiler from
performing some kinds of optimization.

>> In kiss_fft.c and pack.c I have enabled:
>> #pragma GCC target("sse3,mmx,arch=prescott,fpmath=sse,no-ieee-fp,recip")
>> #pragma GCC optimize("fast-math,omit-frame-pointer,unsafe-loop-optimizations,single-precision-constant")
>
> You're doing some very special optimizations. If I understand right,
> functions compiled with these pragmas probably won't run on other systems.

For the first line, yes. For the second line, no. I haven't tried them
separately yet, but together they knock 1/7 off of benchmark times.

> Please keep in mind that people already run Codec2/FreeDV on various
> architectures, like ia32, amd64 and different arm, more to come.

I actually have PowerPC, and I strongly prefer 64-bit, so I understand.

> You can check the architecture in configure with AC_CANONICAL_HOST, which
> gives you information about architecture, vendor and OS. I'm not sure
> how much the code would benefit from special instructions if it does
> not have a suitable structure, like SIMD instructions in the FFT. I found
> some config tests in liboil that check for features at build time.

Most people would be surprised at what gcc can do these days.
A coworker of mine wrote an IP-style checksum loop. (16-bit 1's
complement addition) He wrote it using 32-bit loads (pairs) with
the loop counter going down. The compiler switched the direction
of the loop counter and switched to loading 64-bit values two at
a time in the vector registers!
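
For reference, a plain version of that kind of loop might look like the sketch below. This is not the coworker's code; it is a straightforward RFC 1071 style checksum, assuming an even byte count and big-endian 16-bit words, written simply so the compiler is free to choose the load width and loop direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum over an even number of bytes.
   Words are read in big-endian (network) order. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Whether a given gcc version vectorizes this depends on the target and flags; -O3 with -fopt-info-vec is one way to find out.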

You can help with good hinting. It's important to let the compiler
know when aliasing and alignment won't be an issue.
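
A small sketch of such hints, assuming C99 restrict and the GCC/Clang builtin __builtin_assume_aligned. The function name scale_add and the 16-byte alignment are illustrative assumptions, not taken from codec2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* restrict promises that dst and src never overlap, so the compiler
   may reorder and vectorize the loads and stores. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
    /* The caller must really pass 16-byte aligned buffers. */
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)src % 16 == 0);

#if defined(__GNUC__)
    /* Tell the optimizer about the alignment (GCC/Clang only). */
    dst = __builtin_assume_aligned(dst, 16);
    src = __builtin_assume_aligned(src, 16);
#endif

    for (size_t i = 0; i < n; i++)
        dst[i] += gain * src[i];
}
```

Breaking either promise is undefined behavior, so hints like these only belong where the buffers are under the library's control.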

>> I may want to compile for profiling; right now I'm going on guesses.
>
> I once tried oprofile. Is that still the preferred tool? I just read
> about perf.

As I understand it, perf replaced oprofile. For non-threaded things,
the old gprof tool will work and may be easier. Another good one
is kcachegrind, which visualizes profile data from valgrind's callgrind
and cachegrind tools (call graphs and cache misses).

>> An interesting question related to this: Suppose that 50% of the platforms
>> can handle the codec in real-time. Changing code generation increases that
>> to 70%, but makes the remaining 30% unable to run the codec at all. Is that
>> good or bad?
>
> What do you mean by "unable"? That it just won't run because it lacks
> some features it is compiled for, or that it runs unbearably slowly?

refuse to run, or an immediate crash

> For distribution, it's not
> that difficult to run your own optimized build on Linux and friends, and
> for Windows you could ship different optimized binaries or check
> processor features at runtime and switch functions accordingly.

Checking is an option. It's kind of a pain.
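
A rough sketch of the runtime check, assuming GCC or Clang on x86, where __builtin_cpu_supports() is available. The names dot_generic, dot_sse3 and select_dot are hypothetical, and dot_sse3 is only a stand-in for a routine that would be built with the SSE3 target options.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Portable fallback that runs everywhere. */
float dot_generic(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    assert(a && b);
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Placeholder for a tuned build; here it just reuses the generic code. */
float dot_sse3(const float *a, const float *b, size_t n)
{
    return dot_generic(a, b, n);
}

/* Pick an implementation once, at startup. */
dot_fn select_dot(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3"))
        return dot_sse3;
#endif
    return dot_generic;
}
```

The pain is that every tuned routine needs a second copy and a pointer like this, and the tuned copies have to live in files compiled with different flags.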

> be potential for performance optimizations. Some people think it can run
> properly on some embedded architectures without floating point support,
> but then again this needs porting to fixed point and some expertise in
> this field.

Maybe this can be done w/o making the code too ugly.
There are fixed-point features for C, though I think they live in a
separate technical report (TR 18037, "Embedded C") rather than in the
standard itself. I think gcc implemented it, at least for some targets.
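
Since compiler fixed-point types are not available on every target, here is a sketch with plain integers instead. It assumes the common Q15 format (16-bit signed, 15 fractional bits); the type and function names are illustrative and not from codec2.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t;  /* value = raw / 32768, range [-1, 1) */

/* Convert a float in [-1, 1) to Q15 (truncating). */
q15_t q15_from_float(float x)
{
    assert(x >= -1.0f && x < 1.0f);
    return (q15_t)(x * 32768.0f);
}

/* Multiply two Q15 values with rounding and saturation.
   Relies on arithmetic right shift of negative values, which is
   implementation-defined in C but is what mainstream compilers do. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * b;  /* Q30 product */
    p += 1 << 14;                /* round to nearest */
    p >>= 15;                    /* back to Q15 */
    if (p > 32767)               /* only -1 * -1 overflows */
        p = 32767;
    return (q15_t)p;
}
```

For example, q15_mul(16384, 16384) gives 8192, that is 0.5 * 0.5 = 0.25.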

>> There is also the question
>> of how much CPU power must be left for other things in order to be practical,
>> and the question of half-duplex vs. full-duplex.
>
> Full duplex is not common in amateur radio. First because only one
> frequency is used and TDM is not feasible, second because most
> transceivers do not support full duplex. It's kind of a problem to hear
> anything when you are sending with 100s of Watts, without proper duplex
> spacing.

Do the radios switch too slowly for TDM? What can a good one do?
If the radios can switch decently fast, it should work nicely. Here is
a simplistic protocol for it:

The initiator sends in every fourth time slot. If he takes half a time slot
to switch, that leaves two of the four time slots for listening. The responder
must obviously listen when the initiator's packets show up. He starts off
by transmitting as soon after the reception as he can. He slowly increases
his receive-to-transmit delay until his packets fall into the 2-slot window of
time where the initiator can receive them.
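
A toy model of that timing, as a sketch only. It assumes 100 ticks per slot, a responder packet that is one slot long, and a responder that somehow learns whether its reply was received (in practice from the initiator's next packet). All names and numbers are illustrative.

```c
#include <assert.h>

enum { SLOT = 100, FRAME = 4 * SLOT, SWITCH = SLOT / 2 };

/* The initiator transmits during [0, SLOT), then needs SWITCH ticks to
   change over, and must start switching back SWITCH ticks before the
   next frame. Returns 1 if a reply that starts `delay` ticks after the
   end of the initiator's packet fits entirely in the listen window. */
int reply_fits(int delay)
{
    int listen_start = SLOT + SWITCH;
    int listen_end = FRAME - SWITCH;
    int start = SLOT + delay;
    int end = start + SLOT;

    assert(delay >= 0);
    return start >= listen_start && end <= listen_end;
}

/* The responder starts at its minimum turnaround time and raises the
   delay by `step` each frame until the reply lands in the window.
   Returns the delay that works, or -1 if none fits in one frame. */
int find_delay(int initial_delay, int step)
{
    assert(step > 0);
    for (int d = initial_delay; d + 2 * SLOT <= FRAME; d += step)
        if (reply_fits(d))
            return d;
    return -1;
}
```

With these numbers, a responder that starts at 10 ticks and steps by 10 settles on a delay of 50 ticks, which is the initiator's half-slot switching time.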

> For other scenarios, like VoIP, duplex is probably reasonable, but I'm
> not sure if you are strongly limited in computational power in such
> situations.

I think you can become severely limited, even on modern server hardware.
You may be handling numerous calls at once. I've heard "thousands" for
the free VoIP software on Linux.

_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
