>>>>> "Alan" == Alan W Black <[email protected]> writes:

[..]
> Hmm, "yes my code is thread safe, except for using global registers"
> :-).  I agree this might be possible and actually work but ...

:)

[..]
>> Looking at
>> 
>> http://flite.sourcearchive.com/documentation/1.4-release-4/cst__mlsa_8c_source.html
>> 
>> I notice a lot of code does floating point computation?  Is the
>> floating point part significant to total runtime?  The Jz47xx does
>> not have an FPU, and the SIMD unit is also integer-only, so that
>> won't help you much.  Recoding parts in integer arithmic might help
>> if the float-part has any significant impact on performance.

> Yes it is floating point intensive.  The non-mlsa code (the diphones
> and unit selection) code uses almost no floating point and is integer
> optimized, that does run fast enough on the Nanonote, but the
> synthesis quality (and convenience) isn't as good.

Well, floating point emulation easily costs you a factor of 10-100 in
performance.  Why not move the MLSA code to integer math instead of
seeking low-level SIMD optimizations?  Integer arithmetic will also
help on CPUs that *do* have an FPU, as integer operation latency is
generally lower than FPU latency; even on high-performance FPUs, much
code may just be stalled waiting for results to become available
instead of actually doing any computing.

Isn't 32 bit enough?  What about 64-bit 'long long'?  MIPS doesn't have
native add-with-carry support, so 64-bit adds may be two or three times
as expensive as on ARM, but still much cheaper than doing floating
point math.  On the other hand, MIPS *does* have multiply-accumulate
with 32-bit inputs and a 64-bit output/accumulator, if that helps
(taking 2 cycles on the jz4720, IIRC).

BTW, moving your code from double to float may also improve performance
a lot under floating point emulation.

>> Quite remarkable that something as low-data rate as speech-quality
>> audio is so difficult to generate...

> Well, we speech synthesis people seem to try to compete with speech
> recognition people to use more CPU time :-).  That's not quite true,
> but speech synthesis researchers rarely (except me) care about end
> processor performance.  Signal generation for models is much harder
> than simple signal reconstruction (like mp3 encoding, or simple LPCs).

> It therefore might be more worthwhile to look for a better soft float
> option.  I remember back when we used ipaq 38xx's the floating point
> performance under linux was much worse than the performance under
> WinCE due to a better soft float optimization.  Also I note how our
> statistical synthesizers got much better on Google Nexus 1's when the
> SDK compiler was upgraded to generate better float code.

Maybe it's not better float code, only less standards-compliant float
code?  You can probably improve performance a lot by sacrificing some
accuracy and error checks.  Still, I see no sense in emulating a
hardware-optimized, compactly stored floating point format on hardware
that has no FPU.  Roll your own FP, or better yet, just go for
integer math!

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

Attachment: pgpbhBMMEHMQp.pgp
Description: PGP signature

_______________________________________________
Qi Hardware Discussion List
Mail to list (members only): [email protected]
Subscribe or Unsubscribe: 
http://lists.en.qi-hardware.com/mailman/listinfo/discussion
