Re: [PD] Raspberry Pi does denormals

2013-01-24 Thread katja
An 'undenormalized' Pd build for Raspberry Pi is temporarily parked here
for testing purposes (it will be removed once Miller's release is fixed
in this respect):

www.katjaas.nl/temp/pd-0.44-0-normalized.tar.gz

This is a locally installed Pd, like Miller's distribution. You can
start it from the command line with the full path to
pd-0.44-0-normalized/bin/pd. It's not a .deb, so it can't be installed
or managed through the package manager.

Katja


On Wed, Jan 23, 2013 at 9:15 PM, Julian Brooks jbee...@gmail.com wrote:
 Hey Katja,

 Would you mind sharing the 'normalised' Pd-0.44.0 for RPi please.

 Cheers,

 Julian




Re: [PD] Raspberry Pi does denormals

2013-01-24 Thread katja
On Wed, Jan 23, 2013 at 8:00 PM, padawa...@obiwannabe.co.uk
padawa...@obiwannabe.co.uk wrote:

 On 23 January 2013 at 18:23 katja katjavet...@gmail.com wrote:
 Now I recompiled the Pd-0.44.0 release on Raspberry Pi (took me a few
 hours, not only because Pi is so slow)

 Have you looked into cross compiling options much?
 there's plenty of arm7 support available, last time I looked
 Just thinking out loud
 a.

I haven't looked at cross compiling options yet. Frankly, I'm
fascinated by the fact that Raspberry Pi is self-supporting. All the
GNU tools and other familiar deb packages working on a pocket size
circuit board. That's why I like RPi (and not the Android/iOS
gadgets). If you know what you're doing, you can start some job and
leave Pi alone. In the case of compiling Pd, the job was interrupted
by errors several times. Install instructions could be more complete.
I should make some notes and post them.

Katja

___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management - 
http://lists.puredata.info/listinfo/pd-list


Re: [PD] Raspberry Pi does denormals

2013-01-24 Thread Julian Brooks
Thank you.


Re: [PD] Raspberry Pi does denormals

2013-01-24 Thread Hans-Christoph Steiner


I just set up an RPi chroot on one of the PdLab machines; it was pretty easy
to do on Debian.  Then you just run 'dchroot -d -c raspbian-armhf' and you
have a shell in a virtual RPi.  You can get access if you want:

http://puredata.info/docs/developer/PdLab

.hc



Re: [PD] Raspberry Pi does denormals

2013-01-24 Thread Hans-Christoph Steiner
On 01/24/2013 01:33 PM, padawa...@obiwannabe.co.uk wrote:
 
 Sweet. I'll give the chroot method a spin. I like working on native
 images this way, but when emulating on a host machine anything
 to do with networking seems to be a minefield, which reinforces
 Katja's preference for doing it all on the real machine.
 The trade-off is speed, I guess: emulate until most of the work is done
 and then tweak on the real board. I saw a few blogs about using
 qemu or VirtualBox as an ARM emulator, but the setup seemed
 quite involved.

With the chroot method, networking is entirely handled by the host, so that
has been working pretty well for me.  But I don't run network services in the
chroot.

Here's how I did it:

http://annoyingtechnicaldetails.wordpress.com/2013/01/24/setting-up-a-chroot-for-raspbian/

.hc



Re: [PD] Raspberry Pi does denormals

2013-01-23 Thread Hans-Christoph Steiner

hey Katja,

This also sounds like good evidence for your idea of writing C code that
modern compilers optimize well.  Using unions for aliasing allows the compiler
to do all the new tricks, then writing loops that auto-vectorize gives us the
real benefits.  Also, I think we can see some gains by using memcpy(), since in
modern libc versions it is highly optimized for the given CPU, dynamically
choosing the routines based on what instructions are available. memcpy will
use things like SSE2 or SSSE3 if they are available.

.hc

On 01/23/2013 07:47 AM, katja wrote:
 Finally some good news on this topic. Earlier I stated that 'big or
 small tests' are expensive for the Pi, but that is not necessarily
 the case. There must have been other conditions blurring my
 impression. I've now done a systematic test where other influences are
 ruled out. A test class [lopass~] with exactly the same routine as
 [lop~] was made, but compiled with the PD_BIGORSMALL() macro enabled. It
 was verified that [lopass~] is not affected by denormals. Performance
 comparison of [lop~] and [lopass~] shows that both objects cause
 equivalent CPU load. This means Raspberry Pi gives the 'big or small
 checks' for free, at least in the case of this simple filter. Please
 try the attached bigorsmalltest.zip on the Pi to see if I'm not dreaming.
 
 While I was on the topic anyway, I also tried a big or small test with
 a union instead of direct type aliasing. It has the advantage that the
 compiler can keep applying strict aliasing rules. This test with unions did
 not cause extra CPU load on the Pi either. If you want to verify this
 result, enable the call to bigorsmall() instead of PD_BIGORSMALL in
 lopass~.c and recompile.
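
To make the union variant concrete, here is a minimal sketch of such a test. The name bigorsmall() comes from the message above, but the body is an assumption: it masks the two highest exponent bits of a 32-bit IEEE 754 float, which is the kind of bit test PD_BIGORSMALL is understood to perform. The type float_bits and the helper flush_state() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumes float is a 32-bit IEEE 754 single. The union lets us read the
   bit pattern without casting a float pointer to an int pointer. */
typedef union {
    float f;
    uint32_t ui;
} float_bits;

/* Returns 1 when the two highest exponent bits are both 0 (magnitude
   below 2^-63, roughly 1e-19, which covers zero and all subnormals) or
   both 1 (magnitude of 2^65, roughly 3.7e19, or more, which covers inf
   and nan). Returns 0 for everything in between. */
static int bigorsmall(float f)
{
    float_bits pun;
    pun.f = f;
    pun.ui &= 0x60000000u;
    return (pun.ui == 0u) | (pun.ui == 0x60000000u);
}

/* Hypothetical helper: replace an out-of-range filter state by zero. */
static float flush_state(float state)
{
    return bigorsmall(state) ? 0.0f : state;
}
```

A filter would typically call something like flush_state() on its feedback state once per block, so the cost is a few integer operations per block rather than per sample.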
 
 The fact that these tests do not cause extra CPU load indicates that
 they are done in parallel with other instructions. Float and int
 registers are apparently strictly separated on armv6; there is nothing
 like Intel's xmm registers or armv7's NEON. As it happens, the
 big or small tests are done on ints, aliases of the floats that must
 be tested. Initially I assumed that the transport of floats from vfp
 to the arm integer processor would be expensive, but if the
 instructions are done simultaneously it may be an advantage instead.
 Another thing is that ARM offers predication (conditional execution),
 which is not the same as branch prediction. Those terms look almost the
 same but the mechanisms are very different. With predication, the
 instructions for both branches are issued, and those whose condition
 fails simply have no effect.
 
 Conclusions from the limited test with [lop~] and [lopass~] do not
 mean that all sorts of conditional checks are cheap on the Pi, or on
 ARM in general. If PD_BIGORSMALL is enabled for RPi using the compile-time
 definition __arm__, it will also be enabled on armv7, but it may give very
 different results there. At the moment I have no access to an armv7
 device yet. Maybe someone can recompile test class [lopass~] and do the
 tests on Beagleboard or Cubieboard? Otherwise I may be able to do it
 on my friend's PengPod when that has arrived.
 
 Katja
 
 
 On Tue, Jan 22, 2013 at 8:54 PM, Miller Puckette m...@ucsd.edu wrote:
 thanks - I'd better try this and find out what's going on :)

 M

 On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:
 Tried the 0.44.0 build from your website. It has the same issue with
 subnormal values. My test patch is with [lop~]. If inf or nan is fed
 into [lop~], these 'values' keep circulating in the object and it can no
 longer process normal signal values.
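
Why a single nan never leaves such a filter can be shown with a small sketch. It assumes a generic one-pole lowpass of the form y[n] = coef*x[n] + (1-coef)*y[n-1], in the spirit of [lop~]; the struct and function names are illustrative and are not taken from the Pd sources.

```c
#include <stddef.h>

/* Illustrative one-pole lowpass: y[n] = coef*x[n] + (1-coef)*y[n-1]. */
typedef struct {
    float coef;   /* filter coefficient, between 0 and 1 */
    float last;   /* previous output, used as feedback state */
} onepole;

/* Process n samples. Because every output depends on the previous one,
   a nan in the state is multiplied into all later outputs. */
static void onepole_process(onepole *lp, const float *in, float *out, size_t n)
{
    float last = lp->last;
    float feedback = 1.0f - lp->coef;
    for (size_t i = 0; i < n; i++)
        out[i] = last = lp->coef * in[i] + feedback * last;
    lp->last = last;
}

/* nan is the only float value that compares unequal to itself. */
static int is_nan(float f)
{
    return f != f;
}
```

After one nan input sample, any later block of clean input still yields nan on every output sample, because the feedback term is nan times a finite number. Only resetting the state (or reloading the patch) clears it.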

 I also tried my reverb stuff with specific compiler options for Pi's 
 processor:

 -march=armv6zk
 -mcpu=arm1176jzf-s
 -mtune=arm1176jzf-s

 With these options, gcc should be able to decide that RunFast mode is
 permitted. But even in combination with -ffast-math (which in turn
 sets -funsafe-math-optimizations and -fno-trapping-math amongst
 others), denormals are still there. I'm literally out of options for
 the moment. Sorry for not having better news.
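
For reference, flush-to-zero can also be requested from C at run time instead of through compiler options. The following is only a sketch under assumptions and was not tried in this thread: it sets the FZ bit (bit 24) of the VFP status register FPSCR using inline assembly. The function name enable_flush_to_zero() is hypothetical, the vmrs/vmsr mnemonics require an assembler that accepts unified syntax with VFP enabled (older toolchains spell them fmrx/fmxr), and on any other target the function does nothing.

```c
#include <stdint.h>

/* FZ (flush-to-zero) is bit 24 of the ARM VFP FPSCR register. */
#define FPSCR_FZ (1u << 24)

/* Returns 1 if the FZ bit was set, 0 on targets where this sketch is a
   no-op (anything that is not 32-bit ARM with hardware VFP). */
static int enable_flush_to_zero(void)
{
#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    __asm__ volatile ("vmrs %0, fpscr" : "=r" (fpscr));  /* read FPSCR  */
    fpscr |= FPSCR_FZ;
    __asm__ volatile ("vmsr fpscr, %0" : : "r" (fpscr)); /* write FPSCR */
    return 1;
#else
    return 0;
#endif
}
```

The register is per thread, so a call like this would have to happen in the thread that runs the DSP code. Whether that is acceptable for Pd as a whole is a separate question from the per-object checks discussed above.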

 Katja





Re: [PD] Raspberry Pi does denormals

2013-01-23 Thread katja
Now I recompiled the Pd-0.44.0 release on Raspberry Pi (took me a few
hours, not only because Pi is so slow) with PD_BIGORSMALL enabled for
arm in m_pd.h. Using bigorsmalltest.pd from my previous mail I
verified that the macro is indeed in effect.

Martin Brinkmann's patch chaosmonster1
(http://www.martin-brinkmann.de) gives a beautiful illustration of the
improvement. This patch is full of filters and delay lines. At its
initial settings, there is no subnormals problem. But if you set the
bottom slider to the right, it goes silent. With the Pd-0.44-0 release,
CPU load then explodes. With the 'normalized' Pd, nothing special happens.

And indeed, the PD_BIGORSMALL conditional checks come for free: with
the initial settings of chaosmonster1, performance is equivalent in
both Pd builds. Cool! Hopefully this is similar on armv7.

Katja




Re: [PD] Raspberry Pi does denormals

2013-01-23 Thread Julian Brooks
Hey Katja,

Would you mind sharing the 'normalised' Pd-0.44.0 for RPi please.

Cheers,

Julian




Re: [PD] Raspberry Pi does denormals

2013-01-22 Thread katja
Hey Pierre,

I've commented patch denorm-test.pd in such a way that it explains the
topic a bit more (see attached). Now I'd like to ask whether you can
run the patch on a 'normal' computer and on your Raspberry Pi for
comparison. If you have the Pd gui on the Pi, you should be able to check
whether your Pd install has a denormals issue. It's hard to believe that I
would be the only one experiencing this issue, but I need to be sure.

In my previous mail I stated that it is easy to avoid subnormals by
just feeding a very small number into objects that may decay into the
subnormal range (feedback delay lines and most filters). But in
practice, this is not always so easy, as I am now experiencing while
trying to make a big patch work well on the Pi. There are many more
filters than I was aware of. For example, following a bit-mangling
operation a [hip~] was added to remove DC. Now if the bit-mangler
stops receiving signal input, [hip~] starts to chew subnormals until
it receives signal again. I found that each filter struggling with
subnormals eats at least 6% CPU time (while it takes some 0.25% in
its normal state). Pd objects should really take care of this in one way
or another; it's too confusing for the user to sort it out. I'm going
to try recompiling Pd with PD_BIGORSMALL checks enabled, and see what it
means for the normal performance of the filter objects.

Katja


On Mon, Jan 21, 2013 at 4:24 PM, Pierre Massat pimas...@gmail.com wrote:
 Hi Katja, thank you for your reply! It is now (slightly) clearer. Every time
 you post something here I feel like some messages from a technical NASA
 mailing list are being accidentally sent to pd-list!

 Cheers,

 Pierre.


 2013/1/21 katja katjavet...@gmail.com

 Pierre, the way denormals can impact performance on the Pi is this:
 whenever an object with feedback delay (IIR filter, reverb etc.)
 stops receiving input signal, its values decay into the subnormal
 range, which causes a substantial increase of CPU load. Such situations
 can be avoided by adding a tiny DC value to the object input, like [+~
 1e-21] (note the minus sign in the number notation). When a normal
 audio signal is present, that number is too small to be added (because
 of limited precision), but when audio stops, it prevents subnormals.
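
The effect of such a tiny DC offset can be sketched in C. This is an illustration under assumptions, not code from Pd: it runs a generic one-pole lowpass on silence, once with and once without a bias of 1e-21 on the input, and checks whether the state stays in the normal float range. The names ANTI_DENORMAL_BIAS, decay_state() and is_normal_magnitude() are made up for this example.

```c
#include <float.h>
#include <stddef.h>

/* Same magnitude as the [+~ 1e-21] mentioned above. */
#define ANTI_DENORMAL_BIAS 1e-21f

/* Feed n samples of silence (plus an optional bias) into a one-pole
   lowpass y[n] = coef*(x[n] + bias) + (1-coef)*y[n-1] and return the
   final state. With a nonzero bias the state settles near the bias
   value instead of decaying towards zero. */
static float decay_state(float start, float coef, float bias, size_t n)
{
    float last = start;
    float feedback = 1.0f - coef;
    for (size_t i = 0; i < n; i++)
        last = coef * (0.0f + bias) + feedback * last;
    return last;
}

/* Nonzero when the magnitude of f is at least FLT_MIN, the smallest
   normal float; zero and subnormal values give 0. */
static int is_normal_magnitude(float f)
{
    float mag = f < 0.0f ? -f : f;
    return mag >= FLT_MIN;
}
```

With the bias, the state ends up around 1e-21, far above FLT_MIN (about 1.2e-38), so it never enters the subnormal range. At the same time 1.0f + ANTI_DENORMAL_BIAS is exactly 1.0f in single precision, which is why the offset has no effect on a normal-level signal.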

 Another thing is, one should be careful not to accidentally send 'inf'
 or 'nan' into such objects, as they can not recover from it. This
 would be particularly annoying in a public performance, since you'd
 need to reload the containing patch to recover.

 It is possible to prevent denormals via C code, as is currently
 done for Pd on Intel processors, but this adds a lot of
 conditional checks and it means performance loss for many objects. For
 current Intel computers the extra load is not so much of a problem,
 but for poor Raspberry Pi one would rather save a few
 instructions than add more.

 Katja


 On Sun, Jan 20, 2013 at 5:27 PM, Pierre Massat pimas...@gmail.com wrote:
  Hi,
 
  Could someone please explain how this impacts Pd's performance on the
  Raspberry Pi ?
  It doesn't make any sense to me right now, but i'm very curious...
 
  Cheers,
 
  Pierre.
 
 
  2013/1/20 Hans-Christoph Steiner h...@at.or.at
 
 
  I think this is what you want, from 'man gcc'.  Its interesting to note
  that
  the NEON mode, which provides SIMD, also does not do denormals:
 
  -mfpu=name
  -mfpe=number
  -mfp=number
  This specifies what floating point hardware (or hardware emulation)
  is
  available on the target.  Permissible names are: fpa, fpe2, fpe3,
  maverick,
  vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd,
  vfpv3xd-fp16,
  neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16 and neon-vfpv4.
  -mfp
  and
  -mfpe are synonyms for -mfpu=fpenumber, for compatibility with
  older
  versions of GCC.
 
  If -msoft-float is specified this specifies the format of floating
  point
  values.
 
  If the selected floating-point hardware includes the NEON extension
  (e.g.
  -mfpu=neon), note that floating-point operations will not be used
  by
  GCC's
  auto-vectorization pass unless -funsafe-math-optimizations is also
  specified.  This is because NEON hardware does not fully implement
  the
  IEEE
  754 standard for floating-point arithmetic (in particular denormal
  values
  are treated as zero), so the use of NEON instructions may lead to a
  loss of
  precision.
 
 
  .hc
 
  On 01/20/2013 06:54 AM, katja wrote:
   I was assuming, or maybe just hoping? that Raspberry Pi (and ARM
   devices in general) would not suffer from Denormal's disease like
   Intel processors do. But guess what: Pi's float coprocessor is IEEE
   754 compliant and does all denormals by default (can check with
   attached denorm-test.pd). Bummer! As if one would use an ARM device
   to
   calculate the size of a Majorana particle, rather than doing simple
   dsp. Do we really need to enable PD-BIGORSMALL() checks 

Re: [PD] Raspberry Pi does denormals

2013-01-22 Thread Pierre Massat
Hi Katja,

I tried on my laptop (Intel dual-core 2 duo, 1.66 GHz), and it works fine, I
guess. It takes a little while to go from 0 to 1 and back, but there don't
seem to be any particular issues with NaN and inf numbers.

Now on the Pi: sending a NaN to the [lop~] makes it hang. Sending 1 works
fine, but sending 0 after that results in different behaviour compared to
my laptop: the number of non-zero digits seems to grow much faster, and the
value never actually reaches 0. It (apparently) hangs with still about 30
non-zero digits to the right of the very long number.

I guess you're not the only one, are you?

Cheers,

Pierre.


Re: [PD] Raspberry Pi does denormals

2013-01-22 Thread Miller Puckette
thanks - I'd better try this and find out what's going on :)

M

On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:
 I tried the 0.44.0 build from your website. It has the same issue with
 subnormal values. My test patch is with [lop~]. If inf or nan is fed
 into [lop~], these 'values' keep circulating in the object, and it can
 no longer process normal signal values.
 
 I also tried my reverb stuff with specific compiler options for Pi's processor:
 
 -march=armv6zk
 -mcpu=arm1176jzf-s
 -mtune=arm1176jzf-s
 
 With these options, gcc should be able to decide that RunFast mode is
 permitted. But even in combination with -ffast-math (which in turn
 sets -funsafe-math-optimizations and -fno-trapping-math amongst
 others), denormals are still there. I'm literally out of options for
 the moment. Sorry for not having better news.
 
 Katja
 
 

___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management - 
http://lists.puredata.info/listinfo/pd-list


Re: [PD] Raspberry Pi does denormals

2013-01-21 Thread katja
I tried the 0.44.0 build from your website. It has the same issue with
subnormal values. My test patch is with [lop~]. If inf or nan is fed
into [lop~], these 'values' keep circulating in the object, and it can
no longer process normal signal values.
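
To illustrate why inf and nan get stuck, here is a minimal sketch of a
one-pole lowpass in C. It is not Pd's [lop~] source; the struct and
function names are hypothetical and the coefficient is arbitrary. Once
the state is nan or inf, every later output is computed from it, so the
filter never recovers unless the state is explicitly reset.

```c
#include <math.h>

/* Hypothetical one-pole lowpass, similar in spirit to [lop~]
   (an illustration only, not Pd's implementation). */
typedef struct {
    float state;  /* previous output sample (the feedback memory) */
    float coef;   /* smoothing coefficient, between 0 and 1 */
} onepole;

/* Plain version: a nan or inf in the state circulates forever. */
static float onepole_tick(onepole *f, float in)
{
    f->state = in * f->coef + f->state * (1.0f - f->coef);
    return f->state;
}

/* Guarded version: a non-finite result resets the state to zero,
   so the filter can process normal signal again afterwards. */
static float onepole_tick_guarded(onepole *f, float in)
{
    float out = in * f->coef + f->state * (1.0f - f->coef);
    if (!isfinite(out))
        out = 0.0f;
    f->state = out;
    return out;
}
```

Note that with -ffast-math (which implies -ffinite-math-only) the
compiler is allowed to assume nan and inf never occur, so a check like
isfinite() may be optimized away.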

I also tried my reverb stuff with specific compiler options for Pi's processor:

-march=armv6zk
-mcpu=arm1176jzf-s
-mtune=arm1176jzf-s

With these options, gcc should be able to decide that RunFast mode is
permitted. But even in combination with -ffast-math (which in turn
sets -funsafe-math-optimizations and -fno-trapping-math amongst
others), denormals are still there. I'm literally out of options for
the moment. Sorry for not having better news.
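
One route that does not depend on compiler flags would be to set the
flush-to-zero bit in the VFP status/control register (FPSCR) directly
from C. The sketch below is an assumption on my side and was not tested
in this thread: the function name is hypothetical, the bit positions
(FZ = bit 24, DN = bit 25) are taken from the ARM documentation, and on
non-ARM builds the function does nothing.

```c
#include <stdint.h>

#define VFP_FPSCR_FZ (1u << 24)  /* flush-to-zero mode */
#define VFP_FPSCR_DN (1u << 25)  /* default NaN mode */

/* Hypothetical helper: enable flush-to-zero and default-NaN on ARM VFP.
   Returns 1 if the register was written, 0 if this build has no VFP. */
static int vfp_enable_flush_to_zero(void)
{
#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    __asm__ volatile ("fmrx %0, fpscr" : "=r" (fpscr));   /* read FPSCR */
    fpscr |= VFP_FPSCR_FZ | VFP_FPSCR_DN;
    __asm__ volatile ("fmxr fpscr, %0" : : "r" (fpscr));  /* write it back */
    return 1;
#else
    return 0;
#endif
}
```

FPSCR is part of the per-thread register state, so a call like this
would presumably have to be made in the thread that runs the DSP code.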

Katja



On Sun, Jan 20, 2013 at 9:51 PM, Miller Puckette m...@ucsd.edu wrote:
 OK.. but try the 0.44 build on my site - the one from Raspbian is quite old :)

 M


Re: [PD] Raspberry Pi does denormals

2013-01-21 Thread katja
Pierre, denormals impact performance on the Pi as follows: whenever an
object with feedback delay (IIR filter, reverb etc.) stops receiving
an input signal, its values decay into the subnormal range, which
causes a substantial increase in CPU load. Such situations can be
avoided by adding a tiny DC value to the object input, like [+~
1e-21] (note the minus sign in the number notation). When a normal
audio signal is present, that number is too small to change the sum
(because of limited precision), but when audio stops, it prevents
subnormals.
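
The effect of the tiny DC value can be sketched in C. This is only a
simulation of a generic one-pole lowpass with an arbitrary coefficient,
not Pd code, and the function names are hypothetical. It counts how
many output samples fall in the subnormal range after the audio input
has gone silent, with and without a bias added to the input.

```c
#include <float.h>

/* Returns 1 if x is nonzero but smaller than the smallest normal float. */
static int is_subnormal(float x)
{
    return x != 0.0f && x > -FLT_MIN && x < FLT_MIN;
}

/* Runs a one-pole lowpass for nsamples with silent audio input plus
   `bias`, and counts the outputs that are subnormal. */
static int count_subnormal_outputs(float bias, int nsamples)
{
    const float coef = 0.1f;  /* arbitrary filter coefficient */
    float state = 1.0f;       /* memory left over from earlier signal */
    int count = 0;
    int i;

    for (i = 0; i < nsamples; i++) {
        float in = bias;      /* audio is silent, only the bias remains */
        state = in * coef + state * (1.0f - coef);
        if (is_subnormal(state))
            count++;
    }
    return count;
}
```

With a bias of 0 the state keeps shrinking until it enters the
subnormal range; with a bias of 1e-21 it settles near 1e-21, which is
still a normal float.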

Another thing is, one should be careful not to accidentally send 'inf'
or 'nan' into such objects, as they cannot recover from it. This
would be particularly annoying in a public performance, since you'd
need to reload the containing patch to recover.

It is possible to prevent denormals via C code, as is currently
done for Pd on Intel processors, but this adds a lot of
conditional checks and it means performance loss for many objects. For
current Intel computers the extra load is not so much of a problem,
but for the poor Raspberry Pi one would rather save a few
instructions than add more.

Katja


On Sun, Jan 20, 2013 at 5:27 PM, Pierre Massat pimas...@gmail.com wrote:
 Hi,

 Could someone please explain how this impacts Pd's performance on the
 Raspberry Pi?
 It doesn't make any sense to me right now, but I'm very curious...

 Cheers,

 Pierre.




Re: [PD] Raspberry Pi does denormals

2013-01-21 Thread Pierre Massat
Hi Katja, thank you for your reply! It is now (slightly) clearer. Every
time you post something here I feel like some messages from a technical
NASA mailing list are being accidentally sent to pd-list!

Cheers,

Pierre.



Re: [PD] Raspberry Pi does denormals

2013-01-20 Thread Hans-Christoph Steiner

I think this is what you want, from 'man gcc'.  It's interesting to note that
the NEON mode, which provides SIMD, also does not do denormals:

-mfpu=name
-mfpe=number
-mfp=number
This specifies what floating point hardware (or hardware emulation) is
available on the target.  Permissible names are: fpa, fpe2, fpe3, maverick,
vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16,
neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16 and neon-vfpv4.  -mfp and
-mfpe are synonyms for -mfpu=fpenumber, for compatibility with older
versions of GCC.

If -msoft-float is specified this specifies the format of floating point
values.

If the selected floating-point hardware includes the NEON extension (e.g.
-mfpu=neon), note that floating-point operations will not be used by GCC's
auto-vectorization pass unless -funsafe-math-optimizations is also
specified.  This is because NEON hardware does not fully implement the IEEE
754 standard for floating-point arithmetic (in particular denormal values
are treated as zero), so the use of NEON instructions may lead to a loss of
precision.


.hc

On 01/20/2013 06:54 AM, katja wrote:
 I was assuming, or maybe just hoping, that Raspberry Pi (and ARM
 devices in general) would not suffer from Denormal's disease like
 Intel processors do. But guess what: Pi's float coprocessor is IEEE
 754 compliant and does all denormals by default (you can check with
 attached denorm-test.pd). Bummer! As if one would use an ARM device to
 calculate the size of a Majorana particle, rather than doing simple
 dsp. Do we really need to enable PD_BIGORSMALL() checks for this poor
 little processor? There seems to be something called 'RunFast mode'
 for Pi's float processor vfpv2, but I see no way to enable this
 via gcc. Option -ffast-math is allowed but doesn't do the trick. I can't
 find an option to set vfpv2 specifically in the gcc docs.
 
 Katja
 
 
 


Re: [PD] Raspberry Pi does denormals

2013-01-20 Thread Miller Puckette
Hi all...

I think it's possible to get flush-to-zero behavior on the Pi (ARMv6) by
calling gcc with --fast-math.  At any rate what I found was that, if I
compiled without --fast-math, when numbers got small (e.g., when a
reverberator decays down past 10^-38 or so), the patch would suddenly jump
in CPU usage as if it were trapping to the kernel (as it does for i386).
But when I added --fast-math the problem went away.

On i386 and x86_64, I believe that one can't get flush-to-zero (at least in
the normal non-SSE floating point instructions) so there's no choice but
to use a macro such as PD_BADFLOAT to protect against that.  So in m_pd.h the
PD_BADFLOAT macro is only turned on for Intel.
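
For readers who have not seen such a macro: the idea is to look at the
exponent bits of the float and treat anything with an all-zero exponent
(zero and subnormals) or an all-one exponent (inf and nan) as "bad".
The sketch below is my own illustration of that idea, written as plain
functions with hypothetical names; it is not copied from m_pd.h and
assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if f is zero, subnormal, inf or nan, judged by its
   exponent field alone (assumes 32-bit IEEE 754 floats). */
static int is_bad_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* read the bit pattern safely */
    bits &= 0x7f800000u;             /* keep only the 8 exponent bits */
    return bits == 0u || bits == 0x7f800000u;
}

/* Replaces a "bad" value by zero, as a filter would do before
   storing its feedback state. */
static float flush_bad_float(float f)
{
    return is_bad_float(f) ? 0.0f : f;
}
```

The cost is one extra test per stored sample, which is the kind of
conditional check the thread is weighing against flush-to-zero in
hardware.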

However I've been mistaken many times about all this in the past and won't
be surprised if I'm mistaken again.

cheers
Miller



Re: [PD] Raspberry Pi does denormals

2013-01-20 Thread katja
Miller, the vanilla Pd which can be installed from Raspbian with
apt-get or Synaptic does have the subnormals problem, as can be
checked with the test patch attached to my first post. When the input
signal to [lop~] is shut off, CPU load increases substantially. Output
values go down to the order of 1e-44, in the subnormal range. I was
working on reverb algos showing the same problem, and compiled with
option -ffast-math / --fast-math to see if that would turn on RunFast
mode, but it didn't.
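
To get a feeling for how quickly a decaying feedback loop leaves the
normal float range, one can count the samples in C. This is a rough
sketch with a hypothetical function name; `feedback` stands for the
per-sample decay factor of some filter or reverb, assumed to be
non-negative, and is not a value taken from [lop~].

```c
#include <float.h>

/* Starting from 1.0, multiplies by `feedback` once per sample and
   returns how many samples it takes until the value drops below
   FLT_MIN (the smallest normal float, about 1.18e-38).
   Gives up and returns max_samples if that never happens. */
static int samples_until_below_normal(float feedback, int max_samples)
{
    float state = 1.0f;
    int n = 0;

    while (n < max_samples && state >= FLT_MIN) {
        state *= feedback;
        n++;
    }
    return n;
}
```

For a decay factor of 0.9 this takes somewhat more than 800 samples,
which is only a fraction of a second at audio rate.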

I'm not familiar with ARM and its coprocessors, but from Intel I do
know that gcc doesn't implement certain specified optimization options
(notably SSE versions) unless you also mention a processor type that
can handle them. A similar case could be with RPi's vfpv2: it can do
RunFast mode but gcc doesn't implement it until you find a way to
specify vfpv2 (vfpv1 can't do RunFast). Miller, if you succeeded in
getting automatic flush-to-zero on the Pi, it may be related to other
flags which you've set. Arch flags which I've set so far are
-march=armv6 and -mfpu=vfp. Option -mfpu=vfpv2 is not allowed. I would
be happy to do further testing with compiler options, if you know
some. I've found that the big-or-small checks are rather expensive for
the RPi.

Katja




Re: [PD] Raspberry Pi does denormals

2013-01-20 Thread katja
Hans, the info about NEON is relevant for armv7 (Beagleboard,
Cubieboard, PengPod...). But Raspberry Pi doesn't have NEON. Float
processing is done on coprocessor vfpv2. As far as I can see, vfpv2
hardly has any SIMD instructions (except for moving data between ARM
and vfp). It is said to process a maximum of 8 single precision floats
in parallel, but Raspberry Pi doesn't show a sign that it profits from
data alignment, at least not when code is compiled with gcc.
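
Editorial sketch, not from the thread: on VFP, flush-to-zero is a run-time
setting in the FPSCR control register rather than something selected by
-mfpu. The bit positions below (FZ = bit 24, DN = bit 25, trap enables in
bits 8-12 and 15) follow ARM's VFP documentation as far as I know. The
helper names fpscr_runfast_bits() and enable_runfast() are hypothetical,
and the inline assembly assumes a GCC-compatible compiler targeting
hardware VFP.

```c
#include <stdint.h>

/* FPSCR bit positions (assumed from ARM's VFP documentation). */
#define FPSCR_FZ        (1u << 24)                  /* flush-to-zero */
#define FPSCR_DN        (1u << 25)                  /* default NaN */
#define FPSCR_TRAP_MASK ((0x1Fu << 8) | (1u << 15)) /* exception trap enables */

/* Pure helper: compute the FPSCR value with RunFast-style settings,
   i.e. FZ and DN set and all exception traps disabled. */
static uint32_t fpscr_runfast_bits(uint32_t fpscr)
{
    return (fpscr | FPSCR_FZ | FPSCR_DN) & ~FPSCR_TRAP_MASK;
}

/* Hypothetical helper: apply the settings to the calling thread.
   Returns 1 if FPSCR was written, 0 if not built for hardware VFP. */
static int enable_runfast(void)
{
#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr = fpscr_runfast_bits(fpscr);
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
    return 1;
#else
    return 0;   /* nothing to do on other targets */
#endif
}
```

FPSCR is part of the per-thread register state, so a call like
enable_runfast() would presumably have to happen in the thread that runs
the DSP loop. Whether this removes the CPU spikes on the Pi is untested here.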

Katja


On Sun, Jan 20, 2013 at 5:12 PM, Hans-Christoph Steiner h...@at.or.at wrote:

 I think this is what you want, from 'man gcc'.  It's interesting to note that
 the NEON mode, which provides SIMD, also does not do denormals:

 -mfpu=name
 -mfpe=number
 -mfp=number
 This specifies what floating point hardware (or hardware emulation) is
 available on the target.  Permissible names are: fpa, fpe2, fpe3, maverick,
 vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16,
 neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16 and neon-vfpv4.  -mfp and
 -mfpe are synonyms for -mfpu=fpenumber, for compatibility with older
 versions of GCC.

 If -msoft-float is specified this specifies the format of floating point
 values.

 If the selected floating-point hardware includes the NEON extension (e.g.
 -mfpu=neon), note that floating-point operations will not be used by GCC's
 auto-vectorization pass unless -funsafe-math-optimizations is also
 specified.  This is because NEON hardware does not fully implement the IEEE
 754 standard for floating-point arithmetic (in particular denormal values
 are treated as zero), so the use of NEON instructions may lead to a loss of
 precision.


 .hc

 On 01/20/2013 06:54 AM, katja wrote:
 I was assuming, or maybe just hoping, that Raspberry Pi (and ARM
 devices in general) would not suffer from denormals disease like
 Intel processors do. But guess what: Pi's float coprocessor is IEEE
 754 compliant and handles all denormals by default (you can check with
 the attached denorm-test.pd). Bummer! As if one would use an ARM device to
 calculate the size of a Majorana particle, rather than doing simple
 DSP. Do we really need to enable PD_BIGORSMALL() checks for this poor
 little processor? There seems to be something called 'RunFast mode'
 for Pi's float processor vfpv2, but I see no way to enable this
 via gcc. Option -ffast-math is accepted but doesn't do the trick. I can't
 find an option to set vfpv2 specifically in the gcc docs.

 Katja





Re: [PD] Raspberry Pi does denormals

2013-01-20 Thread Miller Puckette
OK.. but try the 0.44 build on my site - the one from Raspbian is quite old :)

M

On Sun, Jan 20, 2013 at 09:28:30PM +0100, katja wrote:
 Miller, the vanilla Pd which can be installed from Raspbian with
 apt-get or Synaptic does have the subnormals problem, as can be
 checked with the test patch attached to my first post. When an input
 signal to [lop~] is shut off, CPU load increases substantially. Output
 values decay to the order of 1e-44, in the subnormal range. I was working
 on reverb algorithms showing the same problem, and compiled with option
 -ffast-math / --fast-math to see if that would turn on RunFast mode,
 but it didn't.
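
Editorial sketch, not from the thread: the decay described above can be
reproduced outside Pd with a simplified model. A one-pole lowpass with zero
input reduces to y[n] = coef * y[n-1]. This is an illustration only, not
Pd's [lop~] code; decay_state() and is_subnormal() are hypothetical names,
and the result assumes default IEEE 754 arithmetic (no flush-to-zero, no
-ffast-math).

```c
#include <float.h>

/* Run the zero-input recursion y[n] = coef * y[n-1] for `steps` samples,
   starting from 1.0, and return the final state. */
static float decay_state(float coef, int steps)
{
    volatile float y = 1.0f;   /* volatile: force each step through a float store */
    for (int i = 0; i < steps; i++)
        y = y * coef;
    return y;
}

/* Returns 1 if f is nonzero but smaller in magnitude than FLT_MIN,
   i.e. a subnormal (denormal) single-precision value. */
static int is_subnormal(float f)
{
    float mag = (f < 0.0f) ? -f : f;
    return mag != 0.0f && mag < FLT_MIN;
}
```

With coef = 0.99f the state should not decay all the way to zero: once it is
a few dozen units above the smallest subnormal, rounding to nearest returns
the same value on every step, so the filter keeps computing with subnormals
indefinitely. Calling is_subnormal(decay_state(0.99f, 20000)) is one way to
check this on a given build.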
 
 I'm not familiar with ARM and its coprocessors, but from Intel I do
 know that gcc doesn't implement certain specified optimization options
 (notably SSE versions) unless you also mention a processor type that
 can handle them. A similar case could apply to RPi's vfpv2: it can do
 RunFast mode but gcc doesn't implement it until you find a way to
 specify vfpv2 (vfpv1 can't do RunFast). Miller, if you succeeded in
 getting automatic flush-to-zero on the Pi, it may be related to other
 flags which you've set. The arch flags I've set so far are
 -march=armv6 and -mfpu=vfp. Option -mfpu=vfpv2 is not allowed. I would
 be happy to do further testing with compiler options, if you know
 some. The big-or-small checks are rather expensive for RPi, that's
 what I've found.
 
 Katja
 
 
 On Sun, Jan 20, 2013 at 8:24 PM, Miller Puckette m...@ucsd.edu wrote:
  Hi all...
 
  I think it's possible to get flush-to-zero behavior on the Pi (ARMv6) by
  calling gcc with --fast-math.  At any rate what I found was that, if I
  compiled without --fast-math, when numbers got small (e.g., when a
  reverberator decays down past 10^-38 or so), the patch would suddenly jump
  in CPU usage as if it were trapping to the kernel (as it does for i386).
  But when I added --fast-math the problem went away.
 
  On i386 and x86_64, I believe that one can't get flush-to-zero (at least in
  the normal non-SSE floating point instructions) so there's no choice but
  to use a macro such as PD_BADFLOAT to protect against that.  So in m_pd.h the
  PD_BADFLOAT macro is only turned on for Intel.
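
Editorial sketch, not from the thread: a protective check of this kind can
be written as a test on the float's exponent bits. The version below flags
values whose two upper exponent bits are both clear (magnitude below about
1e-19, including subnormals and zero) or both set (magnitude above about
3e19, including inf and NaN). The mask 0x60000000 and the names
big_or_small() and flush_state() are assumptions for illustration; the
actual definitions in m_pd.h should be checked before relying on this.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when f is very small or very large in magnitude,
   judged by the two upper bits of its 8-bit exponent field. */
static int big_or_small(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* well-defined way to read the bit pattern */
    uint32_t top = bits & 0x60000000u;   /* exponent bits 7 and 6 */
    return top == 0u || top == 0x60000000u;
}

/* Hypothetical helper: replace an out-of-range filter state with zero. */
static float flush_state(float f)
{
    return big_or_small(f) ? 0.0f : f;
}
```

A filter would apply flush_state() to its stored state once per block, so
the state snaps to zero long before it reaches the subnormal range. The
test is integer-only, but it still costs a few instructions per call.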
 
  However I've been mistaken many times about all this in the past and won't
  be surprised if I'm mistaken again.
 
  cheers
  Miller
 
  On Sun, Jan 20, 2013 at 11:12:28AM -0500, Hans-Christoph Steiner wrote:
 
  I think this is what you want, from 'man gcc'.  It's interesting to note that
  the NEON mode, which provides SIMD, also does not do denormals:
 
  -mfpu=name
  -mfpe=number
  -mfp=number
  This specifies what floating point hardware (or hardware emulation) is
  available on the target.  Permissible names are: fpa, fpe2, fpe3, maverick,
  vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16,
  neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16 and neon-vfpv4.  -mfp and
  -mfpe are synonyms for -mfpu=fpenumber, for compatibility with older
  versions of GCC.
 
  If -msoft-float is specified this specifies the format of floating point
  values.
 
  If the selected floating-point hardware includes the NEON extension (e.g.
  -mfpu=neon), note that floating-point operations will not be used by GCC's
  auto-vectorization pass unless -funsafe-math-optimizations is also
  specified.  This is because NEON hardware does not fully implement the IEEE
  754 standard for floating-point arithmetic (in particular denormal values
  are treated as zero), so the use of NEON instructions may lead to a loss of
  precision.
 
 
  .hc
 
  On 01/20/2013 06:54 AM, katja wrote:
   I was assuming, or maybe just hoping, that Raspberry Pi (and ARM
   devices in general) would not suffer from denormals disease like
   Intel processors do. But guess what: Pi's float coprocessor is IEEE
   754 compliant and handles all denormals by default (you can check with
   the attached denorm-test.pd). Bummer! As if one would use an ARM device to
   calculate the size of a Majorana particle, rather than doing simple
   DSP. Do we really need to enable PD_BIGORSMALL() checks for this poor
   little processor? There seems to be something called 'RunFast mode'
   for Pi's float processor vfpv2, but I see no way to enable this
   via gcc. Option -ffast-math is accepted but doesn't do the trick. I can't
   find an option to set vfpv2 specifically in the gcc docs.
  
   Katja
  
  
  
 
  ___
  Pd-list@iem.at mailing list
  UNSUBSCRIBE and account-management -