Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-14 Thread Joerg Wunsch
Dave N6NZ [EMAIL PROTECTED] wrote:

 the Atmel 802.15.4 MAC,

 Need to check license on that one -- but a good choice otherwise

BSD-style.

 If it is desired to have it in a more neutral place, such as
 avr-libc, I'm open to that too, if Joerg Wunsch is willing.

 Seems to me that as long as they are publicly available under an
 appropriate license, it doesn't really matter much who backs them up
 :)

Agreed, I think both locations (sf.net, or savannah.nongnu.org) would
do fine.

-- 
cheers, Joerg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread John Regehr
I'll just plug Avrora again:

  http://compilers.cs.ucla.edu/avrora/

It runs on many platforms (it's written entirely in Java), is quite
fast, and is well designed.  Best of all, it is easy to extend: you
just add monitors that can be configured to receive a wide variety of
callbacks about program events such as memory operations, I/O
operations, execution of different kinds of instructions, interrupts,
etc.

 focus on AVR-specific code, and GCC-specific AVR code at that.

Definitely.  If people want to test avr-gcc against other compilers, or 
compare AVR to other architectures, that's a separate exercise.

MiBench is an aging but useful collection of embedded C codes:

  http://www.eecs.umich.edu/mibench/

 John, I would welcome publicly available code from TinyOS, but it would
 need to be already compiled with nesC, so that we just have straight
 C that we can feed into avr-gcc.

Sure, this is easy.  It'll target the ATmega128 only, however.

Re: floating point, I believe that the papabench codes do a lot of this:

http://www.irit.fr/recherches/ARCHI/MARCH/rubrique.php3?id_rubrique=97

This is code extracted from the Paparazzi UAV project, which uses an 
ATmega for onboard flight control.

 There needs to be some consensus on what we measure, how we measure it,
 what output files we want generated, and hopefully some way to
 automatically generate composite results. I'm certainly open to anything
 in this area.

Code size and static RAM consumption are obvious.  Some sort of
throughput metric is useful.  For interrupt-driven codes, my group often
uses processor duty cycle as a measure of efficiency: the percentage of
time that the CPU is not in a sleep mode.  Dynamic stack memory
consumption is also good, though it is not a very consistent metric for
interrupt-driven codes, since in a short simulation run the worst-case
stack usage is unlikely to be encountered.  Perhaps adding up the stack
memory usage of main plus all interrupts would be better.
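
One way to get at the stack number outside a simulator is the classic
stack-painting trick.  Here is a minimal sketch, assuming avr-gcc and
avr-libc with the usual linker-provided _end and __stack symbols; the
function names are illustrative, not from any existing suite:

  #include <stdint.h>

  #define STACK_CANARY 0xC5

  extern uint8_t _end;      /* first free byte after .data/.bss */
  extern uint8_t __stack;   /* initial stack pointer (RAMEND) */

  /* Placed in .init3 so it runs after SP is set up but before main(),
   * filling all currently unused RAM with a known pattern. */
  __attribute__((naked, used, section(".init3")))
  void stack_paint(void)
  {
      uint8_t *p = &_end;
      while (p <= &__stack)
          *p++ = STACK_CANARY;
  }

  /* Call at the end of a run: scan upward from _end for the first
   * overwritten canary byte.  Everything above it was touched by the
   * stack, so worst-case usage = total free RAM - returned count. */
  uint16_t stack_free_bytes(void)
  {
      const uint8_t *p = &_end;
      uint16_t count = 0;
      while (p <= &__stack && *p == STACK_CANARY) {
          p++;
          count++;
      }
      return count;
  }

The same caveat applies, of course: the canaries only record the worst
case that actually occurred during the run.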

John Regehr




Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread Dave N6NZ

Weddington, Eric wrote:

Hi John, Dave, others,

Here are some random thoughts about a benchmark test suite:

- GCC has a page on benchmarks:
http://gcc.gnu.org/benchmarks/
However, all of those are geared towards larger processors and host
systems. There is a link to a benchmark that focuses on code size,
CSiBE, http://www.inf.u-szeged.hu/csibe/, but that benchmark, too, is
geared towards larger processors.

This creates a need for a benchmark that is geared towards 8-bit
microcontroller environments in general, and towards the AVR
specifically.

What would we like to test?

Code size for sure. Everyone always seems to be interested in code size.
There is an interest in seeing how the GCC compiler performs from one
version to the next, to see if optimizations have improved or if they
have regressed.


Which I would call regression tests, not benchmarks, per se.  Among
performance regressions, I would guess that code size regressions under
-Os are the #1 priority for the typical user.  (A friend is currently
tearing his hair out over a code size regression in a commercial PIC C
compiler -- he needs to release a minor firmware update to the field...
but not even the original code fits his flash any more...)


It's worth drawing a distinction between benchmarks and regression
tests.  They need to be written differently.  A regression test needs to
sensitize a particular condition, and needs to be small enough to be
debuggable.  Benchmarks need to be realistic, which often makes them
harder to debug.  I say we need both.  The performance regression tests
can easily roll into release criteria.  A suite of performance
benchmarks is more useful as a confirmatory measure of goodness -- but
actual mysteries in the aggregate score will most likely be chased down
with smaller, targeted tests.
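
As a hypothetical illustration of the small-and-targeted end of the
spectrum (not an actual test from any suite), a size-regression test
might be nothing more than one routine that sensitizes a single
code-generation pattern:

  /* Sensitizes 16-bit multiply-accumulate code generation.  Build with
   * avr-gcc -Os -mmcu=atmega128 -c, then compare the .text size from
   * avr-size against a recorded baseline to catch -Os regressions. */
  #include <stdint.h>

  uint16_t mac16(const uint16_t *a, const uint16_t *b, uint8_t n)
  {
      uint16_t acc = 0;
      while (n--)
          acc += (uint16_t)(*a++ * *b++);
      return acc;
  }

Small enough to debug by reading the generated assembly, and cheap
enough to roll into release criteria.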


My guess is that existing tests may help us a lot in the benchmark
category, but the regression tests will require some elbow grease on our
part to get a good set.  There's a good chance we can extract good
regression tests from existing benchmark-sized tests.


A semi-related question is how many of these tests can be pushed
upstream?  If we could get a handful of uCtlr-oriented code size
regression tests packaged up so that the developers of the generic
optimizer could run them as release criteria, it would, I would think,
improve the overall quality of gcc for all uCtlr targets.

There is also an interest in comparing AVR compilers, such as how GCC
compares to IAR, Codevision or ImageCraft compilers.


Who is interested? gcc developers, as a means to keep gcc competitive?
Or potential users?  The former is benchmarking, the latter is moving
towards bench-marketing. Not that marketing is bad, but that sort of
thing can be a distraction.  In any case, the tests that are meaningful
here are the "overall goodness" benchmark suite, not the targeted test
suite.

And sometimes there is an interest in comparing AVR against other
microcontrollers, notably Microchip's PIC and TI's MSP430.


Different processors with the same compiler?  Different processors,
each with its best compiler? -- Now this is beginning to sound like
SPEC.

Because there are these different interests, it is challenging to come
up with appropriate code samples to showcase and benchmark these
different issues. But we could also implement this in stages, and focus
on AVR-specific code, and GCC-specific AVR code at that.


Clarity of classification is important.  Different buckets for
different issues.

If we are going to put together a benchmark test suite, like the other
benchmarks for GCC (for larger processors), then I would think that it
would be better to model it somewhat after those other benchmarks. I see
that they tend to use publicly available code, and a variety of
different types of applications.


For benchmarking, and bench-marketing, that's a good approach.  I'll be
redundant and say those are probably not what you want to be debugging.
It would make sense for what I'll call an "avr-gcc dashboard": a web
page with a bunch of bar graphs on it, with a summary bar at the top
that is the weighted sum of the individual test bars.  As an avr-gcc
user, that kind of summary page would be very useful from one release to
the next for setting expectations regarding performance on your own
application. As an avr-gcc release master, it's a good dashboard for
tracking progress and release-worthiness.
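
As a sketch of what that summary number might look like (the project
names, sizes, and weights below are made up purely for illustration),
one could normalize each test against a baseline release and combine
the ratios; a weighted geometric mean keeps one outlier test from
dominating the composite:

  #include <math.h>
  #include <stdio.h>

  struct result {
      const char *name;
      double baseline_bytes;  /* .text size under the reference release */
      double current_bytes;   /* .text size under the release under test */
      double weight;          /* relative importance; weights sum to 1.0 */
  };

  /* Weighted geometric mean of the size ratios: a result below 1.0
   * means the new release produces smaller code overall. */
  static double composite(const struct result *r, int n)
  {
      double log_sum = 0.0;
      for (int i = 0; i < n; i++)
          log_sum += r[i].weight *
                     log(r[i].current_bytes / r[i].baseline_bytes);
      return exp(log_sum);
  }

  int main(void)
  {
      struct result r[] = {
          { "FreeRTOS",  10240, 10100, 0.4 },
          { "uIP",        8192,  8300, 0.3 },
          { "Butterfly", 12288, 12000, 0.3 },
      };
      printf("composite size ratio: %.3f\n", composite(r, 3));
      return 0;
  }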



 We should have something similar. Some suggested projects: FreeRTOS
 (for the AVR),

Sounds good.

 uIP (however, we need to pick a specific implementation of it for the
 AVR; I have a copy of uIP-Crumb644),

Another good one.

 the Atmel 802.15.4 MAC,

Need to check license on that one -- but a good choice otherwise.

 and the GCC version of the Butterfly firmware. I also have a copy of
 the TI Competitive Benchmark, which they, and other semiconductor
 companies, have used to do comparisons between processors.

Not familiar with it.  Also, check the license.

Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread John Regehr
 (A friend is currently tearing his hair out
 over a code size regression in a commercial PIC C compiler -- he needs to
 release a minor firmware update to the field... but not even the original code
 fits his flash any more...)

Embedded compiler rule #1: If you find a version of the compiler that 
works, keep a copy around for the life of the product.
 
John




RE: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread Weddington, Eric
 

 -Original Message-
 From: Dave N6NZ [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, January 13, 2008 4:19 PM
 To: Weddington, Eric
 Cc: John Regehr; avr-gcc-list@nongnu.org
 Subject: Re: AVR Benchmark Test Suite [was: RE: 
 [avr-gcc-list] GCC-AVR Register optimisations]
 
 
 
 Weddington, Eric wrote:
 
 
 It's worth drawing a distinction between benchmarks and regression
 tests.  They need to be written differently.  A regression test needs
 to sensitize a particular condition, and needs to be small enough to
 be debuggable.  Benchmarks need to be realistic, which often makes
 them harder to debug.  I say we need both.  The performance regression
 tests can easily roll into release criteria.  A suite of performance
 benchmarks is more useful as a confirmatory measure of goodness -- but
 actual mysteries in the aggregate score will most likely be chased
 down with smaller, targeted tests.

Ok. Regression tests should really fit within the GCC regression test
framework. I would rather not duplicate the work that they have there.
So I'm really looking for benchmark tests, under your definition. That's
not to say I want to ignore the regression tests. I just want to fill a
gap that exists for the AVR.

 
 A semi-related question is how many of these tests can be pushed
 upstream?  If we could get a handful of uCtlr-oriented code size
 regression tests packaged up so that the developers of the generic
 optimizer could run them as release criteria, it would, I would
 think, improve the overall quality of gcc for all uCtlr targets.

Nothing can be pushed upstream right now. As I mentioned in another post
in this thread, the AVR target is not that important in the eyes of the
GCC project membership at large. I'm working diligently to change that.
But it's one of those cases where, if we want something done, we have to
do it ourselves.

 
  
  There is also an interest in comparing AVR compilers, such as how
  GCC compares to IAR, Codevision or ImageCraft compilers.

 Who is interested? gcc developers, as a means to keep gcc competitive?
 Or potential users?  The former is benchmarking, the latter is moving
 towards bench-marketing. Not that marketing is bad, but that sort of
 thing can be a distraction.  In any case, the tests that are
 meaningful here are the "overall goodness" benchmark suite, not the
 targeted test suite.

As a gcc developer, I am interested in some kind of metric to keep gcc
competitive with other AVR compilers. Honestly, it seems to be an urban
myth that IAR optimizes better than GCC. Is that really true? For what
applications? For what compiler switches? Eventually I'd like to have
something definitive to combat any FUD.

I don't want to get into bench-marketing. I would really like to have
something meaningful and of real value, and not have to tweak numbers to
arrive at good-looking results to show off. If AVR GCC sucks in an area,
I don't want to paper over it. I want to show it, so we know what needs
improvement.

 
  
  And sometimes there is an interest in comparing AVR against other
  microcontrollers, notably Microchip's PIC and TI's MSP430.

 Different processors with the same compiler?  Different processors,
 each with its best compiler? -- Now this is beginning to sound like
 SPEC.

Well, lofty goals for sure. I don't want to get outside of the 8-bit
microcontroller realm. I certainly want to do first things first. But I
think it might be interesting, at some point in the future, if some of
those things could be achieved.
 
  
  If we are going to put together a benchmark test suite, like the
  other benchmarks for GCC (for larger processors), then I would think
  that it would be better to model it somewhat after those other
  benchmarks. I see that they tend to use publicly available code, and
  a variety of different types of applications.

 For benchmarking, and bench-marketing, that's a good approach.  I'll
 be redundant and say those are probably not what you want to be
 debugging.  It would make sense for what I'll call an "avr-gcc
 dashboard": a web page with a bunch of bar graphs on it, with a
 summary bar at the top that is the weighted sum of the individual
 test bars.  As an avr-gcc user, that kind of summary page would be
 very useful from one release to the next for setting expectations
 regarding performance on your own application. As an avr-gcc release
 master, it's a good dashboard for tracking progress and
 release-worthiness.

That's definitely the idea.
 
 
  the Atmel 802.15.4 MAC,
 Need to check license on that one -- but a good choice otherwise

:-)
 
  and the GCC version of the Butterfly firmware. I also have a copy of
  the TI Competitive Benchmark, which they, and other semiconductor
  companies, have used to do comparisons between processors.

 Not familiar with it.  Also, check the license.  Processor
 manufacturers (like, oh, for instance, *all* the several I have
 worked for) are very touchy about benchmarks and benchmark