Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
Dave N6NZ <[EMAIL PROTECTED]> wrote:

>>> the Atmel 802.15.4 MAC,
>> Need to check license on that one -- but a good choice otherwise
> BSD-style. If it is desired to have it in a more neutral place, such as
> avr-libc, I'm open to that too, if Joerg Wunsch is willing. Seems to me
> that as long as they are publicly available under an appropriate
> license, it doesn't really matter much who backs them up :)

Agreed, I think both locations (sf.net, or savannah.nongnu.org) would do
fine.

-- 
cheers, Joerg               .-.-.   --... ...--   -.. .  DL8DTL
http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
I'll just plug Avrora again:

  http://compilers.cs.ucla.edu/avrora/

It runs on many platforms (it is written entirely in Java), is quite
fast, and is well designed. Best of all, it is easy to extend: you just
add monitors that can be configured to receive a wide variety of
callbacks about program events such as memory operations, I/O operations,
execution of different kinds of instructions, interrupts, etc.

> focus on AVR-specific code, and GCC-specific AVR code at that.

Definitely. If people want to test avr-gcc against other compilers, or
compare AVR to other architectures, that's a separate exercise. MiBench
is an aging but useful collection of embedded C codes:

  http://www.eecs.umich.edu/mibench/

> John, I would welcome publicly available code from TinyOS, but it would
> need to be already compiled with nesc, so that we just have straight C
> that we can feed into avr-gcc.

Sure, this is easy. It'll target ATmega128 only, however.

Re. floating point, I believe that the papabench codes do a lot of this:

  http://www.irit.fr/recherches/ARCHI/MARCH/rubrique.php3?id_rubrique=97

This is code extracted from the Paparazzi UAV project, which uses an
ATmega for onboard flight control.

> There needs to be some consensus on what we measure, how we measure it,
> what output files we want generated, and hopefully some way to
> automatically generate composite results.

I'm certainly open to anything in this area. Code size and static RAM
consumption are obvious. Some sort of throughput metric is useful. For
interrupt-driven codes, my group often uses processor duty cycle as a
measure of efficiency: the percentage of time that the CPU is not in a
sleep mode. Dynamic stack memory consumption is good too, though it is
not a very consistent metric for interrupt-driven codes, since in a short
simulation run the worst-case stack usage is unlikely to be encountered.
Perhaps adding up the stack memory usage of main plus all interrupts
would be better.
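[Ed. note: the two "obvious" metrics above, code size and static RAM, can be read straight off the toolchain. A minimal sketch, assuming Berkeley-format `avr-size` output (the sample table below is invented): flash occupancy is .text plus .data (the .data initializers live in flash), and static RAM is .data plus .bss.]

```python
# Sketch: derive flash footprint and static RAM from Berkeley-format
# `avr-size` output. The sample output below is invented for illustration.
SAMPLE = """\
   text    data     bss     dec     hex filename
  12345     456     789   13590    3516 firmware.elf
"""

def parse_avr_size(output):
    """Return (flash_bytes, static_ram_bytes) from Berkeley-format output."""
    header, row = [line.split() for line in output.strip().splitlines()[:2]]
    fields = dict(zip(header, row))
    text, data, bss = (int(fields[k]) for k in ("text", "data", "bss"))
    # Flash holds .text plus the initializers for .data;
    # static RAM is .data plus .bss.
    return text + data, data + bss

flash, ram = parse_avr_size(SAMPLE)
print(flash, ram)  # 12801 1245
```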
John Regehr
Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
Weddington, Eric wrote:

> Hi John, Dave, others,
>
> Here are some random thoughts about a benchmark test suite:
>
> - GCC has a page on benchmarks: http://gcc.gnu.org/benchmarks/
>   However, all of those are geared towards larger processors and host
>   systems. There is a link to a benchmark that focuses on code size,
>   CSiBE, http://www.inf.u-szeged.hu/csibe/. Again, that benchmark is
>   geared towards larger processors. This creates a need to have a
>   benchmark that is geared towards 8-bit microcontroller environments
>   in general, and specifically for the AVR.
>
> What would we like to test? Code size for sure. Everyone always seems
> to be interested in code size. There is an interest in seeing how the
> GCC compiler performs from one version to the next, to see if
> optimizations have improved or if they have regressed.

Which I would call regression tests, not benchmarks, per se. Of
performance regressions, I would guess that code size regressions under
-Os are the #1 priority for the typical user. (A friend is currently
tearing his hair out over a code size regression in a commercial PIC C
compiler -- he needs to release a minor firmware update to the field...
but not even the original code fits his flash any more...)

It's worth drawing a distinction between benchmarks and regression tests.
They need to be written differently. A regression test needs to sensitize
a particular condition, and needs to be small enough to be debuggable. A
benchmark needs to be realistic, which often makes benchmarks harder to
debug. I say we need both. The performance regression tests can easily
roll into release criteria. A suite of performance benchmarks is more
useful as a confirmatory measure of goodness -- but actual mysteries in
the aggregate score will most likely be chased with smaller tests. My
guess is that existing tests may help us a lot in the benchmark category,
but the regression tests will require some elbow grease on our part to
get a good set.
There's a good chance we can extract good regression tests from existing
benchmark-sized tests. A semi-related question is how many of these tests
can be pushed upstream? If we could get a handful of uCtlr-oriented code
size regression tests packaged up so that the developers of the generic
optimizer could run them as release criteria, it would, I would think,
improve the overall quality of gcc for all uCtlr targets.

> There is also an interest in comparing AVR compilers, such as how GCC
> compares to IAR, Codevision or ImageCraft compilers.

Who is interested? gcc developers, as a means to keep gcc competitive? Or
potential users? The former is benchmarking; the latter is moving towards
bench-marketing. Not that marketing is bad, but that sort of thing can be
a distraction. In any case, the tests that are meaningful here are the
overall-goodness benchmark suite, not the targeted test suite.

> And sometimes there is an interest in comparing AVR against other
> microcontrollers, notably Microchip's PIC and TI's MSP430.

Different processor with same compiler? Different processor with best
compiler? -- Now this is beginning to sound like SPEC.

> Because there are these different interests, it is challenging to come
> up with appropriate code samples to showcase and benchmark these
> different issues. But we could also implement this in stages, and focus
> on AVR-specific code, and GCC-specific AVR code at that.

Clarity of classification is important. Different buckets for different
issues.

> If we are going to put together a benchmark test suite, like other
> benchmarks for GCC (for larger processors), then I would think that it
> would be better to model it somewhat after those other benchmarks. I
> see that they tend to use publicly available code, and a variety of
> different types of applications.

For benchmarking, and bench-marketing, that's a good approach. I'll be
redundant and say those are probably not what you want to be debugging.
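[Ed. note: the code-size regression gate described above reduces to a per-test size comparison against the previous compiler build, with a tolerance. A rough sketch; the test names and byte counts are invented for illustration:]

```python
# Sketch of a code-size regression gate: compare per-test sizes produced
# by an old and a new compiler build and flag any growth beyond a
# fractional tolerance. All names and numbers below are invented.
OLD = {"uip_tcp": 9120, "freertos_blink": 4410, "mac_802154": 15980}
NEW = {"uip_tcp": 9104, "freertos_blink": 4630, "mac_802154": 15990}

def size_regressions(old, new, tolerance=0.02):
    """Return {test: (old_size, new_size)} for tests that grew > tolerance."""
    return {name: (old[name], new[name])
            for name in old
            if name in new and new[name] > old[name] * (1 + tolerance)}

# freertos_blink grew ~5%, past the 2% tolerance; the others pass.
print(size_regressions(OLD, NEW))  # {'freertos_blink': (4410, 4630)}
```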
It would make sense for what I'll call an avr-gcc dashboard. I see a web
page with a bunch of bar graphs on it: a summary bar at the top that is
the weighted sum of the individual test bars. As an avr-gcc user, that
kind of summary page would be very useful from one release to the next
for setting expectations regarding performance on your own application.
As an avr-gcc release master, it's a good dashboard for tracking progress
and release worthiness. We should have something similar.

> Some suggested projects: FreeRTOS (for the AVR),

Sounds good,

> uIP (however, we need to pick a specific implementation of it for the
> AVR; I have a copy of uIP-Crumb644),

Another good one,

> the Atmel 802.15.4 MAC,

Need to check license on that one -- but a good choice otherwise

> and the GCC version of the Butterfly firmware. I also have a copy of
> the TI Competitive Benchmark, which they, and other semiconductor
> companies, have used to do comparisons between processors.

Not familiar with it. Also, check the license.
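[Ed. note: the dashboard's summary bar, a weighted sum of the individual test bars, might be computed like this. The weights, metric names, and scores are invented; 1.0 means parity with the baseline release, below 1.0 means a regression:]

```python
# Sketch of the dashboard summary bar: a weighted mean of per-metric
# scores, each normalized against a baseline release. All values invented.
WEIGHTS = {"code_size": 0.5, "throughput": 0.3, "stack_usage": 0.2}

def composite_score(scores, weights=WEIGHTS):
    """Weighted mean of relative per-metric scores (1.0 = baseline)."""
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total

# Example release: code size 3% worse, throughput 5% better, stack flat.
release = {"code_size": 0.95, "throughput": 1.05, "stack_usage": 1.00}
print(round(composite_score(release), 3))  # 0.99
```

The weights are the contentious part; whatever consensus emerges on what to measure would determine them, and the per-test bars would feed in the same way.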
Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
> (A friend is currently tearing his hair out over a code size regression
> in a commercial PIC C compiler -- he needs to release a minor firmware
> update to the field... but not even the original code fits his flash
> any more...)

Embedded compiler rule #1: If you find a version of the compiler that
works, keep a copy around for the life of the product.

John
RE: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
-----Original Message-----
From: Dave N6NZ [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 13, 2008 4:19 PM
To: Weddington, Eric
Cc: John Regehr; avr-gcc-list@nongnu.org
Subject: Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR
Register optimisations]

> Weddington, Eric wrote:
>
> It's worth drawing a distinction between benchmarks and regression
> tests. They need to be written differently. A regression test needs to
> sensitize a particular condition, and needs to be small enough to be
> debuggable. A benchmark needs to be realistic, which often makes them
> harder to debug. I say we need both. The performance regression tests
> can easily roll into release criteria. A suite of performance
> benchmarks is more useful as a confirmatory measure of goodness -- but
> actual mysteries in the aggregate score will most likely be chased
> with smaller tests.

Ok. Regression tests should really fit within the GCC regression test
framework. I would rather not duplicate the work that they have there. So
I'm really looking for benchmark tests, under your definition. That's not
to say I want to ignore the regression tests. I just want to fill in a
gap that's missing for the AVR.

> A semi-related question is how many of these tests can be pushed
> upstream? If we could get a handful of uCtlr-oriented code size
> regression tests packaged up so that the developers of the generic
> optimizer could run them as release criteria, it would, I would think,
> improve the overall quality of gcc for all uCtlr targets.

Nothing can be pushed upstream right now. As I mentioned in another post
in this thread, the AVR target is not that important in the eyes of the
overall members of the GCC project. I'm working diligently to change
that. But it's one of those: if we want something done, we do it
ourselves.

>> There is also an interest in comparing AVR compilers, such as how GCC
>> compares to IAR, Codevision or ImageCraft compilers.
>
> Who is interested? gcc developers, as a means to keep gcc competitive?
> Or potential users?
> The former is benchmarking, the latter is moving towards
> bench-marketing. Not that marketing is bad, but that sort of thing can
> be a distraction. In any case, the tests that are meaningful here are
> the benchmark overall goodness test suite, not the targeted test suite.

As a gcc developer, I am interested in some kind of metric to keep gcc
competitive with other AVR compilers. Honestly, it seems to be an urban
myth that IAR optimizes better than GCC. Is that really true? For what
applications? For what compiler switches? Eventually I'd like to have
something definitive to combat any FUD. I don't want to get into
bench-marketing. I would really like to have something of value, and
meaningful, and not have to tweak numbers to arrive at good results to
show off. If AVR GCC sucks in an area, I don't want to paper over it. I
want to show it, so we know what needs improvement.

>> And sometimes there is an interest in comparing AVR against other
>> microcontrollers, notably Microchip's PIC and TI's MSP430.
>
> Different processor with same compiler? Different processor with best
> compiler? -- Now this is beginning to sound like SPEC.

Well, lofty goals for sure. I don't want to get outside of the 8-bit
microcontroller realm. I certainly want to do first things first. But I
think it might be interesting, at some point in the future, if some of
those things could be achieved.

>> If we are going to put together a benchmark test suite, like other
>> benchmarks for GCC (for larger processors), then I would think that it
>> would be better to model it somewhat after those other benchmarks. I
>> see that they tend to use publicly available code, and a variety of
>> different types of applications.
>
> For benchmarking, and bench-marketing, that's a good approach. I'll be
> redundant and say those are probably not what you want to be debugging.
>
> It would make sense for what I'll call an avr-gcc dashboard. I see a
> web page with a bunch of bar graphs on it.
> A summary bar at the top that is the weighted sum of the individual
> test bars. As an avr-gcc user, that kind of summary page would be very
> useful from one release to the next for setting expectations regarding
> performance on your own application. As an avr-gcc release master,
> it's a good dashboard for tracking progress and release worthiness.

That's definitely the idea.

>> the Atmel 802.15.4 MAC,
>
> Need to check license on that one -- but a good choice otherwise

:-)

>> and the GCC version of the Butterfly firmware. I also have a copy of
>> the TI Competitive Benchmark, which they, and other semiconductor
>> companies, have used to do comparisons between processors.
>
> Not familiar with it. Also, check the license.

Processor manufacturers (like, oh, for instance, *all* the several I have
worked for) are very touchy about benchmarks and benchmark