On 05/03/14 23:25, DJ Delorie wrote:
> 
> I'm writing this document to collect some of the current/new knowledge
> on how to minimize the flash/rom size needed for MSP430 applications,
> using the new msp430-elf (FSF/Red Hat) tools. I'll keep a copy at
> http://people.redhat.com/~dj/msp430/size-optimizations.html
> 
> This document has two purposes: to collect this information in one
> public place, and to ask others to test it out and provide feedback.
> 
>     Linker Section Optimizations
>     __int20 Patch for Large Model
>     Building Newlib for Reduced Size
> 
> Obvious: Please do all size testing with -Os, which optimizes for
> size, and not -O3, which optimizes for speed.
> 
> ** Linker Section Optimizations
> 
> The msp430-elf assembler has a new directive ".refsym" that adds a
> reference to the named symbol into the generated object. In the past,
> to do this, you'd use a ".word" directive instead, but that takes up
> space in the final image. This new directive takes up no space in the
> image.
> 
> The startup code provided in the upstream newlib/libgloss (crt0 et al)
> uses ".refsym" to tell the linker about dependencies between the
> various snippets of startup code and things in the whole image which
> require them. This dependency information is partly manual in crt0.S
> itself, and partly automatic in code generated by gcc and gas.
> 
> For example, gcc has code to check if main() ever returns. In most
> embedded programs, it won't. If it *does* return, gcc adds a ".refsym"
> that causes a snippet in crt0.S to be included which adds a call to
> exit() after the call to main(). If main() doesn't return, there won't
> be any special code after the call to main().
> 
> The assembler has code to detect if either the data or bss sections
> are used, and if they are, it will use .refsym to tell crt0 to pull in
> snippets of code to initialize the RAM correctly. However, this means
> that most objects will now have extra "undefined" symbols that aren't
> part of your application:
> 
>       U __crt0_movedata
> 
> This new functionality means that there's a new library that must be
> linked in: "-lcrt". This is built by libgloss (part of newlib) by
> splitting up crt0.S, and contains all the snippets. To link this
> library correctly, you need a special new section in your linker
> script, which looks like this:
> 
> .text           :
>   {
>     . = ALIGN(2);
>     PROVIDE (_start = .);
>     KEEP (*(SORT(.crt_*)))
>     *(.lowtext .text .stub .text.* .gnu.linkonce.t.* .text:*)
> 
> The keep/sort line places all the snippets from crt0 at that point
> (after _start but before the rest of your program) in asciibetical
> order. Conveniently, the sections in libcrt.a are all named like
> ".crt_0013something" so the four-digit number causes them to all be
> inserted in the right order.
> 
> What's the net result of all this? A simple "blink an led" program
> that has no global variables can take as few as 24 bytes of flash
> depending on how you blink the led!
> 
> ** __int20 Patch for Large Model
> 
> The second big change is some ongoing work to add true "__intN"
> support to the GCC internals. Before now, gcc had one __int128 type
> built-in and any target that wanted something else had to hack it in
> somehow, without support from gcc's core. I've put a huge unofficial
> patch online at:
> 
> http://people.redhat.com/~dj/msp430/int20-patch.txt
> 
> This patch may not apply cleanly if the upstream sources have changed
> too much since the patch was generated.
> 
> There are two parts to this patch: The first part is the core __intN
> support, and the second part is changes to the msp430 backend to
> enable __int20 and support it as a regular integer type. Note that
> this patch mostly affects "large model" programs (-mlarge) as it
> changes pointer math to use __int20 for size_t instead of "unsigned
> long".
> 
> To explicitly use the __int20 type, replace "int" with "__int20" like
> this:
> 
> unsigned __int20 x[10];
> extern __int20 a, b, c;
> void foo (__int20 a, void *b);
> 
> Note that __int20 won't work (and you'll get a helpful compile-time
> error) unless you are building for an MSP430X-class cpu. You are
> allowed to use an explicit __int20 type with small model, though.
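
As a rough illustration of the explicit usage described above, here is a
minimal sketch (not taken from the patch) that uses __int20 when
building for an MSP430X-class cpu and falls back to "long" elsewhere so
the same source still compiles on a host.  The typedef names and the
sum20() helper are hypothetical, and I am assuming __MSP430X__ is the
predefined macro that identifies an MSP430X target.

```c
/* Assumption: __MSP430X__ is predefined when targeting MSP430X.
   On any other target, fall back to long so the sketch still builds. */
#if defined(__MSP430X__)
typedef __int20 idx20_t;
typedef unsigned __int20 uidx20_t;
#else
typedef long idx20_t;
typedef unsigned long uidx20_t;
#endif

/* Hypothetical helper: add up n values using the 20-bit types for
   both the loop index and the accumulator. */
uidx20_t sum20(const uidx20_t *values, idx20_t n)
{
    uidx20_t total = 0;
    idx20_t i;

    for (i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

On the fallback path the arithmetic is done in "long", so values above
20 bits will not wrap the way they would with a real __int20.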
> 
> ** Building Newlib for Reduced Size
> 
> In some cases, applications may want to use the stock newlib runtime
> but want to reduce the amount of flash newlib routines use. If you're
> willing to rebuild newlib yourself, there are some config options you
> can provide that remove features you may not need. For an up-to-date
> list of these options, run "./configure --help" in the newlib/
> subdirectory. Any --enable-foo option can be given as --disable-foo to
> disable a feature. For example:
> 
> ../newlib-trunk/configure --disable-newlib-io-float
> 
> There is also an alternate tiny malloc() implementation that can be
> enabled:
> 
> ../newlib-trunk/configure --enable-newlib-nano-malloc
> 
> Note that you can specify multiple --enable/--disable options on one
> configure command.
> 

Hi,

While it is usually good to generate small code, it is worth remembering
that there is a limit to how important it is to get the smallest
possible code.  Even if you have the smallest msp430 with 512 bytes of
flash, it does not matter if your program is 24 bytes or 511 bytes - all
that matters is that it fits in the chip you have.  So there is little
point in saving a few bytes on startup code that is almost always used,
even on small programs - any program that comes close to filling 512
bytes is going to have global (or at least statically allocated) data.

If there are no costs involved, then the sort of micro-optimisations you
mention are fine - after all, there are some programs that don't use
initialised data (preferring to initialise explicitly in code), and some
that don't use uninitialised data (due to the idiotic non-standard
behaviour of TI's compilers/libraries that don't clear the bss at
startup).  But if there /are/ costs - which can include lots of
confusing unnecessary symbols in map files and debugger sessions - then
it is not clear that saving a couple of bytes is worth it.
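
To make the "initialise explicitly in code" style concrete, here is a
small sketch of what I mean.  The variable and function names are
purely illustrative.  Note that the statics still live in .bss, so this
does not by itself remove the section; it only means the program does
not depend on the startup code having cleared or copied anything.

```c
/* Hypothetical application state; no initialisers, so nothing here
   needs to be copied from flash to RAM at startup. */
static unsigned int tick_count;
static unsigned char led_mask;

/* Set every variable explicitly rather than relying on the startup
   code to copy .data or zero .bss. */
void app_init(void)
{
    tick_count = 0;
    led_mask = 0x01;
}

/* Toggle the (illustrative) LED bit and return the new tick count. */
unsigned int app_tick(void)
{
    led_mask ^= 0x01;
    return ++tick_count;
}
```

Calling app_init() first thing in main() gives the same starting state
regardless of what the startup code did with RAM.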

Reducing library size (especially for things like printf and friends) is
always useful, however.  And since most embedded programs do not return
from main(), size optimisations based on that are worthwhile.

Ultimately, rather than having compiler flags "optimise for space" or
"optimise for speed", what we /really/ want is "give me the fastest
possible code that takes no more space than I have on the chip".  Maybe
that will come in gcc 5.0 :-)


Will the __intN stuff ever make it into mainline?  The msp430 port may
be the only chip that needs __int20, but there are other chips that
could benefit from different integer sizes - perhaps __int24 on the
8-bit AVR, or __int40 on some devices with DSP-style accumulators.


Best regards,

David


_______________________________________________
Mspgcc-users mailing list
Mspgcc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mspgcc-users