btashton opened a new issue #2425: URL: https://github.com/apache/incubator-nuttx/issues/2425
# Why is the Build Larger ## Intro There was some discussion of the growth in code size between 9.1.0 and 10.0.0-RC0. We have some tools to help us look into this, so I did a little digging. To be clear I am not making any statement here about if we should do something about it or not, but I wanted us to at least be aware of what caused these increases. ## Discovery For reference I am using the `stm32f4discovery:nsh` as the reference target because it seems to be a common target for people that has been around for a long time. So first off here is the top level size information without with the standard configuration out of the box build which includes optimisation. ``` ❯ arm-none-eabi-size ../testdata/opt/nuttx-* text data bss dec hex filename 69947 104 2104 72155 119db ../testdata/opt/nuttx-10.0.0-RC0 69447 104 2476 72027 1195b ../testdata/opt/nuttx-9.1.0 ``` .text sees an increase of 500 bytes and .data for 372. These are not huge, but are also not insignificant and but we cannot keep increasing by this much every release. There are a lot of changes between these two releases, but we can first analyse the debug information to get a better idea what is going on. I added with kconfig-tweak the debug symbols and the used Bloaty to generate these reports Here is the quick script I used to generate the builds I wanted to look at: ```bash set -e +x NUTTX_TAG=$1 APPS=apps NUTTX=incubator-nuttx OUTDIR=testdata for REPO in $NUTTX $APPS do pushd $REPO git clean -xdf git checkout $NUTTX_TAG popd done pushd $NUTTX ./tools/configure.sh stm32f4discovery:nsh kconfig-tweak --enable DEBUG_SYMBOLS make oldconfig make -j8 popd mkdir -p testdata cp $NUTTX/nuttx $OUTDIR/$NUTTX_TAG ``` This is the comparison between 10.0.0-RC0 and 9.1.0. ``` ❯ bloaty -n 20 -d sections,symbols ../testdata/nuttx-10.0.0-RC0 -- ../testdata/nuttx-9.1.0 FILE SIZE VM SIZE -------------- -------------- +1.8% +16.2Ki [ = ] 0 .debug_info +0.7% +1.64Ki [ = ] 0 .debug_line +3.8% +1.13Ki [ = ] 0 .debug_str +0.6% +636 +0.6% +636 .text +1.9% +400 +1.9% +400 [103 Others] [NEW] +284 [NEW] +284 mm_memalign [NEW] +232 [NEW] +232 nxtask_init [NEW] +212 [NEW] +212 nxsem_clockwait +83% +158 +83% +158 fclose +61% +132 +61% +132 nsh_session [NEW] +96 [NEW] +96 cxx_initialize +44% +76 +44% +76 nx_vioctl +95% +72 +95% +72 proc_findnode +19% +64 +19% +64 nx_vopen +238% +62 +238% +62 inode_search [NEW] +62 [NEW] +62 sem_clockwait -72.0% -72 -72.0% -72 wd_initialize [DEL] -76 [DEL] -76 up_cxxinitialize -46.2% -96 -46.2% -96 nxthread_create -66.7% -120 -66.7% -120 isspace -76.7% -138 -76.7% -138 nsh_system [DEL] -148 [DEL] -148 wd_create [DEL] -152 [DEL] -152 wd_delete -40.4% -194 -40.4% -194 opendir -87.9% -218 -87.9% -218 nxsem_timedwait +0.9% +432 [ = ] 0 .symtab +1.9% +384 [ = ] 0 [section .symtab] [NEW] +48 [ = ] 0 _inode_getcwd [NEW] +48 [ = ] 0 cxx_initialize [NEW] +48 [ = ] 0 inode_root_reserve [NEW] +48 [ = ] 0 memalign [NEW] +48 [ = ] 0 nxsem_clockwait [NEW] +48 [ = ] 0 nxtask_init [NEW] +48 [ = ] 0 nxtask_uninit [NEW] +32 [ = ] 0 _assert [NEW] +32 [ = ] 0 inited.5681 +200% +32 [ = ] 0 inode_search [NEW] +32 [ = ] 0 mm_memalign [DEL] -32 [ = ] 0 g_wdpool -66.7% -32 [ = ] 0 inode_insert -66.7% -32 [ = ] 0 inode_unlink -40.0% -32 [ = ] 0 nx_start -66.7% -32 [ = ] 0 nx_stat [DEL] -48 [ = ] 0 up_cxxinitialize [DEL] -48 [ = ] 0 wd_create [DEL] -48 [ = ] 0 wd_delete -9.5% -112 [ = ] 0 [56 Others] +0.3% +374 [ = ] 0 .debug_abbrev +1.9% +224 [ = ] 0 .debug_aranges +0.5% +188 [ = ] 0 .debug_frame +0.9% +170 [ = ] 0 .strtab +1.3% +80 [ = ] 0 [section .strtab] [NEW] +32 [ = ] 0 nxsem_clockwait_uninterruptible [NEW] +19 [ = ] 0 inode_root_reserve [NEW] +18 [ = ] 0 __FUNCTION__.5716 [NEW] +18 [ = ] 0 __FUNCTION__.6102 [NEW] +18 [ = ] 0 __FUNCTION__.6110 [NEW] +18 [ = ] 0 __FUNCTION__.6118 [NEW] +18 [ = ] 0 __FUNCTION__.6128 [NEW] +16 [ = ] 0 nxsem_clockwait [NEW] +15 [ = ] 0 cxx_initialize [NEW] +15 [ = ] 0 nxtask_startup [NEW] +14 [ = ] 0 _inode_getcwd [NEW] +14 [ = ] 0 nxtask_uninit -7.2% -5 [ = ] 0 [13 Others] [DEL] -13 [ = ] 0 g_wdfreelist [DEL] -17 [ = ] 0 up_cxxinitialize [DEL] -18 [ = ] 0 __FUNCTION__.5606 [DEL] -18 [ = ] 0 __FUNCTION__.5991 [DEL] -18 [ = ] 0 __FUNCTION__.5999 [DEL] -18 [ = ] 0 __FUNCTION__.6007 [DEL] -18 [ = ] 0 __FUNCTION__.6019 +0.4% +36 +0.4% +36 .ARM.extab +0.4% +24 +0.4% +24 .ARM.exidx -29.4% -5 [ = ] 0 [Unmapped] -5.6% -24 [ = ] 0 .debug_ranges [ = ] 0 -14.8% -372 .bss [ = ] 0 +8.9% +20 g_idletcb [ = ] 0 [NEW] +4 inited.5681 [ = ] 0 -13.3% -2 [section .bss] [ = ] 0 [DEL] -2 g_wdnfree [ = ] 0 [DEL] -8 g_wdfreelist [ = ] 0 [DEL] -384 g_wdpool +1.3% +20.9Ki +0.3% +324 TOTAL ``` ## Analysis From this we can see some wins around the watchdog, although that is likely at the cost of some dynamic allocations. The new symbols are good places for us to look for the growth. Some of these are places were logic has been re-factored an example may be where `cxx_initialized` is new and `up_cxxinitialize` is old. Other places we see large features being brought in such as `mm_memalign`. Lets take a look at this Here are some that jumped out to me: * `mm_memalign` (new) * `nxtask_init` (new) * `nxsem_clockwait` (new) * `fclose` * `nsh_session` * `proc_findnode` To dig into what these builds actually look like I am using Cutter https://cutter.re/ which is mostly just a frontend for Radare2. ### `mm_memalign` (+284) This is a new symbol included in the build, but a function that has existed for a long time. We can look at the global call graph to see what calls it:  It is only called by `up_create_stack` We can then look at the local call graph between the two releases for `up_create_stack` __10.0.0-RC0__  __9.1.0__  We have made a change between the releases that is not calling memalign instead of malloc. Taking a look a the diff between the releases we see: ```diff #else /* CONFIG_TLS_ALIGNED */ @@ -163,14 +172,16 @@ int up_create_stack(FAR struct tcb_s *tcb, size_t stack_size, uint8_t ttype) if (ttype == TCB_FLAG_TTYPE_KERNEL) { - tcb->stack_alloc_ptr = (uint32_t *)kmm_malloc(stack_size); + tcb->stack_alloc_ptr = + (uint32_t *)kmm_memalign(CONFIG_STACK_ALIGNMENT, alloc_size); } else #endif { /* Use the user-space allocator if this is a task or pthread */ - tcb->stack_alloc_ptr = (uint32_t *)kumm_malloc(stack_size); + tcb->stack_alloc_ptr = + (uint32_t *)kumm_memalign(CONFIG_STACK_ALIGNMENT, alloc_size); } #endif /* CONFIG_TLS_ALIGNED */ ``` This change came in as part of this PR https://github.com/apache/incubator-nuttx/pull/1562 ## `proc_findnode` (+72) This increase seems to have come in as part of this https://github.com/apache/incubator-nuttx/pull/1490 This is a minor code change to fix a bug, but the generated code is surprisingly large.  ## `fclose` (+158) This largely comes from code complexity change that came in as https://github.com/apache/incubator-nuttx/pull/1611 Once again the change looks small, but ends up creating a fair bit of code.  ## `nsh_session` (+132) Used the same technique as before looking at the call graph. We have a quite a bit of complexity increase here that came in with https://github.com/apache/incubator-nuttx-apps/pull/367 which adds the ability to pass in arguments to `nsh`. I suspect this functionality might have been best held behind a config flag. The file stream change added some code size here, but very little. ## `nxtask_init` (TODO) ## `nxsem_clockwait` (TODO) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org