On 07/08/2018 05:14, Joel Sherrill wrote: > > On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chr...@rtems.org > <mailto:chr...@rtems.org>> wrote: > > On 06/08/2018 16:12, Christian Mauderer wrote: > > Am 06.08.2018 um 07:31 schrieb Chris Johns: > >> On 06/08/2018 10:51, Chris Johns wrote: > >>> On 05/08/2018 19:39, Christian Mauderer wrote: > >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns: > >>>>> Hi, > >>>>> > >>>>> I have been working on migrating covoar in the rtems-tools repo to > DWARF. The > >>>>> goal is remove objdump parsing and to get accurate details about the > functions > >>>>> being covered. This is an unfunded task. > >>>>> > >>>>> The work has resulted in a close examination of inlined code in > RTEMS > and what I > >>>>> saw alarmed me so I have added a report to the rtems-exeinfo tool in > rtems-tools > >>>>> (the change is to be posted for review once I get the coverage tests > running). > >>>>> > >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is: > >>>>> > >>>>> inlined funcs : 1412 > >>>>> total funcs : 1956 > >>>>> % inline funcs : 72% > >>>>> total size : 174616 > >>>>> inline size : 81668 > >>>>> % inline size : 46% > >>>>> > >>>>> This is a small application so it could be argued that skews the > figures. A > >>>>> large C/C++ application built with -O2 running on RTEMS 4.11 ARM > reports the > >>>>> inline usage as: > >>>>> > >>>>> inlined funcs : 10370 > >>>>> total funcs : 17700 > >>>>> % inline funcs : 58% > >>>>> total size : 3066240 > >>>>> inline size : 1249514 > >>>>> % inline size : 40% > >>>>> > >>>>> This does not seem right to me. > >>>>> > >>>>> The report is new and there could be issues in the DWARF handling > that > feeds > >>>>> this report however I am posting this to start a discussion on the > topic of > >>>>> inlining. > >>>>> > >>>>> I attach the report for hello.exe. The `-i` option generates the > inline report. > >>>>> > >>>>> The first section is a summary showing the total number of functions > in the > >>>>> executable that have machine code and are flagged as inline. The > report lists > >>>>> the percentage of functions that are inlined and the percentage of > machine code > >>>>> that is inlined. The values seem high to me. > >>>>> > >>>>> The second table lists inline functions that are repeated sorted > from the > >>>>> largest foot print to the smallest. The first column the total size > of > machine > >>>>> code in the executable and the second column the number of > instances. > >>>>> > >>>>> The third table is the list of inline functions sorted from largest > machine code > >>>>> footprint to smallest. The second column are flags of which there is > one. A `E` > >>>>> indicates the inline function is also external which means the > compiler has > >>>>> created an external reference to this function, ie an address-of is > being taken. > >>>>> The third column is the address in the executable so you can take a > look with > >>>>> objdump at the machine code. > >>>>> > >>>>> We need to ask some important question in relation to inlining. It > is > cheap to > >>>>> add and we all feel the code we add needs to be fast and needs to be > inlined but > >>>>> does it really need to be inlined? > >>>>> > >>>>> Some pieces of code do need to be inlined and the overhead is just > that an > >>>>> overhead, for example in the large C/C++ application there is a low > level > >>>>> volatile hardware write routing with close to 300 instances and a > code > size of > >>>>> 10K. This code needs to be inlined for performance reasons but > should > the size > >>>>> on average be 40 bytes, I doubt it. > >>>>> > >>>>> Can we be more judicious with our use of the inline keyword? > >>>>> > >>>>> Is the performance gain we really expect or is the actual overhead > of > a call > >>>>> frame not worth saving? > >>>>> > >>>>> What are the real costs of inlining a piece of code? It adds size > to the > >>>>> executable and depending on the code being inlined it complicates > coverage > >>>>> analysis be adding extra branch points. > >>>>> > >>>>> The metrics to determine what should be inlined is complicated and I > do not > >>>>> think we have a suitable policy in place. I believe it is time we to > create one. > >>>>> > >>>>> The issue is not limited to our code, gcc, newlib and libstdc++ seem > to have > >>>>> some code that should be looked at more closely. For example > __udivmoddi4, and > >>>>> __sprint_r. > >>>>> > >>>>> Chris > >>>>> > >>>>> > >>>> > >>>> Hello Chris, > >>>> > >>>> I just took a look at one of the first function in your list: > __sprint_r > >>>> > >>>> > > https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e901e4ffadd44;hb=HEAD#l403 > > <https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e901e4ffadd44;hb=HEAD#l403> > >>>> > >>>> As far as I can see, there is no explicit inline key word for that > >>>> function. So in that case, the compiler decided that it would be a > good > >>>> idea to inline that function. > >>> > >>> Thanks and yes. At this point in time I cannot tell what is happening > and I am > >>> not sure the tool is reporting accurate data, I need to investigate. > >> > >> I have updated the tool and report to show which inline functions are: > >> > >> - inlined by compiler > >> - declared inline and not inlined > >> - declared inline and inlined > >> > >> I have also fixed a quick hack I had where the size was the span from > the > low PC > >> to the high PC, this was wrong. Inlined code can be split and moved > when > >> inlining creating a discontinuous address range. The size in the report > is now > >> the number of machine code bytes. > >> > >> The report will show any functions not inlined when asked to be > inlined. > We do > >> not have any. > >> > >> The 'C' flag in the inlined table shows which functions the compiler > has > inlined. > >> > >> Chris > >> > > > > With that list it is now much clearer which functions would be relevant > > for a potential review. > > Yes. FYI the C/C++ RTEMS 4.11 app now gives: > > inlined funcs : 10370 > total funcs : 17700 > % inline funcs : 58% > total size : 2296354 > inline size : 479628 > % inline size : 20% > > This level of reduction is more inline with what I expected. > > > > >>> > >>>> I'm not sure whether I might just haven't seen it but is there a > >>>> possibility to distinguish between functions that have been inlined > by > >>>> the compiler and ones that have been inlined due to the "inline" > keyword > >>>> without looking at every definition? > >>> > >>> I am not sure. The DWARF data is complex and detailed and I view this > initial > >>> step into the area of using DWARF to perform static analysis of RTEMS > >>> executables as green. > >>> > >>> DWARF does provide declaration attributes. I need to review the DWARF > data and > >>> standard to determine if we can tell what is declared inline and what > has been > >>> inlined. I think it would be good to know. > >>> > >>>> Did you try compiling with size optimization? I would expect that the > >>>> compiler would inline far less functions and maybe even ignore some > >>>> "inline" keywords. As far as I know it's more of a hint to the > compiler. > >>> > >>> Not yet. A complete tool build with those options is a lot of effort > and > I am > >>> still not comfortable the report is accurate. I think this is > something that > >>> should be done at some point. I think it would create an interesting > data point. > >>> > >>>> I would only worry about functions that are still inlined if size > >>>> optimization is selected. > >>> > >>> I think we need to review the functions we currently have tagged as > inline. I > >>> think the only way we can do this is with real data. > >>> > >>>> That's the case when I tell the compiler to > >>>> make the program as small as possible. In all other cases I want some > >>>> well balanced optimum between speed and size. Inlining small > functions > >>>> is OK in that case if you ask me. > >>> > >>> How do you define this, ie what is the inline policy we use? > >>> How do you audit this? > > > > Both questions are not simple to answer. It is most likely a case by > > case decision. I think there are roughly two reasons for inlining: > > > > - The code is short enough that it is smaller if it is inlined compared > > to a function call. > > > > - There is some performance reason. > > > > Anything else? > > Not for inlining rather for not inlining? I had a discussion with Joel > last week > about coverage and he said he had previously reviewed the inlines to > reduce the > branch counts because it complicates coverage. I would prefer he talks > about > this, it is his area. > > > Thanks for the lead-in Chris. > > First, there are multiple types of coverage. Statement, Decision coverage, > MC/DC, and object coverage. Based on various limiting factors which I > am happy to discuss in more detail if someone is interested, we went > with object coverage. I will briefly remind everyone of the limitations > and guidelines we went into this with: > > + We didn't want instrumented code. Test as you fly, fly as you test. > Plus there would need to be a way to get information generated by > the instrumentation off the target. > > + Without gcc support for identifying sub-expressions in complex > logical expressions, there is no way to do DC or MCDC. Further, > we would have had to have a tool to interpret the non-existent > information and know which entries in the truth tables for every > expression had to be exercised for DC and MCDC. > > + gcov's reports are primarily statement coverage. And you can't get > gcov reports without gcov output from runs which we still are working > on. Also in terms of DO-178, statement coverage is generally Level C > type coverage. > > We wanted higher level coverage for RTEMS and thus object > coverage with emphasis on every generated branch statement being > needed and exercisable as taken and not taken. This approach > ensured we didn't have dead code. > > Plus this type of information could easily be obtained from simulator > traces from Qemu and tsim. covoar was written to combine and > correlate the information back to assembly and source and generate > the reports. > > That's what led to the current coverage approach. Chris has been > working to eliminate the use of command line tools (e.g. nm, objdump, > etc) in favor of using DWARF directly. > > That's the background of the approach and how we got reports. > > Also remember that RTEMS testing is primarily via the public APIs > which combined with coverage ensures we actually need all the > internal code. If you can't reach it from a public API, it is dead. > > But the insights into RTEMS coding practices began when we started > trying to improve the coverage percentages. Key point: > > IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO > BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION. > > At the source level, sometimes we factored out a common helper method > to avoid testing the same logic via multiple APIs. This reduced the number > of test cases needed. > > We looked at a lot of generated assembly. Sometimes we would see > large methods being inlined multiple times. This would increase the overall > size of an RTEMS application. But size is not the only impact of inlining. > If an inlined method has one or more if's in it, then the branch paths > it includes are introduced EVERYWHERE it is inlined. When we > had _Thread_Dispatch_enable, I recall it was used > 100 times and > includes a branch in it. There was a build option to not inline this > routine to avoid needing to add over 100 test cases. > > There is a delicate code size and complexity dance when deciding > whether to inline something or not. There isn't an easy or obvous > answer. > > + How large is the method? > > + How many times is it used? > > + How many branches/loops does it have (e.g. cyclomatic complexity)?
I would like to add Capstone [*] to rtems-tools to allow us to look into the actual code and measure these sort of things. [*] https://www.capstone-engine.org/ Chris > > In general, very small methods with no branches can be inlined in less > code than the subroutine call takes. So these tend to be a win. > > Over a certain size, the number of times used multiplied by the inlined > size can have a negative impact on code size. We saw this a lot when > doing the code coverage improvements. > > If the inlined method has branches, is it possible or cost effective > to write test cases for the branches introduced at EVERY use? > > I came away believing a few things: > > + The MISRA rule for using simple, single condition branch logic > is really good for code coverage analysis. It simplifies it. Statement > coverage and branch coverage become closer to equivalent. The > resulting assembly code is also much easier to match to source. > > + Inlining a method with conditional logic has to be looked at > very carefully. You don't want to inline something and find yourself > having exploded the required number of test cases. Again, look > at the number of times a method is used but be cautious about > conditional expressions in inline methods. > > + Inlining doesn't always improve performance measurably. It > can increase code size, code to branch around, etc. > > For a current example of something that is very likely a problem > from a object code test coverage and code size increase view, > consider _Thread_Dispatch_enable. It is inlined 41 times. > Chris and I looked at a place where it was inlined. It was about > 80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would > save ~3K across the entire RTEMS API object code to not inline it > at all. > > Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable(). > From Wikipedia: > > Cyclomatic complexity is a software metric, used to indicate the complexity > of a program. It is a quantitative measure of the number of linearly > independent > paths through a program's source code. > > I used http://www.lizard.ws/ [1] to analyse this method. If you take the > conditional > compilation out of the code, the cyclomatic complexity is 4. This means that > every place this is inlined, it introduces 3 extra paths to test. 41*3 ==> > 123. > > My recommendation would be to evaluate the impact of not inlining this > method at all. The negative impact on testing and code size are pretty high. > > Unfortunately, what I wrote up is just the guidelines. Things change over > time. An inlined method that is a nice encapsulation when written could > turn out to be too large to be inlined if 10 other use cases get added. > > I lean to a method can be inlined if it is small and no more than one > branch statement in assembly. You can be more lenient if used only > once or twice. But lean to NO branches on an inlined method if the > number of uses is high. > > Ideally the inlined code is no larger than the code for the call itself > and there are no branches. But going above that is not a reason to > not inline. > > I know this was long but there are a lot of trade-offs. We want RTEMS to > be as small as possible and as testable as possible. We don't have an > army of people to write test cases. And I don't want to need that. > > Also I have focused on the things that the RTEMS community has control > over. Chris' tool gives us a broader view and we can likely help improve > things more broadly. > > [1] You can cut and paste code into this via the web. So pretty easy to > look at a single method. > > > > > Chris > _______________________________________________ > devel mailing list > devel@rtems.org <mailto:devel@rtems.org> > http://lists.rtems.org/mailman/listinfo/devel > <http://lists.rtems.org/mailman/listinfo/devel> > > _______________________________________________ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel