You could file this as a bug on the website: <https://savannah.nongnu.org/bugs/?group=avr-libc>
As far as I understand it, the documentation (both on the website and the Atmel documentation) is generated directly from the library code and comments - so this would be a change to the library source. Feel free to quote any parts of my mails on the subject when filing the bug. mvh., David On 09/02/17 13:11, Marcin Godlewski wrote: > Dear All, > > The site > http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html > still contains buggy description of memory barriers in avr-gcc. As > this site is popular among avr users I think it's really worth > fixing. What is more the same inaccurate article is available on > Atmel doc site: > http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html > . Is there anybody subscribed to this mailing list who can contact > the authors/maintainers of the site in order to discuss correction of > the content? > > Marcin Godlewski > > W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski > <marcin.godlew...@onet.pl> napisał: >> W dniu 2016-12-09 10:11:55 użytkownik David Brown >> <da...@westcontrol.com> napisał: >>> On 08/12/16 21:46, Georg-Johann Lay wrote: >>>> Marcin Godlewski schrieb: >>>>> Dear all, >>>>> >>>>> Thanks for the reply to David. However I'm not trying to find >>>>> a solution for the described issue. What I'm trying to say in >>>>> this e-mail is that this part of Atmel documentation: >>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html >>>>> >>>>> is innacurate and should be corrected. The conclusion says: >>>>> >>>>> memory barriers ensure proper ordering of volatile accesses >>>>> >>>>> memory barriers don't ensure statements with no volatile >>>>> accesses to be reordered across the barrier while it should >>>>> say: >>>>> >>>>> memory barriers ensure proper ordering of global variables >>>>> accesses >>>>> >>>>> memory barriers don't ensure local variables accesses to be >>>>> reordered across the barrier >>>> >>>> At least the "local" vs. "global" is not completely correct. >>>> After all it's about memory accesses, and it doesn't matter if >>>> the memory is local (e.g. local static) or if you are >>>> dereferencing a pointer (which might point to a local auto or >>>> to an object on heap). >>>> >>>> The code example you quoted above is actually due to a subtle >>>> implementation detail of division, modulo and some other >>>> arithmetic of GCC's avr backend (the division is _not_ a call >>>> actually). >>>> >>>> IIRC the solution for me back then was -fno-tree-ter as any >>>> messing with inline asm doesn't hit the point. >>> >>> Yes, that is the solution you proposed when we discussed it a >>> good while back (on the avrlibc list, I think). I disagree with >>> you somewhat here (as I did then, though I can't remember if we >>> discussed the details). >>> >>> Changing an optimisation option like this means that code that >>> looks right, will run as expected - and that is a good thing. >>> But it also means that the code will /only/ be correct if >>> particular optimisation flags are set in particular ways. That >>> is a very fragile situation, and should always be avoided. To be >>> safe, this optimisation would have to be completely disabled in >>> the avr-gcc port. I don't know how useful this particular >>> optimisation is in terms of generating more efficient code, >>> though from the gcc manual it appears very useful and is enabled >>> at -O1. Clearly that determines the "cost" of this solution to >>> the re-ordering problem. >>> >>> >>> The use of the assembly dependency (or a nicely named macro with >>> the same effect) fixes the problem in this situation. It does so >>> regardless of optimisation options - the compiler is required to >>> have calculated the result of the division before disabling >>> interrupts, and cannot re-order the operations. It does so >>> without adding any extra assembly code or hindering any >>> optimisations - it merely forces an order on operations that are >>> to be done anyway. >>> >>> It has the clear disadvantage of needing extra code in the >>> user's source. Like memory barriers, it is a way of giving the >>> compiler extra information that cannot be expressed in normal C, >>> and which the compiler cannot (at the moment) figure out for >>> itself. >>> >>> You say that the assembly dependency does not "hit the point". I >>> think you are correct there - it is treating the symptom, not the >>> disease. It is not telling the compiler that an operation should >>> not be re-ordered, or that division is a costly operation. It >>> simply tells the compiler that we need the results of that >>> computation /here/. But it is a very effective and efficient >>> cure for this sort of problem. Unless and until there is some >>> /safe/ fix in the compiler to avoid this (and I don't count "put >>> this compiler option in your command line" as safe), I really do >>> think it is the best we have. >>> >>> >>> Note, however, that the "forceDependency" macro only solves half >>> the problem. Consider : >>> >>> unsigned int test2b(void) { unsigned int val; >>> >>> cli(); val = ivar; sei(); val = 65535 / val; return val; } >>> >>> In this case, the compiler could move the division backwards >>> above the sei(), giving a similar problem. (It did not make the >>> move in my brief tests - but it /could/ do.) I don't know if the >>> -fno-tree-ter flag stops that too, but the forceDependency() >>> macro is not enough. The forgetCompilerKnowledge macro is the >>> answer: >>> >>> unsigned int test2b(void) { unsigned int val; >>> >>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val = >>> 65535 / val; return val; } >>> >>> This tells the compiler that it needs to stabilise the value of >>> "val", and it can't assume anything about "val" after this point >>> in the code, because it /might/ be read and /might/ change in the >>> assembly code. Again, nothing is actually generated in the >>> assembly and we are only forcing an ordering on the code. >>> >>> >>> Nothing would please me better here here than to have the >>> compiler understand that users would not want such re-ordering >>> around cli() and sei(), so that the problem simply goes away. >>> But it should not require particular choices of compiler flags, >>> nor should it require disabling useful optimisations and thus >>> generating poorer code elsewhere. >>> >>> It is also worth noting that though this situation occurs >>> because division does not work like a normal function call, >>> despite it using a library call for implementation, there is >>> nothing fundamental to stop the compiler moving a call to foo() >>> back or forth across a cli() or sei() as long as the compiler is >>> sure that no memory is accessed, no volatiles are accessed, and >>> there are no other externally visible effects in foo(). If the >>> definition of foo() is available when compiling the code, then >>> the compiler could well know this to be the case. If we replace >>> "val = 65535U / val;" with "val = foo(val);", where we have : >>> >>> unsigned int foo(unsigned int v) { return (v * v) - v; } >>> >>> in the same compilation unit, some or all of the calculation from >>> foo() will be inlined and mixed with the cli(). Again, >>> -fno-tree-ter fixes this - at the cost of killing such mixing and >>> optimisation in cases where it is useful. And again, the inline >>> assembly fixes it at the cost of knowing that you have to add >>> this line of source code. >>> >>> As gcc gets ever smarter with its inlining, function cloning, >>> link-time optimisations, etc., then this will become more and >>> more of an issue. >>> >>> >>> >>> Maybe the answer is that gcc needs an "execution barrier" that >>> is stronger than a memory barrier, because it will also block >>> calculations - it would act as a barrier to all local variables. >>> I cannot think of any way to make such a barrier with inline >>> assembly or the C11 fences - I think it would take a new >>> __builtin for gcc. Such a feature would have use on all embedded >>> targets, not just AVR. >>> >>> mvh., >>> >>> David >>> >>> >> >> I totally agree with you - a feature like "execution barrier" would >> be very useful. C11 made good job standardizing multi-threading >> features but unfortunately the features not always fits firmware >> development. Controlling what exactly goes into critical section is >> a fundamental problem, so I would even go further - why don't you >> propose the "execution barrier" as a new feature for the future C >> language standard? >> >>> >>>> >>>> Johann >>>> >>>>> I don't know whether this group is the right place to post it >>>>> however I do not know any better place. Hope someone here can >>>>> trigger the change of the documentation and I also hope to be >>>>> corrected if I am wrong. >>>>> >>>>> Thanks and regards, Marcin >>> >>> >> >> >> >> > > > _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list