Dear All, The site http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html still contains buggy description of memory barriers in avr-gcc. As this site is popular among avr users I think it's really worth fixing. What is more the same inaccurate article is available on Atmel doc site: http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html . Is there anybody subscribed to this mailing list who can contact the authors/maintainers of the site in order to discuss correction of the content?
Marcin Godlewski W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski <marcin.godlew...@onet.pl> napisał: > W dniu 2016-12-09 10:11:55 użytkownik David Brown <da...@westcontrol.com> > napisał: > > On 08/12/16 21:46, Georg-Johann Lay wrote: > > > Marcin Godlewski schrieb: > > >> Dear all, > > >> > > >> Thanks for the reply to David. However I'm not trying to find a > > >> solution for the described issue. What I'm trying to say in this > > >> e-mail is that this part of Atmel documentation: > > >> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html > > >> is innacurate and should be corrected. The conclusion says: > > >> > > >> memory barriers ensure proper ordering of volatile accesses > > >> > > >> memory barriers don't ensure statements with no volatile accesses > > >> to be reordered across the barrier > > >> while it should say: > > >> > > >> memory barriers ensure proper ordering of global variables accesses > > >> > > >> memory barriers don't ensure local variables accesses to be > > >> reordered across the barrier > > > > > > At least the "local" vs. "global" is not completely correct. After > > > all it's about memory accesses, and it doesn't matter if the memory > > > is local (e.g. local static) or if you are dereferencing a pointer > > > (which might point to a local auto or to an object on heap). > > > > > > The code example you quoted above is actually due to a subtle > > > implementation detail of division, modulo and some other arithmetic > > > of GCC's avr backend (the division is _not_ a call actually). > > > > > > IIRC the solution for me back then was -fno-tree-ter as any messing > > > with inline asm doesn't hit the point. > > > > Yes, that is the solution you proposed when we discussed it a good while > > back (on the avrlibc list, I think). I disagree with you somewhat here > > (as I did then, though I can't remember if we discussed the details). > > > > Changing an optimisation option like this means that code that looks > > right, will run as expected - and that is a good thing. But it also > > means that the code will /only/ be correct if particular optimisation > > flags are set in particular ways. That is a very fragile situation, and > > should always be avoided. To be safe, this optimisation would have to > > be completely disabled in the avr-gcc port. I don't know how useful > > this particular optimisation is in terms of generating more efficient > > code, though from the gcc manual it appears very useful and is enabled > > at -O1. Clearly that determines the "cost" of this solution to the > > re-ordering problem. > > > > > > The use of the assembly dependency (or a nicely named macro with the > > same effect) fixes the problem in this situation. It does so regardless > > of optimisation options - the compiler is required to have calculated > > the result of the division before disabling interrupts, and cannot > > re-order the operations. It does so without adding any extra assembly > > code or hindering any optimisations - it merely forces an order on > > operations that are to be done anyway. > > > > It has the clear disadvantage of needing extra code in the user's > > source. Like memory barriers, it is a way of giving the compiler extra > > information that cannot be expressed in normal C, and which the compiler > > cannot (at the moment) figure out for itself. > > > > You say that the assembly dependency does not "hit the point". I think > > you are correct there - it is treating the symptom, not the disease. It > > is not telling the compiler that an operation should not be re-ordered, > > or that division is a costly operation. It simply tells the compiler > > that we need the results of that computation /here/. But it is a very > > effective and efficient cure for this sort of problem. Unless and until > > there is some /safe/ fix in the compiler to avoid this (and I don't > > count "put this compiler option in your command line" as safe), I really > > do think it is the best we have. > > > > > > Note, however, that the "forceDependency" macro only solves half the > > problem. Consider : > > > > unsigned int test2b(void) > > { > > unsigned int val; > > > > cli(); > > val = ivar; > > sei(); > > val = 65535 / val; > > return val; > > } > > > > In this case, the compiler could move the division backwards above the > > sei(), giving a similar problem. (It did not make the move in my brief > > tests - but it /could/ do.) I don't know if the -fno-tree-ter flag > > stops that too, but the forceDependency() macro is not enough. The > > forgetCompilerKnowledge macro is the answer: > > > > unsigned int test2b(void) > > { > > unsigned int val; > > > > cli(); > > val = ivar; > > sei(); > > asm volatile ("" : "+g" (val)); > > val = 65535 / val; > > return val; > > } > > > > This tells the compiler that it needs to stabilise the value of "val", > > and it can't assume anything about "val" after this point in the code, > > because it /might/ be read and /might/ change in the assembly code. > > Again, nothing is actually generated in the assembly and we are only > > forcing an ordering on the code. > > > > > > Nothing would please me better here here than to have the compiler > > understand that users would not want such re-ordering around cli() and > > sei(), so that the problem simply goes away. But it should not require > > particular choices of compiler flags, nor should it require disabling > > useful optimisations and thus generating poorer code elsewhere. > > > > It is also worth noting that though this situation occurs because > > division does not work like a normal function call, despite it using a > > library call for implementation, there is nothing fundamental to stop > > the compiler moving a call to foo() back or forth across a cli() or > > sei() as long as the compiler is sure that no memory is accessed, no > > volatiles are accessed, and there are no other externally visible > > effects in foo(). If the definition of foo() is available when > > compiling the code, then the compiler could well know this to be the > > case. If we replace "val = 65535U / val;" with "val = foo(val);", where > > we have : > > > > unsigned int foo(unsigned int v) { > > return (v * v) - v; > > } > > > > in the same compilation unit, some or all of the calculation from foo() > > will be inlined and mixed with the cli(). Again, -fno-tree-ter fixes > > this - at the cost of killing such mixing and optimisation in cases > > where it is useful. And again, the inline assembly fixes it at the cost > > of knowing that you have to add this line of source code. > > > > As gcc gets ever smarter with its inlining, function cloning, link-time > > optimisations, etc., then this will become more and more of an issue. > > > > > > > > Maybe the answer is that gcc needs an "execution barrier" that is > > stronger than a memory barrier, because it will also block calculations > > - it would act as a barrier to all local variables. I cannot think of > > any way to make such a barrier with inline assembly or the C11 fences - > > I think it would take a new __builtin for gcc. Such a feature would > > have use on all embedded targets, not just AVR. > > > > mvh., > > > > David > > > > > > I totally agree with you - a feature like "execution barrier" would be very > useful. C11 made good job standardizing multi-threading features but > unfortunately the features not always fits firmware development. Controlling > what exactly goes into critical section is a fundamental problem, so I would > even go further - why don't you propose the "execution barrier" as a new > feature for the future C language standard? > > > > > > > > > Johann > > > > > >> I don't know whether this group is the right place to post it however > > >> I do not know any better place. Hope someone here can trigger the > > >> change of the documentation and I also hope to be corrected if I am > > >> wrong. > > >> > > >> Thanks and regards, > > >> Marcin > > > > > > > > _______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list