On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via Digitalmars-d-learn wrote:
[...]
> int main()
> {
>     foreach(i; 0 .. 10000)
>     {
>         int x;
>         // do something with x
>     }
>     return 0;
> }
>
> Do I understand it right that the variable x will be created 10000
> times and destroyed at the end of the scope in each loop ? Or will it
> be 10000 overwritten by creation ?
If x were a heap-allocated object, then your concerns would be justified: it would be allocated once every iteration (and would also add to the garbage that the GC has to collect later). In this case, however, x is an int, which is a value type. That means it is allocated on the stack, where allocation is as trivial as bumping the stack pointer (practically zero cost), and deallocation is as simple as bumping the stack pointer the other way. In fact, the compiler doesn't even have to bump the stack pointer between iterations: it's obvious from code analysis that x will always be allocated at the same position of the stack relative to the function's call frame, so the generated machine code can simply use that memory location directly and reuse the same location between iterations.

> I mean does it not cost the CPU some time to allocate (I know were
> talking here of nsec) but work is work. As far I know from my school
> days in assembler, allocation of memory is one of the most expensive
> instructions on the cpu. Activate memory block, find some free place,
> reserve, return pointer to caller, and so on..

That's not entirely accurate; it depends on how the allocation is done. Allocation on the stack is extremely cheap, and consists literally of adding some number (the size of the allocation) to a register (the stack pointer) -- you cannot get much simpler than that.

Also, memory allocation on the heap is not "one of the most expensive instructions". The reason it's expensive is that it's not a single instruction at all, but an entire subroutine of instructions that manages the heap. There is no CPU I know of that has a built-in instruction for heap allocation.

> In my opinion this version should perform a little better:
>
> int main()
> {
>     int x;
>     foreach(i; 0 .. 10000)
>     {
>         x = 0; // reinitialize
>         // do something with x
>     }
>     return 0;
> }
>
> Or do I miss something and there is an optimization by the compiler to
> avoid recreating/destroying the variable x 10000 times ?
[...]

What you wrote here is exactly how the compiler will emit the machine code for your first code example above. Basically all modern optimizing compilers will implement it this way, and you never have to worry about stack allocation being slow. The only time you will have a performance problem is if x is a heap-allocated object; in *that* case you will want to look into reusing previous instances of the object between iterations. But if x is a value type (int, struct, etc.), you don't have to worry about this at all. The first version of your code is the preferred one, because it makes the code clearer.

On a related note: in modern programming languages you should generally be more concerned with the higher-level meaning of the code than with low-level details like how instructions are generated. The machine code the compiler emits is highly transformed from the original source code and generally will not have a 1-to-1 mapping to the code's higher-level logical meaning. That is, the first version of the code *logically* allocates x at the beginning of each iteration and deallocates it at the end, but the actual generated machine code elides all of that, because the compiler's optimizer can easily see that it's a stack allocation that always ends up in the same place, so none of the allocation/deallocation needs to be represented as-is in the machine code.

On another note: for performance-related concerns, the general advice these days is to write something in the most straightforward, logical way first, and then, if the performance is not good enough, **use a profiler** to identify where the hotspots are, and optimize those.
Trying to optimize before you have real-world profiler data showing that your code is a hotspot is premature optimization, widely regarded as evil: it usually leads to overly-convoluted code that's hard to understand and maintain, and often code that is actually *slower*, because the way you expressed the meaning of the code has become so obfuscated that the optimizer can't figure out what you're actually trying to do, so it gives up and doesn't even try to apply optimizations that might have benefitted the code.

(Of course, the above comes with the caveat that writing "straightforward" code only holds up to a certain extent. When it comes to algorithms, for example, no compiler is going to change an O(n^2) algorithm into an O(log n) one; you have to select the appropriate algorithm in the first place in order to avoid the inevitable performance bottleneck. And there are certain coding practices that can result in better overall performance, but these come with experience and should not be blindly applied in the hope that they will magically make your program faster.

If you have a hotspot somewhere in the program, you can optimize everything else to hell and back and it will still be slow, whereas if you use a profiler to identify the hotspot in the first place, a 1-line change could win you an 80% performance boost where long hours of blood and sweat won you only 5%. You need to identify and eliminate the big hotspots first, and *then* come back and trim off the smaller, lower-level stuff for that extra 2-5% win. And experience shows that most predictions of where the hotspots will be are wrong. Always use a profiler before jumping to conclusions.)


T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N