http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57742
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> --- (In reply to Richard Biener from comment #2) > (In reply to Marc Glisse from comment #1) > > This is a very limited version of this optimization. It is in > > simplify_builtin_call, so only triggers if malloc/calloc is > > SSA_NAME_DEF_STMT(gimple_vuse(memset_stmt)). However, generalizing it means > > we would need plenty of tests protecting against cases where the > > transformation would be wrong. Note that this transforms: > > p=malloc(n); > > if(cond)memset(p,0,n); > > into: > > p=calloc(n,1); > > cond; > > which is good if cond is p!=0 but may not always be so great otherwise. > > ;) post-dominator tests (or simply tests whether both calls are in the > same basic-block ...). Same basic block is quite limited, and for the condition below we don't directly have post-domination, we would need post-domination between the bbs with gimple_cond and malloc, and the bb of memset with the landing block of the gimple_cond. But even finding the gimple_cond in: malloc; loop; cond; loop; memset; can be hard. I guess I'll have to limit my expectations a bit... > Also you can transform > > p = malloc (n); > if (p) > memset (p, 0, n); > > which might be a common-enough case to optimize for. Yes, that's the goal. > dereferencing a double wouldn't have a VDEF (unless you store a double). I do want to be able to store in between, so I think I have to walk the vdef chain. But as soon as I do that, I need to make sure that the writes are to places that can't alias, which complicates things a lot (and it can get a bit expensive in a function with many memset). Consider this program: #include <vector> void f(void*p,int n){ new(p)std::vector<int>(n,0); } With -O3, we end up with: _27 = operator new (_26); MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_start = _27; MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_finish = _27; _16 = _27 + _26; MEM[(struct _Vector_base *)p_4(D)]._M_impl._M_end_of_storage = _16; __builtin_memset (_27, 0, _26); which has memory stores between the allocation and memset. That's exactly the type of code where I'd want the optimization to apply. Joost's example has the same pattern: malloc, test for 0, several unrelated memory stores, memset. (how to handle the fact that we have operator new and not malloc is a different issue, I am thinking of having a mode/flag where we promise not to replace operator new so it can be inlined, which will include an if(p!=0) test) It would be great (in particular for application-specific plugins) to have an easy way to say things like: this is the next read/write use of this memory region (but other memory regions may be used in between), and it isn't post-dominated only because of this gimple_cond, etc. It's almost noon, too late to be dreaming ;-)