On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > >
> > > > > stringpool.c:63 (alloc_node)            47M:  2.3%     0 :  0.0%     0 :  0.0%     0 :  0.0%  1217k
> > > > > ipa-prop.c:4480 (ipa_read_edge_info)    51M:  2.4%     0 :  0.0%   260k:  0.0%   404k:  0.3%   531k
> > > > > hash-table.h:801 (expand)               81M:  3.9%     0 :  0.0%    80M:  4.7%    88k:  0.1%   3349
> > > > >   ^^^ some of the memory ends up here which ought to be accounted to
> > > > >   the caller of expand.
> > > >
> > > > Yes, these all come from ggc_internal_alloc. Ideally we should register
> > > > a mem_alloc_description for each created symbol/call_summary and
> > > > manually register every allocation to such a descriptor.
> > >
> > > Or just pass memory stats from the caller of expand and transitively pass
> > > them from the caller of summary. This will get us the line info of the
> > > get_create call, which is IMO OK.
> >
> > The issue with this approach is that you will spread a summary allocation
> > along all the ::get_create places. Which is not ideal.
>
> We get it with other allocations, too. Not ideal, but better.
> Even better solutions are welcome :)
> >
> > Try to take a look, or we can debug that on Thursday together?
> > Martin
>
> Found it. It turns out that ggc_prune_overhead_list is bogus. It walks
> all active allocation objects, checks whether each was collected, and
> accounts the collection. It then throws away all allocations (including
> those not collected), so those are no longer accounted later. So we
> basically misaccount everything that survives ggc_collect.
>
> No wonder that it makes me hunt ghosts 8-O
>
> Also the last memory report was sorted by garbage, not leak, for a reason:
> for normal compilation we primarily care about the garbage produced,
> because that triggers ggc collections and makes the compiler slow.
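The misaccounting described above can be modelled in a few lines. This is
only a minimal sketch under stated assumptions, not GCC code: SiteStats,
ReverseMap, Marked and both prune functions are hypothetical names,
std::unordered_map stands in for GCC's reverse object map, and the "marked"
set stands in for ggc_marked_p. It compares dropping every record after a
collection with dropping only the records of collected objects.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical per-allocation-site counters (not the GCC type).
struct SiteStats {
  std::size_t allocated = 0;
  std::size_t collected = 0;
};

// Hypothetical reverse map: live object -> (owning site, size in bytes).
using ReverseMap =
    std::unordered_map<const void *, std::pair<SiteStats *, std::size_t>>;
// Objects that survived the current collection (stand-in for ggc_marked_p).
using Marked = std::unordered_set<const void *>;

// Behaviour described as bogus: account dead objects, then forget everything,
// so survivors can never be accounted when they die later.
void prune_dropping_all(ReverseMap &map, const Marked &marked) {
  for (auto &entry : map)
    if (!marked.count(entry.first))
      entry.second.first->collected += entry.second.second;
  map.clear();
}

// Behaviour after the fix: account and forget only the dead objects.
void prune_keeping_survivors(ReverseMap &map, const Marked &marked) {
  for (auto it = map.begin(); it != map.end();) {
    if (!marked.count(it->first)) {
      it->second.first->collected += it->second.second;
      it = map.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
}

// Two objects from one site; `b` survives the first collection and dies
// in the second. Returns the bytes accounted as collected for the site.
std::size_t collected_after_two_cycles(bool keep_survivors) {
  SiteStats site;
  ReverseMap map;
  int a = 0, b = 0;
  site.allocated = 16 + 32;
  map[&a] = std::make_pair(&site, std::size_t{16});
  map[&b] = std::make_pair(&site, std::size_t{32});

  auto prune = keep_survivors ? &prune_keeping_survivors : &prune_dropping_all;
  prune(map, Marked{&b});  // first collection: only b is still live
  prune(map, Marked{});    // second collection: b is dead as well
  return site.collected;
}
```

In this model the first variant loses track of `b` once it survives a
collection, so its 32 bytes never show up as collected, while the second
variant accounts both objects.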
>
> BTW I like how advanced C++ gets back to lisp :)
>
> With the fix I get the following stats by the end of Firefox WPA:
>
> cfg.c:127 (alloc_block)                          32M:  1.2%    12M:  2.6%     0 :  0.0%     0 :  0.0%   446k
> symtab.c:582 (create_reference)                  42M:  1.6%     0 :  0.0%    65M:  1.7%  1329k:  0.4%   840k
> gimple-streamer-in.c:101 (input_gimple_stmt)     49M:  1.9%    17M:  3.5%     0 :  0.0%   375k:  0.1%   747k
> tree-ssanames.c:308 (make_ssa_name_fn)           51M:  2.0%    16M:  3.4%     0 :  0.0%     0 :  0.0%   973k
> ipa-cp.c:5157 (ipcp_store_vr_results)            51M:  2.0%  1243k:  0.2%     0 :  0.0%  9561k:  3.0%   146k
> stringpool.c:63 (alloc_node)                     53M:  2.0%     0 :  0.0%     0 :  0.0%     0 :  0.0%  1362k
> ipa-prop.c:3988 (duplicate)                      63M:  2.4%  1115k:  0.2%     0 :  0.0%    10M:  3.2%   264k
> toplev.c:904 (realloc_for_line_map)              72M:  2.8%     0 :  0.0%    71M:  1.9%    15M:  5.1%     27
> tree-ssanames.c:83 (init_ssanames)               96M:  3.7%   121M: 24.4%    44M:  1.2%    87M: 27.8%   174k
> tree-ssa-operands.c:265 (ssa_operand_alloc)     104M:  4.0%     0 :  0.0%    39M:  1.0%     0 :  0.0%   105k
> stringpool.c:41 (stringpool_ggc_alloc)          106M:  4.1%     0 :  0.0%     0 :  0.0%  7652k:  2.4%  1362k
> lto/lto-common.c:204 (lto_read_in_decl_state)   160M:  6.2%     0 :  0.0%   105M:  2.8%    19M:  6.1%  1731k
> cgraph.c:851 (create_edge)                      248M:  9.5%     0 :  0.0%    70M:  1.9%     0 :  0.0%  3141k
> cgraph.h:2712 (allocate_cgraph_symbol)          383M: 14.7%     0 :  0.0%   155M:  4.1%     0 :  0.0%  1567k
> tree-streamer-in.c:631 (streamer_alloc_tree)    718M: 27.5%   136M: 27.5%  1267M: 33.3%    64M: 20.6%    15M
> ------------------------------------------------------------------------------------------------------------
> GGC memory                                             Leak       Garbage         Freed      Overhead  Times
> ------------------------------------------------------------------------------------------------------------
> Total                                          2609M:100.0%   497M:100.0%  3804M:100.0%   313M:100.0%    49M
> ------------------------------------------------------------------------------------------------------------
>
> This looks more realistic.
> ssa_operands and init_ssanames show that we
> really read a lot of bodies into memory. I also wonder if we really want
> to compute virtual SSA form for them when we only want to compare them.
>
> After reading and symbol table merging I get:
>
> cgraph.h:2712 (allocate_cgraph_symbol)   148M:  7.1%     0 :  0.0%   115M:  6.7%     0 :  0.0%   767k
>
> So it seems that about half of the callgraph nodes are inline clones, so
> working on reducing clone overhead (in addition to revisiting tree
> merging once again) seems to be the most meaningful thing to do right now.
>
> OK if patch passes testing?

OK.

>         * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
>         memory allocations away.
>         * mem-stats.h (mem_alloc_description<T>::release_object_overhead):
>         do not silently ignore invalid release requests.
> Index: ggc-common.c
> ===================================================================
> --- ggc-common.c        (revision 277796)
> +++ ggc-common.c        (working copy)
> @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
>
>    for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
>      if (!ggc_marked_p ((*it).first))
> -      (*it).second.first->m_collected += (*it).second.second;
> -
> -  delete ggc_mem_desc.m_reverse_object_map;
> -  ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
> +      {
> +        (*it).second.first->m_collected += (*it).second.second;
> +        ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
> +      }
>  }
>
>  /* Return memory used by heap in kb, 0 if this info is not available.  */
> Index: mem-stats.h
> ===================================================================
> --- mem-stats.h (revision 277796)
> +++ mem-stats.h (working copy)
> @@ -535,11 +535,8 @@ inline void
>  mem_alloc_description<T>::release_object_overhead (void *ptr)
>  {
>    std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
> -  if (entry)
> -    {
> -      entry->first->release_overhead (entry->second);
> -      m_reverse_object_map->remove (ptr);
> -    }
> +  entry->first->release_overhead (entry->second);
> +  m_reverse_object_map->remove (ptr);
>  }
>
>  /* Unregister a memory allocation descriptor registered with