Re: Add object allocators to symbol and call summaries

Richard Biener Wed, 06 Nov 2019 01:43:46 -0800

On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > >
> > > > > stringpool.c:63 (alloc_node)                         47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
> > > > > ipa-prop.c:4480 (ipa_read_edge_info)                 51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
> > > > > hash-table.h:801 (expand)                            81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%      3349
> > > > >    ^^^ some of memory comes here which ought to be accounted to
> > > > >    caller of expand.
> > > >
> > > > Yes, these all come from ggc_internal_alloc. Ideally we should register
> > > > a mem_alloc_description for each created symbol/call_summary and
> > > > manually register every allocation with that descriptor.
> > >
> > > Or just pass the memory stats from the caller of expand, and transitively
> > > from the caller of the summary. This will get us the line info of the
> > > get_create call, which is IMO OK.
> >
> > The issue with this approach is that you will spread a summary's
> > allocations across all the ::get_create call sites, which is not ideal.
>
> We get it with other allocations, too. Not ideal, but better.
> Even better solutions are welcome :)
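
A minimal sketch of the "pass the caller's location down" idea discussed above, assuming a GCC or Clang compiler that provides __builtin_FILE and __builtin_LINE. The names Site, site_bytes, record_alloc, expand and get_create are hypothetical stand-ins for illustration only and are not the actual GCC functions or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical call-site key: file and line of the code that asked for memory.
using Site = std::pair<std::string, int>;

// Bytes attributed to each call site (a toy stand-in for a memory report).
inline std::map<Site, std::size_t> &site_bytes() {
  static std::map<Site, std::size_t> table;
  return table;
}

// Lowest layer: charge the allocation to whatever location was handed down.
inline void record_alloc(std::size_t bytes, const char *file, int line) {
  assert(file != nullptr);
  site_bytes()[Site(file, line)] += bytes;
}

// Toy "table growth": forwards the location instead of using its own.
inline void expand(std::size_t bytes,
                   const char *file = __builtin_FILE(),
                   int line = __builtin_LINE()) {
  record_alloc(bytes, file, line);
}

// Toy "summary lookup": default arguments are evaluated at the call site,
// so the caller of get_create is the location that ends up in the report.
inline void get_create(const char *file = __builtin_FILE(),
                       int line = __builtin_LINE()) {
  expand(64, file, line);  // 64 bytes is an arbitrary illustrative size
}
```

With this shape, every distinct get_create call site becomes its own row in the report, which is the spreading effect mentioned in the thread.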
> >
> > Try to take a look, or we can debug that on Thursday together?
> > Martin
>
> Found it.  It turns out that ggc_prune_overhead_list is bogus.  It walks
> all active allocation objects, checks whether each one was collected, and
> accounts its collection.  Then it throws away all allocations (including
> those not collected), so those are no longer accounted later.  So we
> basically misaccount everything that survives ggc_collect.
>
> No wonder it had me hunting ghosts 8-O
>
> Also, the last memory report was sorted by garbage rather than leak for a
> reason: for normal compilation we primarily care about the garbage produced,
> because that triggers ggc collections and makes the compiler slow.
>
> BTW I like how advanced C++ gets back to lisp :)
>
> With the fix I get the following stats at the end of Firefox WPA:
>
> cfg.c:127 (alloc_block)                              32M:  1.2%       12M:  2.6%        0 :  0.0%        0 :  0.0%      446k
> symtab.c:582 (create_reference)                      42M:  1.6%        0 :  0.0%       65M:  1.7%     1329k:  0.4%      840k
> gimple-streamer-in.c:101 (input_gimple_stmt)         49M:  1.9%       17M:  3.5%        0 :  0.0%      375k:  0.1%      747k
> tree-ssanames.c:308 (make_ssa_name_fn)               51M:  2.0%       16M:  3.4%        0 :  0.0%        0 :  0.0%      973k
> ipa-cp.c:5157 (ipcp_store_vr_results)                51M:  2.0%     1243k:  0.2%        0 :  0.0%     9561k:  3.0%      146k
> stringpool.c:63 (alloc_node)                         53M:  2.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1362k
> ipa-prop.c:3988 (duplicate)                          63M:  2.4%     1115k:  0.2%        0 :  0.0%       10M:  3.2%      264k
> toplev.c:904 (realloc_for_line_map)                  72M:  2.8%        0 :  0.0%       71M:  1.9%       15M:  5.1%        27
> tree-ssanames.c:83 (init_ssanames)                   96M:  3.7%      121M: 24.4%       44M:  1.2%       87M: 27.8%      174k
> tree-ssa-operands.c:265 (ssa_operand_alloc)         104M:  4.0%        0 :  0.0%       39M:  1.0%        0 :  0.0%      105k
> stringpool.c:41 (stringpool_ggc_alloc)              106M:  4.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
> lto/lto-common.c:204 (lto_read_in_decl_state)       160M:  6.2%        0 :  0.0%      105M:  2.8%       19M:  6.1%     1731k
> cgraph.c:851 (create_edge)                          248M:  9.5%        0 :  0.0%       70M:  1.9%        0 :  0.0%     3141k
> cgraph.h:2712 (allocate_cgraph_symbol)              383M: 14.7%        0 :  0.0%      155M:  4.1%        0 :  0.0%     1567k
> tree-streamer-in.c:631 (streamer_alloc_tree)        718M: 27.5%      136M: 27.5%     1267M: 33.3%       64M: 20.6%       15M
> ----------------------------------------------------------------------------------------------------------------------------
> GGC memory                                                 Leak          Garbage            Freed         Overhead     Times
> ----------------------------------------------------------------------------------------------------------------------------
> Total                                              2609M:100.0%      497M:100.0%     3804M:100.0%      313M:100.0%       49M
> ----------------------------------------------------------------------------------------------------------------------------
>
> This looks more realistic. ssa_operands and init_ssanames show that we
> really read a lot of bodies into memory. I also wonder if we really want
> to compute virtual SSA form for them when we only want to compare them.
>
> After reading and symbol table merging I get:
>
> cgraph.h:2712 (allocate_cgraph_symbol)              148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k
>
> So it seems that about half of the callgraph nodes are inline clones, so
> working on reducing clone overhead (in addition to revisiting tree
> merging once again) seems to be the most meaningful thing right now.
>
> OK if patch passes testing?


OK.

>         * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
>         memory allocations away.
>         * mem-stats.h (mem_alloc_description<T>::release_object_overhead):
>         Do not silently ignore invalid release requests.
> Index: ggc-common.c
> ===================================================================
> --- ggc-common.c        (revision 277796)
> +++ ggc-common.c        (working copy)
> @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
>
>    for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
>      if (!ggc_marked_p ((*it).first))
> -      (*it).second.first->m_collected += (*it).second.second;
> -
> -  delete ggc_mem_desc.m_reverse_object_map;
> -  ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
> +      {
> +        (*it).second.first->m_collected += (*it).second.second;
> +       ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
> +      }
>  }
>
>  /* Return memory used by heap in kb, 0 if this info is not available.  */
> Index: mem-stats.h
> ===================================================================
> --- mem-stats.h (revision 277796)
> +++ mem-stats.h (working copy)
> @@ -535,11 +535,8 @@ inline void
>  mem_alloc_description<T>::release_object_overhead (void *ptr)
>  {
>    std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
> -  if (entry)
> -    {
> -      entry->first->release_overhead (entry->second);
> -      m_reverse_object_map->remove (ptr);
> -    }
> +  entry->first->release_overhead (entry->second);
> +  m_reverse_object_map->remove (ptr);
>  }
>
>  /* Unregister a memory allocation descriptor registered with
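
The accounting problem described in the quoted message can be modelled with a small toy, given here as a sketch under stated assumptions rather than the GCC code itself. Usage, ObjectMap, MarkedSet, prune_buggy, prune_fixed and release are hypothetical names; std::unordered_map stands in for GCC's own hash map, and the marked set stands in for ggc_marked_p. Because erasing from std::unordered_map invalidates the erased iterator, the fixed variant advances via the iterator that erase returns.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical per-call-site record, loosely mirroring the report columns.
struct Usage {
  std::size_t collected = 0;  // bytes reclaimed by the collector ("Garbage")
  std::size_t freed = 0;      // bytes released explicitly ("Freed")
};

// Object address -> (owning call site, size), like a reverse object map.
using ObjectMap =
    std::unordered_map<const void *, std::pair<Usage *, std::size_t>>;
using MarkedSet = std::unordered_set<const void *>;  // objects still alive

// Old behaviour: account the dead objects, then forget every object,
// including survivors, so nothing later can be attributed to them.
void prune_buggy(ObjectMap &objects, const MarkedSet &marked) {
  for (auto &entry : objects) {
    assert(entry.second.first != nullptr);
    if (!marked.count(entry.first))
      entry.second.first->collected += entry.second.second;
  }
  objects.clear();
}

// Fixed behaviour: only the collected objects leave the map.
void prune_fixed(ObjectMap &objects, const MarkedSet &marked) {
  for (auto it = objects.begin(); it != objects.end();) {
    assert(it->second.first != nullptr);
    if (!marked.count(it->first)) {
      it->second.first->collected += it->second.second;
      it = objects.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
}

// Explicit release; returns false when the object is unknown to the map.
bool release(ObjectMap &objects, const void *ptr) {
  auto it = objects.find(ptr);
  if (it == objects.end())
    return false;
  it->second.first->freed += it->second.second;
  objects.erase(it);
  return true;
}
```

In this model a survivor of prune_buggy can no longer be released against its call site, which is the kind of lost accounting that a silent "if (entry)" check would hide.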
