On Mon, Aug 20, 2012 at 10:29 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> >> Xinliang David Li <davi...@google.com> writes:
>> >> >
>> >> > Process-level synchronization problems can happen when two processes
>> >> > (running the instrumented binary) exit at the same time. The
>> >> > updated/merged counters from one process may be overwritten by another
>> >> > process -- this is true for both counter data and summary data.
>> >> > Solution 3) does not introduce any new problems.
>> >>
>> >> You could just use lockf()?
>> >
>> > The issue here is holding locks for all the files (there can be many)
>> > versus the limit on the number of locks and the possibility of deadlock
>> > (mind that updates may happen in different orders on the same files for
>> > different programs built from the same objects).
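For reference, a per-file lockf() update along the lines Andi suggests would look roughly like the sketch below. This is a minimal illustration, not libgcov code: update_gcda_file and the merge step are hypothetical, error handling is mostly omitted, and the deadlock concern above only enters once you try to hold locks on many such files at once.

#include <fcntl.h>
#include <unistd.h>

/* Sketch: serialize updates to a single .gcda file with lockf().  */
void
update_gcda_file (const char *path)
{
  int fd = open (path, O_RDWR);
  if (fd < 0)
    return;
  if (lockf (fd, F_LOCK, 0) == 0)  /* len 0 locks to end of file */
    {
      /* ... read counters, merge the in-memory values, write back ... */
      lockf (fd, F_ULOCK, 0);
    }
  close (fd);
}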
>> >
>> > For David: there is no thread safety code in mainline for the counters.
>> > Long ago Zdenek implemented poor man's TLS for counters (before TLS was
>> > invented), http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html, but
>> > it was voted down as too memory expensive per thread. We could
>> > optionally do atomic updates like ICC, or a combination of both as
>> > discussed in the thread. So far no one has implemented it, since the
>> > coverage fixups seem to work well enough in practice for multithreaded
>> > programs, where reproducibility does not seem to be _that_ important.
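For concreteness, the atomic-update variant mentioned above could be as simple as the sketch below, assuming the target supports GCC's __atomic builtins. This is an illustration, not what ICC or libgcov actually does; the plain long long stands in for gcov_type.

static inline void
gcov_counter_add_atomic (long long *counter, long long value)
{
  /* Relaxed ordering suffices: we only need increments not to be
     lost, not any ordering between different counters.  */
  __atomic_fetch_add (counter, value, __ATOMIC_RELAXED);
}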
>> >
>> > For GCC profiled bootstrap, however, I would like the output binary to
>> > be reproducible. We really ought to make profile updates safe across
>> > multiple processes. Trashing a whole process's run is worse than racing
>> > on an increment. There is a good chance that one of the runs is more
>> > important than the others, and it will get trashed.
>> >
>> > I do not think we have serious update problems in the summaries at the
>> > moment. We lock individual files as we update them. The summary is
>> > simple enough to be safe. sum_all is summed; max_all is the maximum
>> > over the individual runs. Even when you combine multiple programs the
>> > summary will end up the same. Everything except max_all is ignored
>> > anyway.
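That order-independence is easy to see in a sketch of the merge; the struct below is purely illustrative and does not match the real gcov_summary layout.

struct summary
{
  long long sum_all;  /* total count accumulated over all runs */
  long long max_all;  /* maximum counter value seen in any run */
};

/* Addition and max are both commutative and associative, so any
   merge order -- even across different programs -- gives the same
   final summary.  */
void
merge_summary (struct summary *dst, const struct summary *src)
{
  dst->sum_all += src->sum_all;
  if (src->max_all > dst->max_all)
    dst->max_all = src->max_all;
}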
>> >
>> > Solution 2 (i.e. histogram streaming) will also have the property that it 
>> > is safe
>> > WRT multiple programs, just like sum_all.
>>
>> I think the sum_all based scaling of the working set entries also has
>> this property. What is your opinion on saving the histogram in the
>
> I think the scaling will have at least round-off issues WRT different
> merging orders.
>
>> summary and merging histograms together as best as possible compared
>> to the alternative of saving the working set information as now and
>> scaling it up by the ratio between the new and old sum_all when
>> merging?
>
> So far I like this option best. But David seems to lean towards the third
> option with whole-file locking. I see it may prove more extensible in the
> future. At the moment I do not understand two things:
>  1) Why do we need info on the number of counters above a given threshold,
>     since the hot/cold decisions usually depend purely on the count cutoff?
>     Removing those would solve the merging issues with variant 2, and then
>     it would probably be a good solution.

This is useful for large applications with a long tail. The instruction
working set for those applications is very large, and the inliner and
unroller need to be aware of that so that good heuristics can be developed
to throttle aggressive code-bloating transformations. For the inliner, it
is kind of like the global budget, but more application dependent. In the
long run, we will collect more advanced FDO summaries regarding the working
set -- working set size for each code region (locality region).
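To make that concrete, the kind of working-set summary meant here can be sketched as below: the number of hottest counters needed to cover a given fraction of sum_all. The function name and the descending-sort precondition are illustrative, not an existing interface.

/* Return how many of the hottest counters are needed to cover
   `fraction` (e.g. 0.99) of the total count.  Assumes `counts`
   is sorted in descending order.  Sketch only.  */
unsigned
working_set_size (const long long *counts, unsigned n,
                  long long sum_all, double fraction)
{
  long long cum = 0;
  unsigned i;
  for (i = 0; i < n; i++)
    {
      cum += counts[i];
      if ((double) cum >= fraction * (double) sum_all)
        return i + 1;
    }
  return n;
}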


>  2) Do we plan to add some features in the near future that will require
>     global locking anyway?
>     I guess LIPO itself does not count, since it streams its data into an
>     independent file, as you mentioned earlier, and locking the LIPO file
>     is not that hard.
>     Does LIPO stream everything into that common file, or does it use a
>     combination of gcda files and a common summary?

Actually, the LIPO module grouping information is stored in the gcda files.
It is also stored in a separate .imports file (one per object) --- this is
primarily used by our build system for dependence information.


>
>     What other stuff does Google plan to merge?
>     (In general I would be curious about merging plans WRT the profile
>     stuff, so we get more synchronized and effective at getting patches
>     in. We have about two months to get it done in stage1, and it would
>     be nice to get as much in as possible. Obviously some of the patches
>     will need a bit of discussion, like this one. I hope you do not find
>     it frustrating; I actually think this is an important feature.)

We plan to merge in the new LIPO implementation based on LTO streaming.
Rong Xu finished this in the 4.6-based compiler, and he needs to port it
to 4.8.


thanks,

David

>
> I also realized today that the common value counters (used by switch,
> indirect call, and div/mod value profiling) are non-stable WRT different
> merging orders (i.e. parallel make in the train run). I do not think
> there is an actual solution to that, except for not merging counter
> sections of this type in libgcov and instead merging them in some
> canonical order at profile-feedback time. Perhaps we just want to live
> with this, since the discrepancy here is small (i.e. these counters are
> quite rare and their outcome has only a local effect on the final binary,
> unlike the global summaries/edge counters).
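For readers following along: the order dependence comes from the majority-vote style merge used for a "most common value" counter, roughly as sketched below. This is an illustration, not necessarily the exact __gcov_merge_single logic.

/* `val`/`cnt` track the current candidate value and its lead over
   the runners-up.  Merging run A then run B can leave a different
   surviving candidate than merging B then A, which is exactly the
   instability described above.  */
void
merge_single_value (long long *val, long long *cnt,
                    long long new_val, long long new_cnt)
{
  if (*val == new_val)
    *cnt += new_cnt;
  else if (new_cnt > *cnt)
    {
      *val = new_val;
      *cnt = new_cnt - *cnt;
    }
  else
    *cnt -= new_cnt;
}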
>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> -Andi
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
