> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >> Xinliang David Li <davi...@google.com> writes:
> >> >
> >> > Process level synchronization problems can happen when two processes
> >> > (running the instrumented binary) exit at the same time. The
> >> > updated/merged counters from one process may be overwritten by another
> >> > process -- this is true for both counter data and summary data.
> >> > Solution 3) does not introduce any new problems.
> >>
> >> You could just use lockf() ?
> >
> > The issue here is holding locks for all the files (there can be many)
> > versus limits on the number of locks & the possibility of deadlock (mind
> > that updates may happen in different orders on the same files for
> > different programs built from the same objects).
> >
> > For David: there is no thread safety code in mainline for the counters.
> > Long ago Zdenek implemented a poor man's TLS for counters (before TLS was
> > invented), http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html, but it
> > was voted down as too memory expensive per thread. We could optionally do
> > atomic updates like ICC, or a combination of both as discussed in the
> > thread. So far no one has implemented it, since the coverage fixups seem
> > to work well enough in practice for multithreaded programs, where
> > reproducibility does not seem to be _that_ important.
> >
> > For GCC profiled bootstrap, however, I would like the output binary to be
> > reproducible. We really ought to make profile updates safe across multiple
> > processes. Trashing a whole process run is worse than racing on an
> > increment: there is a good chance that one of the runs is more important
> > than the others, and it will get trashed.
> >
> > I do not think we have serious update problems in the summaries at the
> > moment. We lock individual files as we update them, and the summary is
> > simple enough to be safe: sum_all is summed and max_all is the maximum
> > over the individual runs, so even when you combine multiple programs the
> > summary ends up the same. Everything except max_all is ignored anyway.
> >
> > Solution 2 (i.e. histogram streaming) will also have the property that it
> > is safe WRT multiple programs, just like sum_all.
> 
> I think the sum_all based scaling of the working set entries also has
> this property. What is your opinion on saving the histogram in the

I think the scaling will have at least round-off issues WRT different merging
orders.

> summary and merging histograms together as best as possible compared
> to the alternative of saving the working set information as now and
> scaling it up by the ratio between the new and old sum_all when
> merging?

So far I like this option best, but David seems to lean towards the third
option with whole-file locking.  I can see it may prove more extensible in the
future.
At the moment I do not understand two things:
 1) Why do we need info on the number of counters above a given threshold,
    since the hot/cold decisions usually depend purely on the count cutoff?
    Removing those would solve the merging issues with variant 2, and then it
    would probably be a good solution.
 2) Do we plan to add features in the near future that will require global
    locking anyway?
    I guess LIPO itself does not count, since it streams its data into an
    independent file as you mentioned earlier, and locking the LIPO file is
    not that hard.
    Does LIPO stream everything into that common file, or does it use a
    combination of gcda files and a common summary?

    What other stuff does Google plan to merge?
    (In general I would be curious about the merging plans WRT profile stuff,
    so we get more synchronized and effective at getting patches in. We have
    about two months to get it done in stage1, and it would be nice to get as
    much as possible. Obviously some of the patches will need a bit of
    discussion, like this one. I hope you do not find it frustrating; I
    actually think this is an important feature.)

I also realized today that the common value counters (used by switch, indirect
call and div/mod value profiling) are non-stable WRT different merging orders
(i.e. parallel make in the train run).  I do not think there is an actual
solution to that except for not merging counter sections of this type in
libgcov and merging them in some canonical order at profile-feedback time.
Perhaps we just want to live with this, since the discrepancy here is small
(i.e. these counters are quite rare and their outcome has only a local effect
on the final binary, unlike the global summaries/edge counters).

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> -Andi
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
