> On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >> Xinliang David Li <davi...@google.com> writes:
> >> >
> >> > Process-level synchronization problems can happen when two processes
> >> > (running the instrumented binary) exit at the same time. The
> >> > updated/merged counters from one process may be overwritten by another
> >> > process -- this is true for both counter data and summary data.
> >> > Solution 3) does not introduce any new problems.
> >>
> >> You could just use lockf() ?
> >
> > The issue here is holding locks for all the files (there can be many):
> > the limits on the number of locks, and the possibility of deadlocking
> > (mind that updates may happen in different orders on the same files for
> > different programs built from the same objects).
> >
> > For David: there is no thread-safety code in mainline for the counters.
> > Long ago Zdenek implemented a poor man's TLS for counters (before TLS
> > was invented), http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html,
> > but it was voted down as too memory-expensive per thread. We could
> > optionally do atomic updates like ICC, or a combination of both, as
> > discussed in that thread. So far no one has implemented it, since the
> > coverage fixups seem to work well enough in practice for multithreaded
> > programs, where reproducibility does not seem to be _that_ important.
> >
> > For a GCC profiled bootstrap, however, I would like the output binary
> > to be reproducible. We really ought to make profile updates safe across
> > multiple processes. Trashing a whole process run is worse than racing
> > on an increment: there is a good chance that one of the runs is more
> > important than the others, and it would get trashed.
> >
> > I do not think we have serious update problems in the summaries at the
> > moment. We lock individual files as we update them. The summary is
> > simple enough to be safe: sum_all is summed, and max_all is the maximum
> > over the individual runs.
> > Even when you combine multiple programs, the summary will end up the
> > same. Everything except max_all is ignored anyway.
> >
> > Solution 2 (i.e. histogram streaming) will also have the property that
> > it is safe WRT multiple programs, just like sum_all.
>
> I think the sum_all-based scaling of the working set entries also has
> this property. What is your opinion on saving the histogram in the
I think the scaling will have at least roundoff issues WRT different
merging orders.

> summary and merging histograms together as best as possible, compared
> to the alternative of saving the working set information as now and
> scaling it up by the ratio between the new and old sum_all when
> merging?

So far I like this option best. But David seems to lean towards the third
option, with whole-file locking. I see it may prove more extensible in the
future. At the moment I do not understand two things:

1) Why do we need info on the number of counters above a given threshold,
since the hot/cold decisions usually depend purely on the count cutoff?
Removing those would solve the merging issues with variant 2, and then it
would probably be a good solution.

2) Do we plan to add some features in the near future that will require
global locking anyway? I guess LIPO itself does not count, since it
streams its data into an independent file, as you mentioned earlier, and
locking the LIPO file is not that hard. Does LIPO stream everything into
that common file, or does it use a combination of gcda files and a common
summary? What other stuff does Google plan to merge?

(In general I would be curious about merging plans WRT profile stuff, so
we get more synchronized and effective at getting patches in. We have
about two months to get it done in stage1, and it would be nice to get as
much in as possible. Obviously some of the patches will need a bit of
discussion, like this one. I hope you do not find it frustrating; I
actually think this is an important feature.)

I also realized today that the common value counters (used by switch,
indirect call and div/mod value profiling) are non-stable WRT different
merging orders (i.e. a parallel make in the train run). I do not think
there is an actual solution to that, except for not merging counter
sections of this type in libgcov and merging them in some canonical order
at profile-feedback time. Perhaps we just want to live with this, since
the discrepancy here is small (i.e.
these counters are quite rare, and their outcome has only a local effect
on the final binary, unlike the global summaries/edge counters).

Honza

> Thanks,
> Teresa
>
> > Honza
> >>
> >> -Andi
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413