On Thu, Oct 20, 2011 at 7:11 PM, Xinliang David Li <davi...@google.com> wrote:
> On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>> On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen <a...@firstfloor.org> wrote:
>>> x...@google.com (Rong Xu) writes:
>>>
>>>> After some off-line discussion, we decided to use a more general approach
>>>> to control the printing of optimization messages/warnings. We will
>>>> introduce a new option -fopt-info:
>>>>  * fopt-info=0 or fno-opt-info: no message will be emitted.
>>>>  * fopt-info or fopt-info=1: emit important warnings and optimization
>>>>    messages with large performance impact.
>>>>  * fopt-info=2: warnings and optimization messages targeting power users.
>>>>  * fopt-info=3: informational messages for compiler developers.
>>
>> This doesn't look scalable if you consider that each pass would print
>> as much of a mess like -fvectorizer-verbose=5.
>
> What is not scalable? For level 1 dump, only the summary of
> vectorization will be printed just like other loop transformations.
>
>>
>> I think =2 and =3 should be omitted - we do have dump-files for a reason.
>
> Dump files are not easy to use -- it is big, and slow especially for
> people with large distributed build systems. Having both level 2 and
> 3 is debatable, but it will be useful to have at least one level above
> level 1. Dump files are mainly for compiler developers, while
> -fopt-info are for compiler developers *and* power users who know
> performance tuning.
>>
>> Also the coverage/profile cases you changed do not at all match
>> "... with large performance impact". In fact the impact is completely
>> unknown (as it would be the case usually).
>
> Impact of any transformations is just 'potential', coverage problems
> are no different from that.
>
>>
>> I'd rather have a way to make dump-files more structured (so, following
>> some standard reporting scheme) than introducing yet another way
>> of output. [after making dump-files more consistent it will be easy
>> to revisit patches like this, there would be a natural general central
>> way to implement it]
>
> Yes, I remember we have discussed about this before -- currently dump
> files are a big mess -- debug tracing, IR are all mixed up, but as I
> said above, this is a different matter -- it is for compiler
> developers.
>
> For more structured optimization report, we should use option
> -fopt-report which dump optimization information based on category --
> the info data base can also be shared across modules:
>
> Example:
>
> [Loop Interchange]
> File a, line x, yyyyyyy
> File b, line xx, yyyyyyy
> ....
> File c, line z, It is beneficial to interchange the loop, but not
> done because of possible carried dependency (caused by false aliasing
> ...)
>
> [Loop Vectorization]
> ....
>
> [Loop Unroll]
> ...
>
> [SRA]
>
> [Alias summary]
> [Global Vars]
> a: addr exposed
> b: addr not exposed
> ..
> [Global Pointers]
> ..
> ...

I very well understand the intent. But I disagree with where you start
to implement this. Dump files are _not_ only for developers - after all
we don't have anything else. -fopt-report can get as big and unmanageable
to read as dump files - in fact I argue it will be worse than dump files
if you go beyond very very coarse reporting.

Yes, dump files are a "mess". So - why not clean them up, and at the
same time annotate dump file pieces so _automatic_ filtering and
redirecting to stdout with something like -fopt-report would do something
sensible? I don't see why dump files have to stay messy while you at the
same time would need to add _new_ code to dump to stdout for -fopt-report.

So, no, please do it the right way that benefits both compiler developers
and your "power users". And yes, the right way is not to start adding
that -fopt-report switch. The right way is to make dump-files consumable
by mere mortals first.

Thanks,
Richard.

>
> Thanks,
>
> David
>
>>
>> So, please fix dump-files instead. And for coverage/profiling, fill
>> in stuff in a dump-file!
>>
>> Richard.
>>
>>> It would be interesting to have some warnings about missing SRA
>>> opportunities in =1 or =2. I found that sometimes fixing those can give a
>>> large speedup.
>>>
>>> Right now a common case that prevents SRA on structure field
>>> is simply a memset or memcpy.
>>>
>>> -Andi
>>>
>>>
>>> --
>>> a...@linux.intel.com -- Speaking for myself only
>>>
>>
>
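
For illustration only, here is a minimal sketch of the "annotate dump file pieces, then filter automatically" idea from the reply above. It is not GCC code and does not reflect any existing GCC interface: the names DumpRecord, emit, and demo, the category strings, and the numeric levels are all hypothetical. The point is just that a single annotated emit call could feed the full dump file and a filtered report stream at once, so no second set of printing code is needed.

```python
import io
from dataclasses import dataclass


@dataclass
class DumpRecord:
    """One annotated dump message (hypothetical structure, not GCC's)."""
    category: str  # pass name, e.g. "vect" or "sra" (made-up labels)
    level: int     # 1 = coarse summary, larger = more detail
    location: str  # "file:line" of the affected code
    message: str


def emit(record, dump_file, report=None, report_level=0):
    """Write every record to the dump file; copy coarse ones to the report."""
    line = f"[{record.category}] {record.location}: {record.message}\n"
    dump_file.write(line)
    # The report stream only sees records at or below the requested level.
    if report is not None and record.level <= report_level:
        report.write(line)


def demo():
    """Feed three sample records through emit() with report_level=2."""
    dump, report = io.StringIO(), io.StringIO()
    records = [
        DumpRecord("vect", 1, "a.c:10", "loop vectorized"),
        DumpRecord("vect", 3, "a.c:10", "cost model details"),
        DumpRecord("sra", 2, "b.c:42", "not scalarized: memset on aggregate"),
    ]
    for rec in records:
        emit(rec, dump, report, report_level=2)
    return dump.getvalue(), report.getvalue()


if __name__ == "__main__":
    dump_text, report_text = demo()
    print("dump file:")
    print(dump_text, end="")
    print("report:")
    print(report_text, end="")
```

With report_level=2 the dump stream keeps all three records, while the report stream drops the level-3 "cost model details" line. Filtering by category instead of level would be a one-line change to the condition in emit.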