On Thu, Mar 03, 2016 at 09:21:30AM +0100, Ingo Molnar wrote:
> 
> * Arnaldo Carvalho de Melo <[email protected]> wrote:
> 
> > Hi Ingo,
> > 
> >     Please consider pulling,
> > 
> > - Arnaldo
> > 
> > The following changes since commit 675965b00d734c985e4285f5bec7e524d15fc4e1:
> > 
> >   perf: Export perf_event_sysfs_show() (2016-02-29 09:35:27 +0100)
> > 
> > are available in the git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> > tags/perf-core-for-mingo-20160229
> > 
> > for you to fetch changes up to 575a02e00b11eecbbabcb1eb22eab4c68e91ae77:
> > 
> >   perf record: Ensure return non-zero rc when mmap fail (2016-02-29 
> > 12:44:15 -0300)
> > 
> > ----------------------------------------------------------------
> > perf/core improvements and fixes:
> > 
> > User visible:
> > 
> > - Check existence of frontend/backed stalled cycles in 'perf stat' (Andi 
> > Kleen)
> > 
> > - Avoid installing .o files from tools/lib/ into the python extension (Jiri 
> > Olsa)
> > 
> > - Rename the tracepoint '/format' field that carries the syscall ID from 
> > 'nr',
> >   that is also the name of some syscalls arguments, to "__syscall_nr", to
> >   avoid having multiple fields with the same name, that was breaking the
> >   python script skeleton generator from perf.data files (Taeung Song)
> > 
> > - Support converting data from bpf events in 'perf data' (Wang Nan)
> > 
> > Infrastructure:
> > 
> > - Split libtraceevent's pevent_print_event() (Steven Rostedt)
> > 
> > - Librarize some 'perf record' bits to allow handling multiple perf.data
> >   files per session (Wang Nan)
> > 
> > - Ensure return non-zero rc when mmap fail in 'perf record' (Wang Nan)
> > 
> > Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> > 
> > ----------------------------------------------------------------
> > Andi Kleen (1):
> >       perf stat: Check existence of frontend/backed stalled cycles
> 
> > 
> > Jiri Olsa (1):
> >       perf tools: Fix python extension build
> > 
> > Steven Rostedt (1):
> >       tools lib traceevent: Split pevent_print_event() into specific 
> > functionality functions
> > 
> > Taeung Song (2):
> >       perf trace: Check and discard not only 'nr' but also '__syscall_nr'
> >       tracing/syscalls: Rename "/format" tracepoint field name "nr" to 
> > "__syscall_nr:
> > 
> > Wang Nan (6):
> >       perf data: Support converting data from bpf_perf_event_output()
> >       perf data: Explicitly set byte order for integer types
> >       perf record: Use WARN_ONCE to replace 'if' condition
> >       perf record: Extract synthesize code to record__synthesize()
> >       perf record: Introduce record__finish_output() to finish a perf.data
> >       perf record: Ensure return non-zero rc when mmap fail
> > 
> >  kernel/trace/trace_syscalls.c      |  16 ++--
> >  tools/lib/traceevent/event-parse.c | 136 +++++++++++++++++++++++-------
> >  tools/lib/traceevent/event-parse.h |  13 +++
> >  tools/perf/builtin-record.c        | 168 
> > ++++++++++++++++++++++---------------
> >  tools/perf/builtin-stat.c          |  22 ++++-
> >  tools/perf/builtin-trace.c         |   8 +-
> >  tools/perf/util/data-convert-bt.c  | 118 +++++++++++++++++++++++++-
> >  tools/perf/util/setup.py           |   4 +
> >  8 files changed, 372 insertions(+), 113 deletions(-)
> 
> Hm, there's a 'perf stat' regression that I can see:
> 
> Before:
> 
>  triton:~/tip> perf stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>       11990.023100      task-clock (msec)         #   11.981 CPUs utilized    
>       
>              8,802      context-switches          #    0.734 K/sec            
>       
>                543      cpu-migrations            #    0.045 K/sec            
>       
>             97,375      page-faults               #    0.008 M/sec            
>       
>      9,854,385,894      cycles                    #    0.822 GHz              
>       
>     15,274,841,152      stalled-cycles-frontend   #  155.01% frontend cycles 
> idle   
>    <not supported>      stalled-cycles-backend   
>      9,634,486,137      instructions              #    0.98  insn per cycle   
>       
>                                                   #    1.59  stalled cycles 
> per insn
>      1,818,488,088      branches                  #  151.667 M/sec            
>       
>         46,365,120      branch-misses             #    2.55% of all branches  
>       
> 
>        1.000741599 seconds time elapsed
> 
> After:
> 
>  triton:~/tip> perf stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>       11989.280397      task-clock (msec)         #   11.981 CPUs utilized    
>       
>               1299      context-switches          #    0.108 K/sec            
>       
>                  6      cpu-migrations            #    0.001 K/sec            
>       
>                 70      page-faults               #    0.006 K/sec            
>       
>          127008602      cycles                    #    0.011 GHz              
>       
>          279538533      stalled-cycles-frontend   #  220.09% frontend cycles 
> idle   
>          119213269      instructions              #    0.94  insn per cycle   
>       
>                                                   #    2.34  stalled cycles 
> per insn
>           24166678      branches                  #    2.016 M/sec            
>       
>             505681      branch-misses             #    2.09% of all branches  
>       
> 
>        1.000684278 seconds time elapsed
> 
> 
> ... see how the numbers became human-unreadable, losing the big-number 
> separator?
> 
> I suspect it's due to the following commit:
> 
>   fa184776ac27 perf stat: Check existence of frontend/backed stalled cycles

yea, it used the pmu parsing which screwes locales,
following patch fixed that for me..

jirka


---
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ce61f79dbaae..d8cd038baed2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -124,6 +124,17 @@ static int perf_pmu__parse_scale(struct perf_pmu_alias 
*alias, char *dir, char *
        lc = setlocale(LC_NUMERIC, NULL);
 
        /*
+        * The lc string may be allocated in static storage,
+        * so get a dynamic copy to make it survive setlocale
+        * call below.
+        */
+       lc = strdup(lc);
+       if (!lc) {
+               ret = -ENOMEM;
+               goto error;
+       }
+
+       /*
         * force to C locale to ensure kernel
         * scale string is converted correctly.
         * kernel uses default C locale.
@@ -135,6 +146,8 @@ static int perf_pmu__parse_scale(struct perf_pmu_alias 
*alias, char *dir, char *
        /* restore locale */
        setlocale(LC_NUMERIC, lc);
 
+       free((char *) lc);
+
        ret = 0;
 error:
        close(fd);

Reply via email to