Re: consistency for statistics with XDP mode

2018-12-03 Thread Arnaldo Carvalho de Melo
On Mon, Dec 03, 2018 at 11:30:01AM -0800, David Miller wrote:
> From: David Ahern 
> Date: Mon, 3 Dec 2018 08:45:12 -0700
> 
> > On 12/1/18 4:22 AM, Jesper Dangaard Brouer wrote:
> >> IMHO XDP_DROP should not be accounted as netdev stats drops, this is a
> >> user installed program like tc/iptables, that can also choose to drop
> >> packets.
> > 
> > sure and both tc and iptables have counters that can see the dropped
> > packets. A counter in the driver level stats ("xdp_drop" is fine with
> > me).
> 
> Part of the problem I have with this kind of logic is we take the choice
> away from the XDP program.
> 
> If I feel that the xdp_drop counter bump is too much overhead during a
> DDoS attack and I want to avoid it, you don't give me a choice in the
> matter.
> 
> If I want to represent the statistics for that event differently, you
> also give me no choice about it.
> 
> Really, if XDP_DROP is returned, zero resources should be devoted to
> the frame past that point.
> 
> I know you want to live in this magical world where XDP stuff behaves
> like the existing stack and gives you all of the visibility to events
> and objects.
> 
> But that is your choice.
> 
> Please give others the choice to not live in that world and allow XDP
> programs to live in their own entirely different environment, with
> custom statistics and complete control over how counters are
> incremented and how objects are used and represented, if they choose
> to do so.
> 
> XDP is about choice.

Coming out of the blue...: the presence of a "struct xdp_stats" in the
.BTF section of the XDP program's BPF object file is something one could
query and then parse to figure out what stats, if any, are provided.

/me goes back to tweaking his btf_loader in pahole... :-)

- Arnaldo
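
As a rough illustration of the idea above, an XDP program could keep its own counters in a structure of its choosing. The sketch below is plain host-side C, not BPF code: only the name "struct xdp_stats" comes from the message, while the fields, the verdict enum and count_frame() are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the struct name appears in the thread,
 * the fields are assumptions for this sketch. */
struct xdp_stats {
	uint64_t rx_packets;
	uint64_t rx_dropped;
};

/* Stand-in verdicts for the sketch, not the kernel's enum xdp_action. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

/* Account for one frame; the program itself decides what gets counted. */
static enum verdict count_frame(struct xdp_stats *stats, int drop)
{
	assert(stats != NULL);
	stats->rx_packets++;
	if (drop) {
		stats->rx_dropped++;
		return VERDICT_DROP;
	}
	return VERDICT_PASS;
}
```

A tool that found such a type description in the object file would know which counters exist; a program that wants zero accounting overhead would simply not define the struct.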


Re: Help with the BPF verifier

2018-11-05 Thread Arnaldo Carvalho de Melo
On Fri, Nov 02, 2018 at 09:27:52PM +, Yonghong Song wrote:
> 
> 
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, didn't work as well:
> > 
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> >  perf or libbpf (I think the latter).  It's annoying that essentially no tools
> >  plumb through an option for that, someone should fix them ;-)
> > 
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >>   R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 11: r3 = *(u64 *)(r1 +16)
> >  because it wants to have lots of reads share a single insn.  You may be able
> >  to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> >  with llvm knowledge can figure out how to stop it (ideally, llvm would know
> >  when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
> 
> The optimization mostly looks like the one below:
> br1:
>   ...
>   r1 += 16
>   goto merge
> br2:
>   ...
>   r1 += 20
>   goto merge
> merge:
>   *(u64 *)(r1 + 0)
> 
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations. The easiest way is to add barriers
> __asm__ __volatile__("": : :"memory")
> after the ctx memory access to prevent their downstream merging.
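
The barrier placement described above can be sketched in plain C as follows. This is a host-side illustration only: there is no verifier on the host, so it shows where the barriers go, not that they change the generated BPF code. The barrier() macro name and pick_filename() are names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for the tracepoint context used in this thread. */
struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	unsigned long      args[6];
};

#define SYS_OPEN   2
#define SYS_OPENAT 257

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move or merge memory accesses across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static const void *pick_filename(const struct syscall_enter_args *ctx)
{
	const void *filename_arg = NULL;

	assert(ctx != NULL);
	switch (ctx->syscall_nr) {
	case SYS_OPEN:
		filename_arg = (const void *)ctx->args[0];
		barrier();	/* keep this ctx load inside its own branch */
		break;
	case SYS_OPENAT:
		filename_arg = (const void *)ctx->args[1];
		barrier();	/* same for the second branch */
		break;
	}
	return filename_arg;
}
```

Each branch then reads the context at its own constant offset, which is the shape the verifier accepts, instead of sharing one load through an adjusted pointer.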

Great, this made it work:

cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#include 
#include 

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	unsigned long      args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	long               ret;
};

struct augmented_filename {
	unsigned int size;
	int          reserved;
	char         value[256];
};

#define SYS_OPEN 2
#define SYS_OPENAT 257

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:   filename_arg = (const void *)args->args[0];
	                 __asm__ __volatile__("": : :"memory");
	                 break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
	                 break;
	default:
	                 return 0;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
		                                              sizeof(augmented_args.filename.value),
		                                              filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);

Re: Help with the BPF verifier

2018-11-03 Thread Arnaldo Carvalho de Melo
On Sat, Nov 03, 2018 at 08:29:34AM -0300, Arnaldo Carvalho de Melo wrote:
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
> clang-7: note: diagnostic msg: 
> 
> 
> ERROR:unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
> Hint: Check error message shown above.
> Hint: You can also pre-compile it into .o using:
>   clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
>   with proper -I and -D options.
> event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
>  \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
> 
> (add -v to see detail)
> Run 'perf list' for a list of valid events
> 
>  Usage: perf trace [] []
> or: perf trace [] --  []
> or: perf trace record [] []
> or: perf trace record [] --  []
> 
> -e, --event   event/syscall selector. use 'perf list' to list available events
> [root@seventh perf]# 
> 
> Trying with -O1...

-O1 doesn't get clang confused, it's just the verifier that doesn't like
the result, i.e. we're back to that optimization, which isn't disabled
with -O1:

llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc 
-isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value 
-Wno-pointer-sign -working-directory 
/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c 
/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target 
bpf  -O1 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 8000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fe006000-fe007000
Added extra kernel map __entry_SYSCALL_64_trampoline fe032000-fe033000
Added extra kernel map __entry_SYSCALL_64_trampoline fe05e000-fe05f000
Added extra kernel map __entry_SYSCALL_64_trampoline fe08a000-fe08b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'to

Re: Help with the BPF verifier

2018-11-03 Thread Arnaldo Carvalho de Melo
On Fri, Nov 02, 2018 at 03:42:49PM +, Edward Cree wrote:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> > Yeah, didn't work as well: 
> 
> > And the -vv in 'perf trace' didn't seem to map to further details in the
> > output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
>  perf or libbpf (I think the latter).  It's annoying that essentially no tools
>  plumb through an option for that, someone should fix them ;-)
> 
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf: 
> > 0: (bf) r6 = r1
> > 1: (bf) r1 = r10
> > 2: (07) r1 += -328
> > 3: (b7) r7 = 64
> > 4: (b7) r2 = 64
> > 5: (bf) r3 = r6
> > 6: (85) call bpf_probe_read#4
> > 7: (79) r1 = *(u64 *)(r10 -320)
> > 8: (15) if r1 == 0x101 goto pc+4
> >  R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 9: (55) if r1 != 0x2 goto pc+22
> >  R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 10: (bf) r1 = r6
> > 11: (07) r1 += 16
> > 12: (05) goto pc+2
> > 15: (79) r3 = *(u64 *)(r1 +0)
> > dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
>  because it wants to have lots of reads share a single insn.  You may be able
>  to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
>  with llvm knowledge can figure out how to stop it (ideally, llvm would know
>  when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: 
CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign 
-working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM 
-O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc 
-isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value 
-Wno-pointer-sign -working-directory 
/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c 
/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target 
bpf  -O2 -o -

So it is using -O2, let's try with -O0...

So I added this to my ~/.perfconfig, i.e. the default clang command line
template replacing -O2 with -O0.

[root@seventh perf]# cat ~/.perfconfig 
[llvm]
clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign 
-working-directory $WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf 
$CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE"
# dump-obj = true
[root@seventh perf]# 

And got an explosion:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXE

Re: Help with the BPF verifier

2018-11-02 Thread Arnaldo Carvalho de Melo
On Thu, Nov 01, 2018 at 08:05:07PM +, Edward Cree wrote:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> >  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
>  you would go about plumbing that through the perf tooling.)  It seems very
>  odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
>  during the shifts or whether the addition in insn 21 somehow produces the
>  unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
>  then also having an offset in the LDX produced an error about
>  ptr+const+const, but that seems to have been fixed at some point.)
> 
> Note however that even if we get past this, R1 at this point holds 6, so it
>  looks like the verifier is walking the impossible path where we're inside the
>  'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
>  limitation, that it walks paths with impossible combinations of constraints
>  (we've previously had cases where assertions in the verifier would blow up
>  because of this, e.g. registers with max_val less than min_val).  So if the
>  check_ctx_access() is going to worry about whether you're off the end of the
>  array (I'm not sure what your program type is and thus which is_valid_access
>  callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
>  problem, because then it would have a totally unknown value, that on entering
>  the 'if' becomes "unknown but < 6", which is still too variable to have as
>  the offset of a ctx access.  Those have to be at a known constant offset, so
>  that we can determine the type of the returned value.
> 
> As a way to fix this, how about [UNTESTED!]:
>     const void *filename_arg = NULL;
>     /* ... */
>     switch (augmented_args.args.syscall_nr) {
>         case SYS_OPEN: filename_arg = args->args[0]; break;
>         case SYS_OPENAT: filename_arg = args->args[1]; break;
>     }
>     /* ... */
>     if (filename_arg) {
>         /* stuff */
>         blah = probe_read_str(/* ... */, filename_arg);
>     } else {
>         /* the other stuff */
>     }
> That way, you're only ever dealing in constant pointers (although judging by
>  an old thread I found[1] about ptr+const+const, the compiler might decide to
>  make some optimisations that end up looking like your existing code).

Yeah, didn't work as well:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:   filename_arg = (const void *)args->args[0]; break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
		                                              sizeof(augmented_args.filename.value),
		                                              filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}
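
The length trimming done at the end of sys_enter() can be checked with a small host-side calculation. The sketch below uses fixed-width types so the sizes match a 64-bit build (64 + 264 = 328 bytes, the same 328 seen as "r1 += -328" in the verifier log); payload_len() and VALUE_SIZE are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VALUE_SIZE 256

struct syscall_enter_args {
	uint64_t common_tp_fields;
	int64_t  syscall_nr;
	uint64_t args[6];
};					/* 64 bytes */

struct augmented_filename {
	unsigned int size;
	int          reserved;
	char         value[VALUE_SIZE];
};					/* 264 bytes */

struct augmented_args {
	struct syscall_enter_args args;
	struct augmented_filename filename;
};					/* 328 bytes */

/* Same arithmetic as sys_enter(): cut the unused tail of value[] from the
 * payload, then mask the result with VALUE_SIZE - 1. */
static unsigned int payload_len(unsigned int str_size)
{
	unsigned int len = sizeof(struct augmented_args);

	assert(sizeof(struct augmented_args) == 328);
	if (str_size < VALUE_SIZE) {
		len -= VALUE_SIZE - str_size;
		len &= VALUE_SIZE - 1;
	}
	return len;
}
```

For a 31-byte string (30 characters plus the terminating NUL) this gives 328 - 225 = 103 bytes. Note that the mask keeps only the low 8 bits, so with this arithmetic any trimmed length of 256 or more wraps around; a 200-byte string yields 272 & 255 = 16.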

And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch

Re: [PATCH bpf 1/4] bpf: fix partial copy of map_ptr when dst is scalar

2018-11-01 Thread Arnaldo Carvalho de Melo
On Thu, Nov 01, 2018 at 07:17:29PM +, Edward Cree wrote:
> On 31/10/18 23:05, Daniel Borkmann wrote:
> > ALU operations on pointers such as scalar_reg += map_value_ptr are
> > handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr
> > and range in the register state share a union, so transferring state
> > through dst_reg->range = ptr_reg->range is just buggy as any new
> > map_ptr in the dst_reg is then truncated (or null) for subsequent
> > checks. Fix this by adding a raw member and use it for copying state
> > over to dst_reg.
> >
> > Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> > Signed-off-by: Daniel Borkmann 
> > Cc: Edward Cree 
> > Acked-by: Alexei Starovoitov 
> > ---
> Acked-by: Edward Cree 
> (though I apparently missed the 63-minute window to hit the git record...)

Those guys are fast! :-)

- Arnaldo


Re: Help with the BPF verifier

2018-11-01 Thread Arnaldo Carvalho de Melo
On Thu, Nov 01, 2018 at 12:10:39PM -0700, David Miller wrote:
> From: Arnaldo Carvalho de Melo 
> Date: Thu, 1 Nov 2018 15:52:17 -0300
> 
> > 50  unsigned int filename_arg = 6;
>  ...
> > --- /wb/augmented_raw_syscalls.c.old 2018-11-01 15:43:55.000394234 -0300
> > +++ /wb/augmented_raw_syscalls.c 2018-11-01 15:44:15.102367838 -0300
> > @@ -67,7 +67,7 @@
> >  		augmented_args.filename.reserved = 0;
> >  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> >  							      sizeof(augmented_args.filename.value),
> > -							      (const void *)args->args[0]);
> > +							      (const void *)args->args[filename_arg]);
> 
> args[] is sized to '6', therefore the last valid index is '5', yet you're
> using '6' here which is one entry past the end of the declared array.

Nope... this is inside an if:

	if (filename_arg <= 5) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
		                                              sizeof(augmented_args.filename.value),
		                                              (const void *)args->args[filename_arg]);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {

I use 6 to mean "hey, this syscall doesn't have any string argument, don't
bother with it".

- Arnaldo
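
The sentinel convention described above, where index 6 means "no string argument" and the array is only touched under the "<= 5" guard, can be sketched in host-side C like this. NO_STRING_ARG and string_arg() are hypothetical names for the illustration, not part of the perf example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SYSCALL_ARGS 6
/* Sentinel: one past the last valid index, never used to index args[]. */
#define NO_STRING_ARG   NR_SYSCALL_ARGS

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	unsigned long      args[NR_SYSCALL_ARGS];
};

/* Returns the selected argument, or 0 when the sentinel is passed. */
static unsigned long string_arg(const struct syscall_enter_args *ctx,
				unsigned int filename_arg)
{
	assert(ctx != NULL);
	if (filename_arg <= NR_SYSCALL_ARGS - 1)	/* same test as "filename_arg <= 5" */
		return ctx->args[filename_arg];
	return 0;
}
```

In ordinary C the guard is enough. The verifier, as discussed elsewhere in this thread, may still walk the guarded path with the value 6 and additionally wants a constant offset for each ctx access.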


Help with the BPF verifier

2018-11-01 Thread Arnaldo Carvalho de Melo
tl;dr: I'm trying to get past clang optimizations so that the
   verifier accepts my proggie.

Hi,

So I'm moving to use raw_syscalls:sys_exit to collect pointer
contents, using maps to tell the bpf program what to copy, how many
bytes, filters, etc.

I'm at the start of it. At this point I need to use an index to
get to the right syscall arg that is a filename, starting with just
"open" and "openat", which have the filename in different args. To get
this first part working I'm doing it directly in the bpf restricted C
program; later this will move to maps, etc. If I set the index as a
constant, just for testing, it works. Look at the "open" and "openat"
calls below; later we'll see why openat is failing to augment its
"filename" arg while "open" works:

[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
 ? ( ): sleep/10152  ... [continued]: execve()) = 0
 0.045 ( 0.004 ms): sleep/10152 brk() = 0x55ccff356000
 0.074 ( 0.007 ms): sleep/10152 access(filename: , mode: R) = -1 ENOENT No such file or directory
 0.089 ( 0.006 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
 0.097 ( 0.003 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd283f0) = 0
 0.103 ( 0.006 ms): sleep/10152 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ffee9c000
 0.111 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.135 ( 0.007 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
 0.144 ( 0.003 ms): sleep/10152 read(fd: 3, buf: 0x7ffecdd285b8, count: 832) = 832
 0.150 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd28450) = 0
 0.155 ( 0.005 ms): sleep/10152 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f8ffee9a000
 0.166 ( 0.007 ms): sleep/10152 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f8ffe8dc000
 0.175 ( 0.010 ms): sleep/10152 mprotect(start: 0x7f8ffea89000, len: 2093056) = 0
 0.188 ( 0.010 ms): sleep/10152 mmap(addr: 0x7f8ffec88000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f8ffec88000
 0.204 ( 0.005 ms): sleep/10152 mmap(addr: 0x7f8ffec8e000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f8ffec8e000
 0.218 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.239 ( 0.002 ms): sleep/10152 arch_prctl(option: 4098, arg2: 140256433779968) = 0
 0.312 ( 0.009 ms): sleep/10152 mprotect(start: 0x7f8ffec88000, len: 16384, prot: READ) = 0
 0.343 ( 0.005 ms): sleep/10152 mprotect(start: 0x55ccff1c6000, len: 4096, prot: READ) = 0
 0.354 ( 0.006 ms): sleep/10152 mprotect(start: 0x7f8ffeeb6000, len: 4096, prot: READ) = 0
 0.362 ( 0.019 ms): sleep/10152 munmap(addr: 0x7f8ffee9c000, len: 103334) = 0
 0.476 ( 0.002 ms): sleep/10152 brk() = 0x55ccff356000
 0.480 ( 0.004 ms): sleep/10152 brk(brk: 0x55ccff377000) = 0x55ccff377000
 0.487 ( 0.002 ms): sleep/10152 brk() = 0x55ccff377000
 0.497 ( 0.008 ms): sleep/10152 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
 0.507 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7f8ffec8daa0) = 0
 0.511 ( 0.006 ms): sleep/10152 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ff7d0d000
 0.524 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
 0.574 (1000.140 ms): sleep/10152 nanosleep(rqtp: 0x7ffecdd29130) = 0
  1000.753 ( 0.007 ms): sleep/10152 close(fd: 1) = 0
  1000.767 ( 0.004 ms): sleep/10152 close(fd: 2) = 0
  1000.781 ( ): sleep/10152 exit_group()
[root@seventh perf]# 

// SPDX-License-Identifier: GPL-2.0
/*
 * Augment the raw_syscalls tracepoints with the contents of the pointer arguments.
 *
 * Test it with:
 *
 * perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c cat /etc/passwd > /dev/null
 *
 * This exactly matches what is marshalled into the raw_syscall:sys_enter
 * payload expected by the 'perf trace' beautifiers.
 *
 * For now it just uses the existing tracepoint augmentation code in 'perf
 * trace', in the next csets we'll hook up these with the sys_enter/sys_exit
 * code that will combine entry/exit in a strace like way.
 */

#include 
#include 

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	unsigned long      args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;

Re: [PATCH bpf] libbpf: Fix compile error in libbpf_attach_type_by_name

2018-10-31 Thread Arnaldo Carvalho de Melo
On Wed, Oct 31, 2018 at 12:57:18PM -0700, Andrey Ignatov wrote:
> Arnaldo Carvalho de Melo reported build error in libbpf when clang
> version 3.8.1-24 (tags/RELEASE_381/final) is used:
> 
> libbpf.c:2201:36: error: comparison of constant -22 with expression of
> type 'const enum bpf_attach_type' is always false
> [-Werror,-Wtautological-constant-out-of-range-compare]
> if (section_names[i].attach_type == -EINVAL)
>  ^  ~~~
> 1 error generated.
> 
> Fix the error by keeping "is_attachable" property of a program in a
> separate struct field instead of trying to use attach_type itself.

Thanks, now it builds in all the previously failing systems:

# export PERF_TARBALL=http://192.168.86.4/perf/perf-4.19.0.tar.xz
# dm debian:9 fedora:25 fedora:26 fedora:27 ubuntu:16.04 ubuntu:17.10
   1 debian:9    : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516  clang version 3.8.1-24 (tags/RELEASE_381/final)
   2 fedora:25   : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)  clang version 3.9.1 (tags/RELEASE_391/final)
   3 fedora:26   : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)  clang version 4.0.1 (tags/RELEASE_401/final)
   4 fedora:27   : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)  clang version 5.0.2 (tags/RELEASE_502/final)
   5 ubuntu:16.04: Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609  clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
   6 ubuntu:17.10: Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0  clang version 4.0.1-6 (tags/RELEASE_401/final)
#

Tested-by: Arnaldo Carvalho de Melo 

I also have it tentatively applied to my perf/urgent branch, that I'll
push upstream soon.

- Arnaldo
 
> Fixes: commit 956b620fcf0b ("libbpf: Introduce libbpf_attach_type_by_name")
> Reported-by: Arnaldo Carvalho de Melo 
> Signed-off-by: Andrey Ignatov 
> ---
>  tools/lib/bpf/libbpf.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index b607be7236d3..d6e62e90e8d4 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2084,19 +2084,19 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog,
>   prog->expected_attach_type = type;
>  }
>  
> -#define BPF_PROG_SEC_IMPL(string, ptype, eatype, atype) \
> - { string, sizeof(string) - 1, ptype, eatype, atype }
> +#define BPF_PROG_SEC_IMPL(string, ptype, eatype, is_attachable, atype) \
> + { string, sizeof(string) - 1, ptype, eatype, is_attachable, atype }
>  
>  /* Programs that can NOT be attached. */
> -#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, -EINVAL)
> +#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, 0, 0)
>  
>  /* Programs that can be attached. */
>  #define BPF_APROG_SEC(string, ptype, atype) \
> - BPF_PROG_SEC_IMPL(string, ptype, 0, atype)
> + BPF_PROG_SEC_IMPL(string, ptype, 0, 1, atype)
>  
>  /* Programs that must specify expected attach type at load time. */
>  #define BPF_EAPROG_SEC(string, ptype, eatype) \
> - BPF_PROG_SEC_IMPL(string, ptype, eatype, eatype)
> + BPF_PROG_SEC_IMPL(string, ptype, eatype, 1, eatype)
>  
>  /* Programs that can be attached but attach type can't be identified by section
>   * name. Kept for backward compatibility.
> @@ -2108,6 +2108,7 @@ static const struct {
>   size_t len;
>   enum bpf_prog_type prog_type;
>   enum bpf_attach_type expected_attach_type;
> + int is_attachable;
>   enum bpf_attach_type attach_type;
>  } section_names[] = {
>   BPF_PROG_SEC("socket",  BPF_PROG_TYPE_SOCKET_FILTER),
> @@ -2198,7 +2199,7 @@ int libbpf_attach_type_by_name(const char *name,
>   for (i = 0; i < ARRAY_SIZE(section_names); i++) {
>   if (strncmp(name, section_names[i].sec, section_names[i].len))
>   continue;
> - if (section_names[i].attach_type == -EINVAL)
> + if (!section_names[i].is_attachable)
>   return -EINVAL;
>   *attach_type = section_names[i].attach_type;
>   return 0;
> -- 
> 2.17.1
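
The approach taken by the patch, a separate is_attachable flag instead of storing -EINVAL in an enum field, can be illustrated with a reduced stand-alone lookup table. The enum values, the "demo/b" section and the function name below are stand-ins invented for this sketch, not libbpf's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in enum with made-up values; the real one is enum bpf_attach_type. */
enum attach_type { ATTACH_A, ATTACH_B };

static const struct {
	const char *sec;
	size_t len;
	int is_attachable;		/* separate flag, no magic enum value */
	enum attach_type attach_type;
} section_names[] = {
	{ "socket", sizeof("socket") - 1, 0, ATTACH_A },	/* cannot be attached */
	{ "demo/b", sizeof("demo/b") - 1, 1, ATTACH_B },	/* hypothetical section */
};

static int attach_type_by_name(const char *name, enum attach_type *attach_type)
{
	size_t i;

	assert(name != NULL && attach_type != NULL);
	for (i = 0; i < sizeof(section_names) / sizeof(section_names[0]); i++) {
		if (strncmp(name, section_names[i].sec, section_names[i].len))
			continue;
		if (!section_names[i].is_attachable)
			return -EINVAL;
		*attach_type = section_names[i].attach_type;
		return 0;
	}
	return -EINVAL;
}
```

Because the enum field never has to hold a negative constant, a compiler that picks an unsigned underlying type for the enum has nothing to warn about under -Wtautological-constant-out-of-range-compare.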


libbpf build failure on debian:9 with clang

2018-10-31 Thread Arnaldo Carvalho de Melo
  17    40.66 debian:9                      : FAIL gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

The failure was with clang tho:

clang version 3.8.1-24 (tags/RELEASE_381/final)

With:

  gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) 

it built without any warnings/errors.

  CC   /tmp/build/perf/libbpf.o
libbpf.c:2201:36: error: comparison of constant -22 with expression of type 'const enum bpf_attach_type' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (section_names[i].attach_type == -EINVAL)
 ^  ~~~
1 error generated.
  CC   /tmp/build/perf/help.o
mv: cannot stat '/tmp/build/perf/.libbpf.o.tmp': No such file or directory
/git/linux/tools/build/Makefile.build:96: recipe for target '/tmp/build/perf/libbpf.o' failed
make[4]: *** [/tmp/build/perf/libbpf.o] Error 1

This is the cset:

commit 956b620fcf0b64de403cd26a56bc41e6e4826ea6
Author: Andrey Ignatov 
Date:   Wed Sep 26 15:24:53 2018 -0700

libbpf: Introduce libbpf_attach_type_by_name



Tests are continuing, so far:

   1    43.53 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2    58.62 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3    51.62 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4    51.68 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0
   5    49.38 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0
   6    79.07 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   7    63.35 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
   8    59.65 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
   9    47.39 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  10    50.64 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  11    28.75 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  12    33.26 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  13    43.16 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
  14    73.61 clearlinux:latest             : FAIL gcc (Clear Linux OS for Intel Architecture) 8.2.1 20180502
  15    45.56 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  16    45.53 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u1) 4.9.2
  17    40.66 debian:9                      : FAIL gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
  18   113.19 debian:experimental           : Ok   gcc (Debian 8.2.0-8) 8.2.0
  19    41.48 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
  20    41.51 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
  21    40.09 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 8.1.0-12) 8.1.0
  22    42.17 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
  23    40.02 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  24    45.47 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  25    41.64 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  26    43.60 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  27    44.04 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  28    37.21 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710

The problem with clearlinux is unrelated:

clang-7: error: unknown argument: '-fno-semantic-interposition'
clang-7: error: unsupported argument '4' to option 'flto='
clang-7: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]



Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-10-17 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 17, 2018 at 07:08:37PM +, Alexei Starovoitov escreveu:
> On 10/17/18 11:53 AM, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 17, 2018 at 04:36:08PM +, Alexei Starovoitov escreveu:
> >> On 10/17/18 8:09 AM, David Ahern wrote:
> >>> On 10/16/18 11:43 PM, Song Liu wrote:
> > >>>> I agree that processing events while recording has significant overhead.
> > >>>> In this case, perf user space needs to know details about the jited BPF
> > >>>> program. It is impossible to pass all these details to user space through
> > >>>> the relatively stable ring_buffer API. Therefore, some processing of the
> > >>>> data is necessary (get bpf prog_id from ring buffer, and then fetch program
> > >>>> details via BPF_OBJ_GET_INFO_BY_FD).
> > >>>>
> > >>>> I have some idea on processing important data with relatively low overhead.
> > >>>> Let me try to implement it.
> >>>>
> >>>
> >>> As I understand it, you want this series:
> >>>
> >>>  kernel: add event to perf buffer on bpf prog load
> >>>
> >>>  userspace: perf reads the event and grabs information about the program
> >>> from the fd
> >>>
> >>> Is that correct?
> >>>
> > >>> Userspace is not awakened immediately when an event is added to the
> >>> ring. It is awakened once the number of events crosses a watermark. That
> >>> means there is an unknown - and potentially long - time window where the
> >>> program can be unloaded before perf reads the event.
> >
> >>> So no matter what you do expecting perf record to be able to process the
> >>> event quickly is an unreasonable expectation.
> >
> >> yes... unless we go with threaded model as Arnaldo suggested and use
> >> single event as a watermark to wakeup our perf thread.
> >> In such case there is still a race window between user space waking up
> >> and doing _single_ bpf_get_fd_from_id() call to hold that prog
> >> and some other process trying to instantly unload the prog it
> >> just loaded.
> >> I think such race window is extremely tiny and if perf misses
> >> those load/unload events it's a good thing, since there is no chance
> >> that normal pmu event samples would be happening during prog execution.

> >> The alternative approach with no race window at all is to burden kernel
> >> RECORD_* events with _all_ information about bpf prog. Which is jited
> >> addresses, jited image itself, info about all subprogs, info about line
> >> info, all BTF data, etc. As I said earlier I'm strongly against such
> >> RECORD_* bloating.
> >> Instead we need to find a way to process new RECORD_BPF events with
> >> single prog_id field in perf user space with minimal race
> >> and threaded approach sounds like a win to me.

> > There is another alternative, I think: put just a content based hash,
> > like a git commit id into a PERF_RECORD_MMAP3 new record, and when the
> > validator does the jit, etc, it stashes the content that
> > BPF_OBJ_GET_INFO_BY_FD would get somewhere, some filesystem populated by
> > the kernel right after getting stuff from sys_bpf() and preparing it for
> > use, then we know that in (start, end) we have blob foo with content id,
> > that we will use to retrieve information that augments what we know with
> > just (start, end, id) and allows annotation, etc.

> > That stash space for jitted stuff gets garbage collected from time to
> > time or is even completely disabled if the user is not interested in
> > such augmentation, just like one can do disabling perf's ~/.debug/
> > directory hashed by build-id.

> > I think with this we have no races, the PERF_RECORD_MMAP3 gets just what
> > is in PERF_RECORD_MMAP2 plus some extra 20 bytes for such content based
> > cookie and we solve the other race we already have with kernel modules,
> > DSOs, etc.

> > I have mentioned this before, there were objections, perhaps this time I
> > formulated in a different way that makes it more interesting?
 
> that 'content based hash' we already have. It's called program tag.

But that was calculated by whom? Userspace? It can't do that, it's the
kernel that ultimately puts together, from what userspace gave it, what
we need to do performance analysis, line numbers, etc.

> and we already taught iovisor/bcc to stash that stuff into
> /var/tmp/bcc/bpf_prog_TAG/ directory.
> Unfortunately that appr

Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-10-17 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 17, 2018 at 04:36:08PM +, Alexei Starovoitov escreveu:
> On 10/17/18 8:09 AM, David Ahern wrote:
> > On 10/16/18 11:43 PM, Song Liu wrote:
> >> I agree that processing events while recording has significant overhead.
> >> In this case, perf user space needs to know details about the jited BPF
> >> program. It is impossible to pass all these details to user space through
> >> the relatively stable ring_buffer API. Therefore, some processing of the
> >> data is necessary (get bpf prog_id from ring buffer, and then fetch program
> >> details via BPF_OBJ_GET_INFO_BY_FD).
> >>
> >> I have some idea on processing important data with relatively low overhead.
> >> Let me try implement it.
> >>
> >
> > As I understand it, you want this series:
> >
> >  kernel: add event to perf buffer on bpf prog load
> >
> >  userspace: perf reads the event and grabs information about the program
> > from the fd
> >
> > Is that correct?
> >
> > Userspace is not awakened immediately when an event is added to the
> > ring. It is awakened once the number of events crosses a watermark. That
> > means there is an unknown - and potentially long - time window where the
> > program can be unloaded before perf reads the event.

> > So no matter what you do expecting perf record to be able to process the
> > event quickly is an unreasonable expectation.
 
> yes... unless we go with threaded model as Arnaldo suggested and use
> single event as a watermark to wakeup our perf thread.
> In such case there is still a race window between user space waking up
> and doing _single_ bpf_get_fd_from_id() call to hold that prog
> and some other process trying to instantly unload the prog it
> just loaded.
> I think such race window is extremely tiny and if perf misses
> those load/unload events it's a good thing, since there is no chance
> that normal pmu event samples would be happening during prog execution.
 
> The alternative approach with no race window at all is to burden kernel
> RECORD_* events with _all_ information about bpf prog. Which is jited
> addresses, jited image itself, info about all subprogs, info about line
> info, all BTF data, etc. As I said earlier I'm strongly against such
> RECORD_* bloating.
> Instead we need to find a way to process new RECORD_BPF events with
> single prog_id field in perf user space with minimal race
> and threaded approach sounds like a win to me.

There is another alternative, I think: put just a content based hash,
like a git commit id into a PERF_RECORD_MMAP3 new record, and when the
validator does the jit, etc, it stashes the content that
BPF_OBJ_GET_INFO_BY_FD would get somewhere, some filesystem populated by
the kernel right after getting stuff from sys_bpf() and preparing it for
use, then we know that in (start, end) we have blob foo with content id,
that we will use to retrieve information that augments what we know with
just (start, end, id) and allows annotation, etc.

That stash space for jitted stuff gets garbage collected from time to
time or is even completely disabled if the user is not interested in
such augmentation, just like one can do disabling perf's ~/.debug/
directory hashed by build-id.

I think with this we have no races, the PERF_RECORD_MMAP3 gets just what
is in PERF_RECORD_MMAP2 plus some extra 20 bytes for such content based
cookie and we solve the other race we already have with kernel modules,
DSOs, etc.
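
To make the (start, end, id) idea a bit more concrete, a rough sketch of what such a record and its stash lookup could look like. PERF_RECORD_MMAP3 does not exist; the struct layout, the `demo_` names and the stash root are all assumptions made for illustration, and the two-level directory split just mimics how the build-id cache under ~/.debug/ is keyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CONTENT_ID_LEN 20	/* the "extra 20 bytes": a SHA-1 sized cookie */

/* Hypothetical payload: what an address range record would carry. */
struct demo_mmap3_record {
	uint64_t start;
	uint64_t len;
	uint8_t content_id[DEMO_CONTENT_ID_LEN];
};

/*
 * Build "<root>/<first byte in hex>/<remaining 19 bytes in hex>" in buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int demo_stash_path(const char *root, const struct demo_mmap3_record *rec,
		    char *buf, size_t buflen)
{
	static const char hex[] = "0123456789abcdef";
	size_t root_len, need, pos, i;

	assert(root && rec && buf);
	root_len = strlen(root);
	/* root + '/' + 2 hex digits + '/' + 38 hex digits + NUL */
	need = root_len + 1 + 2 + 1 + 2 * (DEMO_CONTENT_ID_LEN - 1) + 1;
	if (buflen < need)
		return -1;

	memcpy(buf, root, root_len);
	pos = root_len;
	buf[pos++] = '/';
	for (i = 0; i < DEMO_CONTENT_ID_LEN; i++) {
		buf[pos++] = hex[rec->content_id[i] >> 4];
		buf[pos++] = hex[rec->content_id[i] & 0xf];
		if (i == 0)
			buf[pos++] = '/';
	}
	buf[pos] = '\0';
	return 0;
}
```

The point of keying by content is that the lookup no longer depends on the object still being loaded when the tool gets around to reading the record.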

I have mentioned this before, there were objections, perhaps this time I
formulated in a different way that makes it more interesting?

- Arnaldo


Re: [PATCH bpf-next 0/3] improve and fix barriers for walking perf rb

2018-10-17 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 17, 2018 at 04:41:53PM +0200, Daniel Borkmann escreveu:
> This set first adds smp_* barrier variants to tools infrastructure
> and in a second step updates perf and libbpf to make use of them.
> For details, please see individual patches, thanks!
> 
> Arnaldo, if there are no objections, could this be routed via bpf-next
> with Acked-by's due to later dependencies in libbpf? Alternatively,
> I could also get the 2nd patch out during merge window, but perhaps
> it's okay to do in one go as there shouldn't be much conflict in perf.

Right, when updating kernel/events/ring_buffer.c the corresponding
code in tools/ should've been changed :-)
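
For context, a small sketch of the ordering a ring buffer consumer needs. The `data_head` and `data_tail` fields are the real ones from struct perf_event_mmap_page; the `demo_` helpers are made up, and the compiler's `__atomic` builtins are used here as a stand-in for the smp_* macros this series adds, so treat it as an illustration of the pairing rather than what the patches do.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Read the producer position. The acquire load keeps the reads of the
 * record data from being reordered before the read of data_head.
 */
uint64_t demo_rb_read_head(struct perf_event_mmap_page *pg)
{
	assert(pg);
	return __atomic_load_n(&pg->data_head, __ATOMIC_ACQUIRE);
}

/*
 * Publish the consumer position. The release store makes sure we are done
 * reading the records before the kernel is told it may overwrite them.
 */
void demo_rb_write_tail(struct perf_event_mmap_page *pg, uint64_t tail)
{
	assert(pg);
	__atomic_store_n(&pg->data_tail, tail, __ATOMIC_RELEASE);
}

/* Consume everything currently available; returns the number of bytes. */
uint64_t demo_rb_consume_all(struct perf_event_mmap_page *pg)
{
	uint64_t head = demo_rb_read_head(pg);
	uint64_t tail = pg->data_tail;	/* only this side ever writes data_tail */
	uint64_t pending = head - tail;

	/* ... records in [tail, head) would be parsed here ... */

	demo_rb_write_tail(pg, head);
	return pending;
}
```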

Acked-by: Arnaldo Carvalho de Melo 

- Arnaldo
 
> Thanks!
> 
> Daniel Borkmann (3):
>   tools: add smp_* barrier variants to include infrastructure
>   tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
>   bpf, libbpf: use proper barriers in perf ring buffer walk
> 
>  tools/arch/arm64/include/asm/barrier.h | 10 ++
>  tools/arch/x86/include/asm/barrier.h   |  9 ++---
>  tools/include/asm/barrier.h| 11 +++
>  tools/lib/bpf/libbpf.c | 25 +++--
>  tools/perf/util/mmap.h |  5 +++--
>  5 files changed, 49 insertions(+), 11 deletions(-)
> 
> -- 
> 2.9.5


Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-10-17 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 17, 2018 at 09:11:40AM -0300, Arnaldo Carvalho de Melo escreveu:
> Adding Alexey, Jiri and Namhyung as they worked/are working on
> multithreading 'perf record'.
> 
> Em Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu escreveu:
> > On Tue, Oct 16, 2018 at 4:43 PM David Ahern  wrote:
> > > On 10/15/18 4:33 PM, Song Liu wrote:
> > > > > I am working with Alexei on the idea of fetching BPF program information
> > > > > via BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > > > to perf_event_type, and dumped these events to perf event ring buffer.
> 
> > > > I found that perf will not process event until the end of perf-record:
> 
> > > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > > .. 10 seconds later
> > > > [ perf record: Woken up 34 times to write data ]
> > > > machine__process_bpf_event: prog_id 6 loaded
> > > > machine__process_bpf_event: prog_id 6 unloaded
> > > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> 
> > > > In this example, the bpf program was loaded and then unloaded in
> > > > another terminal. When machine__process_bpf_event() processes
> > > > the load event, the bpf program is already unloaded. Therefore,
> > > > machine__process_bpf_event() will not be able to get information
> > > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.
> 
> > > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > > as soon as perf get the event from kernel. I looked around the perf
> > > > code for a while. But I haven't found a good example where some
> > > > events are processed before the end of perf-record. Could you
> > > > please help me with this?
> 
> > > perf record does not process events as they are generated. Its sole job
> > > is pushing data from the maps to a file as fast as possible meaning in
> > > bulk based on current read and write locations.
> 
> > > Adding code to process events will add significant overhead to the
> > > record command and will not really solve your race problem.
> 
> > > I agree that processing events while recording has significant overhead.
> > > In this case, perf user space needs to know details about the jited BPF
> > > program. It is impossible to pass all these details to user space through
> > > the relatively stable ring_buffer API. Therefore, some processing of the
> > > data is necessary (get bpf prog_id from ring buffer, and then fetch program
> > > details via BPF_OBJ_GET_INFO_BY_FD).
>  
> > I have some idea on processing important data with relatively low overhead.
> > Let me try implement it.
> 
> Well, you could have a separate thread processing just those kinds of
> events, associate it with a dummy event where you only ask for
> PERF_RECORD_BPF_EVENTs.
> 
> Here is how to setup the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
> perf_event_attr:
> 
> [root@seventh ~]# perf record -vv -e dummy sleep 01
> 
> perf_event_attr:
>   type                             1
>   size                             112
>   config                           0x9
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   disabled                         1
>   inherit                          1

These you would have disabled, no need for
PERF_RECORD_{MMAP*,COMM,FORK,EXIT} just PERF_RECORD_BPF_EVENT

>   mmap                             1
>   comm                             1
>   task                             1
>   mmap2                            1
>   comm_exec                        1


>   freq                             1
>   enable_on_exec                   1
>   sample_id_all                    1
>   exclude_guest                    1
> 
> sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
> mmap size 528384B
> perf event ring buffer mmapped per cpu
> Synthesizing TSC conversion information
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data ]
> [root@seventh ~]#
> 
> [root@seventh ~]# perf evlist -v
> dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> [root@seventh ~]# 
> 
> There is work ongoing in dumping one file per cpu and then, at post
> processing time merging all those files to get ordering, so one more
> file, for these VIP events, that require per-event processing would be
> ordered at that time with all the other per-cpu files.
> 
> - Arnaldo


Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-10-17 Thread Arnaldo Carvalho de Melo
Adding Alexey, Jiri and Namhyung as they worked/are working on
multithreading 'perf record'.

Em Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu escreveu:
> On Tue, Oct 16, 2018 at 4:43 PM David Ahern  wrote:
> > On 10/15/18 4:33 PM, Song Liu wrote:
> > > I am working with Alexei on the idea of fetching BPF program information
> > > via BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > to perf_event_type, and dumped these events to perf event ring buffer.

> > > I found that perf will not process event until the end of perf-record:

> > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > .. 10 seconds later
> > > [ perf record: Woken up 34 times to write data ]
> > > machine__process_bpf_event: prog_id 6 loaded
> > > machine__process_bpf_event: prog_id 6 unloaded
> > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]

> > > In this example, the bpf program was loaded and then unloaded in
> > > another terminal. When machine__process_bpf_event() processes
> > > the load event, the bpf program is already unloaded. Therefore,
> > > machine__process_bpf_event() will not be able to get information
> > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.

> > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > as soon as perf get the event from kernel. I looked around the perf
> > > code for a while. But I haven't found a good example where some
> > > events are processed before the end of perf-record. Could you
> > > please help me with this?

> > perf record does not process events as they are generated. Its sole job
> > is pushing data from the maps to a file as fast as possible meaning in
> > bulk based on current read and write locations.

> > Adding code to process events will add significant overhead to the
> > record command and will not really solve your race problem.

> I agree that processing events while recording has significant overhead.
> In this case, perf user space needs to know details about the jited BPF
> program. It is impossible to pass all these details to user space through
> the relatively stable ring_buffer API. Therefore, some processing of the
> data is necessary (get bpf prog_id from ring buffer, and then fetch program
> details via BPF_OBJ_GET_INFO_BY_FD).
 
> I have some idea on processing important data with relatively low overhead.
> Let me try implement it.

Well, you could have a separate thread processing just those kinds of
events, associate it with a dummy event where you only ask for
PERF_RECORD_BPF_EVENTs.

Here is how to setup the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
perf_event_attr:

[root@seventh ~]# perf record -vv -e dummy sleep 01

perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1

sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
mmap size 528384B
perf event ring buffer mmapped per cpu
Synthesizing TSC conversion information
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
[root@seventh ~]#

[root@seventh ~]# perf evlist -v
dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@seventh ~]# 

There is work ongoing in dumping one file per cpu and then, at post
processing time merging all those files to get ordering, so one more
file, for these VIP events, that require per-event processing would be
ordered at that time with all the other per-cpu files.
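
In code, a minimal sketch of the attr for such a dedicated side-band event could look like the following. The struct fields and the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY constants are the real ones from linux/perf_event.h; the helper name is made up, and since the bit that would request PERF_RECORD_BPF_EVENT is only a proposal in this thread, it shows up just as a comment. The watermark setting is my assumption for getting a wakeup on every record, as perf_event_open(2) documents that wakeup_events counts only samples.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill an attr for a dummy software event used only for side-band records. */
void demo_init_side_band_attr(struct perf_event_attr *attr)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));

	attr->type = PERF_TYPE_SOFTWARE;
	attr->size = sizeof(*attr);
	attr->config = PERF_COUNT_SW_DUMMY;	/* the 0x9 in the dump above */
	attr->sample_id_all = 1;		/* keep TID/TIME on non-sample records */

	/* Wake the reader as soon as any bytes land in the ring buffer. */
	attr->watermark = 1;
	attr->wakeup_watermark = 1;

	/*
	 * mmap, mmap2, comm, comm_exec and task stay at 0: this event is not
	 * meant to collect PERF_RECORD_{MMAP*,COMM,FORK,EXIT}. A hypothetical
	 * bit asking for PERF_RECORD_BPF_EVENT would be set here.
	 */
}
```

The attr would then go to perf_event_open() once per CPU, with a separate thread polling those fds.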

- Arnaldo


Re: [PATCH bpf-next] bpf: emit audit messages upon successful prog load and unload

2018-10-05 Thread Arnaldo Carvalho de Melo
Em Fri, Oct 05, 2018 at 11:44:35AM -0700, Alexei Starovoitov escreveu:
> On Fri, Oct 05, 2018 at 08:14:09AM +0200, Jiri Olsa wrote:
> > On Thu, Oct 04, 2018 at 03:10:15PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Oct 04, 2018 at 10:22:31PM +0200, Jesper Dangaard Brouer wrote:
> > > > My use-case is to 24/7 collect and keep records in userspace, and have a
> > > > timeline of these notifications, for later retrieval.  The idea is that
> > > > our support engineers can look at these records when troubleshooting
> > > > the system.  And the plan is also to collect these records as part of
> > > > our sosreport tool, which is part of the support case.

> > > I don't think you're implying that prog load/unload should be spamming dmesg
> > > and auditd not even running...

> > I think the problem Jesper implied is that in order to collect
> > those logs you'll need perf tool running all the time.. which
> > it's not equipped for yet

> I'm not proposing to run 'perf' binary all the time.

I think Jiri just said that one would have to run something all the time
to get all the records, see below

> Setting up perf ring buffer just for these new bpf prog load/unload events
> and epolling it is simple enough to do from any application including auditd.
> selftests/bpf/ do it for bpf output events.

I think he is talking about the preexisting loaded BPF programs. We have
the same problem with mmaps, where the perf tool will, with races,
enumerate the existing mmaps as PERF_RECORD_MMAP synthesized from
/proc/PIDS/smaps.

There was talk in the past to ask the kernel to emit PERF_RECORD_MMAP
into the ring buffer for those pre-existing entries, reducing a bit the
races, but as there doesn't seem to be a good way of doing it, we
continued with the synthesizing from procfs.

Is there a way for us to synthesize those prog load/unload for
preexisting loaded bpf objects?
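
One possible direction, sketched only to make the question concrete: walk the ids of the currently loaded programs with the BPF_PROG_GET_NEXT_ID command of bpf(2) and hand each one to a callback that would synthesize a record. The command and the start_id/next_id fields are real; every `demo_` name is made up, the callback here merely counts, and this walk is of course just as racy as the procfs one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

typedef int (*demo_next_id_fn)(uint32_t start_id, uint32_t *next_id);
typedef void (*demo_emit_fn)(uint32_t prog_id, void *ctx);

/* Ask the kernel for the first loaded prog id greater than start_id. */
int demo_bpf_prog_next_id(uint32_t start_id, uint32_t *next_id)
{
	union bpf_attr attr;

	assert(next_id);
	memset(&attr, 0, sizeof(attr));
	attr.start_id = start_id;
	if (syscall(__NR_bpf, BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)) < 0)
		return -errno;
	*next_id = attr.next_id;
	return 0;
}

/* Example callback: a real tool would synthesize a load record here. */
void demo_count_prog(uint32_t prog_id, void *ctx)
{
	(void)prog_id;
	(*(int *)ctx)++;
}

/*
 * Call emit() once per loaded program. Returns how many were seen, or a
 * negative errno (e.g. -EPERM without the needed privileges).
 */
int demo_synthesize_loaded_progs(demo_next_id_fn next, demo_emit_fn emit, void *ctx)
{
	uint32_t id = 0;
	int n = 0, err;

	assert(next && emit);
	while ((err = next(id, &id)) == 0) {
		emit(id, ctx);
		n++;
	}
	return err == -ENOENT ? n : err;	/* ENOENT just means end of list */
}
```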

- Arnaldo


Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-09-21 Thread Arnaldo Carvalho de Melo
Em Thu, Sep 20, 2018 at 08:14:46PM -0700, Alexei Starovoitov escreveu:
> On Thu, Sep 20, 2018 at 03:56:51PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 20, 2018 at 10:25:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > > PeterZ provided a patch introducing PERF_RECORD_MUNMAP, went nowhere due
> > > to having to cope with munmapping parts of existing mmaps, etc.
> > > 
> > > I'm still more in favour of introducing PERF_RECORD_MUNMAP, even if for
> > > now it would be used just in this clean case for undoing a
> > > PERF_RECORD_MMAP for a BPF program.
> > > 
> > > The ABI is already complicated, starting to use something called
> > > PERF_RECORD_MMAP for unmmaping by just using a NULL name... too clever,
> > > I think.
> > 
> > Agreed, the PERF_RECORD_MUNMAP patch was fairly trivial, the difficult
> > part was getting the perf tool to dtrt for that use-case. But if we need
> > unmap events, doing the unmap record now is the right thing.
> 
> Thanks for the pointers!
> The answer is a bit long. pls bear with me.

Ditto with me :-)
 
> I have considered adding MUNMAP to match existing MMAP, but went
> without it because I didn't want to introduce new bit in perf_event_attr
> and emit these new events in a misbalanced conditional way for prog load/unload.
> Like old perf is asking kernel for mmap events via mmap bit, so prog load events

By prog load events you mean that old perf, having perf_event_attr.mmap = 1 ||
perf_event_attr.mmap2 = 1 will cause the new kernel to emit
PERF_RECORD_MMAP records for the range of addresses that a BPF program
is being loaded on, right?

> will be in perf.data, but old perf report won't recognize them anyway.

Why not? It should look up the symbol and find it in the rb_tree of maps,
with a DSO name equal to what was in the PERF_RECORD_MMAP emitted by the
BPF core, no? It'll be an unresolved symbol, but a resolved map.

> Whereas new perf would certainly want to catch bpf events and will set
> both mmap and munmap bits.

new perf with your code will find a symbol, not a map, because your code
catches a special case PERF_RECORD_MMAP and instead of creating a
'struct map' will create a 'struct symbol' and insert it in the kallsyms
'struct map', right?
 
> Then if I add MUNMAP event without new bit and emit MMAP/MUNMAP
> conditionally based on single mmap bit they will confuse old perf
> and it will print warning about 'unknown events'.

That is unfortunate and I'll turn that part into a pr_debug()
 
> Both situations are ugly, hence I went with reuse of MMAP event
> for both load/unload.

So, it's doubly odd, i.e. MMAP used for mmap() and for munmap() and the
effect in the tooling is not to create or remove a 'struct map', but to
alter an existing symbol table for the kallsyms map.

> In such case old perf silently ignores them. Which is what I wanted.

In theory the old perf should catch the PERF_RECORD_MMAP with a string
in the filename part and insert a new map into the kernel mmap rb_tree,
and then samples would be resolved to this map, but since there is no
backing DSO with a symtab, it would stop at that, just stating that the
map is called NAME-OF-BPF-PROGRAM. This is all from memory, possibly
there is something in there that makes it ignore this PERF_RECORD_MMAP
emitted by the BPF kernel code when loading a new program.

> When we upgrade the kernel we cannot synchronize the kernel upgrade
> (or downgrade) with user space perf package upgrade.

sure

> Hence not confusing old perf is important.

Thanks for trying to achieve that, and it's a pity that that "unknown
record" message is a pr_warning or pr_info and not a pr_debug().

> With new kernel new bpf mmap events get into perf.data and
> new perf picks them up.
> 
> Few more considerations:
> 
> I consider synthetic perf events to be non-ABI. Meaning they're
> emitted by perf user space into perf.data and there is a convention
> on names, but it's not a kernel abi. Like RECORD_MMAP with
> event.filename == "[module_name]" is an indication for perf report
> to parse elf/build-id of dso==module_name.
> There is no such support in the kernel. Kernel doesn't emit
> such events for module load/unload. If in the future
> we decide to extend kernel with such events they don't have
> to match what user space perf does today.

Right, that is another unfortunate state of affairs, kernel module
load/unload should already be supported, reported by the kernel via a
proper PERF_RECORD_MODULE_LOAD/UNLOAD
 
> Why this is important? To get to next step.
> As Arnaldo pointed out this patch set is missing support for
> JITed prog annotations and displaying asm code. Absolutely correct.
> This set only helps perf to reveal the names of bpf progs that _were_
> running at the tim

Re: [PATCH perf 3/3] tools/perf: recognize and process RECORD_MMAP events for bpf progs

2018-09-20 Thread Arnaldo Carvalho de Melo
Em Thu, Sep 20, 2018 at 10:36:17AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Sep 19, 2018 at 03:39:35PM -0700, Alexei Starovoitov escreveu:
> > Recognize JITed bpf prog load/unload events.
> > Add/remove kernel symbols accordingly.
> > 
> > Signed-off-by: Alexei Starovoitov 
> > ---
> >  tools/perf/util/machine.c | 27 +++
> >  tools/perf/util/symbol.c  | 13 +
> >  tools/perf/util/symbol.h  |  1 +
> >  3 files changed, 41 insertions(+)
> > 
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index c4acd2001db0..ae4f8a0fdc7e 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -25,6 +25,7 @@
> >  #include "sane_ctype.h"
> >  #include 
> >  #include 
> > +#include 
> >  
> > >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > >  
> > > @@ -1460,6 +1461,32 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
> > enum dso_kernel_type kernel_type;
> > bool is_kernel_mmap;
> >  
> > +   /* process JITed bpf programs load/unload events */
> > +   if (event->mmap.pid == ~0u && event->mmap.tid == BPF_FS_MAGIC) {
> 
> 
> So, this would be in machine__process_kernel_munmap_event(machine), etc,
> no check for BPF_FS_MAGIC would be needed with a PERF_RECORD_MUNMAP.
> 
> > +   struct symbol *sym;
> > +   u64 ip;
> > +
> > +   map = map_groups__find(&machine->kmaps, event->mmap.start);
> > +   if (!map) {
> > +   pr_err("No kernel map for IP %lx\n", event->mmap.start);
> > +   goto out_problem;
> > +   }
> > +   ip = event->mmap.start - map->start + map->pgoff;
> > +   if (event->mmap.filename[0]) {
> > +   sym = symbol__new(ip, event->mmap.len, 0, 0,
> > + event->mmap.filename);
> 
> Humm, so the bpf program would be just one symbol... bpf-to-bpf calls
> will be to a different bpf program, right? 
> 
> /me goes to read https://lwn.net/Articles/741773/
>  "[PATCH bpf-next 00/13] bpf: introduce function calls"

After reading it, yeah, I think we need some way to access a symbol
table for a BPF program, and also its binary so that we can do
annotation, static (perf annotate) and live (perf top), was this already
considered? I think one can get the binary for a program giving
sufficient perms somehow, right? One other thing I need to catch up 8-)

- Arnaldo
 
> > +   dso__insert_symbol(map->dso, sym);
> > +   } else {
> > +   if (symbols__erase(&map->dso->symbols, ip)) {
> > +   pr_err("No bpf prog at IP %lx/%lx\n",
> > +  event->mmap.start, ip);
> > +   goto out_problem;
> > +   }
> > +   dso__reset_find_symbol_cache(map->dso);
> > +   }
> > +   return 0;
> > +   }
> > +
> > /* If we have maps from kcore then we do not need or want any others */
> > if (machine__uses_kcore(machine))
> > return 0;
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index d188b7588152..0653f313661d 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -353,6 +353,19 @@ static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
> > return NULL;
> >  }
> >  
> > +int symbols__erase(struct rb_root *symbols, u64 ip)
> > +{
> > +   struct symbol *s;
> > +
> > +   s = symbols__find(symbols, ip);
> > +   if (!s)
> > +   return -ENOENT;
> > +
> > +   rb_erase(&s->rb_node, symbols);
> > +   symbol__delete(s);
> > +   return 0;
> > +}
> > +
> >  static struct symbol *symbols__first(struct rb_root *symbols)
> >  {
> > struct rb_node *n = rb_first(symbols);
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index f25fae4b5743..92ef31953d9a 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -310,6 +310,7 @@ char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
> >  
> >  void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
> >  void symbols__insert(struct rb_root *symbols, struct symbol *sym);
> > +int symbols__erase(struct rb_root *symbols, u64 ip);
> >  void symbols__fixup_duplicate(struct rb_root *symbols);
> >  void symbols__fixup_end(struct rb_root *symbols);
> >  void map_groups__fixup_end(struct map_groups *mg);
> > -- 
> > 2.17.1
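
To spell out the convention the patch above relies on (non-empty filename means "add a symbol for this range", empty filename means "remove the symbol at this address"), here is a toy version with a flat array in place of perf's rb_tree. All `demo_` names and sizes are invented for illustration; none of this is perf code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_SYMS 16

struct demo_sym {
	uint64_t start, len;
	char name[32];
};

struct demo_symtab {
	struct demo_sym syms[DEMO_MAX_SYMS];
	int nr;
};

/* Index of the symbol covering ip, or -1. */
static int demo_find(const struct demo_symtab *tab, uint64_t ip)
{
	int i;

	for (i = 0; i < tab->nr; i++)
		if (ip >= tab->syms[i].start &&
		    ip - tab->syms[i].start < tab->syms[i].len)
			return i;
	return -1;
}

/* Load event if filename is non-empty, unload event otherwise. */
int demo_process_bpf_mmap(struct demo_symtab *tab, uint64_t start,
			  uint64_t len, const char *filename)
{
	int i;

	assert(tab && filename);
	if (filename[0]) {
		struct demo_sym *sym;

		if (tab->nr == DEMO_MAX_SYMS)
			return -ENOMEM;
		sym = &tab->syms[tab->nr++];
		sym->start = start;
		sym->len = len;
		snprintf(sym->name, sizeof(sym->name), "%s", filename);
		return 0;
	}

	i = demo_find(tab, start);
	if (i < 0)
		return -ENOENT;		/* "No bpf prog at IP" in the patch */
	tab->syms[i] = tab->syms[--tab->nr];	/* order does not matter here */
	return 0;
}

/* Name of the program covering ip, or NULL if nothing is loaded there. */
const char *demo_resolve(const struct demo_symtab *tab, uint64_t ip)
{
	int i = demo_find(tab, ip);

	return i < 0 ? NULL : tab->syms[i].name;
}
```

With a dedicated unload record the empty-filename branch would simply move into its own handler, and the table code would stay the same.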


Re: [PATCH perf 3/3] tools/perf: recognize and process RECORD_MMAP events for bpf progs

2018-09-20 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 19, 2018 at 03:39:35PM -0700, Alexei Starovoitov escreveu:
> Recognize JITed bpf prog load/unload events.
> Add/remove kernel symbols accordingly.
> 
> Signed-off-by: Alexei Starovoitov 
> ---
>  tools/perf/util/machine.c | 27 +++
>  tools/perf/util/symbol.c  | 13 +
>  tools/perf/util/symbol.h  |  1 +
>  3 files changed, 41 insertions(+)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index c4acd2001db0..ae4f8a0fdc7e 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -25,6 +25,7 @@
>  #include "sane_ctype.h"
>  #include 
>  #include 
> +#include <linux/magic.h>
>  
>  static void __machine__remove_thread(struct machine *machine, struct thread 
> *th, bool lock);
>  
> @@ -1460,6 +1461,32 @@ static int machine__process_kernel_mmap_event(struct 
> machine *machine,
>   enum dso_kernel_type kernel_type;
>   bool is_kernel_mmap;
>  
> + /* process JITed bpf programs load/unload events */
> + if (event->mmap.pid == ~0u && event->mmap.tid == BPF_FS_MAGIC) {


So, this would be in machine__process_kernel_munmap_event(machine), etc.,
and no check for BPF_FS_MAGIC would be needed with a PERF_RECORD_MUNMAP.

> + struct symbol *sym;
> + u64 ip;
> +
> + map = map_groups__find(&machine->kmaps, event->mmap.start);
> + if (!map) {
> + pr_err("No kernel map for IP %lx\n", event->mmap.start);
> + goto out_problem;
> + }
> + ip = event->mmap.start - map->start + map->pgoff;
> + if (event->mmap.filename[0]) {
> + sym = symbol__new(ip, event->mmap.len, 0, 0,
> +   event->mmap.filename);

Humm, so the bpf program would be just one symbol... bpf-to-bpf calls
will be to a different bpf program, right? 

/me goes to read https://lwn.net/Articles/741773/
 "[PATCH bpf-next 00/13] bpf: introduce function calls"

> + dso__insert_symbol(map->dso, sym);
> + } else {
> + if (symbols__erase(&map->dso->symbols, ip)) {
> + pr_err("No bpf prog at IP %lx/%lx\n",
> +event->mmap.start, ip);
> + goto out_problem;
> + }
> + dso__reset_find_symbol_cache(map->dso);
> + }
> + return 0;
> + }
> +
>   /* If we have maps from kcore then we do not need or want any others */
>   if (machine__uses_kcore(machine))
>   return 0;
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index d188b7588152..0653f313661d 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -353,6 +353,19 @@ static struct symbol *symbols__find(struct rb_root 
> *symbols, u64 ip)
>   return NULL;
>  }
>  
> +int symbols__erase(struct rb_root *symbols, u64 ip)
> +{
> + struct symbol *s;
> +
> + s = symbols__find(symbols, ip);
> + if (!s)
> + return -ENOENT;
> +
> + rb_erase(&s->rb_node, symbols);
> + symbol__delete(s);
> + return 0;
> +}
> +
>  static struct symbol *symbols__first(struct rb_root *symbols)
>  {
>   struct rb_node *n = rb_first(symbols);
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index f25fae4b5743..92ef31953d9a 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -310,6 +310,7 @@ char *dso__demangle_sym(struct dso *dso, int kmodule, 
> const char *elf_name);
>  
>  void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool 
> kernel);
>  void symbols__insert(struct rb_root *symbols, struct symbol *sym);
> +int symbols__erase(struct rb_root *symbols, u64 ip);
>  void symbols__fixup_duplicate(struct rb_root *symbols);
>  void symbols__fixup_end(struct rb_root *symbols);
>  void map_groups__fixup_end(struct map_groups *mg);
> -- 
> 2.17.1


Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

2018-09-20 Thread Arnaldo Carvalho de Melo
Em Thu, Sep 20, 2018 at 10:44:24AM +0200, Peter Zijlstra escreveu:
> On Wed, Sep 19, 2018 at 03:39:34PM -0700, Alexei Starovoitov wrote:
> >  void bpf_prog_kallsyms_del(struct bpf_prog *fp)
> >  {
> > +   unsigned long symbol_start, symbol_end;
> > +   /* mmap_record.filename cannot be NULL and has to be u64 aligned */
> > +   char buf[sizeof(u64)] = {};
> > +
> > if (!bpf_prog_kallsyms_candidate(fp))
> > return;
> >  
> > spin_lock_bh(&bpf_lock);
> > bpf_prog_ksym_node_del(fp->aux);
> > spin_unlock_bh(&bpf_lock);
> > +   bpf_get_prog_addr_region(fp, &symbol_start, &symbol_end);
> > +   perf_event_mmap_bpf_prog(symbol_start, symbol_end - symbol_start,
> > +buf, sizeof(buf));
> >  }
> 
> So perf doesn't normally issue unmap events.. We've talked about doing
> that, but so far it's never really been needed I think.
 
> It feels a bit weird to start issuing unmap events for this.

For reference, this surfaced here:

https://lkml.org/lkml/2017/1/27/452

Start of the thread, that involves postgresql, JIT, LLVM, perf is here:

https://lkml.org/lkml/2016/12/10/1

PeterZ provided a patch introducing PERF_RECORD_MUNMAP, went nowhere due
to having to cope with munmapping parts of existing mmaps, etc.

I'm still more in favour of introducing PERF_RECORD_MUNMAP, even if for
now it would be used just in this clean case for undoing a
PERF_RECORD_MMAP for a BPF program.

The ABI is already complicated, starting to use something called
PERF_RECORD_MMAP for unmapping by just using a NULL name... too clever,
I think.
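
To make the contrast concrete, here is a minimal, self-contained sketch of
the explicit-dispatch idea. It is not perf code: REC_MMAP/REC_MUNMAP,
struct rec, struct symtab and the helpers are all hypothetical stand-ins,
with REC_MUNMAP playing the role of the proposed PERF_RECORD_MUNMAP.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record types; REC_MUNMAP models the proposed PERF_RECORD_MUNMAP. */
enum rec_type { REC_MMAP, REC_MUNMAP };

struct rec {
	enum rec_type type;
	uint64_t start;
	uint64_t len;
	char name[32];
};

#define MAX_SYMS 16

struct sym {
	uint64_t start, len;
	char name[32];
	int used;
};

/* Toy symbol table standing in for the rb-tree of kernel symbols. */
struct symtab {
	struct sym syms[MAX_SYMS];
};

/* Insert a symbol for a freshly loaded program; -1 if the table is full. */
static int symtab_add(struct symtab *t, const struct rec *r)
{
	for (int i = 0; i < MAX_SYMS; i++) {
		struct sym *s = &t->syms[i];

		if (s->used)
			continue;
		s->start = r->start;
		s->len = r->len;
		snprintf(s->name, sizeof(s->name), "%s", r->name);
		s->used = 1;
		return 0;
	}
	return -1;
}

/* Remove the symbol starting at 'start'; -1 if there is none. */
static int symtab_del(struct symtab *t, uint64_t start)
{
	for (int i = 0; i < MAX_SYMS; i++) {
		if (t->syms[i].used && t->syms[i].start == start) {
			t->syms[i].used = 0;
			return 0;
		}
	}
	return -1;
}

/* The record type, not an empty name, says whether this is a load or an unload. */
static int process_rec(struct symtab *t, const struct rec *r)
{
	switch (r->type) {
	case REC_MMAP:
		return symtab_add(t, r);
	case REC_MUNMAP:
		return symtab_del(t, r->start);
	}
	return -1;
}
```

With this shape the unload path never has to inspect the name field, which
is the part that looked too clever in the overloaded-MMAP variant.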

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-15 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 14, 2018 at 09:56:43PM -0700, Yonghong Song escreveu:
> I really want to get rid of this option as well. To make pahole work
> with the default format, I need to add bpf support to
> libdwfl in elfutils repo. I will work on that.

Right, I haven't looked in detail, but perhaps we can do like we do in
tools/perf/ where we add a feature test to check if some function is
present in a library (elfutils even) and if so, use it, otherwise, use a
copy that we carry in pahole.git.

For instance:

tools/perf/util/symbol-elf.c

#ifndef HAVE_ELF_GETPHDRNUM_SUPPORT
static int elf_getphdrnum(Elf *elf, size_t *dst)
{
	GElf_Ehdr gehdr;
	GElf_Ehdr *ehdr;

	ehdr = gelf_getehdr(elf, &gehdr);
	if (!ehdr)
		return -1;

	*dst = ehdr->e_phnum;

	return 0;
}
#endif

And we have a feature test to check if that is present. It is a simple
one: if it builds and links, we have it. The tools build Makefile magic
then ends up defining HAVE_ELF_GETPHDRNUM_SUPPORT, our copy doesn't get
included, and we use what is in elfutils:

[acme@jouet perf]$ cat tools/build/feature/test-libelf-getphdrnum.c 
// SPDX-License-Identifier: GPL-2.0
#include <libelf.h>

int main(void)
{
	size_t dst;

	return elf_getphdrnum(0, &dst);
}
[acme@jouet perf]$ 

[acme@jouet perf]$ grep elf /tmp/build/perf/FEATURE-DUMP
feature-libelf=1
feature-libelf-getphdrnum=1
feature-libelf-gelf_getnote=1
feature-libelf-getshdrstrndx=1
feature-libelf-mmap=1
[acme@jouet perf]$ 

This way a new pahole version won't have to wait until the places where
it gets built have these new functions, and we stop using our copy as
soon as the library gets it.

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-14 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 14, 2018 at 02:47:59PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jun 14, 2018 at 10:21:30AM -0700, Alexei Starovoitov escreveu:
> > On 6/14/18 10:18 AM, Arnaldo Carvalho de Melo wrote:
> > > Just out of curiosity, is there any plan to have this as a clang option?
  
> > I think
> > clang ... -mllvm -mattr=dwarfris
> > should work.
 
> The message "(LLVM option parsing)" implies what you suggest, but it didn't
> work :-\
 
>   -mllvm <value>  Additional arguments to forward to LLVM's option processing
 
> Almost there tho :-\

So I thought that this -mattr=dwarfris would be available only after I
set the target, because I tried 'llc -mattr=help' and dwarfris wasn't
there:

[acme@jouet perf]$ llc -mattr=help |& grep dwarf
[acme@jouet perf]$

Only after I set the arch it appears:

[acme@jouet perf]$ llc -march=bpf -mattr=help |& grep dwarf
  dwarfris - Disable MCAsmInfo DwarfUsesRelocationsAcrossSections.
  dwarfris - Disable MCAsmInfo DwarfUsesRelocationsAcrossSections.
  dwarfris - Disable MCAsmInfo DwarfUsesRelocationsAcrossSections.
[acme@jouet perf]$ 

But even after moving the '-mllvm -mattr=dwarfris' to after '-target
bpf' it still can't grok it :-\

/usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41100 
-g -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf 
-Wno-unused-value -Wno-pointer-sign -working-directory 
/lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -mllvm 
-mattr=dwarfris -O2 -o hello.o

So only with 'clang ... -target bpf -emit-llvm -O2 -o - | llc -march=bpf
-mattr=dwarfris ...' do things work as we expect.

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-14 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 14, 2018 at 10:21:30AM -0700, Alexei Starovoitov escreveu:
> On 6/14/18 10:18 AM, Arnaldo Carvalho de Melo wrote:
> > Just out of curiosity, is there any plan to have this as a clang option?
 
> I think
> clang ... -mllvm -mattr=dwarfris
> should work.

[root@jouet bpf]# cat ~/.perfconfig
[llvm]
dump-obj = true
clang-opt = -g -mllvm -mattr=dwarfris
[root@jouet bpf]# trace -e openat,hello.c touch /tmp/kafai
clang (LLVM option parsing): Unknown command line argument '-mattr=dwarfris'.  
Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-mxgot=dwarfris'?
ERROR:  unable to compile hello.c
Hint:   Check error message shown above.
Hint:   You can also pre-compile it into .o using:
clang -target bpf -O2 -c hello.c
with proper -I and -D options.
event syntax error: 'hello.c'
 \___ Failed to load hello.c from source: Error when 
compiling BPF scriptlet

(add -v to see detail)

[root@jouet bpf]# 

[root@jouet bpf]# trace -e openat,hello.c touch /tmp/kafai |& grep clang
clang (LLVM option parsing): Unknown command line argument '-mattr=dwarfris'.  
Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-mxgot=dwarfris'?
clang -target bpf -O2 -c hello.c
[root@jouet bpf]# trace -v -e openat,hello.c touch /tmp/kafai |& grep clang
set env: CLANG_EXEC=/usr/local/bin/clang
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41100 -g -mllvm -mattr=dwarfris  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf 
-Wno-unused-value -Wno-pointer-sign -working-directory 
/lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -O2 -o -
clang (LLVM option parsing): Unknown command line argument '-mattr=dwarfris'.  
Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-mxgot=dwarfris'?
clang -target bpf -O2 -c hello.c
[root@jouet bpf]#

The message "(LLVM option parsing)" implies what you suggest, but it didn't
work :-\

  -mllvm <value>  Additional arguments to forward to LLVM's option processing

Almost there tho :-\

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-14 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 14, 2018 at 10:21:30AM -0700, Alexei Starovoitov escreveu:
> On 6/14/18 10:18 AM, Arnaldo Carvalho de Melo wrote:
> > 
> > Just out of curiosity, is there any plan to have this as a clang option?
> 
> I think
> clang ... -mllvm -mattr=dwarfris

thanks, trying...

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-14 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 14, 2018 at 09:22:27AM -0700, Martin KaFai Lau escreveu:
> On Thu, Jun 14, 2018 at 12:03:34PM -0300, Arnaldo Carvalho de Melo wrote:
> 
> > > > > > 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
> > > > > >LLC_FLAGS needed to compile the bpf prog.  It requires a new
> > > > > >"-mattr=dwarf" llc option which was added to the future
> > > > > >llvm 7.0.
> 
> [ ... ]
> 
> > I tried it, but it didn't work, see:
> > 
> > [root@jouet bpf]# cat hello.c 
> > #include "stdio.h"
> > 
> > int syscall_enter(openat)(void *ctx)
> > {
> > 	puts("Hello, world\n");
> > 	return 0;
> > }
> > [root@jouet bpf]# trace -e openat,hello.c touch /tmp/kafai
> > clang-6.0: error: unknown argument: '-mattr=dwarf'
> "-mattr=dwarf" is currently a llc only option.
> 
> tools/testing/selftests/bpf/Makefile has example on how to pipe clang to llc.
 
> e.g.:
> clang -g -O2 -target bpf -emit-llvm -c hello.c -o - | llc -march=bpf 
> -mcpu=generic -mattr=dwarfris -filetype=obj -o hello.o

Ok, so I'll probably add a llvm.opts .perfconfig entry that, if present,
will tell tools/perf/util/llvm-utils.c to pipe the output of clang to
llc, so that we can use llc-specific options.

Probably, for the time being I'll check for -g in llvm.clang-opt and if
it is there, set up the piping...
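
A rough sketch of that check, just to show the shape of it. This is not
what llvm-utils.c does today; has_debug_flag() and build_cmd() are
hypothetical helpers, and the llc flags are simply the ones from the
command line quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if opts contains "-g" as a whole space-separated token. */
static int has_debug_flag(const char *opts)
{
	const char *p = opts;

	while ((p = strstr(p, "-g")) != NULL) {
		int at_start = (p == opts) || p[-1] == ' ';
		int at_end = p[2] == '\0' || p[2] == ' ';

		if (at_start && at_end)
			return 1;
		p += 2;
	}
	return 0;
}

/*
 * Build the compile command into buf. When debug info was requested,
 * emit LLVM bitcode and pipe it through llc so that -mattr=dwarfris
 * can be passed; otherwise let clang produce the object directly.
 * Returns what snprintf() returns.
 */
static int build_cmd(char *buf, size_t sz, const char *clang_opts,
		     const char *src, const char *obj)
{
	if (has_debug_flag(clang_opts))
		return snprintf(buf, sz,
				"clang %s -O2 -target bpf -emit-llvm -c %s -o - | "
				"llc -march=bpf -mcpu=generic -mattr=dwarfris "
				"-filetype=obj -o %s",
				clang_opts, src, obj);

	return snprintf(buf, sz, "clang %s -O2 -target bpf -c %s -o %s",
			clang_opts, src, obj);
}
```

Calling build_cmd(buf, sizeof(buf), "-g", "hello.c", "hello.o") yields the
piped form, while an options string without -g keeps the single clang
invocation.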

Just out of curiosity, is there any plan to have this as a clang option?

Just to finish this thing here, lemme try a slightly modified version of
your command line:

[root@jouet bpf]# clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41100 -g -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf 
-Wno-unused-value -Wno-pointer-sign -working-directory 
/lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -emit-llvm 
-O2 -o - | llc -march=bpf -mcpu=generic -mattr=dwarfris -filetype=obj -o 
hello2.o
[root@jouet bpf]# 

[root@jouet bpf]# file hello2.o
hello2.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), 
with debug_info, not stripped
[root@jouet bpf]# pahole hello2.o
struct bpf_map_def {
unsigned int   type; /* 0 4 */
unsigned int   key_size; /* 4 4 */
unsigned int   value_size;   /* 8 4 */
unsigned int   max_entries;  /*12 4 */

/* size: 16, cachelines: 1, members: 4 */
/* last cacheline: 16 bytes */
};
[root@jouet bpf]#

Finally works, thanks.
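
For what it's worth, the layout pahole prints above can be cross-checked
at compile time. A small sketch, assuming a C11 compiler and a 4-byte
unsigned int; the struct is retyped here from the pahole output rather
than taken from any header:

```c
#include <stddef.h>

/* Same four members, in the same order, as in the pahole output. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Each check mirrors one offset or size comment from the pahole output. */
_Static_assert(offsetof(struct bpf_map_def, type) == 0, "type at 0");
_Static_assert(offsetof(struct bpf_map_def, key_size) == 4, "key_size at 4");
_Static_assert(offsetof(struct bpf_map_def, value_size) == 8, "value_size at 8");
_Static_assert(offsetof(struct bpf_map_def, max_entries) == 12, "max_entries at 12");
_Static_assert(sizeof(struct bpf_map_def) == 16, "size: 16");
```

If the DWARF were still being misread, as in the earlier "struct clang
version ..." output, the names would be wrong but these numbers would not
change, which is why the offsets looked sane even in the broken case.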

Thanks,

- Arnaldo
 
> > ERROR:  unable to compile hello.c
> > Hint:   Check error message shown above.
> > Hint:   You can also pre-compile it into .o using:
> > clang -target bpf -O2 -c hello.c
> > with proper -I and -D options.
> > event syntax error: 'hello.c'
> >  \___ Failed to load hello.c from source: Error when 
> > compiling BPF scriptlet
> > 
> > (add -v to see detail)
> > Run 'perf list' for a list of valid events
> > 
> >  Usage: perf trace [<options>] [<command>]
> >     or: perf trace [<options>] -- <command> [<options>]
> >     or: perf trace record [<options>] [<command>]
> >     or: perf trace record [<options>] -- <command> [<options>]
> > 
> >     -e, --event <event>   event/syscall selector. use 'perf list' to list available events
> > [root@jouet bpf]#
> > 
> > The full command line with that is:
> > 
> > [root@jouet bpf]# trace -v -e openat,hello.c touch /tmp/kafai |& grep mattr
> > set env: CLANG_OPTIONS=-g -mattr=dwarf
> > llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
> > -DLINUX_VERSION_CODE=0x41100 -g -mattr=dwarf  -nostdinc -isystem 
> > /usr/lib/gcc/x86_64-redhat-linux/7/include 
> > -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
> > -I/home/acme/git/linux/include -I./include 
> > -I/home/acme/git/linux/arch/x86/include/uapi 
> > -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
> > -I./include/generated/uapi -include 
> > /home/acme/git/linux/include/linux/kconfig.h  
> > -I/home/acme/lib/include/perf/bpf -Wno-unused-value -Wno-pointer-sign 
> > -working-directory /lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c 
> > -target bpf -O2 -o -
> > clan

Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-14 Thread Arnaldo Carvalho de Melo
Em Wed, Jun 13, 2018 at 04:26:38PM -0700, Martin KaFai Lau escreveu:
> On Tue, Jun 12, 2018 at 05:41:26PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jun 12, 2018 at 05:31:24PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> > > > On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo 
> > > > wrote:
> > > > > So this must be available in a newer llvm version? Which one?

> > > > I should have put in the details in my last email or
> > > > in the commit message, my bad.

> > > > 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
> > > >LLC_FLAGS needed to compile the bpf prog.  It requires a new
> > > >"-mattr=dwarf" llc option which was added to the future
> > > >llvm 7.0.

> > > [root@jouet bpf]# pahole hello.o
> > > struct clang version 5.0.1 (tags/RELEASE_501/final) {
> > >   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> > > (tags/RELEASE_501/final); /* 0 4 */
> > >   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> > > (tags/RELEASE_501/final); /* 4 4 */
> > >   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> > > (tags/RELEASE_501/final); /* 8 4 */
> > >   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> > > (tags/RELEASE_501/final); /*12 4 */

> > >   /* size: 16, cachelines: 1, members: 4 */
> > >   /* last cacheline: 16 bytes */
> > > };
> > > [root@jouet bpf]# 
> > > 
> > > Ok, I guess I saw this case in the llvm/clang git logs, so this one was
> > > generated with the older clang, will regenerate and add that 
> > > "-mattr=dwarf"
> > > part.

> > [root@jouet bpf]# pahole hello.o
> > struct clang version 7.0.0 

> > /* size: 16, cachelines: 1, members: 4 */
> > /* last cacheline: 16 bytes */
> > };
> That means the "-mattr=dwarf" is not effective.
> Can you share your clang and llc command to create hello.o?


I tried it, but it didn't work, see:

[root@jouet bpf]# cat hello.c 
#include "stdio.h"

int syscall_enter(openat)(void *ctx)
{
	puts("Hello, world\n");
	return 0;
}
[root@jouet bpf]# trace -e openat,hello.c touch /tmp/kafai
clang-6.0: error: unknown argument: '-mattr=dwarf'
ERROR:  unable to compile hello.c
Hint:   Check error message shown above.
Hint:   You can also pre-compile it into .o using:
clang -target bpf -O2 -c hello.c
with proper -I and -D options.
event syntax error: 'hello.c'
 \___ Failed to load hello.c from source: Error when 
compiling BPF scriptlet

(add -v to see detail)
Run 'perf list' for a list of valid events

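 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events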
[root@jouet bpf]#

The full command line with that is:

[root@jouet bpf]# trace -v -e openat,hello.c touch /tmp/kafai |& grep mattr
set env: CLANG_OPTIONS=-g -mattr=dwarf
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 
-DLINUX_VERSION_CODE=0x41100 -g -mattr=dwarf  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf 
-Wno-unused-value -Wno-pointer-sign -working-directory 
/lib/modules/4.17.0-rc5/build -c /home/acme/bpf/hello.c -target bpf -O2 -o -
clang-6.0: error: unknown argument: '-mattr=dwarf'
[root@jouet bpf]#

This is with these llvm and clang trees:

[root@jouet llvm]# git log --oneline -5
98c78e82f54 (HEAD -> master, origin/master, origin/HEAD) [asan] Instrument 
comdat globals on COFF targets
6ad988b5998 [DAGCombiner] clean up comments; NFC
a735ba5b795 [X86][SSE] Support v8i16/v16i16 rotations
1503b9f6fe8 [x86] add tests for node-level FMF; NFC
4a49826736f [x86] regenerate test checks; NFC
[root@jouet llvm]#

[root@jouet llvm]# cd tools/clang/
[root@jouet clang]# git log --oneline -5
8c873daccc (HEAD -> master, origin/master, origin/HEAD) [X86] Add builtins for 
vpermq/vpermpd instructions to enable target feature checking.
a344be6ba4 [X86] Change immediate type for some builtins from char to int.
dcdd53793e [CUDA] Fix emission of constant strings in sections
a90c85acaf [X86] Add bu

Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-12 Thread Arnaldo Carvalho de Melo
Em Tue, Jun 12, 2018 at 05:31:24PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> > On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > So this must be available in a newer llvm version? Which one?
> 
> > I should have put in the details in my last email or
> > in the commit message, my bad.
>  
> > 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
> >LLC_FLAGS needed to compile the bpf prog.  It requires a new
> >"-mattr=dwarf" llc option which was added to the future
> >llvm 7.0.

> [root@jouet bpf]# pahole hello.o
> struct clang version 5.0.1 (tags/RELEASE_501/final) {
>   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> (tags/RELEASE_501/final); /* 0 4 */
>   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> (tags/RELEASE_501/final); /* 4 4 */
>   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> (tags/RELEASE_501/final); /* 8 4 */
>   clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
> (tags/RELEASE_501/final); /*12 4 */
> 
>   /* size: 16, cachelines: 1, members: 4 */
>   /* last cacheline: 16 bytes */
> };
> [root@jouet bpf]# 
> 
> Ok, I guess I saw this case in the llvm/clang git logs, so this one was
> generated with the older clang, will regenerate and add that "-mattr=dwarf"
> part.

[root@jouet bpf]# pahole hello.o
struct clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 
98c78e82f54be8fb0bb5f02e3ca674fbde10ef34) {
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 
98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 0 4 */
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 
98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 4 4 */
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 
98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /* 8 4 */
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 98c78 
clang version 7.0.0 (http://llvm.org/git/clang.git 
8c873daccce7ee5339b9fd82c81fe02b73543b65) (http://llvm.org/git/llvm.git 
98c78e82f54be8fb0bb5f02e3ca674fbde10ef34); /*12 4 */

/* size: 16, cachelines: 1, members: 4 */
/* last cacheline: 16 bytes */
};
[root@jouet bpf]#

Ideas?

[root@jouet bpf]# trace -e open*,hello.c
clang-6.0: error: unknown argument: '-mattr=dwarf'
ERROR:  unable to compile hello.c
Hint:   Check error message shown above.
Hint:   You can also pre-compile it into .o using:
clang -target bpf -O2 -c hello.c
with proper -I and -D options.
event syntax error: 'hello.c'
 \___ Failed to load hello.c from source: Error when 
compiling BPF scriptlet

(add -v to see detail)
Run 'perf list' for a list of valid events

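 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events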
[root@jouet bpf]#

[root@jouet bpf]# trace -v -e open*,hello.c
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.17.0-rc5/build
set env: KBUILD_DIR=/lib/modules/4.17.0-rc5/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41100
set env: CLANG_EXEC=/usr/local/bin/clang
set env: CLANG_OPTIONS=-g -mattr=dwarf
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-

Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-12 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > So this must be available in a newer llvm version? Which one?

> I should have put in the details in my last email or
> in the commit message, my bad.
 
> 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
>LLC_FLAGS needed to compile the bpf prog.  It requires a new
>"-mattr=dwarf" llc option which was added to the future
>llvm 7.0.
 
>Hence, I have been using the llvm's master in github which
>also has the llvm-objcopy.
 
> 2. The kernel's btf part only focuses on the BPF map.
>Hence, the testing bpf program should have the map's key
>and map's value.  e.g. tools/testing/selftests/bpf/test_btf_haskv.c

So, with llvm and clang HEAD I get:

[root@jouet bpf]# pahole -J hello.o
[root@jouet bpf]# file hello.o
hello.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with 
debug_info, not stripped
[root@jouet bpf]# llvm-readelf -s hello.o
There are 26 section headers, starting at offset 0xe30:

Section Headers:
  [Nr] Name                          Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                               NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text                         PROGBITS 0000000000000000 000040 000000 00  AX  0   0  4
  [ 2] syscalls:sys_enter_openat     PROGBITS 0000000000000000 000040 000088 00  AX  0   0  8
  [ 3] license                       PROGBITS 0000000000000000 0000c8 000004 00  WA  0   0  1
  [ 4] version                       PROGBITS 0000000000000000 0000cc 000004 00  WA  0   0  4
  [ 5] maps                          PROGBITS 0000000000000000 0000d0 000010 00  WA  0   0  4
  [ 6] .rodata.str1.1                PROGBITS 0000000000000000 0000e0 00000e 01 AMS  0   0  1
  [ 7] .debug_str                    PROGBITS 0000000000000000 0000ee 00010e 01  MS  0   0  1
  [ 8] .debug_loc                    PROGBITS 0000000000000000 0001fc 000023 00      0   0  1
  [ 9] .debug_abbrev                 PROGBITS 0000000000000000 00021f 0000e3 00      0   0  1
  [10] .debug_info                   PROGBITS 0000000000000000 000302 00015e 00      0   0  1
  [11] .debug_ranges                 PROGBITS 0000000000000000 000460 000030 00      0   0  1
  [12] .debug_macinfo                PROGBITS 0000000000000000 000490 000001 00      0   0  1
  [13] .debug_pubnames               PROGBITS 0000000000000000 000491 00006e 00      0   0  1
  [14] .debug_pubtypes               PROGBITS 0000000000000000 0004ff 00005a 00      0   0  1
  [15] .debug_frame                  PROGBITS 0000000000000000 000560 000028 00      0   0  8
  [16] .debug_line                   PROGBITS 0000000000000000 000588 00006e 00      0   0  1
  [17] .symtab                       SYMTAB   0000000000000000 0005f8 000318 18     24  29  8
  [18] .relsyscalls:sys_enter_openat REL      0000000000000000 000910 000010 10     17   2  8
  [19] .rel.debug_info               REL      0000000000000000 000920 0001e0 10     17  10  8
  [20] .rel.debug_pubnames           REL      0000000000000000 000b00 000010 10     17  13  8
  [21] .rel.debug_pubtypes           REL      0000000000000000 000b10 000010 10     17  14  8
  [22] .rel.debug_frame              REL      0000000000000000 000b20 000020 10     17  15  8
  [23] .rel.debug_line               REL      0000000000000000 000b40 000010 10     17  16  8
  [24] .strtab                       STRTAB   0000000000000000 000b50 00018e 00      0   0  1
  [25] .BTF                          PROGBITS 0000000000000000 000cde 00014e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
[root@jouet bpf]# 
[root@jouet bpf]# pahole hello.o
struct clang version 5.0.1 (tags/RELEASE_501/final) {
clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
(tags/RELEASE_501/final); /* 0 4 */
clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
(tags/RELEASE_501/final); /* 4 4 */
clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
(tags/RELEASE_501/final); /* 8 4 */
clang version 5.0.1 (tags/RELEASE_501/final) clang version 5.0.1 
(tags/RELEASE_501/final); /*12 4 */

/* size: 16, cachelines: 1, members: 4 */
/* last cacheline: 16 bytes */
};
[root@jouet bpf]# 


Ok, I guess I saw this case in the llvm/clang git logs, so this one was
generated with the older clang, will regenerate and add that "-mattr=dwarf"
part.

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-07 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > So this must be available in a newer llvm version? Which one?

> I should have put in the details in my last email or
> in the commit message, my bad.
 
> 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
>LLC_FLAGS needed to compile the bpf prog.  It requires a new
>"-mattr=dwarf" llc option which was added to the future
>llvm 7.0.
 
>Hence, I have been using the llvm's master in github which
>also has the llvm-objcopy.
 
> 2. The kernel's btf part only focuses on the BPF map.
>Hence, the testing bpf program should have the map's key
>and map's value.  e.g. tools/testing/selftests/bpf/test_btf_haskv.c

Thanks for the version required to test this, but where is this
test_btf_haskv.c file? Which tree? net-next?

Ok, just pulled torvalds/master and there it is. Gotcha.

struct bpf_map_def SEC("maps") __bpf_stdout__ = {
   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
   .key_size = sizeof(int),
   .value_size = sizeof(u32),
   .max_entries = __NR_CPUS__,
};

This map is in the above hello.c example, but I guess it's way too simple
:-)

Ok, I'll test this at home in another machine where I have the llvm's
git repo.

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-07 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 07, 2018 at 12:05:10PM -0700, Martin KaFai Lau escreveu:
> On Thu, Jun 07, 2018 at 11:03:37AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jun 07, 2018 at 10:54:01AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Tue, Jun 05, 2018 at 02:25:48PM -0700, Martin KaFai Lau escreveu:
> > > > [ btw, the latest commit (1 commit) should be 94a11b59e592 ].

> > So, the commit log message for the pahole patch is non-existent:

> > https://github.com/iamkafai/pahole/commit/94a11b59e5920908085bfc8d24c92f95c8ffceaf

> > we should do better in describing what is done and how, I'm starting
> > with a message you sent to the kernel part:
> > --
> > This patch introduces BPF Type Format (BTF).

> > BTF (BPF Type Format) is the meta data format which describes
> > the data types of BPF program/map.  Hence, it basically focuses
> > on the C programming language which the modern BPF is primarily
> > using.  The first use case is to provide a generic pretty print
> > capability for a BPF map.

> I will add details in the next github respin/push.

Ok, but I can do that if there is nothing else to do in the code at this
stage :-)
 
> > Now I'm going to do the step-by-step guide on testing the feature just
> > introduced, and will try to convert from dwarf to BTF and back, compare
> > the pahole output for types encoded in DWARF and BTF, etc.

> > If you have something resembling this already, please share.

> The pahole only has the encoder part.  I tested with the verbose output
> from the "pahole -V -J".  Loading the btf to the kernel is also tested.

Ok, so here goes my first stab at testing, using perf's BPF
integration:

[root@jouet bpf]# cat hello.c
#include "stdio.h"

int syscall_enter(openat)(void *ctx)
{
	puts("Hello, world\n");
	return 0;
}
[root@jouet bpf]# cat ~/.perfconfig
[llvm]
dump-obj = true
[root@jouet bpf]# perf trace -e open*,hello.c touch /tmp/hello.BTF
LLVM: dumping hello.o
 0.017 ( ): __bpf_stdout__:Hello, world
 0.019 ( 0.011 ms): touch/28147 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC   ) = 3
 0.053 ( ): __bpf_stdout__:Hello, world
 0.055 ( 0.011 ms): touch/28147 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC   ) = 3
 0.354 ( 0.012 ms): touch/28147 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC ) = 3
 0.411 ( ): __bpf_stdout__:Hello, world
 0.412 ( 0.198 ms): touch/28147 openat(dfd: CWD, filename: /tmp/hello.BTF, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
[root@jouet bpf]# 
[root@jouet bpf]# file hello.o
hello.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), not stripped
[root@jouet bpf]# pahole --btf_encode hello.o
pahole: hello.o: No debugging information found
[root@jouet bpf]#

[root@jouet bpf]# readelf -s hello.o

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    7 __bpf_stdout__
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    5 _license
     3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    6 _version
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 syscall_enter_openat
[root@jouet bpf]#
[root@jouet bpf]# readelf -SW hello.o
There are 10 section headers, starting at offset 0x1f8:

Section Headers:
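  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000178 00007f 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] syscalls:sys_enter_openat PROGBITS        0000000000000000 000040 000088 00  AX  0   0  8
  [ 4] .relsyscalls:sys_enter_openat REL             0000000000000000 000168 000010 10      9   3  8
  [ 5] license           PROGBITS        0000000000000000 0000c8 000004 00  WA  0   0  1
  [ 6] version           PROGBITS        0000000000000000 0000cc 000004 00  WA  0   0  4
  [ 7] maps              PROGBITS        0000000000000000 0000d0 000010 00  WA  0   0  4
  [ 8] .rodata.str1.1    PROGBITS        0000000000000000 0000e0 00000e 01 AMS  0   0  1
  [ 9] .symtab           SYMTAB          0000000000000000 0000f0 000078 18      1   1  8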
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)
[root@jouet bpf]#

Humm, lemme try something, add -g to clang-opt:

[root@jouet bpf]# cat ~/.perfconfig
[llvm]
dump-obj = true
clang-o

Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-07 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 07, 2018 at 10:54:01AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jun 05, 2018 at 02:25:48PM -0700, Martin KaFai Lau escreveu:
> > [ btw, the latest commit (1 commit) should be 94a11b59e592 ].

So, the commit log message for the pahole patch is non-existent:

https://github.com/iamkafai/pahole/commit/94a11b59e5920908085bfc8d24c92f95c8ffceaf

we should do better in describing what is done and how, I'm starting
with a message you sent to the kernel part:

--
This patch introduces BPF Type Format (BTF).

BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map.  Hence, it basically focus
on the C programming language which the modern BPF is primary
using.  The first use case is to provide a generic pretty print
capability for a BPF map.
--

Now I'm going to do the step-by-step guide on testing the feature just
introduced, and will try to convert from dwarf to BTF and back, compare
the pahole output for types encoded in DWARF and BTF, etc.

If you have something resembling this already, please share.

Thanks,

- Arnaldo


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-07 Thread Arnaldo Carvalho de Melo
Em Tue, Jun 05, 2018 at 02:25:48PM -0700, Martin KaFai Lau escreveu:
> On Thu, Apr 19, 2018 at 04:40:34PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Apr 18, 2018 at 03:55:56PM -0700, Martin KaFai Lau escreveu:
> > > This patch introduces BPF Type Format (BTF).
> > > 
> > > BTF (BPF Type Format) is the meta data format which describes
> > > the data types of BPF program/map.  Hence, it basically focus
> > > on the C programming language which the modern BPF is primary
> > > using.  The first use case is to provide a generic pretty print
> > > capability for a BPF map.
> > > 
> > > A modified pahole that can convert dwarf to BTF is here:
> > > https://github.com/iamkafai/pahole/tree/btf
> > > (Arnaldo, there is some BTF_KIND numbering changes on
> > >  Apr 18th, d61426c1571)
> > 
> > Thanks for letting me know, I'm starting to look at this,
> Hi Arnaldo,
> 
> Do you have a chance to take a look and pull it?  The kernel
> changes will be in 4.18, so it will be handy if it is available in
> the pahole repository.
> 
> [ btw, the latest commit (1 commit) should be 94a11b59e592 ].

Yeah, the one I had before had:

It also raises the number of types (and functions) limit from 0x7fff to
0x7fffffff.



And on this last one I see that:

 /* Max # of type identifier */
-#define BTF_MAX_TYPE   0x7fffffff
+#define BTF_MAX_TYPE   0x0000ffff
 /* Max offset into the string section */
-#define BTF_MAX_NAME_OFFSET    0x7fffffff
+#define BTF_MAX_NAME_OFFSET    0x0000ffff

So somehow (still reading) you'll be able to get more space, if we find
necessary, to have more types and names, ok.

Continuing...

- Arnaldo
 
> > 
> > - Arnaldo
> >  
> > > Please see individual patch for details.
> > > 
> > > v5:
> > > - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
> > >   currently used.  They can be added in the future.
> > >   Some bpf_df_xxx() are removed together.
> > > - Add comment in patch 7 to clarify that the new bpffs_map_fops
> > >   should not be extended further.
> > > 
> > > v4:
> > > - Fix warning (remove unneeded semicolon)
> > > - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
> > >   patch 1.  Caught by W=1.
> > > 
> > > v3:
> > > - Rebase to bpf-next
> > > - Fix sparse warning (by adding static)
> > > - Add BTF header logging: btf_verifier_log_hdr()
> > > - Fix the alignment test on btf->type_off
> > > - Add tests for the BTF header
> > > - Lower the max BTF size to 16MB.  It should be enough
> > >   for some time.  We could raise it later if it would
> > >   be needed.
> > > 
> > > v2:
> > > - Use kvfree where needed in patch 1 and 2
> > > - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
> > >   in patch 1
> > > - Fix an incorrect goto target in map_create() during
> > >   the btf-error-path in patch 7
> > > - re-org some local vars to keep the rev xmas tree in btf.c
> > > 
> > > Martin KaFai Lau (10):
> > >   bpf: btf: Introduce BPF Type Format (BTF)
> > >   bpf: btf: Validate type reference
> > >   bpf: btf: Check members of struct/union
> > >   bpf: btf: Add pretty print capability for data with BTF type info
> > >   bpf: btf: Add BPF_BTF_LOAD command
> > >   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
> > >   bpf: btf: Add pretty print support to the basic arraymap
> > >   bpf: btf: Sync bpf.h and btf.h to tools/
> > >   bpf: btf: Add BTF support to libbpf
> > >   bpf: btf: Add BTF tests
> > > 
> > >  include/linux/bpf.h  |   20 +-
> > >  include/linux/btf.h  |   48 +
> > >  include/uapi/linux/bpf.h |   12 +
> > >  include/uapi/linux/btf.h |  130 ++
> > >  kernel/bpf/Makefile  |1 +
> > >  kernel/bpf/arraymap.c|   50 +
> > >  kernel/bpf/btf.c | 2064 ++
> > >  kernel/bpf/inode.c   |  156 +-
> > >  kernel/bpf/syscall.c |   51 +-
> > >  tools/include/uapi/linux/bpf.h   |   12 +
> > >  tools/include/uapi/linux/btf.h   |  130 ++
> > >  tools/lib/bpf/Build  |2 +-
> > >  tools/lib/bpf/bpf.c  |   92 +-
> > >  tools/lib/bpf/bpf.h

Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-06-06 Thread Arnaldo Carvalho de Melo
Em Tue, Jun 05, 2018 at 02:25:48PM -0700, Martin KaFai Lau escreveu:
> On Thu, Apr 19, 2018 at 04:40:34PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Apr 18, 2018 at 03:55:56PM -0700, Martin KaFai Lau escreveu:
> > > This patch introduces BPF Type Format (BTF).
> > > 
> > > BTF (BPF Type Format) is the meta data format which describes
> > > the data types of BPF program/map.  Hence, it basically focus
> > > on the C programming language which the modern BPF is primary
> > > using.  The first use case is to provide a generic pretty print
> > > capability for a BPF map.
> > > 
> > > A modified pahole that can convert dwarf to BTF is here:
> > > https://github.com/iamkafai/pahole/tree/btf
> > > (Arnaldo, there is some BTF_KIND numbering changes on
> > >  Apr 18th, d61426c1571)
> > 
> > Thanks for letting me know, I'm starting to look at this,
> Hi Arnaldo,
> 
> Do you have a chance to take a look and pull it?  The kernel
> changes will be in 4.18, so it will be handy if it is available in
> the pahole repository.
> 
> [ btw, the latest commit (1 commit) should be 94a11b59e592 ].

Got sidetracked, will get back to it later today.

- Arnaldo


[GIT PULL 00/11] perf/core improvements and fixes

2018-05-16 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling, more to come as I go thru Adrian's x86
PTI series and the C++ support improvements to 'perf probe', from
Holger,

Best Regards,

- Arnaldo

Test results at the end of this message, as usual.
  
The following changes since commit 291c161f6c65530092903fbea58eb07a62b220ba:

  Merge remote-tracking branch 'tip/perf/urgent' into perf/core (2018-05-15 10:30:17 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.18-20180516

for you to fetch changes up to 7a36a287de9fbb1ba906e70573d3f2315f7fd609:

  perf bpf: Fix NULL return handling in bpf__prepare_load() (2018-05-16 10:01:55 -0300)


perf/core improvements and fixes:

- Add '-e intel_pt//u' test to the 'parse-events' 'perf test' entry,
  to help avoid regressions in the events parser such as one
  that caused a revert in v4.17-rc (Arnaldo Carvalho de Melo)

- Fix NULL return handling in bpf__prepare_load() (YueHaibing)

- Warn about 'perf buildid-cache --purge-all' failures (Ravi Bangoria)

- Add infrastructure to help in writing eBPF C programs to be used
  with '-e name.c' type events in tools such as 'record' and 'trace',
  with headers for common constructs and an examples directory that
  will get populated as we add more such helpers and the 'perf bpf'
  branch that Jiri Olsa has been working on (Arnaldo Carvalho de Melo)

- Handle uncore event aliases in small groups properly (Kan Liang)

- Use the "_stest" symbol to identify the kernel map when loading kcore
  (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>


Adrian Hunter (1):
  perf tools: Use the "_stest" symbol to identify the kernel map when loading kcore

Arnaldo Carvalho de Melo (7):
  perf tests parse-events: Add intel_pt parse test
  perf llvm-utils: Add bpf include path to clang command line
  perf bpf: Add 'examples' directories
  perf bpf: Add bpf.h to be used in eBPF proggies
  perf bpf: Add kprobe example to catch 5s naps
  perf bpf: Add license(NAME) helper
  perf bpf: Add probe() helper to reduce kprobes boilerplate

Kan Liang (1):
  perf parse-events: Handle uncore event aliases in small groups properly

Ravi Bangoria (1):
  perf buildid-cache: Warn --purge-all failures

YueHaibing (1):
  perf bpf: Fix NULL return handling in bpf__prepare_load()

 tools/perf/Makefile.config |  14 
 tools/perf/Makefile.perf   |   8 +++
 tools/perf/builtin-buildid-cache.c |   8 ++-
 tools/perf/examples/bpf/5sec.c |  49 ++
 tools/perf/examples/bpf/empty.c|   3 +
 tools/perf/include/bpf/bpf.h   |  13 
 tools/perf/tests/parse-events.c|  13 
 tools/perf/util/Build  |   2 +
 tools/perf/util/bpf-loader.c   |   6 +-
 tools/perf/util/evsel.h|   1 +
 tools/perf/util/llvm-utils.c   |  19 --
 tools/perf/util/parse-events.c | 130 -
 tools/perf/util/parse-events.h |   7 +-
 tools/perf/util/parse-events.y |   8 +--
 tools/perf/util/symbol.c   |  16 ++---
 15 files changed, 270 insertions(+), 27 deletions(-)
 create mode 100644 tools/perf/examples/bpf/5sec.c
 create mode 100644 tools/perf/examples/bpf/empty.c
 create mode 100644 tools/perf/include/bpf/bpf.h

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Ge

[PATCH 11/11] perf bpf: Fix NULL return handling in bpf__prepare_load()

2018-05-16 Thread Arnaldo Carvalho de Melo
From: YueHaibing <yuehaib...@huawei.com>

bpf_object__open()/bpf_object__open_buffer can return error pointer or
NULL, check the return values with IS_ERR_OR_NULL() in bpf__prepare_load
and bpf__prepare_load_buffer

Signed-off-by: YueHaibing <yuehaib...@huawei.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexander Shishkin <alexander.shish...@linux.intel.com>
Cc: Namhyung Kim <namhy...@kernel.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/n/tip-psf4xwc09n62al2cb9s33...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/util/bpf-loader.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index af7ad814b2c3..cee658733e2c 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
 	}
 
 	obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
-	if (IS_ERR(obj)) {
+	if (IS_ERR_OR_NULL(obj)) {
 		pr_debug("bpf: failed to load buffer\n");
 		return ERR_PTR(-EINVAL);
 	}
@@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
 			pr_debug("bpf: successfull builtin compilation\n");
 		obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
 
-		if (!IS_ERR(obj) && llvm_param.dump_obj)
+		if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
 			llvm__dump_obj(filename, obj_buf, obj_buf_sz);
 
 		free(obj_buf);
 	} else
 		obj = bpf_object__open(filename);
 
-	if (IS_ERR(obj)) {
+	if (IS_ERR_OR_NULL(obj)) {
 		pr_debug("bpf: failed to load %s\n", filename);
 		return obj;
 	}
-- 
2.14.3
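
For readers not used to the error-pointer idiom this patch touches, here is a
minimal userspace sketch of why checking IS_ERR() alone misses a NULL return.
The lower-case helpers are illustrative re-implementations modelled on the
kernel-style IS_ERR()/IS_ERR_OR_NULL()/ERR_PTR()/PTR_ERR() macros, and
fake_open() is a made-up stand-in for bpf_object__open(), not the libbpf
function itself:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Errors are encoded in the last MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (sketch of ERR_PTR()). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value back out of the pointer (sketch of PTR_ERR()). */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for encoded errors; note that NULL is NOT in that range. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for encoded errors and for NULL (sketch of IS_ERR_OR_NULL()). */
static inline bool is_err_or_null(const void *ptr)
{
	return ptr == NULL || is_err(ptr);
}

/* Hypothetical opener that can fail in both ways. */
enum open_result { OPEN_OK, OPEN_NULL, OPEN_ERRPTR };

static void *fake_open(enum open_result how)
{
	static int object;

	switch (how) {
	case OPEN_OK:
		return &object;
	case OPEN_NULL:
		return NULL;
	default:
		return err_ptr(-EINVAL);
	}
}
```

With these definitions is_err(fake_open(OPEN_NULL)) is false, so a caller that
only tests is_err() would go on to use a NULL object, while is_err_or_null()
catches both failure modes.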



Re: [PATCH bpf] tools: bpf: fix NULL return handling in bpf__prepare_load

2018-05-15 Thread Arnaldo Carvalho de Melo
Em Sun, May 13, 2018 at 01:20:22AM +0200, Daniel Borkmann escreveu:
> [ +Arnaldo ]
> 
> On 05/11/2018 01:21 PM, YueHaibing wrote:
> > bpf_object__open()/bpf_object__open_buffer can return error pointer or NULL,
> > check the return values with IS_ERR_OR_NULL() in bpf__prepare_load and
> > bpf__prepare_load_buffer
> > 
> > Signed-off-by: YueHaibing 
> > ---
> >  tools/perf/util/bpf-loader.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> This should probably be routed via Arnaldo due to the fix in perf itself. If
> there's no particular preference on which tree, we could potentially route it
> as well via bpf with Acked-by from Arnaldo, but that is up to him. Arnaldo,
> any preference?

I'm preparing a pull req right now, and working a bit on perf's BPF
support, so why not, I'll merge it, thanks,

- Arnaldo
 
> > diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> > index af7ad81..cee6587 100644
> > --- a/tools/perf/util/bpf-loader.c
> > +++ b/tools/perf/util/bpf-loader.c
> > @@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
> >  	}
> >  
> >  	obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
> > -	if (IS_ERR(obj)) {
> > +	if (IS_ERR_OR_NULL(obj)) {
> >  		pr_debug("bpf: failed to load buffer\n");
> >  		return ERR_PTR(-EINVAL);
> >  	}
> > @@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
> >  			pr_debug("bpf: successfull builtin compilation\n");
> >  		obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
> >  
> > -		if (!IS_ERR(obj) && llvm_param.dump_obj)
> > +		if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
> >  			llvm__dump_obj(filename, obj_buf, obj_buf_sz);
> >  
> >  		free(obj_buf);
> >  	} else
> >  		obj = bpf_object__open(filename);
> >  
> > -	if (IS_ERR(obj)) {
> > +	if (IS_ERR_OR_NULL(obj)) {
> >  		pr_debug("bpf: failed to load %s\n", filename);
> >  		return obj;
> >  	}
> > 


Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format

2018-04-19 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 18, 2018 at 03:55:56PM -0700, Martin KaFai Lau escreveu:
> This patch introduces BPF Type Format (BTF).
> 
> BTF (BPF Type Format) is the meta data format which describes
> the data types of BPF program/map.  Hence, it basically focus
> on the C programming language which the modern BPF is primary
> using.  The first use case is to provide a generic pretty print
> capability for a BPF map.
> 
> A modified pahole that can convert dwarf to BTF is here:
> https://github.com/iamkafai/pahole/tree/btf
> (Arnaldo, there is some BTF_KIND numbering changes on
>  Apr 18th, d61426c1571)

Thanks for letting me know, I'm starting to look at this,

- Arnaldo
 
> Please see individual patch for details.
> 
> v5:
> - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
>   currently used.  They can be added in the future.
>   Some bpf_df_xxx() are removed together.
> - Add comment in patch 7 to clarify that the new bpffs_map_fops
>   should not be extended further.
> 
> v4:
> - Fix warning (remove unneeded semicolon)
> - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
>   patch 1.  Caught by W=1.
> 
> v3:
> - Rebase to bpf-next
> - Fix sparse warning (by adding static)
> - Add BTF header logging: btf_verifier_log_hdr()
> - Fix the alignment test on btf->type_off
> - Add tests for the BTF header
> - Lower the max BTF size to 16MB.  It should be enough
>   for some time.  We could raise it later if it would
>   be needed.
> 
> v2:
> - Use kvfree where needed in patch 1 and 2
> - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
>   in patch 1
> - Fix an incorrect goto target in map_create() during
>   the btf-error-path in patch 7
> - re-org some local vars to keep the rev xmas tree in btf.c
> 
> Martin KaFai Lau (10):
>   bpf: btf: Introduce BPF Type Format (BTF)
>   bpf: btf: Validate type reference
>   bpf: btf: Check members of struct/union
>   bpf: btf: Add pretty print capability for data with BTF type info
>   bpf: btf: Add BPF_BTF_LOAD command
>   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
>   bpf: btf: Add pretty print support to the basic arraymap
>   bpf: btf: Sync bpf.h and btf.h to tools/
>   bpf: btf: Add BTF support to libbpf
>   bpf: btf: Add BTF tests
> 
>  include/linux/bpf.h  |   20 +-
>  include/linux/btf.h  |   48 +
>  include/uapi/linux/bpf.h |   12 +
>  include/uapi/linux/btf.h |  130 ++
>  kernel/bpf/Makefile  |1 +
>  kernel/bpf/arraymap.c|   50 +
>  kernel/bpf/btf.c | 2064 ++
>  kernel/bpf/inode.c   |  156 +-
>  kernel/bpf/syscall.c |   51 +-
>  tools/include/uapi/linux/bpf.h   |   12 +
>  tools/include/uapi/linux/btf.h   |  130 ++
>  tools/lib/bpf/Build  |2 +-
>  tools/lib/bpf/bpf.c  |   92 +-
>  tools/lib/bpf/bpf.h  |   16 +
>  tools/lib/bpf/btf.c  |  374 +
>  tools/lib/bpf/btf.h  |   22 +
>  tools/lib/bpf/libbpf.c   |  148 +-
>  tools/lib/bpf/libbpf.h   |3 +
>  tools/testing/selftests/bpf/Makefile |   26 +-
>  tools/testing/selftests/bpf/test_btf.c   | 1669 +
>  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
>  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
>  22 files changed, 5076 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/btf.h
>  create mode 100644 include/uapi/linux/btf.h
>  create mode 100644 kernel/bpf/btf.c
>  create mode 100644 tools/include/uapi/linux/btf.h
>  create mode 100644 tools/lib/bpf/btf.c
>  create mode 100644 tools/lib/bpf/btf.h
>  create mode 100644 tools/testing/selftests/bpf/test_btf.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> 
> -- 
> 2.9.5


Re: [PATCH bpf-next v3 00/10] BTF: BPF Type Format

2018-04-16 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 16, 2018 at 12:33:17PM -0700, Martin KaFai Lau escreveu:
> This patch introduces BPF Type Format (BTF).
> 
> BTF (BPF Type Format) is the meta data format which describes
> the data types of BPF program/map.  Hence, it basically focus
> on the C programming language which the modern BPF is primary
> using.  The first use case is to provide a generic pretty print
> capability for a BPF map.
> 
> A modified pahole (Cc: Arnaldo) that can convert dwarf to BTF is here:
> https://github.com/iamkafai/pahole/tree/btf

Thanks for CCing me, no changes since when you posted the pahole
patches, I gave it a quick look, seems sane, will try to merge and push
a new pahole version out so that distros can pick it, at least fedora
will 8-)

- Arnaldo
 
> Please see individual patch for details.
> 
> v3:
> - Rebase to bpf-next
> - Fix sparse warning (by adding static)
> - Add BTF header logging: btf_verifier_log_hdr()
> - Fix the alignment test on btf->type_off
> - Add tests for the BTF header
> - Lower the max BTF size to 16MB.  It should be enough
>   for some time.  We could raise it later if it would
>   be needed.
> 
> v2:
> - Use kvfree where needed in patch 1 and 2
> - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
>   in patch 1
> - Fix an incorrect goto target in map_create() during
>   the btf-error-path in patch 7
> - re-org some local vars to keep the rev xmas tree in btf.c
> 
> Martin KaFai Lau (10):
>   bpf: btf: Introduce BPF Type Format (BTF)
>   bpf: btf: Validate type reference
>   bpf: btf: Check members of struct/union
>   bpf: btf: Add pretty print capability for data with BTF type info
>   bpf: btf: Add BPF_BTF_LOAD command
>   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
>   bpf: btf: Add pretty print support to the basic arraymap
>   bpf: btf: Sync bpf.h and btf.h to tools/
>   bpf: btf: Add BTF support to libbpf
>   bpf: btf: Add BTF tests
> 
>  include/linux/bpf.h  |   20 +-
>  include/linux/btf.h  |   48 +
>  include/uapi/linux/bpf.h |   12 +
>  include/uapi/linux/btf.h |  132 ++
>  kernel/bpf/Makefile  |1 +
>  kernel/bpf/arraymap.c|   50 +
>  kernel/bpf/btf.c | 2093 ++
>  kernel/bpf/inode.c   |  146 +-
>  kernel/bpf/syscall.c |   51 +-
>  tools/include/uapi/linux/bpf.h   |   13 +
>  tools/include/uapi/linux/btf.h   |  132 ++
>  tools/lib/bpf/Build  |2 +-
>  tools/lib/bpf/bpf.c  |   92 +-
>  tools/lib/bpf/bpf.h  |   16 +
>  tools/lib/bpf/btf.c  |  377 +
>  tools/lib/bpf/btf.h  |   22 +
>  tools/lib/bpf/libbpf.c   |  148 +-
>  tools/lib/bpf/libbpf.h   |3 +
>  tools/testing/selftests/bpf/Makefile |   26 +-
>  tools/testing/selftests/bpf/test_btf.c   | 1669 
>  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
>  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
>  22 files changed, 5103 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/btf.h
>  create mode 100644 include/uapi/linux/btf.h
>  create mode 100644 kernel/bpf/btf.c
>  create mode 100644 tools/include/uapi/linux/btf.h
>  create mode 100644 tools/lib/bpf/btf.c
>  create mode 100644 tools/lib/bpf/btf.h
>  create mode 100644 tools/testing/selftests/bpf/test_btf.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> 
> -- 
> 2.9.5


Re: [PATCH bpf-next 00/10] BTF: BPF Type Format

2018-04-03 Thread Arnaldo Carvalho de Melo
Em Fri, Mar 30, 2018 at 11:26:33AM -0700, Martin KaFai Lau escreveu:
> This patch introduces BPF Type Format (BTF).
> 
> BTF (BPF Type Format) is the meta data format which describes
> the data types of BPF program/map.  Hence, it basically focus
> on the C programming language which the modern BPF is primary
> using.  The first use case is to provide a generic pretty print
> capability for a BPF map.
> 
> A modified pahole that can convert dwarf to BTF is here:
> https://github.com/iamkafai/pahole/tree/btf

Hey, great, I'll try to review this and if all is well, merge this,
please consider CCing me in patches to pahole :-)

- Arnaldo
 
> Please see individual patch for details.
> 
> Martin KaFai Lau (10):
>   bpf: btf: Introduce BPF Type Format (BTF)
>   bpf: btf: Validate type reference
>   bpf: btf: Check members of struct/union
>   bpf: btf: Add pretty print capability for data with BTF type info
>   bpf: btf: Add BPF_BTF_LOAD command
>   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
>   bpf: btf: Add pretty print support to the basic arraymap
>   bpf: btf: Sync bpf.h and btf.h to tools/
>   bpf: btf: Add BTF support to libbpf
>   bpf: btf: Add BTF tests
> 
>  include/linux/bpf.h  |   20 +-
>  include/linux/btf.h  |   48 +
>  include/uapi/linux/bpf.h |   12 +
>  include/uapi/linux/btf.h |  132 ++
>  kernel/bpf/Makefile  |1 +
>  kernel/bpf/arraymap.c|   50 +
>  kernel/bpf/btf.c | 2064 ++
>  kernel/bpf/inode.c   |  146 +-
>  kernel/bpf/syscall.c |   51 +-
>  tools/include/uapi/linux/bpf.h   |   13 +
>  tools/include/uapi/linux/btf.h   |  132 ++
>  tools/lib/bpf/Build  |2 +-
>  tools/lib/bpf/bpf.c  |   92 +-
>  tools/lib/bpf/bpf.h  |   16 +
>  tools/lib/bpf/btf.c  |  377 +
>  tools/lib/bpf/btf.h  |   22 +
>  tools/lib/bpf/libbpf.c   |  148 +-
>  tools/lib/bpf/libbpf.h   |3 +
>  tools/testing/selftests/bpf/Makefile |   25 +-
>  tools/testing/selftests/bpf/test_btf.c   | 1539 +++
>  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
>  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
>  22 files changed, 4943 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/btf.h
>  create mode 100644 include/uapi/linux/btf.h
>  create mode 100644 kernel/bpf/btf.c
>  create mode 100644 tools/include/uapi/linux/btf.h
>  create mode 100644 tools/lib/bpf/btf.c
>  create mode 100644 tools/lib/bpf/btf.h
>  create mode 100644 tools/testing/selftests/bpf/test_btf.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> 
> -- 
> 2.9.5


Re: [PATCH bpf] tools: bpftool: fix compilation with older headers

2018-03-06 Thread Arnaldo Carvalho de Melo
[ +ingo, jolsa, namhyung ]

Em Tue, Mar 06, 2018 at 05:28:33PM +0100, Daniel Borkmann escreveu:
> [ +acme ]
> 
> On 03/06/2018 05:00 PM, David Miller wrote:
> > From: Jiri Benc 
> > Date: Tue, 6 Mar 2018 16:03:25 +0100
> > 
> >> On Tue, 6 Mar 2018 15:39:07 +0100, Daniel Borkmann wrote:
> >>> Thanks for the fix, Jiri! The standard approach to resolve such header
> >>> dependencies under tools/ would be to add a copy of magic.h uapi header
> >>> into tools/include/uapi/linux/magic.h.
> >>>
> >>> Both bpftool and libbpf have tools/include/uapi/ in their include path
> >>> from their Makefile, so they would pull this in automatically and it
> >>> would also allow to get rid of the extra ifdef in libbpf then. Could you
> >>> look into that?
> >>
> >> That's what I tried at first. But honestly, this is a shortcut to hell.
> >> Eventually, we'd end up with most of uapi headers duplicated under
> >> tools/include/uapi and hopelessly out of sync.
> >>
> >> The right approach would be to export uapi headers from the currently
> >> built kernel somewhere (a temporary directory, perhaps) and use that to
> >> build the tools. We should not have duplicated and out of sync headers
> >> in the kernel tree. Just look at the git log for tools/include/uapi to
> >> see what I mean by "out of sync".
> > 
> > I understand your frustration.
> > 
> > I'm really puzzled why doing "make headers_install" and then building
> > these tools does not pick those in-kernel headers up.  That's what
> > really should happen.
> 
> Arnaldo, given this came out of tools/perf originally and duplicating/syncing
> headers is common practice since about 2014 in kernel git tree, do you have
> some context on why the above was/is not considered?

So, when tools/perf/ started we tried to use kernel code directly,
things like rbtree, list.h, etc.

It worked, we were sharing stuff with the kernel, all is well.

Then someone changes something in one of these files and tools/
compilation broke.

Fine for tools/ developers, we knew something like that could happen and
would fix things in tools/, life goes on.

Then Linus, IIRC, tried building tools/perf/ when something like that
had happened and the build broke for him.

He didn't like that and we came up with this copy and check thing: we
copy something into tools/include/ and add it to
tools/perf/check-headers.sh, when something changes in the kernel,
nothing breaks, we get notified, check if the change implies changes in
tools/perf/, things like improving 'perf trace' to deal with new ioctl
or syscall flags, new syscalls, etc. Sometimes the copy ends up
automatically updating the tools, as there are scripts that generate
ioctl id->string tables, for instance, automatically from the updated
headers, things like what happened in this header update:

http://git.kernel.org/acme/c/1350fb7d1b48
tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h

With this in place no kernel developer needs to care about what happens
in tools/, tools/ developers don't need to worry about getting in the
way of kernel developers day-to-day activities.
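
A minimal sketch of the copy-and-check idea described above, written in C
purely for illustration: the real check is done by the
tools/perf/check-headers.sh shell script, and the function names below are
made up. The point it tries to show is that a stale copy produces a warning
rather than a build failure:

```c
#include <stdio.h>

/*
 * Compare the kernel original with the tools/ copy.
 * Returns 0 if the contents are identical, 1 if they differ,
 * -1 if either file cannot be opened.
 */
static int headers_differ(const char *kernel_path, const char *tools_path)
{
	FILE *a = fopen(kernel_path, "rb");
	FILE *b = fopen(tools_path, "rb");
	int ca, cb, ret = 0;

	if (!a || !b) {
		if (a)
			fclose(a);
		if (b)
			fclose(b);
		return -1;
	}

	/* Walk both files in lockstep until a mismatch or a common EOF. */
	do {
		ca = fgetc(a);
		cb = fgetc(b);
		if (ca != cb) {
			ret = 1;
			break;
		}
	} while (ca != EOF);

	fclose(a);
	fclose(b);
	return ret;
}

/*
 * Warn about drift but leave the decision to the tools/ developer:
 * the copy keeps tools/ building even after the original moved on.
 */
static int check_header(const char *kernel_path, const char *tools_path)
{
	int ret = headers_differ(kernel_path, tools_path);

	if (ret == 1)
		fprintf(stderr, "Warning: %s differs from %s\n",
			tools_path, kernel_path);
	return ret;
}
```

A caller would run check_header() once per copied header and carry on with the
build whatever it returns, which is what keeps kernel changes from breaking
tools/ while still telling the tools/ side that something changed.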

Then we also have:

[acme@jouet perf]$ make help | grep perf
  perf-tar-src-pkg- Build perf-4.16.0-rc4.tar source tarball
  perf-targz-src-pkg  - Build perf-4.16.0-rc4.tar.gz source tarball
  perf-tarbz2-src-pkg - Build perf-4.16.0-rc4.tar.bz2 source tarball
  perf-tarxz-src-pkg  - Build perf-4.16.0-rc4.tar.xz source tarball
[acme@jouet perf]$ 

Use that, get the resulting tarball, and all you need to build it
anywhere should be self contained there, so the tools may use flags,
defines, syscall definitions, etc, without ifdefs, and the resulting
source code will build in many places, cross compiling, etc, like is
done for every pull request I send to Ingo, see for instance the 53
containers where this is all built in a pull req like the one from
yesterday:

  http://lkml.kernel.org/r/20180305142932.16921-1-a...@kernel.org

But now there are more tools in tools/ and of course we can and should
improve the whole process in a way that satisfies the various projects.

So, with this said, I'll try and read the above thread.

Ingo may add some other thoughts here, this is what came to my mind now.

- Arnaldo
 
> My current understanding is that the general preference would be on copying
> the headers into tools/include/ infrastructure once there are dependencies
> identified that would be missing on older/local system headers rather than
> ifdef'ery of various bit and pieces in the code that need to make use of them.
> Would be good to get some clarification on that in any case.
> 
> But that said, I'd also be fine taking the three-liner as is into bpf as a 
> fix.
> 
> > The kernel tree internally should be self-consistent.
> > 
> > It's one thing for an external tool like iproute2 to duplicate stuff
> > like this, but user programs inside the kernel have no excuse for
> > requiring things like that just to 

Re: [bpf-next V3 PATCH 1/5] bpf: Sync kernel ABI header with tooling header for bpf_common.h

2018-02-08 Thread Arnaldo Carvalho de Melo
Em Thu, Feb 08, 2018 at 12:48:12PM +0100, Jesper Dangaard Brouer escreveu:
> I recently fixed up a lot of commits that forgot to keep the tooling
> headers in sync.  And then I forgot to do the same thing in commit
> cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let's correct
> that before people notice ;-).
> 
> Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09
> ("bpf: add selftest for tcpbpf").
> 
> Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")

We don't consider it a bug to forget to update the tooling headers' copy
of the files, i.e. it's not a strict requirement on kernel developers to
care about tools/ :-)

I, for one, like to get the warning; it's an opportunity for me to see
that something changed and that I should pay attention to see if
something needs to be done on the tooling side.

- Arnaldo

> Signed-off-by: Jesper Dangaard Brouer 
> ---
>  tools/include/uapi/linux/bpf_common.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf_common.h 
> b/tools/include/uapi/linux/bpf_common.h
> index 18be90725ab0..ee97668bdadb 100644
> --- a/tools/include/uapi/linux/bpf_common.h
> +++ b/tools/include/uapi/linux/bpf_common.h
> @@ -15,9 +15,10 @@
>  
>  /* ld/ldx fields */
>  #define BPF_SIZE(code)  ((code) & 0x18)
> -#define  BPF_W   0x00
> -#define  BPF_H   0x08
> -#define  BPF_B   0x10
> +#define  BPF_W   0x00 /* 32-bit */
> +#define  BPF_H   0x08 /* 16-bit */
> +#define  BPF_B   0x10 /*  8-bit */
> +/* eBPF  BPF_DW  0x18    64-bit */
>  #define BPF_MODE(code)  ((code) & 0xe0)
>  #define  BPF_IMM 0x00
>  #define  BPF_ABS 0x20
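
As an aside, the size field commented in the hunk above can be decoded
with a few lines of C. This is only a sketch: the macro values are the
ones from the quoted header, while bpf_access_bytes() and its selftest
are hypothetical helpers made up for illustration, not something found
in the kernel or in tools/.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the bpf_common.h hunk quoted above. */
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_W           0x00 /* 32-bit */
#define BPF_H           0x08 /* 16-bit */
#define BPF_B           0x10 /*  8-bit */
#define BPF_DW          0x18 /* 64-bit, eBPF only */

/* Hypothetical helper: access width in bytes of a ld/ldx/st/stx opcode. */
int bpf_access_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_W:  return 4;
	case BPF_H:  return 2;
	case BPF_B:  return 1;
	case BPF_DW: return 8;
	}
	return -1; /* not reachable: the mask leaves only the four values above */
}

/* Hypothetical selftest using well-known BPF_LDX | BPF_MEM opcodes. */
void bpf_access_bytes_selftest(void)
{
	assert(bpf_access_bytes(0x79) == 8); /* e.g. "(79) r3 = *(u64 *)(r1 +104)" */
	assert(bpf_access_bytes(0x61) == 4);
	assert(bpf_access_bytes(0x69) == 2);
	assert(bpf_access_bytes(0x71) == 1);
}
```

Opcode 0x79 is the one that shows up as "(79) r3 = *(u64 *)(r1 +104)" in
verifier logs: masking it with 0x18 yields BPF_DW, the value the new
comment documents.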


Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2018-01-22 Thread Arnaldo Carvalho de Melo
Em Mon, Jan 22, 2018 at 10:28:11AM -0800, Yonghong Song escreveu:
> The compiler did "40: (bf) r1 = r0" and then uses "r1" for branch
> comparison, the original "r0" is left with complete unknown integer value
> and later used to calculate the buffer size "55: (bf) r5 = r0"
> where "r5" could be negative value and the verifier rightfully
> complains.
 
> There is no easy way to fix this in verifier unless verifier starts to track
> correlations between registers which is a big task. So your below workaround
> is okay. The below workaround should also work:
 
> int len = bpf_probe_read_str(filename.path, sizeof(filename.path),
> filename.ptr);
> if (len > 0 && len < 256)
> bpf_perf_event_output(ctx, _map, BPF_F_CURRENT_CPU,
> &filename, (len & 0xff) + sizeof(filename.ptr));
> return 0;

OK, thanks for once more analyzing the optimizations emitted and
suggesting something more compact, which I can confirm works:

[root@jouet bpf]# perf trace -a -e open,sys_enter_open.c sleep 0.1
LLVM: dumping sys_enter_open.o
 1.212 ( ): __bpf_stdout__:/usr/lib/locale/locale-archive..)
 1.218 ( 0.021 ms): sleep/9872 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
 2.905 ( ): __bpf_stdout__:..:.F.../usr/lib/locale/locale-archive..)
 2.910 ( 0.013 ms): rm/9873 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
 7.562 ( ): __bpf_stdout__:..ul/usr/lib/locale/locale-archive..)
 7.564 ( 0.013 ms): mv/9874 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
11.275 ( ): __bpf_stdout__:...d/usr/lib/locale/locale-archive..)
11.278 ( 0.012 ms): sh/9875 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
11.945 ( ): __bpf_stdout__:...d/usr/lib64/gconv/gconv-modules.cache)
11.953 ( 0.018 ms): sh/9875 open(filename: /usr/lib64/gconv/gconv-modules.cache) = 3
17.906 ( ): __bpf_stdout__:..T.p.../usr/lib/locale/locale-archive..)
17.913 ( 0.319 ms): gcc/9877 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 4
18.389 ( ): __bpf_stdout__:...l/usr/share/locale/locale.alias..)
18.394 ( 0.266 ms): gcc/9877 open(filename: /usr/share/locale/locale.alias, flags: CLOEXEC) = 4
18.777 ( ): __bpf_stdout__:@.../usr/share/locale/en_US.UTF-8/LC_MESSAGES/gcc.mo)
18.782 ( 0.318 ms): gcc/9877 open(filename: /usr/share/locale/en_US.UTF-8/LC_MESSAGES/gcc.mo, mode: IFBLK|IFIFO|ISGID|ISVTX|IRUSR|IXUSR|0xb5cc) = -1 ENOENT No such file or directory

[root@jouet bpf]# cat sys_enter_open.c
#include "bpf.h"

SEC("syscalls:sys_enter_open")
int func(void *ctx)
{
	struct {
		char *ptr;
		char path[256];
	} filename = {
		.ptr = *((char **)(ctx + 16)),
	};
	int len = bpf_probe_read_str(filename.path, sizeof(filename.path), filename.ptr);
	if (len > 0 && len < 256)
		perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
				  &filename, (len & 0xff) + sizeof(filename.ptr));
	return 0;
}
[root@jouet bpf]# 
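
For anyone wondering why the extra "& 0xff" is harmless here, a small
userspace sketch of the arithmetic follows. It is not BPF code and says
nothing about what the verifier does internally; the function names are
made up for this illustration and PATH_LEN just mirrors the path[256]
buffer used above.

```c
#include <assert.h>
#include <stddef.h>

#define PATH_LEN 256 /* mirrors "char path[256]" in sys_enter_open.c */

/* Size handed to perf_event_output() in the workaround: masked string
 * length plus the pointer stored in front of the path. */
size_t output_size(int len)
{
	return (size_t)(len & 0xff) + sizeof(char *);
}

/* Returns 1 if masking never changes a length that already passed the
 * "len > 0 && len < 256" test. */
int mask_preserves_checked_lengths(void)
{
	for (int len = 1; len < PATH_LEN; len++)
		if ((len & 0xff) != len)
			return 0;
	return 1;
}

/* Returns 1 if the masked value stays in [0, 255] whatever the input,
 * which is a bound that can be read off the AND instruction alone. */
int mask_bounds_any_input(void)
{
	static const int samples[] = { -14, -1, 0, 1, 255, 256, 4096, 0x7fffffff };

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		int masked = samples[i] & 0xff;

		if (masked < 0 || masked > 255)
			return 0;
	}
	return 1;
}
```

So the mask does not change what gets sent for any length that passed the
range test, while giving the size argument an upper bound that does not
depend on the earlier branch having been tracked on the same register.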

- Arnaldo



Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2018-01-22 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 22, 2017 at 10:42:22AM -0800, Gianluca Borello escreveu:
> On Tue, Nov 21, 2017 at 2:31 PM, Alexei Starovoitov
>  wrote:
> >
> > yeah sorry about this hack. Gianluca reported this issue as well.
> > Yonghong fixed it for bpf_probe_read only. We will extend
> > the fix to bpf_probe_read_str() and bpf_perf_event_output() asap.
> > The above workaround gets too much into llvm and verifier details
> > we should strive to make bpf program writing as easy as possible.
> >
> 
> Hi Arnaldo
> 
> With the help of Alexei, Daniel and Yonghong I just submitted a new
> series ("bpf: fix semantics issues with helpers receiving NULL
> arguments") that includes a fix in bpf_perf_event_output. This should
> simplify the way you write your bpf programs, so you shouldn't be
> required to write those convoluted branches anymore (there are a few
> usage examples in the commit log).
> 
> In my case it made writing the code much easier, after applying it I
> haven't been surprised by the compiler output in a while, and I hope
> your experience will be improved as well.

Trying to work with this again, and I still need to trick clang into not
doing some optimizations that end up getting the resulting eBPF object
rejected by the kernel verifier:

[root@jouet bpf]# uname -a
Linux jouet 4.15.0-rc8+ #1 SMP Wed Jan 17 11:01:34 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@jouet bpf]# grep -i bpf /lib/modules/`uname -r`/build/.config
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_NET_CLS_BPF is not set
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_TEST_BPF is not set
[root@jouet bpf]# cat sys_enter_open.c
#include "bpf.h"

SEC("syscalls:sys_enter_open")
int func(void *ctx)
{
	struct {
		char *ptr;
		char path[256];
	} filename = {
		/*
		 * /sys/kernel/debug/tracing/events/syscalls/sys_enter_open/format:
		 *
		 * ...
		 * field:const char * filename; offset:16;  size:8; signed:0;
		 * ...
		 * ctx + 16 selects 'filename'
		 */
		.ptr = *((char **)(ctx + 16)),
	};
	int len = bpf_probe_read_str(filename.path, sizeof(filename.path), filename.ptr);
	if (len > 0 && len < 256)
		perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
				  &filename, len + sizeof(filename.ptr));
	return 0;
}
[root@jouet bpf]# 
[root@jouet bpf]# perf trace -v -e open,sys_enter_open.c
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.15.0-rc8+/build
set env: KBUILD_DIR=/lib/modules/4.15.0-rc8+/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x40f00
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: WORKING_DIR=/lib/modules/4.15.0-rc8+/build
set env: CLANG_SOURCE=/home/acme/bpf/sys_enter_open.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o -
libbpf: loading object 'sys_enter_open.c' from buffer
libbpf: section .strtab, size 101, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section syscalls:sys_enter_open, size 472, link 0, flags 6, type=1
libbpf: found program syscalls:sys_enter_open
libbpf: section .relsyscalls:sys_enter_open, size 16, link 8, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of sys_enter_open.c is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of sys_enter_open.c is 40f00
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: maps in sys_enter_open.c: 1 maps in 16 bytes
libbpf: map 0 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'syscalls:sys_enter_open'

Re: net-next libbpf broken on prev kernel release

2017-12-14 Thread Arnaldo Carvalho de Melo
Em Thu, Dec 14, 2017 at 10:52:19AM +0100, Daniel Borkmann escreveu:
> [ +acme, +ast ]
> 
> On 12/14/2017 10:16 AM, Eric Leblond wrote:
> > Hello,
> > 
> > It seems that the following patch broke libbpf (in the net-next
> > version), which is no longer able to load a program on a 4.14 kernel:
> > 
> > tree 5096ddd73981e33a2164606461a45b56a189889c
> > parent ad5b177bd73f5107d97c36f56395c4281fb6f089
> > author Martin KaFai Lau  Wed Sep 27 14:37:54 2017 -0700
> > committer David S. Miller  Fri Sep 29 06:17:05 2017 +0100
> > 
> > bpf: libbpf: Provide basic API support to specify BPF obj name
> > 
> > The problem comes from
> > 
> > -int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
> > -size_t insns_cnt, const char *license,
> > -__u32 kern_version, char *log_buf, size_t log_buf_sz)
> > +int bpf_load_program_name(enum bpf_prog_type type, const char *name,
> > + const struct bpf_insn *insns,
> > + size_t insns_cnt, const char *license,
> > + __u32 kern_version, char *log_buf,
> > + size_t log_buf_sz)
> >  {
> > int fd;
> > union bpf_attr attr;
> > +   __u32 name_len = name ? strlen(name) : 0;
> >  
> > bzero(&attr, sizeof(attr));
> > attr.prog_type = type;
> > @@ -130,6 +151,7 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
> > attr.log_size = 0;
> > attr.log_level = 0;
> > attr.kern_version = kern_version;
> > +   memcpy(attr.prog_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
> > 
> > If I comment the memcpy then the eBPF program is loading correctly.
> > 
> > Is this a wanted behavior to have libbpf that needs to be in sync with
> > kernel ? or should it be fixed ?
 
> Yeah, this was reported recently here: https://lkml.org/lkml/2017/11/28/1246
 
> I agree that given the policy of perf tool is to try to use new features
> but if they fail on older kernels, then we should try to fall back whenever
> that is feasible. I think for this specific case, we should in fact fall back
> and try w/o map/prog name in order to fix this regression for perf (or
> other lib users).
 
> Also agree that this cannot be done for every possible case like the mentioned
> prog_ifindex field for offloading to NIC in the thread above, but imho
> the prog_ifindex is a slightly different situation given that a user needs
> to specifically ask to offload via some provided API.
 
> I think the fix should be: if a user *specifically* calls bpf_load_program_name()
> or bpf_create_map_name() API from the lib, then the intention is very clear
> that the bpf object should be created *with* name and otherwise fail. So a
> fallback for these APIs to load w/o name would be inappropriate! But for the
> existing code that used to load objects before, e.g. bpf_object__create_maps()
> or bpf_program__load() it should try to use either mentioned bpf_*_name() APIs
> and *iff* they fail, fall-back to the normal ones w/o name attribute. Meaning,
> this kind of fall-back should be done, but not on a sys_bpf() layer but from
> a higher PoV in the lib instead. I guess it would make sense to probe the
> underlying kernel at startup and then based on its capabilities use one out
> of the two APIs when we get there, such that we don't need to uselessly retry
> APIs for each prog load.

tools/perf/ has:

static struct {
bool sample_id_all;
bool exclude_guest;
bool mmap2;
bool cloexec;
bool clockid;
bool clockid_wrong;
bool lbr_flags;
bool write_backward;
bool group_read;
} perf_missing_features;

When the user requests something that needs one of these features we try
using it; if that fails we mark the feature as missing so that other events
will not needlessly try using it, i.e. we don't do it at program start, we
leave that to when we actually need it, to avoid uselessly probing at
startup.
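
A minimal, self-contained sketch of that lazy-fallback pattern, applied
to the object name case, could look like the code below. Everything in
it is hypothetical: fake_old_kernel_load() merely simulates a kernel
that rejects a named load (the -EINVAL is a placeholder, not a claim
about what a real kernel returns), and missing_features.obj_name is an
invented flag, not a field that exists in perf or libbpf.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an older kernel: any non-empty name is
 * rejected, anything else "loads" and returns a fake fd. */
static int fake_old_kernel_load(const char *name)
{
	return (name && name[0]) ? -EINVAL : 3;
}

/* Same idea as perf_missing_features: remember what failed once. */
static struct {
	bool obj_name;
} missing_features;

static int nr_kernel_calls; /* counts simulated syscalls, for illustration */

/* Try with the name first; on failure mark the feature as missing and
 * retry without it. Later callers skip the doomed first attempt. */
int load_with_fallback(const char *name)
{
	if (!missing_features.obj_name) {
		int fd;

		nr_kernel_calls++;
		fd = fake_old_kernel_load(name);
		if (fd >= 0)
			return fd;
		missing_features.obj_name = true;
	}
	nr_kernel_calls++;
	return fake_old_kernel_load(NULL);
}

int kernel_calls(void)
{
	return nr_kernel_calls;
}
```

With this simulated kernel the first load costs two calls and every
later one costs a single call. A real implementation would have to look
at which error came back before concluding that the feature is missing,
and, as discussed above, would keep the explicit *_name() APIs failing
rather than silently dropping the name.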

> Arnaldo, will there be a rework of your fix that we could route to bpf tree?

I'm resuming work on it after I get my current batch tested and
submitted. I will reboot with an older kernel and follow your suggestions,
which seem to match Alexei's and Martin's; my patch was just an RFC to
show that we need a fallback for older kernels.

I needed to move on, so I updated my machine to a kernel where interlock
of tools/ with the kernel happens and it worked, so I left this to see
if someone else complained or if I was being too picky. :-)

- Arnaldo


Re: [PATCH net-next 2/2] tools: bpftool: create "uninstall", "doc-uninstall" make targets

2017-12-13 Thread Arnaldo Carvalho de Melo
Em Thu, Dec 07, 2017 at 03:00:18PM -0800, Jakub Kicinski escreveu:
> From: Quentin Monnet <quentin.mon...@netronome.com>
> 
> Create two targets to remove executable and documentation that would
> have been previously installed with `make install` and `make
> doc-install`.
> 
> Also create a "QUIET_UNINST" helper in tools/scripts/Makefile.include.
> 
> Do not attempt to remove directories /usr/local/sbin and
> /usr/share/bash-completions/completions, even if they are empty, as
> those specific directories probably already existed on the system before
> we installed the program, and we do not wish to break other makefiles
> that might assume their existence. Do remove /usr/local/share/man/man8
> if empty however, as this directory does not seem to exist by default.
> 
> Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>


Acked-by: Arnaldo Carvalho de Melo <a...@redhat.com>

> ---
> For addition to tools/scripts/Makefile.include:
> 
> CC: Arnaldo Carvalho de Melo <a...@redhat.com>
> CC: Masahiro Yamada <yamada.masah...@socionext.com>
> 
>  tools/bpf/bpftool/Documentation/Makefile |  8 +++-
>  tools/bpf/bpftool/Makefile   | 12 ++--
>  tools/scripts/Makefile.include   |  1 +
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/Makefile 
> b/tools/bpf/bpftool/Documentation/Makefile
> index 71c17fab4f2f..c462a928e03d 100644
> --- a/tools/bpf/bpftool/Documentation/Makefile
> +++ b/tools/bpf/bpftool/Documentation/Makefile
> @@ -3,6 +3,7 @@ include ../../../scripts/utilities.mak
>  
>  INSTALL ?= install
>  RM ?= rm -f
> +RMDIR ?= rmdir --ignore-fail-on-non-empty
>  
>  ifeq ($(V),1)
>Q =
> @@ -34,5 +35,10 @@ install: man
>   $(Q)$(INSTALL) -d -m 755 $(DESTDIR)$(man8dir)
>   $(Q)$(INSTALL) -m 644 $(DOC_MAN8) $(DESTDIR)$(man8dir)
>  
> -.PHONY: man man8 clean install
> +uninstall:
> + $(call QUIET_UNINST, Documentation-man)
> + $(Q)$(RM) $(addprefix $(DESTDIR)$(man8dir)/,$(_DOC_MAN8))
> + $(Q)$(RMDIR) $(DESTDIR)$(man8dir)
> +
> +.PHONY: man man8 clean install uninstall
>  .DEFAULT_GOAL := man
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 203ae2e14fbc..3f17ad317512 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -70,6 +70,11 @@ install: $(OUTPUT)bpftool
>   $(Q)$(INSTALL) -m 0755 -d $(DESTDIR)$(bash_compdir)
>   $(Q)$(INSTALL) -m 0644 bash-completion/bpftool $(DESTDIR)$(bash_compdir)
>  
> +uninstall:
> + $(call QUIET_UNINST, bpftool)
> + $(Q)$(RM) $(DESTDIR)$(prefix)/sbin/bpftool
> + $(Q)$(RM) $(DESTDIR)$(bash_compdir)/bpftool
> +
>  doc:
>   $(call descend,Documentation)
>  
> @@ -79,8 +84,11 @@ install: $(OUTPUT)bpftool
>  doc-install:
>   $(call descend,Documentation,install)
>  
> +doc-uninstall:
> + $(call descend,Documentation,uninstall)
> +
>  FORCE:
>  
> -.PHONY: all FORCE clean install
> -.PHONY: doc doc-clean doc-install
> +.PHONY: all FORCE clean install uninstall
> +.PHONY: doc doc-clean doc-install doc-uninstall
>  .DEFAULT_GOAL := all
> diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
> index 3fab179b1aba..fcb3ed0be5f8 100644
> --- a/tools/scripts/Makefile.include
> +++ b/tools/scripts/Makefile.include
> @@ -99,5 +99,6 @@ ifneq ($(silent),1)
>  
>   QUIET_CLEAN    = @printf '  CLEAN    %s\n' $1;
>   QUIET_INSTALL  = @printf '  INSTALL  %s\n' $1;
> + QUIET_UNINST   = @printf '  UNINST   %s\n' $1;
>endif
>  endif
> -- 
> 2.15.1


Re: [PATCH/RFC] Re: 'perf test BPF' failing, libbpf regression wrt "basic API for BPF obj name"

2017-12-01 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 30, 2017 at 01:51:15PM -0800, Alexei Starovoitov escreveu:
> On 11/30/17 11:00 AM, Arnaldo Carvalho de Melo wrote:
> > > Instead of sinking all future bpf_attr's backward compatibility
> > > requirements to sys_bpf,  I would push it up to its own BPF_* command
> > > helper which has a better sense of its bpf_attr, i.e. push it up
> > > to bpf_create_map_node() and bpf_load_program_name() in this case.
> > Humm, we could try that approach, but the one in this patch seemed good
> > enough.
> > 
> > And after all the first syscall() invocation, with the latest kernel
> > and latest tooling, will work, right?
> 
> I agree with Martin and I also don't think it will work to push
> logic of all bpf commands into single sys_bpf syscall wrapper.

Sure, that was just a POC, I'll work on something that takes into
account what you guys pointed out.

> This logic will become more and more complex over time.
> Like this case really belongs in bpf_create_map() which is a wrapper
> on top of single BPF_CREATE_MAP command.
 
> Note it's the first time we're facing this 'new libbpf.a running on
> top of old kernel' issue and should be very careful adding such
> fallback code to the generic bpf library, since all the selftests/bpf/
> are using this lib and relying on expected behavior.

Right, tools/perf/ uses it as well and relies on its continued
functioning.

> We don't want tests that want to test the latest kernel feature all of
> a sudden pass on old kernel that doesn't have it.

Sure, neither do I :-)
 
> To some degree perf and selftests/bpf needs are diverging here,
> so adding #ifdef to libbpf.a to match testcase expectations may be
> necessary.

But this is not just testcase expectations, the usecase is someone
wanting to use a newer tool, with perhaps some new features of interest
that don't depend on changes in the kernel, in an older kernel on a
system where updating it is not possible or desirable.

- Arnaldo


Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-21 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 14, 2017 at 02:58:24PM -0800, Yonghong Song escreveu:
> On 11/14/17 12:25 PM, Daniel Borkmann wrote:
> > Yeah, I know, that's what I mentioned earlier in this thread to resolve it,
> > but do we really want to add this hack everywhere? :( Potentially any function
> > having ARG_CONST_SIZE would need to handle size 0 and bail out again in their
> > helper implementation and it ends up that progs start relying on this runtime
> > check where we won't be able to get rid of it later on anymore.
 
> The compiler actually does the right thing for the below code:
>  int ret = bpf_probe_read_str(filename, sizeof(filename),
>   filename_ptr);
>  if (ret > 0)
>bpf_perf_event_output(ctx, &__bpf_stdout__,BPF_F_CURRENT_CPU,
>filename, ret & (sizeof(filename) - 1));
 
> Just from the above code without consulting bpf_probe_read_str internals, it
> is totally possible that ret = 128, then
> ret & (sizeof(filename) - 1) = 0.

> The issue is that the verifier did not set the "ret" initial range as (-inf,
> sizeof(filename) - 1). We could have this information associated with helper
> and feed back to verifier.
 
> If we have this range, later for ret & (sizeof(filename) - 1) with ret >= 1,
> the verifier should be able to conclude
>  ret & (sizeof(filename) - 1) >= 1.
 
> To workaround the immediate problem, I tested the following hack
> with bcc and it works fine.
 
> BPF_PERF_OUTPUT(events);
> int trace(struct pt_regs *ctx) {
>   char filename[128];
>   int ret = bpf_probe_read_str(filename, sizeof(filename), 0);
>   if (ret > 0) {
> if (ret == 1)
>   events.perf_submit(ctx, filename, ret);
> else if (ret < 128)
>   events.perf_submit(ctx, filename, ret);
>   }
>   return 1;
> }
 
> The idea is to make control flow more complex to prevent llvm
> from doing certain optimizations.
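
To make the "ret = 128" remark above concrete, here is a small userspace
sketch of the masking arithmetic only. It does not model the verifier;
FILENAME_LEN mirrors the 128-byte buffer in the examples and both
function names are invented for this illustration.

```c
#include <assert.h>

#define FILENAME_LEN 128 /* mirrors "char filename[128]" */

/* The size expression from the suggested workaround. */
int masked_len(int ret)
{
	return ret & (FILENAME_LEN - 1);
}

/* Smallest masked size over all ret values in [lo, hi]. */
int min_masked_len(int lo, int hi)
{
	int min = masked_len(lo);

	for (int ret = lo + 1; ret <= hi; ret++)
		if (masked_len(ret) < min)
			min = masked_len(ret);
	return min;
}
```

If all that is known is ret > 0, a value of 128 masks down to 0, which is
the zero size the helper check complains about. Once ret is known to be
in [1, 127], the smallest masked value is 1, which is the conclusion
Yonghong says the verifier could reach if it knew the helper's return
range.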

So, the hack makes it work for me, using clang 6.0:

set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x40e00
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: WORKING_DIR=/lib/modules/4.14.0+/build
set env: CLANG_SOURCE=/home/acme/bpf/open.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o -

[root@jouet bpf]# perf probe -V do_sys_open
Available variables at do_sys_open
@
char*   filename
int dfd
int flags
struct open_flags   op
umode_t mode
[root@jouet bpf]# cat open.c 
#include "bpf.h"

SEC("prog=do_sys_open filename")
int prog(void *ctx, int err, char *filename_ptr)
{
	char filename[128];
	int len = bpf_probe_read_str(filename, sizeof(filename), filename_ptr);
	if (len > 0) {
		if (len == 1)
			perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU, filename, len);
		else if (len < 128)
			perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU, filename, len);
	}
	return 1;
}
[root@jouet bpf]#
[root@jouet bpf]# perf trace -e *open,open.c touch /tmp/Thanks.Yonghong.Song\!
LLVM: dumping open.o
 0.000 ( 0.009 ms): touch/9034 open(filename: 0x5b678e37, flags: CLOEXEC ) ...
 0.009 ( ): __bpf_stdout__:/etc/ld.so.cache)
 0.011 ( ): perf_bpf_probe:prog:(8f260da0) filename=0x7f805b678e37)
 0.000 ( 0.016 ms): touch/9034  ... [continued]: open()) = 3
 0.034 ( 0.002 ms): touch/9034 open(filename: 0x5b87c640, flags: CLOEXEC ) ...
 0.036 ( ): __bpf_stdout__:/lib64/libc.so.6)
 0.037 ( ): perf_bpf_probe:prog:(8f260da0) filename=0x7f805b87c640)
 0.034 ( 0.009 ms): touch/9034  ... [continued]: open()) = 3
 0.251 ( 0.002 ms): touch/9034 open(filename: 0x5b422c70, flags: CLOEXEC ) ...
 0.253 ( ): __bpf_stdout__:/usr/lib/locale/locale-archive..)
 0.254 ( ): perf_bpf_probe:prog:(8f260da0) filename=0x7f805b422c70)
 0.251 ( 0.009 ms): touch/9034  ... [continued]: open()) = 3
 0.296 ( 0.002 ms): touch/9034 open(filename: 0x1d3a00f1, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) ...
 0.298 ( ): 

Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-20 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 14, 2017 at 09:25:17PM +0100, Daniel Borkmann escreveu:
> On 11/14/2017 07:15 PM, Yonghong Song wrote:
> > On 11/14/17 6:19 AM, Daniel Borkmann wrote:
> >> On 11/14/2017 02:42 PM, Arnaldo Carvalho de Melo wrote:
> >>> Em Tue, Nov 14, 2017 at 02:09:34PM +0100, Daniel Borkmann escreveu:
> >>>> On 11/14/2017 01:58 PM, Arnaldo Carvalho de Melo wrote:
> >>>>> Em Tue, Nov 14, 2017 at 01:09:39AM +0100, Daniel Borkmann escreveu:
> >>>>>> On 11/13/2017 04:08 PM, Arnaldo Carvalho de Melo wrote:
> >>>>>>> libbpf: -- BEGIN DUMP LOG ---
> >>>>>>> libbpf:
> >>>>>>> 0: (79) r3 = *(u64 *)(r1 +104)
> >>>>>>> 1: (b7) r2 = 0
> >>>>>>> 2: (bf) r6 = r1
> >>>>>>> 3: (bf) r1 = r10
> >>>>>>> 4: (07) r1 += -128
> >>>>>>> 5: (b7) r2 = 128
> >>>>>>> 6: (85) call bpf_probe_read_str#45
> >>>>>>> 7: (bf) r1 = r0
> >>>>>>> 8: (07) r1 += -1
> >>>>>>> 9: (67) r1 <<= 32
> >>>>>>> 10: (77) r1 >>= 32
> >>>>>>> 11: (25) if r1 > 0x7f goto pc+11
> >>>>>>
> >>>>>> Right, so the compiler is optimizing the two tests into a single one above,
> >>>>>> which means lower bound cannot properly be derived again by the verifier due
> >>>>>> to this and thus you'll get the error. Similar issue was seen recently [1].
> >>>>>>
> >>>>>> Does the below hack work for you?
> >>>>>>
> >>>>>> int prog([...])
> >>>>>> {
> >>>>>>  char filename[128];
> >>>>>>  int ret = bpf_probe_read_str(filename, sizeof(filename), 
> >>>>>> filename_ptr);
> >>>>>>  if (ret > 0)
> >>>>>>  bpf_perf_event_output(ctx, &__bpf_stdout__, 
> >>>>>> BPF_F_CURRENT_CPU, filename,
> >>>>>>    ret & (sizeof(filename) - 1));
> >>>>>>  return 1;
> >>>>>> }
> >>>>>>
> >>>>>> r0 should keep on tracking bounds here at least:
> >>>>>>
> >>>>>> prog:
> >>>>>>     0:    bf 16 00 00 00 00 00 00 r6 = r1
> >>>>>>     1:    bf a1 00 00 00 00 00 00 r1 = r10
> >>>>>>     2:    07 01 00 00 80 ff ff ff r1 += -128
> >>>>>>     3:    b7 02 00 00 80 00 00 00 r2 = 128
> >>>>>>     4:    85 00 00 00 2d 00 00 00 call 45
> >>>>>>     5:    67 00 00 00 20 00 00 00 r0 <<= 32
> >>>>>>     6:    c7 00 00 00 20 00 00 00 r0 s>>= 32
> >>>>>>     7:    b7 01 00 00 01 00 00 00 r1 = 1
> >>>>>>     8:    6d 01 0a 00 00 00 00 00 if r1 s> r0 goto 10
> >>>>>>     9:    57 00 00 00 7f 00 00 00 r0 &= 127
> >>>>>>    10:    bf a4 00 00 00 00 00 00 r4 = r10
> >>>>>>    11:    07 04 00 00 80 ff ff ff r4 += -128
> >>>>>>    12:    bf 61 00 00 00 00 00 00 r1 = r6
> >>>>>>    13:    18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 
> >>>>>> 0ll
> >>>>>>    15:    18 03 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 r3 = 
> >>>>>> 4294967295ll
> >>>>>>    17:    bf 05 00 00 00 00 00 00 r5 = r0
> >>>>>>    18:    85 00 00 00 19 00 00 00 call 25
> >>>>>>
> >>>>>>    [1] http://patchwork.ozlabs.org/project/netdev/list/?series=13211
> >>>>>
> >>>>> Not yet:
> >>>>>
> >>>>> 6: (85) call bpf_probe_read_str#45
> >>>>> 7: (bf) r1 = r0
> >>>>> 8: (67) r1 <<= 32
> >>>>> 9: (77) r1 >>= 32
> >>>>> 10: (15) 

Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-14 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 14, 2017 at 03:19:51PM +0100, Daniel Borkmann escreveu:
> On 11/14/2017 02:42 PM, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 14, 2017 at 02:09:34PM +0100, Daniel Borkmann escreveu:
> >> On 11/14/2017 01:58 PM, Arnaldo Carvalho de Melo wrote:
> >> Currently having a version compiled from the git tree:

> >> # llc --version
> >> LLVM (http://llvm.org/):
> >>   LLVM version 6.0.0git-2d810c2
> >>   Optimized build.
> >>   Default target: x86_64-unknown-linux-gnu
> >>   Host CPU: skylake

> > [root@jouet bpf]# llc --version
> > LLVM (http://llvm.org/):
> >   LLVM version 4.0.0svn

> > Old stuff! ;-) Will change, but improving these messages should be on
> > the radar, I think :-)

> Yep, agree, I think we need a generic, better solution for this type of
> issue instead of converting individual helpers to handle 0 min bound and
> then only bailing out in such case; need to brainstorm a bit on that.
 
> I think for the above in your case ...
 
>  [...]
>   6: (85) call bpf_probe_read_str#45
>   7: (bf) r1 = r0
>   8: (67) r1 <<= 32
>   9: (77) r1 >>= 32
>  10: (15) if r1 == 0x0 goto pc+10
>   R0=inv(id=0) R1=inv(id=0,umax_value=4294967295,var_off=(0x0; 0x)) 
> R6=ctx(id=0,off=0,imm=0) R10=fp0
>  11: (57) r0 &= 127
>  [...]
 
> ... the shifts on r1 might be due to using 32 bit type, so if you find
> a way to avoid these and have the test on r0 directly, we might get there.
> Perhaps keep using a 64 bit type to avoid them. It would be useful to
> propagate the deduced bound information back to r0 when we know that
> neither r0 nor r1 has changed in the meantime.

I changed len/ret to u64, didn't help, updating clang and llvm to see if
that helps...

Will end up working directly with eBPF bytecode, which is what I really
need in 'perf trace', but let's get this sorted out first.

- Arnaldo


Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-14 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 14, 2017 at 02:09:34PM +0100, Daniel Borkmann escreveu:
> On 11/14/2017 01:58 PM, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 14, 2017 at 01:09:39AM +0100, Daniel Borkmann escreveu:
> >> On 11/13/2017 04:08 PM, Arnaldo Carvalho de Melo wrote:
> >>> libbpf: -- BEGIN DUMP LOG ---
> >>> libbpf: 
> >>> 0: (79) r3 = *(u64 *)(r1 +104)
> >>> 1: (b7) r2 = 0
> >>> 2: (bf) r6 = r1
> >>> 3: (bf) r1 = r10
> >>> 4: (07) r1 += -128
> >>> 5: (b7) r2 = 128
> >>> 6: (85) call bpf_probe_read_str#45
> >>> 7: (bf) r1 = r0
> >>> 8: (07) r1 += -1
> >>> 9: (67) r1 <<= 32
> >>> 10: (77) r1 >>= 32
> >>> 11: (25) if r1 > 0x7f goto pc+11
> >>
> >> Right, so the compiler is optimizing the two tests into a single one above,
> >> which means lower bound cannot properly be derived again by the verifier due
> >> to this and thus you'll get the error. Similar issue was seen recently [1].
> >>
> >> Does the below hack work for you?
> >>
> >> int prog([...])
> >> {
> >> char filename[128];
> >> int ret = bpf_probe_read_str(filename, sizeof(filename), 
> >> filename_ptr);
> >> if (ret > 0)
> >> bpf_perf_event_output(ctx, &__bpf_stdout__, 
> >> BPF_F_CURRENT_CPU, filename,
> >>   ret & (sizeof(filename) - 1));
> >> return 1;
> >> }
> >>
> >> r0 should keep on tracking bounds here at least:
> >>
> >> prog:
> >>0:  bf 16 00 00 00 00 00 00 r6 = r1
> >>1:  bf a1 00 00 00 00 00 00 r1 = r10
> >>2:  07 01 00 00 80 ff ff ff r1 += -128
> >>3:  b7 02 00 00 80 00 00 00 r2 = 128
> >>4:  85 00 00 00 2d 00 00 00 call 45
> >>5:  67 00 00 00 20 00 00 00 r0 <<= 32
> >>6:  c7 00 00 00 20 00 00 00 r0 s>>= 32
> >>7:  b7 01 00 00 01 00 00 00 r1 = 1
> >>8:  6d 01 0a 00 00 00 00 00 if r1 s> r0 goto 10
> >>9:  57 00 00 00 7f 00 00 00 r0 &= 127
> >>   10:  bf a4 00 00 00 00 00 00 r4 = r10
> >>   11:  07 04 00 00 80 ff ff ff r4 += -128
> >>   12:  bf 61 00 00 00 00 00 00 r1 = r6
> >>   13:  18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0ll
> >>   15:  18 03 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 r3 = 4294967295ll
> >>   17:  bf 05 00 00 00 00 00 00 r5 = r0
> >>   18:  85 00 00 00 19 00 00 00 call 25
> >>
> >>   [1] http://patchwork.ozlabs.org/project/netdev/list/?series=13211
> > 
> > Not yet:
> > 
> > 6: (85) call bpf_probe_read_str#45
> > 7: (bf) r1 = r0
> > 8: (67) r1 <<= 32
> > 9: (77) r1 >>= 32
> > 10: (15) if r1 == 0x0 goto pc+10
> >  R0=inv(id=0) R1=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=ctx(id=0,off=0,imm=0) R10=fp0
> > 11: (57) r0 &= 127
> > 12: (bf) r4 = r10
> > 13: (07) r4 += -128
> > 14: (bf) r1 = r6
> > 15: (18) r2 = 0x92bfc2aba840
> > 17: (18) r3 = 0xffffffff
> > 19: (bf) r5 = r0
> > 20: (85) call bpf_perf_event_output#25
> > invalid stack type R4 off=-128 access_size=0
> > 
> > I'll try updating clang/llvm...
> > 
> > Full details:
> > 
> > [root@jouet bpf]# cat open.c 
> > #include "bpf.h"
> > 
> > SEC("prog=do_sys_open filename")
> > int prog(void *ctx, int err, const char __user *filename_ptr)
> > {
> >     char filename[128];
> >     const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
> >                                             filename_ptr);
> 
> Btw, I was using 'int' here above instead of 'unsigned' as strncpy_from_unsafe()
> could potentially return errors like -EFAULT.

I changed to int, didn't help
 
> Currently having a version compiled from the git tree:
> 
> # llc --version
> LLVM (http://llvm.org/):
>   LLVM version 6.0.0git-2d810c2
>   Optimized build.
>   Default target: x86_64-unknown-linux-gnu
>   Host CPU: skylake

[root@jouet bpf]# llc --version
LLVM (http://llvm.org/):
  LLVM version 4.0.0svn

Old stuff! ;-) Will change, but improving these messages should be on
the radar, I think :-)

- Arnaldo
 
>   Registered Targets:
> bpf- BPF (host endian)
> bpfeb  - BPF (big endian)
> bpfel  - BPF (little endian)
> x86- 32-bit X86: Pentium-Pro and above
> x86-64 - 64-bit X86: EM64T and AMD64
> 
> >     if (len > 0)
> >         perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
> >                           filename, len & (sizeof(filename) - 1));
> >     return 1;
> > }
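
A note on the "two tests into a single one" remark quoted in this thread:
what follows is a minimal sketch in plain userspace C, not BPF, assuming
the range check suggested earlier (len > 0 && len <= 128) on a 32-bit
unsigned len. The names two_tests() and one_test() are made up for
illustration. The second form mirrors what the verifier log shows as
"r1 += -1 ... if r1 > 0x7f": subtract one, then do a single unsigned
comparison, so the separate lower-bound test no longer appears in the
generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* The check as written in the source: two separate comparisons. */
static bool two_tests(uint32_t len)
{
	return len > 0 && len <= 128;
}

/*
 * The single test a compiler may emit instead (hypothetical helper name):
 * subtract one and compare once, unsigned.  len == 0 wraps around to
 * 0xffffffff and therefore fails the comparison, just like "len > 0".
 */
static bool one_test(uint32_t len)
{
	return (uint32_t)(len - 1) <= 0x7f;
}
```

Both functions accept exactly the same set of values, which is why the
compiler is free to pick the merged form; per the explanation in the
thread, the problem is only that the verifier could not derive the lower
bound back from it.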


Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-14 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 14, 2017 at 01:09:39AM +0100, Daniel Borkmann escreveu:
> On 11/13/2017 04:08 PM, Arnaldo Carvalho de Melo wrote:
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf: 
> > 0: (79) r3 = *(u64 *)(r1 +104)
> > 1: (b7) r2 = 0
> > 2: (bf) r6 = r1
> > 3: (bf) r1 = r10
> > 4: (07) r1 += -128
> > 5: (b7) r2 = 128
> > 6: (85) call bpf_probe_read_str#45
> > 7: (bf) r1 = r0
> > 8: (07) r1 += -1
> > 9: (67) r1 <<= 32
> > 10: (77) r1 >>= 32
> > 11: (25) if r1 > 0x7f goto pc+11
> 
> Right, so the compiler is optimizing the two tests into a single one above,
> which means lower bound cannot properly be derived again by the verifier due
> to this and thus you'll get the error. Similar issue was seen recently [1].
> 
> Does the below hack work for you?
> 
> int prog([...])
> {
>     char filename[128];
>     int ret = bpf_probe_read_str(filename, sizeof(filename), filename_ptr);
>     if (ret > 0)
>         bpf_perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
>                               filename, ret & (sizeof(filename) - 1));
>     return 1;
> }
> 
> r0 should keep on tracking bounds here at least:
> 
> prog:
>0: bf 16 00 00 00 00 00 00 r6 = r1
>1: bf a1 00 00 00 00 00 00 r1 = r10
>2: 07 01 00 00 80 ff ff ff r1 += -128
>3: b7 02 00 00 80 00 00 00 r2 = 128
>4: 85 00 00 00 2d 00 00 00 call 45
>5: 67 00 00 00 20 00 00 00 r0 <<= 32
>6: c7 00 00 00 20 00 00 00 r0 s>>= 32
>7: b7 01 00 00 01 00 00 00 r1 = 1
>8: 6d 01 0a 00 00 00 00 00 if r1 s> r0 goto 10
>9: 57 00 00 00 7f 00 00 00 r0 &= 127
>   10: bf a4 00 00 00 00 00 00 r4 = r10
>   11: 07 04 00 00 80 ff ff ff r4 += -128
>   12: bf 61 00 00 00 00 00 00 r1 = r6
>   13: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0ll
>   15: 18 03 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 r3 = 4294967295ll
>   17: bf 05 00 00 00 00 00 00 r5 = r0
>   18: 85 00 00 00 19 00 00 00 call 25
> 
>   [1] http://patchwork.ozlabs.org/project/netdev/list/?series=13211

Not yet:

6: (85) call bpf_probe_read_str#45
7: (bf) r1 = r0
8: (67) r1 <<= 32
9: (77) r1 >>= 32
10: (15) if r1 == 0x0 goto pc+10
 R0=inv(id=0) R1=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=ctx(id=0,off=0,imm=0) R10=fp0
11: (57) r0 &= 127
12: (bf) r4 = r10
13: (07) r4 += -128
14: (bf) r1 = r6
15: (18) r2 = 0x92bfc2aba840
17: (18) r3 = 0xffffffff
19: (bf) r5 = r0
20: (85) call bpf_perf_event_output#25
invalid stack type R4 off=-128 access_size=0

I'll try updating clang/llvm...
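
For reference, this is what the masking in the suggested hack is meant to
guarantee at the C level, as a minimal userspace sketch (plain C, not BPF;
bounded_len() and FILENAME_SIZE are names made up here, and the buffer size
is assumed to be a power of two). It says nothing about what a given
clang/llvm version will emit, which is exactly what the log above trips
over.

```c
#include <stddef.h>

#define FILENAME_SIZE 128	/* assumed power of two, as in the thread */

/*
 * Hypothetical stand-in for the length argument handed to
 * bpf_perf_event_output(): errors and zero become 0, anything positive is
 * masked so the result can never exceed FILENAME_SIZE - 1.
 */
static size_t bounded_len(int ret)
{
	if (ret <= 0)		/* error such as -EFAULT, or nothing copied */
		return 0;
	return (size_t)ret & (FILENAME_SIZE - 1);	/* always in [0, 127] */
}
```

One side effect worth keeping in mind: if the helper ever returns the full
buffer size (128 here), the mask turns that into 0 rather than 128.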

Full details:

[root@jouet bpf]# cat open.c 
#include "bpf.h"

SEC("prog=do_sys_open filename")
int prog(void *ctx, int err, const char __user *filename_ptr)
{
    char filename[128];
    const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
                                            filename_ptr);
    if (len > 0)
        perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
                          filename, len & (sizeof(filename) - 1));
    return 1;
}
[root@jouet bpf]# perf trace -v -e *open,open.c  usleep 2
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.14.0+/build
set env: KBUILD_DIR=/lib/modules/4.14.0+/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x40e00
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: WORKING_DIR=/lib/modules/4.14.0+/build
set env: CLANG_SOURCE=/home/acme/bpf/open.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$KERNEL_INC_OP

Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-13 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 13, 2017 at 03:56:14PM +0100, Daniel Borkmann escreveu:
> On 11/13/2017 03:30 PM, Arnaldo Carvalho de Melo wrote:
> > Hi,
> > 
> > In a5e8c07059d0 ("bpf: add bpf_probe_read_str helper") you
> > state:
> > 
> >"This is suboptimal because the size of the string needs to be estimated
> > at compile time, causing more memory to be copied than often necessary,
> > and can become more problematic if further processing on buf is done,
> > for example by pushing it to userspace via bpf_perf_event_output(),
> > since the real length of the string is unknown and the entire buffer
> > must be copied (and defining an unrolled strnlen() inside the bpf
> > program is a very inefficient and unfeasible approach)."
> > 
> > So I went on to try this with 'perf trace' but it isn't working if I use
> > the return from bpf_probe_read_str(), I must be missing something
> > here... 
> > 
> > I.e. this works:
> > 
> > [root@jouet bpf]# cat open.c
> > #include "bpf.h"
> > 
> > SEC("prog=do_sys_open filename")
> > int prog(void *ctx, int err, const char __user *filename_ptr)
> > {
> >     char filename[128];
> >     const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
> >                                             filename_ptr);
> >     perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
> >                       filename, 32);
> 
> By the way, you can just use BPF_F_CURRENT_CPU flag instead of the helper
> call get_smp_processor_id() to get current CPU.

Thanks, switched to it.

> > But then if I use the return value to push just the string length, it
> > doesn't work:
> > 
> > [root@jouet bpf]# cat open.c
> > #include "bpf.h"
> > 
> > SEC("prog=do_sys_open filename")
> > int prog(void *ctx, int err, const char __user *filename_ptr)
> > {
> >     char filename[128];
> >     const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
> >                                             filename_ptr);
> >     perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
> >                       filename, len);
> 
> The below issue 'invalid stack type R4 off=-128 access_size=0' is basically that
> unsigned len is unknown at verification time, thus unbounded. Can you try the
> following to see if that passes?
> 
>     if (len > 0 && len <= sizeof(filename))
>         perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
>                           filename, len);

I had it like:

if (len > 0 && len < 32)

And it didn't help, so now I did exactly as you suggested:

[root@jouet bpf]# cat open.c
#include "bpf.h"

SEC("prog=do_sys_open filename")
int prog(void *ctx, int err, const char __user *filename_ptr)
{
    char filename[128];
    const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
                                            filename_ptr);
    if (len > 0 && len <= sizeof(filename))
        perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU,
                          filename, len);
    return 1;
}
[root@jouet bpf]# trace -e open,open.c touch /etc/passwd
bpf: builtin compilation failed: -95, try external compiler
event syntax error: 'open.c'
 \___ Kernel verifier blocks program loading

[root@jouet bpf]# 

The -v output looks the same:

[root@jouet bpf]# trace -v -e open,open.c touch /etc/passwd
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.14.0-rc6+/build
set env: KBUILD_DIR=/lib/modules/4.14.0-rc6+/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x40e00
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/7/include 
-I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  
-I/home/acme/git/linux/include -I./include 
-I/home/acme/git/linux/arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi 
-I./include/generated/uapi -include 
/home/acme/git/linux/include/linux/kconfig.h 
set env: WORKING_DIR=/lib/modules/4.14.0-rc6+/build
set env: CLANG_SOURCE=/home/acme/bpf/open.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ 
-D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS 
$KERNEL_INC_OPTIONS -W

len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL

2017-11-13 Thread Arnaldo Carvalho de Melo
Hi,

In a5e8c07059d0 ("bpf: add bpf_probe_read_str helper") you
state:

   "This is suboptimal because the size of the string needs to be estimated
at compile time, causing more memory to be copied than often necessary,
and can become more problematic if further processing on buf is done,
for example by pushing it to userspace via bpf_perf_event_output(),
since the real length of the string is unknown and the entire buffer
must be copied (and defining an unrolled strnlen() inside the bpf
program is a very inefficient and unfeasible approach)."

So I went on to try this with 'perf trace' but it isn't working if I use
the return from bpf_probe_read_str(), I must be missing something
here... 

I.e. this works:

[root@jouet bpf]# cat open.c
#include "bpf.h"

SEC("prog=do_sys_open filename")
int prog(void *ctx, int err, const char __user *filename_ptr)
{
    char filename[128];
    const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
                                            filename_ptr);
    perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
                      filename, 32);
    return 1;
}
[root@jouet bpf]# perf trace -e open,open.c touch /etc/passwd
bpf: builtin compilation failed: -95, try external compiler
 0.000 ( 0.013 ms): touch/14403 open(filename: 0x2ff7ce37, flags: CLOEXEC) ...
 0.013 ( ): __bpf_stdout__:/etc/ld.so.cache..B.)
 0.015 ( ): perf_bpf_probe:prog:(b4260ae0) filename=0x7f7a2ff7ce37)
 0.000 ( 0.021 ms): touch/14403  ... [continued]: open()) = 3
 0.042 ( 0.002 ms): touch/14403 open(filename: 0x30180640, flags: CLOEXEC) ...
 0.044 ( ): __bpf_stdout__:/lib64/libc.so.6.. ...G.)
 0.045 ( ): perf_bpf_probe:prog:(b4260ae0) filename=0x7f7a30180640)
 0.042 ( 0.010 ms): touch/14403  ... [continued]: open()) = 3
 0.301 ( 0.003 ms): touch/14403 open(filename: 0x2fd26c70, flags: CLOEXEC) ...
 0.305 ( ): __bpf_stdout__:/usr/lib/locale/locale-archive..)
 0.306 ( ): perf_bpf_probe:prog:(b4260ae0) filename=0x7f7a2fd26c70)
 0.301 ( 0.011 ms): touch/14403  ... [continued]: open()) = 3
 0.360 ( 0.002 ms): touch/14403 open(filename: 0x681f20f3, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) ...
 0.362 ( ): __bpf_stdout__:/etc/passwd... ...D.)
 0.363 ( ): perf_bpf_probe:prog:(b4260ae0) filename=0x7ffe681f20f3)
 0.360 ( 0.010 ms): touch/14403  ... [continued]: open()) = 3
[root@jouet bpf]#

That bpf.h sets up the maps, etc.; it's attached in case it is needed
to help figure this out.

But then if I use the return value to push just the string length, it
doesn't work:

[root@jouet bpf]# cat open.c
#include "bpf.h"

SEC("prog=do_sys_open filename")
int prog(void *ctx, int err, const char __user *filename_ptr)
{
    char filename[128];
    const unsigned len = bpf_probe_read_str(filename, sizeof(filename),
                                            filename_ptr);
    perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
                      filename, len);
    return 1;
}
[root@jouet bpf]# perf trace -e open,open.c touch /etc/passwd
bpf: builtin compilation failed: -95, try external compiler
event syntax error: 'open.c'
 \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

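 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events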
[root@jouet bpf]#

When running this with -v we get the tools/lib/libbpf.c debug that may
help here:

Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: p:perf_bpf_probe/prog _text+2493152 filename=%si:x64
In map_prologue, ntevs=1
mapping[0]=0
libbpf: create map __bpf_stdout__: fd=3
prologue: pass validation
prologue: fast path
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (79) r3 = *(u64 *)(r1 +104)
1: (b7) r2 = 0
2: (bf) r6 = r1
3: (bf) r7 = r10
4: (07) r7 += -128
5: (bf) r1 = r7
6: (b7) r2 = 128
7: (85) call bpf_probe_read_str#45
8: (bf) r8 = r0
9: (67) r8 <<= 32
10: (77) r8 >>= 32
11: (85) call bpf_get_smp_processor_id#8
12: (bf) r1 = r6
13: (18) r2 = 0xa0b5958e16c0
15: (bf) r3 = r0
16: (bf) r4 = r7
17: (bf) r5 = r8
18: (85) call bpf_perf_event_output#25
invalid stack type R4 off=-128 access_size=0

libbpf: -- END LOG --
libbpf: Loading the 0th instance of program 'prog=do_sys_open filename' failed
libbpf: failed to load program 'prog=do_sys_open filename'
libbpf: failed to load object 'open.c'
bpf: load objects failed
event syntax error: 'open.c'
 \___ Kernel verifier blocks program loading

I tried adding checks for len to try to somehow make sure it's all bounds
checked, but 

bpf.h drift due to bpf_sk_redirect_map()

2017-10-26 Thread Arnaldo Carvalho de Melo
Hi John,

Recently the tools/perf/ build system noticed drift in
tools/include/uapi/linux/bpf.h from its master copy,
include/uapi/linux/bpf.h, caused by changes of yours. Can you
please check this?

[acme@jouet linux]$ diff -u tools/include/uapi/linux/bpf.h include/uapi/linux/bpf.h
--- tools/include/uapi/linux/bpf.h	2017-10-26 08:10:15.980323396 -0300
+++ include/uapi/linux/bpf.h	2017-10-19 14:26:13.859622885 -0300
@@ -569,10 +569,9 @@
  * @flags: reserved for future use
  * Return: 0 on success or negative error code
  *
- * int bpf_sk_redirect_map(skb, map, key, flags)
+ * int bpf_sk_redirect_map(map, key, flags)
  * Redirect skb to a sock in map using key as a lookup key for the
  * sock in map.
- * @skb: pointer to skb
  * @map: pointer to sockmap
  * @key: key to lookup sock in map
  * @flags: reserved for future use
[acme@jouet linux]$

- Arnaldo


Re: [PATCH net-next v2 0/3] tools: add bpftool

2017-10-04 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 03, 2017 at 05:48:22PM -0700, Jakub Kicinski escreveu:
> On Tue, 3 Oct 2017 17:19:42 -0300, Arnaldo Carvalho de Melo wrote:
> > Why not call it just 'bpf'?
 
> bpftool was suggested as a better name, I don't really mind either way.

I just thought that 'bpf' isn't used as a command, shorter, less typing,
but yeah, if people think having 'tool' in the tool name helps somewhat,
so be it.

- Arnaldo


Re: [PATCH net-next v2 0/3] tools: add bpftool

2017-10-03 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 02, 2017 at 04:11:27PM -0700, Jakub Kicinski escreveu:
> Hi!
> 
> This set adds bpftool to the tools/ directory.  The first 
> patch renames tools/net to tools/bpf, the second one adds 
> the new code, while the third adds simple documentation.
> 
> v2:
>  - report names, map ids, load time, uid;
>  - add docs/man pages;
>  - general cleanups & fixes.
> 
> Thanks to David Beckett for help with docs and testing.

Why not call it just 'bpf'?

- Arnaldo
 
> Jakub Kicinski (3):
>   tools: rename tools/net directory to tools/bpf
>   tools: bpf: add bpftool
>   tools: bpftool: add documentation
> 
>  MAINTAINERS  |   3 +-
>  tools/Makefile   |  14 +-
>  tools/{net => bpf}/Makefile  |  18 +-
>  tools/{net => bpf}/bpf_asm.c |   0
>  tools/{net => bpf}/bpf_dbg.c |   0
>  tools/{net => bpf}/bpf_exp.l |   0
>  tools/{net => bpf}/bpf_exp.y |   0
>  tools/{net => bpf}/bpf_jit_disasm.c  |   0
>  tools/bpf/bpftool/Documentation/Makefile |  34 ++
>  tools/bpf/bpftool/Documentation/bpftool-map.txt  | 110 
>  tools/bpf/bpftool/Documentation/bpftool-prog.txt |  81 +++
>  tools/bpf/bpftool/Documentation/bpftool.txt  |  34 ++
>  tools/bpf/bpftool/Makefile   |  86 +++
>  tools/bpf/bpftool/common.c   | 215 +++
>  tools/bpf/bpftool/jit_disasm.c   |  87 +++
>  tools/bpf/bpftool/main.c | 212 +++
>  tools/bpf/bpftool/main.h |  99 +++
>  tools/bpf/bpftool/map.c  | 744 +++
>  tools/bpf/bpftool/prog.c | 427 +
>  19 files changed, 2152 insertions(+), 12 deletions(-)
>  rename tools/{net => bpf}/Makefile (74%)
>  rename tools/{net => bpf}/bpf_asm.c (100%)
>  rename tools/{net => bpf}/bpf_dbg.c (100%)
>  rename tools/{net => bpf}/bpf_exp.l (100%)
>  rename tools/{net => bpf}/bpf_exp.y (100%)
>  rename tools/{net => bpf}/bpf_jit_disasm.c (100%)
>  create mode 100644 tools/bpf/bpftool/Documentation/Makefile
>  create mode 100644 tools/bpf/bpftool/Documentation/bpftool-map.txt
>  create mode 100644 tools/bpf/bpftool/Documentation/bpftool-prog.txt
>  create mode 100644 tools/bpf/bpftool/Documentation/bpftool.txt
>  create mode 100644 tools/bpf/bpftool/Makefile
>  create mode 100644 tools/bpf/bpftool/common.c
>  create mode 100644 tools/bpf/bpftool/jit_disasm.c
>  create mode 100644 tools/bpf/bpftool/main.c
>  create mode 100644 tools/bpf/bpftool/main.h
>  create mode 100644 tools/bpf/bpftool/map.c
>  create mode 100644 tools/bpf/bpftool/prog.c
> 
> -- 
> 2.14.1


Re: [PATCH net-next] bridge: add tracepoint in br_fdb_update

2017-08-31 Thread Arnaldo Carvalho de Melo
Em Thu, Aug 31, 2017 at 06:20:12PM +0200, Jesper Dangaard Brouer escreveu:
> On Thu, 31 Aug 2017 09:30:05 -0600 David Ahern  wrote:
> > > On Thu, Aug 31, 2017 at 5:38 AM, Jesper Dangaard Brouer 
> > >  wrote:  
> > > These bridge tracepoints in context are primarily for debugging fdb
> > > updates only, not for every packet and hence not in the performance
> > > path.
> > > In large scale deployments with thousands of bridge ports and fdb
> > > entries, dev->name will definitely make it easier to troubleshoot.
> > > So, I'd like to leave these with dev->name unless there are strong
> > > objections.

> > +1 for user friendliness for debugging tracepoints. The device name is
> > also more user friendly when adding filters to the data collection.

> > Being able to add bpf everywhere certainly changes the game a bit, but
> > we should not relinquish ease of use and understanding for the potential
> > that someone might want to put a bpf program on the tracepoint and want
> > to maintain high performance.
 
> (Cc. Acme and Peterz)
> I wonder if we can create a special perf-tracepoint type for ifindex'es
> and the tool reading (e.g. perf-script) can perform the name lookup in
> userspace (calling if_indextoname(3)) ?
 
> I don't know the perf tools well enough to know if this is possible?

Yeah, there are libtraceevent plugins, and that gets used by trace-cmd
and perf script, perf trace.

[root@jouet ~]# ls -la ~/.traceevent/plugins/
total 192
drwxr-xr-x. 2 acme acme  4096 Aug 31 15:29 .
drwxr-xr-x. 3 acme acme  4096 Jan 27  2017 ..
-rwxr-xr-x. 1 acme acme 13744 Aug 31 15:29 plugin_cfg80211.so
-rwxr-xr-x. 1 acme acme 20192 Aug 31 15:29 plugin_function.so
-rwxr-xr-x. 1 acme acme 13680 Aug 31 15:29 plugin_hrtimer.so
-rwxr-xr-x. 1 acme acme 13760 Aug 31 15:29 plugin_jbd2.so
-rwxr-xr-x. 1 acme acme 13704 Aug 31 15:29 plugin_kmem.so
-rwxr-xr-x. 1 acme acme 28568 Aug 31 15:29 plugin_kvm.so
-rwxr-xr-x. 1 acme acme 14184 Aug 31 15:29 plugin_mac80211.so
-rwxr-xr-x. 1 acme acme 14424 Aug 31 15:29 plugin_sched_switch.so
-rwxr-xr-x. 1 acme acme 20136 Aug 31 15:29 plugin_scsi.so
-rwxr-xr-x. 1 acme acme 14504 Aug 31 15:29 plugin_xen.so
[root@jouet ~]#

But... that index is something that is mutable, i.e. you'd have to
somehow record all the assignments of an index to an interface and then,
when processing the events, get from that state the mapping you want.

So you don't store the device name by doing lookups at each of those
high volume tracepoints, store just the index, but then, when
establishing the mapping, collect that as well and we come up with some
infrastructure to get that mapping in a place where a plugin can do the
lookups at post processing time.

For instance, the hrtimer plugin will get an address from the kernel
and, from state recorded at the same time as the trace file, will
look up the ELF symbol tables for the kernel or modules and resolve that
address to a symbol, etc.

- Arnaldo
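
To make the idea above a bit more concrete, here is a minimal sketch of
such a mapping, in plain userspace C. Everything in it is hypothetical
(struct ifmap, ifmap_record(), ifmap_lookup(); none of this exists in
libtraceevent or perf): assignments of a name to an ifindex are recorded
with a timestamp while tracing, in time order, and at post-processing
time an event's ifindex is resolved to whatever name was current at the
event's timestamp.

```c
#include <stddef.h>
#include <stdio.h>

#define IFMAP_MAX	64
#define IFMAP_NAMELEN	16	/* assumed to be enough for an interface name */

struct ifmap_entry {
	unsigned long long time;	/* when the assignment was observed */
	unsigned int ifindex;
	char name[IFMAP_NAMELEN];
};

struct ifmap {
	struct ifmap_entry entries[IFMAP_MAX];
	size_t nr;
};

/* Record time: append one "ifindex now means name" assignment. */
static int ifmap_record(struct ifmap *map, unsigned long long time,
			unsigned int ifindex, const char *name)
{
	struct ifmap_entry *e;

	if (map->nr == IFMAP_MAX)
		return -1;
	e = &map->entries[map->nr++];
	e->time = time;
	e->ifindex = ifindex;
	snprintf(e->name, sizeof(e->name), "%s", name);
	return 0;
}

/*
 * Post-processing time: return the name the ifindex had at the given
 * timestamp, i.e. the last matching assignment not newer than the event.
 * Relies on entries having been recorded in time order.
 */
static const char *ifmap_lookup(const struct ifmap *map,
				unsigned long long time, unsigned int ifindex)
{
	const char *name = NULL;

	for (size_t i = 0; i < map->nr; i++) {
		const struct ifmap_entry *e = &map->entries[i];

		if (e->ifindex == ifindex && e->time <= time)
			name = e->name;
	}
	return name;
}
```

The tracepoints themselves would then only carry the ifindex; the cost of
keeping names around is paid once per assignment instead of once per
event.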


[PATCH 1/2] dccp: Unlock sock before calling sk_free()

2017-03-01 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo <a...@redhat.com>

The code where sk_clone() came from created a new socket and locked it,
but then, on the error path didn't unlock it.

This problem stayed there for a long while, till b0691c8ee7c2 ("net:
Unlock sock before calling sk_free()") fixed it, but unfortunately the
callers of sk_clone() (now sk_clone_lock()) were not audited and the
one in dccp_create_openreq_child() remained.

Now in the age of the syzkaller fuzzer, this was finally uncovered, as
reported by Dmitry:

  8< 

I've got the following report while running syzkaller fuzzer on
86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)")

  [ BUG: held lock freed! ]
  4.10.0+ #234 Not tainted
  -
  syz-executor6/6898 is freeing memory
  88006286cac0-88006286d3b7, with a lock still held there!
   (slock-AF_INET6){+.-...}, at: [] spin_lock
  include/linux/spinlock.h:299 [inline]
   (slock-AF_INET6){+.-...}, at: []
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
  5 locks held by syz-executor6/6898:
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
  include/net/sock.h:1460 [inline]
   #0:  (sk_lock-AF_INET6){+.+.+.}, at: []
  inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
   #1:  (rcu_read_lock){..}, at: []
  inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
   #2:  (rcu_read_lock){..}, at: [] __skb_unlink
  include/linux/skbuff.h:1767 [inline]
   #2:  (rcu_read_lock){..}, at: [] __skb_dequeue
  include/linux/skbuff.h:1783 [inline]
   #2:  (rcu_read_lock){..}, at: []
  process_backlog+0x264/0x730 net/core/dev.c:4835
   #3:  (rcu_read_lock){..}, at: []
  ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
   #4:  (slock-AF_INET6){+.-...}, at: [] spin_lock
  include/linux/spinlock.h:299 [inline]
   #4:  (slock-AF_INET6){+.-...}, at: []
  sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504

Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling
sk_free()").

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Cc: Cong Wang <xiyou.wangc...@gmail.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Gerrit Renker <ger...@erg.abdn.ac.uk>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/20170301153510.ge15...@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 net/dccp/minisocks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 53eddf99e4f6..d20d948a98ed 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -122,6 +122,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk,
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			return NULL;
 		}
-- 
2.9.3



[PATCH 2/2] net: Introduce sk_clone_lock() error path routine

2017-03-01 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo <a...@redhat.com>

When handling problems in cloning a socket with the sk_clone_lock()
function we need to perform several steps that were open coded in it and
its callers, so introduce a routine to avoid this duplication:
sk_free_unlock_clone().

Cc: Cong Wang <xiyou.wangc...@gmail.com>
Cc: Dmitry Vyukov <dvyu...@google.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Gerrit Renker <ger...@erg.abdn.ac.uk>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/n/net-ui6laqkotycunhtmqryl9...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 include/net/sock.h   |  1 +
 net/core/sock.c  | 16 +++-
 net/dccp/minisocks.c |  6 +-
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c4f5e6fca17c..93d1160bcd32 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1520,6 +1520,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 void sk_free(struct sock *sk);
 void sk_destruct(struct sock *sk);
 struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority);
+void sk_free_unlock_clone(struct sock *sk);
 
 struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 gfp_t priority);
diff --git a/net/core/sock.c b/net/core/sock.c
index 4eca27dc5c94..a3d9bb20f65d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1540,11 +1540,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
is_charged = sk_filter_charge(newsk, filter);
 
if (unlikely(!is_charged || xfrm_sk_clone_policy(newsk, sk))) {
-   /* It is still raw copy of parent, so invalidate
-* destructor and make plain sk_free() */
-   newsk->sk_destruct = NULL;
-   bh_unlock_sock(newsk);
-   sk_free(newsk);
+   sk_free_unlock_clone(newsk);
newsk = NULL;
goto out;
}
@@ -1593,6 +1589,16 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 }
 EXPORT_SYMBOL_GPL(sk_clone_lock);
 
+void sk_free_unlock_clone(struct sock *sk)
+{
+   /* It is still raw copy of parent, so invalidate
+* destructor and make plain sk_free() */
+   sk->sk_destruct = NULL;
+   bh_unlock_sock(sk);
+   sk_free(sk);
+}
+EXPORT_SYMBOL_GPL(sk_free_unlock_clone);
+
 void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 {
u32 max_segs = 1;
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index d20d948a98ed..e267e6f4c9a5 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -119,11 +119,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk,
 * Activate features: initialise CCIDs, sequence windows etc.
 */
if (dccp_feat_activate_values(newsk, &dreq->dreq_featneg)) {
-   /* It is still raw copy of parent, so invalidate
-* destructor and make plain sk_free() */
-   newsk->sk_destruct = NULL;
-   bh_unlock_sock(newsk);
-   sk_free(newsk);
+   sk_free_unlock_clone(newsk);
return NULL;
}
dccp_init_xmit_timers(newsk);
-- 
2.9.3
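
The ordering that sk_free_unlock_clone() encapsulates can be modelled in
userspace as the following minimal sketch. It is not kernel code: struct
fake_sock, fake_clone_lock(), fake_sk_free() and fake_free_unlock_clone()
are all made up for illustration, a boolean stands in for the socket
spinlock, and the counter stands in for the "held lock freed" report.

```c
#include <stdbool.h>
#include <stdlib.h>

struct fake_sock {
	bool locked;					/* stand-in for the slock */
	void (*destruct)(struct fake_sock *sk);	/* stand-in for sk_destruct */
};

/* Counts frees done with the lock still held (the bug being fixed). */
static int freed_while_locked;

static void fake_sk_free(struct fake_sock *sk)
{
	if (sk->locked)
		freed_while_locked++;
	if (sk->destruct)
		sk->destruct(sk);
	free(sk);
}

/* Like the real clone routine, hand back the new object already locked. */
static struct fake_sock *fake_clone_lock(void (*destruct)(struct fake_sock *))
{
	struct fake_sock *sk = calloc(1, sizeof(*sk));

	if (sk) {
		sk->locked = true;
		sk->destruct = destruct;
	}
	return sk;
}

/* Error path helper: drop the destructor, unlock, and only then free. */
static void fake_free_unlock_clone(struct fake_sock *sk)
{
	sk->destruct = NULL;
	sk->locked = false;
	fake_sk_free(sk);
}
```

With a single helper, a caller on the error path cannot forget the unlock
step, which is what happened in dccp_create_openreq_child().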



Re: net/dccp: dccp_create_openreq_child freed held lock

2017-03-01 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 01, 2017 at 10:38:54AM +0100, Dmitry Vyukov escreveu:
> Hello,
> 
> I've got the following report while running syzkaller fuzzer on
> 86292b33d4b79ee03e2f43ea0381ef85f077c760:
> 
> 
> It seems that dccp_create_openreq_child needs to unlock the sock if
> dccp_feat_activate_values fails.

Yeah, can you please use the patch below, that mimics the error paths in
sk_clone_new(), from where I think even the comment about it being a raw
copy came, but the bh_unlock_sock() didn't?

- Arnaldo

diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 53eddf99e4f6..d20d948a98ed 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -122,6 +122,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk,
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			return NULL;
 		}


Re: net/dccp: dccp_create_openreq_child freed held lock

2017-03-01 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 01, 2017 at 12:35:10PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Mar 01, 2017 at 10:38:54AM +0100, Dmitry Vyukov escreveu:
> > Hello,
> > 
> > I've got the following report while running syzkaller fuzzer on
> > 86292b33d4b79ee03e2f43ea0381ef85f077c760:
> > 
> > 
> > It seems that dccp_create_openreq_child needs to unlock the sock if
> > dccp_feat_activate_values fails.
> 
> Yeah, can you please use the patch below, that mimics the error paths in
> sk_clone_new(), from where I think even the comment about it being a raw

Argh, s/sk_clone_new()/sk_clone_lock()/g

- Arnaldo

> copy came, but the bh_unlock_sock() didn't?
> 
> - Arnaldo
> 
> diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
> index 53eddf99e4f6..d20d948a98ed 100644
> --- a/net/dccp/minisocks.c
> +++ b/net/dccp/minisocks.c
> @@ -122,6 +122,7 @@ struct sock *dccp_create_openreq_child(const struct sock *sk,
>   /* It is still raw copy of parent, so invalidate
>* destructor and make plain sk_free() */
>   newsk->sk_destruct = NULL;
> + bh_unlock_sock(newsk);
>   sk_free(newsk);
>   return NULL;
>   }


Re: [PATCH] net/dccp: fix use after free in tw_timer_handler()

2017-02-21 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu:
> DCCP doesn't purge timewait sockets on network namespace shutdown.
> So, after net namespace destroyed we could still have an active timer
> which will trigger use after free in tw_timer_handler():
> 
> 
> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
> timewait sockets on net namespace destruction and prevent above issue.

Please add this, to help stable kernels to pick this up

Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge")
Cc: Eric W. Biederman <ebied...@xmission.com> 

[acme@jouet linux]$ git describe b099ce2602d8
v2.6.32-rc8-1977-gb099ce2602d8

This one added the pernet operations related to network namespaces, but
then the one above got missed.

commit 72a2d6138224298a576bcdc33d7d0004de604856
Author: Pavel Emelyanov <xe...@openvz.org>
Date:   Sun Apr 13 22:29:13 2008 -0700

[NETNS][DCCPV4]: Add dummy per-net operations.

--

It looks ok, so please consider adding my:

Acked-by: Arnaldo Carvalho de Melo <a...@redhat.com>

- Arnaldo

> Reported-by: Dmitry Vyukov <dvyu...@google.com>
> Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
> ---
>  net/dccp/ipv4.c | 6 ++
>  net/dccp/ipv6.c | 6 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index d859a5c..da7cb16 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net)
>   inet_ctl_sock_destroy(net->dccp.v4_ctl_sk);
>  }
>  
> +static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list)
> +{
> + inet_twsk_purge(_hashinfo, _death_row, AF_INET);
> +}
> +
>  static struct pernet_operations dccp_v4_ops = {
>   .init   = dccp_v4_init_net,
>   .exit   = dccp_v4_exit_net,
> + .exit_batch = dccp_v4_exit_batch,
>  };
>  
>  static int __init dccp_v4_init(void)
> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> index c4e879c..f3d8f92 100644
> --- a/net/dccp/ipv6.c
> +++ b/net/dccp/ipv6.c
> @@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net)
>   inet_ctl_sock_destroy(net->dccp.v6_ctl_sk);
>  }
>  
> +static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list)
> +{
> + inet_twsk_purge(_hashinfo, _death_row, AF_INET6);
> +}
> +
>  static struct pernet_operations dccp_v6_ops = {
>   .init   = dccp_v6_init_net,
>   .exit   = dccp_v6_exit_net,
> + .exit_batch = dccp_v6_exit_batch,
>  };
>  
>  static int __init dccp_v6_init(void)
> -- 
> 2.10.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe dccp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-next: build failure after merge of the net tree

2017-02-14 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 14, 2017 at 02:23:26PM +0100, Jiri Olsa escreveu:
> On Tue, Feb 14, 2017 at 09:50:20AM -0300, Arnaldo Carvalho de Melo wrote:
> 
> SNIP
> 
> > 
> > What I think Ingo meant with dependency at the build system level is to
> > somehow state that if file A gets changed, then tool B must be rebuilt.
> > 
> > Now that samples/bpf and tools/perf/ depend on tools/lib/bpf/ I _always_
> > build both, ditto for tools/objtool, that shares a different library
> > with tools/perf/, tools/lib/subcmd/:
> > 
> > ENTRYPOINT make -C /git/linux/tools/perf O=/tmp/build/perf && \
> >rm -rf /tmp/build/perf/{.[^.]*,*} && \
> >make NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf && \
> >make -C /git/linux/tools/objtool O=/tmp/build/objtool && \
> >make -C /git/linux O=/tmp/build/linux allmodconfig && \
> >make -C /git/linux O=/tmp/build/linux headers_install && \
> >make -C /git/linux O=/tmp/build/linux samples/bpf/
> > 
> > This is the default action for my
> > docker.io/acmel/linux-perf-tools-build-fedora:rawhide container.
> > 
> > It is published, so a:
> > 
> >docker pull docker.io/acmel/linux-perf-tools-build-fedora:rawhide
> > 
> > And then run it before pushing things upstream would catch these kinds
> > of errors.
> > 
> > But that would possibly disrupt too much people's workflow, that is why
> > using the Kbuild originated tools/build/ we have to somehow express that
> > when a change is made in a file then a tool that uses that file needs to
> > be rebuilt.
> 
> we already have the check in the check-headers.sh script,
> and AFAICS there's no 'rebuild' option here.. just warn or fail
> because the headers update needs to be done manually

... when needed. And that will only be detected if you try to build
tools using what is in tools/include/linux/bpf.h

Tools using tools/lib/bpf/ _must_ use what is in tools/include/.

So lemme see if my reasoning is right:

tools/lib/bpf/bpf.c has:

  #include 

Now, samples/bpf/ will build tools/lib/bpf/bpf.o:

# Libbpf dependencies
LIBBPF := ../../tools/lib/bpf/bpf.o

HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
HOSTCFLAGS += -I$(srctree)/tools/perf

HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable

So it will never include tools/include/uapi/linux/bpf.h, which it
should.

Because the workflow of people working on samples/bpf/ is to first install
the new headers using a variation of:

  make headers_install

So they will get the new bpf.h, not use tools/include/uapi/linux/bpf.h,
b00m.

They should use tools/include/uapi/linux/bpf.h, which is the one we know
builds well with tools/lib/bpf/bpf.c, since we tested it last time we
made the copy.
 
> > Makefile rules probably would be enough, but then it would have to be
> > done at the tools/build/ level and all tools using shared components
> > would have to use it to trigger the rebuild.
 
> we can move/invoke the check-headers.sh script in some upper dir

Most of the time I just ignore that warning, only when I find spare time
I go look if the changes in the kernel copy, i.e. upstream, should
trigger changes in the tools using its copy in tools/include/.

- Arnaldo


Re: linux-next: build failure after merge of the net tree

2017-02-14 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 14, 2017 at 10:19:37AM +0100, Jiri Olsa escreveu:
> On Tue, Feb 14, 2017 at 07:42:21AM +0100, Ingo Molnar wrote:
> > * Stephen Rothwell  wrote:
> > > Unfortunately, the perf header files are kept separate from the kernel
> > > header files proper and are not automatically copied over :-(

> > No, that's wrong, the problem is not that headers were not shared, the
> > problem is that a tooling interdependency was not properly tested *and*
> > that the dependency was not properly implemented in the build system
> > either.

> > Note that we had similar build breakages when include headers _were_
> > shared as well, so sharing the headers would only have worked around this
> > particular bug and would have introduced fragility in other places...

> > The best, most robust solution in this particular case would be to fix the
> > (tooling) build system to express the dependency, that would have shown the
> > build failure right when the modification was done.
 
> so we have the warning now:
>   Warning: tools/include/uapi/linux/bpf.h differs from kernel
 
> do you want to change it into the build failure?

No. Differences in the copy are not always problematic, the problem here
lies elsewhere.

Please run:

  make -C tools all

To build all tools when you touch something in tools/include and/or
tools/lib/

- Arnaldo



Bored? Here is what I first wrote ;-)

Simply using the kernel original would require kernel hackers to build
all tools using that file, something we long decided not to do.

What I think Ingo meant with dependency at the build system level is to
somehow state that if file A gets changed, then tool B must be rebuilt.

Now that samples/bpf and tools/perf/ depend on tools/lib/bpf/ I _always_
build both, ditto for tools/objtool, that shares a different library
with tools/perf/, tools/lib/subcmd/:

ENTRYPOINT make -C /git/linux/tools/perf O=/tmp/build/perf && \
   rm -rf /tmp/build/perf/{.[^.]*,*} && \
   make NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf && \
   make -C /git/linux/tools/objtool O=/tmp/build/objtool && \
   make -C /git/linux O=/tmp/build/linux allmodconfig && \
   make -C /git/linux O=/tmp/build/linux headers_install && \
   make -C /git/linux O=/tmp/build/linux samples/bpf/

This is the default action for my
docker.io/acmel/linux-perf-tools-build-fedora:rawhide container.

It is published, so a:

   docker pull docker.io/acmel/linux-perf-tools-build-fedora:rawhide

And then run it before pushing things upstream would catch these kinds
of errors.

But that would possibly disrupt too much people's workflow, that is why
using the Kbuild originated tools/build/ we have to somehow express that
when a change is made in a file then a tool that uses that file needs to
be rebuilt.

Makefile rules probably would be enough, but then it would have to be
done at the tools/build/ level and all tools using shared components
would have to use it to trigger the rebuild.

- Arnaldo


[GIT PULL 00/15] perf/core improvements and fixes

2017-02-13 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit f2029b1e47b607619d1dd2cb0bbb77f64ec6b7c2:

  perf/x86/intel: Add Kaby Lake support (2017-02-11 21:28:23 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.11-20170213

for you to fetch changes up to a734fb5d60067a73dd7099a58756847c07f9cd68:

  samples/bpf: Reset global variables (2017-02-13 17:22:53 -0300)


perf/core improvements and fixes:

New feature:

- Introduce the 'delta-abs' 'perf diff' compute method, that orders the
  histogram entries by the absolute value of the percentage delta for a
  function in two perf.data files, i.e. the functions that changed the
  most (increase or decrease in samples) come first (Namhyung Kim)

User visible:

- Improve message about tweaking the kernel.perf_event_paranoid setting,
  telling how to make the change permanent by editing /etc/sysctl.conf
  (Ingo Molnar)

Infrastructure:

- Introduce linux/compiler-gcc.h as a counterpart to the kernel's,
  initially containing the definition of __fallthrough, more to
  come (__maybe_unused, etc) (Arnaldo Carvalho de Melo)

- Fixes for problems uncovered by building tools/perf with clang, such
  as always true tests of arrays against NULL and variables that sometimes
  were used without being initialized (Arnaldo Carvalho de Melo, Steven Rostedt)

- Before loading a new ELF, clear global variables set by the
  samples/bpf loader (Mickaël Salaün)

- Ignore already processed ELF sections in the samples/bpf
  loader (Mickaël Salaün)

- Fix compile error in the scripting code with some perl5
  versions (Wang YanQing)
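
The 'delta-abs' ordering mentioned under "New feature" can be pictured
with a small standalone sketch. This is not the perf implementation: the
struct diff_entry type, its fields and the helper names below are all
hypothetical, and the only thing taken from the description above is the
idea of sorting by the absolute value of the percentage delta.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: share of samples for one function in two files. */
struct diff_entry {
	const char *symbol;
	double baseline_pct;	/* percentage in the first perf.data */
	double new_pct;		/* percentage in the second perf.data */
};

/* Absolute value of the percentage delta between the two files. */
static double entry_delta_abs(const struct diff_entry *e)
{
	double delta = e->new_pct - e->baseline_pct;

	return delta < 0 ? -delta : delta;
}

/* qsort() comparator: the entry with the larger |delta| sorts first. */
static int cmp_delta_abs(const void *a, const void *b)
{
	double da = entry_delta_abs(a);
	double db = entry_delta_abs(b);

	if (da > db)
		return -1;
	if (da < db)
		return 1;
	return 0;
}

/* Order entries so that the functions that changed the most come first. */
static void sort_by_delta_abs(struct diff_entry *entries, size_t nr)
{
	assert(entries != NULL || nr == 0);
	qsort(entries, nr, sizeof(*entries), cmp_delta_abs);
}
```

With this ordering a function that dropped from 20% to 5% is listed
ahead of one that grew from 3% to 7.5%, since only the size of the
change matters, not its direction.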

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>

----
Arnaldo Carvalho de Melo (6):
  tools include: Introduce linux/compiler-gcc.h
  tools lib traceevent plugin function: Initialize 'index' variable
  perf evsel: Inform how to make a sysctl setting permanent
  perf symbols: No need to check if sym->name is NULL
  perf tests record: No need to test an array against NULL
  perf symbols: dso->name is an array, no need to check it against NULL

Mickaël Salaün (3):
  samples/bpf: Add missing header
  samples/bpf: Ignore already processed ELF sections
  samples/bpf: Reset global variables

Namhyung Kim (4):
  perf diff: Add 'delta-abs' compute method
  perf diff: Add diff.order config option
  perf diff: Add diff.compute config option
  perf diff: Change default setting to "delta-abs"

Steven Rostedt (VMware) (1):
  tools lib traceevent: Initialize lenght on OLD_RING_BUFFER_TYPE_TIME_STAMP

Wang YanQing (1):
  perf scripting perl: Fix compile error with some perl5 versions

 samples/bpf/bpf_load.c |  7 ++
 samples/bpf/tracex5_kern.c |  1 +
 tools/include/linux/compiler-gcc.h | 14 
 tools/include/linux/compiler.h | 10 +--
 tools/lib/traceevent/kbuffer-parse.c   |  1 +
 tools/lib/traceevent/plugin_function.c |  2 +-
 tools/perf/Documentation/perf-config.txt   | 12 
 tools/perf/Documentation/perf-diff.txt | 15 -
 tools/perf/MANIFEST|  1 +
 tools/perf/builtin-diff.c  | 78 --
 tools/perf/builtin-kmem.c  |  4 +-
 tools/perf/builtin-record.c|  2 +-
 tools/perf/builtin-sched.c |  2 +-
 tools/perf/builtin-stat.c  |  2 +-
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/tests/perf-record.c |  2 +-
 tools/perf/util/evsel.c|  4 +-
 tools/perf/util/evsel_fprintf.c|  1 -
 tools/perf/util/machine.c  |  2 +-
 tools/perf/util/map.c  |  4 +-
 tools/perf/util/scripting-engines/Build|  2 +-
 .../perf/util/scripting-engines/trace-event-perl.c |  4 +-
 tools/perf/util/symbol_fprintf.c   |  2 +-
 23 files changed, 145 insertions(+), 29 deletions(-)
 create mode 100644 tools/include/linux/compiler-gcc.h

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support, objtool where it is supported and samples/bpf/, ditto.

Several are cross builds, the ones with -x-ARCH, and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc

[PATCH 13/15] samples/bpf: Add missing header

2017-02-13 Thread Arnaldo Carvalho de Melo
From: Mickaël Salaün <m...@digikod.net>

Include unistd.h to define __NR_getuid and __NR_getsid.

Signed-off-by: Mickaël Salaün <m...@digikod.net>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: David S. Miller <da...@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-4-...@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 samples/bpf/tracex5_kern.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
index fd12d7154d42..7e4cf74553ff 100644
--- a/samples/bpf/tracex5_kern.c
+++ b/samples/bpf/tracex5_kern.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "bpf_helpers.h"
 
 #define PROG(F) SEC("kprobe/"__stringify(F)) int bpf_func_##F
-- 
2.9.3



[PATCH 14/15] samples/bpf: Ignore already processed ELF sections

2017-02-13 Thread Arnaldo Carvalho de Melo
From: Mickaël Salaün <m...@digikod.net>

Add a missing check for the map fixup loop.

Signed-off-by: Mickaël Salaün <m...@digikod.net>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: David S. Miller <da...@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-2-...@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 samples/bpf/bpf_load.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 396e204888b3..e04fe09d7c2e 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -328,6 +328,8 @@ int load_bpf_file(char *path)
 
/* load programs that need map fixup (relocations) */
for (i = 1; i < ehdr.e_shnum; i++) {
+   if (processed_sec[i])
+   continue;
 
if (get_sec(elf, i, , , , ))
continue;
-- 
2.9.3



[PATCH 15/15] samples/bpf: Reset global variables

2017-02-13 Thread Arnaldo Carvalho de Melo
From: Mickaël Salaün <m...@digikod.net>

Before loading a new ELF, clean previous kernel version, license and
processed sections.

Signed-off-by: Mickaël Salaün <m...@digikod.net>
Acked-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: David S. Miller <da...@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-3-...@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 samples/bpf/bpf_load.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index e04fe09d7c2e..b86ee54da2d1 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -277,6 +277,11 @@ int load_bpf_file(char *path)
Elf_Data *data, *data_prog, *symbols = NULL;
char *shname, *shname_prog;
 
+   /* reset global variables */
+   kern_version = 0;
+   memset(license, 0, sizeof(license));
+   memset(processed_sec, 0, sizeof(processed_sec));
+
if (elf_version(EV_CURRENT) == EV_NONE)
return 1;
 
-- 
2.9.3
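
Why the reset matters, together with the processed-section check from
patch 14/15, can be shown with a reduced sketch. It assumes a loader
that keeps its state in file-scope globals, as the patches suggest;
struct section, MAX_SECTIONS and load_sections() are made up for
illustration and are not the bpf_load.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SECTIONS 16

/* Loader state kept in globals, so it survives between calls. */
static bool processed_sec[MAX_SECTIONS];
static int nr_loaded;

/* Hypothetical section descriptor: 'is_meta' marks license/version/maps. */
struct section {
	bool is_meta;
};

/* Returns the number of program sections loaded from this object. */
static int load_sections(const struct section *secs, int nr)
{
	int i;

	assert(nr <= MAX_SECTIONS);

	/* Reset state left over from a previous call. */
	memset(processed_sec, 0, sizeof(processed_sec));
	nr_loaded = 0;

	/* First pass: consume metadata sections and mark them as done. */
	for (i = 0; i < nr; i++) {
		if (secs[i].is_meta)
			processed_sec[i] = true;
	}

	/* Second pass: load the rest, skipping what was already handled. */
	for (i = 0; i < nr; i++) {
		if (processed_sec[i])
			continue;
		processed_sec[i] = true;
		nr_loaded++;
	}

	return nr_loaded;
}
```

Without the memset() at the top, a second call would still see the
marks from the first object and silently skip sections of the new one.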



Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)

2017-02-13 Thread Arnaldo Carvalho de Melo
Em Mon, Feb 13, 2017 at 09:42:31AM +0800, Wangnan (F) escreveu:
> On 2017/2/9 4:27, Mickaël Salaün wrote:
> >Mickaël Salaün (3):
> >   samples/bpf: Ignore already processed ELF sections
> >   samples/bpf: Reset global variables
> >   samples/bpf: Add missing header
> >
> >  samples/bpf/bpf_load.c | 7 +++
> >  samples/bpf/tracex5_kern.c | 1 +
> >  2 files changed, 8 insertions(+)
> >
> Looks good to me.
> 
> Thank you.

Thanks, applied, added Acked-by tags for you and Joe.

- Arnaldo


[PATCH 1/1] MAINTAINERS: Remove old e-mail address

2017-02-13 Thread Arnaldo Carvalho de Melo
The ghostprotocols.net domain is not working, remove it from CREDITS and
MAINTAINERS, and change the status to "Odd fixes", and since I haven't
been maintaining those, remove my address from there.

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 CREDITS |  5 ++---
 MAINTAINERS | 15 ++-
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/CREDITS b/CREDITS
index c58560701d13..c5626bf06264 100644
--- a/CREDITS
+++ b/CREDITS
@@ -2478,12 +2478,11 @@ S: D-90453 Nuernberg
 S: Germany
 
 N: Arnaldo Carvalho de Melo
-E: a...@ghostprotocols.net
+E: a...@kernel.org
 E: arnaldo.m...@gmail.com
 E: a...@redhat.com
-W: http://oops.ghostprotocols.net:81/blog/
 P: 1024D/9224DF01 D5DF E3BB E3C8 BCBB F8AD  841A B6AB 4681 9224 DF01
-D: IPX, LLC, DCCP, cyc2x, wl3501_cs, net/ hacks
+D: tools/, IPX, LLC, DCCP, cyc2x, wl3501_cs, net/ hacks
 S: Brazil
 
 N: Karsten Merker
diff --git a/MAINTAINERS b/MAINTAINERS
index 3960e7faaa99..b781db49d363 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -877,8 +877,8 @@ S:  Odd fixes
 F: drivers/hwmon/applesmc.c
 
 APPLETALK NETWORK LAYER
-M: Arnaldo Carvalho de Melo <a...@ghostprotocols.net>
-S: Maintained
+L: netdev@vger.kernel.org
+S: Odd fixes
 F: drivers/net/appletalk/
 F: net/appletalk/
 
@@ -6727,9 +6727,8 @@ S:Odd Fixes
 F: drivers/tty/ipwireless/
 
 IPX NETWORK LAYER
-M:     Arnaldo Carvalho de Melo <a...@ghostprotocols.net>
 L: netdev@vger.kernel.org
-S: Maintained
+S: Odd fixes
 F: include/net/ipx.h
 F: include/uapi/linux/ipx.h
 F: net/ipx/
@@ -7501,8 +7500,8 @@ S:Maintained
 F: drivers/misc/lkdtm*
 
 LLC (802.2)
-M: Arnaldo Carvalho de Melo <a...@ghostprotocols.net>
-S: Maintained
+L: netdev@vger.kernel.org
+S: Odd fixes
 F: include/linux/llc.h
 F: include/uapi/linux/llc.h
 F: include/net/llc*
@@ -13373,10 +13372,8 @@ S: Maintained
 F: drivers/input/misc/wistron_btns.c
 
 WL3501 WIRELESS PCMCIA CARD DRIVER
-M: Arnaldo Carvalho de Melo <a...@ghostprotocols.net>
 L: linux-wirel...@vger.kernel.org
-W: http://oops.ghostprotocols.net:81/blog
-S: Maintained
+S: Odd fixes
 F: drivers/net/wireless/wl3501*
 
 WOLFSON MICROELECTRONICS DRIVERS
-- 
2.9.3



Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)

2017-02-10 Thread Arnaldo Carvalho de Melo
Em Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün escreveu:
> This series brings some fixes and small improvements to the BPF samples.
> 
> This is intended for the perf tree and apply on 7a5980f9c006 ("tools lib bpf:
> Add missing header to the library").

Wang, are you ok with this series? Joe?

- Arnaldo
 
> Changes since v3:
> * remove applied patch 1/5
> * remove patch 2/5 on bpf_load_program() as requested by Wang Nan
> 
> Changes since v2:
> * add this cover letter
> 
> Changes since v1:
> * exclude patches not intended for the perf tree
> 
> Regards,
> 
> Mickaël Salaün (3):
>   samples/bpf: Ignore already processed ELF sections
>   samples/bpf: Reset global variables
>   samples/bpf: Add missing header
> 
>  samples/bpf/bpf_load.c | 7 +++
>  samples/bpf/tracex5_kern.c | 1 +
>  2 files changed, 8 insertions(+)
> 
> -- 
> 2.11.0


Re: [PATCH net-next v3 04/11] bpf: Use bpf_load_program() from the library

2017-02-08 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 07, 2017 at 03:17:43PM -0800, Alexei Starovoitov escreveu:
> On 2/7/17 1:44 PM, Mickaël Salaün wrote:
> >-union bpf_attr attr;
> >+union bpf_attr attr = {};
> >
> >-bzero(, sizeof(attr));
> 
> I think somebody mentioned that there are compilers out there
> that don't do it correctly, hence it was done with explicit bzero.
> Arnaldo, Wang, do you remember the details?

https://www.spinics.net/lists/netdev/msg411144.html

But this was when some named initializers are used in a union with
unnamed members like 'union bpf_attr', unsure if this would break with
the above case where no named initializers are being used.

Having said that, the above is gratuitous, the code that is being
replaced is not related to the patch at hand, and conceptually the end
result should be the same.

So, please, just leave it as is, i.e. using bzero() and make your patch
a bit smaller, remember, small is good, smaller is even better ;-)

- Arnaldo
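
For readers unfamiliar with the pattern being kept, here is a sketch of
explicit zeroing of a union that has unnamed struct members. The union
demo_attr type is a hypothetical stand-in, far smaller than the real
'union bpf_attr', and memset() is used as the standard equivalent of
bzero(). It only illustrates the explicit-clear approach and makes no
claim about how any particular compiler treats initializers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a union of anonymous structs. */
union demo_attr {
	struct {
		unsigned int map_type;
		unsigned int key_size;
	};
	struct {
		unsigned int prog_type;
		unsigned int insn_cnt;
		unsigned long long insns;
		unsigned long long license;
	};
};

/* Clear every byte of the union, then set only the fields of interest. */
static void demo_attr_init_map(union demo_attr *attr, unsigned int type,
			       unsigned int key_size)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->map_type = type;
	attr->key_size = key_size;
}

/* Returns 1 if all bytes from offset 'from' to the end are zero. */
static int demo_attr_tail_is_zero(const union demo_attr *attr, size_t from)
{
	const unsigned char *p = (const unsigned char *)attr;
	size_t i;

	for (i = from; i < sizeof(*attr); i++) {
		if (p[i] != 0)
			return 0;
	}
	return 1;
}
```

The explicit clear covers the whole object, including the bytes that
belong only to the larger member, regardless of which member is
assigned afterwards.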


Re: [PATCH net-next v3 04/11] bpf: Use bpf_load_program() from the library

2017-02-08 Thread Arnaldo Carvalho de Melo
Em Tue, Feb 07, 2017 at 03:17:43PM -0800, Alexei Starovoitov escreveu:
> On 2/7/17 1:44 PM, Mickaël Salaün wrote:
> >-union bpf_attr attr;
> >+union bpf_attr attr = {};
> >
> >-bzero(, sizeof(attr));
 
> I think somebody mentioned that there are compilers out there
> that don't do it correctly, hence it was done with explicit bzero.
> Arnaldo, Wang, do you remember the details?

Yeah, lemme dig it...

- Arnaldo


Re: [PATCH v3 1/5] bpf: Add missing header to the library

2017-02-08 Thread Arnaldo Carvalho de Melo
Em Wed, Feb 08, 2017 at 10:47:18AM +0800, Wangnan (F) escreveu:
> >+++ b/tools/lib/bpf/bpf.h
> >@@ -22,6 +22,7 @@
> >  #define __BPF_BPF_H
> >  #include 
> >+#include 
> >  int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
> >int max_entries, __u32 map_flags);
> Looks good to me.
> 
> Thank you.

Applied, took the "Thank you" as an "Acked-by: Wang",

Regards,

- Arnaldo


[PATCH 09/14] tools lib bpf: Add bpf_object__pin()

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Add a new API to pin a BPF object to the filesystem. The user can
specify the path within a BPF filesystem to pin the object.
Programs will be pinned under a subdirectory named the same as the
program, with each instance appearing as a numbered file under that
directory, and maps will be pinned under the path using the name of
the map as the file basename.

For example, with the directory '/sys/fs/bpf/foo' and a BPF object which
contains two instances of a program named 'bar', and a map named 'baz':

/sys/fs/bpf/foo/bar/0
/sys/fs/bpf/foo/bar/1
/sys/fs/bpf/foo/baz

Signed-off-by: Joe Stringer <j...@ovn.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-4-...@ovn.org
[ Check snprintf >= for truncation, as snprintf(bf, size, ...) == size
  also means truncation ]
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>

---
 tools/lib/bpf/libbpf.c | 53 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6a8c8beeb291..ac6eb863b2a4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1379,6 +1379,59 @@ int bpf_map__pin(struct bpf_map *map, const char *path)
return 0;
 }
 
+int bpf_object__pin(struct bpf_object *obj, const char *path)
+{
+   struct bpf_program *prog;
+   struct bpf_map *map;
+   int err;
+
+   if (!obj)
+   return -ENOENT;
+
+   if (!obj->loaded) {
+   pr_warning("object not yet loaded; load it first\n");
+   return -ENOENT;
+   }
+
+   err = make_dir(path);
+   if (err)
+   return err;
+
+   bpf_map__for_each(map, obj) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/%s", path,
+  bpf_map__name(map));
+   if (len < 0)
+   return -EINVAL;
+   else if (len >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   err = bpf_map__pin(map, buf);
+   if (err)
+   return err;
+   }
+
+   bpf_object__for_each_program(prog, obj) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/%s", path,
+  prog->section_name);
+   if (len < 0)
+   return -EINVAL;
+   else if (len >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   err = bpf_program__pin(prog, buf);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2addf9d5b13c..b30394f9947a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -65,6 +65,7 @@ struct bpf_object *bpf_object__open(const char *path);
 struct bpf_object *bpf_object__open_buffer(void *obj_buf,
   size_t obj_buf_sz,
   const char *name);
+int bpf_object__pin(struct bpf_object *object, const char *path);
 void bpf_object__close(struct bpf_object *object);
 
 /* Load/unload object into/from kernel */
-- 
2.9.3
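
The committer note about checking snprintf() with >= can be isolated in
a small sketch. build_pin_path() is a hypothetical helper, not part of
libbpf; it mirrors the "%s/%s" path construction and the error codes
used in the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Build "<dir>/<name>" into buf. Returns 0 on success, -EINVAL on an
 * output error and -ENAMETOOLONG if the result did not fit.
 */
static int build_pin_path(char *buf, size_t size, const char *dir,
			  const char *name)
{
	int len;

	assert(buf != NULL && dir != NULL && name != NULL);

	len = snprintf(buf, size, "%s/%s", dir, name);
	if (len < 0)
		return -EINVAL;
	/* len == size: the terminating NUL took the place of the last byte. */
	if ((size_t)len >= size)
		return -ENAMETOOLONG;
	return 0;
}
```

snprintf() returns the length the full string would have had, not
counting the terminating NUL, so a return value equal to the buffer
size already means one character was dropped.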



[PATCH 10/14] tools perf util: Make rm_rf(path) argument const

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

rm_rf() doesn't modify its path argument, and a future caller will pass
a string constant into it to delete.

Signed-off-by: Joe Stringer <j...@ovn.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-5-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/util/util.c | 2 +-
 tools/perf/util/util.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index bf29aed16bd6..d8b45cea54d0 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -85,7 +85,7 @@ int mkdir_p(char *path, mode_t mode)
return (stat(path, ) && mkdir(path, mode)) ? -1 : 0;
 }
 
-int rm_rf(char *path)
+int rm_rf(const char *path)
 {
DIR *dir;
int ret = 0;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 6e8be174ec0b..c74708da8571 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -209,7 +209,7 @@ static inline int sane_case(int x, int high)
 }
 
 int mkdir_p(char *path, mode_t mode);
-int rm_rf(char *path);
+int rm_rf(const char *path);
 struct strlist *lsdir(const char *name, bool (*filter)(const char *, struct dirent *));
 bool lsdir_no_dot_filter(const char *name, struct dirent *d);
 int copyfile(const char *from, const char *to);
-- 
2.9.3



[PATCH 11/14] tools lib api fs: Add bpf_fs filesystem detector

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Allow mounting of the BPF filesystem at /sys/fs/bpf.

Signed-off-by: Joe Stringer <j...@ovn.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-6-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/api/fs/fs.c | 16 
 tools/lib/api/fs/fs.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index f99f49e4a31e..4b6bfc43cccf 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -38,6 +38,10 @@
 #define HUGETLBFS_MAGIC0x958458f6
 #endif
 
+#ifndef BPF_FS_MAGIC
+#define BPF_FS_MAGIC   0xcafe4a11
+#endif
+
 static const char * const sysfs__fs_known_mountpoints[] = {
"/sys",
0,
@@ -75,6 +79,11 @@ static const char * const hugetlbfs__known_mountpoints[] = {
0,
 };
 
+static const char * const bpf_fs__known_mountpoints[] = {
+   "/sys/fs/bpf",
+   0,
+};
+
 struct fs {
const char  *name;
const char * const  *mounts;
@@ -89,6 +98,7 @@ enum {
FS__DEBUGFS = 2,
FS__TRACEFS = 3,
FS__HUGETLBFS = 4,
+   FS__BPF_FS = 5,
 };
 
 #ifndef TRACEFS_MAGIC
@@ -121,6 +131,11 @@ static struct fs fs__entries[] = {
.mounts = hugetlbfs__known_mountpoints,
.magic  = HUGETLBFS_MAGIC,
},
+   [FS__BPF_FS] = {
+   .name   = "bpf",
+   .mounts = bpf_fs__known_mountpoints,
+   .magic  = BPF_FS_MAGIC,
+   },
 };
 
 static bool fs__read_mounts(struct fs *fs)
@@ -280,6 +295,7 @@ FS(procfs,  FS__PROCFS);
 FS(debugfs, FS__DEBUGFS);
 FS(tracefs, FS__TRACEFS);
 FS(hugetlbfs, FS__HUGETLBFS);
+FS(bpf_fs, FS__BPF_FS);
 
 int filename__read_int(const char *filename, int *value)
 {
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index a63269f5d20c..6b332dc74498 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -22,6 +22,7 @@ FS(procfs)
 FS(debugfs)
 FS(tracefs)
 FS(hugetlbfs)
+FS(bpf_fs)
 
 #undef FS
 
-- 
2.9.3
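
How a known-mountpoint table and a filesystem magic number combine into
a detector can be sketched as below. This assumes statfs()-based
checking on Linux; find_mountpoint() and DEMO_BPF_FS_MAGIC are
illustrative names, with the magic value copied from the fallback
define in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Same value the patch uses as a fallback for BPF_FS_MAGIC. */
#define DEMO_BPF_FS_MAGIC 0xcafe4a11UL

/*
 * Return the first entry of a NULL-terminated candidate list that is
 * backed by a filesystem with the given magic, or NULL if none matches.
 */
static const char *find_mountpoint(const char * const *candidates,
				   unsigned long magic)
{
	struct statfs st;

	assert(candidates != NULL);

	for (; *candidates; candidates++) {
		/* Skip paths that do not exist or cannot be queried. */
		if (statfs(*candidates, &st) < 0)
			continue;
		/* Compare the low 32 bits, where the magic value lives. */
		if ((unsigned int)st.f_type == (unsigned int)magic)
			return *candidates;
	}
	return NULL;
}
```

Called with a list such as { "/sys/fs/bpf", NULL }, it returns the path
only when a BPF filesystem is actually mounted there.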



[PATCH 08/14] tools lib bpf: Add bpf_map__pin()

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Add a new API to pin a BPF map to the filesystem. The user can specify
the full path within a BPF filesystem to pin the map.

Signed-off-by: Joe Stringer <j...@ovn.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-3-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c | 22 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c4465b2fddf6..6a8c8beeb291 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1357,6 +1357,28 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
return 0;
 }
 
+int bpf_map__pin(struct bpf_map *map, const char *path)
+{
+   int err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (map == NULL) {
+   pr_warning("invalid map pointer\n");
+   return -EINVAL;
+   }
+
+   if (bpf_obj_pin(map->fd, path)) {
+   pr_warning("failed to pin map: %s\n", strerror(errno));
+   return -errno;
+   }
+
+   pr_debug("pinned map '%s'\n", path);
+   return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9f8aa63b95f4..2addf9d5b13c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -236,6 +236,7 @@ typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
 int bpf_map__set_priv(struct bpf_map *map, void *priv,
  bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
+int bpf_map__pin(struct bpf_map *map, const char *path);
 
 long libbpf_get_error(const void *ptr);
 
-- 
2.9.3



[PATCH 12/14] perf test: Add libbpf pinning test

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Add a test for the newly added BPF object pinning functionality.

For example:

  # tools/perf/perf test 37
37: BPF filter :
37.1: Basic BPF filtering  : Ok
37.2: BPF pinning  : Ok
37.3: BPF prologue generation  : Ok
37.4: BPF relocation checker   : Ok

  # tools/perf/perf test 37 -v 2>&1 | grep pinned
libbpf: pinned map '/sys/fs/bpf/perf_test/flip_table'
libbpf: pinned program '/sys/fs/bpf/perf_test/func=SyS_epoll_wait/0'

Signed-off-by: Joe Stringer <j...@ovn.org>
Requested-and-Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-7-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/tests/bpf.c | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 92343f43e44a..1a04fe77487d 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -5,11 +5,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "tests.h"
 #include "llvm.h"
 #include "debug.h"
 #define NR_ITERS   111
+#define PERF_TEST_BPF_PATH "/sys/fs/bpf/perf_test"
 
 #ifdef HAVE_LIBBPF_SUPPORT
 
@@ -54,6 +56,7 @@ static struct {
const char *msg_load_fail;
int (*target_func)(void);
int expect_result;
+   boolpin;
 } bpf_testcase_table[] = {
{
LLVM_TESTCASE_BASE,
@@ -63,6 +66,17 @@ static struct {
"load bpf object failed",
_wait_loop,
(NR_ITERS + 1) / 2,
+   false,
+   },
+   {
+   LLVM_TESTCASE_BASE,
+   "BPF pinning",
+   "[bpf_pinning]",
+   "fix kbuild first",
+   "check your vmlinux setting?",
+   _wait_loop,
+   (NR_ITERS + 1) / 2,
+   true,
},
 #ifdef HAVE_BPF_PROLOGUE
{
@@ -73,6 +87,7 @@ static struct {
"check your vmlinux setting?",
_loop,
(NR_ITERS + 1) / 4,
+   false,
},
 #endif
{
@@ -83,6 +98,7 @@ static struct {
"libbpf error when dealing with relocation",
NULL,
0,
+   false,
},
 };
 
@@ -226,10 +242,34 @@ static int __test__bpf(int idx)
goto out;
}
 
-   if (obj)
+   if (obj) {
ret = do_test(obj,
  bpf_testcase_table[idx].target_func,
  bpf_testcase_table[idx].expect_result);
+   if (ret != TEST_OK)
+   goto out;
+   if (bpf_testcase_table[idx].pin) {
+   int err;
+
+   if (!bpf_fs__mount()) {
+   pr_debug("BPF filesystem not mounted\n");
+   ret = TEST_FAIL;
+   goto out;
+   }
+   err = mkdir(PERF_TEST_BPF_PATH, 0777);
+   if (err && errno != EEXIST) {
+   pr_debug("Failed to make perf_test dir: %s\n",
+strerror(errno));
+   ret = TEST_FAIL;
+   goto out;
+   }
+   if (bpf_object__pin(obj, PERF_TEST_BPF_PATH))
+   ret = TEST_FAIL;
+   if (rm_rf(PERF_TEST_BPF_PATH))
+   ret = TEST_FAIL;
+   }
+   }
+
 out:
bpf__clear();
return ret;
-- 
2.9.3



[PATCH 07/14] tools lib bpf: Add BPF program pinning APIs

2017-02-01 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Add new APIs to pin a BPF program (or specific instances) to the
filesystem.  The user can specify the full path within a BPF
filesystem to pin the program.

bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.

bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.

Committer notes:

- Add missing headers for mkdir()

- Check strdup() for failure

- Check snprintf >= size, not >, as == also means truncated, see 'man
  snprintf', return value.

- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
  systems and we're not yet having a tools/include/uapi/linux/magic.h
  copy.

- Do not include linux/magic.h, not present in older distros.

Signed-off-by: Joe Stringer <j...@ovn.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c | 120 +
 tools/lib/bpf/libbpf.h |   3 ++
 2 files changed, 123 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e6cd62b1264b..c4465b2fddf6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2013-2015 Alexei Starovoitov <a...@kernel.org>
  * Copyright (C) 2015 Wang Nan <wangn...@huawei.com>
  * Copyright (C) 2015 Huawei Inc.
+ * Copyright (C) 2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +34,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -42,6 +48,10 @@
 #define EM_BPF 247
 #endif
 
+#ifndef BPF_FS_MAGIC
+#define BPF_FS_MAGIC   0xcafe4a11
+#endif
+
 #define __printf(a, b) __attribute__((format(printf, a, b)))
 
 __printf(1, 2)
@@ -1237,6 +1247,116 @@ int bpf_object__load(struct bpf_object *obj)
return err;
 }
 
+static int check_path(const char *path)
+{
+   struct statfs st_fs;
+   char *dname, *dir;
+   int err = 0;
+
+   if (path == NULL)
+   return -EINVAL;
+
+   dname = strdup(path);
+   if (dname == NULL)
+   return -ENOMEM;
+
+   dir = dirname(dname);
+   if (statfs(dir, &st_fs)) {
+   pr_warning("failed to statfs %s: %s\n", dir, strerror(errno));
+   err = -errno;
+   }
+   free(dname);
+
+   if (!err && st_fs.f_type != BPF_FS_MAGIC) {
+   pr_warning("specified path %s is not on BPF FS\n", path);
+   err = -EINVAL;
+   }
+
+   return err;
+}
+
+int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
+ int instance)
+{
+   int err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (prog == NULL) {
+   pr_warning("invalid program pointer\n");
+   return -EINVAL;
+   }
+
+   if (instance < 0 || instance >= prog->instances.nr) {
+   pr_warning("invalid prog instance %d of prog %s (max %d)\n",
+  instance, prog->section_name, prog->instances.nr);
+   return -EINVAL;
+   }
+
+   if (bpf_obj_pin(prog->instances.fds[instance], path)) {
+   pr_warning("failed to pin program: %s\n", strerror(errno));
+   return -errno;
+   }
+   pr_debug("pinned program '%s'\n", path);
+
+   return 0;
+}
+
+static int make_dir(const char *path)
+{
+   int err = 0;
+
+   if (mkdir(path, 0700) && errno != EEXIST)
+   err = -errno;
+
+   if (err)
+   pr_warning("failed to mkdir %s: %s\n", path, strerror(-err));
+   return err;
+}
+
+int bpf_program__pin(struct bpf_program *prog, const char *path)
+{
+   int i, err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (prog == NULL) {
+   pr_warning("invalid program pointer\n");
+   return -EINVAL;
+   }
+
+   if (prog->instances.nr <= 0) {
+   pr_warning("no instances of prog %s to pin\n",
+  prog->section_name);
+   return -EINVAL;
+   }
+
+   err = make_dir(path);
+   if (err)
+   return err;
+
+   for (i = 0; i < prog->instances.nr; i++) {
+   char buf[PATH_MAX];
+   

Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-31 Thread Arnaldo Carvalho de Melo
Em Tue, Jan 31, 2017 at 01:13:20PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 31, 2017 at 01:08:27PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Jan 30, 2017 at 09:58:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Mon, Jan 30, 2017 at 01:16:18PM -0800, Joe Stringer escreveu:
> > > > On 30 January 2017 at 12:28, Arnaldo Carvalho de Melo <a...@kernel.org> 
> > > > wrote:
> > > > > ---
> > > > > Thus, a return value of size or more means that the output was
> > > > > truncated.
> > > > > ---
> > >  
> > > > Good spotting, I looked over the committed versions and tested them,
> > > > they seem good to me. Thanks!
> > > 
> > > Thanks for checking, will push Ingo's way after a battery of extra
> > > tests, tomorrow,
> > 
> > Which failed for centos:5, centos:6, centos:7, debian:7, debian:8,
> > debian:experimental and others, I stopped the test at this point,
> > working on fixing it.
> > 
> > All seems related to:
> > 
> > libbpf.c:1267: error: 'BPF_FS_MAGIC' undeclared (first use in this function)
> > libbpf.c:1267: error: (Each undeclared identifier is reported only once
> > libbpf.c:1267: error: for each function it appears in.)
> 
> We need to carry a tools/include/uapi/linux/magic.h copy, check if it
> drifts, remove the ifdefs for _FS_MAGIC defines from tools/ and use that
> instead, etc, till then I'll just add the ifdef to libbpf.c.

After also removing that

#include <linux/magic.h>

line, that is not used anywhere else in tools/{perf,include,lib}/ it is
going further:

[root@jouet ~]# time dm
   1 83.120412349 alpine:3.4: Ok
   2 35.486456929 android-ndk:r12b-arm: Ok
   3 85.384259996 archlinux:latest: Ok
   4 49.518031326 centos:5: Ok
   5 70.417375831 centos:6: Ok
   6 87.033156092 centos:7: Ok

31 more to go

:-)

- Arnaldo


Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-31 Thread Arnaldo Carvalho de Melo
Em Tue, Jan 31, 2017 at 01:08:27PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Jan 30, 2017 at 09:58:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Jan 30, 2017 at 01:16:18PM -0800, Joe Stringer escreveu:
> > > On 30 January 2017 at 12:28, Arnaldo Carvalho de Melo <a...@kernel.org> 
> > > wrote:
> > > > ---
> > > > Thus, a return value of size or more means that the output was
> > > > truncated.
> > > > ---
> >  
> > > Good spotting, I looked over the committed versions and tested them,
> > > they seem good to me. Thanks!
> > 
> > Thanks for checking, will push Ingo's way after a battery of extra
> > tests, tomorrow,
> 
> Which failed for centos:5, centos:6, centos:7, debian:7, debian:8,
> debian:experimental and others, I stopped the test at this point,
> working on fixing it.
> 
> All seems related to:
> 
> libbpf.c:1267: error: 'BPF_FS_MAGIC' undeclared (first use in this function)
> libbpf.c:1267: error: (Each undeclared identifier is reported only once
> libbpf.c:1267: error: for each function it appears in.)

We need to carry a tools/include/uapi/linux/magic.h copy, check if it
drifts, remove the ifdefs for _FS_MAGIC defines from tools/ and use that
instead, etc, till then I'll just add the ifdef to libbpf.c.

[acme@jouet linux]$ grep BPF_FS_MAGIC /usr/include/*/*.h
/usr/include/linux/magic.h:#define BPF_FS_MAGIC 0xcafe4a11
[acme@jouet linux]$ rpm -qf /usr/include/linux/magic.h
kernel-headers-4.9.6-200.fc25.x86_64
[acme@jouet linux]$ cat /etc/fedora-release 
Fedora release 25 (Twenty Five)
[acme@jouet linux]$ 

But those other distros don't have it.

- Arnaldo


Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-31 Thread Arnaldo Carvalho de Melo
Em Mon, Jan 30, 2017 at 09:58:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Jan 30, 2017 at 01:16:18PM -0800, Joe Stringer escreveu:
> > On 30 January 2017 at 12:28, Arnaldo Carvalho de Melo <a...@kernel.org> 
> > wrote:
> > > ---
> > > Thus, a return value of size or more means that the output was
> > > truncated.
> > > ---
>  
> > Good spotting, I looked over the committed versions and tested them,
> > they seem good to me. Thanks!
> 
> Thanks for checking, will push Ingo's way after a battery of extra
> tests, tomorrow,

Which failed for centos:5, centos:6, centos:7, debian:7, debian:8,
debian:experimental and others, I stopped the test at this point,
working on fixing it.

All seems related to:

libbpf.c:1267: error: 'BPF_FS_MAGIC' undeclared (first use in this function)
libbpf.c:1267: error: (Each undeclared identifier is reported only once
libbpf.c:1267: error: for each function it appears in.)



Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-30 Thread Arnaldo Carvalho de Melo
Em Mon, Jan 30, 2017 at 01:16:18PM -0800, Joe Stringer escreveu:
> On 30 January 2017 at 12:28, Arnaldo Carvalho de Melo <a...@kernel.org> wrote:
> > ---
> > Thus, a return value of size or more means that the output was
> > truncated.
> > ---
 
> Good spotting, I looked over the committed versions and tested them,
> they seem good to me. Thanks!

Thanks for checking, will push Ingo's way after a battery of extra
tests, tomorrow,

- Arnaldo


Re: [PATCHv3 perf/core 0/6] Libbpf object pinning

2017-01-30 Thread Arnaldo Carvalho de Melo
Em Thu, Jan 26, 2017 at 01:19:55PM -0800, Joe Stringer escreveu:
> This series adds pinning functionality for maps, programs, and objects.
> Library users may call bpf_map__pin(map, path) or bpf_program__pin(prog, path)
> to pin maps and programs separately, or use bpf_object__pin(obj, path) to
> pin all maps and programs from the BPF object to the path. The map and program
> variations require a path where it will be pinned in the filesystem,
> and the object variation will create named directories for each program with
> instances within, and mount the maps by name under the path.
> 
> For example, with the directory '/sys/fs/bpf/foo' and a BPF object which
> contains two instances of a program named 'bar', and a map named 'baz':
> /sys/fs/bpf/foo/bar/0
> /sys/fs/bpf/foo/bar/1
> /sys/fs/bpf/foo/baz

Thanks, applied, after some minor fixes.

- Arnaldo
 
> ---
> v3: Split out bpf_program__pin_instance().
> Change the paths from PATH/{maps,progs}/foo to the above.
> Drop the patches that were applied.
> Add a perf test to check that pinning works.
> v2: Wang Nan provided improvements to patch 1.
> Dropped patch 2 from v1.
> Added acks for acked patches.
> Split the bpf_obj__pin() to also provide map / program pinning APIs.
> Allow users to provide full filesystem path (don't autodetect/mount 
> BPFFS).
> v1: Initial post.
> 
> Joe Stringer (6):
>   tools lib bpf: Add BPF program pinning APIs.
>   tools lib bpf: Add bpf_map__pin()
>   tools lib bpf: Add bpf_object__pin()
>   tools perf util: Make rm_rf(path) argument const
>   tools lib api fs: Add bpf_fs filesystem detector
>   perf test: Add libbpf pinning test
> 
>  tools/lib/api/fs/fs.c  |  16 +
>  tools/lib/api/fs/fs.h  |   1 +
>  tools/lib/bpf/libbpf.c | 188 
> +
>  tools/lib/bpf/libbpf.h |   5 ++
>  tools/perf/tests/bpf.c |  42 ++-
>  tools/perf/util/util.c |   2 +-
>  tools/perf/util/util.h |   2 +-
>  7 files changed, 253 insertions(+), 3 deletions(-)
> 
> -- 
> 2.11.0


Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-30 Thread Arnaldo Carvalho de Melo
Em Mon, Jan 30, 2017 at 05:25:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jan 26, 2017 at 01:19:56PM -0800, Joe Stringer escreveu:
> > Add new APIs to pin a BPF program (or specific instances) to the filesystem.
> > The user can specify the full path within a BPF filesystem to pin the
> > program.
> > 
> > bpf_program__pin_instance(prog, path, n) will pin the nth instance of
> > 'prog' to the specified path.
> > bpf_program__pin(prog, path) will create the directory 'path' (if it
> > does not exist) and pin each instance within that directory. For
> > instance, path/0, path/1, path/2.
> > 
> > Signed-off-by: Joe Stringer <j...@ovn.org>
> 
> make: Entering directory '/home/acme/git/linux/tools/perf'
>   BUILD:   Doing 'make -j4' parallel build
>   CC   /tmp/build/perf/builtin-record.o
>   CC   /tmp/build/perf/libbpf.o
>   CC   /tmp/build/perf/util/parse-events.o
>   INSTALL  trace_plugins
> libbpf.c: In function ‘make_dir’:
> libbpf.c:1303:6: error: implicit declaration of function ‘mkdir’ 
> [-Werror=implicit-function-declaration]
>   if (mkdir(path, 0700) && errno != EEXIST)
>   ^
> libbpf.c:1303:2: error: nested extern declaration of ‘mkdir’ 
> [-Werror=nested-externs]
>   if (mkdir(path, 0700) && errno != EEXIST)
>   ^~
> cc1: all warnings being treated as errors
> mv: cannot stat '/tmp/build/perf/.libbpf.o.tmp': No such file or directory
> /home/acme/git/linux/tools/build/Makefile.build:101: recipe for target 
> '/tmp/build/perf/libbpf.o' failed
> 
> 
> And strdup() is not checked for failure, I'm fixing those,
> 
> +++ b/tools/lib/bpf/libbpf.c
> @@ -36,6 +36,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 

This as well:

@@ -1338,7 +1343,7 @@ int bpf_program__pin(struct bpf_program *prog,
const char *path)
len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
if (len < 0)
return -EINVAL;
-   else if (len > PATH_MAX)
+   else if (len >= PATH_MAX)
return -ENAMETOOLONG;


See 'man snprintf', return value:

---
Thus, a return value of size or more means that the output was
truncated.
---
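
To make the boundary case concrete, a small sketch follows. The helper
format_instance_path() is a hypothetical stand-in for the loop body in
bpf_program__pin(), not the code that was committed; it only shows why a
return value equal to the buffer size has to be treated as truncation.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<path>/<instance>" into buf.
 * Returns 0 on success, -EINVAL on an output error and -ENAMETOOLONG
 * if the result did not fit. That includes the case where snprintf()
 * returns exactly 'size': the terminating NUL then displaced the last
 * character of the string.
 */
static int format_instance_path(char *buf, size_t size,
				const char *path, int instance)
{
	int len = snprintf(buf, size, "%s/%d", path, instance);

	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= size)
		return -ENAMETOOLONG;
	return 0;
}
```

With an 8-byte buffer, formatting "/a/b/c" and instance 0 needs eight
characters plus the NUL, so snprintf() returns 8. A 'len > size' test
would accept that result even though the stored string was cut short.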


Re: [PATCHv3 perf/core 1/6] tools lib bpf: Add BPF program pinning APIs.

2017-01-30 Thread Arnaldo Carvalho de Melo
Em Thu, Jan 26, 2017 at 01:19:56PM -0800, Joe Stringer escreveu:
> Add new APIs to pin a BPF program (or specific instances) to the filesystem.
> The user can specify the full path within a BPF filesystem to pin the
> program.
> 
> bpf_program__pin_instance(prog, path, n) will pin the nth instance of
> 'prog' to the specified path.
> bpf_program__pin(prog, path) will create the directory 'path' (if it
> does not exist) and pin each instance within that directory. For
> instance, path/0, path/1, path/2.
> 
> Signed-off-by: Joe Stringer 

make: Entering directory '/home/acme/git/linux/tools/perf'
  BUILD:   Doing 'make -j4' parallel build
  CC   /tmp/build/perf/builtin-record.o
  CC   /tmp/build/perf/libbpf.o
  CC   /tmp/build/perf/util/parse-events.o
  INSTALL  trace_plugins
libbpf.c: In function ‘make_dir’:
libbpf.c:1303:6: error: implicit declaration of function ‘mkdir’ 
[-Werror=implicit-function-declaration]
  if (mkdir(path, 0700) && errno != EEXIST)
  ^
libbpf.c:1303:2: error: nested extern declaration of ‘mkdir’ 
[-Werror=nested-externs]
  if (mkdir(path, 0700) && errno != EEXIST)
  ^~
cc1: all warnings being treated as errors
mv: cannot stat '/tmp/build/perf/.libbpf.o.tmp': No such file or directory
/home/acme/git/linux/tools/build/Makefile.build:101: recipe for target 
'/tmp/build/perf/libbpf.o' failed


And strdup() is not checked for failure, I'm fixing those,

+++ b/tools/lib/bpf/libbpf.c
@@ -36,6 +36,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 


- Arnaldo

> ---
> v3: Add per-instance pinning.
> Use path for bpf_program__pin() as directory.
> v2: Don't automount BPF filesystem
> Split program, map, object pinning into separate APIs and separate
> patches.
> ---
>  tools/lib/bpf/libbpf.c | 112 
> +
>  tools/lib/bpf/libbpf.h |   3 ++
>  2 files changed, 115 insertions(+)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e6cd62b1264b..d1d7638b7c21 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4,6 +4,7 @@
>   * Copyright (C) 2013-2015 Alexei Starovoitov 
>   * Copyright (C) 2015 Wang Nan 
>   * Copyright (C) 2015 Huawei Inc.
> + * Copyright (C) 2017 Nicira, Inc.
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -22,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -31,7 +33,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  
> @@ -1237,6 +1242,113 @@ int bpf_object__load(struct bpf_object *obj)
>   return err;
>  }
>  
> +static int check_path(const char *path)
> +{
> + struct statfs st_fs;
> + char *dname, *dir;
> + int err = 0;
> +
> + if (path == NULL)
> + return -EINVAL;
> +
> + dname = strdup(path);
> + dir = dirname(dname);
> + if (statfs(dir, &st_fs)) {
> + pr_warning("failed to statfs %s: %s\n", dir, strerror(errno));
> + err = -errno;
> + }
> + free(dname);
> +
> + if (!err && st_fs.f_type != BPF_FS_MAGIC) {
> + pr_warning("specified path %s is not on BPF FS\n", path);
> + err = -EINVAL;
> + }
> +
> + return err;
> +}
> +
> +int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
> +   int instance)
> +{
> + int err;
> +
> + err = check_path(path);
> + if (err)
> + return err;
> +
> + if (prog == NULL) {
> + pr_warning("invalid program pointer\n");
> + return -EINVAL;
> + }
> +
> + if (instance < 0 || instance >= prog->instances.nr) {
> + pr_warning("invalid prog instance %d of prog %s (max %d)\n",
> +instance, prog->section_name, prog->instances.nr);
> + return -EINVAL;
> + }
> +
> + if (bpf_obj_pin(prog->instances.fds[instance], path)) {
> + pr_warning("failed to pin program: %s\n", strerror(errno));
> + return -errno;
> + }
> + pr_debug("pinned program '%s'\n", path);
> +
> + return 0;
> +}
> +
> +static int make_dir(const char *path)
> +{
> + int err = 0;
> +
> + if (mkdir(path, 0700) && errno != EEXIST)
> + err = -errno;
> +
> + if (err)
> + pr_warning("failed to mkdir %s: %s\n", path, strerror(-err));
> + return err;
> +}
> +
> +int bpf_program__pin(struct bpf_program *prog, const char *path)
> +{
> + int i, err;
> +
> + err = check_path(path);
> + if (err)
> + return err;
> +
> + if (prog == NULL) {
> + pr_warning("invalid program pointer\n");
> + return -EINVAL;
> + }
> +
> + if (prog->instances.nr <= 0) {
> + pr_warning("no instances of 

Re: [PATCH net-next 1/3] trace: add variant without spacing in trace_print_hex_seq

2017-01-26 Thread Arnaldo Carvalho de Melo
Em Wed, Jan 25, 2017 at 02:28:16AM +0100, Daniel Borkmann escreveu:
> For upcoming tracepoint support for BPF, we want to dump the program's
> tag. Format should be similar to __print_hex(), but without spacing.
> Add a __print_hex_str() variant for exactly that purpose that reuses
> trace_print_hex_seq().

Steven should be back to his side of the wall soon, will wait for his
Ack, ok?

- Arnaldo
 
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Arnaldo Carvalho de Melo <a...@redhat.com>
> ---
>  include/linux/trace_events.h | 3 ++-
>  include/trace/trace_events.h | 8 +++-
>  kernel/trace/trace_output.c  | 7 ---
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index be00761..cfa475a 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -33,7 +33,8 @@ const char *trace_print_bitmask_seq(struct trace_seq *p, 
> void *bitmask_ptr,
>   unsigned int bitmask_size);
>  
>  const char *trace_print_hex_seq(struct trace_seq *p,
> - const unsigned char *buf, int len);
> + const unsigned char *buf, int len,
> + bool spacing);
>  
>  const char *trace_print_array_seq(struct trace_seq *p,
>  const void *buf, int count,
> diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
> index 467e12f..9f68462 100644
> --- a/include/trace/trace_events.h
> +++ b/include/trace/trace_events.h
> @@ -297,7 +297,12 @@
>  #endif
>  
>  #undef __print_hex
> -#define __print_hex(buf, buf_len) trace_print_hex_seq(p, buf, buf_len)
> +#define __print_hex(buf, buf_len)\
> + trace_print_hex_seq(p, buf, buf_len, true)
> +
> +#undef __print_hex_str
> +#define __print_hex_str(buf, buf_len)
> \
> + trace_print_hex_seq(p, buf, buf_len, false)
>  
>  #undef __print_array
>  #define __print_array(array, count, el_size) \
> @@ -711,6 +716,7 @@
>  #undef __print_flags
>  #undef __print_symbolic
>  #undef __print_hex
> +#undef __print_hex_str
>  #undef __get_dynamic_array
>  #undef __get_dynamic_array_len
>  #undef __get_str
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 5d33a73..30a144b1 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -163,14 +163,15 @@ enum print_line_t trace_print_printk_msg_only(struct 
> trace_iterator *iter)
>  EXPORT_SYMBOL_GPL(trace_print_bitmask_seq);
>  
>  const char *
> -trace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int 
> buf_len)
> +trace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int 
> buf_len,
> + bool spacing)
>  {
>   int i;
>   const char *ret = trace_seq_buffer_ptr(p);
>  
>   for (i = 0; i < buf_len; i++)
> - trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);
> -
> + trace_seq_printf(p, "%s%2.2x", !spacing || i == 0 ? "" : " ",
> +  buf[i]);
>   trace_seq_putc(p, 0);
>  
>   return ret;
> -- 
> 1.9.3


Re: [PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()

2017-01-26 Thread Arnaldo Carvalho de Melo
Em Wed, Jan 25, 2017 at 10:18:22AM +0800, Wangnan (F) escreveu:
> On 2017/1/25 9:16, Joe Stringer wrote:
> > On 24 January 2017 at 17:06, Wangnan (F)  wrote:
> > > On 2017/1/25 9:04, Wangnan (F) wrote:
> > > Is it possible to use directory tree instead?

> > > %s/object/mapname
> > > %s/object/prog/instance
> > I don't think objects have names, so let's assume an object with two
> > program instances named foo, and one map named bar.

> > A call of bpf_object__pin(obj, "/sys/fs/bpf/myobj") would mount with
> > the following files and directories:
> > /sys/fs/bpf/myobj/foo/1
> > /sys/fs/bpf/myobj/foo/2
> > /sys/fs/bpf/myobj/bar

> > Alternatively, if you want to control exactly where you want the
> > progs/maps to be pinned, you can call eg
> > bpf_program__pin_instance(prog, "/sys/fs/bpf/wherever", 0) and that
> > instance will be mounted to /sys/fs/bpf/wherever, or alternatively
> > bpf_program__pin(prog, "/sys/fs/bpf/foo"), and you will end up with
> > /sys/fs/bpf/foo/{0,1}.

> > This looks pretty reasonable to me.

> It looks good to me.

Ok, please continue from perf/core, Ingo merged the first patch of this
patchset today,

- Arnaldo
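
The directory layout agreed on above can be sketched as follows. This is
only an illustration of the naming scheme: describe_pin_layout() is a
hypothetical helper that builds the path strings and pins nothing, and
it assumes zero-based instance numbers as in the
/sys/fs/bpf/foo/{0,1} example.

```c
#include <stdio.h>

/*
 * Hypothetical illustration: writes into 'out', one path per line, the
 * locations where pinning an object at 'root' would place
 * 'nr_instances' instances of program 'prog' and a single map 'map'.
 * Returns 0 on success or -1 if 'out' is too small.
 */
static int describe_pin_layout(char *out, size_t size, const char *root,
			       const char *prog, int nr_instances,
			       const char *map)
{
	size_t used = 0;
	int i, len;

	for (i = 0; i < nr_instances; i++) {
		len = snprintf(out + used, size - used, "%s/%s/%d\n",
			       root, prog, i);
		/* A return of size or more means the output was truncated. */
		if (len < 0 || (size_t)len >= size - used)
			return -1;
		used += len;
	}

	len = snprintf(out + used, size - used, "%s/%s\n", root, map);
	if (len < 0 || (size_t)len >= size - used)
		return -1;

	return 0;
}
```

Called with root "/sys/fs/bpf/myobj", program "foo" with two instances
and map "bar", it lists myobj/foo/0, myobj/foo/1 and myobj/bar.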


[PATCH 16/23] tools lib bpf: Add set/is helpers for all prog types

2017-01-25 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

These bpf_prog_types were exposed in the uapi but there were no
corresponding functions to set these types for programs in libbpf.

Signed-off-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-4-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c |  5 +
 tools/lib/bpf/libbpf.h | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 371cb40a2304..406838fa9c4f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1448,8 +1448,13 @@ bool bpf_program__is_##NAME(struct bpf_program *prog)
\
return bpf_program__is_type(prog, TYPE);\
 }  \
 
+BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER);
 BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS);
+BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT);
 BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
+BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
+BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a5a8b86a06fe..2188ccdc0e2d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -174,11 +174,21 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n);
 /*
  * Adjust type of bpf program. Default is kprobe.
  */
+int bpf_program__set_socket_filter(struct bpf_program *prog);
 int bpf_program__set_tracepoint(struct bpf_program *prog);
 int bpf_program__set_kprobe(struct bpf_program *prog);
+int bpf_program__set_sched_cls(struct bpf_program *prog);
+int bpf_program__set_sched_act(struct bpf_program *prog);
+int bpf_program__set_xdp(struct bpf_program *prog);
+int bpf_program__set_perf_event(struct bpf_program *prog);
 
+bool bpf_program__is_socket_filter(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);
+bool bpf_program__is_sched_cls(struct bpf_program *prog);
+bool bpf_program__is_sched_act(struct bpf_program *prog);
+bool bpf_program__is_xdp(struct bpf_program *prog);
+bool bpf_program__is_perf_event(struct bpf_program *prog);
 
 /*
  * We don't need __attribute__((packed)) now since it is
-- 
2.9.3



[PATCH 15/23] tools lib bpf: Define prog_type fns with macro

2017-01-25 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Turning this into a macro allows future prog types to be added with a
single line per type.

Signed-off-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-3-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 671d5ad07cf1..371cb40a2304 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1428,37 +1428,28 @@ static void bpf_program__set_type(struct bpf_program 
*prog,
prog->type = type;
 }
 
-int bpf_program__set_tracepoint(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-   return 0;
-}
-
-int bpf_program__set_kprobe(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
-   return 0;
-}
-
 static bool bpf_program__is_type(struct bpf_program *prog,
 enum bpf_prog_type type)
 {
return prog ? (prog->type == type) : false;
 }
 
-bool bpf_program__is_tracepoint(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-}
-
-bool bpf_program__is_kprobe(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_KPROBE);
-}
+#define BPF_PROG_TYPE_FNS(NAME, TYPE)  \
+int bpf_program__set_##NAME(struct bpf_program *prog)  \
+{  \
+   if (!prog)  \
+   return -EINVAL; \
+   bpf_program__set_type(prog, TYPE);  \
+   return 0;   \
+}  \
+   \
+bool bpf_program__is_##NAME(struct bpf_program *prog)  \
+{  \
+   return bpf_program__is_type(prog, TYPE);\
+}  \
+
+BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
-- 
2.9.3
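
A standalone sketch of the same macro pattern may help when reading the
diff. The demo_ names, the enum and the struct below are placeholders
invented for this example so that it compiles without libbpf; only the
shape of the macro mirrors the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholders for the real uapi enum and the libbpf program struct. */
enum demo_prog_type {
	DEMO_PROG_TYPE_KPROBE = 1,
	DEMO_PROG_TYPE_TRACEPOINT,
};

struct demo_program {
	enum demo_prog_type type;
};

/* One expansion generates both the setter and the predicate. */
#define DEMO_PROG_TYPE_FNS(NAME, TYPE)				\
int demo_program__set_##NAME(struct demo_program *prog)	\
{								\
	if (!prog)						\
		return -EINVAL;					\
	prog->type = TYPE;					\
	return 0;						\
}								\
								\
bool demo_program__is_##NAME(struct demo_program *prog)	\
{								\
	return prog ? (prog->type == TYPE) : false;		\
}

DEMO_PROG_TYPE_FNS(kprobe, DEMO_PROG_TYPE_KPROBE)
DEMO_PROG_TYPE_FNS(tracepoint, DEMO_PROG_TYPE_TRACEPOINT)
```

Supporting another program type then takes one more DEMO_PROG_TYPE_FNS()
line plus the matching declarations in the header.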



[PATCH 17/23] tools lib bpf: Add libbpf_get_error()

2017-01-25 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

This function will turn a libbpf pointer into a standard error code (or
0 if the pointer is valid).

This also allows removal of the dependency on linux/err.h in the public
header file, which causes problems in userspace programs built against
libbpf.

Signed-off-by: Joe Stringer <j...@ovn.org>
Acked-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-5-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c  | 8 
 tools/lib/bpf/libbpf.h  | 4 +++-
 tools/perf/tests/llvm.c | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 406838fa9c4f..e6cd62b1264b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1542,3 +1543,10 @@ bpf_object__find_map_by_offset(struct bpf_object *obj, 
size_t offset)
}
return ERR_PTR(-ENOENT);
 }
+
+long libbpf_get_error(const void *ptr)
+{
+   if (IS_ERR(ptr))
+   return PTR_ERR(ptr);
+   return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2188ccdc0e2d..4014d1ba5e3d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -22,8 +22,8 @@
 #define __BPF_LIBBPF_H
 
 #include 
+#include 
 #include 
-#include 
 #include   // for size_t
 
 enum libbpf_errno {
@@ -234,4 +234,6 @@ int bpf_map__set_priv(struct bpf_map *map, void *priv,
  bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
 
+long libbpf_get_error(const void *ptr);
+
 #endif
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 02a33ebcd992..d357dab72e68 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -13,7 +13,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
struct bpf_object *obj;
 
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, NULL);
-   if (IS_ERR(obj))
+   if (libbpf_get_error(obj))
return TEST_FAIL;
bpf_object__close(obj);
return TEST_OK;
-- 
2.9.3
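
For context, a minimal userspace sketch of the pointer-encoded error
convention that libbpf_get_error() hides from callers. The demo_ helpers
are illustrative reimplementations, not the kernel or libbpf macros, and
they assume the usual convention that the top 4095 address values encode
negative errno codes.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: errors are encoded in the last 4095 pointer values. */
#define DEMO_MAX_ERRNO	4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if 'ptr' falls inside the range reserved for error codes. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Same contract as described above: error code, or 0 if valid. */
static long demo_get_error(const void *ptr)
{
	if (demo_is_err(ptr))
		return (long)(intptr_t)ptr;
	return 0;
}
```

A caller can then write 'if (demo_get_error(obj))' without pulling in
linux/err.h, which is the point of exporting the function.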



[PATCH 14/23] tools lib bpf: Fix map offsets in relocation

2017-01-25 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to
fix map resolution by identifying the number of symbols that point to
maps, and using this number to resolve each of the maps.

However, during relocation the original definition of the map size was
still in use. For up to two maps, the calculation was correct if there
was a small difference in size between the map definition in libbpf and
the one that the client library uses. However if the difference was
large, particularly if more than two maps were used in the BPF program,
the relocation would fail.

For example, when using a map definition with size 28, with three maps,
map relocation would count:

(sym_offset / sizeof(struct bpf_map_def) => map_idx)
(0 / 16 => 0), ie map_idx = 0
(28 / 16 => 1), ie map_idx = 1
(56 / 16 => 3), ie map_idx = 3

So, libbpf reports:

libbpf: bpf relocation: map_idx 3 large than 2

Fix map relocation by checking the exact offset of maps when doing
relocation.

Signed-off-by: Joe Stringer <j...@ovn.org>
[Allow different map size in an object]
Signed-off-by: Wang Nan <wangn...@huawei.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: netdev@vger.kernel.org
Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution")
Link: http://lkml.kernel.org/r/20170123011128.26534-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/lib/bpf/libbpf.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 84e6b35da4bd..671d5ad07cf1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -779,7 +779,7 @@ static int
 bpf_program__collect_reloc(struct bpf_program *prog,
   size_t nr_maps, GElf_Shdr *shdr,
   Elf_Data *data, Elf_Data *symbols,
-  int maps_shndx)
+  int maps_shndx, struct bpf_map *maps)
 {
int i, nrels;
 
@@ -829,7 +829,15 @@ bpf_program__collect_reloc(struct bpf_program *prog,
return -LIBBPF_ERRNO__RELOC;
}
 
-   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
+   for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+   if (maps[map_idx].offset == sym.st_value) {
+   pr_debug("relocation: find map %zd (%s) for insn %u\n",
+map_idx, maps[map_idx].name, insn_idx);
+   break;
+   }
+   }
+
if (map_idx >= nr_maps) {
pr_warning("bpf relocation: map_idx %d large than %d\n",
   (int)map_idx, (int)nr_maps - 1);
@@ -953,7 +961,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
err = bpf_program__collect_reloc(prog, nr_maps,
 shdr, data,
 obj->efile.symbols,
-obj->efile.maps_shndx);
+obj->efile.maps_shndx,
+obj->maps);
if (err)
return err;
}
-- 
2.9.3
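
The TODO in the hunk above mentions bsearch. A minimal sketch of what that
could look like follows, assuming the maps array really is sorted by offset
as the comment states. struct sketch_map, cmp_offset() and find_map_idx() are
hypothetical names for illustration and do not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down map record, sorted by ascending offset. */
struct sketch_map {
	size_t offset;
	const char *name;
};

/* bsearch() comparator: first argument is the key, second an array element. */
static int cmp_offset(const void *key, const void *elem)
{
	size_t want = *(const size_t *)key;
	size_t have = ((const struct sketch_map *)elem)->offset;

	return (want > have) - (want < have);
}

/*
 * Returns the index of the map at sym_offset, or nr_maps when there is no
 * exact match, mirroring what the linear loop leaves in map_idx.
 */
size_t find_map_idx(const struct sketch_map *maps, size_t nr_maps,
		    size_t sym_offset)
{
	const struct sketch_map *hit;

	if (nr_maps == 0)
		return 0;
	assert(maps != NULL);

	hit = bsearch(&sym_offset, maps, nr_maps, sizeof(*maps), cmp_offset);
	return hit ? (size_t)(hit - maps) : nr_maps;
}
```

For the handful of maps a typical object carries, the linear scan in the
patch is likely just as good; the binary search only matters for large map
counts.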



Re: [PATCHv2 perf/core 0/7] Libbpf improvements

2017-01-24 Thread Arnaldo Carvalho de Melo
On Sun, Jan 22, 2017 at 05:11:21PM -0800, Joe Stringer wrote:
> Patch 1 fixes an issue when using drastically different BPF map definitions
> inside ELFs from a client using libbpf, vs the map definition libbpf uses.
> 
> Patches 2-4 add some simple, useful helper functions for setting prog type
> and retrieving libbpf errors without depending on kernel headers from
> userspace programs.
> 
> Patches 5-7 add a new pinning functionality for maps, programs, and objects.
> Library users may call bpf_map__pin(map, path) or bpf_program__pin(prog, path)
> to pin maps and programs separately, or use bpf_object__pin(obj, path) to
> pin all maps and programs from the BPF object to the path. The map and program
> variations require a full path where it will be pinned in the filesystem,
> and the object variation will create directories "maps/" and "progs/" under
> the specified path, then mount each map and program under those 
> subdirectories.
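
As a rough illustration of the layout described in the cover letter, the
sketch below only builds the path strings an object-level pin would end up
using. The helper name sketch_pin_path() and the idea of naming each entry
after the map or program are assumptions made for illustration; the cover
letter only states that "maps/" and "progs/" are created under the given
path:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<base>/<subdir>/<name>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 * Hypothetical helper: it does no pinning and touches no filesystem.
 */
int sketch_pin_path(char *buf, size_t len, const char *base,
		    const char *subdir, const char *name)
{
	int n;

	assert(buf && base && subdir && name);
	n = snprintf(buf, len, "%s/%s/%s", base, subdir, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass "maps" or "progs" as subdir, then hand the resulting
string to the per-map or per-program pin call named in the cover letter.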

Merged the ones either acked by Wang or adjusted by you to address
Wang's remarks. As for the last ones, introducing those __pin() methods,
please provide users together with those APIs, preferably entries for
'perf test'.

- Arnaldo
 
> ---
> v1: Initial post.
> v2: Wang Nan provided improvements to patch 1.
>     Dropped patch 2 from v1.
>     Added acks for acked patches.
>     Split the bpf_obj__pin() to also provide map / program pinning APIs.
>     Allow users to provide full filesystem path (don't autodetect/mount BPFFS).
> 
> Joe Stringer (7):
>   tools lib bpf: Fix map offsets in relocation
>   tools lib bpf: Define prog_type fns with macro
>   tools lib bpf: Add set/is helpers for all prog types
>   tools lib bpf: Add libbpf_get_error()
>   tools lib bpf: Add bpf_program__pin()
>   tools lib bpf: Add bpf_map__pin()
>   tools lib bpf: Add bpf_object__pin()
> 
>  tools/lib/bpf/libbpf.c  | 240 ++--
>  tools/lib/bpf/libbpf.h  |  17 +++-
>  tools/perf/tests/llvm.c |   2 +-
>  3 files changed, 229 insertions(+), 30 deletions(-)
> 
> -- 
> 2.11.0


Re: [patch] samples/bpf: silence shift wrapping warning

2017-01-24 Thread Arnaldo Carvalho de Melo
On Mon, Jan 23, 2017 at 10:44:34PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 23, 2017 at 5:27 AM, Arnaldo Carvalho de Melo
> <arnaldo.m...@gmail.com> wrote:
> > On Sun, Jan 22, 2017 at 02:51:25PM -0800, Alexei Starovoitov wrote:
> >> On Sat, Jan 21, 2017 at 07:51:43AM +0300, Dan Carpenter wrote:
> >> > max_key is a value in the 0-63 range, so on 32 bit systems the shift
> >> > could wrap.
> >> >
> >> > Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>
> >>
> >> Looks fine. I think 'net-next' is ok.
> >
> > I could process these patches, if that would help,
> 
> Thanks for the offer.
> I don't think there will be conflicts with all the work happening in net-next,
> but it's best to avoid even possibility of it when we can.

Okay sir, I'll let you know when/if the tests I perform building
samples/bpf/ in my containers catch something,

- Arnaldo

> Dan,
> can you please resend the patch cc-ing Dave and netdev ?
> please mention [PATCH net-next] in the subject.
> 
> > - Arnaldo
> >
> >> Acked-by: Alexei Starovoitov <a...@kernel.org>
> >
> >> > diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
> >> > index ec8f3bb..bd06eef 100644
> >> > --- a/samples/bpf/lwt_len_hist_user.c
> >> > +++ b/samples/bpf/lwt_len_hist_user.c
> >> > @@ -68,7 +68,7 @@ int main(int argc, char **argv)
> >> > for (i = 1; i <= max_key + 1; i++) {
> >> > stars(starstr, data[i - 1], max_value, MAX_STARS);
> >> > printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
> >> > -  (1l << i) >> 1, (1l << i) - 1, data[i - 1],
> >> > +  (1ULL << i) >> 1, (1ULL << i) - 1, data[i - 1],
> >> >MAX_STARS, starstr);
> >> > }
> >> >
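
To make the wrapping concern concrete: where long is 32 bits wide, shifting
1l left by 32 or more is undefined, and even a 64-bit type cannot be shifted
by 64. The sketch below computes the histogram bucket bounds in unsigned
long long and saturates the last bucket. The helper names bucket_low() and
bucket_high() are hypothetical, and the saturation at i == 64 is an
assumption added here, not something the quoted patch does:

```c
#include <assert.h>
#include <limits.h>

/*
 * Lower bound of 1-based log2 bucket i, i.e. 2^(i-1).
 * Equivalent to (1ULL << i) >> 1 for i in 1..63, but also defined for i == 64.
 */
unsigned long long bucket_low(unsigned int i)
{
	assert(i >= 1 && i <= 64);
	return 1ULL << (i - 1);
}

/* Upper bound of bucket i, i.e. 2^i - 1, saturating for the last bucket. */
unsigned long long bucket_high(unsigned int i)
{
	assert(i >= 1 && i <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so saturate instead. */
	return i == 64 ? ULLONG_MAX : (1ULL << i) - 1;
}
```

Values produced this way would need a matching conversion such as %llu when
printed.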


Re: [patch] samples/bpf: silence shift wrapping warning

2017-01-23 Thread Arnaldo Carvalho de Melo
On Sun, Jan 22, 2017 at 02:51:25PM -0800, Alexei Starovoitov wrote:
> On Sat, Jan 21, 2017 at 07:51:43AM +0300, Dan Carpenter wrote:
> > max_key is a value in the 0-63 range, so on 32 bit systems the shift
> > could wrap.
> > 
> > Signed-off-by: Dan Carpenter 
> 
> Looks fine. I think 'net-next' is ok.

I could process these patches, if that would help,

- Arnaldo
 
> Acked-by: Alexei Starovoitov 
 
> > diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
> > index ec8f3bb..bd06eef 100644
> > --- a/samples/bpf/lwt_len_hist_user.c
> > +++ b/samples/bpf/lwt_len_hist_user.c
> > @@ -68,7 +68,7 @@ int main(int argc, char **argv)
> > for (i = 1; i <= max_key + 1; i++) {
> > stars(starstr, data[i - 1], max_value, MAX_STARS);
> > printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
> > -  (1l << i) >> 1, (1l << i) - 1, data[i - 1],
> > +  (1ULL << i) >> 1, (1ULL << i) - 1, data[i - 1],
> >MAX_STARS, starstr);
> > }
> >  


Re: [PATCH perf/core REBASE 3/5] tools lib bpf: Add bpf_prog_{attach,detach}

2016-12-21 Thread Arnaldo Carvalho de Melo
On Tue, Dec 20, 2016 at 10:50:22AM -0800, Joe Stringer wrote:
> On 20 December 2016 at 06:32, Arnaldo Carvalho de Melo <a...@kernel.org> wrote:
> > On Tue, Dec 20, 2016 at 11:18:51AM -0300, Arnaldo Carvalho de Melo wrote:
> >> This one makes it fail for CentOS 5 and 6, others may fail as well,
> >> still building, investigating...
> >
> > Ok, fixed it by making it follow the model of the other sys_bpf wrappers
> > setting up that bpf_attr union wrt initializing unnamed struct members:
> > -   union bpf_attr attr = {
> > -   .target_fd = target_fd,
> > -   };
> > +   union bpf_attr attr;
> > +
> > +   bzero(&attr, sizeof(attr));
> > +   attr.target_fd = target_fd;
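
The pattern in the quoted hunk (zero the whole union, then assign fields one
by one) avoids designated initializers for members of anonymous structs,
which presumably is what the older compilers on CentOS 5 and 6 reject. A
self-contained sketch of the same pattern follows, using a hypothetical
union sketch_attr in place of the real union bpf_attr and memset() in place
of bzero():

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for union bpf_attr: several anonymous structs
 * overlay the same storage, one per command.
 */
union sketch_attr {
	struct {			/* "create" style command */
		unsigned int map_type;
		unsigned int key_size;
	};
	struct {			/* "attach" style command */
		unsigned int target_fd;
		unsigned int attach_bpf_fd;
		unsigned int attach_type;
	};
};

/*
 * Clear every byte first so unused fields and padding are zero, then set
 * only the members the command needs. No designated initializer is used.
 */
void sketch_attr_init_attach(union sketch_attr *attr, unsigned int target_fd,
			     unsigned int prog_fd, unsigned int type)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->target_fd = target_fd;
	attr->attach_bpf_fd = prog_fd;
	attr->attach_type = type;
}
```

Zeroing first also matters for the syscall itself, since unused bytes of the
attribute union are expected to be zero.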

> Ah, I just shifted these across originally so the delta would be
> minimal but now I know why this code is like this. Thanks.

np, making sure this code works in all those environments requires
automation; I'd say it's impossible otherwise, too many details :-\

Fixed, pushed, merged, should hit 4.10 soon :-)

- Arnaldo


[PATCH 19/29] samples/bpf: Make samples more libbpf-centric

2016-12-20 Thread Arnaldo Carvalho de Melo
From: Joe Stringer <j...@ovn.org>

Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allows the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text PROGBITS 40 00 00  AX  0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00  AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS 000118 000318 00  AX  0   0  8
  [ 7] maps  PROGBITS 000430 50 00  WA  0   0  4
  [ 8] license   PROGBITS 000480 04 00  WA  0   0  1
  [ 9] version   PROGBITS 000484 04 00  WA  0   0  4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
  Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
  JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
  #

Signed-off-by: Joe Stringer <j...@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 samples/bpf/bpf_load.c| 17 +---
 samples/bpf/bpf_load.h|  3 +++
 samples/bpf/fds_example.c |  9 ---
 samples/bpf/lathist_user.c|  2 +-
 samples/bpf/libbpf.c  | 23 
 samples/bpf/libbpf.h  | 18 ++---
 samples/bpf/lwt_len_hist_user.c   |  6 +++--
 samples/bpf/offwaketime_user.c|  8 +++---
 samples/bpf/sampleip_user.c   |  4 +--
 samples/bpf/sock_example.c| 12 +
 samples/bpf/sockex1_user.c|  6 ++---
 samples/bpf/sockex2_user.c|  4 +--
 samples/bpf/sockex3_user.c|  4 +--
 samples/bpf/spintest_user.c   |  8 +++---
 samples/bpf/tc_l2_redirect_user.c |  4 +--
 samples/bpf/test_cgrp2_array_pin.c|  4 +--
 samples/bpf/test_cgrp2_attach.c   | 11 +---
 samples/bpf/test_cgrp2_attach2.c  |  7 +++--
 samples/bpf/test_cgrp2_sock.c |  6 +++--
 samples/bpf/test_current_task_under_cgroup_user.c |  8 +++---
 samples/bpf/test_lru_dist.c   | 32 +++
 samples/bpf/test_probe_write_user_user.c  |  2 +-
 samples/bpf/trace_event_user.c| 14 +-
 samples/bpf/trace_output_user.c   |  2 +-
 samples/bpf/tracex2_user.c  

[PATCH 25/29] samples/bpf: Switch over to libbpf

2016-12-20 Thread Arnaldo Carvalho de Melo
PROGBITS 000484 04 00  WA  0   0  4
[10] .symtab   SYMTAB   000488 000120 18  1   4  8
  Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
[root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <j...@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Wang Nan <wangn...@huawei.com>
Cc: netdev@vger.kernel.org
Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47...@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 samples/bpf/Makefile |  68 +---
 samples/bpf/README.rst   |   4 +-
 samples/bpf/bpf_load.c   |   3 +-
 samples/bpf/fds_example.c|   3 +-
 samples/bpf/libbpf.c | 111 ---
 samples/bpf/libbpf.h |  19 +--
 samples/bpf/sock_example.c   |   3 +-
 samples/bpf/test_cgrp2_attach.c  |   3 +-
 samples/bpf/test_cgrp2_attach2.c |   3 +-
 samples/bpf/test_cgrp2_sock.c|   3 +-
 10 files changed, 52 insertions(+), 168 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f2219c1489e5..81b0ef2f7994 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -35,40 +35,43 @@ hostprogs-y += tc_l2_redirect
 hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 
-test_lru_dist-objs := test_lru_dist.o libbpf.o
-sock_example-objs := sock_example.o libbpf.o
-fds_example-objs := bpf_load.o libbpf.o fds_example.o
-sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
-sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
-sockex3-objs := bpf_load.o libbpf.o sockex3_user.o
-tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
-tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
-tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
-tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
-tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
-tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
-test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o
-trace_output-objs := bpf_load.o libbpf.o trace_output_user.o
-lathist-objs := bpf_load.o libbpf.o lathist_user.o
-offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
-spintest-objs := bpf_load.o libbpf.o spintest_user.o
-map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
-test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
-test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
-test_cgrp2_attach2-objs := libbpf.o test_cgrp2_attach2.o cgroup_helpers.o
-test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
-test_cgrp2_sock2-objs := bpf_load.o libbpf.o test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# Libbpf dependencies
+LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+
+test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
+sock_example-objs := sock_example.o $(LIBBPF)
+fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
+sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
+sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
+sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
+tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
+tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
+tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
+tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
+tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
+tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_wr

[GIT PULL 00/29] perf/core improvements and fixes

2016-12-20 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling, I had most of this queued before your first
pull req to Linus for 4.10, most are fixes, with 'perf sched timehist --idle'
as a followup new feature to the 'perf sched timehist' command introduced in
this window.

One other thing that delayed this was the samples/bpf/ switch to
tools/lib/bpf/, which involved fixing up merge clashes with net.git and
properly testing it. That took more rounds than anticipated, but all seems OK
now and it would be good to get these merge issues behind us ASAP.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:

  Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161220

for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:

  samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)


perf/core improvements and fixes:

New features:

- Introduce 'perf sched timehist --idle', to analyse processes
  going to/from idle state (Namhyung Kim)

Fixes:

- Allow 'perf record -u user' to continue when facing races with threads
  going away after having scanned them via /proc (Jiri Olsa)

- Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)

- Support jumps with multiple arguments (Ravi Bangoria)

- Fix jumps to before the function where they are located (Ravi Bangoria)

- Fix lock-pi help string (Davidlohr Bueso)

- Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)

- Do not overwrite valid build id in 'perf diff' (Kan Liang)

- Don't throw error for zero length symbols, allowing the use of the TUI
  in PowerPC, where such symbols became more common recently (Ravi Bangoria)

Infrastructure:

- Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
  duplication (Joe Stringer)

- Move headers check into bash script (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>

----
Arnaldo Carvalho de Melo (3):
  perf tools: Remove some needless __maybe_unused
  samples/bpf: Make perf_event_read() static
  samples/bpf: Be consistent with bpf_load_program bpf_insn parameter

Davidlohr Bueso (1):
  perf bench futex: Fix lock-pi help string

Jiri Olsa (7):
  perf tools: Move headers check into bash script
  perf mem: Fix --all-user/--all-kernel options
  perf evsel: Use variable instead of repeating lengthy FD macro
  perf thread_map: Add thread_map__remove function
  perf evsel: Allow to ignore missing pid
  perf record: Force ignore_missing_thread for uid option
  perf trace: Check if MAP_32BIT is defined (again)

Joe Stringer (8):
  tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
  tools lib bpf: use __u32 from linux/types.h
  tools lib bpf: Add flags to bpf_create_map()
  samples/bpf: Make samples more libbpf-centric
  samples/bpf: Switch over to libbpf
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Move open_raw_sock to separate header

Kan Liang (1):
  perf diff: Do not overwrite valid build id

Namhyung Kim (6):
  perf sched timehist: Split is_idle_sample()
  perf sched timehist: Introduce struct idle_time_data
  perf sched timehist: Save callchain when entering idle
  perf sched timehist: Skip non-idle events when necessary
  perf sched timehist: Add -I/--idle-hist option
  perf sched timehist: Show callchains for idle stat

Ravi Bangoria (3):
  perf annotate: Support jump instruction with target as second operand
  perf annotate: Fix jump target outside of function address range
  perf annotate: Don't throw error for zero length symbols

 samples/bpf/Makefile  |  70 +--
 samples/bpf/README.rst|   4 +-
 samples/bpf/bpf_load.c|  21 +-
 samples/bpf/bpf_load.h|   3 +
 samples/bpf/fds_example.c |  13 +-
 samples/bpf/lathist_user.c|   2 +-
 samples/bpf/libbpf.c  | 176 ---
 samples/bpf/libbpf.h  |  28 +-
 samples/bpf/lwt_len_hist_user.c   |   6 +-
 samples/bpf/offwaketime_user.c|   8 +-
 samples/bpf/sampleip_user.c   |   7 +-
 samples/bpf/sock_example.c|  14 +-
 samples/bpf/sock_example.h|  35 ++
 samples/bpf/sockex1_user.c|   7 +-
 samples/bpf/sockex2_user.c|   5 +-
 samples/bpf/sockex3_user.c   
