Hi, Ok, I found the problem. It still exists upstream , just very tricky to trigger. Took me lots of time with gdb + watchpoints to track this down, where in fact it was just in front of me.
>From the crashdump: Program received signal SIGSEGV, Segmentation fault. free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) I was puzzled by the 0xffffffff00000000. I tracked down where this value was coming from using watchpoints. In my case the memory was used before by elfutils to back the struct Elf. The -1 in the upper bits came from: file_read_elf () at third_party/elfutils/libelf/elf_begin.c:451 elf->state.elf64.scns.data[cnt].shndx_index = -1; And yet the next access to that memory location was in the crash. That meant the memory was released by elfutils and reused by ordered_events, yet without any initialization. But looking at alloc_event(), it was not obvious to figure out how a new_event->event could be uninitialized. Well, it turns out there is this little hack where the code commandeers the first element in the oe->buffer to use as a list_head for the oe->to_free freelist. The problem is that this entry also gets freed, but its event->event field is NEVER initialized. So depending on how the memory was previously used, you could get a on NULL value and crash in free_dup_event(). This is what happened to me. I am glad I pursued this further because the bug is still in the upstream version. The patch is a one-liner fixing the initialization of the event->event = NULL. For the other elements in the list, the initialization is already done at the end of alloc_event(). I will send the patch separately. On Wed, Aug 8, 2018 at 1:23 AM Jiri Olsa <[email protected]> wrote: > > On Tue, Aug 07, 2018 at 12:11:05PM -0700, Stephane Eranian wrote: > > Jiri, > > On Tue, Aug 7, 2018 at 1:50 AM Jiri Olsa <[email protected]> wrote: > > > > > > On Tue, Aug 07, 2018 at 01:16:22AM -0700, Stephane Eranian wrote: > > > > On Tue, Aug 7, 2018 at 12:20 AM Jiri Olsa <[email protected]> wrote: > > > > > > > > > > On Mon, Aug 06, 2018 at 06:23:35PM -0700, Stephane Eranian wrote: > > > > > > Depending on memory allocations, it was possible to get a SEGFAULT > > > > > > in > > > > > > free_dup_event() because the event pointer was bogus: > > > > > > > > > > > > perf[1354]: segfault at ffffffff00000006 ip 00000000004b7fc7 > > > > > > > > > > is there any reproducer? > > > > > > > > > The cmdline is simple: > > > > $ perf record -e cycles:pp -o - -a sleep 1 | perf inject -b -i - > > > > >/dev/null > > > > I was using v4.13 for my tests and it may be sensitive to compiler. > > > > Was using LLVM. > > > > > > I can't make it fail even when I compile with clang 'make CC=clang' > > > > > I checked, my actual reproducer is: > > $ perf record -o - -e cycles date | perf inject -b -i - >/dev/null > > Tue Aug 7 12:03:48 PDT 2018 > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.000 MB - ] > > Segmentation fault (core dumped) > > > > the crash is in perf inject. > > So if I do: > > $ perf record -o - -e cycles date >tt > > $ gdb perf > > (gdb) r inject -b -i - < tt >/dev/null > > Program received signal SIGSEGV, Segmentation fault. > > free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at > > util/ordered-events.c:85 > > 85 in util/ordered-events.c > > (gdb) bt > > #0 free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at > > util/ordered-events.c:85 > > #1 ordered_events__free (oe=0x26a39a0) at util/ordered-events.c:310 > > #2 0x00000000004b5a56 in __perf_session__process_pipe_events > > (session=<optimized out>) at util/session.c:1753 > > #3 perf_session__process_events (session=<optimized out>) at > > util/session.c:1932 > > #4 0x000000000043a2eb in __cmd_inject (inject=<optimized out>) at > > builtin-inject.c:750 > > #5 cmd_inject (argc=<optimized out>, argv=<optimized out>) at > > builtin-inject.c:924 > > #6 0x000000000046b175 in run_builtin (p=0xabc640 <commands+576>, > > argc=4, argv=0x7fffffffe560) at perf.c:297 > > #7 0x000000000046b062 in handle_internal_command (argc=4, > > argv=0x7fffffffe560) at perf.c:349 > > #8 0x000000000046a5e8 in run_argv (argcp=<optimized out>, > > argv=<optimized out>) at perf.c:393 > > #9 main (argc=4, argv=0x7fffffffe560) at perf.c:531 > > nice > > > > > Again, this is with an older version of perf compiled with LLVM. > > can you also reproduce it with latest code? > > > Notice the value of event passed to free_dup_event(): 0xffffffff00000000 > > And yes, I checked sizeof(union_perf_event) = 4168 which is the size of > > the mmap2_event which is the largest. > > > > I also checked that you are freeing what you have actually allocated. > > No double free. > > If I add the padding or modifies the call to memdup() as in the patch, > > then the problem > > goes away. > > > > If you don't want to copy 4Kb each time, then you could also make the > > event field > > by a void *event and case whenever needed. > > not sure I understand this one.. could you please elaborate? > > > > > I suspect the problem may come from a compiler optimization or assumption > > which > > clashes with what you are optimizing here. > > I'll go throught the code and try to find somethign.. thanks a lot for the > backtrace > > jirka >

