----- Original Message -----
> From: "Steven Rostedt" <[email protected]>
> To: "Chunyu Hu" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, June 22, 2016 9:08:14 PM
> Subject: Re: [PATCH] tracing: Fix oops caused by graph notrace filter
> 
> On Tue, 21 Jun 2016 18:57:46 +0800
> Chunyu Hu <[email protected]> wrote:
> 
> > wakeup tracer can use function_graph trace when display_graph trace
> > option is setup by user via tracefs, and bypass the set_graph_function
> > and set_graph_notrace. But the bypass of set_graph_notrace is not clean.
> > Although wakeup_graph_entry does most of the bypass, and both the enry
> > and exit event will be submitted to the trace ring buffer, the ret_stack
> > index, which will be assigned to depth field of graph enrty event is not
> > handled. The issue is that the depth is used as the array index of
> > fgraph_cpu_data and can cause an oops when it's negative. irqsoff tracer
> > has same issue. To see the oops:
> > 
> > echo 1 > options/display_graph
> > echo schedule > set_graph_notrace
> > echo wakeup > current_tracer
> > cat trace
> > cat trace
> 
> I'm not able to trigger an oops, but I do see them being traced.

Actually this needs to 'cat trace' for more than two times, after collecting
three graph entry events at least, as the print showed the depth was
-65534. One example call trace is like this, it's usually the same call trace:

[73710.161617] BUG: unable to handle kernel paging request at ffffe8fdf79836e0 
[73710.168616] IP: [<ffffffff81161254>] print_graph_entry+0x3c4/0x3e0 
[73710.174815] PGD 272759067 PUD 272758067 PMD 0  
[73710.179313] Oops: 0002 [#1] SMP 
[73710.182454] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs 
fscache rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod 
ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm 
ib_cm iw_cm ocrdma ib_core intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul 
glue_helper ablk_helper be2iscsi ipmi_ssif cryptd ipmi_devintf cdc_ether 
iscsi_boot_sysfs libiscsi sg ipmi_si scsi_transport_iscsi iTCO_wdt usbnet 
pcspkr iTCO_vendor_support mii ipmi_msghandler ioatdma shpchp dca lpc_ich 
i2c_i801 i7core_edac mfd_core edac_core acpi_cpufreq nfsd auth_rpcgss nfs_acl 
lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic 
pata_acpi drm mptsas ata_piix scsi_transport_sas mptscsih libata crc32c_intel 
serio_raw be2net mptbase i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod 
[73710.269323] CPU: 8 PID: 31611 Comm: cat Not tainted 4.7.0-rc3 #1 
[73710.275325] Hardware name: IBM System x3550 M3 -[7944I21]-/69Y4438     , 
BIOS -[D6E157AUS-1.15]- 06/13/2012 
[73710.285054] task: ffff88007f213d80 ti: ffff88007f238000 task.ti: 
ffff88007f238000 
[73710.292528] RIP: 0010:[<ffffffff81161254>]  [<ffffffff81161254>] 
print_graph_entry+0x3c4/0x3e0 
[73710.301149] RSP: 0018:ffff88007f23bc68  EFLAGS: 00010286 
[73710.306458] RAX: ffffe8fdf7a036c0 RBX: ffff88007f2f0000 RCX: 
ffffffff810ef390 
[73710.313585] RDX: ffffffffffff0002 RSI: 0000000000000282 RDI: 
ffffffff81c81de0 
[73710.320713] RBP: ffff88007f23bca8 R08: 0000000000000003 R09: 
ffff880444a69ea0 
[73710.327839] R10: 0000000000000001 R11: 000000000000000b R12: 
ffff88007f23bce0 
[73710.334966] R13: ffff88007f2f1098 R14: ffff880444a69ea0 R15: 
0000000000000038 
[73710.342094] FS:  00007fda5c740740(0000) GS:ffff880277b00000(0000) 
knlGS:0000000000000000 
[73710.350177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[73710.355918] CR2: ffffe8fdf79836e0 CR3: 0000000033c32000 CR4: 
00000000000006e0 
[73710.363044] Stack: 
[73710.365059]  ffff8802730ce400 ffff880444a69ea0 000000007f23bcc8 
ffff88007f2f0000 
[73710.372524]  ffff88007f2f1098 0000000000000000 0000000000000038 
ffff88027557698c 
[73710.379991]  ffff88007f23bd28 ffffffff81161329 0000000000000aa8 
0000000000000994 
[73710.387456] Call Trace: 
[73710.389906]  [<ffffffff81161329>] print_graph_function_flags+0xb9/0x500 
[73710.396514]  [<ffffffff810ef390>] ? rcu_sched_qs+0x60/0x60 
[73710.401998]  [<ffffffff8115ea10>] wakeup_print_line+0x20/0x30 
[73710.407740]  [<ffffffff81158524>] print_trace_line+0x54/0x510 
[73710.413481]  [<ffffffff81159389>] s_show+0x29/0x150 
[73710.418357]  [<ffffffff8123ddbe>] seq_read+0x22e/0x370 
[73710.423495]  [<ffffffff81218b67>] __vfs_read+0x37/0x150 
[73710.428716]  [<ffffffff81218b35>] ? __vfs_read+0x5/0x150 
[73710.434027]  [<ffffffff812c9413>] ? security_file_permission+0xa3/0xc0 
[73710.440550]  [<ffffffff81218b35>] ? __vfs_read+0x5/0x150 
[73710.445860]  [<ffffffff8121912e>] vfs_read+0x8e/0x140 
[73710.450908]  [<ffffffff8121a625>] SyS_read+0x55/0xc0 
[73710.455872]  [<ffffffff81003b12>] do_syscall_64+0x62/0x110 
[73710.461356]  [<ffffffff816c65a1>] entry_SYSCALL64_slow_path+0x25/0x25 
[73710.467791] Code: 42 18 49 89 41 38 48 8b 42 20 49 89 41 40 48 8b 42 28 49 
89 41 48 4c 89 e0 e9 93 fd ff ff 89 50 08 e9 95 fc ff ff 49 8b 4c 24 08 <48> 89 
4c d0 10 e9 2f ff ff ff 4c 89 e0 e9 74 fd ff ff 66 2e 0f  
[73710.487945] RIP  [<ffffffff81161254>] print_graph_entry+0x3c4/0x3e0 
[73710.494229]  RSP <ffff88007f23bc68> 
[73710.497714] CR2: ffffe8fdf79836e0 
 

> > 
> > Making ftrace_graph_notrace_addr always return false when tracers need
> > to bypass it is a proposed fix.
> 
> There's a lot of things wrong with this patch, but before we go there,

Thanks for the attention. Agree :/. , I found one wrong, I read the code 
lines again, the graph_ret event won't be added to ring buffer in this case.
And the trying to bypass is not so wise, I was thinking why filters were 
ignored in this tracer? overhead reason? Looks like it's not so.

> why should the latency tracers ignore set_graph_notrace?

There are two directions. I went to the wrong. If we resolve this through
enabling this graph notrace filter, it's also providing a small extra function 
of this tracer.

> What about this patch?
> 
> -- Steve
> 
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 03cdff84d026..68fcee5097d2 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -175,6 +175,9 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent
> *trace)
>       int ret;
>       int pc;
>  
> +     if (trace->depth < 0)
> +             return 0;
> +

I tried the patch, if just for resolving the oops issue, it's indeed enough.
But if we care about just this filter, and compare this with the 
trace_graph_entry
filter handling, I found that, the depth 0 item will be submitted to the ring 
buffer, not
respecting the filter. I still use the same steps. It's possible to show 
schedule entry:

 419.456410 |     <idle>-0    |  .N..              |  
schedule_preempt_disabled() {
  419.456410 |     <idle>-0    |  .N..              |    schedule() {
 ------------------------------------------
 3)    <idle>-0    =>   rngd-1979
 ------------------------------------------

  419.456425 |    rngd-1979    |  d...  0.405 us    |  finish_task_switch();
  419.456428 |    rngd-1979    |  ....              |  __fdget() {
  419.456428 |    rngd-1979    |  ....  0.913 us    |    __fget_light();
  419.456430 |    rngd-1979    |  ....  2.103 us    |  }
  419.456431 |    rngd-1979    |  ....  0.241 us    |  random_poll();
  419.456433 |    rngd-1979    |  ....              |  poll_schedule_timeout() {
  419.456433 |    rngd-1979    |  ....              |    
schedule_hrtimeout_range() {
  419.456434 |    rngd-1979    |  ....              |      
schedule_hrtimeout_range_clock() {
  419.456435 |    rngd-1979    |  ....              |        schedule() {
  419.456444 |    rngd-1979    |  d...  0.000 us    |  __schedule();
  419.456445 |    rngd-1979    |  d...              |  /*   1979:120:S ==> 
[003]  1494:100:R kworker/3:1H */

And compared with trace_graph_entry function, don't we need both the two check? 

+       if (trace->depth < 0)
+               return 0;
+       if (ftrace_graph_notrace_addr(trace->func))
+               return 1;



>       if (!func_prolog_dec(tr, &data, &flags))
>               return 0;
>  
> diff --git a/kernel/trace/trace_sched_wakeup.c
> b/kernel/trace/trace_sched_wakeup.c
> index 9d4399b553a3..ce9889bff8b4 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -239,6 +239,9 @@ static int wakeup_graph_entry(struct ftrace_graph_ent
> *trace)
>       unsigned long flags;
>       int pc, ret = 0;
>  
> +     if (trace->depth < 0)
> +             return 0;
> +

Do't we need this check?
+       if (ftrace_graph_notrace_addr(trace->func))
+               return 1;

>       if (!func_prolog_preempt_disable(tr, &data, &pc))
>               return 0;
>  
> 

-- 
Regards,
Chunyu Hu

Reply via email to