[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Naik, Avadhut



On 3/29/2024 11:50, Luck, Tony wrote:
>>> - While at it, don't forget to:
>>>
>>>s/ADDR/MISC/SYND
>>> /addr/misc/synd
>>>
>> These are actually acronyms for Address, Miscellaneous and Syndrome 
>> registers.
> 
> They look like abbreviations, not acronyms to me.
> 
> -Tony

Yes, they are actually abbreviations. Wrong choice of words on my part.
Was under the impression that Boris' recommendation also applied to
abbreviations. Will change them though and resubmit.

-- 
Thanks,
Avadhut Naik



RE: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Luck, Tony
>> - While at it, don't forget to:
>> 
>>s/ADDR/MISC/SYND
>> /addr/misc/synd
>>
> These are actually acronyms for Address, Miscellaneous and Syndrome registers.

They look like abbreviations, not acronyms to me.

-Tony


Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-03-29 Thread Andrii Nakryiko
On Tue, Mar 26, 2024 at 11:58 AM Steven Rostedt  wrote:
>
> On Tue, 26 Mar 2024 09:16:33 -0700
> Andrii Nakryiko  wrote:
>
> > > It's no different than lockdep. Test boxes should have it enabled, but
> > > there's no reason to have this enabled in a production system.
> > >
> >
> > I tend to agree with Steven here (which is why I sent this patch as it
> > is), but I'm happy to do it as an opt-out, if Masami insists. Please
> > do let me know if I need to send v2 or this one is actually the one
> > we'll end up using. Thanks!
>
> Masami,
>
> Are you OK with just keeping it set to N.
>
> We could have other options like PROVE_LOCKING enable it.
>

So what's the conclusion, Masami? Should I send another version where
this config is opt-out, or are you ok with keeping it as opt-in as
proposed in this revision?

> -- Steve



[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Naik, Avadhut



On 3/29/2024 02:11, Ingo Molnar wrote:
> 
> Please split out the other (capitalization) changes to the output into 
> a separate patch.
> 
Okay. Will put the capitalization changes into a separate patch.

> - While at it, don't forget to:
> 
>s/ADDR/MISC/SYND
> /addr/misc/synd
>
These are actually acronyms for Address, Miscellaneous and Syndrome registers.

It was recommended to keep the acronyms in ALL CAPS. Hence, didn't change them.

https://lore.kernel.org/linux-edac/20240327205435.3667588-1-avadhut.n...@amd.com/T/#m0c04f1c0deaa0347af66653a5950aad5f6b320e7
 
> - Also, it's a bit weird that we have 'CPU' but also 'processor' 
>   fields, why isn't it 'vendor' and 'CPUID'?
> 
Think it has been this way since the tracepoint was created back
in 2009. Will modify the field per your suggestion.

> - Finally, why are some fields 'merged' as per ADDR/MISC/SYND, while 
>   others are listed separately? All that have separate names should be 
>   listed separately.
> 
Will separate the fields so that each is listed individually.
> Ie. something like the patch below?
> 
> Thanks,
> 
>   Ingo
> 
> >
> From: Ingo Molnar 
> Date: Fri, 29 Mar 2024 08:09:23 +0100
> Subject: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record 
> tracepoint
> 
>  - Only capitalize entries where that makes sense
>  - Print separate values separately
>  - Rename 'PROCESSOR' to vendor & CPUID
> 
> Signed-off-by: Ingo Molnar 
> ---
>  include/trace/events/mce.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
> index 1391ada0da3b..c5b0523f25ee 100644
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -55,15 +55,18 @@ TRACE_EVENT(mce_record,
>   __entry->cpuvendor  = m->cpuvendor;
>   ),
>  
> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> addr: %016Lx, misc: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x",
>   __entry->cpu,
>   __entry->mcgcap, __entry->mcgstatus,
>   __entry->bank, __entry->status,
>   __entry->ipid,
> - __entry->addr, __entry->misc, __entry->synd,
> + __entry->addr,
> + __entry->misc,
> + __entry->synd,
>   __entry->cs, __entry->ip,
>   __entry->tsc,
> - __entry->cpuvendor, __entry->cpuid,
> + __entry->cpuvendor,
> + __entry->cpuid,
>   __entry->walltime,
>   __entry->socketid,
>   __entry->apicid)

-- 
Thanks,
Avadhut Naik



[PATCH] perf-ftrace: sync funcgraph options with perf-ftrace interface

2024-03-29 Thread piecuch
From: Krzysztof Piecuch 

Adding overrun, retval, retval-hex and tail options to set
off-by-default funcgraph features.

Add nocpu, nooverhead, noduration options to turn off funcgraph
features which are on by default.

Signed-off-by: Krzysztof Piecuch 
---
 Documentation/trace/ftrace.rst   |   5 +-
 tools/perf/Documentation/perf-ftrace.txt |  16 +++
 tools/perf/builtin-ftrace.c  | 118 ++-
 tools/perf/util/ftrace.h |   7 ++
 4 files changed, 142 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 7e7b8ec17934..9a608a59dce6 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1384,9 +1384,8 @@ Options for function_graph tracer:
of each process displayed at every line.
 
   funcgraph-duration
-   At the end of each function (the return)
-   the duration of the amount of time in the
-   function is displayed in microseconds.
+   At the end (return) of each function the duration
+   of the function is displayed in microseconds.
 
   funcgraph-abstime
When set, the timestamp is displayed at each line.
diff --git a/tools/perf/Documentation/perf-ftrace.txt 
b/tools/perf/Documentation/perf-ftrace.txt
index d780b93fcf87..0cf7b3301478 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -121,6 +121,22 @@ OPTIONS for 'perf ftrace trace'
List of options allowed to set:
 
  - nosleep-time - Measure on-CPU time only for function_graph tracer.
+ - overrun  - Show number of lost events due to overwriting when
+   the buffer was full.
+ - nocpu- Don't show the CPU which the process was running on.
+ - nooverhead   - Do not print the appropriate delay sign next to
+   long-running functions (' ' for sub-10us, '+' for sub-100us, '!' for
+   sub-1ms, '#' for sub-10ms, '*' for sub-100ms and '@' for sub-1s, '$'
+   for sub-1s runtime).
+ - noduration   - Don't show function the end of each function (the
+   return) the duration of the amount of time in the function is
+   displayed in microseconds.
+ - retval   - Print function's return value on exit. Error codes
+   are printed as signed decimals, other return values are printed as
+   hexadecimals.
+ - retval-hex   - Print all return values in hexadecimal format.
+ - tail - Print the returning name of function next to
+   closing brackets.
  - noirqs   - Ignore functions that happen inside interrupt.
  - verbose  - Show process names, PIDs, timestamps, etc.
  - thresh=   - Setup trace duration threshold in microseconds.
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index eb30c8eca488..1c58fa744758 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -225,9 +225,16 @@ static void reset_tracing_options(struct perf_ftrace 
*ftrace __maybe_unused)
write_tracing_option_file("function-fork", "0");
write_tracing_option_file("func_stack_trace", "0");
write_tracing_option_file("sleep-time", "1");
+   write_tracing_option_file("funcgraph-overrun", "0");
+   write_tracing_option_file("funcgraph-cpu", "1");
+   write_tracing_option_file("funcgraph-overhead", "1");
+   write_tracing_option_file("funcgraph-duration", "1");
write_tracing_option_file("funcgraph-irqs", "1");
write_tracing_option_file("funcgraph-proc", "0");
write_tracing_option_file("funcgraph-abstime", "0");
+   write_tracing_option_file("funcgraph-retval", "0");
+   write_tracing_option_file("funcgraph-retval-hex", "0");
+   write_tracing_option_file("funcgraph-tail", "0");
write_tracing_option_file("latency-format", "0");
write_tracing_option_file("irq-info", "0");
 }
@@ -425,6 +432,42 @@ static int set_tracing_trace_inherit(struct perf_ftrace 
*ftrace)
return 0;
 }
 
+static int set_tracing_funcgraph_overrun(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_overrun == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-overrun", "1") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_nocpu(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_nocpu == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-cpu", "0") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_nooverhead(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_nooverhead == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-overhead", "0") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_noduration(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_noduration == 0)
+   return 0;
+   if 

Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 11:23 AM Jason Xing  wrote:
>
> On Fri, Mar 29, 2024 at 5:07 PM Eric Dumazet  wrote:
> >
> > On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  
> > wrote:
> > >
> > > From: Jason Xing 
> > >
> > > Prior to this patch, what we can see by enabling trace_tcp_send is
> > > only happening under two circumstances:
> > > 1) active rst mode
> > > 2) non-active rst mode and based on the full socket
> > >
> > > That means the inconsistency occurs if we use tcpdump and trace
> > > simultaneously to see how rst happens.
> > >
> > > It's necessary that we should take into other cases into considerations,
> > > say:
> > > 1) time-wait socket
> > > 2) no socket
> > > ...
> > >
> > > By parsing the incoming skb and reversing its 4-tuple can
> > > we know the exact 'flow' which might not exist.
> > >
> > > Samples after applied this patch:
> > > 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> > > state=TCP_ESTABLISHED
> > > 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> > > state=UNKNOWN
> > > Note:
> > > 1) UNKNOWN means we cannot extract the right information from skb.
> > > 2) skbaddr/skaddr could be 0
> > >
> > > Signed-off-by: Jason Xing 
> > > ---
> > >  include/trace/events/tcp.h | 39 --
> > >  net/ipv4/tcp_ipv4.c|  4 ++--
> > >  net/ipv6/tcp_ipv6.c|  3 ++-
> > >  3 files changed, 41 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > > index 194425f69642..289438c54227 100644
> > > --- a/include/trace/events/tcp.h
> > > +++ b/include/trace/events/tcp.h
> > > @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
> > >   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
> > >   * active reset, skb should be NULL
> > >   */
> > > -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> > > +TRACE_EVENT(tcp_send_reset,
> > >
> > > TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> > >
> > > -   TP_ARGS(sk, skb)
> > > +   TP_ARGS(sk, skb),
> > > +
> > > +   TP_STRUCT__entry(
> > > +   __field(const void *, skbaddr)
> > > +   __field(const void *, skaddr)
> > > +   __field(int, state)
> > > +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> > > +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> > > +   ),
> > > +
> > > +   TP_fast_assign(
> > > +   __entry->skbaddr = skb;
> > > +   __entry->skaddr = sk;
> > > +   /* Zero means unknown state. */
> > > +   __entry->state = sk ? sk->sk_state : 0;
> > > +
> > > +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > > +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > > +
> > > +   if (sk && sk_fullsock(sk)) {
> > > +   const struct inet_sock *inet = inet_sk(sk);
> > > +
> > > +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> > > +   } else {
> >
> > To be on the safe side, I would test if (skb) here.
> > We have one caller with skb == NULL, we might have more in the future.
>
> Thanks for the review.
>
> How about changing '} else {' to '} else if (skb) {', then if we go
> into this else-if branch, we will print nothing, right? I'll test it
> in this case.

Right, the fields are cleared before this else

+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));

>
> >
> > > +   /*
> > > +* We should reverse the 4-tuple of skb, so later
> > > +* it can print the right flow direction of rst.
> > > +*/
> > > +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> > > entry->saddr);
> > > +   }
> > > +   ),
> > > +
> > > +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> > > + __entry->skbaddr, __entry->skaddr,
> > > + __entry->saddr, __entry->daddr,
> > > + __entry->state ? show_tcp_state_name(__entry->state) : 
> > > "UNKNOWN")
> > >  );
> > >
> > >  /*
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index a22ee5838751..d5c4a969c066 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock 
> > > *sk, struct sk_buff *skb)
> > >  */
> > > if (sk) {
> > > arg.bound_dev_if = sk->sk_bound_dev_if;
> > > -   if (sk_fullsock(sk))
> > > -   trace_tcp_send_reset(sk, skb);
> > > }
> >
> > Remove the { } ?
>
> Yes, I forgot to remove them.

No problem.



Re: [PATCH net-next v3 3/3] tcp: add location into reset trace process

2024-03-29 Thread Jason Xing
On Fri, Mar 29, 2024 at 5:13 PM Eric Dumazet  wrote:
>
> On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
> >
> > From: Jason Xing 
> >
> > In addition to knowing the 4-tuple of the flow which generates RST,
> > the reason why it does so is very important because we have some
> > cases where the RST should be sent and have no clue which one
> > exactly.
> >
> > Adding location of reset process can help us more, like what
> > trace_kfree_skb does.
>
> Well, I would prefer a drop_reason here, even if there is no 'dropped' packet.

Good idea really. Then we can accurately diagnose which kind of reason
exactly causes the RST behavior.

I'm not sure if we can reuse the drop_reason here, like adding/using
some reasons in enum skb_drop_reason {}? The name is a little bit
strange.

Oh, I can just print the string of reason directly instead of really
using enum skb_drop_reason {}...

>
> This would be more stable than something based on function names that
> could be changed.
>
> tracepoints do not have to get ugly, we can easily get stack traces if needed.
>
> perf record -a -g  -e tcp:tcp_send_reset ...

Ah, yes, I blindly mimic what trace_skb_kfree() and
trace_consume_skb() do. Introducing some RST reasons is more
reasonable and easier to detect since it's not hard to add four or
five reasons only.

Thanks,
Jason



Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Jason Xing
On Fri, Mar 29, 2024 at 5:07 PM Eric Dumazet  wrote:
>
> On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
> >
> > From: Jason Xing 
> >
> > Prior to this patch, what we can see by enabling trace_tcp_send is
> > only happening under two circumstances:
> > 1) active rst mode
> > 2) non-active rst mode and based on the full socket
> >
> > That means the inconsistency occurs if we use tcpdump and trace
> > simultaneously to see how rst happens.
> >
> > It's necessary that we should take into other cases into considerations,
> > say:
> > 1) time-wait socket
> > 2) no socket
> > ...
> >
> > By parsing the incoming skb and reversing its 4-tuple can
> > we know the exact 'flow' which might not exist.
> >
> > Samples after applied this patch:
> > 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> > state=TCP_ESTABLISHED
> > 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> > state=UNKNOWN
> > Note:
> > 1) UNKNOWN means we cannot extract the right information from skb.
> > 2) skbaddr/skaddr could be 0
> >
> > Signed-off-by: Jason Xing 
> > ---
> >  include/trace/events/tcp.h | 39 --
> >  net/ipv4/tcp_ipv4.c|  4 ++--
> >  net/ipv6/tcp_ipv6.c|  3 ++-
> >  3 files changed, 41 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > index 194425f69642..289438c54227 100644
> > --- a/include/trace/events/tcp.h
> > +++ b/include/trace/events/tcp.h
> > @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
> >   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
> >   * active reset, skb should be NULL
> >   */
> > -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> > +TRACE_EVENT(tcp_send_reset,
> >
> > TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> >
> > -   TP_ARGS(sk, skb)
> > +   TP_ARGS(sk, skb),
> > +
> > +   TP_STRUCT__entry(
> > +   __field(const void *, skbaddr)
> > +   __field(const void *, skaddr)
> > +   __field(int, state)
> > +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> > +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __entry->skbaddr = skb;
> > +   __entry->skaddr = sk;
> > +   /* Zero means unknown state. */
> > +   __entry->state = sk ? sk->sk_state : 0;
> > +
> > +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > +
> > +   if (sk && sk_fullsock(sk)) {
> > +   const struct inet_sock *inet = inet_sk(sk);
> > +
> > +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> > +   } else {
>
> To be on the safe side, I would test if (skb) here.
> We have one caller with skb == NULL, we might have more in the future.

Thanks for the review.

How about changing '} else {' to '} else if (skb) {', then if we go
into this else-if branch, we will print nothing, right? I'll test it
in this case.

>
> > +   /*
> > +* We should reverse the 4-tuple of skb, so later
> > +* it can print the right flow direction of rst.
> > +*/
> > +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> > entry->saddr);
> > +   }
> > +   ),
> > +
> > +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> > + __entry->skbaddr, __entry->skaddr,
> > + __entry->saddr, __entry->daddr,
> > + __entry->state ? show_tcp_state_name(__entry->state) : 
> > "UNKNOWN")
> >  );
> >
> >  /*
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index a22ee5838751..d5c4a969c066 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock *sk, 
> > struct sk_buff *skb)
> >  */
> > if (sk) {
> > arg.bound_dev_if = sk->sk_bound_dev_if;
> > -   if (sk_fullsock(sk))
> > -   trace_tcp_send_reset(sk, skb);
> > }
>
> Remove the { } ?

Yes, I forgot to remove them.

Thanks,
Jason

>
>
> >
> > +   trace_tcp_send_reset(sk, skb);
> > +
> > BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
> >  offsetof(struct inet_timewait_sock, tw_bound_dev_if));
> >
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index 3f4cba49e9ee..8e9c59b6c00c 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1113,7 +1113,6 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> > struct sk_buff *skb)
> > if (sk) {
> > oif = sk->sk_bound_dev_if;
> > if (sk_fullsock(sk)) {
> > -

Re: [PATCH net-next v3 3/3] tcp: add location into reset trace process

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> In addition to knowing the 4-tuple of the flow which generates RST,
> the reason why it does so is very important because we have some
> cases where the RST should be sent and have no clue which one
> exactly.
>
> Adding location of reset process can help us more, like what
> trace_kfree_skb does.

Well, I would prefer a drop_reason here, even if there is no 'dropped' packet.

This would be more stable than something based on function names that
could be changed.

tracepoints do not have to get ugly, we can easily get stack traces if needed.

perf record -a -g  -e tcp:tcp_send_reset ...



Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Prior to this patch, what we can see by enabling trace_tcp_send is
> only happening under two circumstances:
> 1) active rst mode
> 2) non-active rst mode and based on the full socket
>
> That means the inconsistency occurs if we use tcpdump and trace
> simultaneously to see how rst happens.
>
> It's necessary that we should take into other cases into considerations,
> say:
> 1) time-wait socket
> 2) no socket
> ...
>
> By parsing the incoming skb and reversing its 4-tuple can
> we know the exact 'flow' which might not exist.
>
> Samples after applied this patch:
> 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> state=TCP_ESTABLISHED
> 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> state=UNKNOWN
> Note:
> 1) UNKNOWN means we cannot extract the right information from skb.
> 2) skbaddr/skaddr could be 0
>
> Signed-off-by: Jason Xing 
> ---
>  include/trace/events/tcp.h | 39 --
>  net/ipv4/tcp_ipv4.c|  4 ++--
>  net/ipv6/tcp_ipv6.c|  3 ++-
>  3 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 194425f69642..289438c54227 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
>   * active reset, skb should be NULL
>   */
> -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> +TRACE_EVENT(tcp_send_reset,
>
> TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
>
> -   TP_ARGS(sk, skb)
> +   TP_ARGS(sk, skb),
> +
> +   TP_STRUCT__entry(
> +   __field(const void *, skbaddr)
> +   __field(const void *, skaddr)
> +   __field(int, state)
> +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> +   ),
> +
> +   TP_fast_assign(
> +   __entry->skbaddr = skb;
> +   __entry->skaddr = sk;
> +   /* Zero means unknown state. */
> +   __entry->state = sk ? sk->sk_state : 0;
> +
> +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> +
> +   if (sk && sk_fullsock(sk)) {
> +   const struct inet_sock *inet = inet_sk(sk);
> +
> +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> +   } else {

To be on the safe side, I would test if (skb) here.
We have one caller with skb == NULL, we might have more in the future.

> +   /*
> +* We should reverse the 4-tuple of skb, so later
> +* it can print the right flow direction of rst.
> +*/
> +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> entry->saddr);
> +   }
> +   ),
> +
> +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> + __entry->skbaddr, __entry->skaddr,
> + __entry->saddr, __entry->daddr,
> + __entry->state ? show_tcp_state_name(__entry->state) : 
> "UNKNOWN")
>  );
>
>  /*
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a22ee5838751..d5c4a969c066 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
>  */
> if (sk) {
> arg.bound_dev_if = sk->sk_bound_dev_if;
> -   if (sk_fullsock(sk))
> -   trace_tcp_send_reset(sk, skb);
> }

Remove the { } ?


>
> +   trace_tcp_send_reset(sk, skb);
> +
> BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
>  offsetof(struct inet_timewait_sock, tw_bound_dev_if));
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 3f4cba49e9ee..8e9c59b6c00c 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1113,7 +1113,6 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
> if (sk) {
> oif = sk->sk_bound_dev_if;
> if (sk_fullsock(sk)) {
> -   trace_tcp_send_reset(sk, skb);
> if (inet6_test_bit(REPFLOW, sk))
> label = ip6_flowlabel(ipv6h);
> priority = READ_ONCE(sk->sk_priority);
> @@ -1129,6 +1128,8 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
> label = ip6_flowlabel(ipv6h);
> }
>
> +   trace_tcp_send_reset(sk, skb);
> +
> tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, 1,
>

Re: [PATCH net-next v3 1/3] trace: adjust TP_STORE_ADDR_PORTS_SKB() parameters

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Introducing entry_saddr and entry_daddr parameters in this macro
> for later use can help us record the reverse 4-tuple by analyzing
> the 4-tuple of the incoming skb when receiving.
>
> Signed-off-by: Jason Xing 

Reviewed-by: Eric Dumazet 



[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Ingo Molnar


* Avadhut Naik  wrote:

>  
> +/*
> + * MCE Event Record.
> + *
> + * Only very relevant and transient information which cannot be
> + * gathered from a system by any other means or which can only be
> + * acquired arduously should be added to this record.
> + */
> +
>  TRACE_EVENT(mce_record,
>  
>   TP_PROTO(struct mce *m),
> @@ -25,6 +33,7 @@ TRACE_EVENT(mce_record,
>   __field(u64,ipid)
>   __field(u64,ip  )
>   __field(u64,tsc )
> + __field(u64,ppin)
>   __field(u64,walltime)
>   __field(u32,cpu )
>   __field(u32,cpuid   )
> @@ -45,6 +54,7 @@ TRACE_EVENT(mce_record,
>   __entry->ipid   = m->ipid;
>   __entry->ip = m->ip;
>   __entry->tsc= m->tsc;
> + __entry->ppin   = m->ppin;
>   __entry->walltime   = m->time;
>   __entry->cpu= m->extcpu;
>   __entry->cpuid  = m->cpuid;
> @@ -55,7 +65,7 @@ TRACE_EVENT(mce_record,
>   __entry->cpuvendor  = m->cpuvendor;
>   ),
>  
> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: 
> %llx, processor: %u:%x, time: %llu, socket: %u, APIC: %x",

Please split out the other (capitalization) changes to the output into 
a separate patch.

- While at it, don't forget to:

   s/ADDR/MISC/SYND
/addr/misc/synd

- Also, it's a bit weird that we have 'CPU' but also 'processor' 
  fields, why isn't it 'vendor' and 'CPUID'?

- Finally, why are some fields 'merged' as per ADDR/MISC/SYND, while 
  others are listed separately? All that have separate names should be 
  listed separately.

Ie. something like the patch below?

Thanks,

Ingo

>
From: Ingo Molnar 
Date: Fri, 29 Mar 2024 08:09:23 +0100
Subject: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record 
tracepoint

 - Only capitalize entries where that makes sense
 - Print separate values separately
 - Rename 'PROCESSOR' to vendor & CPUID

Signed-off-by: Ingo Molnar 
---
 include/trace/events/mce.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 1391ada0da3b..c5b0523f25ee 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -55,15 +55,18 @@ TRACE_EVENT(mce_record,
__entry->cpuvendor  = m->cpuvendor;
),
 
-   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: 
%u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
addr: %016Lx, misc: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x",
__entry->cpu,
__entry->mcgcap, __entry->mcgstatus,
__entry->bank, __entry->status,
__entry->ipid,
-   __entry->addr, __entry->misc, __entry->synd,
+   __entry->addr,
+   __entry->misc,
+   __entry->synd,
__entry->cs, __entry->ip,
__entry->tsc,
-   __entry->cpuvendor, __entry->cpuid,
+   __entry->cpuvendor,
+   __entry->cpuid,
__entry->walltime,
__entry->socketid,
__entry->apicid)