ARGH!!! Please, if there are known holes in patches, put a comment in.

I now had to independently discover this problem during review of the
last patch.

On Wed, May 24, 2017 at 05:59:39PM +0900, Byungchul Park wrote:
> The ring buffer can be overwritten by hardirq/softirq/work contexts.
> Those cases must be considered on rollback or commit. For example,
> 
>           |<------ hist_lock ring buffer size ----->|
>           ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
> 
>           where 'p' represents an acquisition in process context,
>           'i' represents an acquisition in irq context.
> 
> On irq exit, crossrelease tries to roll back idx to its original
> position, but it should not, because the entry has already been
> invalidated by the overwriting 'i' entries. Avoid rollback or commit
> for overwritten entries.
> 
> Signed-off-by: Byungchul Park <byungchul.p...@lge.com>
> ---
>  include/linux/lockdep.h  | 20 +++++++++++
>  include/linux/sched.h    |  4 +++
>  kernel/locking/lockdep.c | 92 +++++++++++++++++++++++++++++++++++++++++-------
>  3 files changed, 104 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index d531097..a03f79d 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -284,6 +284,26 @@ struct held_lock {
>   */
>  struct hist_lock {
>       /*
> +      * Id for each entry in the ring buffer. This is used to
> +      * decide whether the ring buffer was overwritten or not.
> +      *
> +      * For example,
> +      *
> +      *           |<----------- hist_lock ring buffer size ------->|
> +      *           pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> +      * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> +      *
> +      *           where 'p' represents an acquisition in process
> +      *           context, 'i' represents an acquisition in irq
> +      *           context.
> +      *
> +      * In this example, the ring buffer was overwritten by
> +      * acquisitions in irq context, which should be detected on
> +      * rollback or commit.
> +      */
> +     unsigned int hist_id;
> +
> +     /*
>        * Seperate stack_trace data. This will be used at commit step.
>        */
>       struct stack_trace      trace;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5f6d6f4..9e1437c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1756,6 +1756,10 @@ struct task_struct {
>       unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
>       unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
>       unsigned int xhlock_idx_work; /* For restoring at work exit */
> +     unsigned int hist_id;
> +     unsigned int hist_id_soft; /* For overwrite check at softirq exit */
> +     unsigned int hist_id_hard; /* For overwrite check at hardirq exit */
> +     unsigned int hist_id_work; /* For overwrite check at work exit */
>  #endif
>  #ifdef CONFIG_UBSAN
>       unsigned int in_ubsan;


Right, like I wrote in the comment: I don't think you need quite this
much.

The problem only happens if you rewind more than MAX_XHLOCKS_NR entries;
although I realize the rewind can be cumulative, which makes it slightly
trickier.

We can either make the rewind more expensive and make xhlock_valid()
false for each rewound entry; or we can keep the max_idx and account
from there. If we rewind >= MAX_XHLOCKS_NR from the max_idx we need to
invalidate the entire state, which we can do by invalidating
xhlock_valid() or by re-introducing the hist_gen_id. When we invalidate
the entire state, we can also clear the max_idx.
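To make the max_idx idea concrete, here is a rough userspace sketch under simplifying assumptions: the ring size, struct layout, and the xhlock_add()/xhlock_rewind() helpers are illustrative stand-ins for the lockdep code, not the actual kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_XHLOCKS_NR 64	/* illustrative ring size */

/* Simplified per-task state; fields mirror the names in the discussion. */
struct task_state {
	unsigned int xhlock_idx;	/* current ring position */
	unsigned int max_idx;		/* high-water mark since last invalidate */
};

static void xhlock_add(struct task_state *t)
{
	t->xhlock_idx++;
	/* signed-difference compare so the counters may wrap */
	if ((int)(t->xhlock_idx - t->max_idx) > 0)
		t->max_idx = t->xhlock_idx;
}

/*
 * Rewind to a saved index. If the distance from the high-water mark
 * back to the restore point covers the whole ring, entries below the
 * restore point were overwritten, so the entire state must be
 * invalidated (and the high-water mark cleared along with it).
 */
static bool xhlock_rewind(struct task_state *t, unsigned int saved_idx)
{
	t->xhlock_idx = saved_idx;
	if (t->max_idx - saved_idx >= MAX_XHLOCKS_NR) {
		/* invalidate_xhlocks(); -- e.g. clear each hlock.instance */
		t->max_idx = saved_idx;
		return false;		/* state invalidated */
	}
	return true;			/* rollback is safe */
}
```

The point of accounting from max_idx rather than the current index is that it naturally handles the cumulative-rewind case: several small nested rewinds can together cross the ring boundary even though no single one does.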

Given that rewinding _that_ far should be fairly rare (do we have
numbers?), simply iterating the entire thing and setting
xhlock->hlock.instance = NULL should work, I think.
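That whole-buffer invalidation could look roughly as follows; this is a simplified userspace sketch in which the structs are reduced to the one field that matters, and xhlock_valid() is assumed to test hlock.instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_XHLOCKS_NR 64	/* illustrative ring size */

/* Reduced stand-ins for the kernel structs; illustrative only. */
struct held_lock { void *instance; };
struct hist_lock { struct held_lock hlock; };

static struct hist_lock xhlocks[MAX_XHLOCKS_NR];

/* An entry is valid iff it still points at a lock instance. */
static bool xhlock_valid(struct hist_lock *xhlock)
{
	return xhlock->hlock.instance != NULL;
}

/*
 * Invalidate the entire history in one linear pass: cheap enough if
 * rewinding far enough to overwrite the whole ring is rare.
 */
static void invalidate_xhlocks(void)
{
	for (int i = 0; i < MAX_XHLOCKS_NR; i++)
		xhlocks[i].hlock.instance = NULL;
}
```

The attraction of this over per-entry hist_id stamps is that the common paths (add, small rewind) stay untouched, and all the cost lands on the rare full-invalidate case.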
