date:20170222

Re: lib: Introduce priority array area manager

2017-02-22 Thread Geert Uytterhoeven

Hi Jiri,

On Wed, Feb 22, 2017 at 8:02 PM, Linux Kernel Mailing List
 wrote:
> Web:
> https://git.kernel.org/torvalds/c/44091d29f2075972aede47ef17e1e70db3d51190
> Commit: 44091d29f2075972aede47ef17e1e70db3d51190
> Parent: b862815c3ee7b49ec20a9ab25da55a5f0bcbb95e
> Refname:refs/heads/master
> Author: Jiri Pirko 
> AuthorDate: Fri Feb 3 10:29:06 2017 +0100
> Committer:  David S. Miller 
> CommitDate: Fri Feb 3 16:35:42 2017 -0500
>
> lib: Introduce priority array area manager
>
> This introduces a infrastructure for management of linear priority
> areas. Priority order in an array matters, however order of items inside
> a priority group does not matter.
>
> As an initial implementation, L-sort algorithm is used. It is quite
> trivial. More advanced algorithm called P-sort will be introduced as a
> follow-up. The infrastructure is prepared for other algos.
>
> Alongside this, a testing module is introduced as well.
>
> Signed-off-by: Jiri Pirko 
> Signed-off-by: David S. Miller 

> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -550,4 +550,7 @@ config STACKDEPOT
>  config SBITMAP
> bool
>
> +config PARMAN
> +   tristate "parman"

| parman (PARMAN) [N/m/y] (NEW) ?
|
| There is no help available for this option.

Can you please add a description for this option?
Or drop the "parman" string if this is always selected by its kernel users, and
never intended to be enabled by the end user.

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCHv7 2/8] printk: rename nmi.c and exported api

2017-02-22 Thread Geert Uytterhoeven

Hi Sergey,

On Tue, Dec 27, 2016 at 3:16 PM, Sergey Senozhatsky
 wrote:
> A preparation patch for printk_safe work. No functional change.
> - rename nmi.c to print_safe.c
> - add `printk_safe' prefix to some (which used both by printk-safe
>   and printk-nmi) of the exported functions.
>
> Signed-off-by: Sergey Senozhatsky 

> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -875,17 +875,19 @@ config LOG_CPU_MAX_BUF_SHIFT
>  13 =>   8 KB for each CPU
>  12 =>   4 KB for each CPU
>
> -config NMI_LOG_BUF_SHIFT
> -   int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> +config PRINTK_SAFE_LOG_BUF_SHIFT
> +   int "Temporary per-CPU printk log buffer size (12 => 4KB, 13 => 8KB)"
> range 10 21
> default 13
> -   depends on PRINTK_NMI
> +   depends on PRINTK

Was this dependency change intentional?
My platform doesn't have PRINTK_NMI.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

RE: [PATCH 1/7] fs, xfs: convert xfs_bui_log_item.bui_refcount from atomic_t to refcount_t

2017-02-22 Thread Reshetova, Elena

> On Wed, Feb 22, 2017 at 11:20:31AM +, Reshetova, Elena wrote:
> > > On Tue, Feb 21, 2017 at 05:49:01PM +0200, Elena Reshetova wrote:
> > > > refcount_t type and corresponding API should be
> > > > used instead of atomic_t when the variable is used as
> > > > a reference counter. This allows to avoid accidental
> > > > refcounter overflows that might lead to use-after-free
> > > > situations.
> > >
> > > I'm missing something: how do you overflow a log item object
> > > reference count?
> >
> > We are currently converting all reference counters present in kernel to a
> > safer refcount_t type.
>
> Yes, I see that you are taking anything that you *think* is an
> object lifetime reference counter and changing it.
>
> > Agreed, in some cases it might be easier or harder to actually create/trigger
> > an overflow, but since it can be caused even by a bug in the legitimate code
> > (current version or its future iterative), it is good idea to do "safe defaults" and
> > stop worrying about the problem.
> >
> > Do you have any reasons why it should not be converted?
> 
> It's core dirty metadata object code.  Any change to code in this
> area needs to be gone over with a fine tooth comb, because bugs can
> result in filesystem and/or journal corruption issues that may not
> be noticed until a system crashes and log recovery fails and the
> user loses their entire filesystem
> 
> Hence the repeated comments about needing to actually test the code
> you are changing.

Sure, we are now testing this at run time using xfstests, as was suggested.
I will only repost this series after we are done with testing and have fixed
any issues.
 
Best Regards,
Elena.


> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

[BUG?] perf: dwarf unwind doesn't work correctly on aarch64

2017-02-22 Thread Masami Hiramatsu

Hello,

perf record -g dwarf (and perf report) doesn't show correct callchain
on aarch64. Here is how to reproduce it.

1) I've prepared a Debian 8 aarch64 VM on qemu-system-aarch64, and
   built/installed the latest perf tools on it.

2) Build attached program as below
# gcc -O0 -ggdb3 -funwind-tables -o main main.c

3) Run perf to record with dwarf.
# perf record -g --call-graph dwarf,1024 -e cpu-clock:u -o /tmp/perf.data -- ./main
^C[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 8.526 MB /tmp/perf.data (6495 samples) ]


4) Run perf to report the result.

# perf report -i /tmp/perf.data
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cpu-clock:u'
# Event count (approx.): 162375
#
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ......
#
    17.21%    17.21%  main     main           [.] func2
            |
            ---func2

    17.09%    17.09%  main     main           [.] func1
            |
            ---func1

    16.67%    16.67%  main     main           [.] main
            |
            ---main
.

So, as you can see, the call graph reports each function as having been
called from itself. If I record with fp as below, perf reports the
correct call graph.

3') record it with fp
# perf record -g --call-graph fp -e cpu-clock:u -o /tmp/perf.data -- ./main 
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.397 MB /tmp/perf.data (4160 samples) ]

4') report it

# perf report -i /tmp/perf.data
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cpu-clock:u'
# Event count (approx.): 104000
#
# Children  Self  Command  Shared Object  Symbol 
#     ...  .  ...
#
99.06% 0.00%  main libc-2.19.so   [.] __libc_start_main
|
---__libc_start_main
   |  
--98.94%--main
  |  
  |--80.24%--func0
  |  |  
  |  |--63.27%--func1
  |  |  |  
  |  |  |--47.04%--func2
  |  |  |  | 
.


I tried dumping the raw sample data, and it seems correct.

# perf report -D -i /tmp/perf.data
[...]
.  0130:  c8 05 40 00 00 00 00 00 30 e1 a8 df ff ff 00 00  ..@.0...
.  0140:  90 05 40 00 00 00 00 00 [00 04 00 00 00 00 00 00](*)  ..@.
.  0150:  50 e1 a8 df ff ff 00 00 e8 05 40 00 00 00 00 00  P.@.
.  0160:  00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00  
.  0170:  70 e1 a8 df ff ff 00 00 08 06 40 00 00 00 00 00  p.@.
.  0180:  8c b6 c9 8f ff ff 00 00 00 00 00 00 04 00 00 00  
.  0190:  90 e1 a8 df ff ff 00 00 28 06 40 00 00 00 00 00  (.@.
.  01a0:  c0 e1 a8 df ff ff 00 00 2c 1d b6 8f 02 00 00 00  ,...
.  01b0:  b0 e1 a8 df ff ff 00 00 40 06 40 00 00 00 00 00  @.@.
.  01c0:  c0 e1 a8 df ff ff 00 00 48 1d b6 8f 01 00 00 00  H..
[...]
1207680984048 0xf040 [0x560]: PERF_RECORD_SAMPLE(IP, 0x2): 114/114: 0x400590 per
... FP chain: nr:0
... user regs: mask 0x1 ABI 64-bit
[...]
 x29   0xdfa8e130
 lr0x4005c8
 sp0xdfa8e130
 pc0x400590
... ustack: size 1024, offset 0x148
 . data_src: 0x5080021
 ... thread: main:114

In this entry, ustack should start at offset 0x0148 in the raw event data,
which I marked with (*). That word is the saved stack size (0x400 = 1024).
The top of the stack holds 0xdfa8e150, which seems to be the next frame
pointer, followed by 0x4005e8, which is the next return address.

004005b0 :
  4005b0:   a9be7bfdstp x29, x30, [sp,#-32]!
  4005b4:   910003fdmov x29, sp
  4005b8:   b9001fa0str w0, [x29,#28]
  4005bc:   b9401fa0ldr w0, [x29,#28]
  4005c0:   11002000add w0, w0, #0x8
  4005c4:   97f3bl  400590 
  4005c8:   a8c27bfdldp x29, x30, [sp],#32
  4005cc:   d65f03c0ret

004005d0 :
  4005d0:   a9be7bfdstp x29, x30, [sp,#-32]!
  4005d4:   910003fdmov x29, sp
  4005d8:   b9001fa0str w0, [x29,#28]
  4005dc:   b9401fa0ldr w0, [x29,#28]
  4005e0:   11001000add w0, w0, #0x4
  4005e4:   97f3bl  4005b0 
  4005e8:   a8c27bfdldp x29, x30, [sp],#32

So, the stack data should be correct.

I guess there is a bug in libunwind on aarch64, or we failed to pass
the stack data to libunwind. (BTW, it works

Re: [PATCH 2/7] clk: tegra: fix isp clock modelling

2017-02-22 Thread Mikko Perttunen

The TRM shows a CLK_SOURCE_ISPB register, but after some discussion, it 
seems like that is a documentation generation bug, so this should be 
correct.


Reviewed-by: Mikko Perttunen 

On 22.02.2017 17:13, Peter De Schrijver wrote:

The 2 isp clocks (ispa and ispb) share a mux/divider control. So model
this as 1 mux/divider clock and child gate clocks.

Signed-off-by: Peter De Schrijver 
---
 drivers/clk/tegra/clk-id.h   |  1 +
 drivers/clk/tegra/clk-tegra-periph.c | 11 +--
 drivers/clk/tegra/clk-tegra210.c |  1 +
 include/dt-bindings/clock/tegra210-car.h |  4 ++--
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/tegra/clk-id.h b/drivers/clk/tegra/clk-id.h
index 5738635..1019eb8 100644
--- a/drivers/clk/tegra/clk-id.h
+++ b/drivers/clk/tegra/clk-id.h
@@ -307,6 +307,7 @@ enum clk_id {
tegra_clk_xusb_ssp_src,
tegra_clk_sclk_mux,
tegra_clk_sor_safe,
+   tegra_clk_ispa,
tegra_clk_max,
 };

diff --git a/drivers/clk/tegra/clk-tegra-periph.c 
b/drivers/clk/tegra/clk-tegra-periph.c
index 4ce4e7f..19b00b7 100644
--- a/drivers/clk/tegra/clk-tegra-periph.c
+++ b/drivers/clk/tegra/clk-tegra-periph.c
@@ -168,6 +168,12 @@
  0, TEGRA_PERIPH_NO_GATE, _clk_id,\
  _parents##_idx, 0, _lock)

+#define MUX8_NOGATE(_name, _parents, _offset, _clk_id) \
+   TEGRA_INIT_DATA_TABLE(_name, NULL, NULL, _parents, _offset, \
+ 29, MASK(3), 0, 0, 8, 1, TEGRA_DIVIDER_ROUND_UP,\
+ 0, TEGRA_PERIPH_NO_GATE, _clk_id,\
+ _parents##_idx, 0, NULL)
+
 #define INT(_name, _parents, _offset,  \
_clk_num, _gate_flags, _clk_id) \
TEGRA_INIT_DATA_TABLE(_name, NULL, NULL, _parents, _offset,\
@@ -739,7 +745,7 @@
MUX8("soc_therm", mux_clkm_pllc_pllp_plla, CLK_SOURCE_SOC_THERM, 78, 
TEGRA_PERIPH_ON_APB, tegra_clk_soc_therm_8),
MUX8("vi_sensor", mux_pllm_pllc2_c_c3_pllp_plla, CLK_SOURCE_VI_SENSOR, 
164, TEGRA_PERIPH_NO_RESET, tegra_clk_vi_sensor_8),
MUX8("isp", mux_pllm_pllc_pllp_plla_clkm_pllc4, CLK_SOURCE_ISP, 23, 
TEGRA_PERIPH_ON_APB, tegra_clk_isp_8),
-   MUX8("isp", mux_pllc_pllp_plla1_pllc2_c3_clkm_pllc4, CLK_SOURCE_ISP, 
23, TEGRA_PERIPH_ON_APB, tegra_clk_isp_9),
+   MUX8_NOGATE("isp", mux_pllc_pllp_plla1_pllc2_c3_clkm_pllc4, 
CLK_SOURCE_ISP, tegra_clk_isp_9),
MUX8("entropy", mux_pllp_clkm1, CLK_SOURCE_ENTROPY, 149,  0, 
tegra_clk_entropy),
MUX8("entropy", mux_pllp_clkm_clk32_plle, CLK_SOURCE_ENTROPY, 149,  0, 
tegra_clk_entropy_8),
MUX8("hdmi_audio", mux_pllp3_pllc_clkm, CLK_SOURCE_HDMI_AUDIO, 176, 
TEGRA_PERIPH_NO_RESET, tegra_clk_hdmi_audio),
@@ -819,7 +825,8 @@
GATE("xusb_dev", "xusb_dev_src", 95, 0, tegra_clk_xusb_dev, 0),
GATE("emc", "emc_mux", 57, 0, tegra_clk_emc, CLK_IGNORE_UNUSED),
GATE("sata_cold", "clk_m", 129, TEGRA_PERIPH_ON_APB, 
tegra_clk_sata_cold, 0),
-   GATE("ispb", "clk_m", 3, 0, tegra_clk_ispb, 0),
+   GATE("ispa", "isp", 23, 0, tegra_clk_ispa, 0),
+   GATE("ispb", "isp", 3, 0, tegra_clk_ispb, 0),
GATE("vim2_clk", "clk_m", 11, 0, tegra_clk_vim2_clk, 0),
GATE("pcie", "clk_m", 70, 0, tegra_clk_pcie, 0),
GATE("gpu", "pll_ref", 184, 0, tegra_clk_gpu, 0),
diff --git a/drivers/clk/tegra/clk-tegra210.c b/drivers/clk/tegra/clk-tegra210.c
index 2ef8d49..7bda8ba 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -2210,6 +2210,7 @@ static u32 pll_expo_p_to_pdiv(u32 p, u32 *pdiv)
[tegra_clk_pll_c4_out3] = { .dt_id = TEGRA210_CLK_PLL_C4_OUT3, .present 
= true },
[tegra_clk_apb2ape] = { .dt_id = TEGRA210_CLK_APB2APE, .present = true 
},
[tegra_clk_pll_a1] = { .dt_id = TEGRA210_CLK_PLL_A1, .present = true },
+   [tegra_clk_ispa] = { .dt_id = TEGRA210_CLK_ISPA, .present = true },
 };

 static struct tegra_devclk devclks[] __initdata = {
diff --git a/include/dt-bindings/clock/tegra210-car.h 
b/include/dt-bindings/clock/tegra210-car.h
index 35288b2..f5c6563 100644
--- a/include/dt-bindings/clock/tegra210-car.h
+++ b/include/dt-bindings/clock/tegra210-car.h
@@ -39,7 +39,7 @@
 /* 20 (register bit affects vi and vi_sensor) */
 /* 21 */
 #define TEGRA210_CLK_USBD 22
-#define TEGRA210_CLK_ISP 23
+#define TEGRA210_CLK_ISPA 23
 /* 24 */
 /* 25 */
 #define TEGRA210_CLK_DISP2 26
@@ -349,7 +349,7 @@
 #define TEGRA210_CLK_PLL_RE_OUT1 319
 /* 320 */
 /* 321 */
-/* 322 */
+#define TEGRA210_CLK_ISP 322
 /* 323 */
 /* 324 */
 /* 325 */

CONGRATULATIONS!!!

2017-02-22 Thread Avustralya Piyangosu Uluslararasi

Dated: 20.02.2017
Ref: 435062725
Batch: 7050470902/189
Winning no: GB8101 / LPRC

CONGRATULATIONS!!!

Dear winner,

We are pleased to inform you of your prize.
In the Australian International lottery held on 20 February 2017,
which is based entirely on an electronic selection of winners
from e-mail addresses collected from certain websites, your e-mail
address was attached to ticket number 839831507056490102
and serial number 774113933. This batch drew
the lucky numbers 53-32-00-32-89 and bonus
number 18, and therefore won the lottery in the
second category.

A lump-sum payment has therefore been approved of
US$1,000,000.00 (ONE MILLION DOLLARS) in cash,
credit file ref: ILP / HW 6812363/17, from the total cash prize
shared among the eight lucky winners in this
category. All participants were selected through a
computer ballot system drawn from
e-mail addresses from Canada,
Australia, the United States, Asia, Europe, the Middle East,
Africa and Oceania, as part of the international
promotional programme held every year. This
lottery was promoted and sponsored by a conglomerate
of some multinational companies as part of their
social responsibility to the citizens of
the communities where they have an operational base.

Furthermore, your details (e-mail address) fall within
our European representative office in Amsterdam,
the Netherlands, as indicated on your play coupon, and
the US$1,000,000.00 prize will be released to you from
this regional Asia-Malaysia branch.
We hope that with part of your prize
you will take part in our
US$1.3 billion high-stakes draw at the end of the year.
To file your claim, please
contact our claims department
agent:

Name: Mr.Polosky
Reply to this e-mail.

Please quote your reference, batch and winning numbers,
which can be found in the top left corner of this
notification, together with your full name, address and
telephone number, to help locate your file easily.
For security reasons, we advise all winners to keep this winning
information confidential until
the claim is processed and your prize is released to you.
This is part of our security protocol to avoid double
claims and abuse of this programme
by participants or unofficial persons.
Note: All winnings must be claimed by
31 March 2017; otherwise all funds will be
returned as unclaimed and eventually donated to charitable
organizations.

Anyone under the age of 18 is automatically disqualified.

YOURS SINCERELY,
Mr. Paul Rosse
Avustralya Piyangosu Uluslararasi (coordinator)

Re: [PATCH] drm: kselftest: fix spelling mistake: "misalinged" -> "misaligned"

2017-02-22 Thread Chris Wilson

On Thu, Feb 23, 2017 at 12:07:17AM +, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in pr_err message
> 
> Signed-off-by: Colin Ian King 
Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [lkp-robot] [mm, vmscan] 5e56dfbd83: fsmark.files_per_sec -11.1% regression

2017-02-22 Thread Michal Hocko

On Thu 23-02-17 09:27:34, Ye Xiaolong wrote:
> Hi, Michal
> 
> On 02/07, Michal Hocko wrote:
> [snip]
> >Could you retest with a single NUMA node? I am not familiar with the
> >benchmark enough to judge it was set up properly for a NUMA machine.
> 
> I've retested the commit with a single NUMA node via "numactl -m 0 fs_mark xxx",
> and it did help recover the performance back.

Thanks for retesting! get_scan_count which was 
> 
> Here is the comparison:
> 
> commit/compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/md/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
>
> 5e56dfbd837421b7fa3c6c06018c6701e2704917/gcc-6/performance/3HDD/4M/btrfs/1/x86_64-rhel-7.2/RAID5/64/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/130G/fsmark
>
> (with a single NUMA node)    (2 NUMA nodes)
>
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>          %stddev     %change         %stddev
>              \          |                \
>      57.60 ±  0%   -11.1%       51.20 ±  0%  fsmark.files_per_sec
>     607.84 ±  0%    +9.0%      662.24 ±  1%  fsmark.time.elapsed_time
>     607.84 ±  0%    +9.0%      662.24 ±  1%  fsmark.time.elapsed_time.max
>      14317 ±  6%   -12.2%       12568 ±  7%  fsmark.time.involuntary_context_switches
>       1864 ±  0%    +0.5%        1873 ±  0%  fsmark.time.maximum_resident_set_size
>      12425 ±  0%   +23.3%       15320 ±  3%  fsmark.time.minor_page_faults
>      33.00 ±  3%   -33.9%       21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
>     203.49 ±  3%   -28.1%      146.31 ±  1%  fsmark.time.system_time
>     605701 ±  0%    +3.6%      627486 ±  0%  fsmark.time.voluntary_context_switches
>     307106 ±  2%   +20.2%      368992 ±  9%  interrupts.CAL:Function_call_interrupts
>     183040 ±  0%   +23.2%      225559 ±  3%  softirqs.BLOCK
>      12203 ± 57%  +236.4%       41056 ±103%  softirqs.NET_RX
>     186118 ±  0%   +21.9%      226922 ±  2%  softirqs.TASKLET
>      14317 ±  6%   -12.2%       12568 ±  7%  time.involuntary_context_switches
>      12425 ±  0%   +23.3%       15320 ±  3%  time.minor_page_faults
>      33.00 ±  3%   -33.9%       21.80 ±  1%  time.percent_of_cpu_this_job_got
>     203.49 ±  3%   -28.1%      146.31 ±  1%  time.system_time
>       3.47 ±  3%   -13.0%        3.02 ±  1%  turbostat.%Busy
>      99.60 ±  1%    -9.6%       90.00 ±  1%  turbostat.Avg_MHz
>      78.69 ±  1%    +1.7%       80.01 ±  0%  turbostat.CorWatt
>       3.56 ± 61%   -91.7%        0.30 ± 76%  turbostat.Pkg%pc2
>     207790 ±  0%    -8.2%      190654 ±  1%  vmstat.io.bo
>   30667691 ±  0%   +65.9%    50890669 ±  1%  vmstat.memory.cache
>   34549892 ±  0%   -58.4%    14378939 ±  4%  vmstat.memory.free
>       6768 ±  0%    -1.3%        6681 ±  1%  vmstat.system.cs
>  1.089e+10 ±  2%   +13.4%   1.236e+10 ±  3%  cpuidle.C1E-IVT.time
>   11475304 ±  2%   +13.4%    13007849 ±  3%  cpuidle.C1E-IVT.usage
>    2.7e+09 ±  6%   +13.2%   3.057e+09 ±  3%  cpuidle.C3-IVT.time
>    2954294 ±  6%   +14.3%     3375966 ±  3%  cpuidle.C3-IVT.usage
>   96963295 ± 14%   +17.5%   1.139e+08 ± 12%  cpuidle.POLL.time
>       8761 ±  7%   +17.6%       10299 ±  9%  cpuidle.POLL.usage
>   30454483 ±  0%   +66.4%    50666102 ±  1%  meminfo.Cached
> 
> Do you see what's happening?

Not really. All I could see in the previous data was that the memory
locality was different (and better) with my patch, which I cannot
explain either because get_scan_count is always a per-node thing. Moreover,
the change shouldn't make any difference for normal GFP_KERNEL requests
on 64-bit systems because the reclaim index covers all zones, so there is
nothing to skip over.

> Or is there anything we can do to improve fsmark benchmark setup to
> make it more reasonable?

Unfortunately I am not an expert on this benchmark. Maybe Mel knows
better.
-- 
Michal Hocko
SUSE Labs

Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there

2017-02-22 Thread Michal Hocko

On Wed 22-02-17 15:24:06, Johannes Weiner wrote:
> On Wed, Feb 22, 2017 at 03:16:57PM -0500, Johannes Weiner wrote:
> > [...] And then it sounds pretty much like what the allocator/direct
> > reclaim already does.
> 
> On a side note: Michal, I'm not sure I fully understand why we need
> the backoff code in should_reclaim_retry(). If no_progress_loops is
> growing steadily, then we quickly reach 16 and bail anyway. Layering
> on top a backoff function that *might* cut out an iteration or two
> earlier in the cold path of an OOM situation seems unnecessary.
> Conversely, if there *are* intermittent reclaims, no_progress_loops
> gets reset straight to 0, which then also makes the backoff function
> jump back to square one. So in the only situation where backing off
> would make sense - making some progress, but not enough - it's not
> actually backing off. It seems to me it should be enough to bail after
> either 16 iterations or when free + reclaimable < watermark.

Hmm, yes you are right! I wanted to use this backoff to reduce the chances
of thrashing over the last remaining reclaimable pages. But the code evolved
in such a way that this no longer works, as you say. I just got stuck
with the code without rethinking its relevance during development.

That being said, I think we will eventually want some backoff logic for
those cases where we still make a little progress but not enough (e.g.
count the number of reclaimed pages and give up when we reach a portion
of available reclaimable memory), but the patch below is a good start to
make the code simpler. Feel free to add my Acked-by when posting a full
patch.
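
The simplified rule discussed above (give up after a fixed number of rounds without progress, or when free plus reclaimable pages could not reach the watermark anyway) can be sketched in plain userspace C. This is only an illustration: every `demo_` name, the reduced zone struct and the single watermark comparison are assumptions, and the real check goes through __zone_watermark_ok() with more inputs.

```c
#include <stdbool.h>
#include <stddef.h>

/* The thread mentions bailing after 16 rounds; this constant name is made up. */
#define DEMO_MAX_RECLAIM_RETRIES 16

/* Hypothetical, heavily reduced view of a zone (not the kernel's struct zone). */
struct demo_zone {
	unsigned long free_pages;
	unsigned long reclaimable_pages;
	unsigned long min_watermark;
};

/*
 * Returns true if another reclaim round is worth trying, false if the
 * caller should enter the OOM path.
 */
static bool demo_should_reclaim_retry(const struct demo_zone *zones,
				      size_t nr_zones,
				      int no_progress_loops)
{
	size_t i;

	/* Hard cutoff: too many rounds in a row without any progress. */
	if (no_progress_loops >= DEMO_MAX_RECLAIM_RETRIES)
		return false;

	for (i = 0; i < nr_zones; i++) {
		/* No backoff term: count everything that could be reclaimed. */
		unsigned long available = zones[i].reclaimable_pages +
					  zones[i].free_pages;

		/* Could this zone meet the watermark if all of it were reclaimed? */
		if (available >= zones[i].min_watermark)
			return true;
	}

	return false;
}
```

Note that without the backoff term the decision depends only on the retry counter and on the free + reclaimable total, which is the simplification being acked here.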

Thanks!

> 
> Hm?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c470b8fe28cf..b0e9495c0530 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3396,11 +3396,10 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  /*
>   * Checks whether it makes sense to retry the reclaim to make a forward progress
>   * for the given allocation request.
> - * The reclaim feedback represented by did_some_progress (any progress during
> - * the last reclaim round) and no_progress_loops (number of reclaim rounds without
> - * any progress in a row) is considered as well as the reclaimable pages on the
> - * applicable zone list (with a backoff mechanism which is a function of
> - * no_progress_loops).
> + *
> + * We give up when we either have tried MAX_RECLAIM_RETRIES in a row
> + * without success, or when we couldn't even meet the watermark if we
> + * reclaimed all remaining pages on the LRU lists.
>   *
>   * Returns true if a retry is viable or false to enter the oom path.
>   */
> @@ -3441,13 +3440,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>  		unsigned long reclaimable;
>  
>  		available = reclaimable = zone_reclaimable_pages(zone);
> -		available -= DIV_ROUND_UP((*no_progress_loops) * available,
> -					  MAX_RECLAIM_RETRIES);
>  		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>  
>  		/*
> -		 * Would the allocation succeed if we reclaimed the whole
> -		 * available?
> +		 * Would the allocation succeed if we reclaimed all
> +		 * the reclaimable pages?
>  		 */
>  		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
>  				ac_classzone_idx(ac), alloc_flags, available)) {

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2] regulator: devres: introduce managed enable and disable operations

2017-02-22 Thread Dmitry Torokhov

On Tue, Feb 21, 2017 at 10:56:34AM -0800, Mark Brown wrote:
> On Tue, Feb 21, 2017 at 12:30:03AM -0800, Dmitry Torokhov wrote:
> > On Mon, Feb 20, 2017 at 11:02:58AM -0800, Mark Brown wrote:
> > > On Mon, Feb 13, 2017 at 10:51:52AM -0800, Dmitry Torokhov wrote:
>
> > But that is what I meant here about managed action. You are not
> > interacting with managed regulator here, you have managed enable. There
> > is absolutely nothing preventing you from calling
> > devm_regulator_enable() on a regulator that was obtained with
> > regulator_get() (i.e. non-managed).
>
> That's not the point, the point is using both devm_regulator_enable()
> and regulator_enable() and so on.

I understand your objection that devm_regulator_enable() and
regulator_enable() can be used together; I just do not see it being a
problem in practice.

I still think we need a way for drivers to "undo" the enable
automatically. Do you have some other idea how to achieve this? Do you
maybe want regulator_put() to undo all outstanding enables for the
regulator? Then drivers would not need to care about disabling
regulators in error paths/driver teardown.
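
To make the "undo the enable automatically" idea concrete, here is a small userspace model of a managed enable. It is a sketch under stated assumptions, not the regulator or devres API: all `demo_` names are hypothetical, the regulator is reduced to an enable count, and the device is reduced to a fixed-size list of cleanup actions.

```c
#include <stddef.h>

#define DEMO_MAX_ACTIONS 8	/* arbitrary limit for this sketch */

/* Hypothetical regulator: nothing but an enable count. */
struct demo_regulator {
	int enable_count;
};

/* Hypothetical device: a list of cleanup actions to run at teardown. */
struct demo_device {
	void (*action[DEMO_MAX_ACTIONS])(void *data);
	void *data[DEMO_MAX_ACTIONS];
	size_t nr_actions;
};

static void demo_regulator_enable(struct demo_regulator *r)
{
	r->enable_count++;
}

static void demo_regulator_disable(struct demo_regulator *r)
{
	r->enable_count--;
}

/* Cleanup action queued by the managed enable below. */
static void demo_disable_action(void *data)
{
	demo_regulator_disable(data);
}

/*
 * Managed enable: enable now and queue the matching disable for teardown.
 * Returns 0 on success, -1 if the action list is full.
 */
static int demo_managed_enable(struct demo_device *dev,
			       struct demo_regulator *r)
{
	if (dev->nr_actions == DEMO_MAX_ACTIONS)
		return -1;

	demo_regulator_enable(r);
	dev->action[dev->nr_actions] = demo_disable_action;
	dev->data[dev->nr_actions] = r;
	dev->nr_actions++;
	return 0;
}

/* Teardown: run the queued actions in reverse order of registration. */
static void demo_device_release(struct demo_device *dev)
{
	while (dev->nr_actions > 0) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}
```

In this model only enables that went through demo_managed_enable() are undone at teardown; a plain demo_regulator_enable() on the same regulator still has to be balanced by hand, which is the mixing scenario the thread is arguing about.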

Where would you want to take the API?

Thanks.

-- 
Dmitry

Re: [PATCH v4 01/17] x86/mpx: Do not use SIB index if index points to R/ESP

2017-02-22 Thread Peter Zijlstra

On Wed, Feb 22, 2017 at 10:36:50PM -0800, Ricardo Neri wrote:
> +		/*
> +		 * A negative offset generally means a error, except
> +		 * -EDOM, which means that the contents of the register
> +		 * should not be used as index.
> +		 */
>  		if (indx_offset < 0)
> -			goto out_err;
> +			if (indx_offset == -EDOM)
> +				indx = 0;
> +			else
> +				goto out_err;
> +		else
> +			indx = regs_get_register(regs, indx_offset);

Kernel coding style requires more braces than are strictly required by
C: any block longer than one line needs them. Also, if one leg of a
conditional needs them, then they should be on both legs.

Your code has many such instances, please change them all.
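
For illustration, the quoted conditional with braces on every multi-line leg could look like the standalone sketch below. The helper names and the return-value convention are assumptions made so that the snippet compiles outside the kernel; demo_get_register() is a stand-in, not regs_get_register().

```c
#include <errno.h>

/* Hypothetical stand-in for a register lookup; not the kernel helper. */
static long demo_get_register(const long *regs, int offset)
{
	return regs[offset];
}

/*
 * Resolve the index value. Returns 0 and stores the value in *indx on
 * success, or -1 on a genuine error. -EDOM means "do not use an index".
 */
static int demo_resolve_index(const long *regs, int indx_offset, long *indx)
{
	/*
	 * The outer "if" leg spans several lines, so it gets braces; because
	 * one leg has braces, the "else" leg gets them too.
	 */
	if (indx_offset < 0) {
		if (indx_offset == -EDOM)
			*indx = 0;
		else
			return -1;
	} else {
		*indx = demo_get_register(regs, indx_offset);
	}

	return 0;
}
```

The inner if/else keeps no braces because each of its legs is a single statement.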

Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there

2017-02-22 Thread Michal Hocko

On Thu 23-02-17 10:46:01, hejianet wrote:
> sorry, resend it due to a delivery-failure:
> "Wrong MIME labeling on 8-bit character texts"
> I am sorry if anybody received it twice
> 
> Hi Johannes
> On 23/02/2017 4:16 AM, Johannes Weiner wrote:
> > On Wed, Feb 22, 2017 at 05:04:48PM +0800, Jia He wrote:
> > > When I try to dynamically allocate the hugepages more than system total
> > > free memory:
> > 
> > > > Then the kswapd will take 100% cpu for a long time (more than 3 hours, and
> > > > will not be about to end)
> > > 
> > > > The root cause is kswapd3 is trying to do reclaim again and again but it
> > > > makes no progress
> > > 
> > > > At that time, there are no reclaimable pages in that node:
> > 
> > Yes, this is a problem with the current kswapd code.
> > 
> > A less artificial scenario that I observed recently was machines with
> > two NUMA nodes, after being up for 200+ days, getting into a state
> > where node0 is mostly consumed by anon and some kernel allocations,
> > leaving less than the high watermark free. The machines don't have
> > swap, so the anon isn't reclaimable. But also, anon LRU is never even
> > *scanned*, so the "all unreclaimable" logic doesn't kick in. Kswapd is
> > spinning at 100% CPU calculating scan counts and checking zone states.
> > 
> > One specific problem with your patch, Jia, is that there might be some
> > cache pages that are pinned one way or another. That was the case on
> > our machines, and so reclaimable pages wasn't 0. Even if we check the
> > reclaimable pages, we need a hard cutoff after X attempts. And then it
> > sounds pretty much like what the allocator/direct reclaim already does.
> > 
> > Can we use the *exact* same cutoff conditions for direct reclaim and
> > kswapd, though? I don't think so. For direct reclaim, the goal is the
> > watermark, to make an allocation happen in the caller. While kswapd
> > tries to restore the watermarks too, it might never meet them but
> > still do useful work on behalf of concurrently allocating threads. It
> > should only stop when it tries and fails to free any pages at all.
> > 
> Yes, this is what I thought before this patch, but it seems Michal
> doesn't like this idea :)
> Please see https://lkml.org/lkml/2017/1/24/543

Yeah, I didn't like the hard limit on kswapd retries as you proposed it.
It didn't make much sense to me because the current condition for kswapd
to back off is to have all zones balanced. Without a further criterion,
kswapd would just wake up and go around the same retry loops again with
no progress. I didn't realize that direct reclaim progress might be
that criterion. The proposal from Johannes makes much more sense. I have to
think about it some more, but this looks like a way forward.
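
One way to picture the criterion discussed here is a per-node failure counter that stops kswapd after repeated fruitless runs and is cleared again when direct reclaim makes progress. The sketch below is a userspace model only; the struct, the `demo_` names and the limit of 16 are assumptions, not code from this thread.

```c
#include <stdbool.h>

#define DEMO_MAX_KSWAPD_FAILURES 16	/* assumed limit for this sketch */

/* Hypothetical per-node state: consecutive kswapd runs that freed nothing. */
struct demo_node {
	int kswapd_failures;
};

/* Record the outcome of one kswapd run; nr_reclaimed is the pages freed. */
static void demo_kswapd_run_done(struct demo_node *node,
				 unsigned long nr_reclaimed)
{
	if (nr_reclaimed)
		node->kswapd_failures = 0;
	else
		node->kswapd_failures++;
}

/* Direct reclaim freed something on this node: let kswapd try again. */
static void demo_direct_reclaim_progress(struct demo_node *node)
{
	node->kswapd_failures = 0;
}

/* kswapd keeps running only while it has not failed too often in a row. */
static bool demo_kswapd_should_run(const struct demo_node *node)
{
	return node->kswapd_failures < DEMO_MAX_KSWAPD_FAILURES;
}
```

The reset in demo_direct_reclaim_progress() is what keeps a stopped kswapd from simply waking up and repeating the same loops with no new information.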
-- 
Michal Hocko
SUSE Labs

Re: [Outreachy kernel] Re: [PATCH v2] staging: wilc1000: renames struct tstrRSSI and its members u8Index, u8Full

2017-02-22 Thread Julia Lawall

> Thanks for the feedback Arend, I really appreciate it. I've decided to go with
> these changes in my follow-up patch request:
>
> - rename tstrRSSI to 'rssi_history_buffer' as Arend suggested since it makes the
> purpose of the struct clear
> - remove Hungarian notation from all tstrRSSI members' names
> - change type of u8Full to bool since it's only ever 1 or 0
> - change name of as8RSSI to 'samples' since this buffer is only ever used to
> compute an average, and the "rssi" prefix is implied by the struct's name
> - rename str_rssi to rssi_history in the network_info struct for clarity
>
> Since my reasoning for these changes deviates from just "renaming to
> avoid camel casing" (as in the original checkpatch.pl warning), would it still
> make sense to submit all this in a single patch? I know my commit message
> needs to change but I wonder if this is too much detail.

I would strongly suggest not to do it all in a single patch.  Even if these
changes are not very complicated conceptually, there is always a chance of
doing things wrong.  Taking the problems one by one will improve the chance
that the result is correct.  Also, the results will be easier for you and
others to review if each patch only does one thing.  And easier to revert
if needed later if something goes wrong.
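
For reference, the end state described in the quoted list might look roughly like the sketch below once each step has landed as its own patch. The member names `full` and `index`, the buffer length and the two helpers are assumptions for illustration; only `rssi_history_buffer` and `samples` come from the quoted message, and userspace fixed-width types stand in for the kernel's u8/s8.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_RSSI 5	/* assumed buffer length, for illustration only */

/* Was struct tstrRSSI; each rename below would be a separate patch. */
struct rssi_history_buffer {
	bool full;			/* was u8Full, only ever 0 or 1 */
	uint8_t index;			/* was u8Index */
	int8_t samples[DEMO_NUM_RSSI];	/* was as8RSSI */
};

/* Store one sample, wrapping around once the buffer is full. */
static void demo_rssi_add(struct rssi_history_buffer *h, int8_t rssi)
{
	h->samples[h->index++] = rssi;
	if (h->index == DEMO_NUM_RSSI) {
		h->index = 0;
		h->full = true;
	}
}

/* Average over the samples collected so far (0 if there are none). */
static int8_t demo_rssi_average(const struct rssi_history_buffer *h)
{
	int n = h->full ? DEMO_NUM_RSSI : h->index;
	int sum = 0;
	int i;

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++)
		sum += h->samples[i];

	return (int8_t)(sum / n);
}
```

Changing `full` to bool alters a type rather than just a name, which is one reason it reviews more easily as its own patch.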

julia

[PATCH] ARM: dts: sunxi: Add regulators for Sinovoip BPI-M2

2017-02-22 Thread Emmanuel Vadot

Add the needed nodes for DVFS on the Sinovoip BPI-M2.
This adds the axp221 under the p2wi node, the regulators and
the cpu-supply property for cpu0.

Signed-off-by: Emmanuel Vadot 
---
 arch/arm/boot/dts/sun6i-a31s-sinovoip-bpi-m2.dts | 57 
 1 file changed, 57 insertions(+)

diff --git a/arch/arm/boot/dts/sun6i-a31s-sinovoip-bpi-m2.dts b/arch/arm/boot/dts/sun6i-a31s-sinovoip-bpi-m2.dts
index db7fa13f5425..48cf5eb1f042 100644
--- a/arch/arm/boot/dts/sun6i-a31s-sinovoip-bpi-m2.dts
+++ b/arch/arm/boot/dts/sun6i-a31s-sinovoip-bpi-m2.dts
@@ -86,6 +86,10 @@
};
 };
 
+&cpu0 {
+   cpu-supply = <&reg_dcdc3>;
+};
+
  {
status = "okay";
 };
@@ -151,6 +155,17 @@
status = "okay";
 };
 
+&p2wi {
+   status = "okay";
+
+   axp22x: pmic@68 {
+   compatible = "x-powers,axp221";
+   reg = <0x68>;
+   interrupt-parent = <&nmi_intc>;
+   interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   };
+};
+
 &pio {
gmac_phy_reset_pin_bpi_m2: gmac_phy_reset_pin@0 {
allwinner,pins = "PA21";
@@ -183,6 +198,48 @@
};
 };
 
+#include "axp22x.dtsi"
+
+&reg_dc5ldo {
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1320000>;
+   regulator-name = "vdd-cpus";
+};
+
+&reg_dcdc1 {
+   regulator-always-on;
+   regulator-min-microvolt = <3000000>;
+   regulator-max-microvolt = <3000000>;
+   regulator-name = "vdd-3v0";
+};
+
+&reg_dcdc2 {
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1320000>;
+   regulator-name = "vdd-gpu";
+};
+
+&reg_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1320000>;
+   regulator-name = "vdd-cpu";
+};
+
+&reg_dcdc4 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1320000>;
+   regulator-name = "vdd-sys-dll";
+};
+
+&reg_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <1500000>;
+   regulator-max-microvolt = <1500000>;
+   regulator-name = "vcc-dram";
+};
+
  {
pinctrl-names = "default";
pinctrl-0 = <_pins_a>;
-- 
2.11.0

Re: [PATCH V3 0/4] Define coherent device memory node

2017-02-22 Thread Anshuman Khandual

On 02/22/2017 03:20 PM, Michal Hocko wrote:
> On Tue 21-02-17 19:09:18, Anshuman Khandual wrote:
>> On 02/21/2017 04:41 PM, Michal Hocko wrote:
>>> On Fri 17-02-17 17:11:57, Anshuman Khandual wrote:
>>> [...]
>>>> * User space using mbind() to get CDM memory is an additional benefit
>>>>   we get by making the CDM plug in as a node and be part of the buddy
>>>>   allocator. But the over all idea from the user space point of view
>>>>   is that the application can allocate any generic buffer and try to
>>>>   use the buffer either from the CPU side or from the device without
>>>>   knowing about where the buffer is really mapped physically. That
>>>>   gives a seamless and transparent view to the user space where CPU
>>>>   compute and possible device based compute can work together. This
>>>>   is not possible through a driver allocated buffer.
>>>
>>> But how are you going to define any policy around that. Who is allowed
>>
>> The user space VMA can define the policy with a mbind(MPOL_BIND) call
>> with CDM/CDMs in the nodemask.
>>
>>> to allocate and how much of this "special memory". Is it possible that
>>
>> Any user space application with mbind(MPOL_BIND) call with CDM/CDMs in
>> the nodemask can allocate from the CDM memory. "How much" gets controlled
>> by how we fault from CPU and the default behavior of the buddy allocator.
> 
> In other words the policy is implemented by the kernel. Why is this a
> good thing?

It's controlled by the kernel only during the page fault paths of either
the CPU or the device. But the device driver will actually do the
placements afterwards, after taking into consideration access patterns
and relative performance. We don't want the driver to be involved during
page fault path memory allocations, which should naturally go through the
buddy allocator.

> 
>>> we will eventually need some access control mechanism? If yes then mbind
>>
>> No access control mechanism is needed. If an application wants to use
>> CDM memory by specifying in the mbind() it can. Nothing prevents it
>> from using the CDM memory.
> 
> What if we find out that an access control _is_ really needed? I can
> easily imagine that some devices will come up with really fast and expensive
> memory. You do not want some random user to steal it from you when you
> want to use it for your workload.

Hmm, it makes sense, but I think it's not something we have to deal with
right away. Later we may have to think about some generic access control
mechanism for mbind() and then accommodate CDM with it.

> 
>>> is really not suitable interface to (ab)use. Also what should happen if
>>> the mbind mentions only CDM memory and that is depleted?
>>
>> IIUC *only CDM* cannot be requested from user space as there are no user
>> visible interface which can translate to __GFP_THISNODE.
> 
> I do not understand what __GFP_THISNODE has to do with this. This is an
> internal flag.

Right. My bad. I was just referring to the fact that there is nothing in
user space which can make the buddy allocator pick the NOFALLBACK list
instead of the FALLBACK list.

> 
>> MPOL_BIND with
>> CDM in the nodemask will eventually pick a FALLBACK zonelist which will
>> have zones of the system including CDM ones. If the resultant CDM zones
>> run out of memory, we fail the allocation request as usual.
> 
> OK, so let's say you mbind to a single node which is CDM. You seem to be
> saying that we will simply break the NUMA affinity in this special case?

Why? It should simply follow what happens when we pick a single NUMA node
in previous situations.

> Currently we invoke the OOM killer if nodes which the application binds
> to are depleted and cannot be reclaimed.

Right, the same should happen here for CDM as well.

>  
>>> Could you also explain why the transparent view is really better than
>>> using a device specific mmap (aka CDM awareness)?
>>
>> Okay with a transparent view, we can achieve a control flow of application
>> like the following.
>>
>> (1) Allocate a buffer:   alloc_buffer(buf, size)
>> (2) CPU compute on buffer:   cpu_compute(buf, size)
>> (3) Device compute on buffer:device_compute(buf, size)
>> (4) CPU compute on buffer:   cpu_compute(buf, size)
>> (5) Release the buffer:  release_buffer(buf, size)
>>
>> With assistance from a device specific driver, the actual page mapping of
>> the buffer can change between system RAM and device memory depending on
>> which side is accessing at a given point. This will be achieved through
>> driver initiated migrations.
> 
> But then you do not need any NUMA affinity, right? The driver can do
> all this automagically. How does the numa policy comes into the game in
> your above example. Sorry for being dense, I might be really missing
> something important here, but I really fail to see why the NUMA is the
> proper interface here.

You are right. The driver can migrate any mapping in user space to
anywhere on the system as long as cpuset does not prohibit it. But we still
want the

[PATCH v4 06/17] x86/insn-eval: Add utility function to get segment descriptor

2017-02-22 Thread Ricardo Neri

The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses as
well as the base address of the segment. Thus, given a segment selector,
we ought to look at the segment descriptor to correctly calculate the
linear address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is the initial implementation for subsequent functions that
will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 61 
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 516902e..e6d5dfb 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,9 +5,13 @@
  */
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 enum reg_type {
@@ -262,6 +266,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @seg:   Segment selector
+ * @desc:  Pointer to the selected segment descriptor
+ *
+ * Given a segment selector, obtain a memory pointer to the segment
+ * descriptor. Both global and local descriptor tables are supported.
+ * desc will contain the address of the descriptor.
+ *
+ * Return: 0 if success, -EINVAL if failure
+ */
+static int get_desc(unsigned short seg, struct desc_struct **desc)
+{
+   struct desc_ptr gdt_desc = {0, 0};
+   unsigned long desc_base;
+
+   if (!desc)
+   return -EINVAL;
+
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+   if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+   seg >>= 3;
+
+   mutex_lock(&current->active_mm->context.lock);
+   if (unlikely(!current->active_mm->context.ldt ||
+seg >= current->active_mm->context.ldt->size)) {
+   *desc = NULL;
+   mutex_unlock(&current->active_mm->context.lock);
+   return -EINVAL;
+   }
+
+   *desc = &current->active_mm->context.ldt->entries[seg];
+   mutex_unlock(&current->active_mm->context.lock);
+   return 0;
+   }
+#endif
+   native_store_gdt(&gdt_desc);
+
+   /*
+* Bits [15:3] of the segment selector contain the index. Such
+* index needs to be multiplied by 8. However, as the index
+* least significant bit is already in bit 3, we don't have
+* to perform the multiplication.
+*/
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+   if (desc_base > gdt_desc.size) {
+   *desc = NULL;
+   return -EINVAL;
+   }
+
+   *desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+   return 0;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3
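
The selector arithmetic described in the get_desc() comment can be tried out
in user space with a minimal sketch like the one below. SEL_RPL_MASK and
SEL_TI_MASK are local stand-ins for the kernel's SEGMENT_RPL_MASK and
SEGMENT_TI_MASK, and struct sel_fields, decode_selector() and desc_fits() are
hypothetical names used only for illustration; none of this is code from the
patch.

```c
#include <stdint.h>

#define SEL_RPL_MASK 0x3u	/* bits [1:0]: requested privilege level */
#define SEL_TI_MASK  0x4u	/* bit 2: 0 selects the GDT, 1 selects the LDT */

/* Hypothetical container for the decoded fields of a segment selector. */
struct sel_fields {
	unsigned int index;		/* bits [15:3]: descriptor index */
	unsigned int uses_ldt;		/* 1 if the table indicator selects the LDT */
	unsigned int rpl;		/* requested privilege level */
	unsigned int byte_offset;	/* index * 8, offset into the table */
};

struct sel_fields decode_selector(uint16_t sel)
{
	struct sel_fields f;

	f.rpl = sel & SEL_RPL_MASK;
	f.uses_ldt = (sel & SEL_TI_MASK) != 0;
	f.index = sel >> 3;
	/*
	 * The index already sits in bits [15:3], so clearing the low three
	 * bits yields index * 8 without an explicit multiplication.
	 */
	f.byte_offset = sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
	return f;
}

/*
 * A descriptor occupies 8 bytes and a table limit is the offset of the
 * last valid byte, so the whole entry must end at or before the limit.
 */
int desc_fits(unsigned int byte_offset, unsigned int table_limit)
{
	return byte_offset + 7 <= table_limit;
}
```

For example, selector 0x2b decodes to index 5 in the GDT with RPL 3, at byte
offset 40. Note that desc_fits() checks that all 8 bytes of the entry lie
within the limit, which is slightly stricter than comparing only the starting
offset against the table size.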

Re: [PATCH] dma-buf: add support for compat ioctl

2017-02-22 Thread Sumit Semwal

Hi Marek,

On 23 February 2017 at 00:37, Daniel Vetter  wrote:
> On Tue, Feb 21, 2017 at 4:08 PM, Christian König
>  wrote:
>> On 21.02.2017 at 15:55, Marek Szyprowski wrote:
>>>
>>> Dear All,
>>>
>>> On 2017-02-21 15:37, Marek Szyprowski wrote:

>>>> Hi Christian,
>>>>
>>>> On 2017-02-21 14:59, Christian König wrote:
>>>>>
>>>>> On 21.02.2017 at 14:21, Marek Szyprowski wrote:
>>>>>>
>>>>>> Add compat ioctl support to dma-buf. This lets one use the
>>>>>> DMA_BUF_IOCTL_SYNC ioctl from a 32bit application on a 64bit
>>>>>> kernel. Data structures for both 32 and 64bit modes are the same,
>>>>>> so there is no need for an additional translation layer.
>>>>>
>>>>>
>>>>> Well I might be wrong, but IIRC compat_ioctl was just optional and if
>>>>> not specified unlocked_ioctl was called instead.
>>>>>
>>>>> If that is true your patch wouldn't have any effect at all.
>>>>
>>>>
>>>> Well, then why did I get -ENOTTY in the 32bit test app for this ioctl
>>>> on a 64bit ARM64 kernel without this patch?
>>>>
>>>
>>> I've checked in fs/compat_ioctl.c, I see no fallback in
>>> COMPAT_SYSCALL_DEFINE3,
>>> so one has to provide compat_ioctl callback to have ioctl working with
>>> 32bit
>>> apps.
>>
>>
>> Then my memory cheated on me.
>>
>> In this case the patch is Reviewed-by: Christian König
>> .
>

Thanks much for spotting this!

> Since you have commit rights for drm-misc, care to push this to
> drm-misc-next-fixes pls? Also I think this warrants a cc: stable,
> clearly an obvious screw-up in creating this api on our side :( So
> feel free to smash my ack on the patch.
>
Daniel, Christian,

I saw this just now, so if Christian hasn't already pulled it into
drm-misc-next-fixes, I'll give it a stab.

> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Best,
Sumit.

-- 
Thanks and regards,

Sumit Semwal
Linaro Mobile Group - Kernel Team Lead
Linaro.org │ Open source software for ARM SoCs
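
The "same data structures in 32 and 64bit modes" argument quoted above can be
checked with a small user-space sketch. struct dma_buf_sync_mirror is a local
stand-in that assumes the ioctl argument consists of a single 64-bit flags
word; it is not the uapi header itself, and dma_buf_sync_mirror_size() is a
hypothetical helper added only so the result can be inspected at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the ioctl argument: one fixed-width 64-bit field. */
struct dma_buf_sync_mirror {
	uint64_t flags;
};

/*
 * No pointers and no "long" members, so the layout does not depend on the
 * word size of the calling process: 8 bytes, with flags at offset 0.
 */
_Static_assert(sizeof(struct dma_buf_sync_mirror) == 8,
	       "argument must be 8 bytes for 32-bit and 64-bit callers");
_Static_assert(offsetof(struct dma_buf_sync_mirror, flags) == 0,
	       "flags must sit at offset 0");

size_t dma_buf_sync_mirror_size(void)
{
	return sizeof(struct dma_buf_sync_mirror);
}
```

Because the layout is identical for both ABIs, a compat handler for such an
argument would presumably not need to convert anything and could forward the
request to the regular ioctl handler.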

[PATCH v4 08/17] x86/insn-eval: Add functions to get default operand and address sizes

2017-02-22 Thread Ricardo Neri

These functions read the default values of the address and operand sizes
as specified in the segment descriptor. This information is determined
from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
32-bit legacy modes. For virtual-8086 mode, the default address and
operand sizes are always 2 bytes.

The D bit is only meaningful for code segments. Thus, these functions
always use the code segment selector contained in regs.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c | 80 
 2 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 0de3083..cd4008251 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff);
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4e3f797..3fe4ddb 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -365,6 +365,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_seg_default_address_bytes - Obtain default address size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default address size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * address is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default address size of segment
+ */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
+{
+   struct desc_struct *desc;
+   unsigned short seg;
+   int ret;
+
+   if (v8086_mode(regs))
+   return 2;
+
+   seg = (unsigned short)regs->cs;
+
+   ret = get_desc(seg, &desc);
+   if (ret)
+   return 0;
+
+   switch ((desc->l << 1) | desc->d) {
+   case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
+   return 2;
+   case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
+   return 4;
+   case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
+   return 8;
+   case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+   /* fall through */
+   default:
+   return 0;
+   }
+}
+
+/**
+ * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default operand size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * operand size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default operand size of segment
+ */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit or 8-bit operands. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32-bit or 8-bit operands. CS.L=0, CS.D=1 */
+		/* fall through */
+	case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */
+		return 4;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
  *

[PATCH v4 08/17] x86/insn-eval: Add functions to get default operand and address sizes

2017-02-22 Thread Ricardo Neri

These functions read the default values of the address and operand sizes
as specified in the segment descriptor. This information is determined
from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
32-bit legacy modes. For virtual-8086 mode, the default address and
operand sizes are always 2 bytes.

The D bit is only meaningful for code segments. Thus, these functions
always use the code segment selector contained in regs.
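
The mapping from the L and D bits to the default sizes can be sketched
in isolation as follows. This is only an illustration under stated
assumptions: struct seg_defaults and seg_defaults_from_bits() are
hypothetical names that do not exist in the patch, and the bits are passed
in directly instead of being read from a segment descriptor.

```c
#include <stdint.h>

/*
 * Hypothetical helper type, not part of the patch. A size of 0 marks the
 * invalid CS.L=1, CS.D=1 combination, mirroring the return value used by
 * the functions in this patch.
 */
struct seg_defaults {
	uint8_t address_bytes;
	uint8_t operand_bytes;
};

/* Map the L and D bits of a code segment descriptor to default sizes. */
static struct seg_defaults seg_defaults_from_bits(unsigned int l,
						  unsigned int d)
{
	struct seg_defaults r = { 0, 0 };

	switch (((l & 1) << 1) | (d & 1)) {
	case 0: /* CS.L=0, CS.D=0: 16-bit addresses and operands */
		r.address_bytes = 2;
		r.operand_bytes = 2;
		break;
	case 1: /* CS.L=0, CS.D=1: 32-bit addresses and operands */
		r.address_bytes = 4;
		r.operand_bytes = 4;
		break;
	case 2: /* CS.L=1, CS.D=0: 64-bit addresses, 32-bit operands */
		r.address_bytes = 8;
		r.operand_bytes = 4;
		break;
	default: /* CS.L=1, CS.D=1: invalid, both sizes stay 0 */
		break;
	}

	return r;
}
```

Note that in 64-bit mode the two defaults differ: addresses default to
8 bytes while operands default to 4 bytes.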

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c | 80 
 2 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 0de3083..cd4008251 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 				int regoff);
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4e3f797..3fe4ddb 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -365,6 +365,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_seg_default_address_bytes - Obtain default address size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default address size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * address is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default address size of segment
+ */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
+		return 4;
+	case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
+		return 8;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
+ * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default operand size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * operand size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default operand size of segment
+ */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	if (v8086_mode(regs))
+		return 2;
+
+	seg = (unsigned short)regs->cs;
+
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return 0;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0: /* Legacy mode. 16-bit or 8-bit operands. CS.L=0, CS.D=0 */
+		return 2;
+	case 1: /* Legacy mode. 32-bit or 8-bit operands. CS.L=0, CS.D=1 */
+		/* fall through */
+	case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */
+		return 4;
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return 0;
+	}
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3

[PATCH v4 02/17] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0

2017-02-22 Thread Ricardo Neri

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when a SIB byte is used and the
base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part
of the ModRM byte is zero, the value of such register will not be used
as part of the address computation. To signal this, -EDOM is returned to
indicate to callers that they should ignore the value.

Also, for this particular case, a displacement of 32-bits should follow
the SIB byte if the mod part of ModRM is equal to zero. The instruction
decoder ensures that this is the case.
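
The rule can be sketched outside the kernel as shown below. This is a
minimal illustration, not the MPX code: sib_base_ignored() and sib_addr()
are hypothetical names, the register values are passed in by the caller,
and REX prefixes are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* mod is bits 7:6 of the ModRM byte; base is bits 2:0 of the SIB byte. */
static unsigned int modrm_mod(uint8_t modrm)
{
	return modrm >> 6;
}

static unsigned int sib_base(uint8_t sib)
{
	return sib & 0x7;
}

/* True when the base register must not contribute to the address. */
static bool sib_base_ignored(uint8_t modrm, uint8_t sib)
{
	return modrm_mod(modrm) == 0 && sib_base(sib) == 5;
}

/*
 * Hypothetical address computation: base_val and index_val are the
 * register contents already selected by the caller, disp is the
 * displacement that follows the SIB byte.
 */
static uint64_t sib_addr(uint8_t modrm, uint8_t sib, uint64_t base_val,
			 uint64_t index_val, int32_t disp)
{
	uint64_t base = sib_base_ignored(modrm, sib) ? 0 : base_val;
	unsigned int scale = sib >> 6;	/* scale factor is 1 << scale */

	return base + (index_val << scale) + (uint64_t)(int64_t)disp;
}
```

With ModRM=0x04 and SIB=0x0d (mod = 0, base = 5), the base value is dropped
and only the scaled index and the displacement remain. With ModRM=0x44
(mod = 1) the same SIB byte uses the base register as usual.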

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 6a034bc..f660ddf 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 
 	case REG_TYPE_BASE:
 		regno = X86_SIB_BASE(insn->sib.value);
+		/*
+		 * If mod is 0 and register R/EBP (regno=5) is indicated in the
+		 * base part of the SIB byte, the value of such register should
+		 * not be used in the address computation. Also, a 32-bit
+		 * displacement is expected in this case; the instruction
+		 * decoder takes care of it. This is true for both R13 and
+		 * R/EBP as REX.B will not be decoded.
+		 */
+		if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+			return -EDOM;
+
 		if (X86_REX_B(insn->rex_prefix.value))
 			regno += 8;
 		break;
@@ -160,16 +171,22 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		addr = regs_get_register(regs, addr_offset);
 	} else {
 		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
 			if (base_offset < 0)
-				goto out_err;
+				if (base_offset == -EDOM)
+					base = 0;
+				else
+					goto out_err;
+			else
+				base = regs_get_register(regs, base_offset);
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			/*
-			 * A negative offset generally means a error, except
-			 * -EDOM, which means that the contents of the register
-			 * should not be used as index.
-			 */
 			if (indx_offset < 0)
 				if (indx_offset == -EDOM)
 					indx = 0;
@@ -178,7 +195,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			else
 				indx = regs_get_register(regs, indx_offset);
 
-			base = regs_get_register(regs, base_offset);
 			addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

[PATCH v4 00/17] x86: Enable User-Mode Instruction Prevention

2017-02-22 Thread Ricardo Neri

This is v4 of this series. Again, it took me a while to complete the
updates as support for 16-bit address encodings for protected mode
required extra rework. The three previous submissions can be found
here [1], here [2] and here [3].

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table.

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

There is a caveat, however. Certain applications rely on some of these
instructions to function. Examples are applications that use WineHQ[4].
For instance, these applications rely on sidt returning a non-accessible
memory location[5]. During the discussions, it was proposed that the fault
could be relayed to user space so that the emulation is performed in user
mode. However, this would break existing applications until, for instance,
they update to a new WineHQ version. This approach would also require UMIP
to be disabled by default. The consensus in this forum is to always enable
it.

This patchset initially treated tasks running in virtual-8086 mode as a
special case. However, I received clarification that DOSEMU[6] does not
support applications that use these instructions. It relies on WineHQ for
this [7]. Furthermore, the applications for which the concern was raised
run in protected mode [5].

=== How are UMIP-protected instructions emulated?

This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, such fault will be fixed-up by returning dummy values as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments in x86_32, the base of GDT and IDT was always a 4-byte
   address, even for 16-bit operands. Thus, my emulation code does the
   same. In all cases, the limit of the table is set to 0.
 * STR and SLDT return 0 as the segment selector. This looks appropriate
   since we are providing a dummy value as the base address of the global
   descriptor table.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. That is, the following bits are enabled:
   CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask. As per the Intel 64 and IA-32 Architectures Software Developer's
   Manual, SMSW returns a 16-bit result for memory operands. However, when
   the operand is a register, the result can be up to CR0[63:0]. Since
   the emulation code only kicks in on x86_32, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both
   protected and virtual-8086 mode.
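
As a worked example of the SMSW case, the dummy value built from the CR0
bits listed above can be computed as follows. This is a sketch only: the
DUMMY_CR0_* macro names and dummy_smsw() are made up for illustration and
only the bits named in the list are set.

```c
#include <stdint.h>

/* Bit positions as listed above; the macro names are illustrative only. */
#define DUMMY_CR0_PE	(UINT32_C(1) << 0)	/* Protection Enable */
#define DUMMY_CR0_MP	(UINT32_C(1) << 1)	/* Monitor Coprocessor */
#define DUMMY_CR0_ET	(UINT32_C(1) << 4)	/* Extension Type */
#define DUMMY_CR0_NE	(UINT32_C(1) << 5)	/* Numeric Error */
#define DUMMY_CR0_WP	(UINT32_C(1) << 16)	/* Write Protect */
#define DUMMY_CR0_AM	(UINT32_C(1) << 18)	/* Alignment Mask */

#define DUMMY_CR0	(DUMMY_CR0_PE | DUMMY_CR0_MP | DUMMY_CR0_ET | \
			 DUMMY_CR0_NE | DUMMY_CR0_WP | DUMMY_CR0_AM)

/*
 * Hypothetical helper: a memory operand receives only the low 16 bits,
 * a register operand receives up to 32 bits on x86_32.
 */
static uint32_t dummy_smsw(int operand_is_register)
{
	return operand_is_register ? DUMMY_CR0 : (DUMMY_CR0 & 0xffff);
}
```

With these bits, a register operand would receive 0x00050033 and a memory
operand would receive 0x0033.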

=== How is this series laid out?

++ Fix bugs in MPX address evaluator
I found the code for Intel MPX (Memory Protection Extensions) very useful.
It is used to parse opcodes and the memory locations contained in the
general purpose registers when used as operands. I put some of this code in
a separate library file that both MPX and UMIP can access, which avoids code
duplication. Before creating the new library, I fixed a couple of bugs
that I found in how MPX determines the address contained in the
instruction and operands.

++ Provide a new x86 instruction evaluating library
With bugs fixed, the MPX evaluating code is relocated to a new insn-eval.c
library. The basic functionality of this library is extended to obtain the
segment descriptor selected by either segment override prefixes or the
default segment of the registers involved in the calculation of the
effective address. It was also extended to obtain the default address and
operand sizes as well as the segment base address. Support for processing
16-bit address encodings was added as well. Armed with this arsenal, it is
now possible to determine the linear address onto which the emulated
results shall be copied.

This code supports Normal 32-bit and 64-bit (i.e., __USER32_CS and/or
__USER_CS) protected mode, virtual-8086 mode, 16-bit

[PATCH v4 07/17] x86/insn-eval: Add utility function to get segment descriptor base address

2017-02-22 Thread Ricardo Neri

With segmentation, the base address of the segment descriptor is needed
to compute a linear address. The segment descriptor used in the address
computation depends on either any segment override prefixes in the
instruction or the default segment determined by the registers involved
in the address computation. Thus, both the instruction as well as the
register (specified as the offset from the base of pt_regs) are given as
inputs. Furthermore, if insn is null, overrides are ignored; this is
useful when, for instance, obtaining the base address of the instruction
pointer (the code segment is always used).

The segment selector is determined by get_segment_selector with the inputs
described above. Once the selector is known, the base address is
determined. In protected mode, the selector is used to obtain the segment
descriptor and then its base address. In virtual-8086 mode, the base
address is computed as the value of the segment selector shifted 4
positions to the left.
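
The virtual-8086 case is simple arithmetic and can be sketched as below.
The helper names v8086_seg_base() and v8086_linear_addr() are hypothetical
and only illustrate the shift; they are not functions from this patch.

```c
#include <stdint.h>

/* Segment base in virtual-8086 mode: the selector shifted left by 4. */
static uint32_t v8086_seg_base(uint16_t selector)
{
	return (uint32_t)selector << 4;
}

/* Linear address of selector:offset in virtual-8086 mode. */
static uint32_t v8086_linear_addr(uint16_t selector, uint16_t offset)
{
	return v8086_seg_base(selector) + offset;
}
```

For example, selector 0x1234 gives base 0x12340, and 0xffff:0xffff gives
the linear address 0x10ffef. The cast to a wider type before shifting
keeps the upper bits of the selector from being lost.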

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c | 42 
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..0de3083 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index e6d5dfb..4e3f797 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -323,6 +323,48 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address contained in descriptor
+ * @regs:  Set of registers containing the segment selector
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either any
+ * segment override prefixes contained in insn or the default segment applicable
+ * to the register indicated by regoff. regoff is specified as the offset in
+ * bytes from the base of pt_regs. If insn is not null and contains any segment
+ * override prefixes, the override is used instead of the default segment.
+ *
+ * Return: In protected mode, 0 if in CONFIG_X86_64, -1L in case of error,
+ * or the base address indicated in the selected segment descriptor. In
+ * virtual-8086, the segment selector shifted four positions to the left.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff)
+{
+	struct desc_struct *desc;
+	unsigned short seg;
+	int ret;
+
+	seg = get_segment_selector(regs, insn, regoff);
+
+	if (v8086_mode(regs))
+		/*
+		 * Base is simply the segment selector shifted 4
+		 * positions to the left.
+		 */
+		return (unsigned long)(seg << 4);
+
+	/* 64-bit mode */
+	if (!seg)
+		return 0;
+	ret = get_desc(seg, &desc);
+	if (ret)
+		return -1L;
+
+	return get_desc_base(desc);
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3

[PATCH v4 01/17] x86/mpx: Do not use SIB index if index points to R/ESP

2017-02-22 Thread Ricardo Neri

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing is used
(i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
used in the computation of the memory address.

In these cases the address is simply the value present in the register
pointed to by the base part of the SIB byte plus the displacement byte.

An example of such instruction could be

insn -0x80(%rsp)

This is represented as:

 [opcode] 4c 23 80

  ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
  SIB=0x23: sc: 0, index: 100b (R/ESP), base: 011b (R/EBX)
  Displacement: -0x80

The correct address is (base) + displacement; no index is used.

We can achieve the desired effect of not using the index by making
get_reg_offset return -EDOM in this particular case. This value indicates
to callers that they should not use the index to calculate the address.
-EINVAL continues to indicate an error when decoding the SIB byte.

Care is taken to allow R12 to be used as index, which is a valid scenario.
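
The decoding of the example bytes, together with the R12 exception, can be
sketched as follows. The structure and function names are hypothetical and
the REX.X bit is passed as a plain flag; this is an illustration, not the
code in mpx.c.

```c
#include <stdbool.h>
#include <stdint.h>

struct modrm_fields {
	unsigned int mod, reg, rm;
};

struct sib_fields {
	unsigned int scale, index, base;
};

/* ModRM layout: mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0. */
static struct modrm_fields decode_modrm(uint8_t b)
{
	struct modrm_fields f = { b >> 6, (b >> 3) & 0x7, b & 0x7 };

	return f;
}

/* SIB layout: scale in bits 7:6, index in bits 5:3, base in bits 2:0. */
static struct sib_fields decode_sib(uint8_t b)
{
	struct sib_fields f = { b >> 6, (b >> 3) & 0x7, b & 0x7 };

	return f;
}

/*
 * True when the index must not be used. REX.X is applied first so that
 * R12 (regno = 12) remains usable as an index.
 */
static bool sib_index_ignored(uint8_t modrm, uint8_t sib, bool rex_x)
{
	unsigned int regno = decode_sib(sib).index + (rex_x ? 8 : 0);

	return regno == 4 && decode_modrm(modrm).mod != 3;
}
```

For ModR/M=0x4c and SIB=0x23 without REX.X, the index field is 4 and mod is
1, so the index is ignored. With REX.X set, the same bytes select R12 and
the index is used.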

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 86c2d96..6a034bc 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 		regno = X86_SIB_INDEX(insn->sib.value);
 		if (X86_REX_X(insn->rex_prefix.value))
 			regno += 8;
+		/*
+		 * If mod !=3, register R/ESP (regno=4) is not used as index in
+		 * the address computation. Check is done after looking at REX.X
+		 * This is because R12 (regno=12) can be used as an index.
+		 */
+		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+			return -EDOM;
 		break;
 
 	case REG_TYPE_BASE:
@@ -158,11 +165,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				goto out_err;
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+			/*
+			 * A negative offset generally means a error, except
+			 * -EDOM, which means that the contents of the register
+			 * should not be used as index.
+			 */
 			if (indx_offset < 0)
-				goto out_err;
+				if (indx_offset == -EDOM)
+					indx = 0;
+				else
+					goto out_err;
+			else
+				indx = regs_get_register(regs, indx_offset);
 
 			base = regs_get_register(regs, base_offset);
-			indx = regs_get_register(regs, indx_offset);
 			addr = base + indx * (1 << X86_SIB_SCALE(sib));
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

[PATCH v4 00/17] x86: Enable User-Mode Instruction Prevention

2017-02-22 Thread Ricardo Neri

This is v4 of this series. Again, it took me a while to complete the
updates as support for 16-bit address encodings for protected mode
required extra rework. The two previous submissions can be found here [1],
here [2] and here [3].

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table.

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

There is a caveat, however. Certain applications rely on some of these
instructions to function. An example of this are applications that use
WineHQ[4]. For instance, these applications rely on sidt returning a non-
accessible memory location[5]. During the discussions, it was proposed that
the fault could be relied to the user-space and perform the emulation in
user-mode. However, this would break existing applications until, for
instance, they update to a new WineHQ version. However, this approach
would require UMIP to be disabled by default. The consensus in this forum
is to always enable it.

This patchset initially treated tasks running in virtual-8086 mode as a
special case. However, I received clarification that DOSEMU[6] does not
support applications that use these instructions. It relies on WineHQ for
this [7]. Furthermore, the applications for which the concern was raised
run in protected mode [5].

=== How are UMIP-protected instructions emulated?

This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, such fault will be fixed-up by returning dummy values as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments in x86_32, the base of GDT and IDT was always a 4-byte
   address, even for 16-bit operands. Thus, my emulation code does the
   same. In all cases, the limit of the table is set to 0.
 * STR and SLDT return 0 as the segment selector. This looks appropriate
   since we are providing a dummy value as the base address of the global
   descriptor table.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. This is, the following bits are enabed:
   CR0.0 for Protection Enable, CR.1 for Monitor Coprocessor, CR.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask. As per the Intel 64 and IA-32 Architectures Software Developer's
   Manual, SMSW returns a 16-bit results for memory operands. However, when
   the operand is a register, the results can be up to CR0[63:0]. Since
   the emulation code only kicks-in in x86_32, we return up to CR[31:0].
 * The proposed emulation code is handles faults that happens in both
   protected and virtual-8086 mode.

=== How is this series laid out?

++ Fix bugs in MPX address evaluator
I found very useful the code for Intel MPX (Memory Protection Extensions)
used to parse opcodes and the memory locations contained in the general
purpose registers when used as operands. I put some of this code in
a separate library file that both MPX and UMIP can access and avoid code
duplication. Before creating the new library, I fixed a couple of bugs
that I found in how MPX determines the address contained in the
instruction and operands.

++ Provide a new x86 instruction evaluating library
With bugs fixed, the MPX evaluating code is relocated in a new insn-eval.c
library. The basic functionality of this library is extended to obtain the
segment descriptor selected by either segment override prefixes or the
default segment by the involved registers in the calculation of the
effective address. It was also extended to obtain the default address and
operand sizes as well as the segment base address. Also, support to 
process 16-bit address encodings. Armed with this arsenal, it is now
possible to determine the linear address onto which the emulated results
shall be copied.

This code supports Normal 32-bit and 64-bit (i.e., __USER32_CS and/or
__USER_CS) protected mode, virtual-8086 mode, 16-bit

[PATCH v4 07/17] x86/insn-eval: Add utility function to get segment descriptor base address

2017-02-22 Thread Ricardo Neri

With segmentation, the base address of the segment descriptor is needed
to compute a linear address. The segment descriptor used in the address
computation depends on either any segment override prefixes in the in the
instruction or the default segment determined by the registers involved
in the address computation. Thus, both the instruction as well as the
register (specified as the offset from the base of pt_regs) are given as
inputs. Furthermore, if insn is null, overrides are ignored; this is
useful when, for instance, obtaining the base address of the instruction
pointer (the code segment is always used).

The segment selector is determined by get_seg_selector with the inputs
described above. Once the selector is known the base address is
determined. In protected mode, the selector is used to obtain the segment
descriptor and then its base address. In virtual-8086 mode, the base
address is computed as the value of the segment selector shifted 4
positions to the left.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c | 42 
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..0de3083 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index e6d5dfb..4e3f797 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -323,6 +323,48 @@ static int get_desc(unsigned short seg, struct desc_struct 
**desc)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address contained in descriptor
+ * @regs:  Set of registers containing the segment selector
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either any
+ * segment override prefixes contained in insn or the default segment applicable
+ * to the register indicated by regoff. regoff is specified as the offset in
+ * bytes from the base of pt_regs. If insn is not null and contains any segment
+ * override prefixes, the override is used instead of the default segment.
+ *
+ * Return: In protected mode, 0 if in CONFIG_X86_64, -1L in case of error,
+ * or the base address indicated in the selected segment descriptor. In
+ * virtual-8086, the segment selector shifted four positions to the left.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff)
+{
+   struct desc_struct *desc;
+   unsigned short seg;
+   int ret;
+
+   seg = get_segment_selector(regs, insn, regoff);
+
+   if (v8086_mode(regs))
+   /*
+* Base is simply the segment selector shifted 4
+* positions to the left.
+*/
+   return (unsigned long)(seg << 4);
+
+   /* 64-bit mode */
+   if (!seg)
+   return 0;
+   ret = get_desc(seg, &desc);
+   if (ret)
+   return -1L;
+
+   return get_desc_base(desc);
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3

[PATCH v4 01/17] x86/mpx: Do not use SIB index if index points to R/ESP

2017-02-22 Thread Ricardo Neri

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing is used
(i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
used in the computation of the memory address.

In these cases the address is simply the value present in the register
pointed to by the base part of the SIB byte plus the displacement byte.

An example of such instruction could be

insn -0x80(%rsp)

This is represented as:

 [opcode] 4c 23 80

  ModR/M=0x4c: mod: 0x1, reg: 0x1: r/m: 0x4(R/ESP)
  SIB=0x23: sc: 0, index: 0b100(R/ESP), base: 0b011(R/EBX):
  Displacement -0x80

The correct address is (base) + displacement; no index is used.

We can achieve the desired effect of not using the index by making
get_reg_offset return -EDOM in this particular case. This value indicates
to callers that they should not use the index to calculate the address.
EINVAL continues to indicate that an error occurred when decoding the SIB byte.

Care is taken to allow R12 to be used as index, which is a valid scenario.
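
As an illustration of the rule above, here is a minimal user-space sketch
that splits the ModR/M and SIB bytes into their fields and decides whether
the index takes part in the address computation. The helper names
(modrm_mod, sib_index, index_is_used and so on) are hypothetical and are not
kernel functions; the kernel uses its own X86_MODRM_*/X86_SIB_* macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field extractors for the ModR/M byte (mod:2, reg:3, r/m:3). */
static inline unsigned int modrm_mod(uint8_t modrm) { return (modrm >> 6) & 0x3; }
static inline unsigned int modrm_rm(uint8_t modrm)  { return modrm & 0x7; }

/* Hypothetical field extractors for the SIB byte (scale:2, index:3, base:3). */
static inline unsigned int sib_scale(uint8_t sib) { return (sib >> 6) & 0x3; }
static inline unsigned int sib_index(uint8_t sib) { return (sib >> 3) & 0x7; }
static inline unsigned int sib_base(uint8_t sib)  { return sib & 0x7; }

/*
 * Returns true if the SIB index register takes part in the address
 * computation. rex_x is the REX.X bit (0 or 1), which extends the index
 * field to registers R8-R15.
 */
static inline bool index_is_used(uint8_t modrm, uint8_t sib, unsigned int rex_x)
{
	unsigned int regno = sib_index(sib) + (rex_x ? 8 : 0);

	/* regno 4 is R/ESP and is never an index; regno 12 (R12) is valid. */
	return !(regno == 4 && modrm_mod(modrm) != 3);
}
```

For the bytes in the example (ModR/M 0x4c, SIB 0x23), index_is_used()
reports that the index is ignored. With REX.X set, the same SIB byte selects
R12 and the index is used.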

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 86c2d96..6a034bc 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
regno += 8;
+   /*
+* If mod !=3, register R/ESP (regno=4) is not used as index in
+* the address computation. Check is done after looking at REX.X
+* This is because R12 (regno=12) can be used as an index.
+*/
+   if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+   return -EDOM;
break;
 
case REG_TYPE_BASE:
@@ -158,11 +165,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
goto out_err;
 
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+   /*
+* A negative offset generally means an error, except
+* -EDOM, which means that the contents of the register
+* should not be used as index.
+*/
if (indx_offset < 0)
-   goto out_err;
+   if (indx_offset == -EDOM)
+   indx = 0;
+   else
+   goto out_err;
+   else
+   indx = regs_get_register(regs, indx_offset);
 
base = regs_get_register(regs, base_offset);
-   indx = regs_get_register(regs, indx_offset);
addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

[PATCH v4 04/17] x86/insn-eval: Add utility functions to get register offsets

2017-02-22 Thread Ricardo Neri

The function get_reg_offset takes as argument an enumeration that
indicates the type of offset that is returned: the R/M part of the ModRM
byte, the index of the SIB byte or the base of the SIB byte. Callers of
this function would need the definition of such enumeration. This is not
needed. Instead, helper functions can be added for this purpose.
These functions are useful in cases when, for instance, the caller
needs to decide whether the operand is a register or a memory location by
looking at the mod part of the ModRM byte.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  3 +++
 arch/x86/lib/insn-eval.c | 51 
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1..754211b 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,8 @@
 #include 
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2ebfaa4..6c62fbf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
return regoff[regno];
 }
 
+/**
+ * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
+ * @insn:  Instruction structure containing the ModRM byte
+ * @regs:  Set of registers indicated by the ModRM byte
+ *
+ * Obtain the register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register.
+ *
+ * Return: Register indicated by r/m, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
+/**
+ * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the base part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's base, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+
+/**
+ * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the index part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's index, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_INDEX);
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
-- 
2.9.3

[PATCH v4 09/17] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero

2017-02-22 Thread Ricardo Neri

Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when the mod part of the ModRM
byte is zero and R/EBP is specified in the R/M part of that byte, the value
of the aforementioned register should not be used in the address
computation. Instead, a 32-bit displacement is expected. The instruction
decoder takes care of setting the displacement to the expected value.
Returning -EDOM signals to callers that they should ignore the value of that
register when computing the address encoded in the instruction operands.

Also, callers should exercise care to correctly interpret this particular
case. In IA-32e 64-bit mode, the address is given by the displacement plus
the value of the RIP. In IA-32e compatibility mode, the value of EIP is
ignored. This correction is done for our insn_get_addr_ref.
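
The two cases described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: the helper names
(is_disp32_encoding, disp32_effective_addr) are hypothetical, and the ip
argument is whatever instruction pointer value the caller chooses to supply.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when ModR/M encodes "no base register, 32-bit displacement". */
static inline bool is_disp32_encoding(uint8_t modrm)
{
	unsigned int mod = (modrm >> 6) & 0x3;
	unsigned int rm = modrm & 0x7;

	return mod == 0 && rm == 5;
}

/*
 * Effective address for the mod == 0, r/m == 5 encoding.
 * In 64-bit mode the displacement is added to the instruction pointer;
 * otherwise the instruction pointer is ignored and the address is the
 * 32-bit displacement itself.
 */
static inline uint64_t disp32_effective_addr(uint64_t ip, int32_t disp32,
					      bool long_mode)
{
	if (long_mode)
		return ip + (uint64_t)(int64_t)disp32;

	return (uint32_t)disp32;
}
```

The same split appears in the patch below, where the base is regs->ip when
the address size is 8 bytes and 0 otherwise, before the displacement is
added.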

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3fe4ddb..d6525c2 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -218,6 +218,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
switch (type) {
case REG_TYPE_RM:
regno = X86_MODRM_RM(insn->modrm.value);
+   /* if mod=0, register R/EBP is not used in the address
+* computation. Instead, a 32-bit displacement is expected;
+* the instruction decoder takes care of reading such
+* displacement. This is true for both R/EBP and R13, as the
+* REX.B bit is not decoded.
+*/
+   if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+   return -EDOM;
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -544,10 +552,29 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
+   unsigned char addr_bytes;
+
+   addr_bytes = insn_get_seg_default_address_bytes(regs);
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-   if (addr_offset < 0)
-   goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+   if (addr_offset < 0) {
+   /* -EDOM means that we must ignore the
+* address_offset. The only case in which we
+* see this value is when R/M points to R/EBP.
+* In such a case, the address involves using
+* the instruction pointer for 64-bit mode.
+*/
+   if (addr_offset == -EDOM) {
+   /* if in 64-bit mode */
+   if (addr_bytes == 8)
+   addr = regs->ip;
+   else
+   addr = 0;
+   } else {
+   goto out_err;
+   }
+   } else {
+   addr = regs_get_register(regs, addr_offset);
+   }
}
addr += insn->displacement.value;
}
-- 
2.9.3

[PATCH v4 13/17] x86: Add emulation code for UMIP instructions

2017-02-22 Thread Ricardo Neri

User-Mode Instruction Prevention (UMIP), a feature present in recent Intel
processors, prevents a group of instructions from being executed with
CPL > 0. If one of them is executed anyway, a general protection fault is
issued.

Rather than relaying this fault to the user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This preserves the current kernel behavior without
revealing the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table and the task state and the machine status word).

This emulation is needed because certain applications (e.g., WineHQ) rely
on this subset of instructions to function.

The instructions protected by UMIP can be split in two groups: those that
return a kernel memory address (sgdt and sidt) and those that return a
value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications
such as WineHQ rely on the result being located in the kernel memory space.
The result is emulated as a hard-coded value that lies close to the top
of the kernel memory. The limits for the GDT and the IDT are set to zero.

The instructions sldt and str return a segment selector relative to the
base address of the global descriptor table. Since the actual address of
such table is not revealed, it makes sense to emulate the result as zero.

The instruction smsw is emulated to return the value that the register CR0
has at boot time, as set in head_32.S.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref inspects the segment descriptor pointed to by the registers
in pt_regs. This ensures that we correctly obtain the segment base address
and the address and operand sizes even if the user space application uses a
local descriptor table.
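
To illustrate what a dummy sgdt/sidt result looks like, the sketch below
builds the operand that these instructions store in 32-bit mode: a 16-bit
limit followed by a 32-bit base. The function name emulate_table_store and
the base value passed to it are hypothetical; the actual dummy bases are the
ones defined in umip.c.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill buf (at least 6 bytes) with an emulated sgdt/sidt result:
 * a limit of zero followed by the caller-supplied dummy base.
 * Returns the number of bytes written.
 */
static inline size_t emulate_table_store(uint8_t *buf, uint32_t dummy_base)
{
	uint16_t limit = 0;	/* the emulated limit is always zero */

	memcpy(buf, &limit, sizeof(limit));
	memcpy(buf + sizeof(limit), &dummy_base, sizeof(dummy_base));

	return sizeof(limit) + sizeof(dummy_base);
}
```

The caller would then copy these bytes to the user-space address computed
from the instruction operands.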

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/umip.h |  15 +++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/umip.c  | 262 
 3 files changed, 278 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 000..077b236
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs)
+{
+   return false;
+}
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index bdcdb3b..424b58f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)   += sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)   += umip.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y  += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 000..b16542a
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,262 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ *sgdt
+ *sldt
+ *sidt
+ *str
+ *smsw
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ * Ricardo Neri 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * == Base addresses of GDT and IDT
+ * To function, some applications rely on finding the global descriptor table (GDT)
+ * and the interrupt descriptor table (IDT) in kernel memory.
+ * For x86_32, the selected values do not match any particular hole, but it
+ * suffices to provide a memory location within kernel memory.
+ *
+ * == CR0 flags for SMSW
+ * Use the flags given when booting, as found in head_32.S
+ */
+
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
+  X86_CR0_WP | X86_CR0_AM)
+#define UMIP_DUMMY_GDT_BASE 0xfffe
+#define UMIP_DUMMY_IDT_BASE 0x
+
+/*
+ * Definitions for x86 page fault error code bits. Only a simple
+ * pagefault during a write in user

[PATCH v4 16/17] x86: Enable User-Mode Instruction Prevention

2017-02-22 Thread Ricardo Neri

User-Mode Instruction Prevention (UMIP) is enabled or disabled by setting or
clearing a bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
space comes up. As with SMAP and SMEP, it is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a userspace to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.
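
The value 514 can be read as a position in the kernel's CPU capability
bitmap. The sketch below assumes that feature numbers are encoded as
word * 32 + bit, which is consistent with 514 = 16 * 32 + 2; the helper
names are hypothetical.

```c
/* Assumption: a feature number is encoded as word * 32 + bit. */
static inline unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

static inline unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}
```

Under that assumption, clearcpuid=514 clears bit 2 of capability word 16.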

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 ++
 arch/x86/kernel/cpu/common.c | 16 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f8fbfc5..8819fb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1733,6 +1733,16 @@ config X86_SMAP
 
  If unsure, say Y.
 
+config X86_INTEL_UMIP
+   def_bool y
+   depends on CPU_SUP_INTEL
+   prompt "User Mode Instruction Prevention" if EXPERT
+   ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c188ae5..8668828 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -312,6 +312,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+   if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+   cpu_has(c, X86_FEATURE_UMIP))
+   cr4_set_bits(X86_CR4_UMIP);
+   else
+   /*
+* Make sure UMIP is disabled in case it was enabled in a
+* previous boot (e.g., via kexec).
+*/
+   cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1083,9 +1096,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP */
+   /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+   setup_umip(c);
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.9.3

[PATCH v4 16/17] x86: Enable User-Mode Instruction Prevention

2017-02-22 Thread Ricardo Neri

User_mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a userspace to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 ++
 arch/x86/kernel/cpu/common.c | 16 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f8fbfc5..8819fb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1733,6 +1733,16 @@ config X86_SMAP
 
  If unsure, say Y.
 
+config X86_INTEL_UMIP
+   def_bool y
+   depends on CPU_SUP_INTEL
+   prompt "User Mode Instruction Prevention" if EXPERT
+   ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c188ae5..8668828 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -312,6 +312,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+   if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+   cpu_has(c, X86_FEATURE_UMIP))
+   cr4_set_bits(X86_CR4_UMIP);
+   else
+   /*
+* Make sure UMIP is disabled in case it was enabled in a
+* previous boot (e.g., via kexec).
+*/
+   cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1083,9 +1096,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP */
+   /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+   setup_umip(c);
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.9.3

[PATCH v4 05/17] x86/insn-eval: Add utility function to get segment selector

2017-02-22 Thread Ricardo Neri

When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most
cases, it will be sufficient to use USER_DS, which has a base of 0.
However, a user space program may define its own segments via a local
descriptor table. Thus, the base address of the segment is needed.
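
For readers less familiar with selectors, here is a minimal user-space
sketch of how a 16-bit selector splits into its architectural fields and
how a segment base enters the linear address. The struct and function
names are assumptions made for this illustration; they are not part of
the patch.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (names are local to this sketch). */
struct sketch_selector {
	unsigned int index;	/* bits 15..3: index into the descriptor table */
	unsigned int ti;	/* bit 2: 0 = GDT, 1 = LDT */
	unsigned int rpl;	/* bits 1..0: requested privilege level */
};

/* Split a raw selector value into its three fields. */
static struct sketch_selector sketch_decode_selector(uint16_t sel)
{
	struct sketch_selector s;

	s.index = sel >> 3;
	s.ti = (sel >> 2) & 1;
	s.rpl = sel & 3;
	return s;
}

/* Linear address = segment base + effective address (32-bit wraparound). */
static uint32_t sketch_linear_address(uint32_t seg_base, uint32_t eff_addr)
{
	return seg_base + eff_addr;
}
```

A selector with the TI bit set refers to the local descriptor table, which
is the case where a non-zero base has to be looked up.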

The segment selector to be used when computing a linear address is
determined either by segment override prefixes in the instruction or by
the registers involved in the computation of the effective address, in
that order. Also, there are cases in which the overrides shall be
ignored.
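
The precedence described here can be sketched in user space as follows.
The prefix byte values are the architectural segment override encodings
that also appear in the switch statement of the patch; the enum, the
function names and the idea of passing the register-derived default as a
parameter are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_seg { SEG_NONE, SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

/* Map one prefix byte to the segment it selects, or SEG_NONE. */
static enum sketch_seg sketch_prefix_to_seg(uint8_t byte)
{
	switch (byte) {
	case 0x2e: return SEG_CS;
	case 0x36: return SEG_SS;
	case 0x3e: return SEG_DS;
	case 0x26: return SEG_ES;
	case 0x64: return SEG_FS;
	case 0x65: return SEG_GS;
	default:   return SEG_NONE;
	}
}

/*
 * Return the segment named by the first override prefix, if any.
 * Otherwise fall back to the segment implied by the operand registers.
 * Passing n == 0 models the "overrides shall be ignored" case.
 */
static enum sketch_seg sketch_pick_segment(const uint8_t *prefixes, size_t n,
					   enum sketch_seg by_register)
{
	size_t i;

	for (i = 0; i < n; i++) {
		enum sketch_seg seg = sketch_prefix_to_seg(prefixes[i]);

		if (seg != SEG_NONE)
			return seg;
	}
	return by_register;
}
```

Non-segment prefixes such as 0x66 are skipped, so only the first segment
override found decides the result.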

This function can be used in both protected mode and virtual-8086 mode.
When in protected mode, the segment selector is obtained from the pt_regs
structure. When in virtual-8086 mode, data segments are obtained from the
kernel_vm86_regs.

With CONFIG_X86_64, selectors for data segments are absent from pt_regs.
Hence, the returned selector is zero to signal that segmentation is not
in use.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 6c62fbf..516902e 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum reg_type {
REG_TYPE_RM = 0,
@@ -15,6 +16,168 @@ enum reg_type {
REG_TYPE_BASE,
 };
 
+/**
+ * get_segment_selector() - obtain segment selector
+ * @regs:  Set of registers containing the segment selector
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:Operand offset, in pt_regs, of which the selector is needed
+ *
+ * The segment selector to which an effective address refers depends on
+ * a) segment selector overrides instruction prefixes or b) the operand
+ * register indicated in the ModRM or SiB byte.
+ *
+ * For case a), the function inspects any prefixes in the insn instruction;
+ * insn can be null to indicate that selector override prefixes shall be
+ * ignored. This is useful when the use of prefixes is forbidden (e.g.,
+ * obtaining the code selector). For case b), the operand register shall be
+ * represented as the offset from the base address of pt_regs. Also, regoff
+ * can be -EINVAL for cases in which registers are not used as operands (e.g.,
+ * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
+ *
+ * The returned segment selector is obtained from the regs structure. Both
+ * protected and virtual-8086 modes are supported. In virtual-8086 mode,
+ * data segments are obtained from the kernel_vm86_regs structure.
+ * For CONFIG_X86_64, the returned segment selector is null if such selector
+ * refers to es, fs or gs.
+ *
+ * Return: Value of the segment selector
+ */
+static unsigned short get_segment_selector(struct pt_regs *regs,
+  struct insn *insn, int regoff)
+{
+   int i;
+
+   struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs;
+
+   if (!insn)
+   goto default_seg;
+
+   insn_get_prefixes(insn);
+
+   if (v8086_mode(regs)) {
+   /*
+* Check first if we have selector overrides. Having more than
+* one selector override leads to undefined behavior. We
+* only use the first one and return
+*/
+   for (i = 0; i < insn->prefixes.nbytes; i++) {
+   switch (insn->prefixes.bytes[i]) {
+   /*
+* Code and stack segment selector register are saved in
+* all processor modes. Thus, it makes sense to take
+* them from pt_regs.
+*/
+   case 0x2e:
+   return (unsigned short)regs->cs;
+   case 0x36:
+   return (unsigned short)regs->ss;
+   /*
+* The rest of the segment selector registers are only
+* saved in virtual-8086 mode. Thus, we must obtain them
+* from the vm86 register structure.
+*/
+   case 0x3e:
+   return vm86regs->ds;
+   case 0x26:
+   return vm86regs->es;
+   case 0x64:
+   return vm86regs->fs;
+   case 0x65:
+

[PATCH v4 03/17] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel

2017-02-22 Thread Ricardo Neri

Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.

Thus, these functions are relocated to a new insn-eval.c file. The reason
not to relocate these utilities into insn.c is that the latter solely
analyses instructions given by a struct insn without any knowledge of the
meaning of the values of instruction operands. The new insn-eval.c is
meant to resolve effective and linear addresses based on the contents of
the instruction operands as well as the contents of the struct pt_regs.
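
To give a feel for the kind of operand decoding involved, the following
user-space sketch mirrors the REG_TYPE_RM case of get_reg_offset(): it
takes the r/m field of the ModRM byte, extends it with REX.B, and refuses
register numbers that do not exist for a non-64-bit instruction. The
function name and the -1 error value are assumptions of this sketch; the
kernel code works on struct insn and returns offsets into pt_regs.

```c
#include <stdint.h>

/*
 * Return the register number (0..15) selected by the r/m field of a
 * ModRM byte, or -1 when the encoding would need a 64-bit-only register
 * while decoding a non-64-bit instruction.
 */
static int sketch_modrm_rm_regno(uint8_t modrm, uint8_t rex, int is_64bit_insn)
{
	int nr_registers = is_64bit_insn ? 16 : 8;
	int regno = modrm & 0x7;	/* r/m field: bits 2..0 */

	if (rex & 0x1)			/* REX.B extends the r/m field */
		regno += 8;

	return regno < nr_registers ? regno : -1;
}
```

Register numbers follow the order of the regoff[] table in the patch
(ax, cx, dx, bx, sp, bp, si, di, then r8 to r15).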

These utilities come with a separate header. This avoids taking insn.c
out of sync with the instruction decoders under tools/obj and tools/perf.
It also avoids adding cumbersome #ifdefs for the #include'd files
required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation
changes.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  16 
 arch/x86/lib/Makefile|   2 +-
 arch/x86/lib/insn-eval.c | 160 +++
 arch/x86/mm/mpx.c| 152 +
 4 files changed, 179 insertions(+), 151 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index 000..5cab1b1
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..675d7b0 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index 000..2ebfaa4
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,160 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2016
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum reg_type {
+   REG_TYPE_RM = 0,
+   REG_TYPE_INDEX,
+   REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+   int regno = 0;
+
+   static const int regoff[] = {
+   offsetof(struct pt_regs, ax),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, dx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, sp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+   offsetof(struct pt_regs, r8),
+   offsetof(struct pt_regs, r9),
+   offsetof(struct pt_regs, r10),
+   offsetof(struct pt_regs, r11),
+   offsetof(struct pt_regs, r12),
+   offsetof(struct pt_regs, r13),
+   offsetof(struct pt_regs, r14),
+   offsetof(struct pt_regs, r15),
+#endif
+   };
+   int nr_registers = ARRAY_SIZE(regoff);
+   /*
+* Don't possibly decode a 32-bit instructions as
+* reading a 64-bit-only register.
+*/
+   if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+   nr_registers -= 8;
+
+   switch (type) {
+   case REG_TYPE_RM:
+   regno = X86_MODRM_RM(insn->modrm.value);
+   if (X86_REX_B(insn->rex_prefix.value))
+   regno += 8;
+   break;
+
+   case REG_TYPE_INDEX:
+   regno = X86_SIB_INDEX(insn->sib.value);
+

[PATCH v4 15/17] x86/traps: Fixup general protection faults caused by UMIP

2017-02-22 Thread Ricardo Neri

If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception will emulate dummy results for these
instructions. If emulation is successful, the result is passed to the
user space program and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception also caters for the case when
the fault originated while running in virtual-8086 mode.
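
As a hedged illustration of the first step such a fixup has to take, the
sketch below recognizes the five instructions from their opcode bytes and
the reg field of ModRM, using the encodings documented in the Intel SDM
(0F 01 /0 SGDT, 0F 01 /1 SIDT, 0F 01 /4 SMSW, 0F 00 /0 SLDT, 0F 00 /1
STR). It is not the implementation of fixup_umip_exception, which lives in
another patch of this series; all names here are made up, and prefix
handling is left out.

```c
#include <stdint.h>

enum sketch_umip_insn {
	UMIP_NONE, UMIP_SGDT, UMIP_SIDT, UMIP_SLDT, UMIP_SMSW, UMIP_STR
};

/* Classify a two-byte opcode plus ModRM byte. */
static enum sketch_umip_insn sketch_identify(uint8_t op1, uint8_t op2,
					     uint8_t modrm)
{
	unsigned int mod = modrm >> 6;
	unsigned int reg = (modrm >> 3) & 7;

	if (op1 != 0x0f)
		return UMIP_NONE;

	if (op2 == 0x01) {
		switch (reg) {
		/* SGDT and SIDT only exist with a memory operand. */
		case 0: return mod != 3 ? UMIP_SGDT : UMIP_NONE;
		case 1: return mod != 3 ? UMIP_SIDT : UMIP_NONE;
		case 4: return UMIP_SMSW;
		}
	} else if (op2 == 0x00) {
		switch (reg) {
		case 0: return UMIP_SLDT;
		case 1: return UMIP_STR;
		}
	}
	return UMIP_NONE;
}

/*
 * Same shape as the check added to do_general_protection(): returns 1
 * when the fault is consumed, so no signal would be delivered.
 */
static int sketch_gp_fault_handled(int user_mode, enum sketch_umip_insn insn)
{
	return user_mode && insn != UMIP_NONE;
}
```

Anything that is not recognized falls through to the regular general
protection handling, which is also what the early return in the diff
below preserves.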

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/traps.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 948443e..39614ef 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
 
+   if (user_mode(regs) && (fixup_umip_exception(regs) == true))
+   return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.9.3

[PATCH v2 4/4] w1: w1_ds2760.h: fix defines indentation

2017-02-22 Thread Mariusz Bialonczyk

Signed-off-by: Mariusz Bialonczyk 
---
 drivers/w1/slaves/w1_ds2760.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds2760.h b/drivers/w1/slaves/w1_ds2760.h
index 58e774141568..24168c94eeae 100644
--- a/drivers/w1/slaves/w1_ds2760.h
+++ b/drivers/w1/slaves/w1_ds2760.h
@@ -24,11 +24,13 @@
 #define DS2760_DATA_SIZE   0x40
 
 #define DS2760_PROTECTION_REG  0x00
+
 #define DS2760_STATUS_REG  0x01
-   #define DS2760_STATUS_IE(1 << 2)
-   #define DS2760_STATUS_SWEN  (1 << 3)
-   #define DS2760_STATUS_RNAOP (1 << 4)
-   #define DS2760_STATUS_PMOD  (1 << 5)
+#define DS2760_STATUS_IE   (1 << 2)
+#define DS2760_STATUS_SWEN (1 << 3)
+#define DS2760_STATUS_RNAOP(1 << 4)
+#define DS2760_STATUS_PMOD (1 << 5)
+
 #define DS2760_EEPROM_REG  0x07
 #define DS2760_SPECIAL_FEATURE_REG 0x08
 #define DS2760_VOLTAGE_MSB 0x0c
-- 
2.11.0

[PATCH v4 11/17] x86/insn-eval: Add support to resolve 16-bit addressing encodings

2017-02-22 Thread Ricardo Neri

Tasks running in virtual-8086 mode or in protected mode with code
segment descriptors that specify 16-bit default address sizes via the
D bit will use 16-bit addressing form encodings as described in the Intel
64 and IA-32 Architecture Software Developer's Manual Volume 2A Section
2.1.5. 16-bit addressing encodings differ in several ways from the
32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte
points to different registers and, in some cases, addresses can be
indicated by the addition of the value of two registers. Also, there is
no support for SiB bytes. Thus, a separate function is needed to parse
this form of addressing.
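
The 16-bit addressing forms can be sketched in user space like this. The
register pairs per r/m value follow the table in the SDM section cited
above and match the regoff1[]/regoff2[] arrays in the diff below. The
struct, the function names and the convention that the caller passes an
already sign-extended displacement are assumptions of this sketch.

```c
#include <stdint.h>

/* Only the registers that 16-bit addressing can reference. */
struct sketch_regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Effective address of a 16-bit ModRM memory operand (mod != 3).
 * 'disp' is the sign-extended displacement: 0 when mod == 0, except for
 * mod == 0 with r/m == 6, where it is the disp16 and no register is used.
 */
static uint16_t sketch_ea16(uint8_t modrm, const struct sketch_regs16 *r,
			    int16_t disp)
{
	unsigned int mod = modrm >> 6;
	unsigned int rm = modrm & 7;
	uint16_t base = 0, index = 0;

	switch (rm) {
	case 0: base = r->bx; index = r->si; break;
	case 1: base = r->bx; index = r->di; break;
	case 2: base = r->bp; index = r->si; break;
	case 3: base = r->bp; index = r->di; break;
	case 4: base = r->si; break;
	case 5: base = r->di; break;
	case 6: base = (mod == 0) ? 0 : r->bp; break;
	case 7: base = r->bx; break;
	}

	/* The sum wraps around at 64 KiB. */
	return (uint16_t)(base + index + (uint16_t)disp);
}

/* In virtual-8086 mode the segment base is the selector shifted left by 4. */
static uint32_t sketch_linear16_vm86(uint16_t selector, uint16_t ea)
{
	return ((uint32_t)selector << 4) + ea;
}
```

In protected mode the base would instead come from the segment descriptor
that the selector points to.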

A couple of functions are introduced. get_reg_offset_16 obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. insn_get_addr_ref_16 computes the linear
address indicated by the instruction using the value of the registers
given by ModRM as well as the base address of the segment.

Lastly, the original function insn_get_addr_ref is renamed to
insn_get_addr_ref_32_64. A new insn_get_addr_ref function decides what
type of address decoding must be done based on the number of address
bytes given by the instruction.
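
The dispatch described here might look roughly like the sketch below. The
enum, the function name and the "unsupported" fallback are assumptions
made for illustration; the actual insn_get_addr_ref is not fully visible
in this excerpt.

```c
enum sketch_addr_decoder {
	DECODE_16,		/* would call insn_get_addr_ref_16 */
	DECODE_32_64,		/* would call insn_get_addr_ref_32_64 */
	DECODE_UNSUPPORTED	/* hypothetical fallback for other sizes */
};

/* Choose a decoder from the number of address bytes of the instruction. */
static enum sketch_addr_decoder sketch_pick_decoder(int addr_bytes)
{
	switch (addr_bytes) {
	case 2:
		return DECODE_16;
	case 4:
	case 8:
		return DECODE_32_64;
	default:
		return DECODE_UNSUPPORTED;
	}
}
```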

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 156 ++-
 1 file changed, 155 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index b3a2fe8..ea5a38d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -274,6 +274,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16 - Obtain offset of register indicated by instruction
+ * @insn:  Instruction structure containing ModRM and SiB bytes
+ * @regs:  Set of registers referred by the instruction
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable.
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+int *offs1, int *offs2)
+{
+   /* 16-bit addressing can use one or two registers */
+   static const int regoff1[] = {
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bx),
+   };
+
+   static const int regoff2[] = {
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   };
+
+   if (!offs1 || !offs2)
+   return -EINVAL;
+
+   /* operand is a register, use the generic function */
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   *offs1 = insn_get_reg_offset_modrm_rm(insn, regs);
+   *offs2 = -EDOM;
+   return 0;
+   }
+
+   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+   /*
+* If no displacement is indicated in the mod part of the ModRM byte,
+* (mod part is 0) and the r/m part of the same byte is 6, no register
+* is used to calculate the operand address. An r/m part of 6 means that
+* the second register offset is already invalid.
+*/
+   if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+   (X86_MODRM_RM(insn->modrm.value) == 6))
+   *offs1 = -EDOM;
+
+   return 0;
+}
+
+/**
  * get_desc() - Obtain address of segment descriptor
  * @seg:   Segment selector
  * @desc:  Pointer to the selected segment descriptor
@@ -503,12 +570,79 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+/**
+ * insn_get_addr_ref_16 - Obtain the 16-bit address referred by instruction
+ * @insn:  Instruction
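The register table used by get_reg_offset_16() above can be tried out in user space. The following is only a sketch under stated assumptions: it reports register names instead of pt_regs offsets, and the names decode_modrm16, rm16_reg1 and rm16_reg2 are hypothetical and do not appear in the patch.

```c
#include <stddef.h>

/* Hypothetical user-space mirror of the 16-bit ModRM r/m table. */
static const char *const rm16_reg1[8] = {
	"bx", "bx", "bp", "bp", "si", "di", "bp", "bx"
};

/* NULL plays the role of -EDOM: no second register is referenced. */
static const char *const rm16_reg2[8] = {
	"si", "di", "si", "di", NULL, NULL, NULL, NULL
};

#define MODRM_MOD(byte)	(((byte) & 0xc0) >> 6)
#define MODRM_RM(byte)	((byte) & 0x07)

/*
 * Decode the registers used by a 16-bit memory operand.
 * Returns 0 on success, or -1 when mod == 3 (the operand is a register,
 * so there is no memory reference to decode here).
 */
static int decode_modrm16(unsigned char modrm, const char **reg1,
			  const char **reg2)
{
	unsigned int mod = MODRM_MOD(modrm);
	unsigned int rm = MODRM_RM(modrm);

	if (mod == 3)
		return -1;

	*reg1 = rm16_reg1[rm];
	*reg2 = rm16_reg2[rm];

	/* mod == 0 with r/m == 6 means a bare 16-bit displacement */
	if (mod == 0 && rm == 6)
		*reg1 = NULL;

	return 0;
}
```

For example, a ModRM byte of 0x00 decodes to bx and si, while 0x06 uses no register at all.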

[PATCH v2 4/4] w1: w1_ds2760.h: fix defines indentation

2017-02-22 Thread Mariusz Bialonczyk

Signed-off-by: Mariusz Bialonczyk 
---
 drivers/w1/slaves/w1_ds2760.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds2760.h b/drivers/w1/slaves/w1_ds2760.h
index 58e774141568..24168c94eeae 100644
--- a/drivers/w1/slaves/w1_ds2760.h
+++ b/drivers/w1/slaves/w1_ds2760.h
@@ -24,11 +24,13 @@
 #define DS2760_DATA_SIZE   0x40
 
 #define DS2760_PROTECTION_REG  0x00
+
 #define DS2760_STATUS_REG  0x01
-   #define DS2760_STATUS_IE    (1 << 2)
-   #define DS2760_STATUS_SWEN  (1 << 3)
-   #define DS2760_STATUS_RNAOP (1 << 4)
-   #define DS2760_STATUS_PMOD  (1 << 5)
+#define DS2760_STATUS_IE   (1 << 2)
+#define DS2760_STATUS_SWEN (1 << 3)
+#define DS2760_STATUS_RNAOP    (1 << 4)
+#define DS2760_STATUS_PMOD (1 << 5)
+
 #define DS2760_EEPROM_REG  0x07
 #define DS2760_SPECIAL_FEATURE_REG 0x08
 #define DS2760_VOLTAGE_MSB 0x0c
-- 
2.11.0

[PATCH v2 0/4] w1: add DS2438 support, documentation and small fixes

2017-02-22 Thread Mariusz Bialonczyk

This is the second version of my w1 patchset.
It mainly adds support for the DS2438. It also adds documentation
for the DS2438 and the missing documentation for the DS2413.

Changes since v1:
Cleaned up according to Evgeniy Polyakov's suggestions:
1/ changed to have the mutex lock/unlock calls in a single function
   (they were split across several functions)
2/ fixed the indentation of defines
3/ added an additional patch which fixes the same define indentation
   problem in w1_ds2760.h

Mariusz Bialonczyk (4):
  w1: add missing DS2413 documentation
  w1: add support for DS2438 Smart Battery Monitor
  w1: add documentation for w1_ds2438
  w1: w1_ds2760.h: fix defines indentation

 Documentation/w1/slaves/00-INDEX  |   4 +
 Documentation/w1/slaves/w1_ds2413 |  50 +
 Documentation/w1/slaves/w1_ds2438 |  63 ++
 drivers/w1/slaves/Kconfig |   6 +
 drivers/w1/slaves/Makefile|   1 +
 drivers/w1/slaves/w1_ds2438.c | 390 ++
 drivers/w1/slaves/w1_ds2760.h |  10 +-
 drivers/w1/w1_family.h|   1 +
 8 files changed, 521 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/w1/slaves/w1_ds2413
 create mode 100644 Documentation/w1/slaves/w1_ds2438
 create mode 100644 drivers/w1/slaves/w1_ds2438.c

-- 
2.11.0

[PATCH v4 10/17] insn/eval: Incorporate segment base in address computation

2017-02-22 Thread Ricardo Neri

insn_get_addr_ref returns the effective address as defined by
section 3.7.5.1, Vol. 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. To obtain the linear address, we must add the
effective address to the segment base given by the segment
descriptor.

In most cases, the base will be 0 if the USER_DS segment is used or if
segmentation is not used. However, the base address is not necessarily
zero if a user program defines its own segments. This is possible by
using a local descriptor table.
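
As a rough illustration of the arithmetic described above, the sketch below forms an effective address from base, index, scale and displacement, then adds a segment base to get the linear address. It assumes 32-bit wrapping arithmetic, and the helper names effective_addr and linear_addr are hypothetical, not functions from the patch.

```c
#include <stdint.h>

/*
 * Effective address: base + index * 2^scale_bits + displacement.
 * Unsigned arithmetic wraps modulo 2^32, as 32-bit addressing does.
 */
static uint32_t effective_addr(uint32_t base, uint32_t index,
			       unsigned int scale_bits, int32_t disp)
{
	return base + (index << scale_bits) + (uint32_t)disp;
}

/* Linear address: segment base (from the descriptor) + effective address. */
static uint32_t linear_addr(uint32_t seg_base, uint32_t eff)
{
	return seg_base + eff;
}
```

With a segment base of 0, as for USER_DS, the linear address equals the effective address. With a nonzero base taken from a local descriptor table entry, the two differ by exactly that base.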

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index d6525c2..b3a2fe8 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -523,6 +523,7 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
if (addr_offset < 0)
goto out_err;
addr = regs_get_register(regs, addr_offset);
+   addr += insn_get_seg_base(regs, insn, addr_offset);
} else {
if (insn->sib.nbytes) {
/*
@@ -551,6 +552,7 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
indx = regs_get_register(regs, indx_offset);
 
addr = base + indx * (1 << X86_SIB_SCALE(sib));
+   addr += insn_get_seg_base(regs, insn, base_offset);
} else {
unsigned char addr_bytes;
 
@@ -575,8 +577,10 @@ static void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
} else {
addr = regs_get_register(regs, addr_offset);
}
+   addr += insn_get_seg_base(regs, insn, addr_offset);
}
addr += insn->displacement.value;
+
}
return (void __user *)addr;
 out_err:
-- 
2.9.3

[PATCH v2 1/4] w1: add missing DS2413 documentation

2017-02-22 Thread Mariusz Bialonczyk

Signed-off-by: Mariusz Bialonczyk 
---
 Documentation/w1/slaves/00-INDEX  |  2 ++
 Documentation/w1/slaves/w1_ds2413 | 50 +++
 2 files changed, 52 insertions(+)
 create mode 100644 Documentation/w1/slaves/w1_ds2413

diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX
index 6e18c70c3474..cbcca1d3a680 100644
--- a/Documentation/w1/slaves/00-INDEX
+++ b/Documentation/w1/slaves/00-INDEX
@@ -2,6 +2,8 @@
- This file
 w1_therm
- The Maxim/Dallas Semiconductor ds18*20 temperature sensor.
+w1_ds2413
+   - The Maxim/Dallas Semiconductor ds2413 dual channel addressable switch.
 w1_ds2423
- The Maxim/Dallas Semiconductor ds2423 counter device.
 w1_ds28e04
diff --git a/Documentation/w1/slaves/w1_ds2413 b/Documentation/w1/slaves/w1_ds2413
new file mode 100644
index ..936263a8ccb4
--- /dev/null
+++ b/Documentation/w1/slaves/w1_ds2413
@@ -0,0 +1,50 @@
+Kernel driver w1_ds2413
+=======================
+
+Supported chips:
+  * Maxim DS2413 1-Wire Dual Channel Addressable Switch
+
+supported family codes:
+W1_FAMILY_DS2413   0x3A
+
+Author: Mariusz Bialonczyk 
+
+Description
+-----------
+
+The DS2413 chip has two open-drain outputs (PIO A and PIO B).
+Support is provided through the sysfs files "output" and "state".
+
+Reading state
+-------------
+The "state" file provides a one-byte value in the same format as
+the chip's PIO_ACCESS_READ command (refer to the datasheet for details):
+
+Bit 0:   PIOA Pin State
+Bit 1:   PIOA Output Latch State
+Bit 2:   PIOB Pin State
+Bit 3:   PIOB Output Latch State
+Bit 4-7: Complement of Bit 3 to Bit 0 (verified by the kernel module)
+
+This file is read-only.
+
+Writing output
+--------------
+You can set the PIO pins using the "output" file.
+It is writable; you can write a one-byte value to this sysfs file.
+The byte format is the same as for the PIO_ACCESS_WRITE command:
+
+Bit 0:   PIOA
+Bit 1:   PIOB
+Bit 2-7: Don't care (the driver sets them to "1"s)
+
+
+The chip has basic protection against transmission errors.
+When reading the state, the upper four bits are the complement of the
+lower four. The driver checks this complement and returns an I/O error
+when it is wrong.
+
+When writing output, the master must repeat the PIO Output Data byte in
+its inverted form and then wait for a confirmation.
+If the write fails three times, the write also returns an
+I/O error.
-- 
2.11.0
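
The byte formats documented above can be handled from user space along the following lines. This is a minimal sketch, not the driver's code: the function names are hypothetical, and it assumes the byte has already been read from the "state" file or is about to be written to the "output" file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 4-7 must be the bitwise complement of bits 0-3. */
static bool ds2413_state_valid(uint8_t state)
{
	return ((uint8_t)(~state) >> 4) == (state & 0x0f);
}

/* Bit 0 of the state byte: PIOA pin state. */
static bool ds2413_pioa_pin(uint8_t state)
{
	return state & 0x01;
}

/* Bit 2 of the state byte: PIOB pin state. */
static bool ds2413_piob_pin(uint8_t state)
{
	return state & 0x04;
}

/* Build an output byte: bit 0 is PIOA, bit 1 is PIOB, bits 2-7 are set. */
static uint8_t ds2413_output_byte(bool pioa, bool piob)
{
	return (uint8_t)(0xfc | (piob ? 0x02 : 0x00) | (pioa ? 0x01 : 0x00));
}
```

A state byte such as 0x5A passes the check because its upper nibble (0x5) is the complement of its lower nibble (0xA), whereas 0xFF or 0x00 would be rejected.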

[PATCH v2 2/4] w1: add support for DS2438 Smart Battery Monitor

2017-02-22 Thread Mariusz Bialonczyk

Detailed information about support and the provided sysfs files
is in my next commit, which creates a documentation file:
Documentation/w1/slaves/w1_ds2438

Signed-off-by: Mariusz Bialonczyk 
---
 drivers/w1/slaves/Kconfig |   6 +
 drivers/w1/slaves/Makefile|   1 +
 drivers/w1/slaves/w1_ds2438.c | 390 ++
 drivers/w1/w1_family.h|   1 +
 4 files changed, 398 insertions(+)
 create mode 100644 drivers/w1/slaves/w1_ds2438.c

diff --git a/drivers/w1/slaves/Kconfig b/drivers/w1/slaves/Kconfig
index cfe74d09932e..9b4a79782276 100644
--- a/drivers/w1/slaves/Kconfig
+++ b/drivers/w1/slaves/Kconfig
@@ -78,6 +78,12 @@ config W1_SLAVE_DS2433_CRC
  Each block has 30 bytes of data and a two byte CRC16.
  Full block writes are only allowed if the CRC is valid.
 
+config W1_SLAVE_DS2438
+   tristate "DS2438 Smart Battery Monitor 0x26 family support"
+   help
+ Say Y here if you want support for the 1-wire
+ DS2438 Smart Battery Monitor device.
+
 config W1_SLAVE_DS2760
tristate "Dallas 2760 battery monitor chip (HP iPAQ & others)"
help
diff --git a/drivers/w1/slaves/Makefile b/drivers/w1/slaves/Makefile
index 1e9989afe7bf..7ad7a2cf1e12 100644
--- a/drivers/w1/slaves/Makefile
+++ b/drivers/w1/slaves/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_W1_SLAVE_DS2406) += w1_ds2406.o
 obj-$(CONFIG_W1_SLAVE_DS2423)  += w1_ds2423.o
 obj-$(CONFIG_W1_SLAVE_DS2431)  += w1_ds2431.o
 obj-$(CONFIG_W1_SLAVE_DS2433)  += w1_ds2433.o
+obj-$(CONFIG_W1_SLAVE_DS2438)  += w1_ds2438.o
 obj-$(CONFIG_W1_SLAVE_DS2760)  += w1_ds2760.o
 obj-$(CONFIG_W1_SLAVE_DS2780)  += w1_ds2780.o
 obj-$(CONFIG_W1_SLAVE_DS2781)  += w1_ds2781.o
diff --git a/drivers/w1/slaves/w1_ds2438.c b/drivers/w1/slaves/w1_ds2438.c
new file mode 100644
index ..5ededb4965e1
--- /dev/null
+++ b/drivers/w1/slaves/w1_ds2438.c
@@ -0,0 +1,390 @@
+/*
+ * 1-Wire implementation for the ds2438 chip
+ *
+ * Copyright (c) 2017 Mariusz Bialonczyk 
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../w1.h"
+#include "../w1_family.h"
+
+#define W1_DS2438_RETRIES  3
+
+/* Memory commands */
+#define W1_DS2438_READ_SCRATCH     0xBE
+#define W1_DS2438_WRITE_SCRATCH    0x4E
+#define W1_DS2438_COPY_SCRATCH     0x48
+#define W1_DS2438_RECALL_MEMORY    0xB8
+/* Register commands */
+#define W1_DS2438_CONVERT_TEMP 0x44
+#define W1_DS2438_CONVERT_VOLTAGE  0xB4
+
+#define DS2438_PAGE_SIZE   8
+#define DS2438_ADC_INPUT_VAD   0
+#define DS2438_ADC_INPUT_VDD   1
+#define DS2438_MAX_CONVERSION_TIME 10  /* ms */
+
+/* Page #0 definitions */
+#define DS2438_STATUS_REG  0x00    /* Status/Configuration Register */
+#define DS2438_STATUS_IAD  (1 << 0)    /* Current A/D Control Bit */
+#define DS2438_STATUS_CA   (1 << 1)    /* Current Accumulator Configuration */
+#define DS2438_STATUS_EE   (1 << 2)    /* Current Accumulator Shadow Selector bit */
+#define DS2438_STATUS_AD   (1 << 3)    /* Voltage A/D Input Select Bit */
+#define DS2438_STATUS_TB   (1 << 4)    /* Temperature Busy Flag */
+#define DS2438_STATUS_NVB  (1 << 5)    /* Nonvolatile Memory Busy Flag */
+#define DS2438_STATUS_ADB  (1 << 6)    /* A/D Converter Busy Flag */
+
+#define DS2438_TEMP_LSB    0x01
+#define DS2438_TEMP_MSB    0x02
+#define DS2438_VOLTAGE_LSB 0x03
+#define DS2438_VOLTAGE_MSB 0x04
+#define DS2438_CURRENT_LSB 0x05
+#define DS2438_CURRENT_MSB 0x06
+#define DS2438_THRESHOLD   0x07
+
+int w1_ds2438_get_page(struct w1_slave *sl, int pageno, u8 *buf)
+{
+   unsigned int retries = W1_DS2438_RETRIES;
+   u8 w1_buf[2];
+   u8 crc;
+   size_t count;
+
+   while (retries--) {
+   crc = 0;
+
+   if (w1_reset_select_slave(sl))
+   continue;
+   w1_buf[0] = W1_DS2438_RECALL_MEMORY;
+   w1_buf[1] = 0x00;
+   w1_write_block(sl->master, w1_buf, 2);
+
+   if (w1_reset_select_slave(sl))
+   continue;
+   w1_buf[0] = W1_DS2438_READ_SCRATCH;
+   w1_buf[1] = 0x00;
+   w1_write_block(sl->master, w1_buf, 2);
+
+   count = w1_read_block(sl->master, buf, DS2438_PAGE_SIZE + 1);
+   if (count == DS2438_PAGE_SIZE + 1) {
+   crc = w1_calc_crc8(buf, DS2438_PAGE_SIZE);
+
+   /* check for correct CRC */
+   if ((u8)buf[DS2438_PAGE_SIZE] == crc)
+   return 0;
+   }
+   }
+   return
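
The CRC check in w1_ds2438_get_page() above reads 8 data bytes plus one CRC byte and compares them. A user-space sketch of that check follows, assuming the standard Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed LSB first, initial value 0). The names dallas_crc8 and page_crc_ok are hypothetical; the kernel helper used in the patch is w1_calc_crc8().

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 8	/* mirrors DS2438_PAGE_SIZE in the patch */

/* Bitwise Dallas/Maxim CRC-8, reflected polynomial 0x8c, initial value 0. */
static uint8_t dallas_crc8(const uint8_t *data, size_t len)
{
	uint8_t crc = 0;

	while (len--) {
		uint8_t byte = *data++;
		int i;

		for (i = 0; i < 8; i++) {
			uint8_t mix = (crc ^ byte) & 0x01;

			crc >>= 1;
			if (mix)
				crc ^= 0x8c;
			byte >>= 1;
		}
	}
	return crc;
}

/* buf holds EXAMPLE_PAGE_SIZE data bytes followed by one CRC byte. */
static int page_crc_ok(const uint8_t *buf)
{
	return dallas_crc8(buf, EXAMPLE_PAGE_SIZE) == buf[EXAMPLE_PAGE_SIZE];
}
```

A table-driven implementation gives the same results; the bitwise form is used here only because it is short.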

[PATCH v2 3/4] w1: add documentation for w1_ds2438

2017-02-22 Thread Mariusz Bialonczyk

Signed-off-by: Mariusz Bialonczyk 
---
 Documentation/w1/slaves/00-INDEX  |  2 ++
 Documentation/w1/slaves/w1_ds2438 | 63 +++
 2 files changed, 65 insertions(+)
 create mode 100644 Documentation/w1/slaves/w1_ds2438

diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX
index cbcca1d3a680..8d76718e1ea2 100644
--- a/Documentation/w1/slaves/00-INDEX
+++ b/Documentation/w1/slaves/00-INDEX
@@ -6,5 +6,7 @@ w1_ds2413
- The Maxim/Dallas Semiconductor ds2413 dual channel addressable switch.
 w1_ds2423
- The Maxim/Dallas Semiconductor ds2423 counter device.
+w1_ds2438
+   - The Maxim/Dallas Semiconductor ds2438 smart battery monitor.
 w1_ds28e04
- The Maxim/Dallas Semiconductor ds28e04 eeprom.
diff --git a/Documentation/w1/slaves/w1_ds2438 b/Documentation/w1/slaves/w1_ds2438
new file mode 100644
index ..b99f3674c5b4
--- /dev/null
+++ b/Documentation/w1/slaves/w1_ds2438
@@ -0,0 +1,63 @@
+Kernel driver w1_ds2438
+=======================
+
+Supported chips:
+  * Maxim DS2438 Smart Battery Monitor
+
+supported family codes:
+W1_FAMILY_DS2438   0x26
+
+Author: Mariusz Bialonczyk 
+
+Description
+-----------
+
+The DS2438 chip provides several functions that are desirable to carry in
+a battery pack. It also has 40 bytes of nonvolatile EEPROM.
+Because it can measure temperature, current and voltage, the chip
+is also often used in weather stations and applications such as: rain gauge,
+wind speed/direction measuring, humidity sensing, etc.
+
+Current support is provided through the following sysfs files (all files
+except "iad" are read-only):
+
+"iad"
+-----
+This file controls the 'Current A/D Control Bit' (IAD) in the
+Status/Configuration Register.
+Writing a zero value clears the IAD bit and disables the current
+measurements.
+Writing the value "1" sets the IAD bit (enables the measurements).
+The IAD bit is enabled by default in the DS2438.
+
+When writing to the sysfs file, bits 2-7 are ignored, so it's safe to write ASCII.
+An I/O error is returned when there is a problem setting the new value.
+
+"page0"
+-------
+This file provides the full 8 bytes of the chip Page 0 (00h).
+This page contains the most frequently accessed information of the DS2438.
+Internally, when this file is read, the additional CRC byte is also obtained
+from the slave device. If it is correct, the 8 bytes of page data are passed
+to userspace; otherwise an I/O error is returned.
+
+"temperature"
+-------------
+Opening and reading this file initiates the CONVERT_T (temperature conversion)
+command of the chip. Afterwards the temperature is read from the device
+registers and provided as an ASCII decimal value.
+
+Important: The returned value has to be divided by 256 to get a real
+temperature in degrees Celsius.
+
+"vad", "vdd"
+------------
+Opening and reading this file initiates the CONVERT_V (voltage conversion)
+command of the chip.
+
+Depending on the sysfs filename, a different input for the A/D is selected:
+vad: general purpose A/D input (VAD)
+vdd: battery input (VDD)
+
+After the voltage conversion the value is returned as decimal ASCII.
+Note: The value is in units of 10 mV, so to get volts the value has to be divided by 100.
-- 
2.11.0
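
The temperature scaling described above (divide the returned value by 256) could be applied in user space roughly as follows. This is a sketch only: the function name is hypothetical, and it assumes the text has already been read from the "temperature" sysfs file as an ASCII decimal string.

```c
#include <stdlib.h>

/*
 * Convert the ASCII decimal value read from the "temperature" file
 * into degrees Celsius by dividing the raw value by 256.
 */
static double ds2438_temp_celsius(const char *sysfs_text)
{
	long raw = strtol(sysfs_text, NULL, 10);

	return (double)raw / 256.0;
}
```

For instance, a file content of "6400" corresponds to 25.0 degrees Celsius, and "-2560" to -10.0.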

[PATCH v4 14/17] x86/umip: Force a page fault when unable to copy emulated result to user

2017-02-22 Thread Ricardo Neri

fixup_umip_exception will be called from do_general_protection. If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired by
force_sig_info_fault, is introduced to model the page fault.

Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 45 +++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index b16542a..93bc80d 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -170,6 +170,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 
 /**
+ * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR
+ * @address:   Address that caused the signal
+ * @regs:  Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+   struct pt_regs *regs)
+{
+   siginfo_t info;
+   struct task_struct *tsk = current;
+
+   if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+   printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n",
+  tsk->comm, task_pid_nr(tsk), regs->ip,
+  regs->sp, UMIP_PF_USER | UMIP_PF_WRITE,
+  regs->ip);
+   }
+
+   tsk->thread.cr2 = (unsigned long)address;
+   tsk->thread.error_code  = UMIP_PF_USER | UMIP_PF_WRITE;
+   tsk->thread.trap_nr = X86_TRAP_PF;
+
+   info.si_signo   = SIGSEGV;
+   info.si_errno   = 0;
+   info.si_code= SEGV_MAPERR;
+   info.si_addr= address;
+   force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception - Fixup #GP faults caused by UMIP
  * @regs:  Registers as saved when entering the #GP trap
  *
@@ -252,8 +287,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
} else {
uaddr = insn_get_addr_ref(&insn, regs);
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-   if (nr_copied  > 0)
-   return false;
+   if (nr_copied  > 0) {
+   /*
+* If copy fails, send a signal and tell caller that
+* fault was fixed up
+*/
+   __force_sig_info_umip_fault(uaddr, regs);
+   return true;
+   }
}
 
/* increase IP to let the program keep going */
-- 
2.9.3

[PATCH v4 12/17] x86/cpufeature: Add User-Mode Instruction Prevention definitions

2017-02-22 Thread Ricardo Neri

User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
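
As a rough illustration of how a feature number such as (16*32 + 2) maps to a mask word and a bit in the disabled-features scheme, the sketch below splits a feature number and builds the word-16 mask. The helper names and the EX_ constants are hypothetical, and the umip_configured parameter stands in for the build-time CONFIG_X86_INTEL_UMIP option.

```c
#include <stdint.h>

/* A feature number packs (word * 32 + bit). */
enum {
	EX_FEATURE_UMIP = 16 * 32 + 2,
};

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature >> 5;
}

/* Bit position of the feature inside its word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature & 31;
}

/* Single-bit mask, the same shape as (1 << (X86_FEATURE_UMIP & 31)). */
static uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << feature_bit(feature);
}

/*
 * Contribution of UMIP to the word-16 disabled mask: zero when support
 * is configured in, the feature bit when it is configured out.
 */
static uint32_t disable_umip_mask(int umip_configured)
{
	return umip_configured ? 0 : feature_mask(EX_FEATURE_UMIP);
}
```

With the option configured out, the bit lands in word 16 and the feature is treated as disabled at build time; with it configured in, the mask contribution is zero and the check falls through to the runtime CPUID data.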

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/include/asm/disabled-features.h| 8 +++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4e77723..0739f1e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,6 +286,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU    (16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 85599ad..4707445 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX   (1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP  0
+#else
+# define DISABLE_UMIP  (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME   (1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -55,7 +61,7 @@
 #define DISABLED_MASK13 0
 #define DISABLED_MASK14 0
 #define DISABLED_MASK15 0
-#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE)
+#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50..d2c2af8 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT   11 /* enable UMIP support */
+#define X86_CR4_UMIP   _BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_VMXE_BIT   13 /* enable VMX virtualization */
 #define X86_CR4_VMXE   _BITUL(X86_CR4_VMXE_BIT)
 #define X86_CR4_SMXE_BIT   14 /* enable safer mode (TXT) */
-- 
2.9.3
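
A note on the arithmetic in the definitions above: a feature number such as
(16*32+ 2) packs a capability-word index and a bit position into one integer,
and DISABLE_UMIP keeps only the bit position. The following is a minimal
user-space sketch of that encoding, not kernel code. The helper names
(feature_word, feature_bit, feature_mask, cr4_mask) are made up for
illustration; only the two constants are taken from the patch.

```c
#include <stdint.h>

/* Values copied from the patch above. */
#define X86_FEATURE_UMIP  (16*32 + 2)
#define X86_CR4_UMIP_BIT  11

/* Index of the 32-bit capability word that holds the feature (hypothetical helper). */
static unsigned int feature_word(unsigned int feature)
{
    return feature / 32;
}

/* Bit position of the feature inside its capability word (hypothetical helper). */
static unsigned int feature_bit(unsigned int feature)
{
    return feature & 31;
}

/* Same shape as DISABLE_UMIP when CONFIG_X86_INTEL_UMIP is not set. */
static uint32_t feature_mask(unsigned int feature)
{
    return UINT32_C(1) << feature_bit(feature);
}

/* Illustrative stand-in for _BITUL(); this is not the kernel macro. */
static unsigned long cr4_mask(unsigned int bit)
{
    return 1UL << bit;
}
```

With these helpers, X86_FEATURE_UMIP lands in word 16, bit 2, which is why
the mask is OR-ed into DISABLED_MASK16 and not into any other word.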

[PATCH v4 17/17] selftests/x86: Add tests for User-Mode Instruction Prevention

2017-02-22 Thread Ricardo Neri

Certain user space programs that run in virtual-8086 mode may use
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel catches it and emulates the results of
these instructions with dummy values. The purpose of this new test is to
verify that the impacted instructions can be executed without causing
such a #GP. If no #GP exceptions occur, we expect to exit virtual-8086
mode from INT 0x80.

The instructions protected by UMIP are executed in representative use
cases:
 a) the memory address of the result is given in the form of a displacement
from the base of the data segment
 b) the memory address of the result is given in a general purpose register
 c) the result is stored directly in a general purpose register.

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not have
the UMIP feature. Instead, results are printed for verification.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..377b773 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+   "umip:\n\t"
+   /* addressing via displacements */
+   "smsw (2052)\n\t"
+   "sidt (2054)\n\t"
+   "sgdt (2060)\n\t"
+   /* addressing via registers */
+   "mov $2066, %bx\n\t"
+   "smsw (%bx)\n\t"
+   "mov $2068, %bx\n\t"
+   "sidt (%bx)\n\t"
+   "mov $2074, %bx\n\t"
+   "sgdt (%bx)\n\t"
+   /* register operands, only for smsw */
+   "smsw %ax\n\t"
+   "mov %ax, (2080)\n\t"
+   "int $0x80\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -218,6 +234,27 @@ int main(void)
v86.regs.eax = (unsigned int)-1;
do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
 
+   /* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */
+   do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests");
+   printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2054),
+  *(unsigned long  *)(addr + 2056));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2060),
+  *(unsigned long  *)(addr + 2062));
+   printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2068),
+  *(unsigned long  *)(addr + 2070));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2074),
+  *(unsigned long  *)(addr + 2076));
+   printf("[INFO]\tResults of SMSW via register operands:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080));
+
/* Execute a null pointer */
v86.regs.cs = 0;
v86.regs.ss = 0;
-- 
2.9.3
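
For readers checking the offsets in the printf calls above: the test reads a
16-bit limit and then a base two bytes later, and consecutive results are
six bytes apart (2054 to 2060, 2068 to 2074). A minimal sketch of decoding
one such result follows. It assumes the 6-byte image is a little-endian
16-bit limit followed by a 32-bit base, as those offsets suggest; the type
and function names are hypothetical and not part of the selftest.

```c
#include <stdint.h>

/* Hypothetical decoded view of one SIDT/SGDT result. */
struct table_reg_image {
    uint16_t limit;
    uint32_t base;
};

/*
 * Decode a 6-byte image: bytes 0-1 hold the limit, bytes 2-5 hold the base,
 * both little-endian. Decoding byte by byte avoids the unaligned pointer
 * casts used in the test and does not depend on host endianness.
 */
static struct table_reg_image parse_table_reg(const unsigned char *buf)
{
    struct table_reg_image r;

    r.limit = (uint16_t)(buf[0] | (buf[1] << 8));
    r.base  = (uint32_t)buf[2] |
              ((uint32_t)buf[3] << 8) |
              ((uint32_t)buf[4] << 16) |
              ((uint32_t)buf[5] << 24);
    return r;
}
```

In the test, the buffer would be addr + 2054 for the first SIDT result and
addr + 2060 for the first SGDT result.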

[PATCH v4 14/17] x86/umip: Force a page fault when unable to copy emulated result to user

2017-02-22 Thread Ricardo Neri

fixup_umip_exception will be called from do_general_protection. If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR and the offending address. A new function, inspired by
force_sig_info_fault, is introduced to model the page fault.

Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 45 +++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index b16542a..93bc80d 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -170,6 +170,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 
 /**
+ * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR
+ * @address:   Address that caused the signal
+ * @regs:  Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+   struct pt_regs *regs)
+{
+   siginfo_t info;
+   struct task_struct *tsk = current;
+
+   if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+   printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n",
+  tsk->comm, task_pid_nr(tsk), regs->ip,
+  regs->sp, UMIP_PF_USER | UMIP_PF_WRITE,
+  regs->ip);
+   }
+
+   tsk->thread.cr2 = (unsigned long)address;
+   tsk->thread.error_code  = UMIP_PF_USER | UMIP_PF_WRITE;
+   tsk->thread.trap_nr = X86_TRAP_PF;
+
+   info.si_signo   = SIGSEGV;
+   info.si_errno   = 0;
+   info.si_code= SEGV_MAPERR;
+   info.si_addr= address;
+   force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception - Fixup #GP faults caused by UMIP
  * @regs:  Registers as saved when entering the #GP trap
  *
@@ -252,8 +287,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
} else {
uaddr = insn_get_addr_ref(&insn, regs);
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-   if (nr_copied  > 0)
-   return false;
+   if (nr_copied  > 0) {
+   /*
+* If copy fails, send a signal and tell caller that
+* fault was fixed up
+*/
+   __force_sig_info_umip_fault(uaddr, regs);
+   return true;
+   }
}
 
/* increase IP to let the program keep going */
-- 
2.9.3
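
Seen from user space, the change described above means a failed copy of the
emulated result shows up as a SIGSEGV whose siginfo carries SEGV_MAPERR and
the destination address, instead of a kernel-originated SIGSEGV with no
address. A minimal sketch of how a handler could recognize that case is
below. The helper name is_maperr_at is hypothetical; the siginfo_t fields
and SEGV_MAPERR come from POSIX <signal.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns nonzero when info describes a SIGSEGV with
 * SEGV_MAPERR reported for exactly the address addr.
 */
static int is_maperr_at(const siginfo_t *info, const void *addr)
{
    return info != NULL &&
           info->si_signo == SIGSEGV &&
           info->si_code == SEGV_MAPERR &&
           info->si_addr == addr;
}
```

A handler installed with sigaction() and SA_SIGINFO would pass its siginfo_t
argument to such a check, with addr being the operand address the program
handed to SGDT, SIDT or SMSW.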

Re: [PATCH v2] staging: wilc1000: renames struct tstrRSSI and its members u8Index, u8Full

2017-02-22 Thread Joe Perches

On Wed, 2017-02-22 at 20:50 +0100, Arend Van Spriel wrote:
> On 22-2-2017 18:14, Tahia Khan wrote:
> > Fixes multiple camel case checks on struct tstrRSSI from checkpatch.pl:
[]
> Just a generic remark that may help you with other changes you will be
> making in the linux kernel. Warnings from checkpatch.pl and other tools
> are useful, but try to look further than just fixing a warning.
> Understand what the code is doing is just as important.

I'd assert understanding what the code is doing is
_more_ important.  Style consistency simply helps
improve the speed of a new reader's understanding.

Re: [PATCH V2 1/3] arm64: dts: Add basic DT to support Spreadtrum's SP9860G

2017-02-22 Thread Chunyan Zhang

[...]

>>> > +
>>> > +   soc {
>>> > +   soc_funnel: funnel@10001000 {
>>>
>>> There is no need for a label ("soc_funnel") before the device name if that
>>> device is not referenced elsewhere in the DTS.  The same comment applies to
>>> most of the components listed below.
>>>
>>
>> OK, I will remove these labels from this DT.
>> And there's another issue I'd like to discuss with you. Which way do you
>> think is better:
>> 1) use a class name that represents this kind of component as the device
>> node name in DT, e.g.
>> funnel@... {
>>
>> }
>> replicator@... {
>>
>> }
>> etb@... {
>>
>> }
>> etf@...
>> etm@...
>> stm@...
>>
>> 2) use more descriptive device name for those which are more than one on
>> a SoC, e.g.
>> soc-funnel@... {
>>
>> }
>> cluster0-funnel@... {
>>
>> }
>> cluster1-funnel@... {
>>
>> }
>>
>> I noticed Juno uses 2); would you suggest that way?
>
> It is better to describe the HW component themselves rather than where
> they are in the topology - the address of the component will make sure
> the names are unique.  So just the component type (etm, funnel,
> replicator, ) and the address they are located at.
>

OK. To avoid confusing other people in the future, would it be better to
revise juno-base.dtsi to follow this convention as well?

Thanks,
Chunyan

>>
>> Thanks,
>> Chunyan
>>
>>> > +   compatible = "arm,coresight-funnel", "arm,primecell";
>>> > +   reg = <0 0x10001000 0 0x1000>;
>>> > +   clocks = <_26m>;
>>> > +   clock-names = "apb_pclk";
>>> > +   ports {
>>> > +   #address-cells = <1>;
>>> > +   #size-cells = <0>;
>>> > +
>>> > +   port@0 {
>>> > +   reg = <0>;
>>> > +   soc_funnel_out_port: endpoint {
>>> > +   remote-endpoint = <_in>;
>>> > +   };
>>> > +   };
>>> > +
>>> > +   port@1 {
>>> > +   reg = <0>;
>>> > +   soc_funnel_in_port: endpoint {
>>> > +   slave-mode;
>>> > +   remote-endpoint =
>>> > +   <_funnel_out_port>;
>>> > +   };
>>> > +   };
>>> > +   };
>>> > +   };
>>> > +
>>> > +   etb@10003000 {
>>> > +   compatible = "arm,coresight-tmc", "arm,primecell";
>>> > +   reg = <0 0x10003000 0 0x1000>;
>>> > +   clocks = <_26m>;
>>> > +   clock-names = "apb_pclk";
>>> > +   port {
>>> > +   etb_in: endpoint {
>>> > +   slave-mode;
>>> > +   remote-endpoint =
>>> > +   <_funnel_out_port>;
>>> > +   };
>>> > +   };
>>> > +   };
>>> > +
>>> > +   cluster0_funnel: funnel@11001000 {
>>> > +   compatible = "arm,coresight-funnel", "arm,primecell";
>>> > +   reg = <0 0x11001000 0 0x1000>;
>>> > +   clocks = <_26m>;
>>> > +   clock-names = "apb_pclk";
>>> > +   ports {
>>> > +   #address-cells = <1>;
>>> > +   #size-cells = <0>;
>>> > +
>>> > +   port@0 {
>>> > +   reg = <0>;
>>> > +   cluster0_funnel_out_port: endpoint {
>>> > +   remote-endpoint =
>>> > +   <_etf_in>;
>>> > +   };
>>> > +   };
>>> > +
>>> > +   port@1 {
>>> > +   reg = <0>;
>>> > +   cluster0_funnel_in_port0: endpoint {
>>> > +   slave-mode;
>>> > +   remote-endpoint = <_out>;
>>> > +   };
>>> > +   };
>>> > +
>>> > +   port@2 {
>>> > +   reg = <1>;
>>> > +   cluster0_funnel_in_port1: endpoint {
>>> > +   slave-mode;
>>> > +   remote-endpoint = <_out>;
>>> > +   };
>>> > +   };
>>> > +
>>> > +

Re: git email From: parsing (was Re: [GIT PULL] Staging/IIO driver patches for 4.11-rc1)

2017-02-22 Thread Jeff King

On Thu, Feb 23, 2017 at 07:04:44AM +0100, Greg KH wrote:

> > Poor Simon Sandström.
> > 
> > Funnily enough, this only exists for one commit. You've got several
> > other commits from Simon that get his name right.
> > 
> > What happened?
> 
> I don't know what happened, I used git for this, I don't use quilt for
> "normal" patches accepted into my trees anymore, only for stable kernel
> work.
> 
> So either the mail is malformed, or git couldn't figure it out, I've
> attached the original message below, and cc:ed the git mailing list.
> 
> Also, Simon emailed me after this was committed saying something went
> wrong, but I couldn't go back and rebase my tree.  Simon, did you ever
> figure out if something was odd on your end?
> 
> Git developers, any ideas?

The problem isn't on the applying end, but rather on the generating end.
The From header in the attached mbox is:

  From: =?us-ascii?B?PT9VVEYtOD9xP1NpbW9uPTIwU2FuZHN0cj1DMz1CNm0/PQ==?= 


If you de-base64 that, you get:

  =?UTF-8?q?Simon=20Sandstr=C3=B6m?=

So something double-encoded it before it got to your mbox.

-Peff
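
The de-base64 step above is easy to reproduce. Below is a minimal sketch of
a base64 decoder in plain C, written only for illustration; it is not what
git or any mailer uses, and the names b64_value and b64_decode are made up.
Feeding it the payload of the From header quoted above should give back the
inner =?UTF-8?q?...?= encoded-word.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not valid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Decode src (standard alphabet, optional '=' padding) into dst as a
 * NUL-terminated string. Returns the decoded length, or -1 on invalid
 * input or if dst is too small.
 */
static int b64_decode(const char *src, char *dst, size_t dst_size)
{
    unsigned int acc = 0;   /* bit accumulator; only the low bits are used */
    int bits = 0;           /* number of pending bits in acc */
    size_t out = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0' && *src != '='; src++) {
        int v = b64_value(*src);

        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = (char)((acc >> bits) & 0xFF);
        }
    }
    dst[out] = '\0';
    return (int)out;
}
```

The decoded text is itself a quoted-printable encoded-word, which is the
double encoding described above: a second decoding pass would be needed to
reach the actual name.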

Re: Problem on SCTP

2017-02-22 Thread Xin Long

On Thu, Feb 23, 2017 at 1:30 PM, Sun Paul  wrote:
> does this fixed in RHEL7?
yes, I think so.

>
> On Wed, Feb 22, 2017 at 11:03 AM, Xin Long  wrote:
>> On Wed, Feb 22, 2017 at 10:29 AM, Sun Paul  wrote:
>>> Hi Xin
>>>
>>> do you mean we need to patch the kernel?
>> Yups, pls comment on that bz if it's really needed for your env.
>> A z-stream kernel may be available for that issue soon.
>>
>> Thanks.
>>
>>>
>>>
>>>
>>> On Wed, Feb 22, 2017 at 10:00 AM, Xin Long  wrote:
 On Wed, Feb 22, 2017 at 9:12 AM, Sun Paul  wrote:
> Hi
>
> the router is actually a Linux box running RHEL6.8
> (2.6.32-642.4.2.el6.x86_64). It uses iptables to do SNAT and DNAT
> forwarding.
 https://bugzilla.redhat.com/show_bug.cgi?id=1412038

 sctp_manip_pkt->sctp_compute_cksum:
 struct sctphdr *sh = sctp_hdr(skb);

 But in rhel6, skb->transport_header is not yet set at that time.
 This patch should be backported into rhel6.

 commit 21d1196a35f5686c4323e42a62fdb4b23b0ab4a3
 Author: Eric Dumazet 
 Date:   Mon Jul 15 20:03:19 2013 -0700

 ipv4: set transport header earlier


>
> On Tue, Feb 21, 2017 at 11:53 PM, Xin Long  wrote:
>> On Tue, Feb 21, 2017 at 12:26 PM, Sun Paul  wrote:
>>> Hi,
>>>
>>> sorry to get back late, the platform is running on KVM. and this
>>> problem is resolved by moving to VMware environment, however,  a new
>>> problem is coming out, we noticed that the HB REQ is being ABORT from
>>> client.
>>>
>>>
>>> 03:32:35.233572 IP 10.165.250.22.3868 > 192.168.2.13.40001: sctp (1)
>>> [HB REQ] (from server to sctp router)
>>> 03:32:35.233603 IP 192.168.2.14.3868 > 192.168.2.13.40001: sctp (1)
>>> [HB REQ] (from sctp router to client)
>>> 03:32:35.233852 IP 192.168.2.13.40001 > 192.168.2.14.3868: sctp (1)
>>> [ABORT] (from client to sctp router)
>>>
>>> 03:32:37.928679 IP 10.165.250.22.3868 > 192.168.2.13.40001: sctp (1) [HB REQ]
>>> 03:32:37.928717 IP 192.168.2.14.3868 > 192.168.2.13.40001: sctp (1) [HB REQ]
>>> 03:32:37.929247 IP 192.168.2.13.40001 > 192.168.2.14.3868: sctp (1) [ABORT]
>>>
>>> For the above packet flow, 10.165.250.22 is the server and
>>> 192.168.2.13 is the client, the server 10.165.250.22 sends HB REQ to
>>> client 192.168.2.13 through 192.168.2.14 (the SCTP router), and the
>>> SCTP router change the src address before forward the HB REQ to the
>>> client.
>>>
>>> But somehow the client ABORTs the HB REQ. Any idea? Is it related to
>>> the HEARTBEAT information, or the checksum again?
>> The incorrect checksum won't cause ABORT, but the abnormal HB REQ
>> could be, if HB information was modified when calculating the checksum
>> on router, the ABORT may be caused in client process.
>>
>> is your SCTP router linux ? if yes, what's the kernel version ?
>>
>>>
>>> On Fri, Jan 13, 2017 at 9:19 PM, Michael Tuexen
>>>  wrote:
> On 13 Jan 2017, at 10:43, Michael Tuexen 
>  wrote:
>
> Your router does NOT change any field in the SCTP packet, but the
> SCTP checksum was modified from
>   Checksum: 0xbaea49e5 (not verified)
> to
>   Checksum: 0xa9a86d3f (not verified)
> At least one of these is wrong. Read the tracefiles in wireshark and
> enable checksum validation and wireshark will tell you which one is
> correct. (That is why I asked for .pcap file instead of a .txt).
>
> My guess is that the initial checksum is correct and your middlebox
> not only changes the destination address and ttl field and header
> checksum in the IP-header (which is expected) but also incorrectly the
> SCTP checksum. Since no field in the SCTP packet has changed, the
> checksum must be the same.
 At the server have a look at the SNMP counters:
 cat /proc/net/sctp/snmp
 You should find a line starting with
 SctpChecksumErrors
 If the number reported there is positive, the node received packets
 with checksum errors.

 Best regards
 Michael
>
> Best regards
> Michael
>> On 13 Jan 2017, at 04:29, Sun Paul  wrote:
>>
>> Frame 2: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
>>   Encapsulation type: Ethernet (1)
>>   Arrival Time: Jan  6, 2017 16:52:49.662321000 Malay Peninsula 
>> Standard Time
>>   [Time shift for this packet: 0.0 seconds]
>>   Epoch Time: 1483692769.662321000
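
The argument quoted above (the SCTP checksum must survive NAT unchanged
because no SCTP field changed) follows from what the checksum covers: it is
a CRC32c over the SCTP packet alone, computed with the checksum field set to
zero, and no IP header bytes take part. A minimal sketch of that property is
below. It is a plain bitwise CRC32c written for illustration, not the kernel
implementation; the function names are hypothetical, and the byte order in
which the result is stored on the wire is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Hypothetical helper: CRC32c of the SCTP part of an IP packet.
 * The IP header (ip_hdr_len bytes) is skipped, and the 4-byte checksum
 * field at offset 8 of the SCTP common header is zeroed before the CRC
 * is computed. Returns 0 if the lengths do not make sense.
 */
static uint32_t sctp_expected_crc(const unsigned char *pkt,
                                  size_t ip_hdr_len, size_t total_len)
{
    unsigned char sctp[256];
    size_t sctp_len;

    if (total_len < ip_hdr_len + 12 || total_len - ip_hdr_len > sizeof(sctp))
        return 0;
    sctp_len = total_len - ip_hdr_len;
    memcpy(sctp, pkt + ip_hdr_len, sctp_len);
    memset(sctp + 8, 0, 4);
    return crc32c(sctp, sctp_len);
}
```

Two packets that differ only in their IP headers, as before and after
address translation, give the same value here, so a router that rewrites
the SCTP checksum while leaving the SCTP bytes alone has produced a wrong
one.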

min_vruntime update when a task is sleeping/migrating

2017-02-22 Thread Pavan Kondeti

Hi Peter,

The comment and the code around the second update_min_vruntime() call in
dequeue_entity() do not match. If I understand commit b60205c7c558
("sched/fair: Fix min_vruntime tracking") correctly, the check is
inverted there. We want to update min_vruntime when a task is
sleeping/migrating. Is my understanding right?

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{

 
	/*
	 * Now advance min_vruntime if @se was the entity holding it back,
	 * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
	 * put back on, and if we advance min_vruntime, we'll be placed back
	 * further than we started -- ie. we'll be penalized.
	 */
	if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
		update_min_vruntime(cfs_rq);
}
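
To make the question concrete, here is a small user-space sketch that evaluates the quoted condition against the condition the comment describes. The flag values are assumed to mirror kernel/sched/sched.h and are only illustrative; the function names are hypothetical. It only evaluates the two expressions and says nothing about what the scheduler should do.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define DEQUEUE_SLEEP 0x01
#define DEQUEUE_SAVE  0x02
#define DEQUEUE_MOVE  0x04

/* The check exactly as quoted from dequeue_entity(). */
static bool updates_as_quoted(int flags)
{
    return (flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE;
}

/* The check the comment describes: update unless SAVE && !MOVE. */
static bool updates_per_comment(int flags)
{
    return (flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE;
}

/* The two predicates are exact complements for every flag combination. */
static void check_complement(void)
{
    for (int flags = 0; flags < 8; flags++)
        assert(updates_as_quoted(flags) != updates_per_comment(flags));
}
```

For a plain sleep (flags == DEQUEUE_SLEEP) the quoted check is false, so min_vruntime would not be updated, whereas the comment's wording implies it should be. The quoted check is true only for DEQUEUE_SAVE without DEQUEUE_MOVE, which is the one case the comment says to skip.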

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project

[PATCH v3] sched/deadline: Change the time to replenish runtime for sleep tasks

2017-02-22 Thread Byungchul Park

Let's consider the following example.

timeline : o...............o..........o...............o...........o
           ^               ^          ^               ^           ^
           |               |          |               |           |
         start             |          |               |           |
                   original runtime   |               |           |
                            sleep with (-)runtime     |           |
                                              original deadline   |
                                                               wake up

When this task is woken up, the negative runtime should be taken into
account, which means that the task should be penalized when runtime is
assigned, because it has already spent more than expected. Current code
handles this by replenishing runtime in the hrtimer callback for the
deadline. But this approach has room for improvement:

   It will be replenished twice unnecessarily if the task sleeps for a
   long time, so that the deadline assigned in the hrtimer callback has
   also passed. In other words, one replenishment happens in the callback
   and the other happens in update_dl_entity() when waking it up.

So force the replenishment for sleeping tasks to happen when waking them up.

Signed-off-by: Byungchul Park 
---
 kernel/sched/deadline.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 27737f3..cb43ce9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -498,8 +498,9 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 	struct rq *rq = rq_of_dl_rq(dl_rq);
 
-	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
-	    dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
+	if (dl_time_before(dl_se->deadline, rq_clock(rq)))
+		replenish_dl_entity(dl_se, pi_se);
+	else if (dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
 		dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
 		dl_se->runtime = pi_se->dl_runtime;
 	}
@@ -621,13 +622,11 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	 * __dequeue_task_dl()
 	 * prev->on_rq = 0;
 	 *
-	 * We can be both throttled and !queued. Replenish the counter
-	 * but do not enqueue -- wait for our wakeup to do that.
+	 * We can be both throttled and !queued. Wait for our wakeup to
+	 * replenish runtime and enqueue p.
 	 */
-	if (!task_on_rq_queued(p)) {
-		replenish_dl_entity(dl_se, dl_se);
+	if (!task_on_rq_queued(p))
 		goto unlock;
-	}
 
 #ifdef CONFIG_SMP
 	if (unlikely(!rq->online)) {
-- 
1.9.1
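
The effect of the first hunk can be sketched with a simplified user-space model. Everything here is an assumption for illustration: struct dl_entity is a stand-in for struct sched_dl_entity, priority inheritance (pi_se) is left out, the dl_entity_overflow() branch is omitted, and replenish() only approximates what replenish_dl_entity() does. It models the patch as posted, not any merged kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct sched_dl_entity (illustration only). */
struct dl_entity {
    int64_t  runtime;     /* remaining runtime, may be negative */
    uint64_t deadline;    /* absolute deadline */
    uint64_t dl_runtime;  /* budget granted per period */
    uint64_t dl_deadline; /* relative deadline */
    uint64_t dl_period;   /* period */
};

/* Wraparound-safe "a is before b", in the style of dl_time_before(). */
static bool time_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Fresh deadline and full budget, as in the old wakeup path. */
static void reset_entity(struct dl_entity *se, uint64_t now)
{
    se->deadline = now + se->dl_deadline;
    se->runtime = (int64_t)se->dl_runtime;
}

/* Approximation of replenish_dl_entity(): pay back the overrun first. */
static void replenish(struct dl_entity *se, uint64_t now)
{
    assert(se->dl_runtime > 0 && se->dl_period > 0);
    while (se->runtime <= 0) {
        se->deadline += se->dl_period;
        se->runtime += (int64_t)se->dl_runtime;
    }
    /* Lagged too far behind: fall back to a full reset. */
    if (time_before(se->deadline, now))
        reset_entity(se, now);
}

/* Wakeup before the patch: a passed deadline always means a reset. */
static void wakeup_old(struct dl_entity *se, uint64_t now)
{
    if (time_before(se->deadline, now))
        reset_entity(se, now);
}

/* Wakeup with the patch: a passed deadline means a replenishment. */
static void wakeup_patched(struct dl_entity *se, uint64_t now)
{
    if (time_before(se->deadline, now))
        replenish(se, now);
}
```

In this model, a task that went to sleep with runtime -2 and deadline 100 (budget 10, period 100) and wakes at time 150 gets runtime 8 and deadline 200 with the patched path, but a full budget of 10 and deadline 250 with the old path. After a very long sleep both paths end in the same full reset.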

Re: [PATCH v2 2/2] sched/deadline: Change the way to replenish runtime for sleep tasks

2017-02-22 Thread Byungchul Park

On Thu, Feb 23, 2017 at 12:18:48PM +0900, byungchul.park wrote:
> > Current code handles this by replenishing a runtime in hrtimer callback
> > for deadline. But this approach has room for improvement in two ways:
> > 
> >    1. No need to keep the hrtimer for a sleep task because it can be
> >       handled when waking it up.
> > 
> >    2. It will be replenished twice unnecessarily if the task sleeps for
> >       long time so that the deadline, assigned in the hrtimer callback,
> >       also passed. In other words, one happens in the callback and the
> >       other happens in update_dl_entity() when waking it up.

I wanted to address not only the second point but also the first, that is,
to remove the unnecessary timer overhead for sleeping tasks. It is possible,
but it makes the code too complicated, so I won't do that.

I will resend a patch that does only the second one in the next spin.

Thank you,
Byungchul

> > @@ -981,6 +983,9 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> >  {
> >  	update_curr_dl(rq);
> >  	__dequeue_task_dl(rq, p, flags);
> > +
> > +	if (flags & DEQUEUE_SLEEP)
> > +		hrtimer_try_to_cancel(&p->dl.dl_timer);
> 
> Sorry. I found I might have made a mistake. I might have to re-start the
> timer when waking it up if necessary. Let me think more.

Re: [f2fs-dev] [PATCH 2/5] f2fs: check last page index in cached bio to decide submission

2017-02-22 Thread Chao Yu

On 2017/2/4 7:48, Jaegeuk Kim wrote:
> If the cached bio has the last page's index, then we need to submit it.
> Otherwise, we don't need to submit it and can wait for further IO merges.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu

git email From: parsing (was Re: [GIT PULL] Staging/IIO driver patches for 4.11-rc1)

2017-02-22 Thread Greg KH

On Wed, Feb 22, 2017 at 11:59:01AM -0800, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 6:56 AM, Greg KH  wrote:
> >
> > =?UTF-8?q?Simon=20Sandstr=C3=B6m?= (1):
> >   staging: vt6656: Add missing identifier names
> 
> Wow, your scripts really screwed up that name.
> 
> I'm assuming this is quilt not doing proper character set handling..
> 
> Because if it's git, we need to get that fixed (but I'm pretty sure
> git gets this right - there are various tests for email header
> quoting).
> 
> Alternatively, somebody hand-edited some email and moved the From:
> header to the body without fixing up the RFC 1342 mail header quoting
> (which is very different from how quoting works in the *body* of an
> email).
> 
> Poor Simon Sandström.
> 
> Funnily enough, this only exists for one commit. You've got several
> other commits from Simon that get his name right.
> 
> What happened?

I don't know what happened, I used git for this, I don't use quilt for
"normal" patches accepted into my trees anymore, only for stable kernel
work.

So either the mail is malformed, or git couldn't figure it out, I've
attached the original message below, and cc:ed the git mailing list.

Also, Simon emailed me after this was committed saying something went
wrong, but I couldn't go back and rebase my tree.  Simon, did you ever
figure out if something was odd on your end?

Git developers, any ideas?

thanks,

greg k-h

messy_email.mbox
Description: application/mbox
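
For reference, the mangled author name in the shortlog is a header "encoded-word" in the Q encoding (RFC 1342, since superseded by RFC 2047) that was never decoded. A minimal decoder sketch follows; decode_q_word() is a hypothetical helper, it handles a single encoded-word only, and it returns the raw bytes in the named charset without converting them. It is not how git implements this.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Decode one "=?charset?q?text?=" encoded-word into out (NUL-terminated).
 * Returns 0 on success, -1 on malformed input or if out is too small.
 */
static int decode_q_word(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    if (strncmp(in, "=?", 2) != 0)
        return -1;
    const char *q1 = strchr(in + 2, '?'); /* end of the charset name */
    if (!q1 || (q1[1] != 'q' && q1[1] != 'Q') || q1[2] != '?')
        return -1;
    const char *text = q1 + 3;
    const char *end = strstr(text, "?=");
    if (!end)
        return -1;

    size_t n = 0;
    for (const char *p = text; p < end; p++) {
        int byte;

        if (*p == '_') {
            byte = ' ';               /* '_' stands for a space */
        } else if (*p == '=') {       /* "=XX" is one hex-encoded byte */
            if (end - p < 3)
                return -1;
            int hi = hexval((unsigned char)p[1]);
            int lo = hexval((unsigned char)p[2]);
            if (hi < 0 || lo < 0)
                return -1;
            byte = hi * 16 + lo;
            p += 2;
        } else {
            byte = (unsigned char)*p;
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = (char)byte;
    }
    out[n] = '\0';
    return 0;
}
```

Applied to the string from the shortlog above, this yields the UTF-8 bytes for "Simon Sandström". The same string appearing in a message body is ordinary text and would not be decoded, which matches the second explanation Linus suggests.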

Re: [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made

2017-02-22 Thread Xunlei Pang

On 02/23/2017 at 02:50 AM, Luck, Tony wrote:
> On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote:
>> +/*
>> + * Cases to bail out to avoid rendezvous process timeout:
>> + * 1) If this CPU is offline.
>> + * 2) If crashing_cpu was set, e.g. entering kdump,
>> + *    we need to skip cpus remaining in 1st kernel.
>> + */
>> +if (cpu_is_offline(cpu) ||
>> +(crashing_cpu != -1 && crashing_cpu != cpu)) {
>>  u64 mcgstatus;
>>  
>>  mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>
> I think we should document the remaining race conditions. I don't
> think there is any good way to eliminate them, and they are already
> pretty small windows.
>
> I think the sequence of events looks like:
>
>  1) Panic occurs
>  2) nmi_shootdown_cpus() sets crashing_cpu
>  3) send NMI to everyone else
>  4) wait up to a second for other CPUs to take NMI
>  5) go to kexec code
>  6) start new kernel
>  7) new kernel establishes #MC handler
>
> If one of the other cpus triggers a machine check while
> getting to, or in, the NMI handler ... then that cpu will
> skip processing (if RIPV is set).
>
> Between '2' and '5' if crashing_cpu gets a machine check it
> will execute in the old kernel handler, and do the right thing.
>
> There's a fuzzy area between '6' and '7' where a machine check
> might not end up in the right code.
>
> From '7' onwards the kexec kernel will handle any machine
> checks caused by kdump.
>

Agree, will update the comment.

Regards,
Xunlei
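
The bail-out condition from the quoted hunk can be written as a small stand-alone predicate, which makes the cases in Tony's sequence easy to check. This is only a sketch: mce_should_bail_out() and NO_CRASHING_CPU are hypothetical names, and the offline state is passed in as a parameter rather than queried with cpu_is_offline().

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CRASHING_CPU (-1) /* crashing_cpu has not been set */

/*
 * True if this CPU should skip the machine-check rendezvous: either it is
 * offline, or a crash is in progress and this is not the crashing CPU.
 */
static bool mce_should_bail_out(int cpu, bool cpu_offline, int crashing_cpu)
{
    assert(cpu >= 0);
    return cpu_offline ||
           (crashing_cpu != NO_CRASHING_CPU && crashing_cpu != cpu);
}
```

Before step 2 no CPU bails out unless it is offline. From step 2 onwards every CPU except crashing_cpu bails out, while crashing_cpu itself still runs the old kernel's handler.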
