Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra
On Tue, Nov 14, 2017 at 02:28:56PM +0800, Wanpeng Li wrote:
> >  - have the TLB invalidate handler do something like:
> >
> >state = READ_ONCE(src->preempted);
> >if (!(state & KVM_VCPU_IPI_PENDING))
> >return;
> >
> >local_flush_tlb();
> >
> >do {
> >} while (!try_cmpxchg(>preempted, ,
> >  state & ~KVM_VCPU_IPI_PENDING));
> 
> There are a lot of cases handled by flush_tlb_func_remote() ->
> flush_tlb_function_common(), so I'm afraid to have hole.

Sure, just fix the handler to do what must be done. The above was merely
a sketch. The important part is to only clear IPI_PENDING after we do
the actual flushing, since the caller is waiting for that bit.


So flush_tlb_others() has two callers:

 - arch_tlbbatch_flush() -- info::end = TLB_FLUSH_ALL
 - flush_tlb_mm_range()  -- info::mm = mm

native_flush_tlb_others() does smp_call_function_many(
.func=flush_tlb_func_remote) which in turn calls flush_tlb_func_common(
.local=false, .reason=TLB_REMOTE_SHOOTDOWN).


So something like:

struct flush_tlb_info info = {
.start = 0,
.end = TLB_FLUSH_ALL,
};

flush_tlb_func_common(, false, TLB_REMOTE_SHOOTDOWN);

should work for the new IPI. It 'upgrades' all ranges to full flushes,
but meh.




Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra
On Tue, Nov 14, 2017 at 02:28:56PM +0800, Wanpeng Li wrote:
> >  - have the TLB invalidate handler do something like:
> >
> >state = READ_ONCE(src->preempted);
> >if (!(state & KVM_VCPU_IPI_PENDING))
> >return;
> >
> >local_flush_tlb();
> >
> >do {
> >} while (!try_cmpxchg(>preempted, ,
> >  state & ~KVM_VCPU_IPI_PENDING));
> 
> There are a lot of cases handled by flush_tlb_func_remote() ->
> flush_tlb_function_common(), so I'm afraid to have hole.

Sure, just fix the handler to do what must be done. The above was merely
a sketch. The important part is to only clear IPI_PENDING after we do
the actual flushing, since the caller is waiting for that bit.


So flush_tlb_others() has two callers:

 - arch_tlbbatch_flush() -- info::end = TLB_FLUSH_ALL
 - flush_tlb_mm_range()  -- info::mm = mm

native_flush_tlb_others() does smp_call_function_many(
.func=flush_tlb_func_remote) which in turn calls flush_tlb_func_common(
.local=false, .reason=TLB_REMOTE_SHOOTDOWN).


So something like:

struct flush_tlb_info info = {
.start = 0,
.end = TLB_FLUSH_ALL,
};

flush_tlb_func_common(, false, TLB_REMOTE_SHOOTDOWN);

should work for the new IPI. It 'upgrades' all ranges to full flushes,
but meh.




Re: [PATCH 0/2] backlight: pwm_bl: prevent backlight flicker when switching PWM on

2017-11-13 Thread Lothar Waßmann
Hi,

On Fri, 10 Nov 2017 12:22:15 +0100 Enric Balletbo i Serra wrote:
> Hi all,
> 
> On 08/11/17 11:48, Daniel Thompson wrote:
> > On 26/10/17 13:49, Lothar Waßmann wrote:
> >> These patches implement some measures to prevent backlight flicker
> >> when the backlight is being switched on for a display with an active
> >> low brightness control pin.
> >> GIT: [PATCH 1/2] backlight: pwm_bl: Enable PWM before switching regulator
> >> GIT: [PATCH 2/2] backlight: pwm_bl: add configurable delay between
> > 
> > Other than hoisting the pwm_enable() even earlier in the setup sequence this
> > patchset seems to have a significant overlap with Enric's recent posting:
> > 
> >   https://lkml.org/lkml/2017/7/21/211
> > 
> > Any chance of a shared view on this, especially on the DT bindings?
> > 
> 
> The DT binding were discussed some time ago for the patch series I sent, 
> though
> there isn't a final ack from DT maintainer.
> 
> Lothar, the series I sent have been reviewed and acked, can you test if the
> series fixes the problem for you too?
> 
I'll try to test it within the next couple of days and will report back.


Lothar Waßmann


Re: [PATCH 0/2] backlight: pwm_bl: prevent backlight flicker when switching PWM on

2017-11-13 Thread Lothar Waßmann
Hi,

On Fri, 10 Nov 2017 12:22:15 +0100 Enric Balletbo i Serra wrote:
> Hi all,
> 
> On 08/11/17 11:48, Daniel Thompson wrote:
> > On 26/10/17 13:49, Lothar Waßmann wrote:
> >> These patches implement some measures to prevent backlight flicker
> >> when the backlight is being switched on for a display with an active
> >> low brightness control pin.
> >> GIT: [PATCH 1/2] backlight: pwm_bl: Enable PWM before switching regulator
> >> GIT: [PATCH 2/2] backlight: pwm_bl: add configurable delay between
> > 
> > Other than hoisting the pwm_enable() even earlier in the setup sequence this
> > patchset seems to have a significant overlap with Enric's recent posting:
> > 
> >   https://lkml.org/lkml/2017/7/21/211
> > 
> > Any chance of a shared view on this, especially on the DT bindings?
> > 
> 
> The DT binding were discussed some time ago for the patch series I sent, 
> though
> there isn't a final ack from DT maintainer.
> 
> Lothar, the series I sent have been reviewed and acked, can you test if the
> series fixes the problem for you too?
> 
I'll try to test it within the next couple of days and will report back.


Lothar Waßmann


Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:29:09PM -0800, Guenter Roeck wrote:
> On Mon, Nov 13, 2017 at 01:56:21PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.13.13 release.
> > There are 33 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:55:46 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 145 pass: 145 fail: 0
> Qemu test results:
>   total: 123 pass: 123 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:29:09PM -0800, Guenter Roeck wrote:
> On Mon, Nov 13, 2017 at 01:56:21PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.13.13 release.
> > There are 33 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:55:46 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 145 pass: 145 fail: 0
> Qemu test results:
>   total: 123 pass: 123 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall

Am 14.11.2017 um 08:41 schrieb Greg Kroah-Hartman:

On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:

ahm it compiles well. but

[   24.838120] Unable to handle kernel NULL pointer dereference at virtual
address 0055
[   24.846193] pgd = c0004000
[   24.848893] [0055] *pgd=
[   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd
ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk
gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common
ath9k_hw ath mac80211 cfg80211 compat
[   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
[   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
[   24.892921] task: ef029ac0 task.stack: ef05a000
[   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
[   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
[   24.907869] pc : []    lr : []    psr: 6153
[   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
[   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
[   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
[   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
[   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment
user
[   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
[   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
[   24.956477] Stack: (0xef05bb58 to 0xef05c000)


will dig into the code to find the reason

Can you run 'git bisect' or if you use quilt, do a manual bisect to find
the offending patch?


already done

netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"


this one caused the crash. if i revert it, its working again


Sebastian


--
Mit freundlichen Grüssen / Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Firmensitz:  Stubenwaldallee 21a, 64625 Bensheim
Registergericht: Amtsgericht Darmstadt, HRB 25473
Geschäftsführer: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottsch...@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565



Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall

Am 14.11.2017 um 08:41 schrieb Greg Kroah-Hartman:

On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:

ahm it compiles well. but

[   24.838120] Unable to handle kernel NULL pointer dereference at virtual
address 0055
[   24.846193] pgd = c0004000
[   24.848893] [0055] *pgd=
[   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd
ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk
gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common
ath9k_hw ath mac80211 cfg80211 compat
[   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
[   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
[   24.892921] task: ef029ac0 task.stack: ef05a000
[   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
[   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
[   24.907869] pc : []    lr : []    psr: 6153
[   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
[   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
[   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
[   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
[   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment
user
[   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
[   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
[   24.956477] Stack: (0xef05bb58 to 0xef05c000)


will dig into the code to find the reason

Can you run 'git bisect' or if you use quilt, do a manual bisect to find
the offending patch?


already done

netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"


this one caused the crash. if i revert it, its working again


Sebastian


--
Mit freundlichen Grüssen / Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Firmensitz:  Stubenwaldallee 21a, 64625 Bensheim
Registergericht: Amtsgericht Darmstadt, HRB 25473
Geschäftsführer: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottsch...@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565



RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Ran Wang
Hi Michal,
> -Original Message-
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Tuesday, November 14, 2017 3:07 PM
> To: Ran Wang 
> Cc: linux...@kvack.org; Michael Ellerman ; Vlastimil
> Babka ; Andrew Morton ;
> KAMEZAWA Hiroyuki ; Reza Arbab
> ; Yasuaki Ishimatsu ;
> qiuxi...@huawei.com; Igor Mammedov ; Vitaly
> Kuznetsov ; LKML ;
> Leo Li ; Xiaobo Xie 
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Tue 14-11-17 06:10:00, Ran Wang wrote:
> [...]
> > > > This drop cause DWC3 USB controller fail on initialization with
> > > > Layerscaper processors (such as LS1043A) as below:
> > > >
> > > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned
> > > bus number 1
> > > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > > >
> > > > And I notice that someone also reported to you that DWC2 got
> > > > affected recently, so do you have the solution now?
> > >
> > > Yes. It should be in linux-next. Have a look at the following email
> > > thread:
> > >
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > >
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > >
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > >
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > >
> 583607877=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > > =0
> 
> I really have no idea where the above link came from because my email had
> a reference to
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C9b452e62f11e446d12b408d5
> 2b2e4014%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63646239
> 9997608449=S9MPhGyIUiYCJdVYMh3DAHAEytu%2Fu45BB%2BcMhO%
> 2BP3Qo%3D=0
> Has your email client modified the original email?
> 
> > Thanks for your info, although I fail to open the link you shared, but
> > I got patch from my colleague and the issue got fix on my side, let you 
> > know,
> thanks.
> 
> Thanks for your testing anyway. Can I assume your Tested-by?
Yes, please.

BR
Ran
> --
> Michal Hocko
> SUSE Labs


RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Ran Wang
Hi Michal,
> -Original Message-
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Tuesday, November 14, 2017 3:07 PM
> To: Ran Wang 
> Cc: linux...@kvack.org; Michael Ellerman ; Vlastimil
> Babka ; Andrew Morton ;
> KAMEZAWA Hiroyuki ; Reza Arbab
> ; Yasuaki Ishimatsu ;
> qiuxi...@huawei.com; Igor Mammedov ; Vitaly
> Kuznetsov ; LKML ;
> Leo Li ; Xiaobo Xie 
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Tue 14-11-17 06:10:00, Ran Wang wrote:
> [...]
> > > > This drop cause DWC3 USB controller fail on initialization with
> > > > Layerscaper processors (such as LS1043A) as below:
> > > >
> > > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned
> > > bus number 1
> > > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > > >
> > > > And I notice that someone also reported to you that DWC2 got
> > > > affected recently, so do you have the solution now?
> > >
> > > Yes. It should be in linux-next. Have a look at the following email
> > > thread:
> > >
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > >
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > >
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > >
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > >
> 583607877=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > > =0
> 
> I really have no idea where the above link came from because my email had
> a reference to
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C9b452e62f11e446d12b408d5
> 2b2e4014%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63646239
> 9997608449=S9MPhGyIUiYCJdVYMh3DAHAEytu%2Fu45BB%2BcMhO%
> 2BP3Qo%3D=0
> Has your email client modified the original email?
> 
> > Thanks for your info, although I fail to open the link you shared, but
> > I got patch from my colleague and the issue got fix on my side, let you 
> > know,
> thanks.
> 
> Thanks for your testing anyway. Can I assume your Tested-by?
Yes, please.

BR
Ran
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-13 Thread Ingo Molnar

* Quan Xu  wrote:

> 
> 
> On 2017/11/13 23:08, Ingo Molnar wrote:
> > * Quan Xu  wrote:
> > 
> > > From: Quan Xu 
> > > 
> > > To reduce the cost of poll, we introduce three sysctl to control the
> > > poll time when running as a virtual machine with paravirt.
> > > 
> > > Signed-off-by: Yang Zhang 
> > > Signed-off-by: Quan Xu 
> > > ---
> > >   Documentation/sysctl/kernel.txt |   35 
> > > +++
> > >   arch/x86/kernel/paravirt.c  |4 
> > >   include/linux/kernel.h  |6 ++
> > >   kernel/sysctl.c |   34 
> > > ++
> > >   4 files changed, 79 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/Documentation/sysctl/kernel.txt 
> > > b/Documentation/sysctl/kernel.txt
> > > index 694968c..30c25fb 100644
> > > --- a/Documentation/sysctl/kernel.txt
> > > +++ b/Documentation/sysctl/kernel.txt
> > > @@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this 
> > > one.
> > >   ==
> > > +paravirt_poll_grow: (X86 only)
> > > +
> > > +Multiplied value to increase the poll time. This is expected to take
> > > +effect only when running as a virtual machine with CONFIG_PARAVIRT
> > > +enabled. This can't bring any benifit on bare mental even with
> > > +CONFIG_PARAVIRT enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_shrink: (X86 only)
> > > +
> > > +Divided value to reduce the poll time. This is expected to take effect
> > > +only when running as a virtual machine with CONFIG_PARAVIRT enabled.
> > > +This can't bring any benifit on bare mental even with CONFIG_PARAVIRT
> > > +enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_threshold_ns: (X86 only)
> > > +
> > > +Controls the maximum poll time before entering real idle path. This is
> > > +expected to take effect only when running as a virtual machine with
> > > +CONFIG_PARAVIRT enabled. This can't bring any benifit on bare mental
> > > +even with CONFIG_PARAVIRT enabled.
> > > +
> > > +By default, this value is 0 means not to poll. Possible values to set
> > > +are in range {0..50}. Change the value to non-zero if running
> > > +latency-bound workloads in a virtual machine.
> > I absolutely hate it how this hybrid idle loop polling mechanism is not
> > self-tuning!
> 
> Ingo, actually it is self-tuning..

Then why the hell does it touch the syscall ABI?

> could I only leave paravirt_poll_threshold_ns parameter (the maximum poll 
> time), 
> which is as similar as "adaptive halt-polling" Wanpeng mentioned.. then user 
> can 
> turn it off, or find an appropriate threshold for some odd scenario..

That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to 
userspace: NAK.

Thanks,

Ingo


Re: [PATCH 3.18 00/28] 3.18.81-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:50:22PM -0700, Shuah Khan wrote:
> On 11/13/2017 05:54 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.18.81 release.
> > There are 28 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:53:41 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.81-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-3.18.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-13 Thread Ingo Molnar

* Quan Xu  wrote:

> 
> 
> On 2017/11/13 23:08, Ingo Molnar wrote:
> > * Quan Xu  wrote:
> > 
> > > From: Quan Xu 
> > > 
> > > To reduce the cost of poll, we introduce three sysctl to control the
> > > poll time when running as a virtual machine with paravirt.
> > > 
> > > Signed-off-by: Yang Zhang 
> > > Signed-off-by: Quan Xu 
> > > ---
> > >   Documentation/sysctl/kernel.txt |   35 
> > > +++
> > >   arch/x86/kernel/paravirt.c  |4 
> > >   include/linux/kernel.h  |6 ++
> > >   kernel/sysctl.c |   34 
> > > ++
> > >   4 files changed, 79 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/Documentation/sysctl/kernel.txt 
> > > b/Documentation/sysctl/kernel.txt
> > > index 694968c..30c25fb 100644
> > > --- a/Documentation/sysctl/kernel.txt
> > > +++ b/Documentation/sysctl/kernel.txt
> > > @@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this 
> > > one.
> > >   ==
> > > +paravirt_poll_grow: (X86 only)
> > > +
> > > +Multiplied value to increase the poll time. This is expected to take
> > > +effect only when running as a virtual machine with CONFIG_PARAVIRT
> > > +enabled. This can't bring any benifit on bare mental even with
> > > +CONFIG_PARAVIRT enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_shrink: (X86 only)
> > > +
> > > +Divided value to reduce the poll time. This is expected to take effect
> > > +only when running as a virtual machine with CONFIG_PARAVIRT enabled.
> > > +This can't bring any benifit on bare mental even with CONFIG_PARAVIRT
> > > +enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_threshold_ns: (X86 only)
> > > +
> > > +Controls the maximum poll time before entering real idle path. This is
> > > +expected to take effect only when running as a virtual machine with
> > > +CONFIG_PARAVIRT enabled. This can't bring any benifit on bare mental
> > > +even with CONFIG_PARAVIRT enabled.
> > > +
> > > +By default, this value is 0 means not to poll. Possible values to set
> > > +are in range {0..50}. Change the value to non-zero if running
> > > +latency-bound workloads in a virtual machine.
> > I absolutely hate it how this hybrid idle loop polling mechanism is not
> > self-tuning!
> 
> Ingo, actually it is self-tuning..

Then why the hell does it touch the syscall ABI?

> could I only leave paravirt_poll_threshold_ns parameter (the maximum poll 
> time), 
> which is as similar as "adaptive halt-polling" Wanpeng mentioned.. then user 
> can 
> turn it off, or find an appropriate threshold for some odd scenario..

That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to 
userspace: NAK.

Thanks,

Ingo


Re: [PATCH 3.18 00/28] 3.18.81-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:50:22PM -0700, Shuah Khan wrote:
> On 11/13/2017 05:54 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.18.81 release.
> > There are 28 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:53:41 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.81-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-3.18.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:02:12PM -0800, kernelci.org bot wrote:
> stable-rc/linux-4.13.y boot: 127 boots: 10 failed, 116 passed with 1 conflict 
> (v4.13.12-34-g109b28ca1340)
> 
> Full Boot Summary: 
> https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> Full Build Summary: 
> https://kernelci.org/build/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> 
> Tree: stable-rc
> Branch: linux-4.13.y
> Git Describe: v4.13.12-34-g109b28ca1340
> Git Commit: 109b28ca1340961002d4bede168f07823451b8e4
> Git URL: 
> http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Tested: 51 unique boards, 17 SoC families, 18 builds out of 188
> 
> Boot Regressions Detected:
> 
> arm:
> 
> at91_dt_defconfig:
> at91rm9200ek_rootfs:nfs:
> lab-free-electrons: new failure (last pass: v4.13.12)
> 
> exynos_defconfig:
> exynos4412-odroidx2_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)
> exynos5250-snow_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)

Thanks for these "failing since..." markings, that makes me feel better
that I didn't break anything on my end :)

greg k-h


Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman
On Mon, Nov 13, 2017 at 02:02:12PM -0800, kernelci.org bot wrote:
> stable-rc/linux-4.13.y boot: 127 boots: 10 failed, 116 passed with 1 conflict 
> (v4.13.12-34-g109b28ca1340)
> 
> Full Boot Summary: 
> https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> Full Build Summary: 
> https://kernelci.org/build/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> 
> Tree: stable-rc
> Branch: linux-4.13.y
> Git Describe: v4.13.12-34-g109b28ca1340
> Git Commit: 109b28ca1340961002d4bede168f07823451b8e4
> Git URL: 
> http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Tested: 51 unique boards, 17 SoC families, 18 builds out of 188
> 
> Boot Regressions Detected:
> 
> arm:
> 
> at91_dt_defconfig:
> at91rm9200ek_rootfs:nfs:
> lab-free-electrons: new failure (last pass: v4.13.12)
> 
> exynos_defconfig:
> exynos4412-odroidx2_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)
> exynos5250-snow_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)

Thanks for these "failing since..." markings, that makes me feel better
that I didn't break anything on my end :)

greg k-h


Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Greg Kroah-Hartman
On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:
> ahm it compiles well. but
> 
> [   24.838120] Unable to handle kernel NULL pointer dereference at virtual
> address 0055
> [   24.846193] pgd = c0004000
> [   24.848893] [0055] *pgd=
> [   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
> [   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd
> ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk
> gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common
> ath9k_hw ath mac80211 cfg80211 compat
> [   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
> [   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
> [   24.892921] task: ef029ac0 task.stack: ef05a000
> [   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
> [   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
> [   24.907869] pc : []    lr : []    psr: 6153
> [   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
> [   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
> [   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
> [   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
> [   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment
> user
> [   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
> [   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
> [   24.956477] Stack: (0xef05bb58 to 0xef05c000)
> 
> 
> will dig into the code to find the reason

Can you run 'git bisect' or if you use quilt, do a manual bisect to find
the offending patch?

thanks,

greg k-h


Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Greg Kroah-Hartman
On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:
> ahm it compiles well. but
> 
> [   24.838120] Unable to handle kernel NULL pointer dereference at virtual
> address 0055
> [   24.846193] pgd = c0004000
> [   24.848893] [0055] *pgd=
> [   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
> [   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd
> ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk
> gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common
> ath9k_hw ath mac80211 cfg80211 compat
> [   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
> [   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
> [   24.892921] task: ef029ac0 task.stack: ef05a000
> [   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
> [   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
> [   24.907869] pc : []    lr : []    psr: 6153
> [   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
> [   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
> [   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
> [   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
> [   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment
> user
> [   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
> [   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
> [   24.956477] Stack: (0xef05bb58 to 0xef05c000)
> 
> 
> will dig into the code to find the reason

Can you run 'git bisect' or if you use quilt, do a manual bisect to find
the offending patch?

thanks,

greg k-h


Re: [PATCH 0/7] net: core: devname allocation cleanups

2017-11-13 Thread David Miller
From: Rasmus Villemoes 
Date: Mon, 13 Nov 2017 00:15:03 +0100

> It's somewhat confusing to have both dev_alloc_name and
> dev_get_valid_name. I can't see why the former is less strict than the
> latter, so make them (or rather dev_alloc_name_ns and
> dev_get_valid_name) equivalent, hardening dev_alloc_name() a little.
> 
> Obvious follow-up patches would be to only export one function, and
> make dev_alloc_name a static inline wrapper for that (whichever name
> is chosen for the exported interface). But maybe there is a good
> reason the two exported interfaces do different checking, so I'll
> refrain from including the trivial but tree-wide renaming in this
> series.

Series applied, thanks.


Re: [PATCH 0/7] net: core: devname allocation cleanups

2017-11-13 Thread David Miller
From: Rasmus Villemoes 
Date: Mon, 13 Nov 2017 00:15:03 +0100

> It's somewhat confusing to have both dev_alloc_name and
> dev_get_valid_name. I can't see why the former is less strict than the
> latter, so make them (or rather dev_alloc_name_ns and
> dev_get_valid_name) equivalent, hardening dev_alloc_name() a little.
> 
> Obvious follow-up patches would be to only export one function, and
> make dev_alloc_name a static inline wrapper for that (whichever name
> is chosen for the exported interface). But maybe there is a good
> reason the two exported interfaces do different checking, so I'll
> refrain from including the trivial but tree-wide renaming in this
> series.

Series applied, thanks.


Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra
On Tue, Nov 14, 2017 at 02:10:16PM +0800, Wanpeng Li wrote:
> 2017-11-13 21:02 GMT+08:00 Peter Zijlstra :
> > That can be written like:
> >
> > do {
> > if (state & KVM_VCPU_PREEMPTED)
> > new_state = state | KVM_VCPU_SHOULD_FLUSH;
> > else
> > new_state = state | KVM_VCPU_IPI_PENDING;
> > } while (!try_cmpxchg(>preempted, state, new_state);
> >
> > if (new_state & KVM_VCPU_IPI_PENDING)
> 
> Should be new_state & KVM_VCPU_SHOULD_FLUSH I think.

Quite so indeed.


Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra
On Tue, Nov 14, 2017 at 02:10:16PM +0800, Wanpeng Li wrote:
> 2017-11-13 21:02 GMT+08:00 Peter Zijlstra :
> > That can be written like:
> >
> > do {
> > if (state & KVM_VCPU_PREEMPTED)
> > new_state = state | KVM_VCPU_SHOULD_FLUSH;
> > else
> > new_state = state | KVM_VCPU_IPI_PENDING;
> > } while (!try_cmpxchg(>preempted, state, new_state);
> >
> > if (new_state & KVM_VCPU_IPI_PENDING)
> 
> Should be new_state & KVM_VCPU_SHOULD_FLUSH I think.

Quite so indeed.


Re: [RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> +const char * const umip_insns[5] = {
> + [UMIP_INST_SGDT] = "sgdt",
> + [UMIP_INST_SIDT] = "sidt",
> + [UMIP_INST_SMSW] = "smsw",
> + [UMIP_INST_SLDT] = "sldt",
> + [UMIP_INST_STR] = "str",
> +};

Sigh ...

> +/*
> + * If you change these strings, ensure that buffers using them are 
> sufficiently
> + * large.
> + */
> +static const char umip_warn_use[] = "cannot be used by applications.";
> +static const char umip_warn_emu[] = "For now, expensive software emulation 
> returns result.";

Please use the string literals directly, don't add an extra obfuscation layer.

Plus:

> + unsigned char buf[MAX_INSN_SIZE], warn[128];

> + snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
> +  umip_warn_use);

This is incredibly fragile against future buffer overflows, and warning about 
it 
in comments does not make it less fragile!

Thanks,

Ingo


Re: [RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> +const char * const umip_insns[5] = {
> + [UMIP_INST_SGDT] = "sgdt",
> + [UMIP_INST_SIDT] = "sidt",
> + [UMIP_INST_SMSW] = "smsw",
> + [UMIP_INST_SLDT] = "sldt",
> + [UMIP_INST_STR] = "str",
> +};

Sigh ...

> +/*
> + * If you change these strings, ensure that buffers using them are 
> sufficiently
> + * large.
> + */
> +static const char umip_warn_use[] = "cannot be used by applications.";
> +static const char umip_warn_emu[] = "For now, expensive software emulation 
> returns result.";

Please use the string literals directly, don't add an extra obfuscation layer.

Plus:

> + unsigned char buf[MAX_INSN_SIZE], warn[128];

> + snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
> +  umip_warn_use);

This is incredibly fragile against future buffer overflows, and warning about 
it 
in comments does not make it less fragile!

Thanks,

Ingo


RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +   if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
> > +   !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
> > +   !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
> > +   _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
> 
> Also, you are not checking anywhere if the SUPPRESS_PIP controls are 
> available.  This is probably the best place.

SUPPRESS_PIP(should be "CONCEAL", will fix it.) is use for control of  
processor trace packet. 
I think we should clear it when in SYSTEM mode (For example, PIPs are generated 
on VM exit, with NonRoot=0. On VM exit to SMM, VMCS packets are additionally 
generated). Why need check this here?

> 
> > +   _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +   _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
> 
> These two are not needed; disabling SECONDARY_EXEC_PT_USE_GPA is enough.
> The tracing mode will revert to PT_SYSTEM, which does not use the load/clear 
> RTIT_CTL controls.
> 

The status of *_RTIT_CTL should be same with SECONDARY_EXEC_PT_USE_GPA or would 
cause VM-entry failed. 
(architecture-instruction-set-extensions-programming-reference  5.2.3)


RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +   if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
> > +   !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
> > +   !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
> > +   _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
> 
> Also, you are not checking anywhere if the SUPPRESS_PIP controls are 
> available.  This is probably the best place.

SUPPRESS_PIP(should be "CONCEAL", will fix it.) is use for control of  
processor trace packet. 
I think we should clear it when in SYSTEM mode (For example, PIPs are generated 
on VM exit, with NonRoot=0. On VM exit to SMM, VMCS packets are additionally 
generated). Why need check this here?

> 
> > +   _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +   _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
> 
> These two are not needed; disabling SECONDARY_EXEC_PT_USE_GPA is enough.
> The tracing mode will revert to PT_SYSTEM, which does not use the load/clear 
> RTIT_CTL controls.
> 

The status of *_RTIT_CTL should be same with SECONDARY_EXEC_PT_USE_GPA or would 
cause VM-entry failed. 
(architecture-instruction-set-extensions-programming-reference  5.2.3)


[f2fs-dev] [PATCH RESEND] f2fs: fix concurrent problem for updating free bitmap

2017-11-13 Thread LiFan
alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread AThread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
- alloc_nid_failed
 - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
free nid with free bitmap cleared.
This patch update the bitmap under the same nid_list_lock in add_free_nid.

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 82 ++
 1 file changed, 42 insertions(+), 40 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b965a53..0a217d2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1811,8 +1811,33 @@ static void __move_free_nid(struct f2fs_sb_info *sbi, 
struct free_nid *i,
}
 }
 
+static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
+   bool set, bool build)
+{
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
+   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
+   unsigned int nid_ofs = nid - START_NID(nid);
+
+   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
+   return;
+
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   nm_i->free_nid_count[nat_ofs]++;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
+}
+
 /* return if the nid is recognized as free */
-static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
+static bool add_free_nid(struct f2fs_sb_info *sbi,
+   nid_t nid, bool build, bool update)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *e;
@@ -1870,6 +1895,11 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t 
nid, bool build)
ret = true;
err = __insert_free_nid(sbi, i, FREE_NID);
 err_out:
+   if (update) {
+   update_free_nid_bitmap(sbi, nid, ret, build);
+   if (!build)
+   nm_i->available_nids++;
+   }
spin_unlock(_i->nid_list_lock);
radix_tree_preload_end();
 err:
@@ -1896,30 +1926,6 @@ static void remove_free_nid(struct f2fs_sb_info *sbi, 
nid_t nid)
kmem_cache_free(free_nid_slab, i);
 }
 
-static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
-   bool set, bool build)
-{
-   struct f2fs_nm_info *nm_i = NM_I(sbi);
-   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
-   unsigned int nid_ofs = nid - START_NID(nid);
-
-   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
-   return;
-
-   if (set) {
-   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   nm_i->free_nid_count[nat_ofs]++;
-   } else {
-   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
-   }
-}
-
 static void scan_nat_page(struct f2fs_sb_info *sbi,
struct page *nat_page, nid_t start_nid)
 {
@@ -1937,18 +1943,18 @@ static void scan_nat_page(struct f2fs_sb_info *sbi,
i = start_nid % NAT_ENTRY_PER_BLOCK;
 
for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) {
-   bool freed = false;
-
if (unlikely(start_nid >= nm_i->max_nid))
break;
 
blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
f2fs_bug_on(sbi, blk_addr == NEW_ADDR);
-   if (blk_addr == NULL_ADDR)
-   freed = add_free_nid(sbi, start_nid, true);
-   spin_lock(_I(sbi)->nid_list_lock);
-   update_free_nid_bitmap(sbi, start_nid, freed, true);
-   spin_unlock(_I(sbi)->nid_list_lock);
+   if (blk_addr == NULL_ADDR) {
+   add_free_nid(sbi, start_nid, true, true);
+   } else {
+   spin_lock(_I(sbi)->nid_list_lock);
+   update_free_nid_bitmap(sbi, 

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Juergen Gross
On 14/11/17 08:02, Quan Xu wrote:
> 
> 
> On 2017/11/13 18:53, Juergen Gross wrote:
>> On 13/11/17 11:06, Quan Xu wrote:
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>     29031.6 bit/s -- 76.1 %CPU
> 
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>     35787.7 bit/s -- 129.4 %CPU
> 
>  3. w/ kvm dynamic poll:
>     35735.6 bit/s -- 200.0 %CPU
> 
>  4. w/patch and w/ kvm dynamic poll:
>     42225.3 bit/s -- 198.7 %CPU
> 
>  5. idle=poll
>     37081.7 bit/s -- 998.1 %CPU
> 
> 
> 
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..

I don't question the general idea. I just think pvops isn't the best way
to implement it.

>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
> 
> I assume this feature will be ported to other archs.. a new pvops makes
> code
> clean and easy to maintain. also I tried to add it into existed pvops,
> but it
> doesn't match.

You are aware that pvops is x86 only?

I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
if (static_key_false(_idle_poll_key))
guest_idle_poll_func();
}

And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.


Juergen


[f2fs-dev] [PATCH RESEND] f2fs: fix concurrent problem for updating free bitmap

2017-11-13 Thread LiFan
alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread AThread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
- alloc_nid_failed
 - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
free nid with free bitmap cleared.
This patch update the bitmap under the same nid_list_lock in add_free_nid.

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 82 ++
 1 file changed, 42 insertions(+), 40 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b965a53..0a217d2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1811,8 +1811,33 @@ static void __move_free_nid(struct f2fs_sb_info *sbi, 
struct free_nid *i,
}
 }
 
+static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
+   bool set, bool build)
+{
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
+   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
+   unsigned int nid_ofs = nid - START_NID(nid);
+
+   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
+   return;
+
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   nm_i->free_nid_count[nat_ofs]++;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
+}
+
 /* return if the nid is recognized as free */
-static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
+static bool add_free_nid(struct f2fs_sb_info *sbi,
+   nid_t nid, bool build, bool update)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *e;
@@ -1870,6 +1895,11 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t 
nid, bool build)
ret = true;
err = __insert_free_nid(sbi, i, FREE_NID);
 err_out:
+   if (update) {
+   update_free_nid_bitmap(sbi, nid, ret, build);
+   if (!build)
+   nm_i->available_nids++;
+   }
spin_unlock(_i->nid_list_lock);
radix_tree_preload_end();
 err:
@@ -1896,30 +1926,6 @@ static void remove_free_nid(struct f2fs_sb_info *sbi, 
nid_t nid)
kmem_cache_free(free_nid_slab, i);
 }
 
-static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
-   bool set, bool build)
-{
-   struct f2fs_nm_info *nm_i = NM_I(sbi);
-   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
-   unsigned int nid_ofs = nid - START_NID(nid);
-
-   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
-   return;
-
-   if (set) {
-   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   nm_i->free_nid_count[nat_ofs]++;
-   } else {
-   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
-   }
-}
-
 static void scan_nat_page(struct f2fs_sb_info *sbi,
struct page *nat_page, nid_t start_nid)
 {
@@ -1937,18 +1943,18 @@ static void scan_nat_page(struct f2fs_sb_info *sbi,
i = start_nid % NAT_ENTRY_PER_BLOCK;
 
for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) {
-   bool freed = false;
-
if (unlikely(start_nid >= nm_i->max_nid))
break;
 
blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
f2fs_bug_on(sbi, blk_addr == NEW_ADDR);
-   if (blk_addr == NULL_ADDR)
-   freed = add_free_nid(sbi, start_nid, true);
-   spin_lock(_I(sbi)->nid_list_lock);
-   update_free_nid_bitmap(sbi, start_nid, freed, true);
-   spin_unlock(_I(sbi)->nid_list_lock);
+   if (blk_addr == NULL_ADDR) {
+   add_free_nid(sbi, start_nid, true, true);
+   } else {
+   spin_lock(_I(sbi)->nid_list_lock);
+   update_free_nid_bitmap(sbi, start_nid, false, true);
+  

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Juergen Gross
On 14/11/17 08:02, Quan Xu wrote:
> 
> 
> On 2017/11/13 18:53, Juergen Gross wrote:
>> On 13/11/17 11:06, Quan Xu wrote:
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>     29031.6 bit/s -- 76.1 %CPU
> 
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>     35787.7 bit/s -- 129.4 %CPU
> 
>  3. w/ kvm dynamic poll:
>     35735.6 bit/s -- 200.0 %CPU
> 
>  4. w/patch and w/ kvm dynamic poll:
>     42225.3 bit/s -- 198.7 %CPU
> 
>  5. idle=poll
>     37081.7 bit/s -- 998.1 %CPU
> 
> 
> 
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..

I don't question the general idea. I just think pvops isn't the best way
to implement it.

>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
> 
> I assume this feature will be ported to other archs.. a new pvops makes
> code
> clean and easy to maintain. also I tried to add it into existed pvops,
> but it
> doesn't match.

You are aware that pvops is x86 only?

I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
if (static_key_false(_idle_poll_key))
guest_idle_poll_func();
}

And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.


Juergen


Re: [PATCH] iio: mma8452: add power_mode sysfs configuration

2017-11-13 Thread Martin Kepplinger

Am 14.11.2017 05:56 schrieb harinath Nampally:

Hi Martin,

But given your concerns, I would strip down this patch to only offer 
the

already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

OK then we can map 'low_noise' to high resolution mode. But I am afraid
I can't test the functionality because I don't have proper instruments 
to

measure the current draw(in microAmps) accurately.


I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitely,
and given that there *is* the ABI, we could offer it for free.

I think 'oversampling' is already implemented, as I see
'case IIO_CHAN_INFO_OVERSAMPLING_RATIO:'
being handled which is basically setting the all 4 different power 
modes.

If we also add 'power_mode', I think it would be like having two
different user interfaces for
same functionality. So I don't see much of value adding 'power_mode' as 
well.

Please correct me if I am wrong.

Thanks,
Harinath



You're right. I should've looked more closely. oversampling is there and 
seems to
work. No need to blow up this driver or let alone extend an ABI now. 
Let's drop

this patch.

thanks
 martin


On Sun, Nov 12, 2017 at 7:28 AM, Martin Kepplinger  
wrote:

On 2017-11-11 01:33, Jonathan Cameron wrote:

On Mon, 6 Nov 2017 08:19:58 +0100
Martin Kepplinger  wrote:

This adds the power_mode sysfs interface to the device as documented 
in

sysfs-bus-iio.

---

Note that I explicitely don't sign off on this.

This is a starting point for anybody who can test it and check for 
correct
API usage, and ABI correctness, as documented in 
Documentation/ABI/testing/sys-bus-iio
(grep it for "power_mode"). The ABI doc probably would need an 
addition

too, if the 4 power modes here seem generally useful (there are only
 2 listed there)!

So, if you can test this, feel free to set up a proper patch or
two, and I'm happy to review.

Please note that this patch is quite old. It really should be that 
simple
as far as my understanding back then. We always list the available 
frequencies
of the given power mode we are in, for example, already, and 
everything

basically is in place except for the user interface.


Hmm. A lot of devices support something along these lines.  The issue
has always been - how is userspace to figure out what to do with it?
It's all very vague...

Funnily enough - this used to be really common, but is becoming less 
so
now - presumably because no one was using it much (or maybe I am 
reading

too much into that ;)

Now the question is whether it can be tied to better defined things?

Here low noise restricts the range to 4g.  Issue is that we don't 
actually
have writeable _available attributes (which correspond to range in 
this case).




Does it? Isn't it merely less oversampling.

Low power mode... This one is apparently oversampling.  If possible 
support

it as that as we have well defined interfaces for that.

Jonathan.


Ah, I remember; the oversampling settings was actually a reason why I
hadn't submitted the patch :) The oversampling API would definitely be
more accurate.

I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitely,
and given that there *is* the ABI, we could offer it for free.

But given your concerns, I would strip down this patch to only offer 
the

already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

Users would have a simple switch if they don't really *want* to know 
the
details. I think it can be useful to just say "I don't care about 
power

consuption. Be as accurate as possible" or "I just want this think to
work. Use a little power as possible." Sure it's vage, but would it be
useless?




Re: [PATCH] iio: mma8452: add power_mode sysfs configuration

2017-11-13 Thread Martin Kepplinger

Am 14.11.2017 05:56 schrieb harinath Nampally:

Hi Martin,

But given your concerns, I would strip down this patch to only offer 
the

already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

OK then we can map 'low_noise' to high resolution mode. But I am afraid
I can't test the functionality because I don't have proper instruments 
to

measure the current draw(in microAmps) accurately.


I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitely,
and given that there *is* the ABI, we could offer it for free.

I think 'oversampling' is already implemented, as I see
'case IIO_CHAN_INFO_OVERSAMPLING_RATIO:'
being handled which is basically setting the all 4 different power 
modes.

If we also add 'power_mode', I think it would be like having two
different user interfaces for
same functionality. So I don't see much of value adding 'power_mode' as 
well.

Please correct me if I am wrong.

Thanks,
Harinath



You're right. I should've looked more closely. oversampling is there and 
seems to
work. No need to blow up this driver or let alone extend an ABI now. 
Let's drop

this patch.

thanks
 martin


On Sun, Nov 12, 2017 at 7:28 AM, Martin Kepplinger  
wrote:

On 2017-11-11 01:33, Jonathan Cameron wrote:

On Mon, 6 Nov 2017 08:19:58 +0100
Martin Kepplinger  wrote:

This adds the power_mode sysfs interface to the device as documented 
in

sysfs-bus-iio.

---

Note that I explicitely don't sign off on this.

This is a starting point for anybody who can test it and check for 
correct
API usage, and ABI correctness, as documented in 
Documentation/ABI/testing/sys-bus-iio
(grep it for "power_mode"). The ABI doc probably would need an 
addition

too, if the 4 power modes here seem generally useful (there are only
 2 listed there)!

So, if you can test this, feel free to set up a proper patch or
two, and I'm happy to review.

Please note that this patch is quite old. It really should be that 
simple
as far as my understanding back then. We always list the available 
frequencies
of the given power mode we are in, for example, already, and 
everything

basically is in place except for the user interface.


Hmm. A lot of devices support something along these lines.  The issue
has always been - how is userspace to figure out what to do with it?
It's all very vague...

Funnily enough - this used to be really common, but is becoming less 
so
now - presumably because no one was using it much (or maybe I am 
reading

too much into that ;)

Now the question is whether it can be tied to better defined things?

Here low noise restricts the range to 4g.  Issue is that we don't 
actually
have writeable _available attributes (which correspond to range in 
this case).




Does it? Isn't it merely less oversampling.

Low power mode... This one is apparently oversampling.  If possible 
support

it as that as we have well defined interfaces for that.

Jonathan.


Ah, I remember; the oversampling settings was actually a reason why I
hadn't submitted the patch :) The oversampling API would definitely be
more accurate.

I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitely,
and given that there *is* the ABI, we could offer it for free.

But given your concerns, I would strip down this patch to only offer 
the

already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

Users would have a simple switch if they don't really *want* to know 
the
details. I think it can be useful to just say "I don't care about 
power

consuption. Be as accurate as possible" or "I just want this think to
work. Use a little power as possible." Sure it's vage, but would it be
useless?




Re: [PATCH v3 2/3] usb: xhci: Add DbC support in xHCI driver

2017-11-13 Thread Felipe Balbi

Hi,

Mathias Nyman  writes:
>> +static int dbc_buf_alloc(struct dbc_buf *db, unsigned int size)
>> +{
>> +db->buf_buf = kzalloc(size, GFP_KERNEL);
>> +if (!db->buf_buf)
>> +return -ENOMEM;
>> +
>> +db->buf_size = size;
>> +db->buf_put = db->buf_buf;
>> +db->buf_get = db->buf_buf;
>> +
>> +return 0;
>> +}

you may wanna have a look at kfifo.

-- 
balbi


Re: [PATCH v3 2/3] usb: xhci: Add DbC support in xHCI driver

2017-11-13 Thread Felipe Balbi

Hi,

Mathias Nyman  writes:
>> +static int dbc_buf_alloc(struct dbc_buf *db, unsigned int size)
>> +{
>> +db->buf_buf = kzalloc(size, GFP_KERNEL);
>> +if (!db->buf_buf)
>> +return -ENOMEM;
>> +
>> +db->buf_size = size;
>> +db->buf_put = db->buf_buf;
>> +db->buf_get = db->buf_buf;
>> +
>> +return 0;
>> +}

you may wanna have a look at kfifo.

-- 
balbi


Re: [PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rasmus Villemoes
On 14 November 2017 at 07:57, Rakib Mullick  wrote:
> Currently, during __bitmap_weight() calculation hweight_long() is used.
> Inside a hweight_long() a check has been made to figure out whether a
> hweight32() or hweight64() version to use.
>
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index d8f0c09..552096f 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
>  int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
>  {
> unsigned int k, lim = bits/BITS_PER_LONG;
> -   int w = 0;
> -
> -   for (k = 0; k < lim; k++)
> -   w += hweight_long(bitmap[k]);
> +   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
> +

hint: sizeof() very rarely evaluates to zero... So this is the same as
"is32 = 1". So the patch as-is is broken (and may explain the 1-byte
delta in vmlinux). But even if this condition is fixed, the patch
doesn't change anything, since the sizeof() evaluation is done at
compile-time, regardless of whether it happens inside the inlined
hweight_long or outside. So it is certainly not worth it to duplicate
the loop.

Rasmus


Re: [PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rasmus Villemoes
On 14 November 2017 at 07:57, Rakib Mullick  wrote:
> Currently, during __bitmap_weight() calculation hweight_long() is used.
> Inside a hweight_long() a check has been made to figure out whether a
> hweight32() or hweight64() version to use.
>
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index d8f0c09..552096f 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
>  int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
>  {
> unsigned int k, lim = bits/BITS_PER_LONG;
> -   int w = 0;
> -
> -   for (k = 0; k < lim; k++)
> -   w += hweight_long(bitmap[k]);
> +   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
> +

hint: sizeof() very rarely evaluates to zero... So this is the same as
"is32 = 1". So the patch as-is is broken (and may explain the 1-byte
delta in vmlinux). But even if this condition is fixed, the patch
doesn't change anything, since the sizeof() evaluation is done at
compile-time, regardless of whether it happens inside the inlined
hweight_long or outside. So it is certainly not worth it to duplicate
the loop.

Rasmus


Re: [PATCH] iio: accel: mma8452: Add single pulse/tap event detection

2017-11-13 Thread Martin Kepplinger

Am 14.11.2017 05:36 schrieb harinath Nampally:

> This patch adds following related changes:
> - defines pulse event related registers
> - enables and handles single pulse interrupt for fxls8471
> - handles IIO_EV_DIR_EITHER in read/write callbacks (because
>   event direction for pulse is either rising or falling)
> - configures read/write event value for pulse latency register
>   using IIO_EV_INFO_HYSTERESIS
> - adds multiple events like pulse and tranient event spec
>   as elements of event_spec array named 'mma8452_accel_events'
>
> Except mma8653 chip all other chips like mma845x and
> fxls8471 have single tap detection feature.
> Tested thoroughly using iio_event_monitor application on
> imx6ul-evk board which has fxls8471.
>
> Signed-off-by: Harinath Nampally 
> ---
What tree is this written against? It doesn't apply to the current 
-next

anyways.

Thanks for the review.
It is actually against 'testing' branch, I think two of my earlier
patches are not yet applied to
any branch, that might be reason this patch is not good against
current -next or 'togreg'.


I think the defintions would deserve to be in a separate patch, but
that's debatable.

Yes, I would argue that definitions are not a logical change.



I would argue definitions don't break the build and maybe slightly 
better

support features like bisect or revert :)


>   .type = IIO_EV_TYPE_MAG,
>   .dir = IIO_EV_DIR_RISING,
>   .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> @@ -1139,6 +1274,15 @@ static const struct iio_event_spec 
mma8452_transient_event[] = {
>   BIT(IIO_EV_INFO_PERIOD) |
>   BIT(IIO_EV_INFO_HIGH_PASS_FILTER_3DB)
>   },
> + {
> + //pulse event
> + .type = IIO_EV_TYPE_MAG,
> + .dir = IIO_EV_DIR_EITHER,
> + .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> + .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE) |
> + BIT(IIO_EV_INFO_PERIOD) |
> + BIT(IIO_EV_INFO_HYSTERESIS)
> + },
>  };
>
>  static const struct iio_event_spec mma8452_motion_event[] = {
> @@ -1202,8 +1346,8 @@ static struct attribute_group 
mma8452_event_attribute_group = {
>   .shift = 16 - (bits), \
>   .endianness = IIO_BE, \
>   }, \
> - .event_spec = mma8452_transient_event, \
> - .num_event_specs = ARRAY_SIZE(mma8452_transient_event), \
> + .event_spec = mma8452_accel_events, \
> + .num_event_specs = ARRAY_SIZE(mma8452_accel_events), \
that would go in the mentioned separate renaming-patch

OK so I will make a patch set; patch 1/2 to just rename
'mma8452_transient_event[]'
to 'mma8452_accel_events[]'(without adding pulse event).
and everything else would go in 2/2. Does that makes sense?



It does to me.


Re: [PATCH] iio: accel: mma8452: Add single pulse/tap event detection

2017-11-13 Thread Martin Kepplinger

Am 14.11.2017 05:36 schrieb harinath Nampally:

> This patch adds following related changes:
> - defines pulse event related registers
> - enables and handles single pulse interrupt for fxls8471
> - handles IIO_EV_DIR_EITHER in read/write callbacks (because
>   event direction for pulse is either rising or falling)
> - configures read/write event value for pulse latency register
>   using IIO_EV_INFO_HYSTERESIS
> - adds multiple events like pulse and tranient event spec
>   as elements of event_spec array named 'mma8452_accel_events'
>
> Except mma8653 chip all other chips like mma845x and
> fxls8471 have single tap detection feature.
> Tested thoroughly using iio_event_monitor application on
> imx6ul-evk board which has fxls8471.
>
> Signed-off-by: Harinath Nampally 
> ---
What tree is this written against? It doesn't apply to the current 
-next

anyways.

Thanks for the review.
It is actually against 'testing' branch, I think two of my earlier
patches are not yet applied to
any branch, that might be reason this patch is not good against
current -next or 'togreg'.


I think the defintions would deserve to be in a separate patch, but
that's debatable.

Yes, I would argue that definitions are not a logical change.



I would argue definitions don't break the build and maybe slightly 
better

support features like bisect or revert :)


>   .type = IIO_EV_TYPE_MAG,
>   .dir = IIO_EV_DIR_RISING,
>   .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> @@ -1139,6 +1274,15 @@ static const struct iio_event_spec 
mma8452_transient_event[] = {
>   BIT(IIO_EV_INFO_PERIOD) |
>   BIT(IIO_EV_INFO_HIGH_PASS_FILTER_3DB)
>   },
> + {
> + //pulse event
> + .type = IIO_EV_TYPE_MAG,
> + .dir = IIO_EV_DIR_EITHER,
> + .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> + .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE) |
> + BIT(IIO_EV_INFO_PERIOD) |
> + BIT(IIO_EV_INFO_HYSTERESIS)
> + },
>  };
>
>  static const struct iio_event_spec mma8452_motion_event[] = {
> @@ -1202,8 +1346,8 @@ static struct attribute_group 
mma8452_event_attribute_group = {
>   .shift = 16 - (bits), \
>   .endianness = IIO_BE, \
>   }, \
> - .event_spec = mma8452_transient_event, \
> - .num_event_specs = ARRAY_SIZE(mma8452_transient_event), \
> + .event_spec = mma8452_accel_events, \
> + .num_event_specs = ARRAY_SIZE(mma8452_accel_events), \
that would go in the mentioned separate renaming-patch

OK so I will make a patch set; patch 1/2 to just rename
'mma8452_transient_event[]'
to 'mma8452_accel_events[]'(without adding pulse event).
and everything else would go in 2/2. Does that makes sense?



It does to me.


Re: [RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> The instructions STR and SLDT are not emulated in any case. Thus, it made
> sense to not implement functionality to identify them. However, a
> subsequent commit will introduce functionality to warn about the use of
> all the instructions that UMIP protect, not only those that are emulated.
> A first step for that is the ability to identify them.
> 
> Plus, now that STR and SLDT are identified, we need to explicitly avoid
> their emulation (i.e., not rely on unsuccessful identification). Group
> togehter all the cases that we do not want to emulate: STR, SLDT and user
> long mode processes.
> 
> Cc: Andy Lutomirski 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Tony Luck 
> Cc: Paolo Bonzini 
> Cc: Ravi V. Shankar 
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 

Sigh, the _title_ still refers to 'str'...

I'll fix it up, no need to resend, but this lack of attention to details is 
seriously sad.

Thanks,

Ingo


Re: [RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> The instructions STR and SLDT are not emulated in any case. Thus, it made
> sense to not implement functionality to identify them. However, a
> subsequent commit will introduce functionality to warn about the use of
> all the instructions that UMIP protect, not only those that are emulated.
> A first step for that is the ability to identify them.
> 
> Plus, now that STR and SLDT are identified, we need to explicitly avoid
> their emulation (i.e., not rely on unsuccessful identification). Group
> togehter all the cases that we do not want to emulate: STR, SLDT and user
> long mode processes.
> 
> Cc: Andy Lutomirski 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Tony Luck 
> Cc: Paolo Bonzini 
> Cc: Ravi V. Shankar 
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 

Sigh, the _title_ still refers to 'str'...

I'll fix it up, no need to resend, but this lack of attention to details is 
seriously sad.

Thanks,

Ingo


Re: [PATCH] arch, mm: introduce arch_tlb_gather_mmu_lazy (was: Re: [RESEND PATCH] mm, oom_reaper: gather each vma to prevent) leaking TLB entry

2017-11-13 Thread Michal Hocko
On Tue 14-11-17 10:45:49, Minchan Kim wrote:
[...]
> Anyway, I think Wang Nan's patch is already broken.
> http://lkml.kernel.org/r/%3c20171107095453.179940-1-wangn...@huawei.com%3E
> 
> Because unmap_page_range(ie, zap_pte_range) can flush TLB forcefully
> and free pages. However, the architecture code for TLB flush cannot
> flush at all by wrong fullmm so other threads can write freed-page.

I am not sure I understand what you mean. How is that any different from
any other explicit partial madvise call?
-- 
Michal Hocko
SUSE Labs


[f2fs-dev] [PATCH RESEND v2] f2fs: validate before set/clear free nat bitmap

2017-11-13 Thread LiFan
In flush_nat_entries, all dirty nats will be flushed and if their new address
isn't NULL_ADDR, their bitmaps will be updated, the free_nid_count of the
bitmaps will be increased regardless of whether the nats have already been
occupied before. This could lead to wrong free_nid_count.
So this patch checks the status of the bits before actually set/clear them.

Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d234c6e..b965a53 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct f2fs_sb_info 
*sbi, nid_t nid,
if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
return;
 
-   if (set)
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
__set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   else
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
-   if (set)
nm_i->free_nid_count[nat_ofs]++;
-   else if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
--
2.7.4




Re: [PATCH] arch, mm: introduce arch_tlb_gather_mmu_lazy (was: Re: [RESEND PATCH] mm, oom_reaper: gather each vma to prevent) leaking TLB entry

2017-11-13 Thread Michal Hocko
On Tue 14-11-17 10:45:49, Minchan Kim wrote:
[...]
> Anyway, I think Wang Nan's patch is already broken.
> http://lkml.kernel.org/r/%3c20171107095453.179940-1-wangn...@huawei.com%3E
> 
> Because unmap_page_range(ie, zap_pte_range) can flush TLB forcefully
> and free pages. However, the architecture code for TLB flush cannot
> flush at all by wrong fullmm so other threads can write freed-page.

I am not sure I understand what you mean. How is that any different from
any other explicit partial madvise call?
-- 
Michal Hocko
SUSE Labs


[f2fs-dev] [PATCH RESEND v2] f2fs: validate before set/clear free nat bitmap

2017-11-13 Thread LiFan
In flush_nat_entries, all dirty nats will be flushed and if their new address
isn't NULL_ADDR, their bitmaps will be updated, the free_nid_count of the
bitmaps will be increased regardless of whether the nats have already been
occupied before. This could lead to wrong free_nid_count.
So this patch checks the status of the bits before actually set/clear them.

Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d234c6e..b965a53 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct f2fs_sb_info 
*sbi, nid_t nid,
if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
return;
 
-   if (set)
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
__set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   else
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
-   if (set)
nm_i->free_nid_count[nat_ofs]++;
-   else if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
--
2.7.4




Re: [PATCH 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> On Mon, Nov 13, 2017 at 09:12:03AM +0100, Ingo Molnar wrote:
> > 
> > * Ricardo Neri  wrote:
> > 
> > > The instructions str and sldt are not emulated in any case. Thus, it made
> > > sense to not implement functionality to identify them. However, a
> > > subsequent commit will introduce functionality to warn about the use of
> > > all the instructions that UMIP protect, not only those that are emulated.
> > > A first step for that is the ability to identify them.
> > > 
> > > Plus, now that str and sldt are identified, we need to explicitly avoid
> > > their emulation (i.e., not rely on unsuccessful identification). Group
> > > togehter all the cases that we do not want to emulate: str, sldt and user
> > > long mode processes.
> > 
> > Did you notice how in all your previous patches (both in the code and in 
> > the 
> > changelogs) I have manually fixed up the capitalization of these 
> > instruction 
> > mnenonics?
> 
> I am sorry, I tried to see where you made these changes but I could not find
> any. I did a git diff of arch/x86/kernel/umip.c between the branch 
> rneri/umip_v11
> of my repository [1] and the master branch of the tip tree and I did not find
> any differences.

For example, I turned:

  [PATCH v11 12/12] selftests/x86: Add tests for instruction str and sldt

  The instructions str and sldt are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

into:

  a9e017d5619e: selftests/x86: Add tests for the STR and SLDT instructions

  The STR and SLDT instructions are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

I did not catch every case though.

Thanks,

Ingo


Re: [PATCH 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar

* Ricardo Neri  wrote:

> On Mon, Nov 13, 2017 at 09:12:03AM +0100, Ingo Molnar wrote:
> > 
> > * Ricardo Neri  wrote:
> > 
> > > The instructions str and sldt are not emulated in any case. Thus, it made
> > > sense to not implement functionality to identify them. However, a
> > > subsequent commit will introduce functionality to warn about the use of
> > > all the instructions that UMIP protect, not only those that are emulated.
> > > A first step for that is the ability to identify them.
> > > 
> > > Plus, now that str and sldt are identified, we need to explicitly avoid
> > > their emulation (i.e., not rely on unsuccessful identification). Group
> > > togehter all the cases that we do not want to emulate: str, sldt and user
> > > long mode processes.
> > 
> > Did you notice how in all your previous patches (both in the code and in 
> > the 
> > changelogs) I have manually fixed up the capitalization of these 
> > instruction 
> > mnenonics?
> 
> I am sorry, I tried to see where you made these changes but I could not find
> any. I did a git diff of arch/x86/kernel/umip.c between the branch 
> rneri/umip_v11
> of my repository [1] and the master branch of the tip tree and I did not find
> any differences.

For example, I turned:

  [PATCH v11 12/12] selftests/x86: Add tests for instruction str and sldt

  The instructions str and sldt are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

into:

  a9e017d5619e: selftests/x86: Add tests for the STR and SLDT instructions

  The STR and SLDT instructions are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

I did not catch every case though.

Thanks,

Ingo


Re: [PATCH net-next 0/3] rxrpc: Fixes

2017-11-13 Thread David Miller
From: David Howells 
Date: Sat, 11 Nov 2017 17:57:52 +

> 
> Here are some patches that fix some things in AF_RXRPC:
> 
>  (1) Prevent notifications from being passed to a kernel service for a call
>  that it has ended.
> 
>  (2) Fix a null pointer deference that occurs under some circumstances when an
>  ACK is generated.
> 
>  (3) Fix a number of things to do with call expiration.
> 
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-next-2017

Pulled, thanks David.


Re: [PATCH net-next 0/3] rxrpc: Fixes

2017-11-13 Thread David Miller
From: David Howells 
Date: Sat, 11 Nov 2017 17:57:52 +

> 
> Here are some patches that fix some things in AF_RXRPC:
> 
>  (1) Prevent notifications from being passed to a kernel service for a call
>  that it has ended.
> 
>  (2) Fix a null pointer deference that occurs under some circumstances when an
>  ACK is generated.
> 
>  (3) Fix a number of things to do with call expiration.
> 
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-next-2017

Pulled, thanks David.


Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack

2017-11-13 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> I have old patches to stop using IST for #DB and #BP, but I never finished 
> them.

I'm all in favor of reviving that effort!

Thanks,

Ingo


Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack

2017-11-13 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> I have old patches to stop using IST for #DB and #BP, but I never finished 
> them.

I'm all in favor of reviving that effort!

Thanks,

Ingo


Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Wanpeng Li
2017-11-14 15:02 GMT+08:00 Quan Xu :
>
>
> On 2017/11/13 18:53, Juergen Gross wrote:
>>
>> On 13/11/17 11:06, Quan Xu wrote:
>>>
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>>
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
>
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
> 29031.6 bit/s -- 76.1 %CPU
>
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
> 35787.7 bit/s -- 129.4 %CPU
>
>  3. w/ kvm dynamic poll:
> 35735.6 bit/s -- 200.0 %CPU

Actually we can reduce the CPU utilization by sleeping a period of
time as what has already been done in the poll logic of IO subsystem,
then we can improve the algorithm in kvm instead of introduing another
duplicate one in the kvm guest.

Regards,
Wanpeng Li

>
>  4. w/patch and w/ kvm dynamic poll:
> 42225.3 bit/s -- 198.7 %CPU
>
>  5. idle=poll
> 37081.7 bit/s -- 998.1 %CPU
>
>
>
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..
>
>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
>
>
> I assume this feature will be ported to other archs.. a new pvops makes code
> clean and easy to maintain. also I tried to add it into existed pvops, but
> it
> doesn't match.
>
>
>
> Quan
> Alibaba Cloud
>>
>>
>> Juergen
>>
>


Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Wanpeng Li
2017-11-14 15:02 GMT+08:00 Quan Xu :
>
>
> On 2017/11/13 18:53, Juergen Gross wrote:
>>
>> On 13/11/17 11:06, Quan Xu wrote:
>>>
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>>
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
>
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
> 29031.6 bit/s -- 76.1 %CPU
>
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
> 35787.7 bit/s -- 129.4 %CPU
>
>  3. w/ kvm dynamic poll:
> 35735.6 bit/s -- 200.0 %CPU

Actually we can reduce the CPU utilization by sleeping a period of
time as what has already been done in the poll logic of IO subsystem,
then we can improve the algorithm in kvm instead of introduing another
duplicate one in the kvm guest.

Regards,
Wanpeng Li

>
>  4. w/patch and w/ kvm dynamic poll:
> 42225.3 bit/s -- 198.7 %CPU
>
>  5. idle=poll
> 37081.7 bit/s -- 998.1 %CPU
>
>
>
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..
>
>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
>
>
> I assume this feature will be ported to other archs.. a new pvops makes code
> clean and easy to maintain. also I tried to add it into existed pvops, but
> it
> doesn't match.
>
>
>
> Quan
> Alibaba Cloud
>>
>>
>> Juergen
>>
>


RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +#define VM_EXIT_PT_SUPPRESS_PIP0x0100
> > +#define VM_EXIT_CLEAR_IA32_RTIT_CTL0x0200
> >
> >  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dff
> >
> > @@ -108,6 +112,8 @@
> >  #define VM_ENTRY_LOAD_IA32_PAT 0x4000
> >  #define VM_ENTRY_LOAD_IA32_EFER 0x8000
> >  #define VM_ENTRY_LOAD_BNDCFGS   0x0001
> > +#define VM_ENTRY_PT_SUPPRESS_PIP   0x0002
> > +#define VM_ENTRY_LOAD_IA32_RTIT_CTL0x0004
> 
> 
> Please use PT_CONCEAL instead of PT_SUPPRESS_PIP, to better match the SDM 
> (for both vmexit and vmentry controls).
> 
> > +   if (!enable_ept)
> > +   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +
> 
> Why is this (and the similar bit-clear operation in vmx_vmentry_control) 
> needed only for !enable_ept?
> 
> Shouldn't it be like
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmexit_control &= ~VM_EXIT_PT_SUPPRESS_PIP;
>   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
>   }
> 
> and
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmentry_control &= ~VM_ENTRY_PT_SUPPRESS_PIP;
>   vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
>   }
> 

I think I have a misunderstand of " always set "use GPA for processor tracing" 
in secondary execution control if it can be ".
"use GPA for processor tracing" can't be set in SYSTEM mode even if hardware 
can set this bit. Because guest will still think this a GPA address and 
translate by EPT. In fact, RTIT_OUTPUT_BASE will always a HPA in SYSTEM mode.
Will fix in next version.

Thanks,
Luwei Kang



RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei
> > +#define VM_EXIT_PT_SUPPRESS_PIP0x0100
> > +#define VM_EXIT_CLEAR_IA32_RTIT_CTL0x0200
> >
> >  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dff
> >
> > @@ -108,6 +112,8 @@
> >  #define VM_ENTRY_LOAD_IA32_PAT 0x4000
> >  #define VM_ENTRY_LOAD_IA32_EFER 0x8000
> >  #define VM_ENTRY_LOAD_BNDCFGS   0x0001
> > +#define VM_ENTRY_PT_SUPPRESS_PIP   0x0002
> > +#define VM_ENTRY_LOAD_IA32_RTIT_CTL0x0004
> 
> 
> Please use PT_CONCEAL instead of PT_SUPPRESS_PIP, to better match the SDM 
> (for both vmexit and vmentry controls).
> 
> > +   if (!enable_ept)
> > +   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +
> 
> Why is this (and the similar bit-clear operation in vmx_vmentry_control) 
> needed only for !enable_ept?
> 
> Shouldn't it be like
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmexit_control &= ~VM_EXIT_PT_SUPPRESS_PIP;
>   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
>   }
> 
> and
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmentry_control &= ~VM_ENTRY_PT_SUPPRESS_PIP;
>   vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
>   }
> 

I think I have a misunderstand of " always set "use GPA for processor tracing" 
in secondary execution control if it can be ".
"use GPA for processor tracing" can't be set in SYSTEM mode even if hardware 
can set this bit. Because guest will still think this a GPA address and 
translate by EPT. In fact, RTIT_OUTPUT_BASE will always a HPA in SYSTEM mode.
Will fix in next version.

Thanks,
Luwei Kang



Re: Improving documentation of parent-ID field in /proc/PID/mountinfo

2017-11-13 Thread Michael Kerrisk (man-pages)
Hi Miklos, Ram

Thanks for your comments. A question below.

On 13 November 2017 at 09:11, Miklos Szeredi  wrote:
> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai  wrote:
>> On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) wrote:
>>> Hello Ram,
>>>
>>> Long ago (2.6.29) you added the /proc/PID/mountinfo file and
>>> associated documentation in Documentation/filesystems/proc.txt. Later,
>>> I pasted much of that documentation into the proc(5) manual page.
>>>
>>> That documentation says of the second field in the file:
>>>
>>> [[
>>> This file contains lines of the form:
>>>
>>> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root 
>>> rw,errors=continue
>>> (1)(2)(3)   (4)   (5)  (6)  (7)   (8) (9)   (10) (11)
>>>
>>> (1) mount ID:  unique identifier of the mount (may be reused after umount)
>>> (2) parent ID:  ID of parent (or of self for the top of the mount tree)
>>> ...
>>> ]]
>>>
>>> The last piece of the description of field (2) doesn't seem to be
>>> correct, or is at least rather unclear. I take this to be saying that
>>> that for the root mount point, /, field (2) will have the same value
>>> as field (1). I never actually looked at this detail closely, but
>>> Alexander pointed out that this is obviously not so, as one can
>>> immediately verify:
>>>
>>> $ grep '/ / ' /proc/$$/mountinfo
>>> 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
>>>
>>> I dug around in the kernel source for a bit. I do not have an exact
>>> handle on the details, but I can see roughly what is going on.
>>> Internally, there seems to be one ("hidden") mount ID reserved to each
>>> mount namespace, and that ID is the parent of the root mount point.
>>>
>>> Looking through the (4.14) kernel source, mount IDs are allocated by
>>> mnt_alloc_id() (in fs/namespace.c), which is in turn called by
>>> alloc_vfsmnt() which is in turn called by clone_mnt().
>>>
>>> A new mount namespace is created by the kernel function copy_mnt_ns()
>>> (in fs/namespace.c, called by create_new_namespaces() in
>>> kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
>>> fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
>>> The first of these is the call that creates the "hidden" mount ID that
>>> becomes the parent of the root mount point. (I verified this by
>>> instrumenting the kernel with a few printk() calls to display the
>>> IDs.)  The second place where copy_tree() calls clone_mnt() is in a
>>> loop that replicates each of the mount points (including the root
>>> mount point) in the source mount namespace.
>>
>> We used to report that mount, ones upon a time.  Something has changed
>> the behavior since then and its not reported any more, thus making it
>> hidden.
>
> The hidden one is the initramfs, I believe.  That's the root of the
> mount namespace, and the when a namespace is cloned, the tree is
> copied from the namespace root.
>
> It is "hidden" because no process has its root there.  Note the
> difference between namespace root and process root: the first is the
> real root of the mount tree and is unchangeable, the second is
> pointing to some place in a mount tree and can be changed (chroot).
>
> So there's nothing special in this rootfs, it is just hidden because
> it's not the root of any task.
>
> The description of  field (2) is correct, it just does not make it
> clear what it means by "root".

Sorry -- do you mean the old description is correct, or my new
description (below)?

Cheers,

Michael


> Thanks,
> Miklos
>
>>
>>>
>>> With these details in mind, I propose to patch the man page to read as
>>> below. Perhaps you have some corrections or improvements to suggest
>>> for this text?
>>>
>>> [[
>>>   (2)  parent ID: the ID of the parent mount.  For  the  root
>>>mount  point,  the  ID shown here is a hidden mount ID
>>>associated with the mount namespace.  That ID is  dis‐
>>>tinct  from  any  of the IDs shown in field (1) of the
>>>records shown in the  mountinfo  file,  and  does  not
>>>appear in field (1) in the mountinfo file in any other
>>>mount namespace.  (In  the  initial  mount  namespace,
>>>this hidden ID has the value 0.)
>>
>> It captures the current semantics correctly.
>>
>> RP
>>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: Improving documentation of parent-ID field in /proc/PID/mountinfo

2017-11-13 Thread Michael Kerrisk (man-pages)
Hi Miklos, Ram

Thanks for your comments. A question below.

On 13 November 2017 at 09:11, Miklos Szeredi  wrote:
> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai  wrote:
>> On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) wrote:
>>> Hello Ram,
>>>
>>> Long ago (2.6.29) you added the /proc/PID/mountinfo file and
>>> associated documentation in Documentation/filesystems/proc.txt. Later,
>>> I pasted much of that documentation into the proc(5) manual page.
>>>
>>> That documentation says of the second field in the file:
>>>
>>> [[
>>> This file contains lines of the form:
>>>
>>> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root 
>>> rw,errors=continue
>>> (1)(2)(3)   (4)   (5)  (6)  (7)   (8) (9)   (10) (11)
>>>
>>> (1) mount ID:  unique identifier of the mount (may be reused after umount)
>>> (2) parent ID:  ID of parent (or of self for the top of the mount tree)
>>> ...
>>> ]]
>>>
>>> The last piece of the description of field (2) doesn't seem to be
>>> correct, or is at least rather unclear. I take this to be saying that
>>> that for the root mount point, /, field (2) will have the same value
>>> as field (1). I never actually looked at this detail closely, but
>>> Alexander pointed out that this is obviously not so, as one can
>>> immediately verify:
>>>
>>> $ grep '/ / ' /proc/$$/mountinfo
>>> 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
>>>
>>> I dug around in the kernel source for a bit. I do not have an exact
>>> handle on the details, but I can see roughly what is going on.
>>> Internally, there seems to be one ("hidden") mount ID reserved to each
>>> mount namespace, and that ID is the parent of the root mount point.
>>>
>>> Looking through the (4.14) kernel source, mount IDs are allocated by
>>> mnt_alloc_id() (in fs/namespace.c), which is in turn called by
>>> alloc_vfsmnt() which is in turn called by clone_mnt().
>>>
>>> A new mount namespace is created by the kernel function copy_mnt_ns()
>>> (in fs/namespace.c, called by create_new_namespaces() in
>>> kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
>>> fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
>>> The first of these is the call that creates the "hidden" mount ID that
>>> becomes the parent of the root mount point. (I verified this by
>>> instrumenting the kernel with a few printk() calls to display the
>>> IDs.)  The second place where copy_tree() calls clone_mnt() is in a
>>> loop that replicates each of the mount points (including the root
>>> mount point) in the source mount namespace.
>>
>> We used to report that mount, ones upon a time.  Something has changed
>> the behavior since then and its not reported any more, thus making it
>> hidden.
>
> The hidden one is the initramfs, I believe.  That's the root of the
> mount namespace, and the when a namespace is cloned, the tree is
> copied from the namespace root.
>
> It is "hidden" because no process has its root there.  Note the
> difference between namespace root and process root: the first is the
> real root of the mount tree and is unchangeable, the second is
> pointing to some place in a mount tree and can be changed (chroot).
>
> So there's nothing special in this rootfs, it is just hidden because
> it's not the root of any task.
>
> The description of  field (2) is correct, it just does not make it
> clear what it means by "root".

Sorry -- do you mean the old description is correct, or my new
description (below)?

Cheers,

Michael


> Thanks,
> Miklos
>
>>
>>>
>>> With these details in mind, I propose to patch the man page to read as
>>> below. Perhaps you have some corrections or improvements to suggest
>>> for this text?
>>>
>>> [[
>>>   (2)  parent ID: the ID of the parent mount.  For  the  root
>>>mount  point,  the  ID shown here is a hidden mount ID
>>>associated with the mount namespace.  That ID is  dis‐
>>>tinct  from  any  of the IDs shown in field (1) of the
>>>records shown in the  mountinfo  file,  and  does  not
>>>appear in field (1) in the mountinfo file in any other
>>>mount namespace.  (In  the  initial  mount  namespace,
>>>this hidden ID has the value 0.)
>>
>> It captures the current semantics correctly.
>>
>> RP
>>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: linux-next: Tree for Nov 7

2017-11-13 Thread Michal Hocko
On Mon 13-11-17 09:35:22, Khalid Aziz wrote:
> On 11/13/2017 09:06 AM, Michal Hocko wrote:
> > OK, so this one should take care of the backward compatibility while
> > still not touching the arch code
> > ---
> > commit 39ff9bf8597e79a032da0954aea1f0d77d137765
> > Author: Michal Hocko 
> > Date:   Mon Nov 13 17:06:24 2017 +0100
> > 
> >  mm: introduce MAP_FIXED_SAFE
> >  MAP_FIXED is used quite often but it is inherently dangerous because it
> >  unmaps an existing mapping covered by the requested range. While this
> >  might be might be really desidered behavior in many cases there are
> >  others which would rather see a failure than a silent memory 
> > corruption.
> >  Introduce a new MAP_FIXED_SAFE flag for mmap to achive this behavior.
> >  It is a MAP_FIXED extension with a single exception that it fails with
> >  ENOMEM if the requested address is already covered by an existing
> >  mapping. We still do rely on get_unmaped_area to handle all the arch
> >  specific MAP_FIXED treatment and check for a conflicting vma after it
> >  returns.
> >  Signed-off-by: Michal Hocko 
> > 
> > .. deleted ...
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..aad8d37f0205 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned 
> > long addr,
> > if (mm->map_count > sysctl_max_map_count)
> > return -ENOMEM;
> > +   /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +   if (flags & MAP_FIXED_SAFE)
> > +   flags |= MAP_FIXED;
> > +
> > /* Obtain the address to map to. we verify (or select) it and ensure
> >  * that it represents a valid section of the address space.
> >  */
> 
> Do you need to move this code above:
> 
> if (!(flags & MAP_FIXED))
> addr = round_hint_to_min(addr);
> 
> /* Careful about overflows.. */
> len = PAGE_ALIGN(len);
> if (!len)
> return -ENOMEM;
> 
> Not doing that might mean the hint address will end up being rounded for
> MAP_FIXED_SAFE which would change the behavior from MAP_FIXED.

Yes, I will move it.
-- 
Michal Hocko
SUSE Labs


Re: linux-next: Tree for Nov 7

2017-11-13 Thread Michal Hocko
On Mon 13-11-17 09:35:22, Khalid Aziz wrote:
> On 11/13/2017 09:06 AM, Michal Hocko wrote:
> > OK, so this one should take care of the backward compatibility while
> > still not touching the arch code
> > ---
> > commit 39ff9bf8597e79a032da0954aea1f0d77d137765
> > Author: Michal Hocko 
> > Date:   Mon Nov 13 17:06:24 2017 +0100
> > 
> >  mm: introduce MAP_FIXED_SAFE
> >  MAP_FIXED is used quite often but it is inherently dangerous because it
> >  unmaps an existing mapping covered by the requested range. While this
> >  might be might be really desidered behavior in many cases there are
> >  others which would rather see a failure than a silent memory 
> > corruption.
> >  Introduce a new MAP_FIXED_SAFE flag for mmap to achive this behavior.
> >  It is a MAP_FIXED extension with a single exception that it fails with
> >  ENOMEM if the requested address is already covered by an existing
> >  mapping. We still do rely on get_unmaped_area to handle all the arch
> >  specific MAP_FIXED treatment and check for a conflicting vma after it
> >  returns.
> >  Signed-off-by: Michal Hocko 
> > 
> > .. deleted ...
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..aad8d37f0205 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned 
> > long addr,
> > if (mm->map_count > sysctl_max_map_count)
> > return -ENOMEM;
> > +   /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +   if (flags & MAP_FIXED_SAFE)
> > +   flags |= MAP_FIXED;
> > +
> > /* Obtain the address to map to. we verify (or select) it and ensure
> >  * that it represents a valid section of the address space.
> >  */
> 
> Do you need to move this code above:
> 
> if (!(flags & MAP_FIXED))
> addr = round_hint_to_min(addr);
> 
> /* Careful about overflows.. */
> len = PAGE_ALIGN(len);
> if (!len)
> return -ENOMEM;
> 
> Not doing that might mean the hint address will end up being rounded for
> MAP_FIXED_SAFE which would change the behavior from MAP_FIXED.

Yes, I will move it.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Michal Hocko
On Tue 14-11-17 06:10:00, Ran Wang wrote:
[...]
> > > This drop cause DWC3 USB controller fail on initialization with
> > > Layerscaper processors (such as LS1043A) as below:
> > >
> > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> > bus number 1
> > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > >
> > > And I notice that someone also reported to you that DWC2 got affected
> > > recently, so do you have the solution now?
> > 
> > Yes. It should be in linux-next. Have a look at the following email
> > thread:
> > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > 583607877=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > =0

I really have no idea where the above link came from because my email
had a reference to 
http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6...@dhcp22.suse.cz
Has your email client modified the original email?

> Thanks for your info, although I fail to open the link you shared, but I got 
> patch
> from my colleague and the issue got fix on my side, let you know, thanks.

Thanks for your testing anyway. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Michal Hocko
On Tue 14-11-17 06:10:00, Ran Wang wrote:
[...]
> > > This drop cause DWC3 USB controller fail on initialization with
> > > Layerscaper processors (such as LS1043A) as below:
> > >
> > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> > bus number 1
> > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > >
> > > And I notice that someone also reported to you that DWC2 got affected
> > > recently, so do you have the solution now?
> > 
> > Yes. It should be in linux-next. Have a look at the following email
> > thread:
> > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > 583607877=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > =0

I really have no idea where the above link came from because my email
had a reference to 
http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6...@dhcp22.suse.cz
Has your email client modified the original email?

> Thanks for your info, although I fail to open the link you shared, but I got 
> patch
> from my colleague and the issue got fix on my side, let you know, thanks.

Thanks for your testing anyway. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Quan Xu



On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu 

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, idle path includes several heavy operations
includes timer access(LAPIC timer or TSC deadline timer) which will
hurt performance especially for latency intensive workload like message
passing task. The cost is mainly from the vmexit which is a hardware
context switch between virtual machine and hypervisor. Our solution is
to poll for a while and do not enter real idle path if we can get the
schedule event during polling.

Poll may cause the CPU waste so we adopt a smart polling mechanism to
reduce the useless poll.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Cc: Juergen Gross 
Cc: Alok Kataria 
Cc: Rusty Russell 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
    29031.6 bit/s -- 76.1 %CPU

 2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
    35787.7 bit/s -- 129.4 %CPU

 3. w/ kvm dynamic poll:
    35735.6 bit/s -- 200.0 %CPU

 4. w/patch and w/ kvm dynamic poll:
    42225.3 bit/s -- 198.7 %CPU

 5. idle=poll
    37081.7 bit/s -- 998.1 %CPU



 w/ this patch, we will improve performance by 23%.. even we could improve
 performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
 cost of CPU is much lower than 'idle=poll' case..


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


I assume this feature will be ported to other archs.. a new pvops makes code
clean and easy to maintain. also I tried to add it into existed pvops, 
but it

doesn't match.



Quan
Alibaba Cloud


Juergen





Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Quan Xu



On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu 

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, idle path includes several heavy operations
includes timer access(LAPIC timer or TSC deadline timer) which will
hurt performance especially for latency intensive workload like message
passing task. The cost is mainly from the vmexit which is a hardware
context switch between virtual machine and hypervisor. Our solution is
to poll for a while and do not enter real idle path if we can get the
schedule event during polling.

Poll may cause the CPU waste so we adopt a smart polling mechanism to
reduce the useless poll.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Cc: Juergen Gross 
Cc: Alok Kataria 
Cc: Rusty Russell 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
    29031.6 bit/s -- 76.1 %CPU

 2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
    35787.7 bit/s -- 129.4 %CPU

 3. w/ kvm dynamic poll:
    35735.6 bit/s -- 200.0 %CPU

 4. w/patch and w/ kvm dynamic poll:
    42225.3 bit/s -- 198.7 %CPU

 5. idle=poll
    37081.7 bit/s -- 998.1 %CPU



 w/ this patch, we will improve performance by 23%.. even we could improve
 performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
 cost of CPU is much lower than 'idle=poll' case..


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


I assume this feature will be ported to other archs.. a new pvops makes code
clean and easy to maintain. also I tried to add it into existed pvops, 
but it

doesn't match.



Quan
Alibaba Cloud


Juergen





RE: [patch v2 8/8] KVM: x86: Disable intercept for Intel processor trace MSRs

2017-11-13 Thread Kang, Luwei
> > +   if (pt_mode == PT_MODE_HOST_GUEST) {
> > +   u32 i, eax, ebx, ecx, edx;
> > +
> > +   cpuid_count(0x14, 1, , , , );
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
> > +   for (i = 0; i < (eax & 0x7); i++)
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
> > +   false);
> > +   }
> > +
> 
> As I mentioned earlier, this probably makes vmentry/vmexit too expensive when 
> guests are not using processor tracing.  I would do  it only if guest 
> TRACEEN=1 (since anyway the values have to be correct if guest TRACEEN=1, and 
> a change in TRACEEN always causes a vmexit).
> 

Will change in next version.

Thanks,
Luwei Kang

> 
> > return alloc_kvm_area();
> >
> >  out:
> >



RE: [patch v2 8/8] KVM: x86: Disable intercept for Intel processor trace MSRs

2017-11-13 Thread Kang, Luwei
> > +   if (pt_mode == PT_MODE_HOST_GUEST) {
> > +   u32 i, eax, ebx, ecx, edx;
> > +
> > +   cpuid_count(0x14, 1, , , , );
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
> > +   for (i = 0; i < (eax & 0x7); i++)
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
> > +   false);
> > +   }
> > +
> 
> As I mentioned earlier, this probably makes vmentry/vmexit too expensive when 
> guests are not using processor tracing.  I would do  it only if guest 
> TRACEEN=1 (since anyway the values have to be correct if guest TRACEEN=1, and 
> a change in TRACEEN always causes a vmexit).
> 

Will change in next version.

Thanks,
Luwei Kang

> 
> > return alloc_kvm_area();
> >
> >  out:
> >



RE: [patch v2 7/8] KVM: x86: add Intel PT msr RTIT_CTL read/write

2017-11-13 Thread Kang, Luwei
> >  static DEFINE_PER_CPU(struct vmcs *, vmxarea);  static
> > DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3384,6 +3385,11 @@
> > static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 1;
> > msr_info->data = vcpu->arch.ia32_xss;
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported())
> > +   return 1;
> > +   msr_info->data = vmcs_read64(GUEST_IA32_RTIT_CTL);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -3508,6 
> > +3514,11
> > @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > else
> > clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported() || to_vmx(vcpu)->nested.vmxon)
> > +   return 1;
> 
> VMXON must also clear TraceEn bit (see 23.4 in the SDM).

Will clear TraceEn bit in handle_vmon() function.

> 
> You also need to support R/W of all the other MSRs, in order to make them 
> accessible to userspace, and add them in msrs_to_save and kvm_init_msr_list.
> 

Will add it in next version. This is use for live migration, is that right?

> Regarding the save/restore of the state, I am now wondering if you could also 
> use XSAVES and XRSTORS instead of multiple rdmsr/wrmsr in a loop.
> The cost is two MSR writes on vmenter (because the guest must run with 
> IA32_XSS=0) and two on vmexit.
> 

If use XSAVES and XRSTORS for context switch.
1. Before  kvm_load_guest_fpu(vcpu), we need to save host RTIT_CTL, disable PT 
and restore the value of  "vmx->pt_desc.guest.ctl" to GUEST_IA32_RTIT_CTL. Is 
that right?
2. After VM-exit (step out from kvm_x86_ops->run(vcpu)),  we need to save the 
status of GUEST_IA32_RTIT_CTL . TRACEEN=0 and others MSRs are still in guest 
status. Where to enable PT if in host-guest mode. I think we should enable PT 
after vm-exit but it may cause #GP. " If XRSTORS would restore (or initialize) 
PT state and IA32_RTIT_CTL.TraceEn = 1, the instruction causes a 
general-protection exception. SDM 13.5.6". if enable after kvm_put_guest_fpu() 
I think it too late.)

Thanks,
Luwei Kang
> 
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, data);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> >



RE: [patch v2 7/8] KVM: x86: add Intel PT msr RTIT_CTL read/write

2017-11-13 Thread Kang, Luwei
> >  static DEFINE_PER_CPU(struct vmcs *, vmxarea);  static
> > DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3384,6 +3385,11 @@
> > static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 1;
> > msr_info->data = vcpu->arch.ia32_xss;
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported())
> > +   return 1;
> > +   msr_info->data = vmcs_read64(GUEST_IA32_RTIT_CTL);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -3508,6 
> > +3514,11
> > @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > else
> > clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported() || to_vmx(vcpu)->nested.vmxon)
> > +   return 1;
> 
> VMXON must also clear TraceEn bit (see 23.4 in the SDM).

Will clear TraceEn bit in handle_vmon() function.

> 
> You also need to support R/W of all the other MSRs, in order to make them 
> accessible to userspace, and add them in msrs_to_save and kvm_init_msr_list.
> 

Will add it in next version. This is use for live migration, is that right?

> Regarding the save/restore of the state, I am now wondering if you could also 
> use XSAVES and XRSTORS instead of multiple rdmsr/wrmsr in a loop.
> The cost is two MSR writes on vmenter (because the guest must run with 
> IA32_XSS=0) and two on vmexit.
> 

If use XSAVES and XRSTORS for context switch.
1. Before  kvm_load_guest_fpu(vcpu), we need to save host RTIT_CTL, disable PT 
and restore the value of  "vmx->pt_desc.guest.ctl" to GUEST_IA32_RTIT_CTL. Is 
that right?
2. After VM-exit (step out from kvm_x86_ops->run(vcpu)),  we need to save the 
status of GUEST_IA32_RTIT_CTL . TRACEEN=0 and others MSRs are still in guest 
status. Where to enable PT if in host-guest mode. I think we should enable PT 
after vm-exit but it may cause #GP. " If XRSTORS would restore (or initialize) 
PT state and IA32_RTIT_CTL.TraceEn = 1, the instruction causes a 
general-protection exception. SDM 13.5.6". if enable after kvm_put_guest_fpu() 
I think it too late.)

Thanks,
Luwei Kang
> 
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, data);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> >



[PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rakib Mullick
Currently, during __bitmap_weight() calculation hweight_long() is used.
Inside a hweight_long() a check has been made to figure out whether a
hweight32() or hweight64() version to use.

However, it's unnecessary to do it in case of __bitmap_weight() calculation
inside the loop. We can detect whether to use hweight32() or hweight64()
upfront and call respective function directly. It also reduces the
vmlinux size.

Before the patch:
   textdata bss dec hex filename
129013327798930 1454181635242078219c05e vmlinux

After the patch:
   textdata bss dec hex filename
129013317798930 1454181635242077219c05d vmlinux

Signed-off-by: Rakib Mullick 
Cc: Andrew Morton 
Cc: Rasmus Villemoes 
Cc: Matthew Wilcox 
Cc: Yury Norov 
Cc: Mauro Carvalho Chehab 
 
---
Patch was created against torvald's tree (commit 43ff2f4db9d0f764).

 lib/bitmap.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index d8f0c09..552096f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
 int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
 {
unsigned int k, lim = bits/BITS_PER_LONG;
-   int w = 0;
-
-   for (k = 0; k < lim; k++)
-   w += hweight_long(bitmap[k]);
+   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
+
+   if (is32) {
+   for (k = 0; k < lim; k++)
+   w += hweight32(bitmap[k]);
+   } else {
+   for (k = 0; k < lim; k++)
+   w += hweight64(bitmap[k]);
+   }
 
if (bits % BITS_PER_LONG)
w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
-- 
2.9.3



[PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rakib Mullick
Currently, during __bitmap_weight() calculation hweight_long() is used.
Inside a hweight_long() a check has been made to figure out whether a
hweight32() or hweight64() version to use.

However, it's unnecessary to do it in case of __bitmap_weight() calculation
inside the loop. We can detect whether to use hweight32() or hweight64()
upfront and call respective function directly. It also reduces the
vmlinux size.

Before the patch:
   textdata bss dec hex filename
129013327798930 1454181635242078219c05e vmlinux

After the patch:
   textdata bss dec hex filename
129013317798930 1454181635242077219c05d vmlinux

Signed-off-by: Rakib Mullick 
Cc: Andrew Morton 
Cc: Rasmus Villemoes 
Cc: Matthew Wilcox 
Cc: Yury Norov 
Cc: Mauro Carvalho Chehab 
 
---
Patch was created against torvald's tree (commit 43ff2f4db9d0f764).

 lib/bitmap.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index d8f0c09..552096f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
 int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
 {
unsigned int k, lim = bits/BITS_PER_LONG;
-   int w = 0;
-
-   for (k = 0; k < lim; k++)
-   w += hweight_long(bitmap[k]);
+   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
+
+   if (is32) {
+   for (k = 0; k < lim; k++)
+   w += hweight32(bitmap[k]);
+   } else {
+   for (k = 0; k < lim; k++)
+   w += hweight64(bitmap[k]);
+   }
 
if (bits % BITS_PER_LONG)
w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
-- 
2.9.3



Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Sowmini Varadhan
On (11/13/17 19:30), Girish Moodalbail wrote:
> (L538-540). However, it leaves behind some of the rds_tcp connections that
> shared the same underlying RDS connection (L534 and 535). These connections
> with pointer to stale network namespace are left behind in the global list.

It leaves behind no such thing. After mprds, you want to collect
only one instance of the conn that is being removed, that's why
lines 534-535 skips over duplicat instances of the same conn
(for multiple paths in the same conn).

> When the 2nd network namespace is deleted, we will hit the above stale
> pointer and hit UAF panic.
> I think we should move away from global list to a per-namespace list. The
> global list are used only in two places (both of which are per-namespace
> operations):

Nice try, but not so. 

Let me look at this tomorrow, I missed this mail in my mbox.

--Sowmini



Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Sowmini Varadhan
On (11/13/17 19:30), Girish Moodalbail wrote:
> (L538-540). However, it leaves behind some of the rds_tcp connections that
> shared the same underlying RDS connection (L534 and 535). These connections
> with pointer to stale network namespace are left behind in the global list.

It leaves behind no such thing. After mprds, you want to collect
only one instance of the conn that is being removed, that's why
lines 534-535 skips over duplicat instances of the same conn
(for multiple paths in the same conn).

> When the 2nd network namespace is deleted, we will hit the above stale
> pointer and hit UAF panic.
> I think we should move away from global list to a per-namespace list. The
> global list are used only in two places (both of which are per-namespace
> operations):

Nice try, but not so. 

Let me look at this tomorrow, I missed this mail in my mbox.

--Sowmini



[PATCH -next] irqchip/exiu: Fix return value check in exiu_init()

2017-11-13 Thread Wei Yongjun
In case of error, the function of_iomap() returns NULL pointer not
ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.

Fixes: 706cffc1b912 ("irqchip/exiu: Add support for Socionext Synquacer EXIU 
controller")
Signed-off-by: Wei Yongjun 
---
 drivers/irqchip/irq-sni-exiu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-sni-exiu.c b/drivers/irqchip/irq-sni-exiu.c
index 1b6e2f7..1927b2f 100644
--- a/drivers/irqchip/irq-sni-exiu.c
+++ b/drivers/irqchip/irq-sni-exiu.c
@@ -196,8 +196,8 @@ static int __init exiu_init(struct device_node *node,
}
 
data->base = of_iomap(node, 0);
-   if (IS_ERR(data->base)) {
-   err = PTR_ERR(data->base);
+   if (!data->base) {
+   err = -ENODEV;
goto out_free;
}



[PATCH -next] irqchip/exiu: Fix return value check in exiu_init()

2017-11-13 Thread Wei Yongjun
In case of error, the function of_iomap() returns NULL pointer not
ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.

Fixes: 706cffc1b912 ("irqchip/exiu: Add support for Socionext Synquacer EXIU 
controller")
Signed-off-by: Wei Yongjun 
---
 drivers/irqchip/irq-sni-exiu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-sni-exiu.c b/drivers/irqchip/irq-sni-exiu.c
index 1b6e2f7..1927b2f 100644
--- a/drivers/irqchip/irq-sni-exiu.c
+++ b/drivers/irqchip/irq-sni-exiu.c
@@ -196,8 +196,8 @@ static int __init exiu_init(struct device_node *node,
}
 
data->base = of_iomap(node, 0);
-   if (IS_ERR(data->base)) {
-   err = PTR_ERR(data->base);
+   if (!data->base) {
+   err = -ENODEV;
goto out_free;
}



Re: [PATCH] wcn36xx: Set BTLE coexistence related configuration values to defaults

2017-11-13 Thread Kalle Valo
Ramon Fried  writes:

> From: Eyal Ilsar 
>
> If the value for the firmware configuration parameters
> BTC_STATIC_LEN_LE_BT and BTC_STATIC_LEN_LE_WLAN are not set the duty
> cycle between BT and WLAN is such that if BT (including BLE) is active
> WLAN gets 0 bandwidth. When tuning these parameters having a too high
> value for WLAN means that BLE performance degrades. The "sweet" point
> of roughly half of the maximal values was empirically found to achieve
> a balance between BLE and Wi-Fi coexistence performance.
>
> Signed-off-by: Eyal Ilsar 
> Signed-off-by: Ramon Fried 

Then submit a new version of the patch then please include the version
number:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#patch_version_missing

So after fixing Bjorn's comments the next version should be v3.

-- 
Kalle Valo


Re: [PATCH] wcn36xx: Set BTLE coexistence related configuration values to defaults

2017-11-13 Thread Kalle Valo
Ramon Fried  writes:

> From: Eyal Ilsar 
>
> If the value for the firmware configuration parameters
> BTC_STATIC_LEN_LE_BT and BTC_STATIC_LEN_LE_WLAN are not set the duty
> cycle between BT and WLAN is such that if BT (including BLE) is active
> WLAN gets 0 bandwidth. When tuning these parameters having a too high
> value for WLAN means that BLE performance degrades. The "sweet" point
> of roughly half of the maximal values was empirically found to achieve
> a balance between BLE and Wi-Fi coexistence performance.
>
> Signed-off-by: Eyal Ilsar 
> Signed-off-by: Ramon Fried 

Then submit a new version of the patch then please include the version
number:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#patch_version_missing

So after fixing Bjorn's comments the next version should be v3.

-- 
Kalle Valo


Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Baoquan He
On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 

ACK

Acked-by: Baoquan He 

Thanks
Baoquan

> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
> 


Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Baoquan He
On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 

ACK

Acked-by: Baoquan He 

Thanks
Baoquan

> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
> 


Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall

ahm it compiles well. but

[   24.838120] Unable to handle kernel NULL pointer dereference at 
virtual address 0055

[   24.846193] pgd = c0004000
[   24.848893] [0055] *pgd=
[   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd 
ohci_hcd ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base 
qca_ssdk gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k 
ath9k_common ath9k_hw ath mac80211 cfg80211 compat

[   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
[   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
[   24.892921] task: ef029ac0 task.stack: ef05a000
[   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
[   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
[   24.907869] pc : []    lr : []    psr: 6153
[   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
[   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
[   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
[   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
[   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  
Segment user

[   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
[   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
[   24.956477] Stack: (0xef05bb58 to 0xef05c000)


will dig into the code to find the reason


Am 13.11.2017 um 13:55 schrieb Greg Kroah-Hartman:

This is the start of the stable review cycle for the 4.9.62 release.
There are 87 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Nov 15 12:55:40 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.62-rc1.gz
or in the git tree and branch at:
   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
 Linux 4.9.62-rc1

Borislav Petkov 
 x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context

Pavel Tatashin 
 x86/smpboot: Make optimization of delay calibration work correctly

Florian Westphal 
 netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"

Richard Schütz 
 can: c_can: don't indicate triple sampling support for D_CAN

Marek Vasut 
 can: ifi: Fix transmitter delay calculation

Gerhard Bertelsmann 
 can: sun4i: handle overrun in RX FIFO

John Stultz 
 drm/bridge: adv7511: Re-write the i2c address before EDID probing

John Stultz 
 drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID

John Stultz 
 drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused 
internally

Sinclair Yeh 
 drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue

Ilya Dryomov 
 rbd: use GFP_NOIO for parent stat and data requests

Kai-Heng Feng 
 Input: elan_i2c - add ELAN060C to the ACPI table

Oswald Buddenhagen 
 MIPS: AR7: Ensure that serial ports are properly set up

Jonas Gorski 
 MIPS: AR7: Defer registration of GPIO

Jaedon Shin 
 MIPS: BMIPS: Fix missing cbr address

Marcus Cooper 
 ASoC: sun4i-spdif: remove legacy dapm components

Luis R. Rodriguez 
 tools: firmware: check for distro fallback udev cancel rule

Luis R. Rodriguez 
 selftests: firmware: send expected errors to /dev/null

Matt Redfearn 
 MIPS: SMP: Fix deadlock & online race

Matija Glavinic Pecotic 
 MIPS: Fix race on setting and getting cpu_online_mask

Matt Redfearn 
 MIPS: SMP: Use a completion event to signal CPU up

Paul Burton 
 MIPS: Fix CM region target definitions

Gustavo A. R. Silva 
 MIPS: microMIPS: Fix incorrect mask in insn_table_MM

Maarten Lankhorst 
 drm/i915: Do not rely on wm preservation for ILK watermarks

Takashi Iwai 
 ALSA: seq: Avoid invalid lockdep class warning

Takashi Iwai 
 ALSA: seq: Fix OSS sysex delivery in OSS emulation

Mark Rutland 
 ARM: 8720/1: ensure dump_instr() checks addr_limit

Eric Biggers 
 

Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall

ahm it compiles well. but

[   24.838120] Unable to handle kernel NULL pointer dereference at 
virtual address 0055

[   24.846193] pgd = c0004000
[   24.848893] [0055] *pgd=
[   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd 
ohci_hcd ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base 
qca_ssdk gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k 
ath9k_common ath9k_hw ath mac80211 cfg80211 compat

[   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
[   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
[   24.892921] task: ef029ac0 task.stack: ef05a000
[   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
[   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
[   24.907869] pc : []    lr : []    psr: 6153
[   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
[   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
[   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
[   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
[   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  
Segment user

[   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
[   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
[   24.956477] Stack: (0xef05bb58 to 0xef05c000)


will dig into the code to find the reason


Am 13.11.2017 um 13:55 schrieb Greg Kroah-Hartman:

This is the start of the stable review cycle for the 4.9.62 release.
There are 87 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Nov 15 12:55:40 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.62-rc1.gz
or in the git tree and branch at:
   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
 Linux 4.9.62-rc1

Borislav Petkov 
 x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context

Pavel Tatashin 
 x86/smpboot: Make optimization of delay calibration work correctly

Florian Westphal 
 netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"

Richard Schütz 
 can: c_can: don't indicate triple sampling support for D_CAN

Marek Vasut 
 can: ifi: Fix transmitter delay calculation

Gerhard Bertelsmann 
 can: sun4i: handle overrun in RX FIFO

John Stultz 
 drm/bridge: adv7511: Re-write the i2c address before EDID probing

John Stultz 
 drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID

John Stultz 
 drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused 
internally

Sinclair Yeh 
 drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue

Ilya Dryomov 
 rbd: use GFP_NOIO for parent stat and data requests

Kai-Heng Feng 
 Input: elan_i2c - add ELAN060C to the ACPI table

Oswald Buddenhagen 
 MIPS: AR7: Ensure that serial ports are properly set up

Jonas Gorski 
 MIPS: AR7: Defer registration of GPIO

Jaedon Shin 
 MIPS: BMIPS: Fix missing cbr address

Marcus Cooper 
 ASoC: sun4i-spdif: remove legacy dapm components

Luis R. Rodriguez 
 tools: firmware: check for distro fallback udev cancel rule

Luis R. Rodriguez 
 selftests: firmware: send expected errors to /dev/null

Matt Redfearn 
 MIPS: SMP: Fix deadlock & online race

Matija Glavinic Pecotic 
 MIPS: Fix race on setting and getting cpu_online_mask

Matt Redfearn 
 MIPS: SMP: Use a completion event to signal CPU up

Paul Burton 
 MIPS: Fix CM region target definitions

Gustavo A. R. Silva 
 MIPS: microMIPS: Fix incorrect mask in insn_table_MM

Maarten Lankhorst 
 drm/i915: Do not rely on wm preservation for ILK watermarks

Takashi Iwai 
 ALSA: seq: Avoid invalid lockdep class warning

Takashi Iwai 
 ALSA: seq: Fix OSS sysex delivery in OSS emulation

Mark Rutland 
 ARM: 8720/1: ensure dump_instr() checks addr_limit

Eric Biggers 
 KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]

Andrey Ryabinin 
 crypto: x86/sha256-mb - fix panic due to unaligned access

Andrey Ryabinin 
 crypto: x86/sha1-mb - fix panic due to unaligned access

Romain Izard 
 crypto: ccm - preserve the IV buffer

Li Bin 
 workqueue: Fix NULL pointer dereference

Peter Zijlstra 
 x86/uaccess, sched/preempt: Verify access_ok() context

Carlo Caione 
 platform/x86: hp-wmi: Do not shadow error values

Carlo Caione 
 platform/x86: hp-wmi: Fix error value for hp_wmi_tablet_state

Eric Biggers 
 KEYS: trusted: fix writing past end of buffer in trusted_read()

Eric Biggers 
 KEYS: 

Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

2017-11-13 Thread Rabin Vincent
On Tue, Aug 22, 2017 at 10:45:24AM +0100, Will Deacon wrote:
> On Mon, Aug 21, 2017 at 02:42:03PM +0100, Mark Rutland wrote:
> > On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> > > On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > > > When there's a fatal signal pending, arm64's do_page_fault()
> > > > implementation returns 0. The intent is that we'll return to the
> > > > faulting userspace instruction, delivering the signal on the way.
> > > > 
> > > > However, if we take a fatal signal during fixing up a uaccess, this
> > > > results in a return to the faulting kernel instruction, which will be
> > > > instantly retried, resulting in the same fault being taken forever. As
> > > > the task never reaches userspace, the signal is not delivered, and the
> > > > task is left unkillable. While the task is stuck in this state, it can
> > > > inhibit the forward progress of the system.
> > > > 
> > > > To avoid this, we must ensure that when a fatal signal is pending, we
> > > > apply any necessary fixup for a faulting kernel instruction. Thus we
> > > > will return to an error path, and it is up to that code to make forward
> > > > progress towards delivering the fatal signal.
> > > > 
> > > > Signed-off-by: Mark Rutland 
> > > > Reviewed-by: Steve Capper 
> > > > Tested-by: Steve Capper 
> > > > Cc: Catalin Marinas 
> > > > Cc: James Morse 
> > > > Cc: Laura Abbott 
> > > > Cc: Will Deacon 
> > > > Cc: sta...@vger.kernel.org
> > > > ---
> > > >  arch/arm64/mm/fault.c | 5 -
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > > index 37b95df..3952d5e 100644
> > > > --- a/arch/arm64/mm/fault.c
> > > > +++ b/arch/arm64/mm/fault.c
> > > > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long 
> > > > addr, unsigned int esr,
> > > >  * signal first. We do not need to release the mmap_sem because 
> > > > it
> > > >  * would already be released in __lock_page_or_retry in 
> > > > mm/filemap.c.
> > > >  */
> > > > -   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > > +   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > > > +   if (!user_mode(regs))
> > > > +   goto no_context;
> > > > return 0;
> > > > +   }
> > > 
> > > This will need rebasing at -rc1 (take a look at current HEAD).
> > > 
> > > Also, I think it introduces a weird corner case where we take a page fault
> > > when writing the signal frame to the user stack to deliver a SIGSEGV. If
> > > we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> > > then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> > > of SIGKILL.
> > > 
> > > The end result (task is killed) is the same, but the fatal signal is 
> > > wrong.
> > 
> > That doesn't seem to be the case, testing on v4.13-rc5.
> > 
> > I used sigaltstack() to use the userfaultfd region as signal stack,
> > registerd a SIGSEGV handler, and dereferenced NULL. The task locks up,
> > but when killed with a SIGINT or SIGKILL, the exit status reflects that
> > signal, rather than the SIGSEGV.
> > 
> > If I move the SIGINT handler onto the userfaultfd-monitored stack, then
> > delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
> > status reflects that SIGKILL.
> > 
> > As you say, it does look like we'd try to set up a deferred SIGSEGV for
> > the failed signal delivery.
> > 
> > I haven't yet figured out exactly how that works; I'll keep digging.
> 
> The SEGV makes it all the way into do_group_exit, but then signal_group_exit
> is set and the exit_code is overridden with SIGKILL at the last minute (see
> complete_signal).

Unfortunately, this last minute is too late for print-fatal-signals.
With print-fatal-signals enabled, this patch leads to misleading
"potentially unexpected fatal signal 11" warnings if a process is
SIGKILL'd at the right time.

I've seen it without userfaultfd, but it's easiest reproduced by
patching Mark's original test code [1] with the following patch and
simply running "pkill -WINCH foo; pkill -KILL foo".  This results in:

 foo: potentially unexpected fatal signal 11.
 CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
 task: b3534780 task.stack: b4b7c000
 PC is at 0x76effb60
 LR is at 0x4227f4
 pc : [<76effb60>]lr : [<004227f4>]psr: 600b0010
 sp : 7eaf7bb4  ip :   fp : 
 r10: 0001  r9 : 0003  r8 : 76fcd000
 r7 : 001d  r6 : 76fd0cf0  r5 : 7eaf7c08  r4 : 
 r3 :   r2 :   r1 : 7eaf7a88  r0 : fffc
 Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
 Control: 10c5387d  Table: 3357404a  DAC: 0055
 CPU: 1 PID: 1793 Comm: foo Not tainted 

Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

2017-11-13 Thread Rabin Vincent
On Tue, Aug 22, 2017 at 10:45:24AM +0100, Will Deacon wrote:
> On Mon, Aug 21, 2017 at 02:42:03PM +0100, Mark Rutland wrote:
> > On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> > > On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > > > When there's a fatal signal pending, arm64's do_page_fault()
> > > > implementation returns 0. The intent is that we'll return to the
> > > > faulting userspace instruction, delivering the signal on the way.
> > > > 
> > > > However, if we take a fatal signal during fixing up a uaccess, this
> > > > results in a return to the faulting kernel instruction, which will be
> > > > instantly retried, resulting in the same fault being taken forever. As
> > > > the task never reaches userspace, the signal is not delivered, and the
> > > > task is left unkillable. While the task is stuck in this state, it can
> > > > inhibit the forward progress of the system.
> > > > 
> > > > To avoid this, we must ensure that when a fatal signal is pending, we
> > > > apply any necessary fixup for a faulting kernel instruction. Thus we
> > > > will return to an error path, and it is up to that code to make forward
> > > > progress towards delivering the fatal signal.
> > > > 
> > > > Signed-off-by: Mark Rutland 
> > > > Reviewed-by: Steve Capper 
> > > > Tested-by: Steve Capper 
> > > > Cc: Catalin Marinas 
> > > > Cc: James Morse 
> > > > Cc: Laura Abbott 
> > > > Cc: Will Deacon 
> > > > Cc: sta...@vger.kernel.org
> > > > ---
> > > >  arch/arm64/mm/fault.c | 5 -
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > > index 37b95df..3952d5e 100644
> > > > --- a/arch/arm64/mm/fault.c
> > > > +++ b/arch/arm64/mm/fault.c
> > > > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long 
> > > > addr, unsigned int esr,
> > > >  * signal first. We do not need to release the mmap_sem because 
> > > > it
> > > >  * would already be released in __lock_page_or_retry in 
> > > > mm/filemap.c.
> > > >  */
> > > > -   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > > +   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > > > +   if (!user_mode(regs))
> > > > +   goto no_context;
> > > > return 0;
> > > > +   }
> > > 
> > > This will need rebasing at -rc1 (take a look at current HEAD).
> > > 
> > > Also, I think it introduces a weird corner case where we take a page fault
> > > when writing the signal frame to the user stack to deliver a SIGSEGV. If
> > > we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> > > then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> > > of SIGKILL.
> > > 
> > > The end result (task is killed) is the same, but the fatal signal is 
> > > wrong.
> > 
> > That doesn't seem to be the case, testing on v4.13-rc5.
> > 
> > I used sigaltstack() to use the userfaultfd region as signal stack,
> > registerd a SIGSEGV handler, and dereferenced NULL. The task locks up,
> > but when killed with a SIGINT or SIGKILL, the exit status reflects that
> > signal, rather than the SIGSEGV.
> > 
> > If I move the SIGINT handler onto the userfaultfd-monitored stack, then
> > delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
> > status reflects that SIGKILL.
> > 
> > As you say, it does look like we'd try to set up a deferred SIGSEGV for
> > the failed signal delivery.
> > 
> > I haven't yet figured out exactly how that works; I'll keep digging.
> 
> The SEGV makes it all the way into do_group_exit, but then signal_group_exit
> is set and the exit_code is overridden with SIGKILL at the last minute (see
> complete_signal).

Unfortunately, this last minute is too late for print-fatal-signals.
With print-fatal-signals enabled, this patch leads to misleading
"potentially unexpected fatal signal 11" warnings if a process is
SIGKILL'd at the right time.

I've seen it without userfaultfd, but it's easiest reproduced by
patching Mark's original test code [1] with the following patch and
simply running "pkill -WINCH foo; pkill -KILL foo".  This results in:

 foo: potentially unexpected fatal signal 11.
 CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
 task: b3534780 task.stack: b4b7c000
 PC is at 0x76effb60
 LR is at 0x4227f4
 pc : [<76effb60>]lr : [<004227f4>]psr: 600b0010
 sp : 7eaf7bb4  ip :   fp : 
 r10: 0001  r9 : 0003  r8 : 76fcd000
 r7 : 001d  r6 : 76fd0cf0  r5 : 7eaf7c08  r4 : 
 r3 :   r2 :   r1 : 7eaf7a88  r0 : fffc
 Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
 Control: 10c5387d  Table: 3357404a  DAC: 0055
 CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
 [<801113f0>] (unwind_backtrace) from [<8010cfb0>] (show_stack+0x18/0x1c)
 [<8010cfb0>] (show_stack) from [<8039725c>] 

Re: [PATCH v2] coccinelle: fix parallel build with CHECK=scripts/coccicheck

2017-11-13 Thread Julia Lawall


On Tue, 14 Nov 2017, Masahiro Yamada wrote:

> Hi Julia,
>
> 2017-11-14 1:45 GMT+09:00 Julia Lawall :
> >
> >
> > On Tue, 14 Nov 2017, Masahiro Yamada wrote:
> >
> >> Hi Julia,
> >>
> >>
> >> 2017-11-14 0:30 GMT+09:00 Julia Lawall :
> >> >
> >> >
> >> > On Thu, 9 Nov 2017, Masahiro Yamada wrote:
> >> >
> >> >> The command "make -j8 C=1 CHECK=scripts/coccicheck" produces lots of
> >> >> "coccicheck failed" error messages.
> >> >>
> >> >> I do not know the coccinelle internals, but I guess --jobs does not
> >> >> work well if spatch is invoked from Make running in parallel.
> >> >> Disable --jobs in this case.
> >> >
> >> > Why is this change under:
> >> >
> >> > if [ "$C" = "1" -o "$C" = "2" ];
> >> >
> >> > The coccicheck failed messages come also if one runs Coccinelle on the
> >> > entire kernel.
> >>
> >> As far as I tested, "coccicheck failed" error only happens
> >> when ONLINE=1.
> >>
> >>
> >> make -j8 C=1 CHECK=scripts/coccicheck  
> >> COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> emits lots of errors.
> >>
> >>
> >> make -j8 coccicheck  COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> is fine.
> >>
> >>
> >> Have you tested it?
> >> Do you mean you got a different result from mine?
> >
> > I agree with your results, with respect to the number of errors.
> >
> > julia
> >
>
> So, what shall we do?
>
> If you do not like to fix it (or you can fix coccinelle itself),
> I can take back this patch.

I'm OK with your fix.  I will check and ack it today.

> I am not a coccinelle developer, so
> setting USE_JOBS="no" is the best I can do.

The problem on the Coccinelle side is that it uses a subdirectory with the
name of the semantic patch to store standard output and standard error for
the different threads.  I didn't want to use a name with the pid, so that
one could easily find this information while Coccinelle is running.
Normally the subdirectory is cleaned up when Coccinelle completes, so
there is only one of them at a time.  Maybe it is best to just add the
pid.  There is the risk that these subdirectories will accumulate if
Coccinelle crashes in a way such that they don't get cleaned up, but
Coccinelle could print a warning if it detects this case, rather than
failing.

Still I think it is useful to do something on the make coccicheck side,
because there is no need for the double layer of parallelism.

julia


Re: [PATCH v2] coccinelle: fix parallel build with CHECK=scripts/coccicheck

2017-11-13 Thread Julia Lawall


On Tue, 14 Nov 2017, Masahiro Yamada wrote:

> Hi Julia,
>
> 2017-11-14 1:45 GMT+09:00 Julia Lawall :
> >
> >
> > On Tue, 14 Nov 2017, Masahiro Yamada wrote:
> >
> >> Hi Julia,
> >>
> >>
> >> 2017-11-14 0:30 GMT+09:00 Julia Lawall :
> >> >
> >> >
> >> > On Thu, 9 Nov 2017, Masahiro Yamada wrote:
> >> >
> >> >> The command "make -j8 C=1 CHECK=scripts/coccicheck" produces lots of
> >> >> "coccicheck failed" error messages.
> >> >>
> >> >> I do not know the coccinelle internals, but I guess --jobs does not
> >> >> work well if spatch is invoked from Make running in parallel.
> >> >> Disable --jobs in this case.
> >> >
> >> > Why is this change under:
> >> >
> >> > if [ "$C" = "1" -o "$C" = "2" ];
> >> >
> >> > The coccicheck failed messages come also if one runs Coccinelle on the
> >> > entire kernel.
> >>
> >> As far as I tested, "coccicheck failed" error only happens
> >> when ONLINE=1.
> >>
> >>
> >> make -j8 C=1 CHECK=scripts/coccicheck  
> >> COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> emits lots of errors.
> >>
> >>
> >> make -j8 coccicheck  COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> is fine.
> >>
> >>
> >> Have you tested it?
> >> Do you mean you got a different result from mine?
> >
> > I agree with your results, with respect to the number of errors.
> >
> > julia
> >
>
> So, what shall we do?
>
> If you do not like to fix it (or you can fix coccinelle itself),
> I can take back this patch.

I'm OK with your fix.  I will check and ack it today.

> I am not a coccinelle developer, so
> setting USE_JOBS="no" is the best I can do.

The problem on the Coccinelle side is that it uses a subdirectory with the
name of the semantic patch to store standard output and standard error for
the different threads.  I didn't want to use a name with the pid, so that
one could easily find this information while Coccinelle is running.
Normally the subdirectory is cleaned up when Coccinelle completes, so
there is only one of them at a time.  Maybe it is best to just add the
pid.  There is the risk that these subdirectories will accumulate if
Coccinelle crashes in a way such that they don't get cleaned up, but
Coccinelle could print a warning if it detects this case, rather than
failing.

Still I think it is useful to do something on the make coccicheck side,
because there is no need for the double layer of parallelism.

julia


Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Dave Young
On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
> 

Acked-by: Dave Young 

Thanks
Dave


Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Dave Young
On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
> 

Acked-by: Dave Young 

Thanks
Dave


Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim
On Mon, Nov 13, 2017 at 01:15:30PM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [171110 07:36]:
> > * Joonsoo Kim  [171110 06:34]:
> > > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > > +#define OMAP34XX_SRAM_PHYS 0x4020
> > > > +#define OMAP34XX_SRAM_VIRT 0xd001
> > > > +#define OMAP34XX_SRAM_SIZE 0x1
> > > 
> > > For my testing environment, vmalloc address space is started at
> > > roughly 0xe000 so 0xd001 would not be valid.
> > 
> > Well we can map it anywhere we want, got any preferences?
> 
> Hmm and I'm also wondering what you do to make vmalloc space to
> start at 0xe000 instead of 0xd000?

Please see the another reply.

> 
> The reason I'm asking is because I think we can just move all of
> save_secure_ram_context to run from DDR instead of SRAM. But I'd
> rather do a minimal patch first that fixes your series and then we
> can test the further changes with more time.

Okay. I agree to make a minimal patch first and then go next step.

> After moving save_secure_ram_context to DDR, we can call
> save_secure_ram_context directly with something like:
> 
>   args_pa = __pa(omap3_secure_ram_storage);
>   offset = (unsigned long)omap3_secure_ram_storage - args_pa;
>   ret = save_secure_ram_context(args_pa, offset);
> 
> > Just that the current save_secure_ram_context uses "high_mask"
> > of 0x to translate the address. To make this more flexible,
> > we need the save_secure_ram_context changes too. So we might
> > want to do the static mapping and save_secure_ram_context changes
> > as a single patch.
> > 
> > > And, PHYS can be different according to the system type. Maybe either
> > > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > > be considered, too. My understanding is correct?
> > 
> > We can have a static map for the whole SRAM area, see function
> > __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> > static mapping whenever possible". So the different public SRAM start
> > addresses and sizes don't matter there.
> 
> And then if save_secure_ram_contet runs in DDR, no static map is
> needed.

Okay.

Thanks.



Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim
On Mon, Nov 13, 2017 at 01:15:30PM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [171110 07:36]:
> > * Joonsoo Kim  [171110 06:34]:
> > > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > > +#define OMAP34XX_SRAM_PHYS 0x4020
> > > > +#define OMAP34XX_SRAM_VIRT 0xd001
> > > > +#define OMAP34XX_SRAM_SIZE 0x1
> > > 
> > > For my testing environment, vmalloc address space is started at
> > > roughly 0xe000 so 0xd001 would not be valid.
> > 
> > Well we can map it anywhere we want, got any preferences?
> 
> Hmm and I'm also wondering what you do to make vmalloc space to
> start at 0xe000 instead of 0xd000?

Please see the another reply.

> 
> The reason I'm asking is because I think we can just move all of
> save_secure_ram_context to run from DDR instead of SRAM. But I'd
> rather do a minimal patch first that fixes your series and then we
> can test the further changes with more time.

Okay. I agree to make a minimal patch first and then go next step.

> After moving save_secure_ram_context to DDR, we can call
> save_secure_ram_context directly with something like:
> 
>   args_pa = __pa(omap3_secure_ram_storage);
>   offset = (unsigned long)omap3_secure_ram_storage - args_pa;
>   ret = save_secure_ram_context(args_pa, offset);
> 
> > Just that the current save_secure_ram_context uses "high_mask"
> > of 0x to translate the address. To make this more flexible,
> > we need the save_secure_ram_context changes too. So we might
> > want to do the static mapping and save_secure_ram_context changes
> > as a single patch.
> > 
> > > And, PHYS can be different according to the system type. Maybe either
> > > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > > be considered, too. My understanding is correct?
> > 
> > We can have a static map for the whole SRAM area, see function
> > __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> > static mapping whenever possible". So the different public SRAM start
> > addresses and sizes don't matter there.
> 
> And then if save_secure_ram_contet runs in DDR, no static map is
> needed.

Okay.

Thanks.



Re: Fwd: linux v4.14 causes firmware iwlwifi errors on Lenovo Thinkpad T440s

2017-11-13 Thread Luciano Coelho
On Mon, 2017-11-13 at 16:23 -0600, Larry Finger wrote:
> On 11/13/2017 03:30 PM, Bartosz Golaszewski wrote:
> > 2017-11-13 21:45 GMT+01:00 Larry Finger 
> > :
> > > On 11/13/2017 02:22 PM, Bartosz Golaszewski wrote:
> > > > 
> > > > Forwarding here too as I messed up the address the last time.
> > > > --
> > > > 
> > > > Hi,
> > > > 
> > > > I noticed my wireless interface can't get up with linux v4.14
> > > > and the
> > > > kernel log is flooded with firmware errors:
> > > > 
> > > > iwlwifi :03:00.0: Firmware error during reconfiguration -
> > > > reprobe!
> > > > iwlwifi :03:00.0: FW error in SYNC CMD DQA_ENABLE_CMD
> > > > 
> > > > and
> > > > 
> > > > ieee80211 phy63: Hardware restart was requested.
> > > > 
> > > > The wireless controller is: Intel Corporation Wireless 7260
> > > > (rev 83)
> > > > Firmware used is: iwlwifi-7260-17
> > > > 
> > > > Everything works fine with v4.13.12.
> > > > 
> > > > I didn't have time today to bisect for the offending commit.
> > > > Full log
> > > > uploaded[1].
> > > > 
> > > > Best regards,
> > > > Bartosz Golaszewski
> > > > 
> > > > [1] https://pastebin.com/jksqxvS6
> > > 
> > > 
> > > Your log shows "iwlwifi :03:00.0: loaded firmware version
> > > 17.228510.0
> > > op_mode iwlmvm"
> > > 
> > > Mine, where the 7260 works, shows "iwlwifi :04:00.0: loaded
> > > firmware
> > > version 17.459231.0 op_mode iwlmvm".
> > > 
> > > It seems as if you need newer firmware. A detailed file listing
> > > shows
> > > "-rw-r--r-- 1 root root 1049340 Oct  9 12:03
> > > /lib/firmware/iwlwifi-7260-17.ucode". That date is likely when I
> > > installed
> > > the updated kernel firmware package from my distro. The md5sum
> > > for the file
> > > is 73a217f55c47d3a70bb5dbbe1d676423.
> > > 
> > > Larry
> > > 
> > 
> > Ok so it seems the version in linux-firmware is outdated. The file
> > you're using is available on github[1] and fixed the issue for me.
> > 
> > Thanks!
> > Bartosz Golaszewski
> > 
> > [1] https://github.com/OpenELEC/iwlwifi-firmware
> 
> Interesting. Using md5sum of the git repo for linux-firmware gets
> 73a217f55c47d3a70bb5dbbe1d676423  iwlwifi-7260-17.ucode.
> 
> That is the file I'm using.

You shouldn't use firmwares from github or anywhere else except from
the two official places where we release it:

Mainline (but slow to get updated):

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/iwlwifi-7260-17.ucode


Our official public tree:

https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-17.ucode


And, of course, your distro may distribute these in an official package
as well, but it's good to check the versions to be sure you're running
the latest one we released.

--
Cheers,
Luca.


Re: Fwd: linux v4.14 causes firmware iwlwifi errors on Lenovo Thinkpad T440s

2017-11-13 Thread Luciano Coelho
On Mon, 2017-11-13 at 16:23 -0600, Larry Finger wrote:
> On 11/13/2017 03:30 PM, Bartosz Golaszewski wrote:
> > 2017-11-13 21:45 GMT+01:00 Larry Finger 
> > :
> > > On 11/13/2017 02:22 PM, Bartosz Golaszewski wrote:
> > > > 
> > > > Forwarding here too as I messed up the address the last time.
> > > > --
> > > > 
> > > > Hi,
> > > > 
> > > > I noticed my wireless interface can't get up with linux v4.14
> > > > and the
> > > > kernel log is flooded with firmware errors:
> > > > 
> > > > iwlwifi :03:00.0: Firmware error during reconfiguration -
> > > > reprobe!
> > > > iwlwifi :03:00.0: FW error in SYNC CMD DQA_ENABLE_CMD
> > > > 
> > > > and
> > > > 
> > > > ieee80211 phy63: Hardware restart was requested.
> > > > 
> > > > The wireless controller is: Intel Corporation Wireless 7260
> > > > (rev 83)
> > > > Firmware used is: iwlwifi-7260-17
> > > > 
> > > > Everything works fine with v4.13.12.
> > > > 
> > > > I didn't have time today to bisect for the offending commit.
> > > > Full log
> > > > uploaded[1].
> > > > 
> > > > Best regards,
> > > > Bartosz Golaszewski
> > > > 
> > > > [1] https://pastebin.com/jksqxvS6
> > > 
> > > 
> > > Your log shows "iwlwifi :03:00.0: loaded firmware version
> > > 17.228510.0
> > > op_mode iwlmvm"
> > > 
> > > Mine, where the 7260 works, shows "iwlwifi :04:00.0: loaded
> > > firmware
> > > version 17.459231.0 op_mode iwlmvm".
> > > 
> > > It seems as if you need newer firmware. A detailed file listing
> > > shows
> > > "-rw-r--r-- 1 root root 1049340 Oct  9 12:03
> > > /lib/firmware/iwlwifi-7260-17.ucode". That date is likely when I
> > > installed
> > > the updated kernel firmware package from my distro. The md5sum
> > > for the file
> > > is 73a217f55c47d3a70bb5dbbe1d676423.
> > > 
> > > Larry
> > > 
> > 
> > Ok so it seems the version in linux-firmware is outdated. The file
> > you're using is available on github[1] and fixed the issue for me.
> > 
> > Thanks!
> > Bartosz Golaszewski
> > 
> > [1] https://github.com/OpenELEC/iwlwifi-firmware
> 
> Interesting. Using md5sum of the git repo for linux-firmware gets
> 73a217f55c47d3a70bb5dbbe1d676423  iwlwifi-7260-17.ucode.
> 
> That is the file I'm using.

You shouldn't use firmwares from github or anywhere else except from
the two official places where we release it:

Mainline (but slow to get updated):

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/iwlwifi-7260-17.ucode


Our official public tree:

https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-17.ucode


And, of course, your distro may distribute these in an official package
as well, but it's good to check the versions to be sure you're running
the latest one we released.

--
Cheers,
Luca.


Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim
On Fri, Nov 10, 2017 at 07:36:20AM -0800, Tony Lindgren wrote:
> * Joonsoo Kim  [171110 06:34]:
> > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > +#define OMAP34XX_SRAM_PHYS   0x4020
> > > +#define OMAP34XX_SRAM_VIRT   0xd001
> > > +#define OMAP34XX_SRAM_SIZE   0x1
> > 
> > For my testing environment, vmalloc address space is started at
> > roughly 0xe000 so 0xd001 would not be valid.
> 
> Well we can map it anywhere we want, got any preferences?

My testing environment is a beagle-(xm?) for QEMU. It is configured by
CONFIG_VMSPLIT_3G=y so kernel address space is started at 0xc000.
And, it has 512 MB memory so 0xc000 ~ 0xdff0 is used for
direct mapping. See below.

[0.00] Memory: 429504K/522240K available (11264K kernel code,
1562K rwdata, 4288K rodata, 2048K init, 405K bss, 27200K reserved,
65536K cma-reserved, 0K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xe000 - 0xff80   ( 504 MB)
[0.00] lowmem  : 0xc000 - 0xdff0   ( 511 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00]   .text : 0xc0208000 - 0xc0e0   (12256 kB)
[0.00]   .init : 0xc130 - 0xc150   (2048 kB)
[0.00]   .data : 0xc150 - 0xc1686810   (1563 kB)
[0.00].bss : 0xc168fc68 - 0xc16f512c   ( 406 kB)

Therefore, if OMAP34XX_SRAM_VIRT is 0xd001, direct mapping is
broken and the system doesn't work. I guess that we should use more
stable address like as 0xf000.

> 
> Just that the current save_secure_ram_context uses "high_mask"
> of 0x to translate the address. To make this more flexible,
> we need the save_secure_ram_context changes too. So we might
> want to do the static mapping and save_secure_ram_context changes
> as a single patch.
> 
> > And, PHYS can be different according to the system type. Maybe either
> > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > be considered, too. My understanding is correct?
> 
> We can have a static map for the whole SRAM area, see function
> __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> static mapping whenever possible". So the different public SRAM start
> addresses and sizes don't matter there.

Okay. Look fine with SRAM start addresses and sizes. However, we need
to consider mtype since __arm_ioremap_pfn_caller() doesn't reuse the
mapping if mtype is different. mtype can be either MT_MEMORY_RWX or
MT_MEMORY_RWX_NONCACHED.

Thanks.


Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim
On Fri, Nov 10, 2017 at 07:36:20AM -0800, Tony Lindgren wrote:
> * Joonsoo Kim  [171110 06:34]:
> > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > +#define OMAP34XX_SRAM_PHYS   0x4020
> > > +#define OMAP34XX_SRAM_VIRT   0xd001
> > > +#define OMAP34XX_SRAM_SIZE   0x1
> > 
> > For my testing environment, vmalloc address space is started at
> > roughly 0xe000 so 0xd001 would not be valid.
> 
> Well we can map it anywhere we want, got any preferences?

My testing environment is a beagle-(xm?) for QEMU. It is configured by
CONFIG_VMSPLIT_3G=y so kernel address space is started at 0xc000.
And, it has 512 MB memory so 0xc000 ~ 0xdff0 is used for
direct mapping. See below.

[0.00] Memory: 429504K/522240K available (11264K kernel code,
1562K rwdata, 4288K rodata, 2048K init, 405K bss, 27200K reserved,
65536K cma-reserved, 0K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xe000 - 0xff80   ( 504 MB)
[0.00] lowmem  : 0xc000 - 0xdff0   ( 511 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00]   .text : 0xc0208000 - 0xc0e0   (12256 kB)
[0.00]   .init : 0xc130 - 0xc150   (2048 kB)
[0.00]   .data : 0xc150 - 0xc1686810   (1563 kB)
[0.00].bss : 0xc168fc68 - 0xc16f512c   ( 406 kB)

Therefore, if OMAP34XX_SRAM_VIRT is 0xd001, direct mapping is
broken and the system doesn't work. I guess that we should use more
stable address like as 0xf000.

> 
> Just that the current save_secure_ram_context uses "high_mask"
> of 0x to translate the address. To make this more flexible,
> we need the save_secure_ram_context changes too. So we might
> want to do the static mapping and save_secure_ram_context changes
> as a single patch.
> 
> > And, PHYS can be different according to the system type. Maybe either
> > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > be considered, too. My understanding is correct?
> 
> We can have a static map for the whole SRAM area, see function
> __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> static mapping whenever possible". So the different public SRAM start
> addresses and sizes don't matter there.

Okay. Look fine with SRAM start addresses and sizes. However, we need
to consider mtype since __arm_ioremap_pfn_caller() doesn't reuse the
mapping if mtype is different. mtype can be either MT_MEMORY_RWX or
MT_MEMORY_RWX_NONCACHED.

Thanks.


[RESEND PATCH v2 2/4] x86/umip: Inform that UMIP has been enabled

2017-11-13 Thread Ricardo Neri
Let us have an indication that this feature has been enabled.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Ingo Molnar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 13ae9e5..fa998ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -341,6 +341,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 
*c)
 
cr4_set_bits(X86_CR4_UMIP);
 
+   pr_info("x86/cpu: Activated the Intel User Mode Instruction Prevention 
(UMIP) CPU feature\n");
+
return;
 
 out:
-- 
2.7.4



[PATCH] usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub

2017-11-13 Thread Kai-Heng Feng
KY-688 USB 3.1 Type-C Hub internally uses a Genesys Logic hub to connect
to Realtek r8153.

Similar to commit ("7496cfe5431f2 usb: quirks: Add no-lpm quirk for Moshi
USB to Ethernet Adapter"), no-lpm can make r8153 ethernet work.

Signed-off-by: Kai-Heng Feng 
---
 drivers/usb/core/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index a6aaf2f193a4..12246da8fcf6 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -151,6 +151,9 @@ static const struct usb_device_id usb_quirk_list[] = {
/* appletouch */
{ USB_DEVICE(0x05ac, 0x021a), .driver_info = USB_QUIRK_RESET_RESUME },
 
+   /* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */
+   { USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM },
+
/* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter 
*/
{ USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM },
 
-- 
2.14.1



[RESEND PATCH v2 2/4] x86/umip: Inform that UMIP has been enabled

2017-11-13 Thread Ricardo Neri
Let us have an indication that this feature has been enabled.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Ingo Molnar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 13ae9e5..fa998ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -341,6 +341,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 
*c)
 
cr4_set_bits(X86_CR4_UMIP);
 
+   pr_info("x86/cpu: Activated the Intel User Mode Instruction Prevention 
(UMIP) CPU feature\n");
+
return;
 
 out:
-- 
2.7.4



[PATCH] usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub

2017-11-13 Thread Kai-Heng Feng
KY-688 USB 3.1 Type-C Hub internally uses a Genesys Logic hub to connect
to Realtek r8153.

Similar to commit ("7496cfe5431f2 usb: quirks: Add no-lpm quirk for Moshi
USB to Ethernet Adapter"), no-lpm can make r8153 ethernet work.

Signed-off-by: Kai-Heng Feng 
---
 drivers/usb/core/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index a6aaf2f193a4..12246da8fcf6 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -151,6 +151,9 @@ static const struct usb_device_id usb_quirk_list[] = {
/* appletouch */
{ USB_DEVICE(0x05ac, 0x021a), .driver_info = USB_QUIRK_RESET_RESUME },
 
+   /* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */
+   { USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM },
+
/* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter 
*/
{ USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM },
 
-- 
2.14.1



[RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ricardo Neri
Issue a rate-limited warning whenever any of the instructions that UMIP
protects (i.e., SGDT, SIDT, SLDT, STR and SMSW) are used by user space
programs.

This is useful because, with UMIP enabled, the few programs that use such
instructions will start receiving a SIGSEGV signal. In the specific cases
for which emulation is provided (instructions SGDT, SIDT and SMSW in
protected and virtual-8086 modes), a warning is also helpful to encourage
updates in such programs to avoid the use of such instructions.

An existing rate-limited pr_err() is converted to use the new function
umip_pr_warn() in order to have it printing at the same rate and log
level.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Linus Torvalds 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 2e09b5b..50f4b11 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -82,6 +82,54 @@
 #defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
 #defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
+const char * const umip_insns[5] = {
+   [UMIP_INST_SGDT] = "sgdt",
+   [UMIP_INST_SIDT] = "sidt",
+   [UMIP_INST_SMSW] = "smsw",
+   [UMIP_INST_SLDT] = "sldt",
+   [UMIP_INST_STR] = "str",
+};
+
+/*
+ * If you change these strings, ensure that buffers using them are sufficiently
+ * large.
+ */
+static const char umip_warn_use[] = "cannot be used by applications.";
+static const char umip_warn_emu[] = "For now, expensive software emulation 
returns result.";
+
+/**
+ * umip_pr_warn() - Print a rate-limited warning
+ * @regs:  Register set with the context in which the warning is printed
+ * @msg:   Pointer to a string with the warning message
+ * @error: Error code to print along with the warning
+ *
+ * Print the message contained in @msg along with the task name, ID number and
+ * instruction and stack pointers of the associated process. Optionally, an
+ * error code is printed if @error is not zero. These warning messages are
+ * limited to a burst of 5 messages every two minutes.
+ *
+ * Returns:
+ *
+ * None.
+ */
+static void umip_pr_warn(struct pt_regs *regs, char *msg, long error)
+{
+   struct task_struct *tsk = current;
+   char err_str[8 + BITS_PER_LONG / 4] = "";
+
+   /* Bursts of 5 messages every two minutes */
+   static DEFINE_RATELIMIT_STATE(ratelimit, 2 * 60 * HZ, 5);
+
+   if (!__ratelimit())
+   return;
+
+   if (error)
+   snprintf(err_str, sizeof(err_str), " error:%lx", error);
+
+   pr_warn("%s[%d] %s ip:%lx sp:%lx%s\n", tsk->comm, task_pid_nr(tsk), msg,
+   regs->ip, regs->sp, err_str);
+}
+
 /**
  * identify_insn() - Identify a UMIP-protected instruction
  * @insn:  Instruction structure with opcode and ModRM byte.
@@ -236,10 +284,7 @@ static void force_sig_info_umip_fault(void __user *addr, 
struct pt_regs *regs)
if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
return;
 
-   pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx 
error:%x in %lx\n",
-  tsk->comm, task_pid_nr(tsk), regs->ip,
-  regs->sp, X86_PF_USER | X86_PF_WRITE,
-  regs->ip);
+   umip_pr_warn(regs, "segfault at", X86_PF_USER | X86_PF_WRITE);
 }
 
 /**
@@ -264,10 +309,10 @@ static void force_sig_info_umip_fault(void __user *addr, 
struct pt_regs *regs)
 bool fixup_umip_exception(struct pt_regs *regs)
 {
int not_copied, nr_copied, reg_offset, dummy_data_size, umip_inst;
+   unsigned char buf[MAX_INSN_SIZE], warn[128];
unsigned long seg_base = 0, *reg_addr;
/* 10 bytes is the maximum size of the result of UMIP instructions */
unsigned char dummy_data[10] = { 0 };
-   unsigned char buf[MAX_INSN_SIZE];
void __user *uaddr;
struct insn insn;
char seg_defs;
@@ -326,10 +371,18 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
+umip_warn_use);
+
/* Do not emulate SLDT, STR or user long mode processes. */
if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
-   user_64bit_mode(regs))
+   user_64bit_mode(regs)) {
+   umip_pr_warn(regs, warn, 0);
return false;
+   }
+
+   snprintf(warn, sizeof(warn), "%s %s", warn, umip_warn_emu);
+   umip_pr_warn(regs, warn, 0);
 
if 

[RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ricardo Neri
The instructions STR and SLDT are not emulated in any case. Thus, it made
sense to not implement functionality to identify them. However, a
subsequent commit will introduce functionality to warn about the use of
all the instructions that UMIP protect, not only those that are emulated.
A first step for that is the ability to identify them.

Plus, now that STR and SLDT are identified, we need to explicitly avoid
their emulation (i.e., not rely on unsuccessful identification). Group
togehter all the cases that we do not want to emulate: STR, SLDT and user
long mode processes.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
This patch also corrects the #define of SMSW. This change does not have a
functional impact as it is only used as an identifier.
---
 arch/x86/kernel/umip.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 6ba82be..2e09b5b 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -78,7 +78,9 @@
 
 #defineUMIP_INST_SGDT  0   /* 0F 01 /0 */
 #defineUMIP_INST_SIDT  1   /* 0F 01 /1 */
-#defineUMIP_INST_SMSW  3   /* 0F 01 /4 */
+#defineUMIP_INST_SMSW  2   /* 0F 01 /4 */
+#defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
+#defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
 /**
  * identify_insn() - Identify a UMIP-protected instruction
@@ -118,10 +120,16 @@ static int identify_insn(struct insn *insn)
default:
return -EINVAL;
}
+   } else if (insn->opcode.bytes[1] == 0x0) {
+   if (X86_MODRM_REG(insn->modrm.value) == 0)
+   return UMIP_INST_SLDT;
+   else if (X86_MODRM_REG(insn->modrm.value) == 1)
+   return UMIP_INST_STR;
+   else
+   return -EINVAL;
+   } else {
+   return -EINVAL;
}
-
-   /* SLDT AND STR are not emulated */
-   return -EINVAL;
 }
 
 /**
@@ -267,10 +275,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!regs)
return false;
 
-   /* Do not emulate 64-bit processes. */
-   if (user_64bit_mode(regs))
-   return false;
-
/*
 * If not in user-space long mode, a custom code segment could be in
 * use. This is true in protected mode (if the process defined a local
@@ -322,6 +326,11 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   /* Do not emulate SLDT, STR or user long mode processes. */
+   if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
+   user_64bit_mode(regs))
+   return false;
+
if (emulate_umip_insn(, umip_inst, dummy_data, _data_size))
return false;
 
-- 
2.7.4



[RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ricardo Neri
The instructions STR and SLDT are not emulated in any case. Thus, it made
sense to not implement functionality to identify them. However, a
subsequent commit will introduce functionality to warn about the use of
all the instructions that UMIP protect, not only those that are emulated.
A first step for that is the ability to identify them.

Plus, now that STR and SLDT are identified, we need to explicitly avoid
their emulation (i.e., not rely on unsuccessful identification). Group
togehter all the cases that we do not want to emulate: STR, SLDT and user
long mode processes.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
This patch also corrects the #define of SMSW. This change does not have a
functional impact as it is only used as an identifier.
---
 arch/x86/kernel/umip.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 6ba82be..2e09b5b 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -78,7 +78,9 @@
 
 #defineUMIP_INST_SGDT  0   /* 0F 01 /0 */
 #defineUMIP_INST_SIDT  1   /* 0F 01 /1 */
-#defineUMIP_INST_SMSW  3   /* 0F 01 /4 */
+#defineUMIP_INST_SMSW  2   /* 0F 01 /4 */
+#defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
+#defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
 /**
  * identify_insn() - Identify a UMIP-protected instruction
@@ -118,10 +120,16 @@ static int identify_insn(struct insn *insn)
default:
return -EINVAL;
}
+   } else if (insn->opcode.bytes[1] == 0x0) {
+   if (X86_MODRM_REG(insn->modrm.value) == 0)
+   return UMIP_INST_SLDT;
+   else if (X86_MODRM_REG(insn->modrm.value) == 1)
+   return UMIP_INST_STR;
+   else
+   return -EINVAL;
+   } else {
+   return -EINVAL;
}
-
-   /* SLDT AND STR are not emulated */
-   return -EINVAL;
 }
 
 /**
@@ -267,10 +275,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!regs)
return false;
 
-   /* Do not emulate 64-bit processes. */
-   if (user_64bit_mode(regs))
-   return false;
-
/*
 * If not in user-space long mode, a custom code segment could be in
 * use. This is true in protected mode (if the process defined a local
@@ -322,6 +326,11 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   /* Do not emulate SLDT, STR or user long mode processes. */
+   if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
+   user_64bit_mode(regs))
+   return false;
+
if (emulate_umip_insn(, umip_inst, dummy_data, _data_size))
return false;
 
-- 
2.7.4



[RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ricardo Neri
Issue a rate-limited warning whenever any of the instructions that UMIP
protects (i.e., SGDT, SIDT, SLDT, STR and SMSW) are used by user space
programs.

This is useful because, with UMIP enabled, the few programs that use such
instructions will start receiving a SIGSEGV signal. In the specific cases
for which emulation is provided (instructions SGDT, SIDT and SMSW in
protected and virtual-8086 modes), a warning is also helpful to encourage
updates in such programs to avoid the use of such instructions.

An existing rate-limited pr_err() is converted to use the new function
umip_pr_warn() in order to have it printing at the same rate and log
level.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Linus Torvalds 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 2e09b5b..50f4b11 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -82,6 +82,54 @@
 #defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
 #defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
+const char * const umip_insns[5] = {
+   [UMIP_INST_SGDT] = "sgdt",
+   [UMIP_INST_SIDT] = "sidt",
+   [UMIP_INST_SMSW] = "smsw",
+   [UMIP_INST_SLDT] = "sldt",
+   [UMIP_INST_STR] = "str",
+};
+
+/*
+ * If you change these strings, ensure that buffers using them are sufficiently
+ * large.
+ */
+static const char umip_warn_use[] = "cannot be used by applications.";
+static const char umip_warn_emu[] = "For now, expensive software emulation 
returns result.";
+
+/**
+ * umip_pr_warn() - Print a rate-limited warning
+ * @regs:  Register set with the context in which the warning is printed
+ * @msg:   Pointer to a string with the warning message
+ * @error: Error code to print along with the warning
+ *
+ * Print the message contained in @msg along with the task name, ID number and
+ * instruction and stack pointers of the associated process. Optionally, an
+ * error code is printed if @error is not zero. These warning messages are
+ * limited to a burst of 5 messages every two minutes.
+ *
+ * Returns:
+ *
+ * None.
+ */
+static void umip_pr_warn(struct pt_regs *regs, char *msg, long error)
+{
+   struct task_struct *tsk = current;
+   char err_str[8 + BITS_PER_LONG / 4] = "";
+
+   /* Bursts of 5 messages every two minutes */
+   static DEFINE_RATELIMIT_STATE(ratelimit, 2 * 60 * HZ, 5);
+
+   if (!__ratelimit())
+   return;
+
+   if (error)
+   snprintf(err_str, sizeof(err_str), " error:%lx", error);
+
+   pr_warn("%s[%d] %s ip:%lx sp:%lx%s\n", tsk->comm, task_pid_nr(tsk), msg,
+   regs->ip, regs->sp, err_str);
+}
+
 /**
  * identify_insn() - Identify a UMIP-protected instruction
  * @insn:  Instruction structure with opcode and ModRM byte.
@@ -236,10 +284,7 @@ static void force_sig_info_umip_fault(void __user *addr, 
struct pt_regs *regs)
if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
return;
 
-   pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx 
error:%x in %lx\n",
-  tsk->comm, task_pid_nr(tsk), regs->ip,
-  regs->sp, X86_PF_USER | X86_PF_WRITE,
-  regs->ip);
+   umip_pr_warn(regs, "segfault at", X86_PF_USER | X86_PF_WRITE);
 }
 
 /**
@@ -264,10 +309,10 @@ static void force_sig_info_umip_fault(void __user *addr, 
struct pt_regs *regs)
 bool fixup_umip_exception(struct pt_regs *regs)
 {
int not_copied, nr_copied, reg_offset, dummy_data_size, umip_inst;
+   unsigned char buf[MAX_INSN_SIZE], warn[128];
unsigned long seg_base = 0, *reg_addr;
/* 10 bytes is the maximum size of the result of UMIP instructions */
unsigned char dummy_data[10] = { 0 };
-   unsigned char buf[MAX_INSN_SIZE];
void __user *uaddr;
struct insn insn;
char seg_defs;
@@ -326,10 +371,18 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
+umip_warn_use);
+
/* Do not emulate SLDT, STR or user long mode processes. */
if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
-   user_64bit_mode(regs))
+   user_64bit_mode(regs)) {
+   umip_pr_warn(regs, warn, 0);
return false;
+   }
+
+   snprintf(warn, sizeof(warn), "%s %s", warn, umip_warn_emu);
+   umip_pr_warn(regs, warn, 0);
 
if (emulate_umip_insn(, umip_inst, dummy_data, _data_size))
return false;
-- 
2.7.4



[RESEND PATCH v2 0/4] x86: Tweaks for UMIP

2017-11-13 Thread Ricardo Neri
[To tip maintainers: This is a resend to copy the Linux kernel mailing
list. No changes in the patches since my original v2 submission.]

Now that the support for UMIP [1], [2] has been merged in the tip tree,
this series add a couple of tweaks.

Ingo asked for two small additions to select UMIP by default when building
and inform of this feature being enabled [3].

Also, Linus suggested to issue a rate-limited warning whenever the any of
the instructions that UMIP protects are used by user space programs [4].
This is useful to give programs a hint on the reason for which they start
seeing an unexpected SIGSEGV signal. Also, it helps to encourage updates
to those programs and avoid using these instructions if possible.

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/10/27/699
[2]. https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1523438.html
[3]. https://lkml.org/lkml/2017/11/8/238
[4]. https://lkml.org/lkml/2017/11/8/593

Changes since V1:
* Capitalize all the instructions' mnemonics in both code and patch
  descriptions.
* Correct documentation of umip_pr_warn() to correctly reflect the function
  name.
* Update description of patch #4 to describe the update to the existing
  rate-limited pr_err().

Ricardo Neri (4):
  x86/umip: Select X86_INTEL_UMIP by default
  x86/umip: Inform that UMIP has been enabled
  x86/umip: Identify the str and sldt instructions
  x86/umip: Warn if UMIP-protected instructions are used

 arch/x86/Kconfig | 10 -
 arch/x86/kernel/cpu/common.c |  2 +
 arch/x86/kernel/umip.c   | 88 +---
 3 files changed, 85 insertions(+), 15 deletions(-)

-- 
2.7.4



[RESEND PATCH v2 1/4] x86/umip: Select X86_INTEL_UMIP by default

2017-11-13 Thread Ricardo Neri
UMIP does not incur in a significant performance penalty. Furthermore, it
is triggered only when a small group of instructions are used from user
space programs.

While here, provide more details on the benefits UMIP provides and the
behavior that can expect the few applications that use the instructions
protected by UMIP.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Ingo Molnar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f08977d..a524a7a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1805,14 +1805,20 @@ config X86_SMAP
  If unsure, say Y.
 
 config X86_INTEL_UMIP
-   def_bool n
+   def_bool y
depends on CPU_SUP_INTEL
prompt "Intel User Mode Instruction Prevention" if EXPERT
---help---
  The User Mode Instruction Prevention (UMIP) is a security
  feature in newer Intel processors. If enabled, a general
  protection fault is issued if the instructions SGDT, SLDT,
- SIDT, SMSW and STR are executed in user mode.
+ SIDT, SMSW and STR are executed in user mode. These instructions
+ unnecessarily expose information about the hardware state.
+
+ The vast majority of applications do not use these instructions.
+ For the very few that do, software emulation is provided in
+ specific cases in protected and virtual-8086 modes. Emulated
+ results are dummy.
 
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
-- 
2.7.4



  1   2   3   4   5   6   7   8   9   10   >