[PATCH] scsi: core: fix two wrong indentation cases

2018-02-25 Thread Jianchao Wang
No functional changes. Just fix two wrong indentation cases in
scsi_finish_command and scsi_decide_disposition.

Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi.c   | 2 +-
 drivers/scsi/scsi_error.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba..4c60c26 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -231,7 +231,7 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
"(result %x)\n", cmd->result));
 
good_bytes = scsi_bufflen(cmd);
-if (!blk_rq_is_passthrough(cmd->request)) {
+   if (!blk_rq_is_passthrough(cmd->request)) {
int old_good_bytes = good_bytes;
drv = scsi_cmd_to_driver(cmd);
if (drv->done)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d042915..96066d1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1888,7 +1888,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
}
return FAILED;
 
-  maybe_retry:
+maybe_retry:
 
/* we requeue for retry because the error was retryable, and
 * the request was not marked fast fail.  Note that above,
-- 
2.7.4



[PATCH] scsi: core: fix two wrong indentation cases

2018-02-25 Thread Jianchao Wang
No functional changes. Just fix two wrong indentation cases in
scsi_finish_command and scsi_decide_disposition.

Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi.c   | 2 +-
 drivers/scsi/scsi_error.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba..4c60c26 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -231,7 +231,7 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
"(result %x)\n", cmd->result));
 
good_bytes = scsi_bufflen(cmd);
-if (!blk_rq_is_passthrough(cmd->request)) {
+   if (!blk_rq_is_passthrough(cmd->request)) {
int old_good_bytes = good_bytes;
drv = scsi_cmd_to_driver(cmd);
if (drv->done)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d042915..96066d1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1888,7 +1888,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
}
return FAILED;
 
-  maybe_retry:
+maybe_retry:
 
/* we requeue for retry because the error was retryable, and
 * the request was not marked fast fail.  Note that above,
-- 
2.7.4



[PATCH] scsi: core: use blk_mq_requeue_request in __scsi_queue_insert

2018-02-25 Thread Jianchao Wang
In scsi core, __scsi_queue_insert should just put request back on
the queue and retry using the same command as before. However, for
blk-mq, scsi_mq_requeue_cmd is employed here which will unprepare
the request. To align with the semantics of __scsi_queue_insert,
just use blk_mq_requeue_request with kick_requeue_list == true.

Cc: Christoph Hellwig 
Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9c..06d8110 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -191,7 +191,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int 
reason, bool unbusy)
 */
cmd->result = 0;
if (q->mq_ops) {
-   scsi_mq_requeue_cmd(cmd);
+   blk_mq_requeue_request(cmd->request, true);
return;
}
spin_lock_irqsave(q->queue_lock, flags);
-- 
2.7.4



[PATCH] scsi: core: use blk_mq_requeue_request in __scsi_queue_insert

2018-02-25 Thread Jianchao Wang
In scsi core, __scsi_queue_insert should just put request back on
the queue and retry using the same command as before. However, for
blk-mq, scsi_mq_requeue_cmd is employed here which will unprepare
the request. To align with the semantics of __scsi_queue_insert,
just use blk_mq_requeue_request with kick_requeue_list == true.

Cc: Christoph Hellwig 
Signed-off-by: Jianchao Wang 
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9c..06d8110 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -191,7 +191,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int 
reason, bool unbusy)
 */
cmd->result = 0;
if (q->mq_ops) {
-   scsi_mq_requeue_cmd(cmd);
+   blk_mq_requeue_request(cmd->request, true);
return;
}
spin_lock_irqsave(q->queue_lock, flags);
-- 
2.7.4



Re: [PATCH] staging:r8188eu: use lib80211 CCMP decrypt

2018-02-25 Thread Dan Carpenter
On Fri, Feb 23, 2018 at 05:57:42PM +0300, Ivan Safonov wrote:
> Custom AES decrypt implementation replaced with lib80211 library.
> 
> Signed-off-by: Ivan Safonov 

The new code looks like original RTL code (really bad) so I'm guessing
you copy and pasted the code from somewhere else?

The idea is good, but RTL code is painful to look at.

regards,
dan carpenter



Re: [PATCH] staging:r8188eu: use lib80211 CCMP decrypt

2018-02-25 Thread Dan Carpenter
On Fri, Feb 23, 2018 at 05:57:42PM +0300, Ivan Safonov wrote:
> Custom AES decrypt implementation replaced with lib80211 library.
> 
> Signed-off-by: Ivan Safonov 

The new code looks like original RTL code (really bad) so I'm guessing
you copy and pasted the code from somewhere else?

The idea is good, but RTL code is painful to look at.

regards,
dan carpenter



Re: [PATCH RFC] driver core: Reprobe consumer if it was unbound by dropped device_link

2018-02-25 Thread Jyri Sarha
On 25/02/18 11:22, Lukas Wunner wrote:
> On Thu, Feb 22, 2018 at 07:42:46PM +0200, Jyri Sarha wrote:
>> Put consumer device to deferred probe list if it is unbound due to a
>> dropped link to a supplier.
>>
>> When a device link supplier is unbound (either manually or because one
>> of its own suppliers was unbound), its consumers are unbound as
>> well. Currently if the supplier binds again after this the consumer
>> does not automatically probe again. With this patch it does.
> 
> Yes I think this makes sense, based on the rationale that the consumer
> was automatically unbound, so by symmetry it should also be automatically
> rebound.
> 
> The only thing I don't understand is you wrote in an earlier e-mail of a
> difference in behavior depending on whether driver_deferred_probe_add()
> is called before or after device_release_driver_internal().
> That's really odd, it shouldn't make a difference.
> 

In that version there was a couple of other bugs elsewhere in the
system. I tried to reproduce that situation again multiple times, but I
could not (even write those bugs back in there, and move the link
creation back to panel bridge code). But even from reading the code that
difference did not make any sense. I suspect a heisen-bug, after I have
read the code and understood that the crash is not possible, it does not
happen anymore :).

With the current version I could not find any difference in behaviour
depending on the order of device_release_driver_internal() and
driver_deferred_probe_add() calls. I just thought having them in this
order just lookes nicer.

I stress tested the code by unloading and loading the panel driver in a
tight loop, for several minutes, but it simply wont crash (not in this
setup anyway). The probe of tilcdc eventually gracefully fails in a CMA
failure.

Best regards,
Jyri

> Thanks,
> 
> Lukas
> 
>>
>> If this patch is not acceptable as such, how about adding this
>> behavior behind a new device link flag?
>>
>> The idea to this patch was gotten from this post by Lucas Wunner:
>> https://www.spinics.net/lists/dri-devel/msg166318.html
>>
>> Part of the code and the description is borrowed from him.
>>
>> cc: Lukas Wunner 
>> cc: Rafael J. Wysocki 
>> cc: Thierry Reding 
>> Signed-off-by: Jyri Sarha 
>> ---
>>  drivers/base/base.h | 1 +
>>  drivers/base/core.c | 2 ++
>>  drivers/base/dd.c   | 2 +-
>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/base.h b/drivers/base/base.h
>> index d800de6..39370eb 100644
>> --- a/drivers/base/base.h
>> +++ b/drivers/base/base.h
>> @@ -114,6 +114,7 @@ extern void device_release_driver_internal(struct device 
>> *dev,
>>  
>>  extern void driver_detach(struct device_driver *drv);
>>  extern int driver_probe_device(struct device_driver *drv, struct device 
>> *dev);
>> +extern void driver_deferred_probe_add(struct device *dev);
>>  extern void driver_deferred_probe_del(struct device *dev);
>>  static inline int driver_match_device(struct device_driver *drv,
>>struct device *dev)
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index b2261f9..0964ed5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -570,6 +570,8 @@ void device_links_unbind_consumers(struct device *dev)
>>  
>>  device_release_driver_internal(consumer, NULL,
>> consumer->parent);
>> +driver_deferred_probe_add(consumer);
>> +
>>  put_device(consumer);
>>  goto start;
>>  }
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index de6fd09..846ae78 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -140,7 +140,7 @@ static void deferred_probe_work_func(struct work_struct 
>> *work)
>>  }
>>  static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
>>  
>> -static void driver_deferred_probe_add(struct device *dev)
>> +void driver_deferred_probe_add(struct device *dev)
>>  {
>>  mutex_lock(_probe_mutex);
>>  if (list_empty(>p->deferred_probe)) {


-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH RFC] driver core: Reprobe consumer if it was unbound by dropped device_link

2018-02-25 Thread Jyri Sarha
On 25/02/18 11:22, Lukas Wunner wrote:
> On Thu, Feb 22, 2018 at 07:42:46PM +0200, Jyri Sarha wrote:
>> Put consumer device to deferred probe list if it is unbound due to a
>> dropped link to a supplier.
>>
>> When a device link supplier is unbound (either manually or because one
>> of its own suppliers was unbound), its consumers are unbound as
>> well. Currently if the supplier binds again after this the consumer
>> does not automatically probe again. With this patch it does.
> 
> Yes I think this makes sense, based on the rationale that the consumer
> was automatically unbound, so by symmetry it should also be automatically
> rebound.
> 
> The only thing I don't understand is you wrote in an earlier e-mail of a
> difference in behavior depending on whether driver_deferred_probe_add()
> is called before or after device_release_driver_internal().
> That's really odd, it shouldn't make a difference.
> 

In that version there was a couple of other bugs elsewhere in the
system. I tried to reproduce that situation again multiple times, but I
could not (even write those bugs back in there, and move the link
creation back to panel bridge code). But even from reading the code that
difference did not make any sense. I suspect a heisen-bug, after I have
read the code and understood that the crash is not possible, it does not
happen anymore :).

With the current version I could not find any difference in behaviour
depending on the order of device_release_driver_internal() and
driver_deferred_probe_add() calls. I just thought having them in this
order just lookes nicer.

I stress tested the code by unloading and loading the panel driver in a
tight loop, for several minutes, but it simply wont crash (not in this
setup anyway). The probe of tilcdc eventually gracefully fails in a CMA
failure.

Best regards,
Jyri

> Thanks,
> 
> Lukas
> 
>>
>> If this patch is not acceptable as such, how about adding this
>> behavior behind a new device link flag?
>>
>> The idea to this patch was gotten from this post by Lucas Wunner:
>> https://www.spinics.net/lists/dri-devel/msg166318.html
>>
>> Part of the code and the description is borrowed from him.
>>
>> cc: Lukas Wunner 
>> cc: Rafael J. Wysocki 
>> cc: Thierry Reding 
>> Signed-off-by: Jyri Sarha 
>> ---
>>  drivers/base/base.h | 1 +
>>  drivers/base/core.c | 2 ++
>>  drivers/base/dd.c   | 2 +-
>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/base.h b/drivers/base/base.h
>> index d800de6..39370eb 100644
>> --- a/drivers/base/base.h
>> +++ b/drivers/base/base.h
>> @@ -114,6 +114,7 @@ extern void device_release_driver_internal(struct device 
>> *dev,
>>  
>>  extern void driver_detach(struct device_driver *drv);
>>  extern int driver_probe_device(struct device_driver *drv, struct device 
>> *dev);
>> +extern void driver_deferred_probe_add(struct device *dev);
>>  extern void driver_deferred_probe_del(struct device *dev);
>>  static inline int driver_match_device(struct device_driver *drv,
>>struct device *dev)
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index b2261f9..0964ed5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -570,6 +570,8 @@ void device_links_unbind_consumers(struct device *dev)
>>  
>>  device_release_driver_internal(consumer, NULL,
>> consumer->parent);
>> +driver_deferred_probe_add(consumer);
>> +
>>  put_device(consumer);
>>  goto start;
>>  }
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index de6fd09..846ae78 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -140,7 +140,7 @@ static void deferred_probe_work_func(struct work_struct 
>> *work)
>>  }
>>  static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
>>  
>> -static void driver_deferred_probe_add(struct device *dev)
>> +void driver_deferred_probe_add(struct device *dev)
>>  {
>>  mutex_lock(_probe_mutex);
>>  if (list_empty(>p->deferred_probe)) {


-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [RESEND PATCH v3 1/2] sched/deadline: Add cpudl_maximum_dl() for clean-up

2018-02-25 Thread Byungchul Park

On 1/11/2018 6:07 PM, Peter Zijlstra wrote:



Sorry for the huge delay on this, but I'll have to postpone further.
Still busy with meltdown/spectre stuff.


Do you have time to see the patch, now that it seems to be managed
to solve those security issues?

--
Thanks,
Byungchul


Re: [RESEND PATCH v3 1/2] sched/deadline: Add cpudl_maximum_dl() for clean-up

2018-02-25 Thread Byungchul Park

On 1/11/2018 6:07 PM, Peter Zijlstra wrote:



Sorry for the huge delay on this, but I'll have to postpone further.
Still busy with meltdown/spectre stuff.


Do you have time to see the patch, now that it seems to be managed
to solve those security issues?

--
Thanks,
Byungchul


Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory

2018-02-25 Thread Ingo Molnar

* Ingo Molnar  wrote:

> We need to re-do this as we have now run into _exactly_ the kind of difficult 
> to 
> debug bug that I was worried about when I insisted on the many iterations of 
> this patch-set...

Ok, so I rebased tip:x86/mm and removed these two commits:

adf9ca9c69a2: x86/boot/compressed/64: Handle 5-level paging boot if kernel is 
above 4G

 arch/x86/boot/compressed/head_64.S | 127 ++---
 1 file changed, 89 insertions(+), 38 deletions(-)

b91993a87aff: x86/boot/compressed/64: Prepare trampoline memory

 arch/x86/boot/compressed/head_64.S| 24 ++-
 arch/x86/boot/compressed/pgtable.h| 18 
 arch/x86/boot/compressed/pgtable_64.c | 79 +++
 3 files changed, 120 insertions(+), 1 deletion(-)

The other 5-level paging related commits tested out fine and are properly 
fine-grained.

Please split these two patches up - the first patch can probably be split up 
into 
5 parts or so. Let's start with no more than 5 patches per iteration, ok?

Thanks,

Ingo


Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory

2018-02-25 Thread Ingo Molnar

* Ingo Molnar  wrote:

> We need to re-do this as we have now run into _exactly_ the kind of difficult 
> to 
> debug bug that I was worried about when I insisted on the many iterations of 
> this patch-set...

Ok, so I rebased tip:x86/mm and removed these two commits:

adf9ca9c69a2: x86/boot/compressed/64: Handle 5-level paging boot if kernel is 
above 4G

 arch/x86/boot/compressed/head_64.S | 127 ++---
 1 file changed, 89 insertions(+), 38 deletions(-)

b91993a87aff: x86/boot/compressed/64: Prepare trampoline memory

 arch/x86/boot/compressed/head_64.S| 24 ++-
 arch/x86/boot/compressed/pgtable.h| 18 
 arch/x86/boot/compressed/pgtable_64.c | 79 +++
 3 files changed, 120 insertions(+), 1 deletion(-)

The other 5-level paging related commits tested out fine and are properly 
fine-grained.

Please split these two patches up - the first patch can probably be split up 
into 
5 parts or so. Let's start with no more than 5 patches per iteration, ok?

Thanks,

Ingo


Re: [PATCH 05/21] powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid

2018-02-25 Thread Mathieu Malaterre
On Mon, Feb 26, 2018 at 7:32 AM, Christophe LEROY
 wrote:
>
>
> Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :
>>
>> Rewrite comparison since all values compared are of type `unsigned long`.
>>
>> Fix a warning (treated as error in W=1):
>>
>>CC  arch/powerpc/kernel/irq.o
>> In file included from ./include/linux/bug.h:5:0,
>>   from ./include/linux/cpumask.h:13,
>>   from ./include/linux/smp.h:13,
>>   from ./include/linux/kernel_stat.h:5,
>>   from arch/powerpc/kernel/irq.c:35:
>> ./include/linux/dma-mapping.h: In function ‘dma_map_resource’:
>> ./arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned
>> expression >= 0 is always true [-Werror=type-limits]
>>   #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
>>  ^
>> Suggested-by: Segher Boessenkool 
>> Signed-off-by: Mathieu Malaterre 
>> ---
>>   arch/powerpc/include/asm/page.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/page.h
>> b/arch/powerpc/include/asm/page.h
>> index 8da5d4c1cab2..19dea64e7ed2 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -126,7 +126,8 @@ extern long long virt_phys_offset;
>> #ifdef CONFIG_FLATMEM
>>   #define ARCH_PFN_OFFSET   ((unsigned long)(MEMORY_START >>
>> PAGE_SHIFT))
>> -#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) <
>> max_mapnr)
>> +#define pfn_valid(pfn) \
>> +   (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr -
>> ARCH_PFN_OFFSET))
>
>
> What will happen when ARCH_PFN_OFFSET is not nul and pfn is lower than
> ARCH_PFN_OFFSET ?

I assumed that normal unsigned integers modulo would make the test
fail. But for some particular value of max_mapnr the two are indeed
not equivalent.

> Christophe
>
>
>>   #endif
>> #define virt_to_pfn(kaddr)  (__pa(kaddr) >> PAGE_SHIFT)
>>
>


Re: [PATCH 05/21] powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid

2018-02-25 Thread Mathieu Malaterre
On Mon, Feb 26, 2018 at 7:32 AM, Christophe LEROY
 wrote:
>
>
> Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :
>>
>> Rewrite comparison since all values compared are of type `unsigned long`.
>>
>> Fix a warning (treated as error in W=1):
>>
>>CC  arch/powerpc/kernel/irq.o
>> In file included from ./include/linux/bug.h:5:0,
>>   from ./include/linux/cpumask.h:13,
>>   from ./include/linux/smp.h:13,
>>   from ./include/linux/kernel_stat.h:5,
>>   from arch/powerpc/kernel/irq.c:35:
>> ./include/linux/dma-mapping.h: In function ‘dma_map_resource’:
>> ./arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned
>> expression >= 0 is always true [-Werror=type-limits]
>>   #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
>>  ^
>> Suggested-by: Segher Boessenkool 
>> Signed-off-by: Mathieu Malaterre 
>> ---
>>   arch/powerpc/include/asm/page.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/page.h
>> b/arch/powerpc/include/asm/page.h
>> index 8da5d4c1cab2..19dea64e7ed2 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -126,7 +126,8 @@ extern long long virt_phys_offset;
>> #ifdef CONFIG_FLATMEM
>>   #define ARCH_PFN_OFFSET   ((unsigned long)(MEMORY_START >>
>> PAGE_SHIFT))
>> -#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) <
>> max_mapnr)
>> +#define pfn_valid(pfn) \
>> +   (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr -
>> ARCH_PFN_OFFSET))
>
>
> What will happen when ARCH_PFN_OFFSET is not nul and pfn is lower than
> ARCH_PFN_OFFSET ?

I assumed that normal unsigned integers modulo would make the test
fail. But for some particular value of max_mapnr the two are indeed
not equivalent.

> Christophe
>
>
>>   #endif
>> #define virt_to_pfn(kaddr)  (__pa(kaddr) >> PAGE_SHIFT)
>>
>


Re: [PATCH] cpufreq: scpi: invoke frequency-invariance setter function

2018-02-25 Thread Dietmar Eggemann
On 02/22/2018 11:27 PM, Rafael J. Wysocki wrote:
> On Tue, Feb 20, 2018 at 12:10 PM, Dietmar Eggemann
>  wrote:

[...]

>> Fixes: 343a8d17fa8d ("cpufreq: scpi: remove arm_big_little dependency")
>> Cc: Rafael J. Wysocki 
>> Cc: Viresh Kumar 
>> Cc: Sudeep Holla 
>> Signed-off-by: Dietmar Eggemann 
>> Acked-by: Sudeep Holla 
> 
> This is really minor, but I would reorder this slightly.

Tried to figure out what would be the better order. Not sure since I saw
different examples. Can you tell what would be the best tag order?

[...]

>> diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
>> index c32a833e1b00..3101d4e9c2de 100644
>> --- a/drivers/cpufreq/scpi-cpufreq.c
>> +++ b/drivers/cpufreq/scpi-cpufreq.c
>> @@ -51,13 +51,19 @@ static unsigned int scpi_cpufreq_get_rate(unsigned int 
>> cpu)
>>   static int
>>   scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
>>   {
>> +   unsigned long freq = policy->freq_table[index].frequency;
>>  struct scpi_data *priv = policy->driver_data;
>> -   u64 rate = policy->freq_table[index].frequency * 1000;
>> +   u64 rate = freq * 1000;
>>  int ret;
>>
>>  ret = clk_set_rate(priv->clk, rate);
>> -   if (!ret && (clk_get_rate(priv->clk) != rate))
>> -   ret = -EIO;
>> +   if (!ret) {
> 
> I would do:
> 
> if (ret)
>  return ret;
> 
> arch_set_freq_scale(policy->related_cpus, freq, policy->cpuinfo.max_freq);
> 
> if (clk_get_rate(priv->clk) != rate)
>  return -EIO;
> 
> return 0;
> 
> That's somewhat easier to follow for me.

Yes I can change this.

> 
>> +   if (clk_get_rate(priv->clk) != rate)
>> +   ret = -EIO;
>> +
>> +   arch_set_freq_scale(policy->related_cpus, freq,
>> +   policy->cpuinfo.max_freq);
>> +   }
>>
>>  return ret;
>>   }
>> --
> 
> I also am not sure why you want to call arch_set_freq_scale() even if
> the new clock rate didn't stick.

Right, this is much better.

 static int
 scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
 {
+   unsigned long freq = policy->freq_table[index].frequency;
struct scpi_data *priv = policy->driver_data;
-   u64 rate = policy->freq_table[index].frequency * 1000;
+   u64 rate = freq * 1000;
int ret;
 
ret = clk_set_rate(priv->clk, rate);
-   if (!ret && (clk_get_rate(priv->clk) != rate))
-   ret = -EIO;
 
-   return ret;
+   if (ret)
+   return ret;
+
+   if (clk_get_rate(priv->clk) != rate)
+   return -EIO;
+
+   arch_set_freq_scale(policy->related_cpus, freq,
+   policy->cpuinfo.max_freq);
+
+   return 0;

Will send out a v2 as soon as I know the preferred tag order.


Re: [PATCH] cpufreq: scpi: invoke frequency-invariance setter function

2018-02-25 Thread Dietmar Eggemann
On 02/22/2018 11:27 PM, Rafael J. Wysocki wrote:
> On Tue, Feb 20, 2018 at 12:10 PM, Dietmar Eggemann
>  wrote:

[...]

>> Fixes: 343a8d17fa8d ("cpufreq: scpi: remove arm_big_little dependency")
>> Cc: Rafael J. Wysocki 
>> Cc: Viresh Kumar 
>> Cc: Sudeep Holla 
>> Signed-off-by: Dietmar Eggemann 
>> Acked-by: Sudeep Holla 
> 
> This is really minor, but I would reorder this slightly.

Tried to figure out what would be the better order. Not sure since I saw
different examples. Can you tell what would be the best tag order?

[...]

>> diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
>> index c32a833e1b00..3101d4e9c2de 100644
>> --- a/drivers/cpufreq/scpi-cpufreq.c
>> +++ b/drivers/cpufreq/scpi-cpufreq.c
>> @@ -51,13 +51,19 @@ static unsigned int scpi_cpufreq_get_rate(unsigned int 
>> cpu)
>>   static int
>>   scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
>>   {
>> +   unsigned long freq = policy->freq_table[index].frequency;
>>  struct scpi_data *priv = policy->driver_data;
>> -   u64 rate = policy->freq_table[index].frequency * 1000;
>> +   u64 rate = freq * 1000;
>>  int ret;
>>
>>  ret = clk_set_rate(priv->clk, rate);
>> -   if (!ret && (clk_get_rate(priv->clk) != rate))
>> -   ret = -EIO;
>> +   if (!ret) {
> 
> I would do:
> 
> if (ret)
>  return ret;
> 
> arch_set_freq_scale(policy->related_cpus, freq, policy->cpuinfo.max_freq);
> 
> if (clk_get_rate(priv->clk) != rate)
>  return -EIO;
> 
> return 0;
> 
> That's somewhat easier to follow for me.

Yes I can change this.

> 
>> +   if (clk_get_rate(priv->clk) != rate)
>> +   ret = -EIO;
>> +
>> +   arch_set_freq_scale(policy->related_cpus, freq,
>> +   policy->cpuinfo.max_freq);
>> +   }
>>
>>  return ret;
>>   }
>> --
> 
> I also am not sure why you want to call arch_set_freq_scale() even if
> the new clock rate didn't stick.

Right, this is much better.

 static int
 scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
 {
+   unsigned long freq = policy->freq_table[index].frequency;
struct scpi_data *priv = policy->driver_data;
-   u64 rate = policy->freq_table[index].frequency * 1000;
+   u64 rate = freq * 1000;
int ret;
 
ret = clk_set_rate(priv->clk, rate);
-   if (!ret && (clk_get_rate(priv->clk) != rate))
-   ret = -EIO;
 
-   return ret;
+   if (ret)
+   return ret;
+
+   if (clk_get_rate(priv->clk) != rate)
+   return -EIO;
+
+   arch_set_freq_scale(policy->related_cpus, freq,
+   policy->cpuinfo.max_freq);
+
+   return 0;

Will send out a v2 as soon as I know the preferred tag order.


Re: [PATCH v6 4/5] fuse: Ensure posix acls are translated outside of init_user_ns

2018-02-25 Thread Miklos Szeredi
On Thu, Feb 22, 2018 at 11:50 PM, Eric W. Biederman
 wrote:

> So if we could figure out how to use the generic acl support for the old
> brand of fuse filesystems that don't set FUSE_POSIX_ACL it would be much
> easier to support them long term.

Simplest and most robust way seems to be to do everything the same (as
with FUSE_POSIX_ACL) but tell the vfs not to cache the acl.

Thanks,
Miklos


Re: [PATCH v6 4/5] fuse: Ensure posix acls are translated outside of init_user_ns

2018-02-25 Thread Miklos Szeredi
On Thu, Feb 22, 2018 at 11:50 PM, Eric W. Biederman
 wrote:

> So if we could figure out how to use the generic acl support for the old
> brand of fuse filesystems that don't set FUSE_POSIX_ACL it would be much
> easier to support them long term.

Simplest and most robust way seems to be to do everything the same (as
with FUSE_POSIX_ACL) but tell the vfs not to cache the acl.

Thanks,
Miklos


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 03:50:35PM +0900, Sergey Senozhatsky wrote:
> On (02/26/18 14:58), Minchan Kim wrote:
> [..]
> > > Right. The changes are pretty trivial, that's why I kept then in
> > > 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> > > changes.
> > 
> > As I said earlier, it's not thing we usually do, at least, MM.
> > Anyway, I don't want to insist on it because it depends each
> > person's point of view what's the better for review, git-bisect.
> 
> Thanks :)
> 
> > > > size_t huge_size = _zs_huge_object(pool);
> > > > ..
> > > > ..
> > > > if (comp_size >= huge_size)
> > > > memcpy(dst, src, 4K);
> > > 
> > > Yes, can do. My plan was to keep it completely internally to zsmalloc.
> > > Who knows, it might become smart enough one day to do something more
> > > than just size comparison. Any reason you used that leading underscore
> > 
> > Let's do that in future if someone want it. :)
> 
> OK.
> 
> > > in _zs_huge_object()?
> > 
> > 
> > Nope. It's just typo. Let's think better name.
> > How about using zs_huge_size()?
> 
> hm, I think `huge_size' on it's own is a bit general and cryptic.
> zs_huge_object_size() or zs_huge_class_size()?

I wanted to use more general word to hide zsmalloc internal but
I realized it's really impossible to hide them all.
If so, let's use zs_huge_class_size and then let's add big fat
comment what the API represents in there.

Thanks, Sergey!


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 03:50:35PM +0900, Sergey Senozhatsky wrote:
> On (02/26/18 14:58), Minchan Kim wrote:
> [..]
> > > Right. The changes are pretty trivial, that's why I kept then in
> > > 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> > > changes.
> > 
> > As I said earlier, it's not thing we usually do, at least, MM.
> > Anyway, I don't want to insist on it because it depends each
> > person's point of view what's the better for review, git-bisect.
> 
> Thanks :)
> 
> > > > size_t huge_size = _zs_huge_object(pool);
> > > > ..
> > > > ..
> > > > if (comp_size >= huge_size)
> > > > memcpy(dst, src, 4K);
> > > 
> > > Yes, can do. My plan was to keep it completely internally to zsmalloc.
> > > Who knows, it might become smart enough one day to do something more
> > > than just size comparison. Any reason you used that leading underscore
> > 
> > Let's do that in future if someone want it. :)
> 
> OK.
> 
> > > in _zs_huge_object()?
> > 
> > 
> > Nope. It's just typo. Let's think better name.
> > How about using zs_huge_size()?
> 
> hm, I think `huge_size' on it's own is a bit general and cryptic.
> zs_huge_object_size() or zs_huge_class_size()?

I wanted to use more general word to hide zsmalloc internal but
I realized it's really impossible to hide them all.
If so, let's use zs_huge_class_size and then let's add big fat
comment what the API represents in there.

Thanks, Sergey!


Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Mathieu Malaterre
On Mon, Feb 26, 2018 at 7:50 AM, Christophe LEROY
 wrote:
>
>
> Le 26/02/2018 à 07:34, Christophe LEROY a écrit :
>>
>>
>>
>> Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :
>>>
>>> Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
>>> Fix warning (treated as error in W=1):
>>>
>>>CC  arch/powerpc/kernel/signal_32.o
>>> In file included from ./include/linux/uaccess.h:14:0,
>>>   from ./include/asm-generic/termios-base.h:8,
>>>   from ./arch/powerpc/include/asm/termios.h:20,
>>>   from ./include/uapi/linux/termios.h:6,
>>>   from ./include/linux/tty.h:7,
>>>   from arch/powerpc/kernel/signal_32.c:36:
>>> ./include/asm-generic/termios-base.h: In function
>>> ‘user_termio_to_kernel_termios’:
>>> ./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned
>>> expression >= 0 is always true [-Werror=type-limits]
>>> (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
>>> ^
>>> ./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro
>>> ‘__access_ok’
>>> __access_ok((__force unsigned long)(addr), (size), get_fs()))
>>> ^~~
>>> ./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of macro
>>> ‘access_ok’
>>>if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
>>>^
>>> ./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro
>>> ‘__get_user_check’
>>>__get_user_check((x), (ptr), sizeof(*(ptr)))
>>>^~~~
>>> ./include/asm-generic/termios-base.h:36:6: note: in expansion of macro
>>> ‘get_user’
>>>if (get_user(termios->c_line, >c_line) < 0)
>>>^~~~
>>> [...]
>>> cc1: all warnings being treated as errors
>>>
>>> Signed-off-by: Mathieu Malaterre 
>>> ---
>>>   arch/powerpc/include/asm/uaccess.h | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/uaccess.h
>>> b/arch/powerpc/include/asm/uaccess.h
>>> index 51bfeb8777f0..fadc406bd39d 100644
>>> --- a/arch/powerpc/include/asm/uaccess.h
>>> +++ b/arch/powerpc/include/asm/uaccess.h
>>> @@ -49,7 +49,7 @@
>>>   #define __access_ok(addr, size, segment)\
>>>   (((addr) <= (segment).seg) &&\
>>> - (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
>>> + (((size) == 0) || ((size) < ((segment).seg - (addr)
>>
>>
>> IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?
>

The whole series was pretty mediocre, but this one was actually pretty
destructive. Thanks for catching this.

>
> Note that I already try to submit a fix for this warning 3 years ago
> (https://patchwork.ozlabs.org/patch/418075/) and it was rejected with the
> following comment:
>
> Again, I don't think Linux enables this warning.  What did you do to
> produce this?  In any case, it's a bad warning that doesn't take macros
> into account, and the answer is not to make the code less clear by hiding
> the fact that zero is a special case.

Right. I'll try to see how to make W=1 run without error with an
alternate solution.

> Christophe
>
>
>>
>> Christophe
>>
>>>   #endif
>>>
>


Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Mathieu Malaterre
On Mon, Feb 26, 2018 at 7:50 AM, Christophe LEROY
 wrote:
>
>
> Le 26/02/2018 à 07:34, Christophe LEROY a écrit :
>>
>>
>>
>> Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :
>>>
>>> Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
>>> Fix warning (treated as error in W=1):
>>>
>>>CC  arch/powerpc/kernel/signal_32.o
>>> In file included from ./include/linux/uaccess.h:14:0,
>>>   from ./include/asm-generic/termios-base.h:8,
>>>   from ./arch/powerpc/include/asm/termios.h:20,
>>>   from ./include/uapi/linux/termios.h:6,
>>>   from ./include/linux/tty.h:7,
>>>   from arch/powerpc/kernel/signal_32.c:36:
>>> ./include/asm-generic/termios-base.h: In function
>>> ‘user_termio_to_kernel_termios’:
>>> ./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned
>>> expression >= 0 is always true [-Werror=type-limits]
>>> (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
>>> ^
>>> ./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro
>>> ‘__access_ok’
>>> __access_ok((__force unsigned long)(addr), (size), get_fs()))
>>> ^~~
>>> ./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of macro
>>> ‘access_ok’
>>>if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
>>>^
>>> ./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro
>>> ‘__get_user_check’
>>>__get_user_check((x), (ptr), sizeof(*(ptr)))
>>>^~~~
>>> ./include/asm-generic/termios-base.h:36:6: note: in expansion of macro
>>> ‘get_user’
>>>if (get_user(termios->c_line, >c_line) < 0)
>>>^~~~
>>> [...]
>>> cc1: all warnings being treated as errors
>>>
>>> Signed-off-by: Mathieu Malaterre 
>>> ---
>>>   arch/powerpc/include/asm/uaccess.h | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/uaccess.h
>>> b/arch/powerpc/include/asm/uaccess.h
>>> index 51bfeb8777f0..fadc406bd39d 100644
>>> --- a/arch/powerpc/include/asm/uaccess.h
>>> +++ b/arch/powerpc/include/asm/uaccess.h
>>> @@ -49,7 +49,7 @@
>>>   #define __access_ok(addr, size, segment)\
>>>   (((addr) <= (segment).seg) &&\
>>> - (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
>>> + (((size) == 0) || ((size) < ((segment).seg - (addr)
>>
>>
>> IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?
>

The whole series was pretty mediocre, but this one was actually pretty
destructive. Thanks for catching this.

>
> Note that I already try to submit a fix for this warning 3 years ago
> (https://patchwork.ozlabs.org/patch/418075/) and it was rejected with the
> following comment:
>
> Again, I don't think Linux enables this warning.  What did you do to
> produce this?  In any case, it's a bad warning that doesn't take macros
> into account, and the answer is not to make the code less clear by hiding
> the fact that zero is a special case.

Right. I'll try to see how to make W=1 run without error with an
alternate solution.

> Christophe
>
>
>>
>> Christophe
>>
>>>   #endif
>>>
>


RE: [PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()

2018-02-25 Thread Prakhya, Sai Praneeth
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index ac5db5f8dbbf..4714b305ca90 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = {
> > _attr_table,
> >  };
> >
> > +struct workqueue_struct *efi_rts_wq;
> > +
> >  static bool disable_runtime;
> >  static int __init setup_noefi(char *arg)  { @@ -329,6 +331,15 @@
> > static int __init efisubsys_init(void)
> > if (!efi_enabled(EFI_BOOT))
> > return 0;
> >
> > +   /* Create a work queue to run EFI Runtime Services */
> > +   efi_rts_wq = create_workqueue("efi_rts_workqueue");
> 
> Please consider alloc_workqueue() instead with the appropriate flags, since
> create_workqueue() and friends are deprecated.

Sure! Will change in V2.

> 
> > +   if (!efi_rts_wq) {
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> > services "
> > +  "disabled.\n");
> > +   clear_bit(EFI_RUNTIME_SERVICES, );
> > +   return 0;
> > +   }
> > +
> > /*
> >  * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> >  * it should be invoked only after efi_rts_workqueue is ready.
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index ae54870b2788..5cdb787da5d3 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -1,6 +1,14 @@
> >  /*
> >   * runtime-wrappers.c - Runtime Services function call wrappers
> >   *
> > + * Implementation summary:
> > + * ---
> > + * 1. When user/kernel thread requests to execute
> > + efi_runtime_service(),
> > + * enqueue work to efi_rts_workqueue.
> > + * 2. Caller thread waits until the work is finished because it's
> > + * dependent on the return status and execution of efi_runtime_service().
> > + * For instance, get_variable() and get_next_variable().
> > + *
> >   * Copyright (C) 2014 Linaro Ltd. 
> >   *
> >   * Split off from arch/x86/platform/efi/efi.c @@ -22,6 +30,8 @@
> > #include   #include   #include
> > 
> > +#include 
> > +
> >  #include 
> >
> >  /*
> > @@ -33,6 +43,50 @@
> >  #define __efi_call_virt(f, args...) \
> > __efi_call_virt_pointer(efi.systab->runtime, f, args)
> >
> > +/* Each EFI Runtime Service is represented with a unique number */
> > +#define GET_TIME   0
> > +#define SET_TIME   1
> > +#define GET_WAKEUP_TIME2
> > +#define SET_WAKEUP_TIME3
> > +#define GET_VARIABLE   4
> > +#define GET_NEXT_VARIABLE  5
> > +#define SET_VARIABLE   6
> > +#define SET_VARIABLE_NONBLOCKING   7
> > +#define QUERY_VARIABLE_INFO8
> > +#define QUERY_VARIABLE_INFO_NONBLOCKING9
> > +#define GET_NEXT_HIGH_MONO_COUNT   10
> > +#define RESET_SYSTEM   11
> > +#define UPDATE_CAPSULE 12
> > +#define QUERY_CAPSULE_CAPS 13
> 
> An enum would be better, given these are just internal, contiguous IDs.
> 

Makes sense.

> > +
> > +/*
> > + * efi_queue_work: Queue efi_runtime_service() and wait until it's done
> > + * @rts:   efi_runtime_service() function identifier
> > + * @rts_arg<1-5>:  efi_runtime_service() function arguments
> > + *
> > + * Accesses to efi_runtime_services() are serialized by a binary
> > + * semaphore (efi_runtime_lock) and caller waits until the work is
> > + * finished, hence _only_ one work is queued at a time. So,
> > +queue_work()
> > + * should never fail.
> > + */
> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
> > \
> > +({ \
> > +   struct efi_runtime_work efi_rts_work;   \
> > +   efi_rts_work.status = EFI_ABORTED;  \
> > +   \
> > +   INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > +   efi_rts_work.func = _rts;   \
> > +   efi_rts_work.arg1 = _arg1;  \
> > +   efi_rts_work.arg2 = _arg2;  \
> > +   efi_rts_work.arg3 = _arg3;  \
> > +   efi_rts_work.arg4 = _arg4;  \
> > +   efi_rts_work.arg5 = _arg5;  \
> > +   if (queue_work(efi_rts_wq, _rts_work.work))

Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 02:38:04PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > Hi Jan,
> >
> > On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
> >> Hi Minchan,
> >> 
> >> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
> >> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
> >> > > From: Huang Ying 
> >> > > 
> >> > > When page_mapping() is called and the mapping is dereferenced in
> >> > > page_evicatable() through shrink_active_list(), it is possible for the
> >> > > inode to be truncated and the embedded address space to be freed at
> >> > > the same time.  This may lead to the following race.
> >> > > 
> >> > > CPU1CPU2
> >> > > 
> >> > > truncate(inode) 
> >> > > shrink_active_list()
> >> > >   ... 
> >> > > page_evictable(page)
> >> > >   truncate_inode_page(mapping, page);
> >> > > delete_from_page_cache(page)
> >> > >   spin_lock_irqsave(>tree_lock, flags);
> >> > > __delete_from_page_cache(page, NULL)
> >> > >   page_cache_tree_delete(..)
> >> > > ... mapping = 
> >> > > page_mapping(page);
> >> > > page->mapping = NULL;
> >> > > ...
> >> > >   spin_unlock_irqrestore(>tree_lock, flags);
> >> > >   page_cache_free_page(mapping, page)
> >> > > put_page(page)
> >> > >   if (put_page_testzero(page)) -> false
> >> > > - inode now has no pages and can be freed including embedded 
> >> > > address_space
> >> > > 
> >> > > 
> >> > > mapping_unevictable(mapping)
> >> > >  
> >> > > test_bit(AS_UNEVICTABLE, >flags);
> >> > > - we've dereferenced mapping which is potentially already free.
> >> > > 
> >> > > Similar race exists between swap cache freeing and page_evicatable() 
> >> > > too.
> >> > > 
> >> > > The address_space in inode and swap cache will be freed after a RCU
> >> > > grace period.  So the races are fixed via enclosing the page_mapping()
> >> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
> >> > > added in code to make it clear what is protected by the RCU read lock.
> >> > 
> >> > Is it always true for every FSes, even upcoming FSes?
> >> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
> >> > to destroy inode?
> >> > 
> >> > Let's cc linux-fs.
> >> 
> >> That's actually a good question. Pathname lookup relies on inodes being
> >> protected by RCU so "normal" filesystems definitely need to use RCU freeing
> >> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
> >> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
> >> with freeing its inodes normally AFAICT. I don't see that happening
> >> anywhere in the tree but in theory it is possible with some effort... But
> >> frankly I don't see a good reason for that so all we should do is to
> >> document that .destroy_inode needs to free the inode structure through RCU
> >> if it uses page cache? Al?
> >
> > Yub, it would be much better. However, how does this patch fix the problem?
> > Although it can make only page_evictable safe, we could go with the page
> > further and finally uses page->mapping, again.
> > For instance,
> >
> > shrink_active_list
> > page_evictable();
> > ..
> > page_referened()
> > page_rmapping
> > page->mapping
> 
> This only checks the value of page->mapping, not deference
> page->mapping.  So it should be safe.

Oops, you're right. I got confused. However, I want to make the lock
consistent(i.e., use page_lock to protect address_space) but cannot
come with better way.

Sorry for the noise, Huang.

> 
> Best Regards,
> Huang, Ying
> 
> > I think caller should lock the page to protect entire operation, which
> > have been used more widely to pin a address_space.
> >
> > Thanks.


RE: [PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()

2018-02-25 Thread Prakhya, Sai Praneeth
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index ac5db5f8dbbf..4714b305ca90 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = {
> > _attr_table,
> >  };
> >
> > +struct workqueue_struct *efi_rts_wq;
> > +
> >  static bool disable_runtime;
> >  static int __init setup_noefi(char *arg)  { @@ -329,6 +331,15 @@
> > static int __init efisubsys_init(void)
> > if (!efi_enabled(EFI_BOOT))
> > return 0;
> >
> > +   /* Create a work queue to run EFI Runtime Services */
> > +   efi_rts_wq = create_workqueue("efi_rts_workqueue");
> 
> Please consider alloc_workqueue() instead with the appropriate flags, since
> create_workqueue() and friends are deprecated.

Sure! Will change in V2.

> 
> > +   if (!efi_rts_wq) {
> > +   pr_err("Failed to create efi_rts_workqueue, EFI runtime 
> > services "
> > +  "disabled.\n");
> > +   clear_bit(EFI_RUNTIME_SERVICES, );
> > +   return 0;
> > +   }
> > +
> > /*
> >  * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
> >  * it should be invoked only after efi_rts_workqueue is ready.
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index ae54870b2788..5cdb787da5d3 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -1,6 +1,14 @@
> >  /*
> >   * runtime-wrappers.c - Runtime Services function call wrappers
> >   *
> > + * Implementation summary:
> > + * ---
> > + * 1. When user/kernel thread requests to execute
> > + efi_runtime_service(),
> > + * enqueue work to efi_rts_workqueue.
> > + * 2. Caller thread waits until the work is finished because it's
> > + * dependent on the return status and execution of efi_runtime_service().
> > + * For instance, get_variable() and get_next_variable().
> > + *
> >   * Copyright (C) 2014 Linaro Ltd. 
> >   *
> >   * Split off from arch/x86/platform/efi/efi.c @@ -22,6 +30,8 @@
> > #include   #include   #include
> > 
> > +#include 
> > +
> >  #include 
> >
> >  /*
> > @@ -33,6 +43,50 @@
> >  #define __efi_call_virt(f, args...) \
> > __efi_call_virt_pointer(efi.systab->runtime, f, args)
> >
> > +/* Each EFI Runtime Service is represented with a unique number */
> > +#define GET_TIME   0
> > +#define SET_TIME   1
> > +#define GET_WAKEUP_TIME2
> > +#define SET_WAKEUP_TIME3
> > +#define GET_VARIABLE   4
> > +#define GET_NEXT_VARIABLE  5
> > +#define SET_VARIABLE   6
> > +#define SET_VARIABLE_NONBLOCKING   7
> > +#define QUERY_VARIABLE_INFO8
> > +#define QUERY_VARIABLE_INFO_NONBLOCKING9
> > +#define GET_NEXT_HIGH_MONO_COUNT   10
> > +#define RESET_SYSTEM   11
> > +#define UPDATE_CAPSULE 12
> > +#define QUERY_CAPSULE_CAPS 13
> 
> An enum would be better, given these are just internal, contiguous IDs.
> 

Makes sense.

> > +
> > +/*
> > + * efi_queue_work: Queue efi_runtime_service() and wait until it's done
> > + * @rts:   efi_runtime_service() function identifier
> > + * @rts_arg<1-5>:  efi_runtime_service() function arguments
> > + *
> > + * Accesses to efi_runtime_services() are serialized by a binary
> > + * semaphore (efi_runtime_lock) and caller waits until the work is
> > + * finished, hence _only_ one work is queued at a time. So,
> > +queue_work()
> > + * should never fail.
> > + */
> > +#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
> > \
> > +({ \
> > +   struct efi_runtime_work efi_rts_work;   \
> > +   efi_rts_work.status = EFI_ABORTED;  \
> > +   \
> > +   INIT_WORK_ONSTACK(_rts_work.work, efi_call_rts);\
> > +   efi_rts_work.func = _rts;   \
> > +   efi_rts_work.arg1 = _arg1;  \
> > +   efi_rts_work.arg2 = _arg2;  \
> > +   efi_rts_work.arg3 = _arg3;  \
> > +   efi_rts_work.arg4 = _arg4;  \
> > +   efi_rts_work.arg5 = _arg5;  \
> > +   if (queue_work(efi_rts_wq, _rts_work.work)) \
> > +   

Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 02:38:04PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > Hi Jan,
> >
> > On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
> >> Hi Minchan,
> >> 
> >> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
> >> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
> >> > > From: Huang Ying 
> >> > > 
> >> > > When page_mapping() is called and the mapping is dereferenced in
> >> > > page_evicatable() through shrink_active_list(), it is possible for the
> >> > > inode to be truncated and the embedded address space to be freed at
> >> > > the same time.  This may lead to the following race.
> >> > > 
> >> > > CPU1CPU2
> >> > > 
> >> > > truncate(inode) 
> >> > > shrink_active_list()
> >> > >   ... 
> >> > > page_evictable(page)
> >> > >   truncate_inode_page(mapping, page);
> >> > > delete_from_page_cache(page)
> >> > >   spin_lock_irqsave(>tree_lock, flags);
> >> > > __delete_from_page_cache(page, NULL)
> >> > >   page_cache_tree_delete(..)
> >> > > ... mapping = 
> >> > > page_mapping(page);
> >> > > page->mapping = NULL;
> >> > > ...
> >> > >   spin_unlock_irqrestore(>tree_lock, flags);
> >> > >   page_cache_free_page(mapping, page)
> >> > > put_page(page)
> >> > >   if (put_page_testzero(page)) -> false
> >> > > - inode now has no pages and can be freed including embedded 
> >> > > address_space
> >> > > 
> >> > > 
> >> > > mapping_unevictable(mapping)
> >> > >  
> >> > > test_bit(AS_UNEVICTABLE, >flags);
> >> > > - we've dereferenced mapping which is potentially already free.
> >> > > 
> >> > > Similar race exists between swap cache freeing and page_evicatable() 
> >> > > too.
> >> > > 
> >> > > The address_space in inode and swap cache will be freed after a RCU
> >> > > grace period.  So the races are fixed via enclosing the page_mapping()
> >> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
> >> > > added in code to make it clear what is protected by the RCU read lock.
> >> > 
> >> > Is it always true for every FSes, even upcoming FSes?
> >> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
> >> > to destroy inode?
> >> > 
> >> > Let's cc linux-fs.
> >> 
> >> That's actually a good question. Pathname lookup relies on inodes being
> >> protected by RCU so "normal" filesystems definitely need to use RCU freeing
> >> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
> >> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
> >> with freeing its inodes normally AFAICT. I don't see that happening
> >> anywhere in the tree but in theory it is possible with some effort... But
> >> frankly I don't see a good reason for that so all we should do is to
> >> document that .destroy_inode needs to free the inode structure through RCU
> >> if it uses page cache? Al?
> >
> > Yub, it would be much better. However, how does this patch fix the problem?
> > Although it can make only page_evictable safe, we could go with the page
> > further and finally uses page->mapping, again.
> > For instance,
> >
> > shrink_active_list
> > page_evictable();
> > ..
> > page_referened()
> > page_rmapping
> > page->mapping
> 
> This only checks the value of page->mapping, not deference
> page->mapping.  So it should be safe.

Oops, you're right. I got confused. However, I want to make the lock
consistent(i.e., use page_lock to protect address_space) but cannot
come with better way.

Sorry for the noise, Huang.

> 
> Best Regards,
> Huang, Ying
> 
> > I think caller should lock the page to protect entire operation, which
> > have been used more widely to pin a address_space.
> >
> > Thanks.


Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory

2018-02-25 Thread Ingo Molnar

* Kirill A. Shutemov  wrote:

> +#if 0
>   /*
>* Find a suitable spot for the trampoline.
>* This code is based on reserve_bios_regions().
> @@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
>   /* Place the trampoline just below the end of low memory, aligned to 4k 
> */
>   paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
>   paging_config.trampoline_start = 
> round_down(paging_config.trampoline_start, PAGE_SIZE);
> +#else
> + paging_config.trampoline_start = 0x99000;
> +#endif

So if it's suspected to be 'Video BIOS undeclared RAM use' related then 
wouldn't a 
lower address be safer?

Such as:

paging_config.trampoline_start = 0x4;

or so?

Also, could do a puts() hexdump of the affected memory area _before_ we 
overwrite 
it? Is it empty? Could we add some debug warning that checks that it's all 
zeroes?

I also kind of regret that this remained a single commit:

 3 files changed, 120 insertions(+), 1 deletion(-)

this should be split up further:

 - one patch that adds trampoline space to the kernel image
 - one patch that calculates the trampoline address and prints the address
 - one or two patch that does the functional changes
 - (any more split-up you can think of - early boot code is very fragile!)

It will be painful to rebase x86/mm but I think it's unavoidable at this stage.

There's also a few other things I don't like in paging_prepare():

1)

/* Check if LA57 is desired and supported */
if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31
paging_config.l5_required = 1;

... it isn't explained why this feature CPU check is so complex.

2)

+   /* Place the trampoline just below the end of low memory, aligned to 4k 
*/
+   paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+   paging_config.trampoline_start = 
round_down(paging_config.trampoline_start, PAGE_SIZE);

placing trampolines just below or just above BIOS images is fragile. Instead a 
better heuristic is to use the "middle" of suspected available RAM and work 
from 
there.

3)

+   /* Clear trampoline memory first */
+   memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);

Memory bootup state is typically all zeroes (except maybe for kexec), so this 
should check that what it's clearing doesn't contain any data.

It should probably also clear this memory _after_ use.

4)

+   /*
+* Set up a new page table that will be used for switching from 4-
+* to 5-level paging or vice versa. In other cases trampoline
+* wouldn't touch CR3.
+*
+* For 4- to 5-level paging transition, set up current CR3 as the
+* first and the only entry in a new top-level page table.
+*
+* For 5- to 4-level paging transition, copy page table pointed by
+* first entry in the current top-level page table as our new
+* top-level page table. We just cannot point to the page table
+* from trampoline as it may be above 4G.
+*/
+   if (paging_config.l5_required) {
+   trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = 
__native_read_cr3() + _PAGE_TABLE_NOENC;
+   } else if (native_read_cr4() & X86_CR4_LA57) {
+   unsigned long src;
+
+   src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+   memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / 
sizeof(unsigned long),
+  (void *)src, PAGE_SIZE);
+   }

Why '+ _PAGE_TABLE_NOENC', while not ' |' ?

Also, it isn't clear what is where at this stage and it would be helpful to add 
comments explaining the general purpose.

There's also two main objects here:

 - the mode switching code trampoline
 - the trampoline pagetable

it's not clear from this code where is which - and the naming isn't overly 
clear 
either: is '*trampoline' the code, or the pagetable, or both?

We need to re-do this as we have now run into _exactly_ the kind of difficult 
to 
debug bug that I was worried about when I insisted on the many iterations of 
this 
patch-set...

Thanks,

Ingo


Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory

2018-02-25 Thread Ingo Molnar

* Kirill A. Shutemov  wrote:

> +#if 0
>   /*
>* Find a suitable spot for the trampoline.
>* This code is based on reserve_bios_regions().
> @@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
>   /* Place the trampoline just below the end of low memory, aligned to 4k 
> */
>   paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
>   paging_config.trampoline_start = 
> round_down(paging_config.trampoline_start, PAGE_SIZE);
> +#else
> + paging_config.trampoline_start = 0x99000;
> +#endif

So if it's suspected to be 'Video BIOS undeclared RAM use' related then 
wouldn't a 
lower address be safer?

Such as:

paging_config.trampoline_start = 0x4;

or so?

Also, could do a puts() hexdump of the affected memory area _before_ we 
overwrite 
it? Is it empty? Could we add some debug warning that checks that it's all 
zeroes?

I also kind of regret that this remained a single commit:

 3 files changed, 120 insertions(+), 1 deletion(-)

this should be split up further:

 - one patch that adds trampoline space to the kernel image
 - one patch that calculates the trampoline address and prints the address
 - one or two patch that does the functional changes
 - (any more split-up you can think of - early boot code is very fragile!)

It will be painful to rebase x86/mm but I think it's unavoidable at this stage.

There's also a few other things I don't like in paging_prepare():

1)

/* Check if LA57 is desired and supported */
if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31
paging_config.l5_required = 1;

... it isn't explained why this feature CPU check is so complex.

2)

+   /* Place the trampoline just below the end of low memory, aligned to 4k 
*/
+   paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+   paging_config.trampoline_start = 
round_down(paging_config.trampoline_start, PAGE_SIZE);

placing trampolines just below or just above BIOS images is fragile. Instead a 
better heuristic is to use the "middle" of suspected available RAM and work 
from 
there.

3)

+   /* Clear trampoline memory first */
+   memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);

Memory bootup state is typically all zeroes (except maybe for kexec), so this 
should check that what it's clearing doesn't contain any data.

It should probably also clear this memory _after_ use.

4)

+   /*
+* Set up a new page table that will be used for switching from 4-
+* to 5-level paging or vice versa. In other cases trampoline
+* wouldn't touch CR3.
+*
+* For 4- to 5-level paging transition, set up current CR3 as the
+* first and the only entry in a new top-level page table.
+*
+* For 5- to 4-level paging transition, copy page table pointed by
+* first entry in the current top-level page table as our new
+* top-level page table. We just cannot point to the page table
+* from trampoline as it may be above 4G.
+*/
+   if (paging_config.l5_required) {
+   trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = 
__native_read_cr3() + _PAGE_TABLE_NOENC;
+   } else if (native_read_cr4() & X86_CR4_LA57) {
+   unsigned long src;
+
+   src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+   memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / 
sizeof(unsigned long),
+  (void *)src, PAGE_SIZE);
+   }

Why '+ _PAGE_TABLE_NOENC', while not ' |' ?

Also, it isn't clear what is where at this stage and it would be helpful to add 
comments explaining the general purpose.

There's also two main objects here:

 - the mode switching code trampoline
 - the trampoline pagetable

it's not clear from this code where is which - and the naming isn't overly 
clear 
either: is '*trampoline' the code, or the pagetable, or both?

We need to re-do this as we have now run into _exactly_ the kind of difficult 
to 
debug bug that I was worried about when I insisted on the many iterations of 
this 
patch-set...

Thanks,

Ingo


Re: [PATCH v12 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled

2018-02-25 Thread Ram Pai
On Sun, Feb 25, 2018 at 05:27:11PM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On 02/24/2018 06:35 AM, Ram Pai wrote:
> >On Fri, Feb 23, 2018 at 03:11:45PM +0800, kbuild test robot wrote:
> >>Hi Ram,
> >>
> >>Thank you for the patch! Yet something to improve:
> >>
> >>[auto build test ERROR on linus/master]
> >>[also build test ERROR on v4.16-rc2 next-20180222]
> >>[if your patch is applied to the wrong git tree, please drop us a note to 
> >>help improve the system]
> >>
> >> chmod +x ~/bin/make.cross
> >...snip..
> >> # save the attached .config to linux build tree
> >> make.cross ARCH=powerpc
> >>
> >>Note: the 
> >>linux-review/Ram-Pai/mm-x86-powerpc-Enhancements-to-Memory-Protection-Keys/20180223-042743
> >> HEAD c5692bca45543c242ffca15c811923e4c548ed19 builds fine.
> >>   It only hurts bisectibility.
> >
> >oops, it broke git-bisect on powerpc :-(
> >The following change will fix it. This should nail it down.
> >
> >diff --git a/arch/powerpc/include/asm/pkeys.h
> >b/arch/powerpc/include/asm/pkeys.h
> >index 0409c80..0b3b669 100644
> >--- a/arch/powerpc/include/asm/pkeys.h
> >+++ b/arch/powerpc/include/asm/pkeys.h
> >@@ -25,6 +25,7 @@
> >  # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
> >  # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
> >  # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
> >  # define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> >+#elif !defined(VM_PKEY_BIT4)
> >+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> >#endif
> >
> 
> Why don't you remove this powerpc definition completely in this
> patch? 

That was my thought too, but refrained from sneaking in the changes into
the patch, to maintain the integrity of all the reviewed-by.

Was planning on sending a seperate patch to remove the
powerpc definition entirely.

RP



Re: [PATCH v12 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled

2018-02-25 Thread Ram Pai
On Sun, Feb 25, 2018 at 05:27:11PM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On 02/24/2018 06:35 AM, Ram Pai wrote:
> >On Fri, Feb 23, 2018 at 03:11:45PM +0800, kbuild test robot wrote:
> >>Hi Ram,
> >>
> >>Thank you for the patch! Yet something to improve:
> >>
> >>[auto build test ERROR on linus/master]
> >>[also build test ERROR on v4.16-rc2 next-20180222]
> >>[if your patch is applied to the wrong git tree, please drop us a note to 
> >>help improve the system]
> >>
> >> chmod +x ~/bin/make.cross
> >...snip..
> >> # save the attached .config to linux build tree
> >> make.cross ARCH=powerpc
> >>
> >>Note: the 
> >>linux-review/Ram-Pai/mm-x86-powerpc-Enhancements-to-Memory-Protection-Keys/20180223-042743
> >> HEAD c5692bca45543c242ffca15c811923e4c548ed19 builds fine.
> >>   It only hurts bisectibility.
> >
> >oops, it broke git-bisect on powerpc :-(
> >The following change will fix it. This should nail it down.
> >
> >diff --git a/arch/powerpc/include/asm/pkeys.h
> >b/arch/powerpc/include/asm/pkeys.h
> >index 0409c80..0b3b669 100644
> >--- a/arch/powerpc/include/asm/pkeys.h
> >+++ b/arch/powerpc/include/asm/pkeys.h
> >@@ -25,6 +25,7 @@
> >  # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
> >  # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
> >  # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
> >  # define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> >+#elif !defined(VM_PKEY_BIT4)
> >+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> >#endif
> >
> 
> Why don't you remove this powerpc definition completely in this
> patch? 

That was my thought too, but refrained from sneaking in the changes into
the patch, to maintain the integrity of all the reviewed-by.

Was planning on sending a seperate patch to remove the
powerpc definition entirely.

RP



[PATCH] KVM: X86: Allow userspace to define the microcode version

2018-02-25 Thread Wanpeng Li
From: Wanpeng Li 

Linux (among the others) has checks to make sure that certain features 
aren't enabled on a certain family/model/stepping if the microcode version 
isn't greater than or equal to a known good version.

By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c  | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938d453..6e13f2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -507,6 +507,7 @@ struct kvm_vcpu_arch {
u64 smi_count;
bool tpr_access_reporting;
u64 ia32_xss;
+   u32 microcode_version;
 
/*
 * Paging state of the vcpu
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3ed81..cc51c61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2247,7 +2247,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 
switch (msr) {
case MSR_AMD64_NB_CFG:
-   case MSR_IA32_UCODE_REV:
case MSR_IA32_UCODE_WRITE:
case MSR_VM_HSAVE_PA:
case MSR_AMD64_PATCH_LOADER:
@@ -2255,6 +2254,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_AMD64_DC_CFG:
break;
 
+   case MSR_IA32_UCODE_REV:
+   if (msr_info->host_initiated)
+   vcpu->arch.microcode_version = data >> 32;
+   break;
case MSR_EFER:
return set_efer(vcpu, data);
case MSR_K7_HWCR:
@@ -2550,7 +2553,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
msr_info->data = 0;
break;
case MSR_IA32_UCODE_REV:
-   msr_info->data = 0x1ULL;
+   msr_info->data = (u64)vcpu->arch.microcode_version << 32;
break;
case MSR_MTRRcap:
case 0x200 ... 0x2ff:
@@ -8232,6 +8235,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
vcpu->arch.regs_dirty = ~0;
 
vcpu->arch.ia32_xss = 0;
+   vcpu->arch.microcode_version = 0x1;
 
kvm_x86_ops->vcpu_reset(vcpu, init_event);
 }
-- 
2.7.4



[PATCH] KVM: X86: Allow userspace to define the microcode version

2018-02-25 Thread Wanpeng Li
From: Wanpeng Li 

Linux (among the others) has checks to make sure that certain features 
aren't enabled on a certain family/model/stepping if the microcode version 
isn't greater than or equal to a known good version.

By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c  | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938d453..6e13f2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -507,6 +507,7 @@ struct kvm_vcpu_arch {
u64 smi_count;
bool tpr_access_reporting;
u64 ia32_xss;
+   u32 microcode_version;
 
/*
 * Paging state of the vcpu
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3ed81..cc51c61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2247,7 +2247,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 
switch (msr) {
case MSR_AMD64_NB_CFG:
-   case MSR_IA32_UCODE_REV:
case MSR_IA32_UCODE_WRITE:
case MSR_VM_HSAVE_PA:
case MSR_AMD64_PATCH_LOADER:
@@ -2255,6 +2254,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_AMD64_DC_CFG:
break;
 
+   case MSR_IA32_UCODE_REV:
+   if (msr_info->host_initiated)
+   vcpu->arch.microcode_version = data >> 32;
+   break;
case MSR_EFER:
return set_efer(vcpu, data);
case MSR_K7_HWCR:
@@ -2550,7 +2553,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
msr_info->data = 0;
break;
case MSR_IA32_UCODE_REV:
-   msr_info->data = 0x1ULL;
+   msr_info->data = (u64)vcpu->arch.microcode_version << 32;
break;
case MSR_MTRRcap:
case 0x200 ... 0x2ff:
@@ -8232,6 +8235,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
vcpu->arch.regs_dirty = ~0;
 
vcpu->arch.ia32_xss = 0;
+   vcpu->arch.microcode_version = 0x1;
 
kvm_x86_ops->vcpu_reset(vcpu, init_event);
 }
-- 
2.7.4



Re: [v1 1/1] xen, mm: Allow deferred page initialization for xen pv domains

2018-02-25 Thread Juergen Gross
On 24/02/18 00:25, Pavel Tatashin wrote:
> Juergen Gross noticed that commit
> f7f99100d8d ("mm: stop zeroing memory during allocation in vmemmap")
> broke XEN PV domains when deferred struct page initialization is enabled.
> 
> This is because the xen's PagePinned() flag is getting erased from struct
> pages when they are initialized later in boot.
> 
> Juergen fixed this problem by disabling deferred pages on xen pv domains.
> However, it is desirable to have this feature available, as it reduces boot
> time. This fix re-enables the feature for pv-dmains, and fixes the problem
> the following way:
> 
> The fix is to delay setting PagePinned flag until struct pages for all
> allocated memory are initialized (until free_all_bootmem()).
> 
> A new hypervisor op pv_init_ops.after_bootmem() is called to let xen know
> that boot allocator is done, and hence struct pages for all the allocated
> memory are now initialized. If deferred page initialization is enabled, the
> rest of struct pages are going to be initialized later in boot once
> page_alloc_init_late() is called.
> 
> xen_after_bootmem() is xen's implementation of pv_init_ops.after_bootmem(),
> we walk page table and mark every page as pinned.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/x86/include/asm/paravirt.h   |  9 +
>  arch/x86/include/asm/paravirt_types.h |  3 +++
>  arch/x86/kernel/paravirt.c|  1 +
>  arch/x86/mm/init_32.c |  1 +
>  arch/x86/mm/init_64.c |  1 +
>  arch/x86/xen/mmu_pv.c | 38 
> ---
>  mm/page_alloc.c   |  4 
>  7 files changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 9be2bf13825b..737e596a9836 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -820,6 +820,11 @@ static inline notrace unsigned long 
> arch_local_irq_save(void)
>  
>  extern void default_banner(void);
>  
> +static inline void paravirt_after_bootmem(void)
> +{
> + pv_init_ops.after_bootmem();
> +}
> +

Putting this in the paravirt framework is overkill IMO. There is no need
to patch the callsites for optimal performance.

I'd put it into struct x86_hyper_init and pre-init it with x86_init_noop

>  #else  /* __ASSEMBLY__ */
>  
>  #define _PVSITE(ptype, clobbers, ops, word, algn)\
> @@ -964,6 +969,10 @@ static inline void paravirt_arch_dup_mmap(struct 
> mm_struct *oldmm,
>  static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>  {
>  }
> +
> +static inline void paravirt_after_bootmem(void)
> +{
> +}
>  #endif /* __ASSEMBLY__ */
>  #endif /* !CONFIG_PARAVIRT */
>  #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 180bc0bff0fb..da78a3610168 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -86,6 +86,9 @@ struct pv_init_ops {
>*/
>   unsigned (*patch)(u8 type, u16 clobber, void *insnbuf,
> unsigned long addr, unsigned len);
> +
> + /* called right after we finish boot allocator */
> + void (*after_bootmem)(void);
>  } __no_randomize_layout;
>  
>  
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 99dc79e76bdc..7b5f931e2e3a 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -315,6 +315,7 @@ struct pv_info pv_info = {
>  
>  struct pv_init_ops pv_init_ops = {
>   .patch = native_patch,
> + .after_bootmem = paravirt_nop,
>  };
>  
>  struct pv_time_ops pv_time_ops = {
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 79cb066f40c0..6096d0d9ecbc 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -763,6 +763,7 @@ void __init mem_init(void)
>   free_all_bootmem();
>  
>   after_bootmem = 1;
> + paravirt_after_bootmem();
>  
>   mem_init_print_info(NULL);
>   printk(KERN_INFO "virtual kernel memory layout:\n"
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 332f6e25977a..70b7b5093d07 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1189,6 +1189,7 @@ void __init mem_init(void)
>   /* this will put all memory onto the freelists */
>   free_all_bootmem();
>   after_bootmem = 1;
> + paravirt_after_bootmem();
>  
>   /*
>* Must be done after boot memory is put on freelist, because here we
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index d20763472920..603589809334 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -116,6 +116,8 @@ DEFINE_PER_CPU(unsigned long, xen_current_cr3);/* 
> actual vcpu cr3 */
>  
>  static phys_addr_t xen_pt_base, xen_pt_size __initdata;
>  
> +static DEFINE_STATIC_KEY_FALSE(xen_struct_pages_ready);
> +
>  /*
>   * Just beyond 

Re: [v1 1/1] xen, mm: Allow deferred page initialization for xen pv domains

2018-02-25 Thread Juergen Gross
On 24/02/18 00:25, Pavel Tatashin wrote:
> Juergen Gross noticed that commit
> f7f99100d8d ("mm: stop zeroing memory during allocation in vmemmap")
> broke XEN PV domains when deferred struct page initialization is enabled.
> 
> This is because the xen's PagePinned() flag is getting erased from struct
> pages when they are initialized later in boot.
> 
> Juergen fixed this problem by disabling deferred pages on xen pv domains.
> However, it is desirable to have this feature available, as it reduces boot
> time. This fix re-enables the feature for pv-dmains, and fixes the problem
> the following way:
> 
> The fix is to delay setting PagePinned flag until struct pages for all
> allocated memory are initialized (until free_all_bootmem()).
> 
> A new hypervisor op pv_init_ops.after_bootmem() is called to let xen know
> that boot allocator is done, and hence struct pages for all the allocated
> memory are now initialized. If deferred page initialization is enabled, the
> rest of struct pages are going to be initialized later in boot once
> page_alloc_init_late() is called.
> 
> xen_after_bootmem() is xen's implementation of pv_init_ops.after_bootmem(),
> we walk page table and mark every page as pinned.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/x86/include/asm/paravirt.h   |  9 +
>  arch/x86/include/asm/paravirt_types.h |  3 +++
>  arch/x86/kernel/paravirt.c|  1 +
>  arch/x86/mm/init_32.c |  1 +
>  arch/x86/mm/init_64.c |  1 +
>  arch/x86/xen/mmu_pv.c | 38 
> ---
>  mm/page_alloc.c   |  4 
>  7 files changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 9be2bf13825b..737e596a9836 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -820,6 +820,11 @@ static inline notrace unsigned long 
> arch_local_irq_save(void)
>  
>  extern void default_banner(void);
>  
> +static inline void paravirt_after_bootmem(void)
> +{
> + pv_init_ops.after_bootmem();
> +}
> +

Putting this in the paravirt framework is overkill IMO. There is no need
to patch the callsites for optimal performance.

I'd put it into struct x86_hyper_init and pre-init it with x86_init_noop

>  #else  /* __ASSEMBLY__ */
>  
>  #define _PVSITE(ptype, clobbers, ops, word, algn)\
> @@ -964,6 +969,10 @@ static inline void paravirt_arch_dup_mmap(struct 
> mm_struct *oldmm,
>  static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>  {
>  }
> +
> +static inline void paravirt_after_bootmem(void)
> +{
> +}
>  #endif /* __ASSEMBLY__ */
>  #endif /* !CONFIG_PARAVIRT */
>  #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 180bc0bff0fb..da78a3610168 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -86,6 +86,9 @@ struct pv_init_ops {
>*/
>   unsigned (*patch)(u8 type, u16 clobber, void *insnbuf,
> unsigned long addr, unsigned len);
> +
> + /* called right after we finish boot allocator */
> + void (*after_bootmem)(void);
>  } __no_randomize_layout;
>  
>  
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 99dc79e76bdc..7b5f931e2e3a 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -315,6 +315,7 @@ struct pv_info pv_info = {
>  
>  struct pv_init_ops pv_init_ops = {
>   .patch = native_patch,
> + .after_bootmem = paravirt_nop,
>  };
>  
>  struct pv_time_ops pv_time_ops = {
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 79cb066f40c0..6096d0d9ecbc 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -763,6 +763,7 @@ void __init mem_init(void)
>   free_all_bootmem();
>  
>   after_bootmem = 1;
> + paravirt_after_bootmem();
>  
>   mem_init_print_info(NULL);
>   printk(KERN_INFO "virtual kernel memory layout:\n"
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 332f6e25977a..70b7b5093d07 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1189,6 +1189,7 @@ void __init mem_init(void)
>   /* this will put all memory onto the freelists */
>   free_all_bootmem();
>   after_bootmem = 1;
> + paravirt_after_bootmem();
>  
>   /*
>* Must be done after boot memory is put on freelist, because here we
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index d20763472920..603589809334 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -116,6 +116,8 @@ DEFINE_PER_CPU(unsigned long, xen_current_cr3);/* 
> actual vcpu cr3 */
>  
>  static phys_addr_t xen_pt_base, xen_pt_size __initdata;
>  
> +static DEFINE_STATIC_KEY_FALSE(xen_struct_pages_ready);
> +
>  /*
>   * Just beyond the highest usermode address. 

Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Sergey Senozhatsky
On (02/26/18 14:58), Minchan Kim wrote:
[..]
> > Right. The changes are pretty trivial, that's why I kept then in
> > 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> > changes.
> 
> As I said earlier, it's not thing we usually do, at least, MM.
> Anyway, I don't want to insist on it because it depends each
> person's point of view what's the better for review, git-bisect.

Thanks :)

> > >   size_t huge_size = _zs_huge_object(pool);
> > >   ..
> > >   ..
> > >   if (comp_size >= huge_size)
> > >   memcpy(dst, src, 4K);
> > 
> > Yes, can do. My plan was to keep it completely internally to zsmalloc.
> > Who knows, it might become smart enough one day to do something more
> > than just size comparison. Any reason you used that leading underscore
> 
> Let's do that in future if someone want it. :)

OK.

> > in _zs_huge_object()?
> 
> 
> Nope. It's just typo. Let's think better name.
> How about using zs_huge_size()?

hm, I think `huge_size' on it's own is a bit general and cryptic.
zs_huge_object_size() or zs_huge_class_size()?

-ss


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Sergey Senozhatsky
On (02/26/18 14:58), Minchan Kim wrote:
[..]
> > Right. The changes are pretty trivial, that's why I kept then in
> > 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> > changes.
> 
> As I said earlier, it's not thing we usually do, at least, MM.
> Anyway, I don't want to insist on it because it depends each
> person's point of view what's the better for review, git-bisect.

Thanks :)

> > >   size_t huge_size = _zs_huge_object(pool);
> > >   ..
> > >   ..
> > >   if (comp_size >= huge_size)
> > >   memcpy(dst, src, 4K);
> > 
> > Yes, can do. My plan was to keep it completely internally to zsmalloc.
> > Who knows, it might become smart enough one day to do something more
> > than just size comparison. Any reason you used that leading underscore
> 
> Let's do that in future if someone want it. :)

OK.

> > in _zs_huge_object()?
> 
> 
> Nope. It's just typo. Let's think better name.
> How about using zs_huge_size()?

hm, I think `huge_size' on it's own is a bit general and cryptic.
zs_huge_object_size() or zs_huge_class_size()?

-ss


Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Christophe LEROY



Le 26/02/2018 à 07:34, Christophe LEROY a écrit :



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
Fix warning (treated as error in W=1):

   CC  arch/powerpc/kernel/signal_32.o
In file included from ./include/linux/uaccess.h:14:0,
  from ./include/asm-generic/termios-base.h:8,
  from ./arch/powerpc/include/asm/termios.h:20,
  from ./include/uapi/linux/termios.h:6,
  from ./include/linux/tty.h:7,
  from arch/powerpc/kernel/signal_32.c:36:
./include/asm-generic/termios-base.h: In function 
‘user_termio_to_kernel_termios’:
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of 
unsigned expression >= 0 is always true [-Werror=type-limits]

    (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
    ^
./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro 
‘__access_ok’

    __access_ok((__force unsigned long)(addr), (size), get_fs()))
    ^~~
./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of 
macro ‘access_ok’

   if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
   ^
./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro 
‘__get_user_check’

   __get_user_check((x), (ptr), sizeof(*(ptr)))
   ^~~~
./include/asm-generic/termios-base.h:36:6: note: in expansion of macro 
‘get_user’

   if (get_user(termios->c_line, >c_line) < 0)
   ^~~~
[...]
cc1: all warnings being treated as errors

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/uaccess.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h

index 51bfeb8777f0..fadc406bd39d 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -49,7 +49,7 @@
  #define __access_ok(addr, size, segment)    \
  (((addr) <= (segment).seg) &&    \
- (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
+ (((size) == 0) || ((size) < ((segment).seg - (addr)


IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?



Note that I already try to submit a fix for this warning 3 years ago 
(https://patchwork.ozlabs.org/patch/418075/) and it was rejected with 
the following comment:


Again, I don't think Linux enables this warning.  What did you do to
produce this?  In any case, it's a bad warning that doesn't take macros
into account, and the answer is not to make the code less clear by hiding
the fact that zero is a special case.

Christophe




Christophe


  #endif



Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Christophe LEROY



Le 26/02/2018 à 07:34, Christophe LEROY a écrit :



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
Fix warning (treated as error in W=1):

   CC  arch/powerpc/kernel/signal_32.o
In file included from ./include/linux/uaccess.h:14:0,
  from ./include/asm-generic/termios-base.h:8,
  from ./arch/powerpc/include/asm/termios.h:20,
  from ./include/uapi/linux/termios.h:6,
  from ./include/linux/tty.h:7,
  from arch/powerpc/kernel/signal_32.c:36:
./include/asm-generic/termios-base.h: In function 
‘user_termio_to_kernel_termios’:
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of 
unsigned expression >= 0 is always true [-Werror=type-limits]

    (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
    ^
./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro 
‘__access_ok’

    __access_ok((__force unsigned long)(addr), (size), get_fs()))
    ^~~
./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of 
macro ‘access_ok’

   if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
   ^
./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro 
‘__get_user_check’

   __get_user_check((x), (ptr), sizeof(*(ptr)))
   ^~~~
./include/asm-generic/termios-base.h:36:6: note: in expansion of macro 
‘get_user’

   if (get_user(termios->c_line, >c_line) < 0)
   ^~~~
[...]
cc1: all warnings being treated as errors

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/uaccess.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h

index 51bfeb8777f0..fadc406bd39d 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -49,7 +49,7 @@
  #define __access_ok(addr, size, segment)    \
  (((addr) <= (segment).seg) &&    \
- (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
+ (((size) == 0) || ((size) < ((segment).seg - (addr)


IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?



Note that I already try to submit a fix for this warning 3 years ago 
(https://patchwork.ozlabs.org/patch/418075/) and it was rejected with 
the following comment:


Again, I don't think Linux enables this warning.  What did you do to
produce this?  In any case, it's a bad warning that doesn't take macros
into account, and the answer is not to make the code less clear by hiding
the fact that zero is a special case.

Christophe




Christophe


  #endif



Re: [PATCH] tee: correct max value for id allocation

2018-02-25 Thread Jens Wiklander
On Fri, Feb 23, 2018 at 3:43 AM, Peng Fan  wrote:
> Hi Jens,
>
> Any comments on this patch?
>
> Thanks,
> Peng

Sorry, forgot about this one. Looks good to me, I'll pick it up.

Thanks,
Jens

> On Mon, Jan 15, 2018 at 05:27:35PM +0800, Peng Fan wrote:
>>The privileged dev id range is [TEE_NUM_DEVICES / 2, TEE_NUM_DEVICES).
>>The non-privileged dev id range is [0, TEE_NUM_DEVICES / 2).
>>
>>So when finding a slot for them, need to use different max value.
>>
>>Signed-off-by: Peng Fan 
>>Cc: Jens Wiklander 
>>---
>> drivers/tee/tee_core.c | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>>diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
>>index 3d49ac2e3c84..d1f4ecdbc8cc 100644
>>--- a/drivers/tee/tee_core.c
>>+++ b/drivers/tee/tee_core.c
>>@@ -701,7 +701,7 @@ struct tee_device *tee_device_alloc(const struct tee_desc 
>>*teedesc,
>> {
>>   struct tee_device *teedev;
>>   void *ret;
>>-  int rc;
>>+  int rc, max_id;
>>   int offs = 0;
>>
>>   if (!teedesc || !teedesc->name || !teedesc->ops ||
>>@@ -715,16 +715,20 @@ struct tee_device *tee_device_alloc(const struct 
>>tee_desc *teedesc,
>>   goto err;
>>   }
>>
>>-  if (teedesc->flags & TEE_DESC_PRIVILEGED)
>>+  max_id = TEE_NUM_DEVICES / 2;
>>+
>>+  if (teedesc->flags & TEE_DESC_PRIVILEGED) {
>>   offs = TEE_NUM_DEVICES / 2;
>>+  max_id = TEE_NUM_DEVICES;
>>+  }
>>
>>   spin_lock(_lock);
>>-  teedev->id = find_next_zero_bit(dev_mask, TEE_NUM_DEVICES, offs);
>>-  if (teedev->id < TEE_NUM_DEVICES)
>>+  teedev->id = find_next_zero_bit(dev_mask, max_id, offs);
>>+  if (teedev->id < max_id)
>>   set_bit(teedev->id, dev_mask);
>>   spin_unlock(_lock);
>>
>>-  if (teedev->id >= TEE_NUM_DEVICES) {
>>+  if (teedev->id >= max_id) {
>>   ret = ERR_PTR(-ENOMEM);
>>   goto err;
>>   }
>>--
>>2.14.1
>>
>
> --


Re: [PATCH] tee: correct max value for id allocation

2018-02-25 Thread Jens Wiklander
On Fri, Feb 23, 2018 at 3:43 AM, Peng Fan  wrote:
> Hi Jens,
>
> Any comments on this patch?
>
> Thanks,
> Peng

Sorry, forgot about this one. Looks good to me, I'll pick it up.

Thanks,
Jens

> On Mon, Jan 15, 2018 at 05:27:35PM +0800, Peng Fan wrote:
>>The privileged dev id range is [TEE_NUM_DEVICES / 2, TEE_NUM_DEVICES).
>>The non-privileged dev id range is [0, TEE_NUM_DEVICES / 2).
>>
>>So when finding a slot for them, need to use different max value.
>>
>>Signed-off-by: Peng Fan 
>>Cc: Jens Wiklander 
>>---
>> drivers/tee/tee_core.c | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>>diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
>>index 3d49ac2e3c84..d1f4ecdbc8cc 100644
>>--- a/drivers/tee/tee_core.c
>>+++ b/drivers/tee/tee_core.c
>>@@ -701,7 +701,7 @@ struct tee_device *tee_device_alloc(const struct tee_desc 
>>*teedesc,
>> {
>>   struct tee_device *teedev;
>>   void *ret;
>>-  int rc;
>>+  int rc, max_id;
>>   int offs = 0;
>>
>>   if (!teedesc || !teedesc->name || !teedesc->ops ||
>>@@ -715,16 +715,20 @@ struct tee_device *tee_device_alloc(const struct 
>>tee_desc *teedesc,
>>   goto err;
>>   }
>>
>>-  if (teedesc->flags & TEE_DESC_PRIVILEGED)
>>+  max_id = TEE_NUM_DEVICES / 2;
>>+
>>+  if (teedesc->flags & TEE_DESC_PRIVILEGED) {
>>   offs = TEE_NUM_DEVICES / 2;
>>+  max_id = TEE_NUM_DEVICES;
>>+  }
>>
>>   spin_lock(_lock);
>>-  teedev->id = find_next_zero_bit(dev_mask, TEE_NUM_DEVICES, offs);
>>-  if (teedev->id < TEE_NUM_DEVICES)
>>+  teedev->id = find_next_zero_bit(dev_mask, max_id, offs);
>>+  if (teedev->id < max_id)
>>   set_bit(teedev->id, dev_mask);
>>   spin_unlock(_lock);
>>
>>-  if (teedev->id >= TEE_NUM_DEVICES) {
>>+  if (teedev->id >= max_id) {
>>   ret = ERR_PTR(-ENOMEM);
>>   goto err;
>>   }
>>--
>>2.14.1
>>
>
> --


Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Huang, Ying
Minchan Kim  writes:

> Hi Jan,
>
> On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
>> Hi Minchan,
>> 
>> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
>> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
>> > > From: Huang Ying 
>> > > 
>> > > When page_mapping() is called and the mapping is dereferenced in
>> > > page_evicatable() through shrink_active_list(), it is possible for the
>> > > inode to be truncated and the embedded address space to be freed at
>> > > the same time.  This may lead to the following race.
>> > > 
>> > > CPU1CPU2
>> > > 
>> > > truncate(inode) shrink_active_list()
>> > >   ... 
>> > > page_evictable(page)
>> > >   truncate_inode_page(mapping, page);
>> > > delete_from_page_cache(page)
>> > >   spin_lock_irqsave(>tree_lock, flags);
>> > > __delete_from_page_cache(page, NULL)
>> > >   page_cache_tree_delete(..)
>> > > ... mapping = 
>> > > page_mapping(page);
>> > > page->mapping = NULL;
>> > > ...
>> > >   spin_unlock_irqrestore(>tree_lock, flags);
>> > >   page_cache_free_page(mapping, page)
>> > > put_page(page)
>> > >   if (put_page_testzero(page)) -> false
>> > > - inode now has no pages and can be freed including embedded 
>> > > address_space
>> > > 
>> > > 
>> > > mapping_unevictable(mapping)
>> > >
>> > > test_bit(AS_UNEVICTABLE, >flags);
>> > > - we've dereferenced mapping which is potentially already free.
>> > > 
>> > > Similar race exists between swap cache freeing and page_evicatable() too.
>> > > 
>> > > The address_space in inode and swap cache will be freed after a RCU
>> > > grace period.  So the races are fixed via enclosing the page_mapping()
>> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
>> > > added in code to make it clear what is protected by the RCU read lock.
>> > 
>> > Is it always true for every FSes, even upcoming FSes?
>> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
>> > to destroy inode?
>> > 
>> > Let's cc linux-fs.
>> 
>> That's actually a good question. Pathname lookup relies on inodes being
>> protected by RCU so "normal" filesystems definitely need to use RCU freeing
>> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
>> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
>> with freeing its inodes normally AFAICT. I don't see that happening
>> anywhere in the tree but in theory it is possible with some effort... But
>> frankly I don't see a good reason for that so all we should do is to
>> document that .destroy_inode needs to free the inode structure through RCU
>> if it uses page cache? Al?
>
> Yub, it would be much better. However, how does this patch fix the problem?
> Although it can make only page_evictable safe, we could go with the page
> further and finally uses page->mapping, again.
> For instance,
>
> shrink_active_list
>   page_evictable();
>   ..
>   page_referened()
>   page_rmapping
>   page->mapping

This only checks the value of page->mapping, not deference
page->mapping.  So it should be safe.

Best Regards,
Huang, Ying

> I think caller should lock the page to protect entire operation, which
> have been used more widely to pin a address_space.
>
> Thanks.


Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Huang, Ying
Minchan Kim  writes:

> Hi Jan,
>
> On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
>> Hi Minchan,
>> 
>> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
>> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
>> > > From: Huang Ying 
>> > > 
>> > > When page_mapping() is called and the mapping is dereferenced in
>> > > page_evicatable() through shrink_active_list(), it is possible for the
>> > > inode to be truncated and the embedded address space to be freed at
>> > > the same time.  This may lead to the following race.
>> > > 
>> > > CPU1CPU2
>> > > 
>> > > truncate(inode) shrink_active_list()
>> > >   ... 
>> > > page_evictable(page)
>> > >   truncate_inode_page(mapping, page);
>> > > delete_from_page_cache(page)
>> > >   spin_lock_irqsave(>tree_lock, flags);
>> > > __delete_from_page_cache(page, NULL)
>> > >   page_cache_tree_delete(..)
>> > > ... mapping = 
>> > > page_mapping(page);
>> > > page->mapping = NULL;
>> > > ...
>> > >   spin_unlock_irqrestore(>tree_lock, flags);
>> > >   page_cache_free_page(mapping, page)
>> > > put_page(page)
>> > >   if (put_page_testzero(page)) -> false
>> > > - inode now has no pages and can be freed including embedded 
>> > > address_space
>> > > 
>> > > 
>> > > mapping_unevictable(mapping)
>> > >
>> > > test_bit(AS_UNEVICTABLE, >flags);
>> > > - we've dereferenced mapping which is potentially already free.
>> > > 
>> > > Similar race exists between swap cache freeing and page_evicatable() too.
>> > > 
>> > > The address_space in inode and swap cache will be freed after a RCU
>> > > grace period.  So the races are fixed via enclosing the page_mapping()
>> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
>> > > added in code to make it clear what is protected by the RCU read lock.
>> > 
>> > Is it always true for every FSes, even upcoming FSes?
>> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
>> > to destroy inode?
>> > 
>> > Let's cc linux-fs.
>> 
>> That's actually a good question. Pathname lookup relies on inodes being
>> protected by RCU so "normal" filesystems definitely need to use RCU freeing
>> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
>> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
>> with freeing its inodes normally AFAICT. I don't see that happening
>> anywhere in the tree but in theory it is possible with some effort... But
>> frankly I don't see a good reason for that so all we should do is to
>> document that .destroy_inode needs to free the inode structure through RCU
>> if it uses page cache? Al?
>
> Yub, it would be much better. However, how does this patch fix the problem?
> Although it can make only page_evictable safe, we could go with the page
> further and finally uses page->mapping, again.
> For instance,
>
> shrink_active_list
>   page_evictable();
>   ..
>   page_referened()
>   page_rmapping
>   page->mapping

This only checks the value of page->mapping, not deference
page->mapping.  So it should be safe.

Best Regards,
Huang, Ying

> I think caller should lock the page to protect entire operation, which
> have been used more widely to pin a address_space.
>
> Thanks.


Re: [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()

2018-02-25 Thread Sergey Senozhatsky
On (02/19/18 17:01), Petr Mladek wrote:
[..]
> - raw_spin_lock(_lock);
>   retry = console_seq != log_next_seq;
> + /*
> +  * Check whether userland needs notification. Do this only when really
> +  * leaving to avoid race with console_trylock_spinning().
> +  */
> + if (seen_seq != log_next_seq && !retry) {
> + wake_klogd = true;
> + seen_seq = log_next_seq;
> + }

Let's add the "why" part. This "!retry" might be hard to understand. We
are looking at

- CPUa is about to leave console_unlock()
- printk on CPUb appends a new message
- CPUa detects that `console_seq != log_next_seq', updates `seen_seq'
- printk on CPUb is getting preempted
- CPUa re-takes the console_sem via retry path
- printk CPUb is becoming TASK_RUNNING again - it now spins for console_sem,
  since we have an active console_sem owner
- CPUa detects that there is a console_sem waiter, so it offloads the
  printing task, without ever waking up klogd


Either we can have that complex "seen_seq != log_next_seq && !retry"
check - or we simply can add

if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
if (wake_klogd)
wake_up_klogd();
}

to the offloading return path.

The later is *may be* simpler to follow. The rule is: every
!console_suspend and !cant-use-consoles return path from console_unlock()
must wake_up_klogd() [if needed].

-ss


Re: [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()

2018-02-25 Thread Sergey Senozhatsky
On (02/19/18 17:01), Petr Mladek wrote:
[..]
> - raw_spin_lock(_lock);
>   retry = console_seq != log_next_seq;
> + /*
> +  * Check whether userland needs notification. Do this only when really
> +  * leaving to avoid race with console_trylock_spinning().
> +  */
> + if (seen_seq != log_next_seq && !retry) {
> + wake_klogd = true;
> + seen_seq = log_next_seq;
> + }

Let's add the "why" part. This "!retry" might be hard to understand. We
are looking at

- CPUa is about to leave console_unlock()
- printk on CPUb appends a new message
- CPUa detects that `console_seq != log_next_seq', updates `seen_seq'
- printk on CPUb is getting preempted
- CPUa re-takes the console_sem via retry path
- printk CPUb is becoming TASK_RUNNING again - it now spins for console_sem,
  since we have an active console_sem owner
- CPUa detects that there is a console_sem waiter, so it offloads the
  printing task, without ever waking up klogd


Either we can have that complex "seen_seq != log_next_seq && !retry"
check - or we simply can add

if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
if (wake_klogd)
wake_up_klogd();
}

to the offloading return path.

The later is *may be* simpler to follow. The rule is: every
!console_suspend and !cant-use-consoles return path from console_unlock()
must wake_up_klogd() [if needed].

-ss


Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Christophe LEROY



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
Fix warning (treated as error in W=1):

   CC  arch/powerpc/kernel/signal_32.o
In file included from ./include/linux/uaccess.h:14:0,
  from ./include/asm-generic/termios-base.h:8,
  from ./arch/powerpc/include/asm/termios.h:20,
  from ./include/uapi/linux/termios.h:6,
  from ./include/linux/tty.h:7,
  from arch/powerpc/kernel/signal_32.c:36:
./include/asm-generic/termios-base.h: In function 
‘user_termio_to_kernel_termios’:
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned 
expression >= 0 is always true [-Werror=type-limits]
(((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
^
./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro 
‘__access_ok’
__access_ok((__force unsigned long)(addr), (size), get_fs()))
^~~
./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of macro 
‘access_ok’
   if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
   ^
./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro 
‘__get_user_check’
   __get_user_check((x), (ptr), sizeof(*(ptr)))
   ^~~~
./include/asm-generic/termios-base.h:36:6: note: in expansion of macro 
‘get_user’
   if (get_user(termios->c_line, >c_line) < 0)
   ^~~~
[...]
cc1: all warnings being treated as errors

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/uaccess.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 51bfeb8777f0..fadc406bd39d 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -49,7 +49,7 @@
  
  #define __access_ok(addr, size, segment)	\

(((addr) <= (segment).seg) &&\
-(((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
+(((size) == 0) || ((size) < ((segment).seg - (addr)


IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?

Christophe

  
  #endif
  



Re: [PATCH 06/21] powerpc: Avoid comparison of unsigned long >= 0 in __access_ok

2018-02-25 Thread Christophe LEROY



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite check size - 1 <= Y as size < Y since `size` is unsigned value.
Fix warning (treated as error in W=1):

   CC  arch/powerpc/kernel/signal_32.o
In file included from ./include/linux/uaccess.h:14:0,
  from ./include/asm-generic/termios-base.h:8,
  from ./arch/powerpc/include/asm/termios.h:20,
  from ./include/uapi/linux/termios.h:6,
  from ./include/linux/tty.h:7,
  from arch/powerpc/kernel/signal_32.c:36:
./include/asm-generic/termios-base.h: In function 
‘user_termio_to_kernel_termios’:
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned 
expression >= 0 is always true [-Werror=type-limits]
(((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
^
./arch/powerpc/include/asm/uaccess.h:58:3: note: in expansion of macro 
‘__access_ok’
__access_ok((__force unsigned long)(addr), (size), get_fs()))
^~~
./arch/powerpc/include/asm/uaccess.h:262:6: note: in expansion of macro 
‘access_ok’
   if (access_ok(VERIFY_READ, __gu_addr, (size)))   \
   ^
./arch/powerpc/include/asm/uaccess.h:80:2: note: in expansion of macro 
‘__get_user_check’
   __get_user_check((x), (ptr), sizeof(*(ptr)))
   ^~~~
./include/asm-generic/termios-base.h:36:6: note: in expansion of macro 
‘get_user’
   if (get_user(termios->c_line, >c_line) < 0)
   ^~~~
[...]
cc1: all warnings being treated as errors

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/uaccess.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 51bfeb8777f0..fadc406bd39d 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -49,7 +49,7 @@
  
  #define __access_ok(addr, size, segment)	\

(((addr) <= (segment).seg) &&\
-(((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)
+(((size) == 0) || ((size) < ((segment).seg - (addr)


IIUC, ((2 - 1) <= 1) is the same as (2 < 1) ?

Christophe

  
  #endif
  



Re: [PATCH 05/21] powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid

2018-02-25 Thread Christophe LEROY



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite comparison since all values compared are of type `unsigned long`.

Fix a warning (treated as error in W=1):

   CC  arch/powerpc/kernel/irq.o
In file included from ./include/linux/bug.h:5:0,
  from ./include/linux/cpumask.h:13,
  from ./include/linux/smp.h:13,
  from ./include/linux/kernel_stat.h:5,
  from arch/powerpc/kernel/irq.c:35:
./include/linux/dma-mapping.h: In function ‘dma_map_resource’:
./arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression 
>= 0 is always true [-Werror=type-limits]
  #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
 ^
Suggested-by: Segher Boessenkool 
Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/page.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 8da5d4c1cab2..19dea64e7ed2 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -126,7 +126,8 @@ extern long long virt_phys_offset;
  
  #ifdef CONFIG_FLATMEM

  #define ARCH_PFN_OFFSET   ((unsigned long)(MEMORY_START >> 
PAGE_SHIFT))
-#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
+#define pfn_valid(pfn) \
+   (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET))


What will happen when ARCH_PFN_OFFSET is not nul and pfn is lower than 
ARCH_PFN_OFFSET ?


Christophe


  #endif
  
  #define virt_to_pfn(kaddr)	(__pa(kaddr) >> PAGE_SHIFT)




Re: [PATCH 05/21] powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid

2018-02-25 Thread Christophe LEROY



Le 25/02/2018 à 18:22, Mathieu Malaterre a écrit :

Rewrite comparison since all values compared are of type `unsigned long`.

Fix a warning (treated as error in W=1):

   CC  arch/powerpc/kernel/irq.o
In file included from ./include/linux/bug.h:5:0,
  from ./include/linux/cpumask.h:13,
  from ./include/linux/smp.h:13,
  from ./include/linux/kernel_stat.h:5,
  from arch/powerpc/kernel/irq.c:35:
./include/linux/dma-mapping.h: In function ‘dma_map_resource’:
./arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression 
>= 0 is always true [-Werror=type-limits]
  #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
 ^
Suggested-by: Segher Boessenkool 
Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/page.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 8da5d4c1cab2..19dea64e7ed2 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -126,7 +126,8 @@ extern long long virt_phys_offset;
  
  #ifdef CONFIG_FLATMEM

  #define ARCH_PFN_OFFSET   ((unsigned long)(MEMORY_START >> 
PAGE_SHIFT))
-#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
+#define pfn_valid(pfn) \
+   (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET))


What will happen when ARCH_PFN_OFFSET is not nul and pfn is lower than 
ARCH_PFN_OFFSET ?


Christophe


  #endif
  
  #define virt_to_pfn(kaddr)	(__pa(kaddr) >> PAGE_SHIFT)




Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()

2018-02-25 Thread Sergey Senozhatsky
Hello,

Sorry for the delay. Could not reply sooner.


On (02/19/18 16:58), Petr Mladek wrote:
[..]
> > @@ -2417,12 +2413,17 @@ void console_unlock(void)
> > up_console_sem();
> >  
> > /*
> > -* Someone could have filled up the buffer again, so re-check if there's
> > -* something to flush. In case we cannot trylock the console_sem again,
> > -* there's a new owner and the console_unlock() from them will do the
> > -* flush, no worries.
> > +* Check whether userland needs notification.  Also, someone could
> > +* have filled up the buffer again, so re-check if there's
> > +* something to flush. In case we cannot trylock the console_sem
> > +* again, there's a new owner and the console_unlock() from them
> > +* will do the flush, no worries.
> >  */
> > raw_spin_lock(_lock);
> > +   if (seen_seq != log_next_seq) {
> > +   wake_klogd = true;
> > +   seen_seq = log_next_seq;
> 
> Sigh, there is actually still a race with console_trylock_spinning().

I see what you are talking about. Good catch.

> The simplest solution seems to be to do this only when !retry.

Yeah, offloading makes any "internal state" flags almost impossible to
use. We either need to upgrade those "internal state" flags to "global
state" flags, or to add more complex checks/race conditions workarounds
and so on. So I think that !retry should work. But, honestly, the whole
wakeup_klogd thing is getting too complex to like it.

-ss


Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()

2018-02-25 Thread Sergey Senozhatsky
Hello,

Sorry for the delay. Could not reply sooner.


On (02/19/18 16:58), Petr Mladek wrote:
[..]
> > @@ -2417,12 +2413,17 @@ void console_unlock(void)
> > up_console_sem();
> >  
> > /*
> > -* Someone could have filled up the buffer again, so re-check if there's
> > -* something to flush. In case we cannot trylock the console_sem again,
> > -* there's a new owner and the console_unlock() from them will do the
> > -* flush, no worries.
> > +* Check whether userland needs notification.  Also, someone could
> > +* have filled up the buffer again, so re-check if there's
> > +* something to flush. In case we cannot trylock the console_sem
> > +* again, there's a new owner and the console_unlock() from them
> > +* will do the flush, no worries.
> >  */
> > raw_spin_lock(_lock);
> > +   if (seen_seq != log_next_seq) {
> > +   wake_klogd = true;
> > +   seen_seq = log_next_seq;
> 
> Sigh, there is actually still a race with console_trylock_spinning().

I see what you are talking about. Good catch.

> The simplest solution seems to be to do this only when !retry.

Yeah, offloading makes any "internal state" flags almost impossible to
use. We either need to upgrade those "internal state" flags to "global
state" flags, or to add more complex checks/race conditions workarounds
and so on. So I think that !retry should work. But, honestly, the whole
wakeup_klogd thing is getting too complex to like it.

-ss


Re: [PATCH v2] Input: gpio_keys: Add level trigger support for GPIO keys

2018-02-25 Thread Baolin Wang
Hi Rob,

On 21 February 2018 at 19:35, Baolin Wang  wrote:
> Hi Rob,
>
> On 20 February 2018 at 02:11, Rob Herring  wrote:
>> On Sun, Feb 11, 2018 at 02:55:04PM +0800, Baolin Wang wrote:
>>> On some platforms (such as Spreadtrum platform), the GPIO keys can only
>>> be triggered by level type. So this patch introduces one property to
>>> indicate if the GPIO trigger type is level trigger or edge trigger.
>>
>> If the parent interrupt controller only supports a certain trigger, then
>> it should ignore setting the trigger type.
>
> We still need to set high level type trigger or low level type trigger
> if it only supports level trigger.
>
>>
>>>
>>> Signed-off-by: Baolin Wang 
>>> ---
>>> Changes since v1:
>>>  - Diable the GPIO irq until reversing the GPIO level type.
>>> ---
>>>  .../devicetree/bindings/input/gpio-keys.txt|2 ++
>>>  drivers/input/keyboard/gpio_keys.c |   26 
>>> +++-
>>>  include/linux/gpio_keys.h  |1 +
>>>  3 files changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/input/gpio-keys.txt 
>>> b/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> index a949404..e3104bd 100644
>>> --- a/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> +++ b/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> @@ -29,6 +29,8 @@ Optional subnode-properties:
>>>   - linux,can-disable: Boolean, indicates that button is connected
>>> to dedicated (not shared) interrupt which can be disabled to
>>> suppress events from the button.
>>> + - gpio-key,level-trigger: Boolean, indicates that button's interrupt
>>> +   type is level trigger. Otherwise it is edge trigger as default.
>>
>> No. Just use 'interrupts' instead of 'gpios' and specify the trigger
>> type. Or put both if you need to read the state.
>
> Okay, so something as below to get the level type from the
> 'interrupts' property.
> if (fwnode_property_read_u32(child, "interrupts", >level_type))
> button->level_type = IRQ_TYPE_NONE;

After more thinking, if we use 'interrupts' to indicate the irq type
for this case, we cannot specify the irq number due to the irq number
should be get by gpiod_to_irq(). So the device nodes look weird, since
we should define the index of the interrupt controller instead of the
irq type if the #interrupt_cells is set to 1 according to the
interrupt controller documentation. What do you think about this?
Thanks.

gpio-keys {
compatible = "gpio-keys";

key-volumedown {
label = "Volume Down Key";
linux,code = ;
gpios = <_eic_debounce 2 GPIO_ACTIVE_LOW>;
debounce-interval = <2>;
wakeup-source;
interrupts = ;
};

key-volumeup {
label = "Volume Up Key";
linux,code = ;
gpios = <_eic 10 GPIO_ACTIVE_HIGH>;
debounce-interval = <2>;
wakeup-source;
interrupts = ;
};

key-power {
label = "Power Key";
linux,code = ;
gpios = <_eic 1 GPIO_ACTIVE_HIGH>;
wakeup-source;
interrupts = ;
};
};

-- 
Baolin Wang
Best Regards


Re: [PATCH v2] Input: gpio_keys: Add level trigger support for GPIO keys

2018-02-25 Thread Baolin Wang
Hi Rob,

On 21 February 2018 at 19:35, Baolin Wang  wrote:
> Hi Rob,
>
> On 20 February 2018 at 02:11, Rob Herring  wrote:
>> On Sun, Feb 11, 2018 at 02:55:04PM +0800, Baolin Wang wrote:
>>> On some platforms (such as Spreadtrum platform), the GPIO keys can only
>>> be triggered by level type. So this patch introduces one property to
>>> indicate if the GPIO trigger type is level trigger or edge trigger.
>>
>> If the parent interrupt controller only supports a certain trigger, then
>> it should ignore setting the trigger type.
>
> We still need to set high level type trigger or low level type trigger
> if it only supports level trigger.
>
>>
>>>
>>> Signed-off-by: Baolin Wang 
>>> ---
>>> Changes since v1:
>>>  - Diable the GPIO irq until reversing the GPIO level type.
>>> ---
>>>  .../devicetree/bindings/input/gpio-keys.txt|2 ++
>>>  drivers/input/keyboard/gpio_keys.c |   26 
>>> +++-
>>>  include/linux/gpio_keys.h  |1 +
>>>  3 files changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/input/gpio-keys.txt 
>>> b/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> index a949404..e3104bd 100644
>>> --- a/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> +++ b/Documentation/devicetree/bindings/input/gpio-keys.txt
>>> @@ -29,6 +29,8 @@ Optional subnode-properties:
>>>   - linux,can-disable: Boolean, indicates that button is connected
>>> to dedicated (not shared) interrupt which can be disabled to
>>> suppress events from the button.
>>> + - gpio-key,level-trigger: Boolean, indicates that button's interrupt
>>> +   type is level trigger. Otherwise it is edge trigger as default.
>>
>> No. Just use 'interrupts' instead of 'gpios' and specify the trigger
>> type. Or put both if you need to read the state.
>
> Okay, so something as below to get the level type from the
> 'interrupts' property.
> if (fwnode_property_read_u32(child, "interrupts", >level_type))
> button->level_type = IRQ_TYPE_NONE;

After more thinking, if we use 'interrupts' to indicate the irq type
for this case, we cannot specify the irq number due to the irq number
should be get by gpiod_to_irq(). So the device nodes look weird, since
we should define the index of the interrupt controller instead of the
irq type if the #interrupt_cells is set to 1 according to the
interrupt controller documentation. What do you think about this?
Thanks.

gpio-keys {
compatible = "gpio-keys";

key-volumedown {
label = "Volume Down Key";
linux,code = ;
gpios = <_eic_debounce 2 GPIO_ACTIVE_LOW>;
debounce-interval = <2>;
wakeup-source;
interrupts = ;
};

key-volumeup {
label = "Volume Up Key";
linux,code = ;
gpios = <_eic 10 GPIO_ACTIVE_HIGH>;
debounce-interval = <2>;
wakeup-source;
interrupts = ;
};

key-power {
label = "Power Key";
linux,code = ;
gpios = <_eic 1 GPIO_ACTIVE_HIGH>;
wakeup-source;
interrupts = ;
};
};

-- 
Baolin Wang
Best Regards


Re: [PATCH v4 08/10] ARM: sunxi: smp: Move assembly code into a file

2018-02-25 Thread kbuild test robot
Hi Mylène,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20180223]
[cannot apply to arm-soc/for-next robh/for-next linux-rpi/for-rpi-next 
v4.16-rc2 v4.16-rc1 v4.15 v4.16-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Myl-ne-Josserand/Sunxi-Add-SMP-support-on-A83T/20180226-035312
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   arch/arm/mach-sunxi/headsmp.S: Assembler messages:
>> arch/arm/mach-sunxi/headsmp.S:22: Error: selected processor does not support 
>> `movw r2,#(0xff00fff0&0x)' in ARM mode
>> arch/arm/mach-sunxi/headsmp.S:23: Error: selected processor does not support 
>> `movt r2,#(0xff00fff0>>16)' in ARM mode
   arch/arm/mach-sunxi/headsmp.S:25: Error: selected processor does not support 
`movw r2,#(0x4100c0f0&0x)' in ARM mode
   arch/arm/mach-sunxi/headsmp.S:26: Error: selected processor does not support 
`movt r2,#(0x4100c0f0>>16)' in ARM mode

vim +22 arch/arm/mach-sunxi/headsmp.S

13  
14  ENTRY(sunxi_mc_smp_cluster_cache_enable)
15  /*
16  * Enable cluster-level coherency, in preparation for turning on 
the MMU.
17  *
18  * Also enable regional clock gating and L2 data latency 
settings for
19  * Cortex-A15. These settings are from the vendor kernel.
20  */
21  mrc p15, 0, r1, c0, c0, 0
  > 22  movwr2, #(0xff00fff0&0x)
  > 23  movtr2, #(0xff00fff0>>16)
24  and r1, r1, r2
25  movwr2, #(0x4100c0f0&0x)
26  movtr2, #(0x4100c0f0>>16)
27  cmp r1, r2
28  bne not_a15
29  
30  /* The following is Cortex-A15 specific */
31  
32  /* ACTLR2: Enable CPU regional clock gates */
33  mrc p15, 1, r1, c15, c0, 4
34  orr r1, r1, #(0x1<<31)
35  mcr p15, 1, r1, c15, c0, 4
36  
37  /* L2ACTLR */
38  mrc p15, 1, r1, c15, c0, 0
39  /* Enable L2, GIC, and Timer regional clock gates */
40  orr r1, r1, #(0x1<<26)
41  /* Disable clean/evict from being pushed to external */
42  orr r1, r1, #(0x1<<3)
43  mcr p15, 1, r1, c15, c0, 0
44  
45  /* L2CTRL: L2 data RAM latency */
46  mrc p15, 1, r1, c9, c0, 2
47  bic r1, r1, #(0x7<<0)
48  orr r1, r1, #(0x3<<0)
49  mcr p15, 1, r1, c9, c0, 2
50  
51  /* End of Cortex-A15 specific setup */
52  not_a15:
53  
54  /* Get value of sunxi_mc_smp_first_comer */
55  adr r1, first
56  ldr r0, [r1]
57  ldr r0, [r1, r0]
58  
59  /* Skip cci_enable_port_for_self if not first comer */
60  cmp r0, #0
61  bxeqlr
62  b   cci_enable_port_for_self
63  
64  .align 2
65  first: .word sunxi_mc_smp_first_comer - .
66  ENDPROC(sunxi_mc_smp_cluster_cache_enable)
67  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v4 08/10] ARM: sunxi: smp: Move assembly code into a file

2018-02-25 Thread kbuild test robot
Hi Mylène,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20180223]
[cannot apply to arm-soc/for-next robh/for-next linux-rpi/for-rpi-next 
v4.16-rc2 v4.16-rc1 v4.15 v4.16-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Myl-ne-Josserand/Sunxi-Add-SMP-support-on-A83T/20180226-035312
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   arch/arm/mach-sunxi/headsmp.S: Assembler messages:
>> arch/arm/mach-sunxi/headsmp.S:22: Error: selected processor does not support 
>> `movw r2,#(0xff00fff0&0x)' in ARM mode
>> arch/arm/mach-sunxi/headsmp.S:23: Error: selected processor does not support 
>> `movt r2,#(0xff00fff0>>16)' in ARM mode
   arch/arm/mach-sunxi/headsmp.S:25: Error: selected processor does not support 
`movw r2,#(0x4100c0f0&0x)' in ARM mode
   arch/arm/mach-sunxi/headsmp.S:26: Error: selected processor does not support 
`movt r2,#(0x4100c0f0>>16)' in ARM mode

vim +22 arch/arm/mach-sunxi/headsmp.S

13  
14  ENTRY(sunxi_mc_smp_cluster_cache_enable)
15  /*
16  * Enable cluster-level coherency, in preparation for turning on 
the MMU.
17  *
18  * Also enable regional clock gating and L2 data latency 
settings for
19  * Cortex-A15. These settings are from the vendor kernel.
20  */
21  mrc p15, 0, r1, c0, c0, 0
  > 22  movwr2, #(0xff00fff0&0x)
  > 23  movtr2, #(0xff00fff0>>16)
24  and r1, r1, r2
25  movwr2, #(0x4100c0f0&0x)
26  movtr2, #(0x4100c0f0>>16)
27  cmp r1, r2
28  bne not_a15
29  
30  /* The following is Cortex-A15 specific */
31  
32  /* ACTLR2: Enable CPU regional clock gates */
33  mrc p15, 1, r1, c15, c0, 4
34  orr r1, r1, #(0x1<<31)
35  mcr p15, 1, r1, c15, c0, 4
36  
37  /* L2ACTLR */
38  mrc p15, 1, r1, c15, c0, 0
39  /* Enable L2, GIC, and Timer regional clock gates */
40  orr r1, r1, #(0x1<<26)
41  /* Disable clean/evict from being pushed to external */
42  orr r1, r1, #(0x1<<3)
43  mcr p15, 1, r1, c15, c0, 0
44  
45  /* L2CTRL: L2 data RAM latency */
46  mrc p15, 1, r1, c9, c0, 2
47  bic r1, r1, #(0x7<<0)
48  orr r1, r1, #(0x3<<0)
49  mcr p15, 1, r1, c9, c0, 2
50  
51  /* End of Cortex-A15 specific setup */
52  not_a15:
53  
54  /* Get value of sunxi_mc_smp_first_comer */
55  adr r1, first
56  ldr r0, [r1]
57  ldr r0, [r1, r0]
58  
59  /* Skip cci_enable_port_for_self if not first comer */
60  cmp r0, #0
61  bxeqlr
62  b   cci_enable_port_for_self
63  
64  .align 2
65  first: .word sunxi_mc_smp_first_comer - .
66  ENDPROC(sunxi_mc_smp_cluster_cache_enable)
67  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] ARM: davinci_all_defconfig: set CONFIG_DAVINCI_WATCHDOG=y

2018-02-25 Thread Sekhar Nori
On Monday 15 January 2018 10:59 PM, David Lechner wrote:
> This changes CONFIG_DAVINCI_WATCHDOG from a module to a compiled-in
> option. Since the reset function has been moved out of the mach code in
> commit 0808d3260456 ("ARM: davinci: remove watchdog reset") and into the

commit id adjusted since the said patch did not make it to v4.16

> watchdog driver, devices cannot reboot unless the watchdog driver is
> loaded, so make it a compiled-in option so that we can always reboot, even
> when modules are not loaded.
> 
> Cc: Sekhar Nori 
> Suggested-by: Adam Ford 
> Signed-off-by: David Lechner 

Applied to v4.17/defconfig

Thanks,
Sekhar


Re: [PATCH] ARM: davinci_all_defconfig: set CONFIG_DAVINCI_WATCHDOG=y

2018-02-25 Thread Sekhar Nori
On Monday 15 January 2018 10:59 PM, David Lechner wrote:
> This changes CONFIG_DAVINCI_WATCHDOG from a module to a compiled-in
> option. Since the reset function has been moved out of the mach code in
> commit 0808d3260456 ("ARM: davinci: remove watchdog reset") and into the

commit id adjusted since the said patch did not make it to v4.16

> watchdog driver, devices cannot reboot unless the watchdog driver is
> loaded, so make it a compiled-in option so that we can always reboot, even
> when modules are not loaded.
> 
> Cc: Sekhar Nori 
> Suggested-by: Adam Ford 
> Signed-off-by: David Lechner 

Applied to v4.17/defconfig

Thanks,
Sekhar


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Minchan Kim
Hi Sergey,

On Mon, Feb 26, 2018 at 02:49:27PM +0900, Sergey Senozhatsky wrote:
> > I think it's simple enough. :)
> 
> Right. The changes are pretty trivial, that's why I kept then in
> 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> changes.

As I said earlier, it's not thing we usually do, at least, MM.
Anyway, I don't want to insist on it because it depends each
person's point of view what's the better for review, git-bisect.

> 
> > Can't zram ask to zsmalloc about what size is for hugeobject from?
> > With that, zram can save the wartermark in itself and use it.
> > What I mean is as follows,
> > 
> > zram:
> > size_t huge_size = _zs_huge_object(pool);
> > ..
> > ..
> > if (comp_size >= huge_size)
> > memcpy(dst, src, 4K);
> 
> Yes, can do. My plan was to keep it completely internally to zsmalloc.
> Who knows, it might become smart enough one day to do something more
> than just size comparison. Any reason you used that leading underscore

Let's do that in future if someone want it. :)

> in _zs_huge_object()?


Nope. It's just typo. Let's think better name.
How about using zs_huge_size()?





Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Minchan Kim
Hi Sergey,

On Mon, Feb 26, 2018 at 02:49:27PM +0900, Sergey Senozhatsky wrote:
> > I think it's simple enough. :)
> 
> Right. The changes are pretty trivial, that's why I kept then in
> 2 simple patches. Besides, I didn't want to mix zsmalloc and zram
> changes.

As I said earlier, it's not thing we usually do, at least, MM.
Anyway, I don't want to insist on it because it depends each
person's point of view what's the better for review, git-bisect.

> 
> > Can't zram ask to zsmalloc about what size is for hugeobject from?
> > With that, zram can save the wartermark in itself and use it.
> > What I mean is as follows,
> > 
> > zram:
> > size_t huge_size = _zs_huge_object(pool);
> > ..
> > ..
> > if (comp_size >= huge_size)
> > memcpy(dst, src, 4K);
> 
> Yes, can do. My plan was to keep it completely internally to zsmalloc.
> Who knows, it might become smart enough one day to do something more
> than just size comparison. Any reason you used that leading underscore

Let's do that in future if someone want it. :)

> in _zs_huge_object()?


Nope. It's just typo. Let's think better name.
How about using zs_huge_size()?





Re: [PATCH] ARM: dts: da850-lego-ev3: use a correct fallback for at24 compatible

2018-02-25 Thread Sekhar Nori
On Monday 22 January 2018 10:20 PM, David Lechner wrote:
> On 01/22/2018 06:55 AM, Bartosz Golaszewski wrote:
>> From: Bartosz Golaszewski 
>>
>> We now require all at24 users to use the "atmel," fallback in
>> device tree for different manufacturers.
>>
>> Signed-off-by: Bartosz Golaszewski 
>> ---
> 
> Reviewed-by: David Lechner 

Applied to v4.17/dt

Thanks,
Sekhar


Re: [PATCH] ARM: dts: da850-lego-ev3: use a correct fallback for at24 compatible

2018-02-25 Thread Sekhar Nori
On Monday 22 January 2018 10:20 PM, David Lechner wrote:
> On 01/22/2018 06:55 AM, Bartosz Golaszewski wrote:
>> From: Bartosz Golaszewski 
>>
>> We now require all at24 users to use the "atmel," fallback in
>> device tree for different manufacturers.
>>
>> Signed-off-by: Bartosz Golaszewski 
>> ---
> 
> Reviewed-by: David Lechner 

Applied to v4.17/dt

Thanks,
Sekhar


Re: [PATCH v12 07/11] mm: Add address parameter to arch_validate_prot()

2018-02-25 Thread Michael Ellerman
Khalid Aziz  writes:

> A protection flag may not be valid across entire address space and
> hence arch_validate_prot() might need the address a protection bit is
> being set on to ensure it is a valid protection flag. For example, sparc
> processors support memory corruption detection (as part of ADI feature)
> flag on memory addresses mapped on to physical RAM but not on PFN mapped
> pages or addresses mapped on to devices. This patch adds address to the
> parameters being passed to arch_validate_prot() so protection bits can
> be validated in the relevant context.
>
> Signed-off-by: Khalid Aziz 
> Cc: Khalid Aziz 
> Reviewed-by: Anthony Yznaga 
> ---
> v8:
>   - Added addr parameter to powerpc arch_validate_prot() (suggested
> by Michael Ellerman)
> v9:
>   - new patch
>
>  arch/powerpc/include/asm/mman.h | 4 ++--
>  arch/powerpc/kernel/syscalls.c  | 2 +-

These changes look fine to me:

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v12 07/11] mm: Add address parameter to arch_validate_prot()

2018-02-25 Thread Michael Ellerman
Khalid Aziz  writes:

> A protection flag may not be valid across entire address space and
> hence arch_validate_prot() might need the address a protection bit is
> being set on to ensure it is a valid protection flag. For example, sparc
> processors support memory corruption detection (as part of ADI feature)
> flag on memory addresses mapped on to physical RAM but not on PFN mapped
> pages or addresses mapped on to devices. This patch adds address to the
> parameters being passed to arch_validate_prot() so protection bits can
> be validated in the relevant context.
>
> Signed-off-by: Khalid Aziz 
> Cc: Khalid Aziz 
> Reviewed-by: Anthony Yznaga 
> ---
> v8:
>   - Added addr parameter to powerpc arch_validate_prot() (suggested
> by Michael Ellerman)
> v9:
>   - new patch
>
>  arch/powerpc/include/asm/mman.h | 4 ++--
>  arch/powerpc/kernel/syscalls.c  | 2 +-

These changes look fine to me:

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Sergey Senozhatsky
On (02/20/18 10:24), Minchan Kim wrote:
> Hi Sergey,

[..]

> Sorry for the long delay. I was horribly busy for a few weeks. ;-(

My turn to say "Sorry for the delay" :)

[..]
> I think it's simple enough. :)

Right. The changes are pretty trivial, that's why I kept then in
2 simple patches. Besides, I didn't want to mix zsmalloc and zram
changes.

> Can't zram ask to zsmalloc about what size is for hugeobject from?
> With that, zram can save the wartermark in itself and use it.
> What I mean is as follows,
> 
> zram:
>   size_t huge_size = _zs_huge_object(pool);
>   ..
>   ..
>   if (comp_size >= huge_size)
>   memcpy(dst, src, 4K);

Yes, can do. My plan was to keep it completely internally to zsmalloc.
Who knows, it might become smart enough one day to do something more
than just size comparison. Any reason you used that leading underscore
in _zs_huge_object()?

-ss


Re: [PATCHv3 1/2] zsmalloc: introduce zs_huge_object() function

2018-02-25 Thread Sergey Senozhatsky
On (02/20/18 10:24), Minchan Kim wrote:
> Hi Sergey,

[..]

> Sorry for the long delay. I was horribly busy for a few weeks. ;-(

My turn to say "Sorry for the delay" :)

[..]
> I think it's simple enough. :)

Right. The changes are pretty trivial, that's why I kept then in
2 simple patches. Besides, I didn't want to mix zsmalloc and zram
changes.

> Can't zram ask to zsmalloc about what size is for hugeobject from?
> With that, zram can save the wartermark in itself and use it.
> What I mean is as follows,
> 
> zram:
>   size_t huge_size = _zs_huge_object(pool);
>   ..
>   ..
>   if (comp_size >= huge_size)
>   memcpy(dst, src, 4K);

Yes, can do. My plan was to keep it completely internally to zsmalloc.
Who knows, it might become smart enough one day to do something more
than just size comparison. Any reason you used that leading underscore
in _zs_huge_object()?

-ss


Re: [PATCH tip/core/rcu 06/10] trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()

2018-02-25 Thread Paul E. McKenney
On Sun, Feb 25, 2018 at 11:57:48PM -0500, Steven Rostedt wrote:
> On Sun, 25 Feb 2018 10:17:30 -0800
> "Paul E. McKenney"  wrote:
> 
> 
> > And probably not.  You are probably running CONFIG_PREEMPT=y (otherwise
> > RCU-tasks is trivial), so cond_resched() is a complete no-op:
> > 
> > static inline int _cond_resched(void) { return 0; }
> > 
> > I could make this call rcu_all_qs(), but I would not expect Peter Zijlstra
> > to be at all happy with that sort of change.
> > 
> > And the people who asked for the cond_resched() work probably aren't
> > going to be happy with the resumed proliferation of cond_resched_rcu_qs().
> > 
> > Hmmm...  Grasping at straws...  Could we make cond_resched() be something
> > like a tracepoint and instrument them with cond_resched_rcu_qs() if the
> > current RCU-tasks grace period ran for more that (say) a minute of its
> > ten-minute stall-warning span?
> > 
> 
> Instead of monkeying with cond_resched(), since this is "special" code,
> why don't I just have that code call it directly?
> 
>   cond_resched();
>   rcu_note_voluntary_context_switch(current);

The advantage of the last patch that I sent is that the special call
is in one place.  (This is the one that adds the "special" definition
for _cond_resched().)

Thanx, Paul



Re: [PATCH tip/core/rcu 06/10] trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()

2018-02-25 Thread Paul E. McKenney
On Sun, Feb 25, 2018 at 11:57:48PM -0500, Steven Rostedt wrote:
> On Sun, 25 Feb 2018 10:17:30 -0800
> "Paul E. McKenney"  wrote:
> 
> 
> > And probably not.  You are probably running CONFIG_PREEMPT=y (otherwise
> > RCU-tasks is trivial), so cond_resched() is a complete no-op:
> > 
> > static inline int _cond_resched(void) { return 0; }
> > 
> > I could make this call rcu_all_qs(), but I would not expect Peter Zijlstra
> > to be at all happy with that sort of change.
> > 
> > And the people who asked for the cond_resched() work probably aren't
> > going to be happy with the resumed proliferation of cond_resched_rcu_qs().
> > 
> > Hmmm...  Grasping at straws...  Could we make cond_resched() be something
> > like a tracepoint and instrument them with cond_resched_rcu_qs() if the
> > current RCU-tasks grace period ran for more that (say) a minute of its
> > ten-minute stall-warning span?
> > 
> 
> Instead of monkeying with cond_resched(), since this is "special" code,
> why don't I just have that code call it directly?
> 
>   cond_resched();
>   rcu_note_voluntary_context_switch(current);

The advantage of the last patch that I sent is that the special call
is in one place.  (This is the one that adds the "special" definition
for _cond_resched().)

Thanx, Paul



Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 01:18:50PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
> >>  writes:
> >> [snip]
> >> 
> >> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> >> > index 39ae7cfad90f..c56cce64b2c3 100644
> >> > --- a/mm/swap_state.c
> >> > +++ b/mm/swap_state.c
> >> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
> >> > struct vm_area_struct *vma,
> >> > unsigned long addr)
> >> >  {
> >> >  struct page *page;
> >> > -unsigned long ra_info;
> >> > -int win, hits, readahead;
> >> >  
> >> >  page = find_get_page(swap_address_space(entry), 
> >> > swp_offset(entry));
> >> >  
> >> >  INC_CACHE_INFO(find_total);
> >> >  if (page) {
> >> > +bool vma_ra = swap_use_vma_readahead();
> >> > +bool readahead = TestClearPageReadahead(page);
> >> > +
> >> 
> >> TestClearPageReadahead() cannot be called for compound page.  As in
> >> 
> >> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> >>TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> >> 
> >> >  INC_CACHE_INFO(find_success);
> >> >  if (unlikely(PageTransCompound(page)))
> >> >  return page;
> >> > -readahead = TestClearPageReadahead(page);
> >> 
> >> So we can only call it here after checking whether page is compound.
> >
> > Hi Huang,
> >
> > Thanks for cathing this.
> > However, I don't see the reason we should rule out THP page for
> > readahead marker. Could't we relax the rule?
> >
> > I hope we can do so that we could remove PageTransCompound check
> > for readahead marker, which makes code ugly.
> >
> > From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
> > From: Minchan Kim 
> > Date: Mon, 26 Feb 2018 13:46:43 +0900
> > Subject: [PATCH] mm: relax policy for PG_readahead
> >
> > This flag is in use for anon THP page so let's relax it.
> >
> > Cc: Kirill A. Shutemov 
> > Signed-off-by: Minchan Kim 
> > ---
> >  include/linux/page-flags.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index e34a27727b9a..f12d4dfae580 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
> >  /* PG_readahead is only used for reads; PG_reclaim is only for writes */
> >  PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
> > TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
> > -PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> > -   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> > +PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
> > +   TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
> >  
> >  #ifdef CONFIG_HIGHMEM
> >  /*
> 
> We never set Readahead bit for THP in reality.  The original code acts
> as document for this.  I don't think it is a good idea to change this
> without a good reason.

I don't like such divergence so that we don't need to care about whether
the page is THP or not. However, there is pointless to confuse ra stat
counters, too. How about this?

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8dde719e973c..e169d137d27c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -348,12 +348,17 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct 
vm_area_struct *vma,
INC_CACHE_INFO(find_total);
if (page) {
bool vma_ra = swap_use_vma_readahead();
-   bool readahead = TestClearPageReadahead(page);
+   bool readahead;
 
INC_CACHE_INFO(find_success);
+   /*
+* At the moment, we doesn't support PG_readahead for anon THP
+* so let's bail out rather than confusing the readahead stat.
+*/
if (unlikely(PageTransCompound(page)))
return page;
 
+   readahead = TestClearPageReadahead(page);
if (vma && vma_ra) {
unsigned long ra_val;
int win, hits;
@@ -608,8 +613,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, 
gfp_t gfp_mask,
continue;
if (page_allocated) {
swap_readpage(page, false);
-   if (offset != entry_offset &&
-   likely(!PageTransCompound(page))) {
+   if (offset != entry_offset) {
SetPageReadahead(page);
count_vm_event(SWAP_RA);
}
@@ -772,8 +776,7 @@ struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t 
gfp_mask,
continue;
if (page_allocated) {
   

Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Minchan Kim
On Mon, Feb 26, 2018 at 01:18:50PM +0800, Huang, Ying wrote:
> Minchan Kim  writes:
> 
> > On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
> >>  writes:
> >> [snip]
> >> 
> >> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> >> > index 39ae7cfad90f..c56cce64b2c3 100644
> >> > --- a/mm/swap_state.c
> >> > +++ b/mm/swap_state.c
> >> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
> >> > struct vm_area_struct *vma,
> >> > unsigned long addr)
> >> >  {
> >> >  struct page *page;
> >> > -unsigned long ra_info;
> >> > -int win, hits, readahead;
> >> >  
> >> >  page = find_get_page(swap_address_space(entry), 
> >> > swp_offset(entry));
> >> >  
> >> >  INC_CACHE_INFO(find_total);
> >> >  if (page) {
> >> > +bool vma_ra = swap_use_vma_readahead();
> >> > +bool readahead = TestClearPageReadahead(page);
> >> > +
> >> 
> >> TestClearPageReadahead() cannot be called for compound page.  As in
> >> 
> >> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> >>TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> >> 
> >> >  INC_CACHE_INFO(find_success);
> >> >  if (unlikely(PageTransCompound(page)))
> >> >  return page;
> >> > -readahead = TestClearPageReadahead(page);
> >> 
> >> So we can only call it here after checking whether page is compound.
> >
> > Hi Huang,
> >
> > Thanks for cathing this.
> > However, I don't see the reason we should rule out THP page for
> > readahead marker. Could't we relax the rule?
> >
> > I hope we can do so that we could remove PageTransCompound check
> > for readahead marker, which makes code ugly.
> >
> > From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
> > From: Minchan Kim 
> > Date: Mon, 26 Feb 2018 13:46:43 +0900
> > Subject: [PATCH] mm: relax policy for PG_readahead
> >
> > This flag is in use for anon THP page so let's relax it.
> >
> > Cc: Kirill A. Shutemov 
> > Signed-off-by: Minchan Kim 
> > ---
> >  include/linux/page-flags.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index e34a27727b9a..f12d4dfae580 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
> >  /* PG_readahead is only used for reads; PG_reclaim is only for writes */
> >  PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
> > TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
> > -PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> > -   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> > +PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
> > +   TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
> >  
> >  #ifdef CONFIG_HIGHMEM
> >  /*
> 
> We never set Readahead bit for THP in reality.  The original code acts
> as document for this.  I don't think it is a good idea to change this
> without a good reason.

I don't like such divergence so that we don't need to care about whether
the page is THP or not. However, there is pointless to confuse ra stat
counters, too. How about this?

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8dde719e973c..e169d137d27c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -348,12 +348,17 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct 
vm_area_struct *vma,
INC_CACHE_INFO(find_total);
if (page) {
bool vma_ra = swap_use_vma_readahead();
-   bool readahead = TestClearPageReadahead(page);
+   bool readahead;
 
INC_CACHE_INFO(find_success);
+   /*
+* At the moment, we doesn't support PG_readahead for anon THP
+* so let's bail out rather than confusing the readahead stat.
+*/
if (unlikely(PageTransCompound(page)))
return page;
 
+   readahead = TestClearPageReadahead(page);
if (vma && vma_ra) {
unsigned long ra_val;
int win, hits;
@@ -608,8 +613,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, 
gfp_t gfp_mask,
continue;
if (page_allocated) {
swap_readpage(page, false);
-   if (offset != entry_offset &&
-   likely(!PageTransCompound(page))) {
+   if (offset != entry_offset) {
SetPageReadahead(page);
count_vm_event(SWAP_RA);
}
@@ -772,8 +776,7 @@ struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t 
gfp_mask,
continue;
if (page_allocated) {
swap_readpage(page, false);
-   if (i != ra_info.offset &&
-

Re: [PATCH v11 2/7] PCI/AER: factor out error reporting from AER

2018-02-25 Thread poza

On 2018-02-26 11:02, p...@codeaurora.org wrote:

On 2018-02-24 05:12, Bjorn Helgaas wrote:

On Fri, Feb 23, 2018 at 01:53:59PM +0530, Oza Pawandeep wrote:

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.


Add blank line between paragraphs.

DPC should be able to register callbacks and attmept recovery when 
DPC

trigger event occurs.


s/attmept/attempt/

This patch basically moves code from aerdrv_core.c to pcie-err.c.  
Make it

do *only* that.



sure.


Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd8191..abc514e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -342,6 +342,9 @@ static inline resource_size_t 
pci_resource_alignment(struct pci_dev *dev,


 void pci_enable_acs(struct pci_dev *dev);

+/* PCI error reporting and recovery */
+void pcie_do_recovery(struct pci_dev *dev, int severity);


Add this declaration in the first patch, the one where you rename the
function.



done.


 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o

-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o 
pcie-err.o


Can we name this just "drivers/pci/pcie/err.c"?  I know we have
pcie-dpc.c already, but it does get a little repetitious to type
"pci" THREE times in that path.



sure, will rename.


 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o

 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h 
b/drivers/pci/pcie/aer/aerdrv.h

index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 */
 };

-struct aer_broadcast_data {
-   enum pci_channel_state state;
-   enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result 
orig,

-   enum pci_ers_result new)
-{
-   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-   return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-   if (new == PCI_ERS_RESULT_NONE)
-   return orig;
-
-   switch (orig) {
-   case PCI_ERS_RESULT_CAN_RECOVER:
-   case PCI_ERS_RESULT_RECOVERED:
-   orig = new;
-   break;
-   case PCI_ERS_RESULT_DISCONNECT:
-   if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = PCI_ERS_RESULT_NEED_RESET;
-   break;
-   default:
-   break;
-   }
-
-   return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info 
*info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index aeb83a0..f60b1bb 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | 
PCI_EXP_DEVCTL_NFERE | \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -230,191 +231,6 @@ static bool find_source_device(struct pci_dev 
*parent,

return true;
 }

-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-   pci_ers_result_t vote;
-   const struct pci_error_handlers *err_handler;
-   struct aer_broadcast_data *result_data;
-   result_data = (struct aer_broadcast_data *) data;
-
-   device_lock(>dev);
-   dev->error_state = result_data->state;
-
-   if (!dev->driver ||
-   !dev->driver->err_handler ||
-   !dev->driver->err_handler->error_detected) {
-   if (result_data->state == pci_channel_io_frozen &&
-   dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-   /*
-* In case of fatal recovery, if one of down-
-* stream device has no driver. We might be
-* unable to recover because a later insmod
-* of a driver for this device is unaware of
-* its hw state.
-*/
-   pci_printk(KERN_DEBUG, dev, "device has %s\n",
-  dev->driver ?
-  "no AER-aware driver" : "no driver");
-   }
-
-   /*
-* If there's any device in the subtree that does not
-* have an error_detected callback, returning
-* 

Re: [PATCH v11 2/7] PCI/AER: factor out error reporting from AER

2018-02-25 Thread poza

On 2018-02-26 11:02, p...@codeaurora.org wrote:

On 2018-02-24 05:12, Bjorn Helgaas wrote:

On Fri, Feb 23, 2018 at 01:53:59PM +0530, Oza Pawandeep wrote:

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.


Add blank line between paragraphs.

DPC should be able to register callbacks and attmept recovery when 
DPC

trigger event occurs.


s/attmept/attempt/

This patch basically moves code from aerdrv_core.c to pcie-err.c.  
Make it

do *only* that.



sure.


Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd8191..abc514e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -342,6 +342,9 @@ static inline resource_size_t 
pci_resource_alignment(struct pci_dev *dev,


 void pci_enable_acs(struct pci_dev *dev);

+/* PCI error reporting and recovery */
+void pcie_do_recovery(struct pci_dev *dev, int severity);


Add this declaration in the first patch, the one where you rename the
function.



done.


 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o

-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o 
pcie-err.o


Can we name this just "drivers/pci/pcie/err.c"?  I know we have
pcie-dpc.c already, but it does get a little repetitious to type
"pci" THREE times in that path.



sure, will rename.


 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o

 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h 
b/drivers/pci/pcie/aer/aerdrv.h

index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 */
 };

-struct aer_broadcast_data {
-   enum pci_channel_state state;
-   enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result 
orig,

-   enum pci_ers_result new)
-{
-   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-   return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-   if (new == PCI_ERS_RESULT_NONE)
-   return orig;
-
-   switch (orig) {
-   case PCI_ERS_RESULT_CAN_RECOVER:
-   case PCI_ERS_RESULT_RECOVERED:
-   orig = new;
-   break;
-   case PCI_ERS_RESULT_DISCONNECT:
-   if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = PCI_ERS_RESULT_NEED_RESET;
-   break;
-   default:
-   break;
-   }
-
-   return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info 
*info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index aeb83a0..f60b1bb 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | 
PCI_EXP_DEVCTL_NFERE | \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -230,191 +231,6 @@ static bool find_source_device(struct pci_dev 
*parent,

return true;
 }

-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-   pci_ers_result_t vote;
-   const struct pci_error_handlers *err_handler;
-   struct aer_broadcast_data *result_data;
-   result_data = (struct aer_broadcast_data *) data;
-
-   device_lock(>dev);
-   dev->error_state = result_data->state;
-
-   if (!dev->driver ||
-   !dev->driver->err_handler ||
-   !dev->driver->err_handler->error_detected) {
-   if (result_data->state == pci_channel_io_frozen &&
-   dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-   /*
-* In case of fatal recovery, if one of down-
-* stream device has no driver. We might be
-* unable to recover because a later insmod
-* of a driver for this device is unaware of
-* its hw state.
-*/
-   pci_printk(KERN_DEBUG, dev, "device has %s\n",
-  dev->driver ?
-  "no AER-aware driver" : "no driver");
-   }
-
-   /*
-* If there's any device in the subtree that does not
-* have an error_detected callback, returning
-* PCI_ERS_RESULT_NO_AER_DRIVER 

linux-next: Tree for Feb 26

2018-02-25 Thread Stephen Rothwell
Hi all,

Changes since 20180223:

New tree: leaks

The bpf-next tree gained a conflict against the bpf tree.

The kvms390 tree gained a conflict against the nds32 tree.

The percpu tree gained a build failure for which I applied a patch.

The akpm-current tree gained a conflict against the metag tree and a
build failure for which I applied a patch.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 3394
 4140 files changed, 155342 insertions(+), 102665 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (3664ce2d9309 Merge tag 'powerpc-4.16-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux)
Merging fixes/master (7928b2cbe55b Linux 4.16-rc1)
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from 
.gitignore)
Merging arc-current/for-curr (646a4c62e03a ARC: setup cpu possible mask 
according to possible-cpus dts property)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem 
entry)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (eb0a2d2620ae powerpc/powernv: Support firmware 
disable of RFI flush)
Merging sparc/master (aebb48f5e465 sparc64: fix typo in 
CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9cb9c07d6b0c Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging bpf/master (ca36960211eb bpf: allow xadd only on aligned memory)
Merging ipsec/master (013cb81e89f8 xfrm: Fix infinite loop in 
xfrm_get_dst_nexthop with transport mode.)
Merging netfilter/master (7d98386d55a5 netfilter: use skb_to_full_sk in 
ip6_route_me_harder)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (78dc897b7ee6 rtlwifi: rtl8723be: Fix loss of 
signal)
Merging mac80211/master (b323ac19b773 mac80211: drop frames with unexpected DS 
bits from fast-rx to slow path)
Merging rdma-fixes/for-rc (f45765872e7a RDMA/uverbs: Fix kernel panic while 
using XRC_TGT QP type)
Merging sound-current/for-linus (240a8af929c7 ALSA: usb-audio: Add a quirck for 
B PX headphones)
Merging pci-current/for-linus (7928b2cbe55b Linux 4.16-rc1)
Merging driver-core.current/driver-core-linus (91ab883eb213 Linux 4.16-rc2)
Merging tty.current/tty-linus (91ab883eb213 Linux 4.16-rc2)
Merging usb.current/usb-linus (0f9da844d877 MIPS: boot: Define __ASSEMBLY__ for 
its.S build)
Merging usb-gadget-fixes/fixes (98112041bcca usb: dwc3: core: Fix ULPI PHYs and 
prevent phy_get/ulpi_init during suspend/resume)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new 
device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (7928b2cbe55b Linux 4.16-rc1)
Merging staging.current/staging-linus (c6754712e053 Merge tag 
'iio-fixes-for-4.16a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus 

linux-next: Tree for Feb 26

2018-02-25 Thread Stephen Rothwell
Hi all,

Changes since 20180223:

New tree: leaks

The bpf-next tree gained a conflict against the bpf tree.

The kvms390 tree gained a conflict against the nds32 tree.

The percpu tree gained a build failure for which I applied a patch.

The akpm-current tree gained a conflict against the metag tree and a
build failure for which I applied a patch.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 3394
 4140 files changed, 155342 insertions(+), 102665 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (3664ce2d9309 Merge tag 'powerpc-4.16-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux)
Merging fixes/master (7928b2cbe55b Linux 4.16-rc1)
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from 
.gitignore)
Merging arc-current/for-curr (646a4c62e03a ARC: setup cpu possible mask 
according to possible-cpus dts property)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem 
entry)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (eb0a2d2620ae powerpc/powernv: Support firmware 
disable of RFI flush)
Merging sparc/master (aebb48f5e465 sparc64: fix typo in 
CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9cb9c07d6b0c Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging bpf/master (ca36960211eb bpf: allow xadd only on aligned memory)
Merging ipsec/master (013cb81e89f8 xfrm: Fix infinite loop in 
xfrm_get_dst_nexthop with transport mode.)
Merging netfilter/master (7d98386d55a5 netfilter: use skb_to_full_sk in 
ip6_route_me_harder)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (78dc897b7ee6 rtlwifi: rtl8723be: Fix loss of 
signal)
Merging mac80211/master (b323ac19b773 mac80211: drop frames with unexpected DS 
bits from fast-rx to slow path)
Merging rdma-fixes/for-rc (f45765872e7a RDMA/uverbs: Fix kernel panic while 
using XRC_TGT QP type)
Merging sound-current/for-linus (240a8af929c7 ALSA: usb-audio: Add a quirck for 
B PX headphones)
Merging pci-current/for-linus (7928b2cbe55b Linux 4.16-rc1)
Merging driver-core.current/driver-core-linus (91ab883eb213 Linux 4.16-rc2)
Merging tty.current/tty-linus (91ab883eb213 Linux 4.16-rc2)
Merging usb.current/usb-linus (0f9da844d877 MIPS: boot: Define __ASSEMBLY__ for 
its.S build)
Merging usb-gadget-fixes/fixes (98112041bcca usb: dwc3: core: Fix ULPI PHYs and 
prevent phy_get/ulpi_init during suspend/resume)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new 
device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (7928b2cbe55b Linux 4.16-rc1)
Merging staging.current/staging-linus (c6754712e053 Merge tag 
'iio-fixes-for-4.16a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus 

Re: [PATCH v11 2/7] PCI/AER: factor out error reporting from AER

2018-02-25 Thread poza

On 2018-02-24 05:12, Bjorn Helgaas wrote:

On Fri, Feb 23, 2018 at 01:53:59PM +0530, Oza Pawandeep wrote:

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.


Add blank line between paragraphs.


DPC should be able to register callbacks and attmept recovery when DPC
trigger event occurs.


s/attmept/attempt/

This patch basically moves code from aerdrv_core.c to pcie-err.c.  Make 
it

do *only* that.



sure.


Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd8191..abc514e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -342,6 +342,9 @@ static inline resource_size_t 
pci_resource_alignment(struct pci_dev *dev,


 void pci_enable_acs(struct pci_dev *dev);

+/* PCI error reporting and recovery */
+void pcie_do_recovery(struct pci_dev *dev, int severity);


Add this declaration in the first patch, the one where you rename the
function.



done.


 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o

-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o 
pcie-err.o


Can we name this just "drivers/pci/pcie/err.c"?  I know we have
pcie-dpc.c already, but it does get a little repetitious to type
"pci" THREE times in that path.



sure, will rename.


 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o

 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h 
b/drivers/pci/pcie/aer/aerdrv.h

index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 */
 };

-struct aer_broadcast_data {
-   enum pci_channel_state state;
-   enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
-   enum pci_ers_result new)
-{
-   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-   return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-   if (new == PCI_ERS_RESULT_NONE)
-   return orig;
-
-   switch (orig) {
-   case PCI_ERS_RESULT_CAN_RECOVER:
-   case PCI_ERS_RESULT_RECOVERED:
-   orig = new;
-   break;
-   case PCI_ERS_RESULT_DISCONNECT:
-   if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = PCI_ERS_RESULT_NEED_RESET;
-   break;
-   default:
-   break;
-   }
-
-   return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index aeb83a0..f60b1bb 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -230,191 +231,6 @@ static bool find_source_device(struct pci_dev 
*parent,

return true;
 }

-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-   pci_ers_result_t vote;
-   const struct pci_error_handlers *err_handler;
-   struct aer_broadcast_data *result_data;
-   result_data = (struct aer_broadcast_data *) data;
-
-   device_lock(>dev);
-   dev->error_state = result_data->state;
-
-   if (!dev->driver ||
-   !dev->driver->err_handler ||
-   !dev->driver->err_handler->error_detected) {
-   if (result_data->state == pci_channel_io_frozen &&
-   dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-   /*
-* In case of fatal recovery, if one of down-
-* stream device has no driver. We might be
-* unable to recover because a later insmod
-* of a driver for this device is unaware of
-* its hw state.
-*/
-   pci_printk(KERN_DEBUG, dev, "device has %s\n",
-  dev->driver ?
-  "no AER-aware driver" : "no driver");
-   }
-
-   /*
-* If there's any device in the subtree that does not
-* have an error_detected callback, returning
-* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
- 

Re: [PATCH v11 2/7] PCI/AER: factor out error reporting from AER

2018-02-25 Thread poza

On 2018-02-24 05:12, Bjorn Helgaas wrote:

On Fri, Feb 23, 2018 at 01:53:59PM +0530, Oza Pawandeep wrote:

This patch factors out error reporting callbacks, which are currently
tightly coupled with AER.


Add blank line between paragraphs.


DPC should be able to register callbacks and attmept recovery when DPC
trigger event occurs.


s/attmept/attempt/

This patch basically moves code from aerdrv_core.c to pcie-err.c.  Make 
it

do *only* that.



sure.


Signed-off-by: Oza Pawandeep 

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd8191..abc514e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -342,6 +342,9 @@ static inline resource_size_t 
pci_resource_alignment(struct pci_dev *dev,


 void pci_enable_acs(struct pci_dev *dev);

+/* PCI error reporting and recovery */
+void pcie_do_recovery(struct pci_dev *dev, int severity);


Add this declaration in the first patch, the one where you rename the
function.



done.


 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 223e4c3..d669497 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -6,7 +6,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o

-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o portdrv_bus.o 
pcie-err.o


Can we name this just "drivers/pci/pcie/err.c"?  I know we have
pcie-dpc.c already, but it does get a little repetitious to type
"pci" THREE times in that path.



sure, will rename.


 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o

 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/aer/aerdrv.h 
b/drivers/pci/pcie/aer/aerdrv.h

index 5449e5c..bc9db53 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -76,36 +76,6 @@ struct aer_rpc {
 */
 };

-struct aer_broadcast_data {
-   enum pci_channel_state state;
-   enum pci_ers_result result;
-};
-
-static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
-   enum pci_ers_result new)
-{
-   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
-   return PCI_ERS_RESULT_NO_AER_DRIVER;
-
-   if (new == PCI_ERS_RESULT_NONE)
-   return orig;
-
-   switch (orig) {
-   case PCI_ERS_RESULT_CAN_RECOVER:
-   case PCI_ERS_RESULT_RECOVERED:
-   orig = new;
-   break;
-   case PCI_ERS_RESULT_DISCONNECT:
-   if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = PCI_ERS_RESULT_NEED_RESET;
-   break;
-   default:
-   break;
-   }
-
-   return orig;
-}
-
 extern struct bus_type pcie_port_bus_type;
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
b/drivers/pci/pcie/aer/aerdrv_core.c

index aeb83a0..f60b1bb 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "aerdrv.h"
+#include "../../pci.h"

 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE 
| \

 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -230,191 +231,6 @@ static bool find_source_device(struct pci_dev 
*parent,

return true;
 }

-static int report_error_detected(struct pci_dev *dev, void *data)
-{
-   pci_ers_result_t vote;
-   const struct pci_error_handlers *err_handler;
-   struct aer_broadcast_data *result_data;
-   result_data = (struct aer_broadcast_data *) data;
-
-   device_lock(>dev);
-   dev->error_state = result_data->state;
-
-   if (!dev->driver ||
-   !dev->driver->err_handler ||
-   !dev->driver->err_handler->error_detected) {
-   if (result_data->state == pci_channel_io_frozen &&
-   dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
-   /*
-* In case of fatal recovery, if one of down-
-* stream device has no driver. We might be
-* unable to recover because a later insmod
-* of a driver for this device is unaware of
-* its hw state.
-*/
-   pci_printk(KERN_DEBUG, dev, "device has %s\n",
-  dev->driver ?
-  "no AER-aware driver" : "no driver");
-   }
-
-   /*
-* If there's any device in the subtree that does not
-* have an error_detected callback, returning
-* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
-* the subsequent 

Re: [V9fs-developer] [PATCH] fs/9p: don't set SB_NOATIME by default

2018-02-25 Thread Bo YU

Hi,
I think you would better to modify the subject line without
[V9fs-developers].
On Mon, Feb 26, 2018 at 09:34:49AM +0800, jiangyiwen wrote:

On 2018/2/24 10:47, jiangyiwen wrote:

On 2018/2/9 14:13, jiangyiwen wrote:

User use some syscall, for example mmap(v9fs_file_mmap), it will not
update atime even if user's mnt_flags have MNT_NOATIME, because
v9fs default set SB_NOATIME in v9fs_set_super.

For supporting access time is updated when user mount with relatime,
we should clear SB_NOATIME by default.

Signed-off-by: Yiwen Jiang 
---
 fs/9p/vfs_super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index af03c2a..48ce504 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -94,7 +94,7 @@ static int v9fs_set_super(struct super_block *s, void *data)
if (v9ses->cache)
sb->s_bdi->ra_pages = (VM_MAX_READAHEAD * 1024)/PAGE_SIZE;

-   sb->s_flags |= SB_ACTIVE | SB_DIRSYNC | SB_NOATIME;
+   sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
sb->s_flags |= SB_SYNCHRONOUS;


Hi Alexander Viro and Eric,

My patch has already sent two weeks, but nobody help me
to review, I have a question about now who is the v9fs's
maintainer? Or who can help me review the patch?

I hope v9fs's maintainer can give me some advices or
merge into the mainline if it has no problem.

Thanks,
Yiwen.


Hi Andrew,

My patch has already sent more than two weeks, but nobody
help me to review, I have a question about now who is the
v9fs's maintainer? Or who can help me review the patch?


There is no maintainer with V9fs  in get_maintain.pl and V9fs git tree
laterest commit before two years.so,situation become bad.
Maybe you eamil linus directly,although this is't good

I hope v9fs's maintainer can give me some advices or
merge into the mainline if it has no problem.

Thanks,
Yiwen.

--
To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V9fs-developer] [PATCH] fs/9p: don't set SB_NOATIME by default

2018-02-25 Thread Bo YU

Hi,
I think you would better to modify the subject line without
[V9fs-developers].
On Mon, Feb 26, 2018 at 09:34:49AM +0800, jiangyiwen wrote:

On 2018/2/24 10:47, jiangyiwen wrote:

On 2018/2/9 14:13, jiangyiwen wrote:

User use some syscall, for example mmap(v9fs_file_mmap), it will not
update atime even if user's mnt_flags have MNT_NOATIME, because
v9fs default set SB_NOATIME in v9fs_set_super.

For supporting access time is updated when user mount with relatime,
we should clear SB_NOATIME by default.

Signed-off-by: Yiwen Jiang 
---
 fs/9p/vfs_super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index af03c2a..48ce504 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -94,7 +94,7 @@ static int v9fs_set_super(struct super_block *s, void *data)
if (v9ses->cache)
sb->s_bdi->ra_pages = (VM_MAX_READAHEAD * 1024)/PAGE_SIZE;

-   sb->s_flags |= SB_ACTIVE | SB_DIRSYNC | SB_NOATIME;
+   sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
sb->s_flags |= SB_SYNCHRONOUS;


Hi Alexander Viro and Eric,

My patch has already sent two weeks, but nobody help me
to review, I have a question about now who is the v9fs's
maintainer? Or who can help me review the patch?

I hope v9fs's maintainer can give me some advices or
merge into the mainline if it has no problem.

Thanks,
Yiwen.


Hi Andrew,

My patch has already sent more than two weeks, but nobody
help me to review, I have a question about now who is the
v9fs's maintainer? Or who can help me review the patch?


There is no maintainer with V9fs  in get_maintain.pl and V9fs git tree
laterest commit before two years.so,situation become bad.
Maybe you eamil linus directly,although this is't good

I hope v9fs's maintainer can give me some advices or
merge into the mainline if it has no problem.

Thanks,
Yiwen.

--
To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Minchan Kim
Hi Jan,

On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
> Hi Minchan,
> 
> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
> > > From: Huang Ying 
> > > 
> > > When page_mapping() is called and the mapping is dereferenced in
> > > page_evicatable() through shrink_active_list(), it is possible for the
> > > inode to be truncated and the embedded address space to be freed at
> > > the same time.  This may lead to the following race.
> > > 
> > > CPU1CPU2
> > > 
> > > truncate(inode) shrink_active_list()
> > >   ... page_evictable(page)
> > >   truncate_inode_page(mapping, page);
> > > delete_from_page_cache(page)
> > >   spin_lock_irqsave(>tree_lock, flags);
> > > __delete_from_page_cache(page, NULL)
> > >   page_cache_tree_delete(..)
> > > ... mapping = 
> > > page_mapping(page);
> > > page->mapping = NULL;
> > > ...
> > >   spin_unlock_irqrestore(>tree_lock, flags);
> > >   page_cache_free_page(mapping, page)
> > > put_page(page)
> > >   if (put_page_testzero(page)) -> false
> > > - inode now has no pages and can be freed including embedded address_space
> > > 
> > > 
> > > mapping_unevictable(mapping)
> > > 
> > > test_bit(AS_UNEVICTABLE, >flags);
> > > - we've dereferenced mapping which is potentially already free.
> > > 
> > > Similar race exists between swap cache freeing and page_evicatable() too.
> > > 
> > > The address_space in inode and swap cache will be freed after a RCU
> > > grace period.  So the races are fixed via enclosing the page_mapping()
> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
> > > added in code to make it clear what is protected by the RCU read lock.
> > 
> > Is it always true for every FSes, even upcoming FSes?
> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
> > to destroy inode?
> > 
> > Let's cc linux-fs.
> 
> That's actually a good question. Pathname lookup relies on inodes being
> protected by RCU so "normal" filesystems definitely need to use RCU freeing
> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
> with freeing its inodes normally AFAICT. I don't see that happening
> anywhere in the tree but in theory it is possible with some effort... But
> frankly I don't see a good reason for that so all we should do is to
> document that .destroy_inode needs to free the inode structure through RCU
> if it uses page cache? Al?

Yub, it would be much better. However, how does this patch fix the problem?
Although it can make only page_evictable safe, we could go with the page
further and finally uses page->mapping, again.
For instance,

shrink_active_list
page_evictable();
..
page_referened()
page_rmapping
page->mapping

I think caller should lock the page to protect entire operation, which
have been used more widely to pin a address_space.

Thanks.


Re: [PATCH] mm: Fix races between address_space dereference and free in page_evicatable

2018-02-25 Thread Minchan Kim
Hi Jan,

On Mon, Feb 19, 2018 at 11:57:35AM +0100, Jan Kara wrote:
> Hi Minchan,
> 
> On Sun 18-02-18 18:22:45, Minchan Kim wrote:
> > On Mon, Feb 12, 2018 at 04:12:27PM +0800, Huang, Ying wrote:
> > > From: Huang Ying 
> > > 
> > > When page_mapping() is called and the mapping is dereferenced in
> > > page_evicatable() through shrink_active_list(), it is possible for the
> > > inode to be truncated and the embedded address space to be freed at
> > > the same time.  This may lead to the following race.
> > > 
> > > CPU1CPU2
> > > 
> > > truncate(inode) shrink_active_list()
> > >   ... page_evictable(page)
> > >   truncate_inode_page(mapping, page);
> > > delete_from_page_cache(page)
> > >   spin_lock_irqsave(>tree_lock, flags);
> > > __delete_from_page_cache(page, NULL)
> > >   page_cache_tree_delete(..)
> > > ... mapping = 
> > > page_mapping(page);
> > > page->mapping = NULL;
> > > ...
> > >   spin_unlock_irqrestore(>tree_lock, flags);
> > >   page_cache_free_page(mapping, page)
> > > put_page(page)
> > >   if (put_page_testzero(page)) -> false
> > > - inode now has no pages and can be freed including embedded address_space
> > > 
> > > 
> > > mapping_unevictable(mapping)
> > > 
> > > test_bit(AS_UNEVICTABLE, >flags);
> > > - we've dereferenced mapping which is potentially already free.
> > > 
> > > Similar race exists between swap cache freeing and page_evicatable() too.
> > > 
> > > The address_space in inode and swap cache will be freed after a RCU
> > > grace period.  So the races are fixed via enclosing the page_mapping()
> > > and address_space usage in rcu_read_lock/unlock().  Some comments are
> > > added in code to make it clear what is protected by the RCU read lock.
> > 
> > Is it always true for every FSes, even upcoming FSes?
> > IOW, do we have any strict rule FS folks must use RCU(i.e., call_rcu)
> > to destroy inode?
> > 
> > Let's cc linux-fs.
> 
> That's actually a good question. Pathname lookup relies on inodes being
> protected by RCU so "normal" filesystems definitely need to use RCU freeing
> of inodes. OTOH a filesystem could in theory refuse any attempt for RCU
> pathname walk (in its .d_revalidate/.d_compare callback) and then get away
> with freeing its inodes normally AFAICT. I don't see that happening
> anywhere in the tree but in theory it is possible with some effort... But
> frankly I don't see a good reason for that so all we should do is to
> document that .destroy_inode needs to free the inode structure through RCU
> if it uses page cache? Al?

Yub, it would be much better. However, how does this patch fix the problem?
Although it can make only page_evictable safe, we could go with the page
further and finally uses page->mapping, again.
For instance,

shrink_active_list
page_evictable();
..
page_referened()
page_rmapping
page->mapping

I think caller should lock the page to protect entire operation, which
have been used more widely to pin a address_space.

Thanks.


Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Huang, Ying
Minchan Kim  writes:

> On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
>>  writes:
>> [snip]
>> 
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index 39ae7cfad90f..c56cce64b2c3 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
>> > struct vm_area_struct *vma,
>> >   unsigned long addr)
>> >  {
>> >struct page *page;
>> > -  unsigned long ra_info;
>> > -  int win, hits, readahead;
>> >  
>> >page = find_get_page(swap_address_space(entry), swp_offset(entry));
>> >  
>> >INC_CACHE_INFO(find_total);
>> >if (page) {
>> > +  bool vma_ra = swap_use_vma_readahead();
>> > +  bool readahead = TestClearPageReadahead(page);
>> > +
>> 
>> TestClearPageReadahead() cannot be called for compound page.  As in
>> 
>> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>>  TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>> 
>> >INC_CACHE_INFO(find_success);
>> >if (unlikely(PageTransCompound(page)))
>> >return page;
>> > -  readahead = TestClearPageReadahead(page);
>> 
>> So we can only call it here after checking whether page is compound.
>
> Hi Huang,
>
> Thanks for cathing this.
> However, I don't see the reason we should rule out THP page for
> readahead marker. Could't we relax the rule?
>
> I hope we can do so that we could remove PageTransCompound check
> for readahead marker, which makes code ugly.
>
> From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
> From: Minchan Kim 
> Date: Mon, 26 Feb 2018 13:46:43 +0900
> Subject: [PATCH] mm: relax policy for PG_readahead
>
> This flag is in use for anon THP page so let's relax it.
>
> Cc: Kirill A. Shutemov 
> Signed-off-by: Minchan Kim 
> ---
>  include/linux/page-flags.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e34a27727b9a..f12d4dfae580 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
>  /* PG_readahead is only used for reads; PG_reclaim is only for writes */
>  PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
>   TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
> -PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> - TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> +PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
> + TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
>  
>  #ifdef CONFIG_HIGHMEM
>  /*

We never set Readahead bit for THP in reality.  The original code acts
as document for this.  I don't think it is a good idea to change this
without a good reason.

Best Regards,
Huang, Ying


Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Huang, Ying
Minchan Kim  writes:

> On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
>>  writes:
>> [snip]
>> 
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index 39ae7cfad90f..c56cce64b2c3 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
>> > struct vm_area_struct *vma,
>> >   unsigned long addr)
>> >  {
>> >struct page *page;
>> > -  unsigned long ra_info;
>> > -  int win, hits, readahead;
>> >  
>> >page = find_get_page(swap_address_space(entry), swp_offset(entry));
>> >  
>> >INC_CACHE_INFO(find_total);
>> >if (page) {
>> > +  bool vma_ra = swap_use_vma_readahead();
>> > +  bool readahead = TestClearPageReadahead(page);
>> > +
>> 
>> TestClearPageReadahead() cannot be called for compound page.  As in
>> 
>> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>>  TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>> 
>> >INC_CACHE_INFO(find_success);
>> >if (unlikely(PageTransCompound(page)))
>> >return page;
>> > -  readahead = TestClearPageReadahead(page);
>> 
>> So we can only call it here after checking whether page is compound.
>
> Hi Huang,
>
> Thanks for cathing this.
> However, I don't see the reason we should rule out THP page for
> readahead marker. Could't we relax the rule?
>
> I hope we can do so that we could remove PageTransCompound check
> for readahead marker, which makes code ugly.
>
> From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
> From: Minchan Kim 
> Date: Mon, 26 Feb 2018 13:46:43 +0900
> Subject: [PATCH] mm: relax policy for PG_readahead
>
> This flag is in use for anon THP page so let's relax it.
>
> Cc: Kirill A. Shutemov 
> Signed-off-by: Minchan Kim 
> ---
>  include/linux/page-flags.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e34a27727b9a..f12d4dfae580 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
>  /* PG_readahead is only used for reads; PG_reclaim is only for writes */
>  PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
>   TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
> -PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> - TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> +PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
> + TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
>  
>  #ifdef CONFIG_HIGHMEM
>  /*

We never set Readahead bit for THP in reality.  The original code acts
as document for this.  I don't think it is a good idea to change this
without a good reason.

Best Regards,
Huang, Ying


linux-next: build failure after merge of the akpm-current tree

2018-02-25 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
allyesconfig) failed like this:

security/keys/big_key.c: In function 'big_key_free_buffer':
security/keys/big_key.c:146:3: error: implicit declaration of function 
'vunmap'; did you mean 'kunmap'? [-Werror=implicit-function-declaration]
   vunmap(buf->virt);
   ^~
   kunmap
security/keys/big_key.c: In function 'big_key_alloc_buffer':
security/keys/big_key.c:187:14: error: implicit declaration of function 'vmap'; 
did you mean 'kmap'? [-Werror=implicit-function-declaration]
  buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL);
  ^~~~
  kmap
security/keys/big_key.c:187:46: error: 'VM_MAP' undeclared (first use in this 
function); did you mean 'VM_SAO'?
  buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL);
  ^~
  VM_SAO
security/keys/big_key.c:187:46: note: each undeclared identifier is reported 
only once for each function it appears in

Caused by commit

  d9f4bb1a0f4d ("KEYS: Use individual pages in big_key for crypto buffers")

in Linus' tree, but most likely exposed by commit

  e8bd5e5c63b6 ("headers: untangle kmemleak.h from mm.h")

in the akpm-current tree (kmemleak.h includes vmalloc.h).

I have added the following patch for today:

From: Stephen Rothwell 
Date: Mon, 26 Feb 2018 15:58:03 +1100
Subject: [PATCH] KEYS: include vmalloc.h for vmap etc

This was discovered after a patch that removed kmemleak.h from slab.h.

Fixes: d9f4bb1a0f4d ("KEYS: Use individual pages in big_key for crypto buffers")
Signed-off-by: Stephen Rothwell 
---
 security/keys/big_key.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index fa728f662a6f..e5823de23f4f 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.16.1

Linus, is it worth putting this directly into your tree, or should it
just wait for the patch from Andrew's tree that exposes it?
-- 
Cheers,
Stephen Rothwell


pgpcXmENVMaPJ.pgp
Description: OpenPGP digital signature


linux-next: build failure after merge of the akpm-current tree

2018-02-25 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
allyesconfig) failed like this:

security/keys/big_key.c: In function 'big_key_free_buffer':
security/keys/big_key.c:146:3: error: implicit declaration of function 
'vunmap'; did you mean 'kunmap'? [-Werror=implicit-function-declaration]
   vunmap(buf->virt);
   ^~
   kunmap
security/keys/big_key.c: In function 'big_key_alloc_buffer':
security/keys/big_key.c:187:14: error: implicit declaration of function 'vmap'; 
did you mean 'kmap'? [-Werror=implicit-function-declaration]
  buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL);
  ^~~~
  kmap
security/keys/big_key.c:187:46: error: 'VM_MAP' undeclared (first use in this 
function); did you mean 'VM_SAO'?
  buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL);
  ^~
  VM_SAO
security/keys/big_key.c:187:46: note: each undeclared identifier is reported 
only once for each function it appears in

Caused by commit

  d9f4bb1a0f4d ("KEYS: Use individual pages in big_key for crypto buffers")

in Linus' tree, but most likely exposed by commit

  e8bd5e5c63b6 ("headers: untangle kmemleak.h from mm.h")

in the akpm-current tree (kmemleak.h includes vmalloc.h).

I have added the following patch for today:

From: Stephen Rothwell 
Date: Mon, 26 Feb 2018 15:58:03 +1100
Subject: [PATCH] KEYS: include vmalloc.h for vmap etc

This was discovered after a patch that removed kmemleak.h from slab.h.

Fixes: d9f4bb1a0f4d ("KEYS: Use individual pages in big_key for crypto buffers")
Signed-off-by: Stephen Rothwell 
---
 security/keys/big_key.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index fa728f662a6f..e5823de23f4f 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.16.1

Linus, is it worth putting this directly into your tree, or should it
just wait for the patch from Andrew's tree that exposes it?
-- 
Cheers,
Stephen Rothwell


pgpcXmENVMaPJ.pgp
Description: OpenPGP digital signature


[PATCH] rcu: Remove the unnecessary separate function, rcu_preempt_do_callback()

2018-02-25 Thread Byungchul Park
rcu_preemptp_do_callback() was introduced in commit 09223371dea(rcu:
Use softirq to address performance regression), where it had to be
distinguished between in the case CONFIG_TREE_PREEMPT_RCU is set and
it's not.

Now that the code was cleaned up so that rcu_preemt_do_callback() is
only called in rcu_kthread_do_work() in the same file, tree_plugin.h,
we don't have to keep the separate function anymore. Remove it for a
better readability.

Signed-off-by: Byungchul Park 
---
 kernel/rcu/tree.h|  1 -
 kernel/rcu/tree_plugin.h | 11 +--
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 478b79e..4172833 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,7 +439,6 @@ struct rcu_state {
 static void invoke_rcu_callbacks_kthread(void);
 static bool rcu_is_callbacks_kthread(void);
 #ifdef CONFIG_RCU_BOOST
-static void rcu_preempt_do_callbacks(void);
 static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_RCU_BOOST */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 26d7a31..1070a04 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -686,15 +686,6 @@ static void rcu_preempt_check_callbacks(void)
t->rcu_read_unlock_special.b.need_qs = true;
 }
 
-#ifdef CONFIG_RCU_BOOST
-
-static void rcu_preempt_do_callbacks(void)
-{
-   rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
-}
-
-#endif /* #ifdef CONFIG_RCU_BOOST */
-
 /**
  * call_rcu() - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
@@ -1170,7 +1161,7 @@ static void rcu_kthread_do_work(void)
 {
rcu_do_batch(_sched_state, this_cpu_ptr(_sched_data));
rcu_do_batch(_bh_state, this_cpu_ptr(_bh_data));
-   rcu_preempt_do_callbacks();
+   rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
 }
 
 static void rcu_cpu_kthread_setup(unsigned int cpu)
-- 
1.9.1



[PATCH] rcu: Remove the unnecessary separate function, rcu_preempt_do_callback()

2018-02-25 Thread Byungchul Park
rcu_preemptp_do_callback() was introduced in commit 09223371dea(rcu:
Use softirq to address performance regression), where it had to be
distinguished between in the case CONFIG_TREE_PREEMPT_RCU is set and
it's not.

Now that the code was cleaned up so that rcu_preemt_do_callback() is
only called in rcu_kthread_do_work() in the same file, tree_plugin.h,
we don't have to keep the separate function anymore. Remove it for a
better readability.

Signed-off-by: Byungchul Park 
---
 kernel/rcu/tree.h|  1 -
 kernel/rcu/tree_plugin.h | 11 +--
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 478b79e..4172833 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,7 +439,6 @@ struct rcu_state {
 static void invoke_rcu_callbacks_kthread(void);
 static bool rcu_is_callbacks_kthread(void);
 #ifdef CONFIG_RCU_BOOST
-static void rcu_preempt_do_callbacks(void);
 static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_RCU_BOOST */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 26d7a31..1070a04 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -686,15 +686,6 @@ static void rcu_preempt_check_callbacks(void)
t->rcu_read_unlock_special.b.need_qs = true;
 }
 
-#ifdef CONFIG_RCU_BOOST
-
-static void rcu_preempt_do_callbacks(void)
-{
-   rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
-}
-
-#endif /* #ifdef CONFIG_RCU_BOOST */
-
 /**
  * call_rcu() - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
@@ -1170,7 +1161,7 @@ static void rcu_kthread_do_work(void)
 {
rcu_do_batch(_sched_state, this_cpu_ptr(_sched_data));
rcu_do_batch(_bh_state, this_cpu_ptr(_bh_data));
-   rcu_preempt_do_callbacks();
+   rcu_do_batch(rcu_state_p, this_cpu_ptr(rcu_data_p));
 }
 
 static void rcu_cpu_kthread_setup(unsigned int cpu)
-- 
1.9.1



RE: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-25 Thread Ran Wang
Hi Shawn,

> -Original Message-
> From: Shawn Guo [mailto:shawn...@kernel.org]
> Sent: Saturday, February 24, 2018 3:13 PM
> To: Ran Wang 
> Cc: Mark Rutland ; devicet...@vger.kernel.org; Andy
> Tang ; Catalin Marinas ; Will
> Deacon ; linux-kernel@vger.kernel.org; Leo Li
> ; Rob Herring ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support
> 
> On Sat, Feb 24, 2018 at 02:25:16PM +0800, Shawn Guo wrote:
> > On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> > > From: Yuantian Tang 
> > >
> > > Signed-off-by: Tang Yuantian 
> > > ---
> > >  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
> > >  1 files changed, 18 insertions(+), 0 deletions(-)
> >
> > Applied all, thanks.
> 
> Ran,
> 
> Just noticed that these patches are authored by Yuantian.  When you send
> patches authored by someone else, you should have your SoB added to it.
> 
> I just append your SoB below to all 3 patches.
> 
> Signed-off-by: Ran Wang 

Got it, thanks!

Ran


RE: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-25 Thread Ran Wang
Hi Shawn,

> -Original Message-
> From: Shawn Guo [mailto:shawn...@kernel.org]
> Sent: Saturday, February 24, 2018 3:13 PM
> To: Ran Wang 
> Cc: Mark Rutland ; devicet...@vger.kernel.org; Andy
> Tang ; Catalin Marinas ; Will
> Deacon ; linux-kernel@vger.kernel.org; Leo Li
> ; Rob Herring ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support
> 
> On Sat, Feb 24, 2018 at 02:25:16PM +0800, Shawn Guo wrote:
> > On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> > > From: Yuantian Tang 
> > >
> > > Signed-off-by: Tang Yuantian 
> > > ---
> > >  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
> > >  1 files changed, 18 insertions(+), 0 deletions(-)
> >
> > Applied all, thanks.
> 
> Ran,
> 
> Just noticed that these patches are authored by Yuantian.  When you send
> patches authored by someone else, you should have your SoB added to it.
> 
> I just append your SoB below to all 3 patches.
> 
> Signed-off-by: Ran Wang 

Got it, thanks!

Ran


Re: [PATCH tip/core/rcu 06/10] trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()

2018-02-25 Thread Steven Rostedt
On Sun, 25 Feb 2018 10:17:30 -0800
"Paul E. McKenney"  wrote:


> And probably not.  You are probably running CONFIG_PREEMPT=y (otherwise
> RCU-tasks is trivial), so cond_resched() is a complete no-op:
> 
> static inline int _cond_resched(void) { return 0; }
> 
> I could make this call rcu_all_qs(), but I would not expect Peter Zijlstra
> to be at all happy with that sort of change.
> 
> And the people who asked for the cond_resched() work probably aren't
> going to be happy with the resumed proliferation of cond_resched_rcu_qs().
> 
> Hmmm...  Grasping at straws...  Could we make cond_resched() be something
> like a tracepoint and instrument them with cond_resched_rcu_qs() if the
> current RCU-tasks grace period ran for more that (say) a minute of its
> ten-minute stall-warning span?
> 

Instead of monkeying with cond_resched(), since this is "special" code,
why don't I just have that code call it directly?

cond_resched();
rcu_note_voluntary_context_switch(current);

-- Steve


Re: [PATCH tip/core/rcu 06/10] trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()

2018-02-25 Thread Steven Rostedt
On Sun, 25 Feb 2018 10:17:30 -0800
"Paul E. McKenney"  wrote:


> And probably not.  You are probably running CONFIG_PREEMPT=y (otherwise
> RCU-tasks is trivial), so cond_resched() is a complete no-op:
> 
> static inline int _cond_resched(void) { return 0; }
> 
> I could make this call rcu_all_qs(), but I would not expect Peter Zijlstra
> to be at all happy with that sort of change.
> 
> And the people who asked for the cond_resched() work probably aren't
> going to be happy with the resumed proliferation of cond_resched_rcu_qs().
> 
> Hmmm...  Grasping at straws...  Could we make cond_resched() be something
> like a tracepoint and instrument them with cond_resched_rcu_qs() if the
> current RCU-tasks grace period ran for more that (say) a minute of its
> ten-minute stall-warning span?
> 

Instead of monkeying with cond_resched(), since this is "special" code,
why don't I just have that code call it directly?

cond_resched();
rcu_note_voluntary_context_switch(current);

-- Steve


Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Minchan Kim
On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
>  writes:
> [snip]
> 
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 39ae7cfad90f..c56cce64b2c3 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
> > struct vm_area_struct *vma,
> >unsigned long addr)
> >  {
> > struct page *page;
> > -   unsigned long ra_info;
> > -   int win, hits, readahead;
> >  
> > page = find_get_page(swap_address_space(entry), swp_offset(entry));
> >  
> > INC_CACHE_INFO(find_total);
> > if (page) {
> > +   bool vma_ra = swap_use_vma_readahead();
> > +   bool readahead = TestClearPageReadahead(page);
> > +
> 
> TestClearPageReadahead() cannot be called for compound page.  As in
> 
> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> 
> > INC_CACHE_INFO(find_success);
> > if (unlikely(PageTransCompound(page)))
> > return page;
> > -   readahead = TestClearPageReadahead(page);
> 
> So we can only call it here after checking whether page is compound.

Hi Huang,

Thanks for cathing this.
However, I don't see the reason we should rule out THP page for
readahead marker. Could't we relax the rule?

I hope we can do so that we could remove PageTransCompound check
for readahead marker, which makes code ugly.

>From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 26 Feb 2018 13:46:43 +0900
Subject: [PATCH] mm: relax policy for PG_readahead

This flag is in use for anon THP page so let's relax it.

Cc: Kirill A. Shutemov 
Signed-off-by: Minchan Kim 
---
 include/linux/page-flags.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..f12d4dfae580 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
+   TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
 
 #ifdef CONFIG_HIGHMEM
 /*
-- 
2.16.1.291.g4437f3f132-goog



Re: [PATCH RESEND 1/2] mm: swap: clean up swap readahead

2018-02-25 Thread Minchan Kim
On Fri, Feb 23, 2018 at 04:02:27PM +0800, Huang, Ying wrote:
>  writes:
> [snip]
> 
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 39ae7cfad90f..c56cce64b2c3 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -332,32 +332,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, 
> > struct vm_area_struct *vma,
> >unsigned long addr)
> >  {
> > struct page *page;
> > -   unsigned long ra_info;
> > -   int win, hits, readahead;
> >  
> > page = find_get_page(swap_address_space(entry), swp_offset(entry));
> >  
> > INC_CACHE_INFO(find_total);
> > if (page) {
> > +   bool vma_ra = swap_use_vma_readahead();
> > +   bool readahead = TestClearPageReadahead(page);
> > +
> 
> TestClearPageReadahead() cannot be called for compound page.  As in
> 
> PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
> 
> > INC_CACHE_INFO(find_success);
> > if (unlikely(PageTransCompound(page)))
> > return page;
> > -   readahead = TestClearPageReadahead(page);
> 
> So we can only call it here after checking whether page is compound.

Hi Huang,

Thanks for cathing this.
However, I don't see the reason we should rule out THP page for
readahead marker. Could't we relax the rule?

I hope we can do so that we could remove PageTransCompound check
for readahead marker, which makes code ugly.

>From 748b084d5c3960ec2418d8c51a678aada30f1072 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 26 Feb 2018 13:46:43 +0900
Subject: [PATCH] mm: relax policy for PG_readahead

This flag is in use for anon THP page so let's relax it.

Cc: Kirill A. Shutemov 
Signed-off-by: Minchan Kim 
---
 include/linux/page-flags.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..f12d4dfae580 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -318,8 +318,8 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-   TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+PAGEFLAG(Readahead, reclaim, PF_NO_TAIL)
+   TESTCLEARFLAG(Readahead, reclaim, PF_NO_TAIL)
 
 #ifdef CONFIG_HIGHMEM
 /*
-- 
2.16.1.291.g4437f3f132-goog



[RFC PATCH V4] pci: virtio_pci: Add SR-IOV support for virtio_pci devices

2018-02-25 Thread Mark Rustad
Hardware-realized virtio_pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio_net PF and virtio_net VFs. These
are hardware realizations of what has been up to now been a software
interface.

The device in question has the following 4-part PCI IDs:

PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe

The patch needs no check for device ID, because the callback will
never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.

One reason for this patch is because the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio_net PF create the VFs. This patch makes that possible.

Signed-off-by: Mark Rustad 
Reviewed-by: Alexander Duyck 
---
Changes in V4:
- V3 was a mis-send, this has what was intended
- Move most code to new helpers in pci/iov.c, pci_sriov_configure_generic
  and pci_sriov_disable_generic
- Correct mislabeling of vendor and device IDs
- Other minor changelog fixes
- Rebased to pci/master, since most changes are in that area now
- No new ifdefs with this approach (yay)
Changes in V3:
- Missent patch, please disregard
Changes in V2:
- Simplified logic from previous version, removed added driver variable
- Disable SR-IOV on driver removal except when VFs are assigned
- Sent as RFC to virtio-dev, linux-pci, netdev, lkml and others
---
 drivers/pci/iov.c  |   50 
 drivers/virtio/virtio_pci_common.c |2 +
 include/linux/pci.h|   10 +++
 3 files changed, 62 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..4b110e169b7c 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -367,6 +367,56 @@ static void sriov_disable(struct pci_dev *dev)
pci_iov_set_numvfs(dev, 0);
 }
 
+/**
+ * pci_sriov_disable_generic - standard helper to disable SR-IOV
+ * @dev:the PCI PF device whose VFs are to be disabled
+ */
+int pci_sriov_disable_generic(struct pci_dev *dev)
+{
+   /*
+* If vfs are assigned we cannot shut down SR-IOV without causing
+* issues, so just leave the hardware available.
+*/
+   if (pci_vfs_assigned(dev)) {
+   pci_warn(dev,
+"Cannot disable SR-IOV while VFs are assigned - VFs 
will not be deallocated\n");
+   return -EPERM;
+   }
+   pci_disable_sriov(dev);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_disable_generic);
+
+static int pci_sriov_enable(struct pci_dev *dev, int num_vfs)
+{
+   int rc;
+
+   if (pci_num_vf(dev))
+   return -EINVAL;
+
+   rc = pci_enable_sriov(dev, num_vfs);
+   if (rc) {
+   pci_warn(dev, "Failed to enable PCI sriov: %d\n", rc);
+   return rc;
+   }
+   pci_info(dev, "SR-IOV enabled with %d VFs\n", num_vfs);
+   return num_vfs;
+}
+
+/**
+ * pci_sriov_configure_generic - standard helper to configure SR-IOV
+ * @dev: the PCI PF device that is configuring SR-IOV
+ */
+int pci_sriov_configure_generic(struct pci_dev *dev, int num_vfs)
+{
+   if (num_vfs)
+   return pci_sriov_enable(dev, num_vfs);
+   if (!pci_num_vf(dev))
+   return -EINVAL;
+   return pci_sriov_disable_generic(dev);
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_generic);
+
 static int sriov_init(struct pci_dev *dev, int pos)
 {
int i, bar64;
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..d7679377131f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -584,6 +584,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
else
virtio_pci_modern_remove(vp_dev);
 
+   pci_sriov_disable_generic(pci_dev);
pci_disable_device(pci_dev);
put_device(dev);
 }
@@ -596,6 +597,7 @@ static struct pci_driver virtio_pci_driver = {
 #ifdef CONFIG_PM_SLEEP
.driver.pm  = _pci_pm_ops,
 #endif
+   .sriov_configure = pci_sriov_configure_generic,
 };
 
 module_pci_driver(virtio_pci_driver);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..937124d4e098 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1947,6 +1947,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_sriov_disable_generic(struct pci_dev *dev);
+int pci_sriov_configure_generic(struct pci_dev *dev, int num_vfs);
 int pci_iov_add_virtfn(struct pci_dev *dev, int id);
 void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
 int pci_num_vf(struct pci_dev *dev);
@@ -1973,6 

[RFC PATCH V4] pci: virtio_pci: Add SR-IOV support for virtio_pci devices

2018-02-25 Thread Mark Rustad
Hardware-realized virtio_pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio_net PF and virtio_net VFs. These
are hardware realizations of what has been up to now been a software
interface.

The device in question has the following 4-part PCI IDs:

PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe

The patch needs no check for device ID, because the callback will
never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.

One reason for this patch is because the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio_net PF create the VFs. This patch makes that possible.

Signed-off-by: Mark Rustad 
Reviewed-by: Alexander Duyck 
---
Changes in V4:
- V3 was a mis-send, this has what was intended
- Move most code to new helpers in pci/iov.c, pci_sriov_configure_generic
  and pci_sriov_disable_generic
- Correct mislabeling of vendor and device IDs
- Other minor changelog fixes
- Rebased to pci/master, since most changes are in that area now
- No new ifdefs with this approach (yay)
Changes in V3:
- Missent patch, please disregard
Changes in V2:
- Simplified logic from previous version, removed added driver variable
- Disable SR-IOV on driver removal except when VFs are assigned
- Sent as RFC to virtio-dev, linux-pci, netdev, lkml and others
---
 drivers/pci/iov.c  |   50 
 drivers/virtio/virtio_pci_common.c |2 +
 include/linux/pci.h|   10 +++
 3 files changed, 62 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..4b110e169b7c 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -367,6 +367,56 @@ static void sriov_disable(struct pci_dev *dev)
pci_iov_set_numvfs(dev, 0);
 }
 
+/**
+ * pci_sriov_disable_generic - standard helper to disable SR-IOV
+ * @dev:the PCI PF device whose VFs are to be disabled
+ */
+int pci_sriov_disable_generic(struct pci_dev *dev)
+{
+   /*
+* If vfs are assigned we cannot shut down SR-IOV without causing
+* issues, so just leave the hardware available.
+*/
+   if (pci_vfs_assigned(dev)) {
+   pci_warn(dev,
+"Cannot disable SR-IOV while VFs are assigned - VFs 
will not be deallocated\n");
+   return -EPERM;
+   }
+   pci_disable_sriov(dev);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_disable_generic);
+
+static int pci_sriov_enable(struct pci_dev *dev, int num_vfs)
+{
+   int rc;
+
+   if (pci_num_vf(dev))
+   return -EINVAL;
+
+   rc = pci_enable_sriov(dev, num_vfs);
+   if (rc) {
+   pci_warn(dev, "Failed to enable PCI sriov: %d\n", rc);
+   return rc;
+   }
+   pci_info(dev, "SR-IOV enabled with %d VFs\n", num_vfs);
+   return num_vfs;
+}
+
+/**
+ * pci_sriov_configure_generic - standard helper to configure SR-IOV
+ * @dev: the PCI PF device that is configuring SR-IOV
+ */
+int pci_sriov_configure_generic(struct pci_dev *dev, int num_vfs)
+{
+   if (num_vfs)
+   return pci_sriov_enable(dev, num_vfs);
+   if (!pci_num_vf(dev))
+   return -EINVAL;
+   return pci_sriov_disable_generic(dev);
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_generic);
+
 static int sriov_init(struct pci_dev *dev, int pos)
 {
int i, bar64;
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..d7679377131f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -584,6 +584,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
else
virtio_pci_modern_remove(vp_dev);
 
+   pci_sriov_disable_generic(pci_dev);
pci_disable_device(pci_dev);
put_device(dev);
 }
@@ -596,6 +597,7 @@ static struct pci_driver virtio_pci_driver = {
 #ifdef CONFIG_PM_SLEEP
.driver.pm  = _pci_pm_ops,
 #endif
+   .sriov_configure = pci_sriov_configure_generic,
 };
 
 module_pci_driver(virtio_pci_driver);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..937124d4e098 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1947,6 +1947,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_sriov_disable_generic(struct pci_dev *dev);
+int pci_sriov_configure_generic(struct pci_dev *dev, int num_vfs);
 int pci_iov_add_virtfn(struct pci_dev *dev, int id);
 void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
 int pci_num_vf(struct pci_dev *dev);
@@ -1973,6 +1975,14 @@ static inline int pci_iov_add_virtfn(struct 

Re: [PATCH] cpuidle/powernv : Restore different PSSCR for idle and hotplug

2018-02-25 Thread Stewart Smith
Akshay Adiga  writes:
> commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> states via stop API.") uses stop-api provided by the firmware to restore
> PSSCR. PSSCR restore is required for handling special wakeup. When special
> wakeup is completed, the core enters stop state based on restored PSSCR.
>
> Currently PSSCR is restored to deepest available stop state, causing
> a idle cpu to enter deeper stop state on a special wakeup, which causes
> the cpu to hang on wakeup.
>
> A "sensors" command which reads temperature (through DTS sensors) on idle
> cpu can trigger special wakeup.
>
> Failed Scenario :
> Request restore of PSSCR with RL = 11
> cpu enters idle state (stop5)
>   user triggers "sensors" command
>Assert special wakeup on cpu
>  Restores PSSCR with RL = 11  < Done by firmware
>   Read DTS sensor
>Deassert special wakeup
>   cpu enters idle state (stop11) <-- Instead of stop5
>
> Cpu hang is caused because cpu ended up in a deeper state than it requested
>
> This patch fixes instability caused by special wakeup when stop11 is
> enabled. Requests restore of PSSCR to deepest stop state used by cpuidle.
> Only when offlining cpu, request restore of PSSCR to deepest stop state.
> On onlining cpu, request restore of PSSCR to deepest stop state used by
> cpuidle.
>
> Fixes : 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> states via stop API.")

This should CC stable ?

We'll need this to enable stop11 in firmware and not break things, right?

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH] cpuidle/powernv : Restore different PSSCR for idle and hotplug

2018-02-25 Thread Stewart Smith
Akshay Adiga  writes:
> commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> states via stop API.") uses stop-api provided by the firmware to restore
> PSSCR. PSSCR restore is required for handling special wakeup. When special
> wakeup is completed, the core enters stop state based on restored PSSCR.
>
> Currently PSSCR is restored to deepest available stop state, causing
> a idle cpu to enter deeper stop state on a special wakeup, which causes
> the cpu to hang on wakeup.
>
> A "sensors" command which reads temperature (through DTS sensors) on idle
> cpu can trigger special wakeup.
>
> Failed Scenario :
> Request restore of PSSCR with RL = 11
> cpu enters idle state (stop5)
>   user triggers "sensors" command
>Assert special wakeup on cpu
>  Restores PSSCR with RL = 11  < Done by firmware
>   Read DTS sensor
>Deassert special wakeup
>   cpu enters idle state (stop11) <-- Instead of stop5
>
> Cpu hang is caused because cpu ended up in a deeper state than it requested
>
> This patch fixes instability caused by special wakeup when stop11 is
> enabled. Requests restore of PSSCR to deepest stop state used by cpuidle.
> Only when offlining cpu, request restore of PSSCR to deepest stop state.
> On onlining cpu, request restore of PSSCR to deepest stop state used by
> cpuidle.
>
> Fixes : 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> states via stop API.")

This should CC stable ?

We'll need this to enable stop11 in firmware and not break things, right?

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH] mm/page_poison: move PAGE_POISON to page_poison.c

2018-02-25 Thread Wei Wang

On 02/13/2018 06:16 PM, Michal Hocko wrote:

On Fri 09-02-18 16:08:14, Wei Wang wrote:

The PAGE_POISON macro is used in page_poison.c only, so avoid exporting
it. Also remove the "mm/debug-pagealloc.c" related comment, which is
obsolete.

Why is this an improvement? I thought the whole point of poison.h is to
keep all the poison value at a single place to make them obviously
unique.


There isn't a comment explaining why they are exposed. We did this 
because PAGE_POISON is used by page_poison.c only, it seems not 
necessary to expose the private values.
Why would it be not unique if moved to page_poison.c (on condition that 
it is only used there)?


Best,
Wei


Re: [PATCH] mm/page_poison: move PAGE_POISON to page_poison.c

2018-02-25 Thread Wei Wang

On 02/13/2018 06:16 PM, Michal Hocko wrote:

On Fri 09-02-18 16:08:14, Wei Wang wrote:

The PAGE_POISON macro is used in page_poison.c only, so avoid exporting
it. Also remove the "mm/debug-pagealloc.c" related comment, which is
obsolete.

Why is this an improvement? I thought the whole point of poison.h is to
keep all the poison value at a single place to make them obviously
unique.


There isn't a comment explaining why they are exposed. We did this 
because PAGE_POISON is used by page_poison.c only, it seems not 
necessary to expose the private values.
Why would it be not unique if moved to page_poison.c (on condition that 
it is only used there)?


Best,
Wei


Re: [PATCH V2 6/7] thermal/drivers/cpu_cooling: Introduce the cpu idle cooling driver

2018-02-25 Thread Viresh Kumar
On 23-02-18, 12:28, Daniel Lezcano wrote:
> On 23/02/2018 08:34, Viresh Kumar wrote:
> > On 21-02-18, 16:29, Daniel Lezcano wrote:
> >> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> >> index 5c219dc..9340216 100644
> >> --- a/drivers/thermal/cpu_cooling.c
> >> +++ b/drivers/thermal/cpu_cooling.c
> >> @@ -10,18 +10,32 @@
> >>   *Viresh Kumar 
> >>   *
> >>   */
> >> +#undef DEBUG
> > 
> > Why is this required ?
> 
> It is usually added, so if you set the -DDEBUG flag when compiling, you
> don't get all the pr_debug traces for all files, but the just the ones
> where you commented the #undef above. pr_debug is a no-op otherwise.

Yeah, but this is a mess as you need to go edit the files before
enabling debug with it. Everyone prefers the dynamic debug thing now,
where we don't need such stuff. Just drop it.

> >> +#define pr_fmt(fmt) "CPU cooling: " fmt
> > 
> > I think you can use the dev_***() routines instead, as you can
> > directly the CPU device from anywhere.
> 
> Can we postpone this change for later ? All the file is using pr_*
> (cpufreq_cooling included). There is only one place where dev_err is
> used but it is removed by the patch 3/7.

okay.

> >> +  while (1) {
> >> +  s64 next_wakeup;
> >> +
> >> +  prepare_to_wait(>waitq, , TASK_INTERRUPTIBLE);
> >> +
> >> +  schedule();
> >> +
> >> +  atomic_inc(_cdev->count);
> >> +
> >> +  play_idle(idle_cdev->idle_cycle / USEC_PER_MSEC);
> >> +
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the
> >> +   * timer. If the CPU is hotplugged, the timer will
> >> +   * move to another CPU (which may not belong to the
> >> +   * same cluster) but that is not a problem as the
> >> +   * timer will be set again by another CPU belonging to
> >> +   * the cluster, so this mechanism is self adaptive and
> >> +   * does not require any hotplugging dance.
> >> +   */
> > 
> > Well this depends on how CPU hotplug really happens. What happens to
> > the per-cpu-tasks which are in the middle of something when hotplug
> > happens?  Does hotplug wait for those per-cpu-tasks to finish ?

Missed this one ?

> >> +int cpuidle_cooling_register(void)
> >> +{
> >> +  struct cpuidle_cooling_device *idle_cdev = NULL;
> >> +  struct thermal_cooling_device *cdev;
> >> +  struct cpuidle_cooling_tsk *cct;
> >> +  struct task_struct *tsk;
> >> +  struct device_node *np;
> >> +  cpumask_t *cpumask;
> >> +  char dev_name[THERMAL_NAME_LENGTH];
> >> +  int ret = -ENOMEM, cpu;
> >> +  int index = 0;
> >> +
> >> +  for_each_possible_cpu(cpu) {
> >> +  cpumask = topology_core_cpumask(cpu);
> >> +
> >> +  cct = per_cpu_ptr(_cooling_tsk, cpu);
> >> +
> >> +  /*
> >> +   * This condition makes the first cpu belonging to the
> >> +   * cluster to create a cooling device and allocates
> >> +   * the structure. Others CPUs belonging to the same
> >> +   * cluster will just increment the refcount on the
> >> +   * cooling device structure and initialize it.
> >> +   */
> >> +  if (cpu == cpumask_first(cpumask)) {
> > 
> > Your function still have few assumptions of cpu numbering and it will
> > break in few cases. What if the CPUs on a big Little system (4x4) are
> > present in this order: B L L L L B B B  ??
> > 
> > This configuration can happen if CPUs in DT are marked as: 0-3 LITTLE,
> > 4-7 big and a big CPU is used by the boot loader to bring up Linux.
> 
> Ok, how can I sort it out ?

I would do something like this:

cpumask_copy(possible, cpu_possible_mask);

while (!cpumask_empty(possible)) {
first = cpumask_first(possible);
cpumask = topology_core_cpumask(first);
cpumask_andnot(possible, possible, cpumask);

allocate_cooling_dev(first); //This is most of this function in 
your patch.

while (!cpumask_empty(cpumask)) {
temp = cpumask_first(possible);
//rest init "temp"
cpumask_clear_cpu(temp, cpumask);
}

//Everything done, register cooling device for cpumask.
}

> >> +  np = of_cpu_device_node_get(cpu);
> >> +
> >> +  idle_cdev = kzalloc(sizeof(*idle_cdev), GFP_KERNEL);
> >> +  if (!idle_cdev)
> >> +  goto out_fail;
> >> +
> >> +  idle_cdev->idle_cycle = DEFAULT_IDLE_TIME_US;
> >> +
> >> +  atomic_set(_cdev->count, 0);
> > 
> > This should already be 0, isn't it ?
> 
> Yes.

I read it as, "I will drop it" :)

-- 
viresh


Re: [PATCH V2 6/7] thermal/drivers/cpu_cooling: Introduce the cpu idle cooling driver

2018-02-25 Thread Viresh Kumar
On 23-02-18, 12:28, Daniel Lezcano wrote:
> On 23/02/2018 08:34, Viresh Kumar wrote:
> > On 21-02-18, 16:29, Daniel Lezcano wrote:
> >> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> >> index 5c219dc..9340216 100644
> >> --- a/drivers/thermal/cpu_cooling.c
> >> +++ b/drivers/thermal/cpu_cooling.c
> >> @@ -10,18 +10,32 @@
> >>   *Viresh Kumar 
> >>   *
> >>   */
> >> +#undef DEBUG
> > 
> > Why is this required ?
> 
> It is usually added, so if you set the -DDEBUG flag when compiling, you
> don't get all the pr_debug traces for all files, but the just the ones
> where you commented the #undef above. pr_debug is a no-op otherwise.

Yeah, but this is a mess as you need to go edit the files before
enabling debug with it. Everyone prefers the dynamic debug thing now,
where we don't need such stuff. Just drop it.

> >> +#define pr_fmt(fmt) "CPU cooling: " fmt
> > 
> > I think you can use the dev_***() routines instead, as you can
> > directly the CPU device from anywhere.
> 
> Can we postpone this change for later ? All the file is using pr_*
> (cpufreq_cooling included). There is only one place where dev_err is
> used but it is removed by the patch 3/7.

okay.

> >> +  while (1) {
> >> +  s64 next_wakeup;
> >> +
> >> +  prepare_to_wait(>waitq, , TASK_INTERRUPTIBLE);
> >> +
> >> +  schedule();
> >> +
> >> +  atomic_inc(_cdev->count);
> >> +
> >> +  play_idle(idle_cdev->idle_cycle / USEC_PER_MSEC);
> >> +
> >> +  /*
> >> +   * The last CPU waking up is in charge of setting the
> >> +   * timer. If the CPU is hotplugged, the timer will
> >> +   * move to another CPU (which may not belong to the
> >> +   * same cluster) but that is not a problem as the
> >> +   * timer will be set again by another CPU belonging to
> >> +   * the cluster, so this mechanism is self adaptive and
> >> +   * does not require any hotplugging dance.
> >> +   */
> > 
> > Well this depends on how CPU hotplug really happens. What happens to
> > the per-cpu-tasks which are in the middle of something when hotplug
> > happens?  Does hotplug wait for those per-cpu-tasks to finish ?

Missed this one ?

> >> +int cpuidle_cooling_register(void)
> >> +{
> >> +  struct cpuidle_cooling_device *idle_cdev = NULL;
> >> +  struct thermal_cooling_device *cdev;
> >> +  struct cpuidle_cooling_tsk *cct;
> >> +  struct task_struct *tsk;
> >> +  struct device_node *np;
> >> +  cpumask_t *cpumask;
> >> +  char dev_name[THERMAL_NAME_LENGTH];
> >> +  int ret = -ENOMEM, cpu;
> >> +  int index = 0;
> >> +
> >> +  for_each_possible_cpu(cpu) {
> >> +  cpumask = topology_core_cpumask(cpu);
> >> +
> >> +  cct = per_cpu_ptr(_cooling_tsk, cpu);
> >> +
> >> +  /*
> >> +   * This condition makes the first cpu belonging to the
> >> +   * cluster to create a cooling device and allocates
> >> +   * the structure. Others CPUs belonging to the same
> >> +   * cluster will just increment the refcount on the
> >> +   * cooling device structure and initialize it.
> >> +   */
> >> +  if (cpu == cpumask_first(cpumask)) {
> > 
> > Your function still have few assumptions of cpu numbering and it will
> > break in few cases. What if the CPUs on a big Little system (4x4) are
> > present in this order: B L L L L B B B  ??
> > 
> > This configuration can happen if CPUs in DT are marked as: 0-3 LITTLE,
> > 4-7 big and a big CPU is used by the boot loader to bring up Linux.
> 
> Ok, how can I sort it out ?

I would do something like this:

cpumask_copy(possible, cpu_possible_mask);

while (!cpumask_empty(possible)) {
first = cpumask_first(possible);
cpumask = topology_core_cpumask(first);
cpumask_andnot(possible, possible, cpumask);

allocate_cooling_dev(first); //This is most of this function in 
your patch.

while (!cpumask_empty(cpumask)) {
temp = cpumask_first(possible);
//rest init "temp"
cpumask_clear_cpu(temp, cpumask);
}

//Everything done, register cooling device for cpumask.
}

> >> +  np = of_cpu_device_node_get(cpu);
> >> +
> >> +  idle_cdev = kzalloc(sizeof(*idle_cdev), GFP_KERNEL);
> >> +  if (!idle_cdev)
> >> +  goto out_fail;
> >> +
> >> +  idle_cdev->idle_cycle = DEFAULT_IDLE_TIME_US;
> >> +
> >> +  atomic_set(_cdev->count, 0);
> > 
> > This should already be 0, isn't it ?
> 
> Yes.

I read it as, "I will drop it" :)

-- 
viresh


Re: [PATCH] linux-next: SLIMbus: doc: Fix a warning "Title underline too short"

2018-02-25 Thread Randy Dunlap
On 02/19/2018 06:32 AM, Matthew Wilcox wrote:
> On Mon, Feb 19, 2018 at 10:55:50PM +0900, Masanari Iida wrote:
>>  Driver and Controller APIs:
>> ---
>> +---
>>  .. kernel-doc:: include/linux/slimbus.h
> 
> Is this the right fix?  Shouldn't we rather delete the : instead?
> We don't normally have a colon at the end of subsection titles.

Agreed -- no ending colon.

-- 
~Randy


Re: [PATCH] linux-next: SLIMbus: doc: Fix a warning "Title underline too short"

2018-02-25 Thread Randy Dunlap
On 02/19/2018 06:32 AM, Matthew Wilcox wrote:
> On Mon, Feb 19, 2018 at 10:55:50PM +0900, Masanari Iida wrote:
>>  Driver and Controller APIs:
>> ---
>> +---
>>  .. kernel-doc:: include/linux/slimbus.h
> 
> Is this the right fix?  Shouldn't we rather delete the : instead?
> We don't normally have a colon at the end of subsection titles.

Agreed -- no ending colon.

-- 
~Randy


  1   2   3   4   5   6   7   >