date:20190129

On Mon, Jan 28, 2019 at 02:31:03PM +0100, Jan Kara wrote:
> On Thu 24-01-19 20:19:57, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Jan Kara 
> > 
> > commit 967d1dc144b50ad005e5eecdfadfbcfb3996 upstream.
> > 
> > __loop_release() has a single call site. Fold it there. This is
> > currently not a huge win but it will make following replacement of
> > loop_index_mutex more obvious.
> > 
> > Signed-off-by: Jan Kara 
> > Signed-off-by: Jens Axboe 
> > Signed-off-by: Greg Kroah-Hartman 
> > 
> 
> Hello Greg!
> 
> This and the following two (patches 69 & 70) loop patches are just
> preparatory cleanups for commits 0da03cab87e632 "loop: Fix deadlock when
> calling blkdev_reread_part()" and 1dded9acf6dc9a "loop: Avoid circular
> locking dependency between loop_ctl_mutex and bd_mutex". As such they don't
> fix anything and it doesn't make sense to carry them in stable unless
> someone backports also the other patches in the series including the fixes
> themselves (which honestly I don't think is worth it for stable).

Ah, you are right, sorry about that.  I was backporting the loop fixes
and these ended up working on 4.4.y, but as you say, were not needed
unless the later patches also showed up.  I'll go revert them now,
thanks.

greg k-h

Re: [PATCH] cpufreq: Auto-register the driver as a thermal cooling device if asked

2019-01-29 Thread Daniel Lezcano

On 30/01/2019 06:22, Amit Kucheria wrote:
> All cpufreq drivers do similar things to register as a cooling device.
> Provide a cpufreq driver flag so drivers can just ask the cpufreq core
> to register the cooling device on their behalf. This allows us to get
> rid of duplicated code in the drivers.
> 
> In order to allow this, we add a struct thermal_cooling_device pointer
> to struct cpufreq_policy so that drivers don't need to store it in a
> private data structure.
> 
> Suggested-by: Stephen Boyd 
> Suggested-by: Viresh Kumar 
> Signed-off-by: Amit Kucheria 
> Reviewed-by: Matthias Kaehlcke 
> Tested-by: Matthias Kaehlcke 
> Acked-by: Viresh Kumar 

Reviewed-by: Daniel Lezcano 

[ ... ]

-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [EXT] [PATCH] lightnvm: pblk: extend line wp balance check

2019-01-29 Thread Hans Holmberg

On Tue, Jan 29, 2019 at 11:10 PM Zhoujie Wu  wrote:
>
> Sorry that my Linux email client has configuration issue and can't reply 
> email. Used my outlook to reply as plain text and hope that I won't corrupt 
> the format.
> Tested on my board and it works well. Since this is a good fix, I think you 
> don't need to do it based on my previous v3 patch and can directly apply 
> yours:)
>
> Tested-by: Zhoujie Wu 

Thanks Zhoujie!

I'll squash the two patches into a V2, carrying over the tested-by and
credit you as Reported-by if you don't mind.


>
>
> -Original Message-
> From: h...@owltronix.com 
> Sent: Tuesday, January 29, 2019 12:48 AM
> To: Matias Bjorling 
> Cc: jav...@javigon.com; Zhoujie Wu ; 
> linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; Hans Holmberg 
> 
> Subject: [EXT] [PATCH] lightnvm: pblk: extend line wp balance check
>
> External Email
>
> --
> From: Hans Holmberg 
>
> pblk stripes writes of minimal write size across all non-offline chunks in a 
> line, which means that the maximum write pointer delta should not exceed the 
> minimal write size. Extend the line write pointer balance check to cover this 
> case.
>
> Signed-off-by: Hans Holmberg 
> ---
>
> This patch applies on top of Zhoujie's V3 of
> "lightnvm: pblk: ignore bad block wp for pblk_line_wp_is_unbalanced
>
>  drivers/lightnvm/pblk-recovery.c | 60 
>  1 file changed, 37 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-recovery.c 
> b/drivers/lightnvm/pblk-recovery.c
> index 02d466e6925e..d86f580036d3 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -302,41 +302,55 @@ static int pblk_pad_distance(struct pblk *pblk, struct 
> pblk_line *line)
> return (distance > line->left_msecs) ? line->left_msecs : distance;  }
>
> -static int pblk_line_wp_is_unbalanced(struct pblk *pblk,
> - struct pblk_line *line)
> +/* Return a chunk belonging to a line by stripe(write order) index */
> +static struct nvm_chk_meta *pblk_get_stripe_chunk(struct pblk *pblk,
> + struct pblk_line *line,
> + int index)
>  {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = >geo;
> -   struct pblk_line_meta *lm = >lm;
> struct pblk_lun *rlun;
> -   struct nvm_chk_meta *chunk;
> struct ppa_addr ppa;
> -   u64 line_wp;
> -   int pos, i, bit;
> +   int pos;
>
> -   bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
> -   if (bit >= lm->blk_per_line)
> -   return 0;
> -   rlun = >luns[bit];
> +   rlun = >luns[index];
> ppa = rlun->bppa;
> pos = pblk_ppa_to_pos(geo, ppa);
> -   chunk = >chks[pos];
>
> -   line_wp = chunk->wp;
> +   return >chks[pos];
> +}
>
> -   for (i = bit + 1; i < lm->blk_per_line; i++) {
> -   rlun = >luns[i];
> -   ppa = rlun->bppa;
> -   pos = pblk_ppa_to_pos(geo, ppa);
> -   chunk = >chks[pos];
> +static int pblk_line_wps_are_unbalanced(struct pblk *pblk,
> + struct pblk_line *line)
> +{
> +   struct pblk_line_meta *lm = >lm;
> +   int blk_in_line = lm->blk_per_line;
> +   struct nvm_chk_meta *chunk;
> +   u64 max_wp, min_wp;
> +   int i;
>
> -   if (chunk->state & NVM_CHK_ST_OFFLINE)
> -   continue;
> +   i = find_first_zero_bit(line->blk_bitmap, blk_in_line);
> +
> +   /* If there is one or zero good chunks in the line,
> +* the write pointers can't be unbalanced.
> +*/
> +   if (i >= (blk_in_line - 1))
> +   return 0;
>
> -   if (chunk->wp > line_wp)
> +   chunk = pblk_get_stripe_chunk(pblk, line, i);
> +   max_wp = chunk->wp;
> +   if (max_wp > pblk->max_write_pgs)
> +   min_wp = max_wp - pblk->max_write_pgs;
> +   else
> +   min_wp = 0;
> +
> +   i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1);
> +   while (i < blk_in_line) {
> +   chunk = pblk_get_stripe_chunk(pblk, line, i);
> +   if (chunk->wp > max_wp || chunk->wp < min_wp)
> return 1;
> -   else if (chunk->wp < line_wp)
> -   line_wp = chunk->wp;
> +
> +   i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1);
> }
>
> return 0;
> @@ -362,7 +376,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct 
> pblk_line *line,
> int ret;
> u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;
>
> -   if (pblk_line_wp_is_unbalanced(pblk, line))
> +   if (pblk_line_wps_are_unbalanced(pblk, line))
> pblk_warn(pblk,

[PATCH v3] async: Add cmdline option to specify drivers to be async probed

2019-01-29 Thread Feng Tang

Asynchronous driver probing can help much on kernel fastboot, and
this option can provide a flexible way to optimize and quickly verify
async driver probe.

Also it will help in below cases:
* Some driver actually covers several families of HWs, some of which
  could use async probing while others don't. So we can't simply
  turn on the PROBE_PREFER_ASYNCHRONOUS flag in driver, but use this
  cmdline option, like igb driver async patch discussed at
  https://www.spinics.net/lists/netdev/msg545986.html

* For SOC (System on Chip) with multiple spi or i2c controllers, most
  of the slave spi/i2c devices will be assigned with fixed controller
  number, while async probing may make those controllers get different
  index for each boot, which prevents those controller drivers to be
  async probed. For platforms not using these spi/i2c slave devices,
  they can use this cmdline option to benefit from the async probing.

Suggested-by: Alexander Duyck 
Signed-off-by: Feng Tang 
---
Changelog:
  v3: 
 move the cmdline_requested_async_probing() into "default:"
 part to enforce the PROBE_FORCE_SYNCHRONOUS check, as suggested
 by Alexander Duyck

  v2:
* change Alexander Duyck's email to alexander.h.du...@linux.intel.com
* fix the parameter for strlcpy()

 drivers/base/dd.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 8ac10af..e99d781 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -57,6 +57,10 @@ static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
 static struct dentry *deferred_devices;
 static bool initcalls_done;
 
+/* Save the async probe drivers' name from kernel cmdline */
+#define ASYNC_DRV_NAMES_MAX_LEN256
+static char async_probe_drv_names[ASYNC_DRV_NAMES_MAX_LEN];
+
 /*
  * In some cases, like suspend to RAM or hibernation, It might be reasonable
  * to prohibit probing of devices as it could be unsafe.
@@ -674,6 +678,22 @@ int driver_probe_device(struct device_driver *drv, struct 
device *dev)
return ret;
 }
 
+static inline bool cmdline_requested_async_probing(const char *drv_name)
+{
+   return parse_option_str(async_probe_drv_names, drv_name);
+}
+
+/* The format is like driver_async_probe=drv_name1,drv_name2,drv_name3 */
+static int __init save_async_options(char *buf)
+{
+   if (strlen(buf) >= ASYNC_DRV_NAMES_MAX_LEN)
+   printk(KERN_WARNING "Too long list for 
async_probe_drv_names!\n");
+
+   strlcpy(async_probe_drv_names, buf, ASYNC_DRV_NAMES_MAX_LEN);
+   return 0;
+}
+__setup("driver_async_probe=", save_async_options);
+
 bool driver_allows_async_probing(struct device_driver *drv)
 {
switch (drv->probe_type) {
@@ -684,6 +704,9 @@ bool driver_allows_async_probing(struct device_driver *drv)
return false;
 
default:
+   if (cmdline_requested_async_probing(drv->name))
+   return true;
+
if (module_requested_async_probing(drv->owner))
return true;
 
-- 
2.7.4

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Tomasz Figa

On Wed, Jan 30, 2019 at 3:28 PM Ayaka  wrote:
>
>
>
> Sent from my iPad
>
> > On Jan 30, 2019, at 11:35 AM, Tomasz Figa  wrote:
> >
> > On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
> >  wrote:
> >>
> >>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
> >>> wrote:
> >>>
>  Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>  On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>   wrote:
> > Hi,
> >
> >> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> >> Sent from my iPad
> >>
> >>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> >>>  wrote:
> >>>
> >>> Hi,
> >>>
>  On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>  I forget a important thing, for the rkvdec and rk hevc decoder, it 
>  would
>  requests cabac table, scaling list, picture parameter set and 
>  reference
>  picture storing in one or various of DMA buffers. I am not talking 
>  about
>  the data been parsed, the decoder would requests a raw data.
> 
>  For the pps and rps, it is possible to reuse the slice header, just 
>  let
>  the decoder know the offset from the bitstream bufer, I would 
>  suggest to
>  add three properties(with sps) for them. But I think we need a 
>  method to
>  mark a OUTPUT side buffer for those aux data.
> >>>
> >>> I'm quite confused about the hardware implementation then. From what
> >>> you're saying, it seems that it takes the raw bitstream elements 
> >>> rather
> >>> than parsed elements. Is it really a stateless implementation?
> >>>
> >>> The stateless implementation was designed with the idea that only the
> >>> raw slice data should be passed in bitstream form to the decoder. For
> >>> H.264, it seems that some decoders also need the slice header in raw
> >>> bitstream form (because they take the full slice NAL unit), see the
> >>> discussions in this thread:
> >>> media: docs-rst: Document m2m stateless video decoder interface
> >>
> >> Stateless just mean it won’t track the previous result, but I don’t
> >> think you can define what a date the hardware would need. Even you
> >> just build a dpb for the decoder, it is still stateless, but parsing
> >> less or more data from the bitstream doesn’t stop a decoder become a
> >> stateless decoder.
> >
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoder implementations are
> > implemented (VAAPI/VDPAU/NVDEC, etc).
> >
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> >
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
> 
>  Is reconstructing the bitstream so bad? The v4l2 controls provide a
>  generic interface to an encoded format which the driver needs to
>  convert into a sequence that the hardware can understand. Typically
>  this is done by populating hardware-specific structures. Can't we
>  consider that in this specific instance, the hardware-specific
>  structure just happens to be identical to the original bitstream
>  format?
> >>>
> >>> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
> >>> would be really really bad. In GStreamer project we have discussed for
> >>> a while (but have never done anything about) adding the ability through
> >>> a bitmask to select which part of the stream need to be parsed, as
> >>> parsing itself was causing some overhead. Maybe similar thing applies,
> >>> though as per our new design, it's the fourcc that dictate the driver
> >>> behaviour, we'd need yet another fourcc for drivers that wants the full
> >>> bitstream (which seems odd if you have already parsed everything, I
> >>> think this need some clarification).
> >>
> >> Note that I am not proposing to rebuild the *entire* bitstream
> >> in-kernel. What I am saying is that if the hardware interprets some
> >> structures (like SPS/PPS) in their raw format, this raw format could
> >> be reconstructed from the structures passed by userspace at negligible
> >> cost. Such manipulation would only happen on a small amount of data.
> >>
> >> Exposing finer-grained driver

Re: [PATCH v7 1/2] spi: Add Renesas R-Car Gen3 RPC-IF SPI controller driver

2019-01-29 Thread Marek Vasut

On 1/30/19 3:22 AM, masonccy...@mxic.com.tw wrote:
> Hi Marek,

Hi,

>> "Marek Vasut" 
>> 2019/01/29 下午 12:45
>>
>> To
>>
>> masonccy...@mxic.com.tw,
>>
>> cc
>>
>> bbrezil...@kernel.org, broo...@kernel.org, "Geert Uytterhoeven"
>> , "Simon Horman" ,
>> julie...@mxic.com.tw, linux-kernel@vger.kernel.org, linux-renesas-
>> s...@vger.kernel.org, linux-...@vger.kernel.org,
>> sergei.shtyl...@cogentembedded.com, zhengxu...@mxic.com.tw
>>
>> Subject
>>
>> Re: [PATCH v7 1/2] spi: Add Renesas R-Car Gen3 RPC-IF SPI controller
> driver
>>
>> On 1/29/19 3:26 AM, masonccy...@mxic.com.tw wrote:
>> > Hi Marek,
>>
>> Hi,
>>
>> >> >> "Marek Vasut" 
>> >> >> >> >> > +module_platform_driver(rpc_spi_driver);
>> >> >> >> >>
>> >> >> >> >> RPC is not a SPI controller, it's a SPI and HF controller.
>> >> >> >> >>
>> >> >> >> >> Also, how difficult will it be to add the HF support ?
>> >> >> >> >
>> >> >> >> > One of my customers needs RPC SPI driver for our company's
>> >> >> >> > Octal-Flash,MX25UW51245G.
>> >> >> >> > We don't have HF product and hope you could understanding.
>> >> >> >>
>> >> >> >> I am worried that when we need to add RPC HF support (which is
>> > what all
>> >> >> >> boards but the D3 Draak use), we will have to rewrite the entire
>> > driver
>> >> >> >> and/or convert it to MFD and that would be a tremendous
>> >> > undertaking. I'd
>> >> >> >> prefer to have the driver ready for the HF addition before it's
>> >> > accepted
>> >> >> >> upstream.
>> >> >> >>
>> >> >> >
>> >> >> > I think maybe your concerned would be happened only if HF driver
>> >> > goes with
>> >> >> > spi-mem layer.
>> >> >> >
>> >> >> > A comment for HF from Daniel Fishman. FYR.
>> >> >> >
>> >> >> > https://www.quora.com/What-is-a-hyper-flash-memory-and-how-is-it-
>> >> >> different-from-normal-flash-memory
>> >> >>
>> >> >> I have a decent idea what HF and SPI NOR are, since I wrote the RPC
>> >> >> driver for both HF and SPI mode for U-Boot (as I mentioned earlier).
>> >> >>
>> >> >> The HF in Linux would use the CFI NOR part of MTD framework. My
> concern
>> >> >> is that when we need to add HF support into this driver, this driver
>> >> >> will have to be basically rewritten, since the architecture
> won't allow
>> >> >> for that. I'd like to avoid that, since the majority of Gen3 boards,
>> >> >> expect for the D3 Draak, use RPC in HF mode.
>> >> >
>> >> > FYI~
>> >> >
>> >> > MX25UW51245g(64MByte Octa)                      S26KL512S(64MByte HF)
>> >> >    8 IO                                                  8 IO
>> >> > 200MHz DDR@1.8v                                   166MHz DDR@1.8v
>> >> >
>> >> > support Read-while-write                       Not support
>> >> > good for OTA,etc
>> >> > powerful application
>> >>
>> >> What does that mean ?
>> >
>> > I have no idea why would you say "since the majority of Gen3 boards use
>> > RPC in HF mode" ?
>>
>> Well, the H3/M3W/M3N S-X(S) and the H3/M3 ULCB and E3 Ebisu all boot
>> from HF. Only the D3 Draak uses QSPI NOR.
> 
> It's understandable because mx25uw51245g is a new product and it has been
> adopted by Renesas’ Automotive Instrument Cluster RH850/D1M1A MCU.

The aforementioned boards are not going away however. There's too many
users to ignore those.

> We also have patched R-Car's BL for booting from Octa-Flash as bellow log:
> 
> NOTICE:  BL2: R-Car D3 Initial Program Loader(CA53) Rev.0.5.1
> NOTICE:  BL2: PRR is R-Car D3 Ver1.0
> NOTICE:  BL2: Boot device is MXIC_OctaFlash
> NOTICE:  BL2: LCM state is CM
> NOTICE:  BL2: DDR3L-1866(rev.0.02)
> NOTICE:  BL2: QoS is default setting(rev.0.07)
> NOTICE:  BL2: v1.3(release):
> NOTICE:  BL2: Built : 09:56:31, Sep 26 2018
> NOTICE:  BL2: Normal boot
> NOTICE:  BL2: dst=0xe63111f0 src=0x818 len=512(0x200)
> NOTICE:  BL2: dst=0x43f0 src=0x8180400 len=6144(0x1800)
> NOTICE:  BL2: dst=0x4400 src=0x81c len=65536(0x1)
> NOTICE:  BL2: dst=0x5000 src=0x864 len=1048576(0x10)

This looks like a very old ATF version. Anyway, please submit those
patches upstream, they should be generic.

>> > So far as I know that HF is provided by Cypress only and
>> > any mass production product use the component which is provided by only
>> > one provider
>> > will be a big risk.
>> >
>> > Compare to HF, there are more provider of SPI/Octa could support the
>> > mass production product
>> > as their second provider.
>> >
>> > In addition, from the technical points of view, mx25uw51245g is more
>> > powerful than HF and
>> > good for complicate user application, i.e., OTA and so on.
>>
>> Did you consider protocol overhead too ? I don't think you can compare
>> them just by raw numbers of pins and bus frequency.
>>
>> Note that over-the-air update (if that's what you mean by OTA) is
>> completely separate from the underlying storage device.
> 
> It's key feature of mx25uw51245g supports Read-while-Write capability
> that allows read access from one memory bank while writing to another
> memory bank.

Note that this sales

[char-misc for v5.0] mei: free read cb on ctrl_wr list flush

2019-01-29 Thread Tomas Winkler

From: Alexander Usyskin 

There is a little window during disconnection flow
when read cb is moved between lists and may be not freed.
Remove moving read cbs explicitly during flash fixes this memory
leak.

Signed-off-by: Alexander Usyskin 
Signed-off-by: Tomas Winkler 
---
 drivers/misc/mei/client.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c
index 1fc8ea0f519b..ca4c9cc218a2 100644
--- a/drivers/misc/mei/client.c
+++ b/drivers/misc/mei/client.c
@@ -401,8 +401,11 @@ static void mei_io_list_flush_cl(struct list_head *head,
struct mei_cl_cb *cb, *next;
 
list_for_each_entry_safe(cb, next, head, list) {
-   if (cl == cb->cl)
+   if (cl == cb->cl) {
list_del_init(>list);
+   if (cb->fop_type == MEI_FOP_READ)
+   mei_io_cb_free(cb);
+   }
}
 }
 
-- 
2.20.1

Re: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE

2019-01-29 Thread Michal Hocko

On Tue 29-01-19 18:40:58, Andrea Arcangeli wrote:
> Hello,
> 
> I'd like to attend the LSF/MM Summit 2019. I'm interested in most MM
> topics and it's enlightening to listen to the common non-MM topics
> too.
> 
> One current topic that could be of interest is the THP / NUMA tradeoff
> in subject.
> 
> One issue about a change in MADV_HUGEPAGE behavior made ~3 years ago
> kept floating around for the last 6 months (~12 months since it was
> initially reported as regression through an enterprise-like workload)
> and it was hot-fixed in commit
> ac5b2c18911ffe95c08d69273917f90212cf5659, but it got quickly reverted
> for various reasons.
> 
> I posted some benchmark results showing that for tasks without strong
> NUMA locality the __GFP_THISNODE logic is not guaranteed to be optimal
> (and here of course I mean even if we ignore the large slowdown with
> swap storms at allocation time that might be caused by
> __GFP_THISNODE). The results also show NUMA remote THPs help
> intrasocket as well as intersocket.
> 
> https://lkml.kernel.org/r/20181210044916.gc24...@redhat.com
> https://lkml.kernel.org/r/20181212104418.ge1...@redhat.com
> 
> The following seems the interim conclusion which I happen to be in
> agreement with Michal and Mel:
> 
> https://lkml.kernel.org/r/20181212095051.go1...@dhcp22.suse.cz
> https://lkml.kernel.org/r/20181212170016.gg1...@redhat.com

I am definitely interested in discussing this topic and actually wanted
to propose it myself. I would add that part of the discussion was
proposing a neww memory policy that would effectively enable per-vma
node-reclaim like behavior.
-- 
Michal Hocko
SUSE Labs

RE: [PATCH v2 2/2] media: v4l: xilinx: Add Xilinx MIPI CSI-2 Rx Subsystem driver

2019-01-29 Thread Vishal Sagar

Hi Hyun,

> -Original Message-
> From: Hyun Kwon [mailto:hyun.k...@xilinx.com]
> Sent: Tuesday, January 29, 2019 12:05 AM
> To: Vishal Sagar 
> Cc: Hyun Kwon ; Vishal Sagar ;
> laurent.pinch...@ideasonboard.com; mche...@kernel.org;
> robh...@kernel.org; mark.rutl...@arm.com; Michal Simek
> ; linux-me...@vger.kernel.org;
> devicet...@vger.kernel.org; sakari.ai...@linux.intel.com;
> hans.verk...@cisco.com; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; Dinesh Kumar ; Sandip Kothari
> 
> Subject: Re: [PATCH v2 2/2] media: v4l: xilinx: Add Xilinx MIPI CSI-2 Rx
> Subsystem driver
> 
> Hi Vishal,
> 
> On Mon, 2019-01-28 at 03:16:49 -0800, Vishal Sagar wrote:
> > Hi Hyun,
> >
> > Thanks for the review.
> >
> > > -Original Message-
> > > From: Hyun Kwon [mailto:hyun.k...@xilinx.com]
> > > Sent: Saturday, January 26, 2019 7:45 AM
> > > To: Vishal Sagar 
> > > Cc: Hyun Kwon ; laurent.pinch...@ideasonboard.com;
> > > mche...@kernel.org; robh...@kernel.org; mark.rutl...@arm.com;
> Michal
> > > Simek ; linux-me...@vger.kernel.org;
> > > devicet...@vger.kernel.org; sakari.ai...@linux.intel.com;
> > > hans.verk...@cisco.com; linux-arm-ker...@lists.infradead.org; linux-
> > > ker...@vger.kernel.org; Dinesh Kumar ; Sandip
> Kothari
> > > 
> > > Subject: Re: [PATCH v2 2/2] media: v4l: xilinx: Add Xilinx MIPI CSI-2 Rx
> > > Subsystem driver
> > >
> > > Hi Vishal,
> > >
> > > Thanks for the patch.
> > >
> > > On Fri, 2019-01-25 at 09:52:57 -0800, Vishal Sagar wrote:
> > > > The Xilinx MIPI CSI-2 Rx Subsystem soft IP is used to capture images
> > > > from MIPI CSI-2 camera sensors and output AXI4-Stream video data ready
> > > > for image processing. Please refer to PG232 for details.
> > > >
> > > > The driver is used to set the number of active lanes, if enabled
> > > > in hardware. The CSI2 Rx controller filters out all packets except for
> > > > the packets with data type fixed in hardware. RAW8 packets are always
> > > > allowed to pass through.
> > > >
> > > > It is also used to setup and handle interrupts and enable the core. It
> > > > logs all the events in respective counters between streaming on and off.
> > > > The generic short packets received are notified to application via
> > > > v4l2_events.
> > > >
> > > > The driver supports only the video format bridge enabled configuration.
> > > > Some data types like YUV 422 10bpc, RAW16, RAW20 are supported
> when
> > > the
> > > > CSI v2.0 feature is enabled in design. When the VCX feature is enabled,
> > > > the maximum number of virtual channels becomes 16 from 4.
> > > >
> > > > Signed-off-by: Vishal Sagar 
> > > > ---
> > > > v2
> > > > - Fixed comments given by Hyun and Sakari.
> > > > - Made all bitmask using BIT() and GENMASK()
> > > > - Removed unused definitions
> > > > - Removed DPHY access. This will be done by separate DPHY PHY driver.
> > > > - Added support for CSI v2.0 for YUV 422 10bpc, RAW16, RAW20 and
> extra
> > > >   virtual channels
> > > > - Fixed the ports as sink and source
> > > > - Now use the v4l2fwnode API to get number of data-lanes
> > > > - Added clock framework support
> > > > - Removed the close() function
> > > > - updated the set format function
> > > > - support only VFB enabled configuration
> > > >
> > > >  drivers/media/platform/xilinx/Kconfig   |   10 +
> > > >  drivers/media/platform/xilinx/Makefile  |1 +
> > > >  drivers/media/platform/xilinx/xilinx-csi2rxss.c | 1609
> > > +++
> > > >  include/uapi/linux/xilinx-v4l2-controls.h   |   14 +
> > > >  include/uapi/linux/xilinx-v4l2-events.h |   28 +
> > > >  5 files changed, 1662 insertions(+)
> > > >  create mode 100644 drivers/media/platform/xilinx/xilinx-csi2rxss.c
> > > >  create mode 100644 include/uapi/linux/xilinx-v4l2-events.h
> > > >
> > > > diff --git a/drivers/media/platform/xilinx/Kconfig
> > > b/drivers/media/platform/xilinx/Kconfig
> > > > index 74ec8aa..30b4a25 100644
> > > > --- a/drivers/media/platform/xilinx/Kconfig
> > > > +++ b/drivers/media/platform/xilinx/Kconfig
> > > > @@ -10,6 +10,16 @@ config VIDEO_XILINX
> > > >
> > > >  if VIDEO_XILINX
> > > >
> > > > +config VIDEO_XILINX_CSI2RXSS
> > > > +   tristate "Xilinx CSI2 Rx Subsystem"
> > > > +   help
> > > > + Driver for Xilinx MIPI CSI2 Rx Subsystem. This is a V4L 
> > > > sub-device
> > > > + based driver that takes input from CSI2 Tx source and converts
> > > > + it into an AXI4-Stream. The subsystem comprises of a CSI2 Rx
> > > > + controller, DPHY, an optional I2C controller and a Video 
> > > > Format
> > > > + Bridge. The driver is used to set the number of active lanes 
> > > > and
> > > > + get short packet data.
> > > > +
> > > >  config VIDEO_XILINX_TPG
> > > > tristate "Xilinx Video Test Pattern Generator"
> > > > depends on VIDEO_XILINX
> > > > diff --git a/drivers/media/platform/xilinx/Makefile
> > >

Re: [PATCH v3] async: Add cmdline option to specify drivers to be async probed

On Wed, Jan 30, 2019 at 02:38:07PM +0800, Feng Tang wrote:
> Asynchronous driver probing can help much on kernel fastboot, and
> this option can provide a flexible way to optimize and quickly verify
> async driver probe.
> 
> Also it will help in below cases:
> * Some driver actually covers several families of HWs, some of which
>   could use async probing while others don't. So we can't simply
>   turn on the PROBE_PREFER_ASYNCHRONOUS flag in driver, but use this
>   cmdline option, like igb driver async patch discussed at
>   https://www.spinics.net/lists/netdev/msg545986.html
> 
> * For SOC (System on Chip) with multiple spi or i2c controllers, most
>   of the slave spi/i2c devices will be assigned with fixed controller
>   number, while async probing may make those controllers get different
>   index for each boot, which prevents those controller drivers to be
>   async probed. For platforms not using these spi/i2c slave devices,
>   they can use this cmdline option to benefit from the async probing.
> 
> Suggested-by: Alexander Duyck 
> Signed-off-by: Feng Tang 
> ---
>  drivers/base/dd.c | 23 +++
>  1 file changed, 23 insertions(+)

This is v3 and yet no change information below the --- line saying what
is different from previous versions at all?

Not good, please fix.

greg k-h

Re: [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only

2019-01-29 Thread kbuild test robot

Hi Callum,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v5.0-rc4 next-20190129]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Callum-Sinclair/ipmr-ip6mr-Create-new-sockopt-to-clear-mfc-cache-only/20190130-104146
config: i386-defconfig (attached as .config)
compiler: gcc-8 (Debian 8.2.0-14) 8.2.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/ipv4/ipmr.c: In function 'mroute_clean_cache':
>> net/ipv4/ipmr.c:1312:3: error: 'cache' undeclared (first use in this 
>> function); did you mean 'hh_cache'?
  cache = (struct mfc_cache *)c;
  ^
  hh_cache
   net/ipv4/ipmr.c:1312:3: note: each undeclared identifier is reported only 
once for each function it appears in
>> net/ipv4/ipmr.c:1313:33: error: 'net' undeclared (first use in this function)
  call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
^~~
   net/ipv4/ipmr.c: In function 'mroute_clean_tables':
   net/ipv4/ipmr.c:1334:14: warning: unused variable 'net' [-Wunused-variable]
 struct net *net = read_pnet(>net);
 ^~~

vim +1312 net/ipv4/ipmr.c

^1da177e4 Linus Torvalds  2005-04-16  1300  
7ba7b80d1 Callum Sinclair 2019-01-30  1301  /* Clear the vif tables */
7ba7b80d1 Callum Sinclair 2019-01-30  1302  static void 
mroute_clean_cache(struct mr_table *mrt, bool all)
^1da177e4 Linus Torvalds  2005-04-16  1303  {
494fff563 Yuval Mintz 2018-02-28  1304  struct mr_mfc *c, *tmp;
^1da177e4 Linus Torvalds  2005-04-16  1305  
a8cb16dd9 Eric Dumazet2010-10-01  1306  /* Wipe the cache */
8fb472c09 Nikolay Aleksandrov 2017-01-12  1307  
list_for_each_entry_safe(c, tmp, >mfc_cache_list, list) {
0e615e960 Nikolay Aleksandrov 2015-11-20  1308  if (!all && 
(c->mfc_flags & MFC_STATIC))
^1da177e4 Linus Torvalds  2005-04-16  1309  
continue;
8fb472c09 Nikolay Aleksandrov 2017-01-12  1310  
rhltable_remove(>mfc_hash, >mnode, ipmr_rht_params);
a8c9486b8 Eric Dumazet2010-10-01  1311  
list_del_rcu(>list);
494fff563 Yuval Mintz 2018-02-28 @1312  cache = (struct 
mfc_cache *)c;
494fff563 Yuval Mintz 2018-02-28 @1313  
call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
b362053a7 Yotam Gigi  2017-09-27  1314  
  mrt->id);
494fff563 Yuval Mintz 2018-02-28  1315  
mroute_netlink_event(mrt, cache, RTM_DELROUTE);
8c13af2a2 Yuval Mintz 2018-03-26  1316  mr_cache_put(c);
^1da177e4 Linus Torvalds  2005-04-16  1317  }
^1da177e4 Linus Torvalds  2005-04-16  1318  
0c12295a7 Patrick McHardy 2010-04-13  1319  if 
(atomic_read(>cache_resolve_queue_len) != 0) {
^1da177e4 Linus Torvalds  2005-04-16  1320  
spin_lock_bh(_unres_lock);
8fb472c09 Nikolay Aleksandrov 2017-01-12  1321  
list_for_each_entry_safe(c, tmp, >mfc_unres_queue, list) {
862465f2e Patrick McHardy 2010-04-13  1322  
list_del(>list);
494fff563 Yuval Mintz 2018-02-28  1323  cache = 
(struct mfc_cache *)c;
494fff563 Yuval Mintz 2018-02-28  1324  
mroute_netlink_event(mrt, cache, RTM_DELROUTE);
494fff563 Yuval Mintz 2018-02-28  1325  
ipmr_destroy_unres(mrt, cache);
^1da177e4 Linus Torvalds  2005-04-16  1326  }
^1da177e4 Linus Torvalds  2005-04-16  1327  
spin_unlock_bh(_unres_lock);
^1da177e4 Linus Torvalds  2005-04-16  1328  }
^1da177e4 Linus Torvalds  2005-04-16  1329  }
^1da177e4 Linus Torvalds  2005-04-16  1330  

:: The code at line 1312 was first introduced by commit
:: 494fff56379c4ad5b8fe36a5b7ffede4044ca7bb ipmr, ip6mr: Make mfc_cache a 
common structure

:: TO: Yuval Mintz 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [v3 PATCH] mm: ksm: do not block on page lock when searching stable tree

2019-01-29 Thread John Hubbard


On 1/29/19 12:29 PM, Yang Shi wrote:

ksmd need search stable tree to look for the suitable KSM page, but the
KSM page might be locked for a while due to i.e. KSM page rmap walk.
Basically it is not a big deal since commit 2c653d0ee2ae
("ksm: introduce ksm_max_page_sharing per page deduplication limit"),
since max_page_sharing limits the number of shared KSM pages.

But it still sounds not worth waiting for the lock, the page can be skip,
then try to merge it in the next scan to avoid potential stall if its
content is still intact.

Introduce trylock mode to get_ksm_page() to not block on page lock, like
what try_to_merge_one_page() does.  And, define three possible
operations (nolock, lock and trylock) as enum type to avoid stacking up
bools and make the code more readable.

Return -EBUSY if trylock fails, since NULL means not find suitable KSM
page, which is a valid case.

With the default max_page_sharing setting (256), there is almost no
observed change comparing lock vs trylock.

However, with ksm02 of LTP, the reduced ksmd full scan time can be
observed, which has set max_page_sharing to 786432.  With lock version,
ksmd may tak 10s - 11s to run two full scans, with trylock version ksmd
may take 8s - 11s to run two full scans.  And, the number of
pages_sharing and pages_to_scan keep same.  Basically, this change has
no harm >
Cc: Hugh Dickins 
Cc: Andrea Arcangeli 
Suggested-by: John Hubbard 
Reviewed-by: Kirill Tkhai 
Signed-off-by: Yang Shi 
---
Hi folks,

This patch was with "mm: vmscan: skip KSM page in direct reclaim if priority
is low" in the initial submission.  Then Hugh and Andrea pointed out commit
2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication
limit") is good enough for limiting the number of shared KSM page to prevent
from softlock when walking ksm page rmap.  This commit does solve the problem.
So, the series was dropped by Andrew from -mm tree.

However, I thought the second patch (this one) still sounds useful.  So, I did
some test and resubmit it.  The first version was reviewed by Krill Tkhai, so
I keep his Reviewed-by tag since there is no change to the patch except the
commit log.

So, would you please reconsider this patch?

v3: Use enum to define get_ksm_page operations (nolock, lock and trylock) per
 John Hubbard
v2: Updated the commit log to reflect some test result and latest discussion

  mm/ksm.c | 46 --
  1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6c48ad1..5647bc1 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -667,6 +667,12 @@ static void remove_node_from_stable_tree(struct 
stable_node *stable_node)
free_stable_node(stable_node);
  }
  
+enum get_ksm_page_flags {

+   GET_KSM_PAGE_NOLOCK,
+   GET_KSM_PAGE_LOCK,
+   GET_KSM_PAGE_TRYLOCK
+};
+
  /*
   * get_ksm_page: checks if the page indicated by the stable node
   * is still its ksm page, despite having held no reference to it.
@@ -686,7 +692,8 @@ static void remove_node_from_stable_tree(struct stable_node 
*stable_node)
   * a page to put something that might look like our key in page->mapping.
   * is on its way to being freed; but it is an anomaly to bear in mind.
   */
-static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
+static struct page *get_ksm_page(struct stable_node *stable_node,
+enum get_ksm_page_flags flags)
  {
struct page *page;
void *expected_mapping;
@@ -728,8 +735,15 @@ static struct page *get_ksm_page(struct stable_node 
*stable_node, bool lock_it)
goto stale;
}
  
-	if (lock_it) {

+   if (flags == GET_KSM_PAGE_TRYLOCK) {
+   if (!trylock_page(page)) {
+   put_page(page);
+   return ERR_PTR(-EBUSY);
+   }
+   } else if (flags == GET_KSM_PAGE_LOCK)
lock_page(page);
+
+   if (flags != GET_KSM_PAGE_NOLOCK) {
if (READ_ONCE(page->mapping) != expected_mapping) {
unlock_page(page);
put_page(page);
@@ -763,7 +777,7 @@ static void remove_rmap_item_from_tree(struct rmap_item 
*rmap_item)
struct page *page;
  
  		stable_node = rmap_item->head;

-   page = get_ksm_page(stable_node, true);
+   page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
if (!page)
goto out;
  
@@ -863,7 +877,7 @@ static int remove_stable_node(struct stable_node *stable_node)

struct page *page;
int err;
  
-	page = get_ksm_page(stable_node, true);

+   page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
if (!page) {
/*
 * get_ksm_page did remove_node_from_stable_tree itself.
@@ -1385,7 +1399,7 @@ static struct page *stable_node_dup(struct stable_node 
**_stable_node_dup,
 * stable_node parameter

Re: [PATCH] usb: xhci: remove unused member 'parent' in xhci_regset struct

On Wed, Jan 30, 2019 at 10:12:21AM +0800, Chunfeng Yun wrote:
> The member @parent of xhci_regset struct is not used in fact,
> so remove it
> 
> Change-Id: Ic6727c28f7200782fe4516bcb41c789b427318a2

No need for this line :(

[PATCH 2/2] regulator: uniphier: Constify uniphier_regulator_ops

2019-01-29 Thread Axel Lin

Signed-off-by: Axel Lin 
---
 drivers/regulator/uniphier-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/uniphier-regulator.c 
b/drivers/regulator/uniphier-regulator.c
index 6ba0ae405f2b..9026d5a3e964 100644
--- a/drivers/regulator/uniphier-regulator.c
+++ b/drivers/regulator/uniphier-regulator.c
@@ -32,7 +32,7 @@ struct uniphier_regulator_priv {
const struct uniphier_regulator_soc_data *data;
 };
 
-static struct regulator_ops uniphier_regulator_ops = {
+static const struct regulator_ops uniphier_regulator_ops = {
.enable = regulator_enable_regmap,
.disable= regulator_disable_regmap,
.is_enabled = regulator_is_enabled_regmap,
-- 
2.17.1

[PATCH 1/2] regulator: uniphier: Fix probe error handling

2019-01-29 Thread Axel Lin

Ensure unwind all resources if probe fails.

Signed-off-by: Axel Lin 
---
 drivers/regulator/uniphier-regulator.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/uniphier-regulator.c 
b/drivers/regulator/uniphier-regulator.c
index abf22acbd13e..6ba0ae405f2b 100644
--- a/drivers/regulator/uniphier-regulator.c
+++ b/drivers/regulator/uniphier-regulator.c
@@ -87,8 +87,10 @@ static int uniphier_regulator_probe(struct platform_device 
*pdev)
}
 
regmap = devm_regmap_init_mmio(dev, base, priv->data->regconf);
-   if (IS_ERR(regmap))
-   return PTR_ERR(regmap);
+   if (IS_ERR(regmap)) {
+   ret = PTR_ERR(regmap);
+   goto out_rst_assert;
+   }
 
config.dev = dev;
config.driver_data = priv;
-- 
2.17.1

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Ayaka




Sent from my iPad

> On Jan 30, 2019, at 5:41 AM, Nicolas Dufresne  wrote:
> 
>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>>  wrote:
>>> Hi,
>>> 
 On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
 Sent from my iPad
 
> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>> I forget a important thing, for the rkvdec and rk hevc decoder, it would
>> requests cabac table, scaling list, picture parameter set and reference
>> picture storing in one or various of DMA buffers. I am not talking about
>> the data been parsed, the decoder would requests a raw data.
>> 
>> For the pps and rps, it is possible to reuse the slice header, just let
>> the decoder know the offset from the bitstream bufer, I would suggest to
>> add three properties(with sps) for them. But I think we need a method to
>> mark a OUTPUT side buffer for those aux data.
> 
> I'm quite confused about the hardware implementation then. From what
> you're saying, it seems that it takes the raw bitstream elements rather
> than parsed elements. Is it really a stateless implementation?
> 
> The stateless implementation was designed with the idea that only the
> raw slice data should be passed in bitstream form to the decoder. For
> H.264, it seems that some decoders also need the slice header in raw
> bitstream form (because they take the full slice NAL unit), see the
> discussions in this thread:
> media: docs-rst: Document m2m stateless video decoder interface
 
 Stateless just mean it won’t track the previous result, but I don’t
 think you can define what a date the hardware would need. Even you
 just build a dpb for the decoder, it is still stateless, but parsing
 less or more data from the bitstream doesn’t stop a decoder become a
 stateless decoder.
>>> 
>>> Yes fair enough, the format in which the hardware decoder takes the
>>> bitstream parameters does not make it stateless or stateful per-se.
>>> It's just that stateless decoders should have no particular reason for
>>> parsing the bitstream on their own since the hardware can be designed
>>> with registers for each relevant bitstream element to configure the
>>> decoding pipeline. That's how GPU-based decoder implementations are
>>> implemented (VAAPI/VDPAU/NVDEC, etc).
>>> 
>>> So the format we have agreed on so far for the stateless interface is
>>> to pass parsed elements via v4l2 control structures.
>>> 
>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>> sure what the best solution would be. Reconstructing the bitstream in
>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>> having the data both in parsed and raw forms. Do you see another
>>> possibility?
>> 
>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>> generic interface to an encoded format which the driver needs to
>> convert into a sequence that the hardware can understand. Typically
>> this is done by populating hardware-specific structures. Can't we
>> consider that in this specific instance, the hardware-specific
>> structure just happens to be identical to the original bitstream
>> format?
> 
> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
Lucky, most of hardware won’t be able to processing such a big buffer.
General speaking, the register is 24bits for stream length in bytes.
> would be really really bad. In GStreamer project we have discussed for
> a while (but have never done anything about) adding the ability through
> a bitmask to select which part of the stream need to be parsed, as
> parsing itself was causing some overhead. Maybe similar thing applies,
> though as per our new design, it's the fourcc that dictate the driver
> behaviour, we'd need yet another fourcc for drivers that wants the full
> bitstream (which seems odd if you have already parsed everything, I
> think this need some clarification).
> 
>> 
>> I agree that this is not strictly optimal for that particular
>> hardware, but such is the cost of abstractions, and in this specific
>> case I don't believe the cost would be particularly high?

[RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-01-29 Thread Hanjun Guo

From: Hanjun Guo 

We met an issue that when we update the IORT table to revision D,
and the kernel update to 4.19, the USB on D06 (ARM64 based server)
will probe fail:

[   13.495751] CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 
4.19.0-00115-gb2b5200 #5
[   13.503219] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - 
V1.09.02 12/25/2018
[   13.511645] Workqueue: events work_for_cpu_fn
[   13.515989] pstate: a0c9 (NzCv daif +PAN +UAO)
[   13.520767] pc : dma_pool_alloc+0x218/0x270
[   13.524937] lr : dma_pool_alloc+0xa0/0x270
[   13.529019] sp : 09e23b20
[   13.532320] x29: 09e23b20 x28: 8027c58ad098 
[   13.537619] x27: 1000 x26: 8027d7a790a8 
[   13.542918] x25: 08fa7000 x24: 09e23bc0 
[   13.548216] x23: 006000c0 x22: 8027c58ad010 
[   13.553515] x21: 097e1000 x20: 8027c58ad000 
[   13.558814] x19: 8027c58ad080 x18:  
[   13.564112] x17:  x16: 7fff 
[   13.569411] x15: 097e16c8 x14: 8027c5d39885 
[   13.574709] x13: 8027c5d39884 x12: 0038 
[   13.580008] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 
[   13.585307] x9 :  x8 : 8027c587c400 
[   13.590605] x7 :  x6 : 003f 
[   13.595904] x5 : 8027dc5b8000 x4 : 8027e09b91e0 
[   13.601202] x3 : 008d2280 x2 : 8027c58ad100 
[   13.606501] x1 : 0028 x0 :  
[   13.611800] Call trace:
[   13.614234]  dma_pool_alloc+0x218/0x270
[   13.617710] ata1: SATA link down (SStatus 0 SControl 300)
[   13.618059]  ehci_qh_alloc+0x5c/0xf8
[   13.627002]  ehci_setup+0x17c/0x4b8
[   13.630478]  ehci_pci_setup+0x18c/0x5b8
[   13.634301]  usb_add_hcd+0x290/0x7a0
[   13.637863]  usb_hcd_pci_probe+0x2cc/0x3e8
[   13.641946]  ehci_pci_probe+0x34/0x48
[   13.645596]  local_pci_probe+0x3c/0xb0
[   13.649331]  work_for_cpu_fn+0x18/0x28
[   13.653067]  process_one_work+0x1e4/0x458
[   13.657063]  worker_thread+0x228/0x450
[   13.660798]  kthread+0x12c/0x130
[   13.664014]  ret_from_fork+0x10/0x18
[   13.667577] ---[ end trace 6f8757456e2ec456 ]---

It turns out the the IORT revision D introduce the DMA address
limit size for PCI RC and in commit 5ac65e8c8941 ("ACPI/IORT: Support
address size limit for root complexes"), will set the DMA mask
for the RC and that will be inherited by device under the RC.

D06 only enables 1 RC but has EPs with different DMA address sizes,
for USB it use 32bit DMA, and 64bit for HNS and SAS, so this will
cause probe failure if we use 64bit DMA for USB controllers.

Set the DMA mask to 32bit for PCI based USB controllers,
EHCI and OHCI USB controllers are using 32bit DMA address,
XHCI will set the DMA mask in its probe after the pci probe,
so it's safe just add dma_coerce_mask_and_coherent() in
usb_hcd_pci_probe().

Signed-off-by: Hanjun Guo 
---
Hi all,

This is the RFC version, I'm not sure this is the best solution,
comments are warmly welcomed.

Thanks
Hanjun

 drivers/usb/core/hcd-pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 0343246..a9c33e6 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
if (pci_enable_device(dev) < 0)
return -ENODEV;
 
+   retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
+   if (retval)
+   return retval;
+
/*
 * The xHCI driver has its own irq management
 * make sure irq setup is not touched for xhci in generic hcd code
-- 
1.7.12.4

Re: [PATCH v2] media: docs-rst: Document m2m stateless video decoder interface

2019-01-29 Thread Randy Li




On 1/22/19 2:26 PM, Alexandre Courbot wrote:

Documents the protocol that user-space should follow when
communicating with stateless video decoders.

The stateless video decoding API makes use of the new request and tags
APIs. While it has been implemented with the Cedrus driver so far, it
should probably still be considered staging for a short while.

Signed-off-by: Alexandre Courbot 
---
Changes since v1:

* Use timestamps instead of tags to reference frames,
* Applied Paul's suggestions to not require one frame worth of data per OUTPUT
   buffer

One of the effects of requiring sub-frame units to be submitted per request is
that the stateless decoders are not exactly "stateless" anymore: if a frame is
made of several slices, then the decoder must keep track of the buffer in which
the current frame is being decoded between requests, and all the slices for the
current frame must be submitted before we can consider decoding the next one.

Also if we decide to force clients to submit one slice per request, then doesn't
some of the H.264 controls need to change? For instance, in the current v2
there is still a v4l2_ctrl_h264_decode_param::num_slices member. It is used in
Chromium to specify the number of slices given to the
V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS control, but is apparently ignored by the


No the rkvdec of rockchip need to know how many slices the current 
picture has.


There are two mode for decoder, slice and frame mode. In the most of 
case, it would work in frame mode and at the most of time, there would 
be one slice for a picture. But when thing comes to the real time video 
transmission, it can be various. It is hard to whether user want to 
decode it in slice or frame, both of them having many use case.


And I saw there are some talking about a slice would be assigned with a 
request, I wonder whether it is a good idea.


I know ayaka would be FOSDEM 2019, you can ask him about this.


Cedrus driver. Maxime, can you comment on this?

  Documentation/media/uapi/v4l/dev-codec.rst|   5 +
  .../media/uapi/v4l/dev-stateless-decoder.rst  | 378 ++
  2 files changed, 383 insertions(+)
  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst

diff --git a/Documentation/media/uapi/v4l/dev-codec.rst 
b/Documentation/media/uapi/v4l/dev-codec.rst
index b5e017c17834..6ce38045d3c8 100644
--- a/Documentation/media/uapi/v4l/dev-codec.rst
+++ b/Documentation/media/uapi/v4l/dev-codec.rst
@@ -13,6 +13,11 @@
  Codec Interface
  ***
  
+.. toctree::

+:maxdepth: 1
+
+dev-stateless-decoder
+
  A V4L2 codec can compress, decompress, transform, or otherwise convert
  video data from one format into another format, in memory. Typically
  such devices are memory-to-memory devices (i.e. devices with the
diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst 
b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
new file mode 100644
index ..148b1751dd20
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
@@ -0,0 +1,378 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _stateless_decoder:
+
+**
+Memory-to-memory Stateless Video Decoder Interface
+**
+
+A stateless decoder is a decoder that works without retaining any kind of state
+between processing frames. This means that each frame is decoded independently
+of any previous and future frames, and that the client is responsible for
+maintaining the decoding state and providing it to the decoder with each
+decoding request. This is in contrast to the stateful video decoder interface,
+where the hardware and driver maintain the decoding state and all the client
+has to do is to provide the raw encoded stream.
+
+This section describes how user-space ("the client") is expected to communicate
+with such decoders in order to successfully decode an encoded stream. Compared
+to stateful codecs, the decoder/client sequence is simpler, but the cost of
+this simplicity is extra complexity in the client which must maintain a
+consistent decoding state.
+
+Stateless decoders make use of the request API. A stateless decoder must thus
+expose the ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability on its ``OUTPUT`` 
queue
+when :c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked.
+
+Querying capabilities
+=
+
+1. To enumerate the set of coded formats supported by the decoder, the client
+   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
+
+   * The driver must always return the full set of supported ``OUTPUT`` 
formats,
+ irrespective of the format currently set on the ``CAPTURE`` queue.
+
+   * Simultaneously, the driver must restrain the set of values returned by
+ codec-specific capability controls (such as H.264 profiles) to the set
+ actually supported by the hardware.
+
+2. To enumerate the set of supported raw

Re: [PATCH 4.20 000/117] 4.20.6-stable review

On Tue, Jan 29, 2019 at 07:07:58PM -0700, shuah wrote:
> On 1/29/19 4:34 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.20.6 release.
> > There are 117 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Jan 31 11:31:34 UTC 2019.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.6-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.20.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH v9 1/3] mm: Shuffle initial free memory to improve memory-side-cache utilization

2019-01-29 Thread Mike Rapoport

On Tue, Jan 29, 2019 at 09:02:16PM -0800, Dan Williams wrote:
> Randomization of the page allocator improves the average utilization of
> a direct-mapped memory-side-cache. Memory side caching is a platform
> capability that Linux has been previously exposed to in HPC
> (high-performance computing) environments on specialty platforms. In
> that instance it was a smaller pool of high-bandwidth-memory relative to
> higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
> be found on general purpose server platforms where DRAM is a cache in
> front of higher latency persistent memory [1].

[ ... ]
 
> Cc: Michal Hocko 
> Cc: Dave Hansen 
> Cc: Mike Rapoport 
> Reviewed-by: Kees Cook 
> Signed-off-by: Dan Williams 
> ---
>  include/linux/list.h|   17 
>  include/linux/mmzone.h  |4 +
>  include/linux/shuffle.h |   45 +++
>  init/Kconfig|   23 ++
>  mm/Makefile |7 ++
>  mm/memblock.c   |1 
>  mm/memory_hotplug.c |3 +
>  mm/page_alloc.c |6 +-
>  mm/shuffle.c|  188 
> +++
>  9 files changed, 292 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/shuffle.h
>  create mode 100644 mm/shuffle.c

...

> diff --git a/mm/memblock.c b/mm/memblock.c
> index 022d4cbb3618..c0cfbfae4a03 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Nit: does not seem to be required

>  #include 
>  #include 
>  #include 

-- 
Sincerely yours,
Mike.

Re: [PATCH v4 02/10] arm64: dts: qcom: sdm845: Define rmtfs memory

2019-01-29 Thread Bjorn Andersson

On Tue 29 Jan 15:20 PST 2019, Bjorn Andersson wrote:

> Define the rmtfs memory node, as described in version 10 of the memory
> map.
> 
> Signed-off-by: Bjorn Andersson 
> ---
> 
> Changes since v3:
> - Labeled the node
> 
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index c363848e9001..afaffcc1e835 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -78,6 +78,15 @@
>   no-map;
>   };
>  
> + mpss_efs: memory@85d0 {
> + compatible = "qcom,rmtfs-mem";
> + reg = <0 0x85d0 0 0x20>;

It would be nice if this is how math works, but unfortunately I missed
that this overlaps with the xbl_mem below.

Will fix and resend the series.

Regards,
Bjorn

> + no-map;
> +
> + qcom,client-id = <1>;
> + qcom,vmid = <15>;
> + };
> +
>   xbl_mem: memory@85e0 {
>   reg = <0x0 0x85e0 0 0x10>;
>   no-map;
> -- 
> 2.18.0
>

hi

2019-01-29 Thread Sherri Gallagher

Please get back to me My name is Sgt.Sherri Gallagher,

[PATCH v3] async: Add cmdline option to specify drivers to be async probed

2019-01-29 Thread Feng Tang

Asynchronous driver probing can help much on kernel fastboot, and
this option can provide a flexible way to optimize and quickly verify
async driver probe.

Also it will help in below cases:
* Some driver actually covers several families of HWs, some of which
  could use async probing while others don't. So we can't simply
  turn on the PROBE_PREFER_ASYNCHRONOUS flag in driver, but use this
  cmdline option, like igb driver async patch discussed at
  https://www.spinics.net/lists/netdev/msg545986.html

* For SOC (System on Chip) with multiple spi or i2c controllers, most
  of the slave spi/i2c devices will be assigned with fixed controller
  number, while async probing may make those controllers get different
  index for each boot, which prevents those controller drivers to be
  async probed. For platforms not using these spi/i2c slave devices,
  they can use this cmdline option to benefit from the async probing.

Suggested-by: Alexander Duyck 
Signed-off-by: Feng Tang 
---
 drivers/base/dd.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 8ac10af..e99d781 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -57,6 +57,10 @@ static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
 static struct dentry *deferred_devices;
 static bool initcalls_done;
 
+/* Save the async probe drivers' name from kernel cmdline */
+#define ASYNC_DRV_NAMES_MAX_LEN256
+static char async_probe_drv_names[ASYNC_DRV_NAMES_MAX_LEN];
+
 /*
  * In some cases, like suspend to RAM or hibernation, It might be reasonable
  * to prohibit probing of devices as it could be unsafe.
@@ -674,6 +678,22 @@ int driver_probe_device(struct device_driver *drv, struct 
device *dev)
return ret;
 }
 
+static inline bool cmdline_requested_async_probing(const char *drv_name)
+{
+   return parse_option_str(async_probe_drv_names, drv_name);
+}
+
+/* The format is like driver_async_probe=drv_name1,drv_name2,drv_name3 */
+static int __init save_async_options(char *buf)
+{
+   if (strlen(buf) >= ASYNC_DRV_NAMES_MAX_LEN)
+   printk(KERN_WARNING "Too long list for async_probe_drv_names!");
+
+   strlcpy(async_probe_drv_names, buf, ASYNC_DRV_NAMES_MAX_LEN);
+   return 0;
+}
+__setup("driver_async_probe=", save_async_options);
+
 bool driver_allows_async_probing(struct device_driver *drv)
 {
switch (drv->probe_type) {
@@ -684,6 +704,9 @@ bool driver_allows_async_probing(struct device_driver *drv)
return false;
 
default:
+   if (cmdline_requested_async_probing(drv->name))
+   return true;
+
if (module_requested_async_probing(drv->owner))
return true;
 
-- 
2.7.4

Re: [PATCH] cpufreq: Auto-register the driver as a thermal cooling device if asked

On 30-01-19, 10:52, Amit Kucheria wrote:
> All cpufreq drivers do similar things to register as a cooling device.
> Provide a cpufreq driver flag so drivers can just ask the cpufreq core
> to register the cooling device on their behalf. This allows us to get
> rid of duplicated code in the drivers.
> 
> In order to allow this, we add a struct thermal_cooling_device pointer
> to struct cpufreq_policy so that drivers don't need to store it in a
> private data structure.
> 
> Suggested-by: Stephen Boyd 
> Suggested-by: Viresh Kumar 
> Signed-off-by: Amit Kucheria 
> Reviewed-by: Matthias Kaehlcke 
> Tested-by: Matthias Kaehlcke 
> Acked-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq.c | 11 +++
>  include/linux/cpufreq.h   |  9 +
>  2 files changed, 20 insertions(+)

Thanks for the rework. This looks good now.

-- 
viresh

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Ayaka




Sent from my iPad

> On Jan 30, 2019, at 11:35 AM, Tomasz Figa  wrote:
> 
> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
>  wrote:
>> 
>>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
>>> wrote:
>>> 
 Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
 On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
  wrote:
> Hi,
> 
>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>> Sent from my iPad
>> 
>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>>>  wrote:
>>> 
>>> Hi,
>>> 
 On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
 I forget a important thing, for the rkvdec and rk hevc decoder, it 
 would
 requests cabac table, scaling list, picture parameter set and reference
 picture storing in one or various of DMA buffers. I am not talking 
 about
 the data been parsed, the decoder would requests a raw data.
 
 For the pps and rps, it is possible to reuse the slice header, just let
 the decoder know the offset from the bitstream bufer, I would suggest 
 to
 add three properties(with sps) for them. But I think we need a method 
 to
 mark a OUTPUT side buffer for those aux data.
>>> 
>>> I'm quite confused about the hardware implementation then. From what
>>> you're saying, it seems that it takes the raw bitstream elements rather
>>> than parsed elements. Is it really a stateless implementation?
>>> 
>>> The stateless implementation was designed with the idea that only the
>>> raw slice data should be passed in bitstream form to the decoder. For
>>> H.264, it seems that some decoders also need the slice header in raw
>>> bitstream form (because they take the full slice NAL unit), see the
>>> discussions in this thread:
>>> media: docs-rst: Document m2m stateless video decoder interface
>> 
>> Stateless just mean it won’t track the previous result, but I don’t
>> think you can define what a date the hardware would need. Even you
>> just build a dpb for the decoder, it is still stateless, but parsing
>> less or more data from the bitstream doesn’t stop a decoder become a
>> stateless decoder.
> 
> Yes fair enough, the format in which the hardware decoder takes the
> bitstream parameters does not make it stateless or stateful per-se.
> It's just that stateless decoders should have no particular reason for
> parsing the bitstream on their own since the hardware can be designed
> with registers for each relevant bitstream element to configure the
> decoding pipeline. That's how GPU-based decoder implementations are
> implemented (VAAPI/VDPAU/NVDEC, etc).
> 
> So the format we have agreed on so far for the stateless interface is
> to pass parsed elements via v4l2 control structures.
> 
> If the hardware can only work by parsing the bitstream itself, I'm not
> sure what the best solution would be. Reconstructing the bitstream in
> the kernel is a pretty bad option, but so is parsing in the kernel or
> having the data both in parsed and raw forms. Do you see another
> possibility?
 
 Is reconstructing the bitstream so bad? The v4l2 controls provide a
 generic interface to an encoded format which the driver needs to
 convert into a sequence that the hardware can understand. Typically
 this is done by populating hardware-specific structures. Can't we
 consider that in this specific instance, the hardware-specific
 structure just happens to be identical to the original bitstream
 format?
>>> 
>>> At maximum allowed bitrate for let's say HEVC (940MB/s iirc), yes, it
>>> would be really really bad. In GStreamer project we have discussed for
>>> a while (but have never done anything about) adding the ability through
>>> a bitmask to select which part of the stream need to be parsed, as
>>> parsing itself was causing some overhead. Maybe similar thing applies,
>>> though as per our new design, it's the fourcc that dictate the driver
>>> behaviour, we'd need yet another fourcc for drivers that wants the full
>>> bitstream (which seems odd if you have already parsed everything, I
>>> think this need some clarification).
>> 
>> Note that I am not proposing to rebuild the *entire* bitstream
>> in-kernel. What I am saying is that if the hardware interprets some
>> structures (like SPS/PPS) in their raw format, this raw format could
>> be reconstructed from the structures passed by userspace at negligible
>> cost. Such manipulation would only happen on a small amount of data.
>> 
>> Exposing finer-grained driver requirements through a bitmask may
>> deserve more exploring. Maybe we could end with a spectrum of
>> capabilities that would allow us to cover the range from fully
>> stateless to fully stateful IPs more smoothly. Right now we have two
>> specifications that only

Re: [PATCH -next] irqchip/tango: Fix potential NULL pointer dereference

2019-01-29 Thread YueHaibing

On 2019/1/29 20:20, Måns Rullgård wrote:
> Marc Zyngier  writes:
> 
>> On Tue, 29 Jan 2019 08:01:22 +,
>> YueHaibing  wrote:
>>>
>>> There is a potential NULL pointer dereference in case kzalloc()
>>> fails and returns NULL.
>>>
>>> Fixes: 4bba66899ac6 ("irqchip/tango: Add support for Sigma Designs 
>>> SMP86xx/SMP87xx interrupt controller")
>>> Signed-off-by: YueHaibing 
>>> ---
>>>  drivers/irqchip/irq-tango.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/irqchip/irq-tango.c b/drivers/irqchip/irq-tango.c
>>> index ae28d86..a63b828 100644
>>> --- a/drivers/irqchip/irq-tango.c
>>> +++ b/drivers/irqchip/irq-tango.c
>>> @@ -191,6 +191,8 @@ static int __init tangox_irq_init(void __iomem *base, 
>>> struct resource *baseres,
>>> panic("%pOFn: failed to get address", node);
>>>  
>>> chip = kzalloc(sizeof(*chip), GFP_KERNEL);
>>> +   if (!chip)
>>> +   return -ENOMEM;
>>> chip->ctl = res.start - baseres->start;
>>> chip->base = base;
>>>  
>>
>> This is a commendable effort, but given that the whole error handling
>> of this driver is just to simply panic, I have the ugly feeling that
>> this lack of check is more a feature than a bug... Not that I like it,
>> but at least it is consistent.
> 
> That seemed to be the norm for irqchip drivers when I wrote this one,
> and a fair number of them still panic on errors during init.  There's
> really not much else that can sanely be done since nothing will work
> without irq handling.
> 
> As for the error return added by this patch, nothing checks it, so a
> failure would merely result in the irqchip being silently skipped and
> nothing working.  Propagating the error back to of_irq_init() also has
> no effect, not even a warning.  Besides, kzalloc() is extremely unlikely
> to fail at this stage, and if it does, you have much bigger problems.

Thanks for your comment.

>

RE: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc

2019-01-29 Thread Vadim Pasternak




> -Original Message-
> From: Liming Sun 
> Sent: Monday, January 28, 2019 7:28 PM
> To: Rob Herring ; Mark Rutland
> ; Arnd Bergmann ; David Woods
> ; Andy Shevchenko ; Darren
> Hart ; Vadim Pasternak 
> Cc: Liming Sun ; devicet...@vger.kernel.org; linux-
> ker...@vger.kernel.org; platform-driver-...@vger.kernel.org
> Subject: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox
> BlueField Soc
> 
> This commit adds the TmFifo platform driver for Mellanox BlueField Soc. TmFifo
> is a shared FIFO which enables external host machine to exchange data with the
> SoC via USB or PCIe. The driver is based on virtio framework and has console
> and network access enabled.
> 
> Reviewed-by: David Woods 
> Signed-off-by: Liming Sun 
> ---
>  drivers/platform/mellanox/Kconfig |   13 +-
>  drivers/platform/mellanox/Makefile|1 +
>  drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   67 ++
>  drivers/platform/mellanox/mlxbf-tmfifo.c  | 1289
> +
>  4 files changed, 1369 insertions(+), 1 deletion(-)  create mode 100644
> drivers/platform/mellanox/mlxbf-tmfifo-regs.h
>  create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c
> 
> diff --git a/drivers/platform/mellanox/Kconfig
> b/drivers/platform/mellanox/Kconfig
> index cd8a908..a565070 100644
> --- a/drivers/platform/mellanox/Kconfig
> +++ b/drivers/platform/mellanox/Kconfig
> @@ -5,7 +5,7 @@
> 
>  menuconfig MELLANOX_PLATFORM
>   bool "Platform support for Mellanox hardware"
> - depends on X86 || ARM || COMPILE_TEST
> + depends on X86 || ARM || ARM64 || COMPILE_TEST
>   ---help---
> Say Y here to get to see options for platform support for
> Mellanox systems. This option alone does not add any kernel code.
> @@ -34,4 +34,15 @@ config MLXREG_IO
> to system resets operation, system reset causes monitoring and some
> kinds of mux selection.
> 
> +config MLXBF_TMFIFO
> + tristate "Mellanox BlueField SoC TmFifo platform driver"
> + depends on ARM64

Why you make it dependent on ARM64?
Should not it work on any host, x86?

> + default m

User who needs it should select this option.
No need default 'm'.

> + select VIRTIO_CONSOLE
> + select VIRTIO_NET
> + help
> +   Say y here to enable TmFifo support. The TmFifo driver provides
> +  platform driver support for the TmFifo which supports console
> +  and networking based on the virtio framework.
> +
>  endif # MELLANOX_PLATFORM
> diff --git a/drivers/platform/mellanox/Makefile
> b/drivers/platform/mellanox/Makefile
> index 57074d9c..f0c061d 100644
> --- a/drivers/platform/mellanox/Makefile
> +++ b/drivers/platform/mellanox/Makefile
> @@ -5,3 +5,4 @@
>  #
>  obj-$(CONFIG_MLXREG_HOTPLUG) += mlxreg-hotplug.o
>  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> +obj-$(CONFIG_MLXBF_TMFIFO)   += mlxbf-tmfifo.o
> diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> new file mode 100644
> index 000..90c9c2cf
> --- /dev/null
> +++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + */
> +
> +#ifndef __MLXBF_TMFIFO_REGS_H__
> +#define __MLXBF_TMFIFO_REGS_H__
> +
> +#include 
> +
> +#define MLXBF_TMFIFO_TX_DATA 0x0
> +
> +#define MLXBF_TMFIFO_TX_STS 0x8
> +#define MLXBF_TMFIFO_TX_STS__LENGTH 0x0001 #define
> +MLXBF_TMFIFO_TX_STS__COUNT_SHIFT 0 #define
> +MLXBF_TMFIFO_TX_STS__COUNT_WIDTH 9 #define
> +MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL 0 #define
> +MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff #define
> +MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
> +
> +#define MLXBF_TMFIFO_TX_CTL 0x10
> +#define MLXBF_TMFIFO_TX_CTL__LENGTH 0x0001 #define
> +MLXBF_TMFIFO_TX_CTL__LWM_SHIFT 0 #define
> MLXBF_TMFIFO_TX_CTL__LWM_WIDTH
> +8 #define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL 128 #define
> +MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff #define
> +MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff #define
> +MLXBF_TMFIFO_TX_CTL__HWM_SHIFT 8 #define
> MLXBF_TMFIFO_TX_CTL__HWM_WIDTH
> +8 #define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL 128 #define
> +MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff #define
> +MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00 #define
> +MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT 32 #define
> +MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH 9 #define
> +MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL 256 #define
> +MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff #define
> +MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ffULL
> +
> +#define MLXBF_TMFIFO_RX_DATA 0x0
> +
> +#define MLXBF_TMFIFO_RX_STS 0x8
> +#define MLXBF_TMFIFO_RX_STS__LENGTH 0x0001 #define
> +MLXBF_TMFIFO_RX_STS__COUNT_SHIFT 0 #define
> +MLXBF_TMFIFO_RX_STS__COUNT_WIDTH 9 #define
> +MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL 0 #define
> +MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff #define
> +MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
> +
> +#define MLXBF_TMFIFO_RX_CTL 0x10
> +#define

RE: [PATCH v1 1/1] platform/mellanox: Add bootctl driver for Mellanox BlueField Soc

2019-01-29 Thread Vadim Pasternak


[...]

Please, be consistent with naming convention.
All the above should have same prefix as others routines.

> 
> > +static ssize_t post_reset_wdog_store(struct device_driver *drv,
> > +const char *buf, size_t count) {
> > +   int err;
> > +   unsigned long watchdog;
> > +
> > +   err = kstrtoul(buf, 10, );
> > +   if (err)
> > +   return err;
> > +
> 
> > +   if (mlxbf_bootctl_smc_call1(MLXBF_BOOTCTL_SET_POST_RESET_WDOG,
> > +   watchdog) < 0)
> > +   return -EINVAL;
> 
> If that call returns an error it shouldn't be shadowed here.
> 
> > +
> > +   return count;
> > +}
> > +
> > +static ssize_t reset_action_show(struct device_driver *drv, char
> > +*buf) {
> 
> > +   return sprintf(buf, "%s\n", reset_action_to_string(
> > +
> > + mlxbf_bootctl_smc_call0(MLXBF_BOOTCTL_GET_RESET_ACTION)));
> 
> Wouldn't be easy to parse this as
> 
> int action = ...call0();
> return sprintf(...);
> 
> ?
> 
> (int is an arbitrary type here, choose one that suits)
> 
> > +}
> > +
> > +static ssize_t reset_action_store(struct device_driver *drv,
> > + const char *buf, size_t count) {
> > +   int action = reset_action_to_val(buf, count);
> > +
> 
> > +   if (action < 0 || action == MLXBF_BOOTCTL_NONE)
> > +   return -EINVAL;
> 
> Don't shadow an error.
> 
> > +
> > +   if (mlxbf_bootctl_smc_call1(MLXBF_BOOTCTL_SET_RESET_ACTION,
> action) < 0)
> > +   return -EINVAL;
> 
> Same.
> 
> > +
> > +   return count;
> > +}
> > +
> > +static ssize_t second_reset_action_show(struct device_driver *drv,
> > +char *buf) {
> 
> > +   return sprintf(buf, "%s\n", reset_action_to_string(
> > +   mlxbf_bootctl_smc_call0(
> > +   MLXBF_BOOTCTL_GET_SECOND_RESET_ACTION)));
> 
> Use temp variable.
> 
> > +}
> > +
> > +static ssize_t second_reset_action_store(struct device_driver *drv,
> > +const char *buf, size_t
> > +count) {
> > +   int action = reset_action_to_val(buf, count);
> > +
> > +   if (action < 0)
> > +   return -EINVAL;
> 
> Don't shadow an error.
> 
> > +
> > +   if
> (mlxbf_bootctl_smc_call1(MLXBF_BOOTCTL_SET_SECOND_RESET_ACTION,
> > +   action) < 0)
> > +   return -EINVAL;
> 
> Same.
> 
> > +
> > +   return count;
> > +}
> > +
> > +static ssize_t lifecycle_state_show(struct device_driver *drv, char
> > +*buf) {
> 
> > +   int lc_state = mlxbf_bootctl_smc_call1(
> > +   MLXBF_BOOTCTL_GET_TBB_FUSE_STATUS,
> > +   MLXBF_BOOTCTL_FUSE_STATUS_LIFECYCLE);
> 
> Split it as
> 
> int ...;
> 
> ... = call1();
> if (...)
> 
> > +
> > +   if (lc_state < 0)
> > +   return -EINVAL;
> 
> Don't shadow an error.
> 
> > +
> > +   lc_state &= (MLXBF_BOOTCTL_SB_MODE_TEST_MASK |
> > +MLXBF_BOOTCTL_SB_MODE_SECURE_MASK);
> 
> Better to split like
> 
> xxx =
>  (A | B);
> 
> > +   /*
> > +* If the test bits are set, we specify that the current state may 
> > be
> > +* due to using the test bits.
> > +*/
> 
> > +   if ((lc_state & MLXBF_BOOTCTL_SB_MODE_TEST_MASK) != 0) {
> 
> ' != 0' is redundant.
> 
> > +
> > +   lc_state &= MLXBF_BOOTCTL_SB_MODE_SECURE_MASK;
> > +
> 
> > +   return sprintf(buf, "%s(test)\n",
> > +
> > + mlxbf_bootctl_lifecycle_states[lc_state]);
> 
> One line?
> 
> > +   }
> > +
> > +   return sprintf(buf, "%s\n",
> > +mlxbf_bootctl_lifecycle_states[lc_state]);
> > +}
> > +
> > +static ssize_t secure_boot_fuse_state_show(struct device_driver *drv,
> > +char *buf) {
> > +   int key;
> > +   int buf_len = 0;
> > +   int upper_key_used = 0;
> > +   int sb_key_state = mlxbf_bootctl_smc_call1(
> > +   MLXBF_BOOTCTL_GET_TBB_FUSE_STATUS,
> > +   MLXBF_BOOTCTL_FUSE_STATUS_KEYS);
> > +
> > +   if (sb_key_state < 0)
> > +   return -EINVAL;
> > +
> 
> > +   for (key = MLXBF_SB_KEY_NUM - 1; key >= 0; key--) {
> 
> I'm not sure it's a good idea to put several lines in one sysfs attribute.
> 
> > +   int burnt = ((sb_key_state & (1 << key)) != 0);
> 
> Redundant  ' != 0', redundant parens.
> 
> > +   int valid = ((sb_key_state &
> > + (1 << (key + MLXBF_SB_KEY_NUM))) != 0);
> 
> Same.
> 
> > +
> > +   buf_len += sprintf(buf + buf_len, "Ver%d:", key);
> > +   if (upper_key_used) {
> > +   if (burnt) {
> > +   if (valid)
> > +   buf_len += sprintf(buf + buf_len,
> > + "Used");
> 
> Oh, why not just
> 
> const char *status;
> 
> if (...) {
> ...
>  status =

Re: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

On Mon, Jan 28, 2019 at 4:56 PM Dexuan Cui  wrote:
>
>
> Add the Hyper-V _DSM command set to the white list of NVDIMM command
> sets.
>
> This command set is documented at http://www.uefi.org/RFIC_LIST
> (see "Virtual NVDIMM 0x1901").
>
> Thanks Dan Williams  for writing the
> comment change.
>
> Signed-off-by: Dexuan Cui 
> Reviewed-by: Michael Kelley 
> ---
>
> Changes in v2:
> Updated the comment and changelog (Thanks, Dan!)
> Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.

Thanks for the re-spin, applied.

Re: [PATCH] nfit: acpi_nfit_ctl(): check out_obj->type in the right place

On Tue, Jan 29, 2019 at 5:23 PM Dexuan Cui  wrote:
>
>
> In the case of ND_CMD_CALL, we should also check out_obj->type.
>
> The patch uses out_obj->type, which is a short alias to
> out_obj->package.type.
>
> Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command 
> marshaling mechanism")
> Cc: 
> Signed-off-by: Dexuan Cui 

Looks good to me, applied.

Re: [PATCH 10/10] venus: dec: make decoder compliant with stateful codec API

2019-01-29 Thread Tomasz Figa

On Wed, Jan 30, 2019 at 1:21 PM Nicolas Dufresne  wrote:
>
> Le mercredi 30 janvier 2019 à 12:38 +0900, Tomasz Figa a écrit :
> > > Yes, unfortunately, GStreamer still rely on G_FMT waiting a minimal
> > > amount of time of the headers to be processed. This was how things was
> > > created back in 2011, I could not program GStreamer for the future. If
> > > we stop doing this, we do break GStreamer as a valid userspace
> > > application.
> >
> > Does it? Didn't you say earlier that you end up setting the OUTPUT
> > format with the stream resolution as parsed on your own? If so, that
> > would actually expose a matching framebuffer format on the CAPTURE
> > queue, so there is no need to wait for the real parsing to happen.
>
> I don't remember saying that, maybe I meant to say there might be a
> workaround ?
>
> For the fact, here we queue the headers (or first frame):
>
> https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/blob/master/sys/v4l2/gstv4l2videodec.c#L624
>
> Then few line below this helper does G_FMT internally:
>
> https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/blob/master/sys/v4l2/gstv4l2videodec.c#L634
> https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/blob/master/sys/v4l2/gstv4l2object.c#L3907
>
> And just plainly fails if G_FMT returns an error of any type. This was
> how Kamil designed it initially for MFC driver. There was no other
> alternative back then (no EAGAIN yet either).

Hmm, was that ffmpeg then?

So would it just set the OUTPUT width and height to 0? Does it mean
that gstreamer doesn't work with coda and mtk-vcodec, which don't have
such wait in their g_fmt implementations?

>
> Nicolas
>
> p.s. it's still in my todo's to implement source change event as I
> believe it is a better mechanism (specially if you header happened to
> be corrupted, then the driver can consume the stream until it finds a
> sync). So these sleep or normally wait exist all over to support this
> legacy thing. It is unfortunate, the question is do you want to break
> userspace now ? Without having first placed a patch that would maybe
> warn or something for a while ?
>

I don't want and my understanding was that we could workaround it by
the propagation of format from OUTPUT to CAPTURE. Also see above.

Best regards,
Tomasz

Re: [PATCH for-5.0] ath10k: correct bus type for WCN3990

2019-01-29 Thread Bjorn Andersson

On Tue 29 Jan 15:12 PST 2019, Brian Norris wrote:

> WCN3990 is SNOC, not PCI. This prevents probing WCN3990.
> 
> Fixes: 367c899f622c ("ath10k: add bus type check in ath10k_init_hw_params")
> Signed-off-by: Brian Norris 

Reviewed-by: Bjorn Andersson 

> ---
> This was a regression in 4.20.
> 
>  drivers/net/wireless/ath/ath10k/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/core.c 
> b/drivers/net/wireless/ath/ath10k/core.c
> index 399b501f3c3c..e8891f5fc83a 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -548,7 +548,7 @@ static const struct ath10k_hw_params 
> ath10k_hw_params_list[] = {
>   {
>   .id = WCN3990_HW_1_0_DEV_VERSION,
>   .dev_id = 0,
> - .bus = ATH10K_BUS_PCI,
> + .bus = ATH10K_BUS_SNOC,
>   .name = "wcn3990 hw1.0",
>   .continuous_frag_desc = true,
>   .tx_chain_mask = 0x7,
> -- 
> 2.20.1.495.gaa96b0ce6b-goog
>

Re: [PATCH] ARM: socfpga: fix base address of SDR controller

2019-01-29 Thread Simon Goldschmidt

On Tue, Jan 29, 2019 at 11:31 PM Alan Tull  wrote:
>
> On Tue, Jan 29, 2019 at 2:09 PM Simon Goldschmidt
>  wrote:
>
> Hi Simon,
>
> Thanks for submitting.   A couple of things...
>
> > diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
> > index f365003f0..8f6c1a5d6 100644
> > --- a/arch/arm/boot/dts/socfpga.dtsi
> > +++ b/arch/arm/boot/dts/socfpga.dtsi
> > @@ -788,9 +788,9 @@
> > reg = <0xfffec000 0x100>;
> > };
> >
> > -   sdr: sdr@ffc25000 {
> > +   sdr: sdr@ffc2 {
> > compatible = "altr,sdr-ctl", "syscon";
> > -   reg = <0xffc25000 0x1000>;
> > +   reg = <0xffc2 0x6000>;
>
> The binding doc will also need this change (in a separate patch)
> Documentation/devicetree/bindings/arm/altera/socfpga-sdram-controller.txt

Right. I didn't realise there is an actual address in that file as it says
"Example"...

But I'll make sure to change that if this patch is accepted.

>
> > diff --git a/arch/arm/mach-socfpga/self-refresh.S 
> > b/arch/arm/mach-socfpga/self-refresh.S
> > index f2d7f883e..bd7759357 100644
> > --- a/arch/arm/mach-socfpga/self-refresh.S
> > +++ b/arch/arm/mach-socfpga/self-refresh.S
> > @@ -19,8 +19,8 @@
> >  #define MAX_LOOP_COUNT 1000
> >
> >  /* Register offset */
> > -#define SDR_CTRLGRP_LOWPWREQ_ADDR   0x54
> > -#define SDR_CTRLGRP_LOWPWRACK_ADDR  0x58
> > +#define SDR_CTRLGRP_LOWPWREQ_ADDR   0x5054
> > +#define SDR_CTRLGRP_LOWPWRACK_ADDR  0x5058
>
> These offsets are used for ldr/sdr and are limited to 12 bits.  This
> won't build if CONFIG_SOCFPGA_SUSPEND is enabled.
>
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:
> Assembler messages:
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:65:
> Error: bad immediate value for offset (20564)
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:67:
> Error: bad immediate value for offset (20564)
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:72:
> Error: bad immediate value for offset (20568)
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:101:
> Error: bad immediate value for offset (20564)
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:103:
> Error: bad immediate value for offset (20564)
> /home/atull/repos/linux-socfpga/arch/arm/mach-socfpga/self-refresh.S:108:
> Error: bad immediate value for offset (20568)
> /home/atull/repos/linux-socfpga/scripts/Makefile.build:367: recipe for
> target 'arch/arm/mach-socfpga/self-refresh.o' failed

Oops, you're right. Sorry for that. I just saw now that socfpga_defconfig
leaves CONFIG_SOCFPGA_SUSPEND inactive. I'll make sure to test that if it
comes to v2 (depending on the discussion).

Thanks,
Simon

Re: [PATCH] thermal: mtk: Allocate enough space for mtk_thermal.

2019-01-29 Thread Peter Shih

Adding Michael Kao to cc list.

On Wed, Jan 9, 2019 at 1:57 PM Pi-Hsun Shih  wrote:
>
> The mtk_thermal struct contains a 'struct mtk_thermal_bank banks[];',
> but the allocation only allocates sizeof(struct mtk_thermal) bytes,
> which cause out of bound access with the ->banks[] member. Change it to
> a fixed size array instead.
>
> Signed-off-by: Pi-Hsun Shih 
> ---
>  drivers/thermal/mtk_thermal.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/mtk_thermal.c b/drivers/thermal/mtk_thermal.c
> index 0691f260f6eabe..ea11edb3fcced6 100644
> --- a/drivers/thermal/mtk_thermal.c
> +++ b/drivers/thermal/mtk_thermal.c
> @@ -159,6 +159,9 @@
>  #define MT7622_NUM_SENSORS_PER_ZONE1
>  #define MT7622_TS1 0
>
> +/* The maximum number of banks */
> +#define MAX_NUM_ZONES  8
> +
>  struct mtk_thermal;
>
>  struct thermal_bank_cfg {
> @@ -178,7 +181,7 @@ struct mtk_thermal_data {
> const int *sensor_mux_values;
> const int *msr;
> const int *adcpnp;
> -   struct thermal_bank_cfg bank_data[];
> +   struct thermal_bank_cfg bank_data[MAX_NUM_ZONES];
>  };
>
>  struct mtk_thermal {
> @@ -197,7 +200,7 @@ struct mtk_thermal {
> s32 vts[MT8173_NUM_SENSORS];
>
> const struct mtk_thermal_data *conf;
> -   struct mtk_thermal_bank banks[];
> +   struct mtk_thermal_bank banks[MAX_NUM_ZONES];
>  };
>
>  /* MT8173 thermal sensor data */
> --
> 2.20.1.97.g81188d93c3-goog
>

Re: [PATCH 08/10] soc: mediatek: add packet encoder function


On 01/29/2019 03:32 PM, Bibby Hsieh wrote:

Implement a function can encode the GCE instructions

Signed-off-by: Bibby Hsieh 
---
  drivers/soc/mediatek/mtk-cmdq-helper.c   | 102 ---
  include/linux/mailbox/mtk-cmdq-mailbox.h |   2 +
  include/linux/soc/mediatek/mtk-cmdq.h|  14 ++---
  3 files changed, 76 insertions(+), 42 deletions(-)

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 16c0393..923a815 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -9,11 +9,43 @@
  #include 
  #include 
  
-#define CMDQ_ARG_A_WRITE_MASK	0x

+#define CMDQ_GET_ARG_B(arg)(((arg) & GENMASK(31, 16)) >> 16)
+#define CMDQ_GET_ARG_C(arg)((arg) & GENMASK(15, 0))
  #define CMDQ_WRITE_ENABLE_MASKBIT(0)
  #define CMDQ_EOC_IRQ_EN   BIT(0)
  #define CMDQ_EOC_CMD  ((u64)((CMDQ_CODE_EOC << CMDQ_OP_CODE_SHIFT)) \
<< 32 | CMDQ_EOC_IRQ_EN)
+#define CMDQ_IMMEDIATE_VALUE   0
+#define CMDQ_REG_TYPE  1
+
+struct cmdq_instruction {
+   s16 arg_c:16;
+   s16 arg_b:16;
+   s16 arg_a:16;
+   u8 s_op:5;
+   u8 arg_c_type:1;
+   u8 arg_b_type:1;
+   u8 arg_a_type:1;
+   u8 op:8;
+};
+
+static void cmdq_pkt_instr_encoder(struct cmdq_pkt *pkt, s16 arg_c, s16 arg_b,
+  s16 arg_a, u8 s_op, u8 arg_c_type,
+  u8 arg_b_type, u8 arg_a_type, u8 op)
+{
+   struct cmdq_instruction *cmdq_inst;
+
+   cmdq_inst = pkt->va_base + pkt->cmd_buf_size;
+   cmdq_inst->op = op;
+   cmdq_inst->arg_a_type = arg_a_type;
+   cmdq_inst->arg_b_type = arg_b_type;
+   cmdq_inst->arg_c_type = arg_c_type;
+   cmdq_inst->s_op = s_op;
+   cmdq_inst->arg_a = arg_a;
+   cmdq_inst->arg_b = arg_b;
+   cmdq_inst->arg_c = arg_c;
+   pkt->cmd_buf_size += CMDQ_INST_SIZE;
+}
  
  u8 cmdq_subsys_base_to_id(struct cmdq_base *clt_base, u32 base)

  {
@@ -180,10 +212,11 @@ void cmdq_pkt_destroy(struct cmdq_pkt *pkt)
  }
  EXPORT_SYMBOL(cmdq_pkt_destroy);
  
-static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code,

-  u32 arg_a, u32 arg_b)
+static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, s16 arg_c, s16 arg_b,
+  s16 arg_a, u8 s_op, u8 arg_c_type,
+  u8 arg_b_type, u8 arg_a_type,
+  enum cmdq_code code)
  {
-   u64 *cmd_ptr;
  
  	if (unlikely(pkt->cmd_buf_size + CMDQ_INST_SIZE > pkt->buf_size)) {

/*
@@ -199,65 +232,59 @@ static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, 
enum cmdq_code code,
__func__, (u32)pkt->buf_size);
return -ENOMEM;
}
-   cmd_ptr = pkt->va_base + pkt->cmd_buf_size;
-   (*cmd_ptr) = (u64)((code << CMDQ_OP_CODE_SHIFT) | arg_a) << 32 | arg_b;
-   pkt->cmd_buf_size += CMDQ_INST_SIZE;
+   cmdq_pkt_instr_encoder(pkt, arg_c, arg_b, arg_a, s_op, arg_c_type,
+  arg_b_type, arg_a_type, code);
  
  	return 0;

  }
  
-int cmdq_pkt_write(struct cmdq_pkt *pkt, u32 value, u32 subsys, u32 offset)

+int cmdq_pkt_write(struct cmdq_pkt *pkt, u8 subsys, u16 offset, u32 value)
  {
-   u32 arg_a = (offset & CMDQ_ARG_A_WRITE_MASK) |
-   (subsys << CMDQ_SUBSYS_SHIFT);
-
-   return cmdq_pkt_append_command(pkt, CMDQ_CODE_WRITE, arg_a, value);
+   return cmdq_pkt_append_command(pkt, CMDQ_GET_ARG_C(value),
+  CMDQ_GET_ARG_B(value), offset, subsys,
+  CMDQ_IMMEDIATE_VALUE,
+  CMDQ_IMMEDIATE_VALUE,
+  CMDQ_IMMEDIATE_VALUE, CMDQ_CODE_WRITE);


All other codes use 0 instead of CMDQ_IMMEDIATE_VALUE. Also use 0 here
or use CMDQ_IMMEDIATE_VALUE at other places too?


  }
  EXPORT_SYMBOL(cmdq_pkt_write);
  
-int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u32 value,

-   u32 subsys, u32 offset, u32 mask)
+int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u8 subsys, u16 offset,
+   u32 value, u32 mask)
  {
u32 offset_mask = offset;
int err = 0;
  
  	if (mask != 0x) {

-   err = cmdq_pkt_append_command(pkt, CMDQ_CODE_MASK, 0, ~mask);
+   err = cmdq_pkt_append_command(pkt, CMDQ_GET_ARG_C(~mask),
+ CMDQ_GET_ARG_B(~mask), 0, 0, 0, 0,
+ 0, CMDQ_CODE_MASK);
offset_mask |= CMDQ_WRITE_ENABLE_MASK;
}
-   err |= cmdq_pkt_write(pkt, value, subsys, offset_mask);
+   err |= cmdq_pkt_write(pkt, subsys, offset_mask, value);


I know that the code was already that way before, but why do you or
the 2 return values, instead of just

Re: [PATCH 07/10] soc: mediatek: add cmdq_dev_get_event function


On 01/29/2019 03:32 PM, Bibby Hsieh wrote:

When client ask gce to clear or wait for event,
client need to pass event number to the API.
We suggest client store the event information in device node,
so we provide an API for client parse the event property.

Signed-off-by: Bibby Hsieh 
---
  drivers/soc/mediatek/mtk-cmdq-helper.c | 29 +
  include/linux/soc/mediatek/mtk-cmdq.h  |  1 +
  2 files changed, 30 insertions(+)

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 6ad997f..16c0393 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -56,6 +56,35 @@ struct cmdq_base *cmdq_register_device(struct device *dev)
  }
  EXPORT_SYMBOL(cmdq_register_device);
  
+s32 cmdq_dev_get_event(struct device *dev, const char *name)

+{
+   s32 index = 0;
+   struct of_phandle_args spec;
+   s32 result;
+
+   if (!dev)
+   return -EINVAL;
+
+   index = of_property_match_string(dev->of_node, "gce-event-names", name);
+   if (index < 0) {
+   dev_err(dev, "no gce-event-names property or no such event:%s",
+   name);
+   return index;
+   }
+
+   if (of_parse_phandle_with_args(dev->of_node, "gce-events",
+   "#gce-event-cells", index, )) {


nit: Should have more indention for the line above. (Align with the
dev->of_node?)


+   dev_err(dev, "can't parse gce-events property");
+   return -ENODEV;
+   }
+
+   result = spec.args[0];
+   of_node_put(spec.np);
+
+   return result;
+}
+EXPORT_SYMBOL(cmdq_dev_get_event);
+
  static void cmdq_client_timeout(struct timer_list *t)
  {
struct cmdq_client *client = from_timer(client, t, timer);
diff --git a/include/linux/soc/mediatek/mtk-cmdq.h 
b/include/linux/soc/mediatek/mtk-cmdq.h
index a1f5eb6..e5b0a98 100644
--- a/include/linux/soc/mediatek/mtk-cmdq.h
+++ b/include/linux/soc/mediatek/mtk-cmdq.h
@@ -139,5 +139,6 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, 
cmdq_async_flush_cb cb,
  
  u8 cmdq_subsys_base_to_id(struct cmdq_base *clt_base, u32 base);

  struct cmdq_base *cmdq_register_device(struct device *dev);
+s32 cmdq_dev_get_event(struct device *dev, const char *name);
  
  #endif	/* __MTK_CMDQ_H__ */

Re: [PATCH 06/10] soc: mediatek: add register device function


On 01/29/2019 03:32 PM, Bibby Hsieh wrote:

GCE cannot know the register base address, we store the subsys-base address
relationship in the device node, and store the relationship by
cmdq_register_device function.

Signed-off-by: Bibby Hsieh 
---
  drivers/soc/mediatek/mtk-cmdq-helper.c | 24 
  include/linux/soc/mediatek/mtk-cmdq.h  |  1 +
  2 files changed, 25 insertions(+)

diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
b/drivers/soc/mediatek/mtk-cmdq-helper.c
index 6e4b85e..6ad997f 100644
--- a/drivers/soc/mediatek/mtk-cmdq-helper.c
+++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
@@ -32,6 +32,30 @@ u8 cmdq_subsys_base_to_id(struct cmdq_base *clt_base, u32 
base)
  }
  EXPORT_SYMBOL(cmdq_subsys_base_to_id);
  
+struct cmdq_base *cmdq_register_device(struct device *dev)

+{
+   struct cmdq_base *clt_base;
+   struct of_phandle_args spec;
+   u32 idx;
+
+   clt_base = devm_kzalloc(dev, sizeof(*clt_base), GFP_KERNEL);
+   if (!clt_base)
+   return NULL;
+
+   /* parse subsys */
+   for (idx = 0; idx < ARRAY_SIZE(clt_base->subsys); idx++) {
+   if (of_parse_phandle_with_args(dev->of_node, "gce-subsys",
+   "#gce-subsys-cells", idx, ))


nit: Should have more indention for the line above. (Align with the
dev->of_node?)


+   break;
+   clt_base->subsys[idx].base = spec.args[0];
+   clt_base->subsys[idx].id = spec.args[1];
+   }
+   clt_base->count = idx;
+
+   return clt_base;
+}
+EXPORT_SYMBOL(cmdq_register_device);
+
  static void cmdq_client_timeout(struct timer_list *t)
  {
struct cmdq_client *client = from_timer(client, t, timer);
diff --git a/include/linux/soc/mediatek/mtk-cmdq.h 
b/include/linux/soc/mediatek/mtk-cmdq.h
index 0c7a6ee..a1f5eb6 100644
--- a/include/linux/soc/mediatek/mtk-cmdq.h
+++ b/include/linux/soc/mediatek/mtk-cmdq.h
@@ -138,5 +138,6 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, 
cmdq_async_flush_cb cb,
  int cmdq_pkt_flush(struct cmdq_pkt *pkt);
  
  u8 cmdq_subsys_base_to_id(struct cmdq_base *clt_base, u32 base);

+struct cmdq_base *cmdq_register_device(struct device *dev);
  
  #endif	/* __MTK_CMDQ_H__ */

Re: [PATCH] ARM: socfpga: fix base address of SDR controller

2019-01-29 Thread Simon Goldschmidt

+ Marek (as I really want to keep the dts in Linux and U-Boot in sync)
On Wed, Jan 30, 2019 at 1:16 AM Dinh Nguyen  wrote:
>
>
>
> On 1/29/19 2:08 PM, Simon Goldschmidt wrote:
> > From: Simon Goldschmidt 
> >
> > The documentation for socfpga gen5 says the base address of the sdram
> > controller is 0xffc2, while the current devicetree says it is at
> > 0xffc25000.
> >
> > While this is not a problem for Linux, as it only accesses the registers
> > above 0xffc25000, it *is* a problem for U-Boot because the lower registers
> > are used during DDR calibration (up to now, the U-Boot driver does not use
> > the dts address, but that should change).
> >
> > To keep Linux and U-Boot devicetrees in sync, this patch changes the base
> > address to 0xffc2 and adapts the 2 files where it is currently used.
> >
> > This patch changes the dts and 2 drivers with one commit to prevent
> > breaking the code if dts change and driver change would be split.
> >
> > Signed-off-by: Simon Goldschmidt 
> > ---
> >
> >  arch/arm/boot/dts/socfpga.dtsi   | 4 ++--
> >  arch/arm/mach-socfpga/self-refresh.S | 4 ++--
> >  drivers/fpga/altera-fpga2sdram.c | 2 +-
> >  3 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
> > index f365003f0..8f6c1a5d6 100644
> > --- a/arch/arm/boot/dts/socfpga.dtsi
> > +++ b/arch/arm/boot/dts/socfpga.dtsi
> > @@ -788,9 +788,9 @@
> >   reg = <0xfffec000 0x100>;
> >   };
> >
> > - sdr: sdr@ffc25000 {
> > + sdr: sdr@ffc2 {
> >   compatible = "altr,sdr-ctl", "syscon";
> > - reg = <0xffc25000 0x1000>;
> > + reg = <0xffc2 0x6000>;
>
> I don't see the U-Boot device tree having this change. Yes, the
> documentation does state that the SDR address starts at 0xffc2, but
> all of the pertinent registers start at 0x5000 offset. Thus, the
> starting address should be 0xffc25000.[1]

You don't see it in U-Boot as I'm working on a patch for that.
As I wrote in the commit message, U-Boot currently does not use the
devicetree for the SDR driver, but I want to convert it to do that.

But before converting, I need to find a clean way to provide the
register addresses to the driver. That doesn't work with the current dts.

>
> [1]
> https://www.intel.com/content/www/us/en/programmable/documentation/sfo1410143707420.html#sfo1411577366917

Well, in [2], you see that the peripheral's address range actually starts
at 0xffc2. It's only the public documented registers that start at
0xffc25000. I don't know why the lower address range is undocumented.
Maybe you can help me here?

But U-Boot needs to use the undocumented registers to bring up the DDR-RAM.
Even if the registers for that are not (clearly?) documented, I think the
devicetree should still reflect the correct address range.

The U-Boot driver is made up of 2 files (in drivers/ddr/altera):
- sdram_gen5.c [3]: using the documented registers from 0xffc25000
- sequencer.c [4]: using the (undocumented?) registers from 0xffc2

In both files, you can see the register addresses they use by checking the
static variables at the top of the file. And for convenience, use [5] to
search for the values of defines.

[2]
https://www.intel.com/content/www/us/en/programmable/hps/cyclone-v/hps.html
[3]
https://github.com/u-boot/u-boot/blob/master/drivers/ddr/altera/sdram_gen5.c
[4]
https://github.com/u-boot/u-boot/blob/master/drivers/ddr/altera/sequencer.c
[5]
https://elixir.bootlin.com/u-boot/latest/source

Regards,
Simon

Re: [PATCH v16 0/7] Parse ACPI table and limit KASLR to choosing immovable memory

2019-01-29 Thread Chao Fan

On Mon, Jan 28, 2019 at 06:51:32PM +0100, Borislav Petkov wrote:
>On Wed, Jan 23, 2019 at 07:08:43PM +0800, Chao Fan wrote:
>> PATCH 1/7 Copy kstrtoull() to boot/string.c to instead of using
>>   old simple_strtoull()
>> PATCH 2/7 Introduce get_acpi_rsdp() to parse RSDP in cmdline from KEXEC
>> PATCH 3/6 Introduce efi_get_rsdp_addr() to find RSDP from EFI table when
>>   booting from EFI.
>> PATCH 4/7 Introduce bios_get_rsdp_addr() to search RSDP in memory when
>>   booting from BIOS
>> PATCH 5/7 Parse RSDP and fill in boot_params->acpi_rsdp_addr before
>>   KASLR.
>> PATCH 6/7 Compute SRAT from RSDP and walk SRAT to store the immovable
>>   memory regions and store the immovable memory regions.
>> PATCH 7/7 Calculate the intersection between memory regions from e820/efi
>>   memory table and immovable memory regions. Limit KASLR to
>>   choosing these regions for randomization.

Hi Boris,

Sorry for delay.
>
>Ok, I've massaged the whole pile and fixed a couple of things that
>sprang at me, see each commit message for details.

Thanks for your fix, your change looks good.
>
>Please run it and check whether I haven't broken anything:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-x86-boot

I clone your branch and test some cases. Except the build issue of
cmdline_find_option(), PATCHSET works well.
Build in x86_64 and i386.
Tested EFI/BIOS environment in x86_64 and BIOS environment in i386.
Tested the 'acpi_rsdp=' and 'boot_params->acpi_rsdp_addr' issue, all
work well.

Thanks,
Chao Fan

>
>Thx.
>
>-- 
>Regards/Gruss,
>Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>

Re: [PATCH 03/10] soc: mediatek: move the CMDQ_IRQ_MASK into cmdq driver data


On 01/29/2019 03:32 PM, Bibby Hsieh wrote:

The interrupt mask and thread number has positive correlation,
so we move the CMDQ_IRQ_MASK into cmdq driver data and calculate
it by thread number.

Signed-off-by: Bibby Hsieh 
---
  drivers/mailbox/mtk-cmdq-mailbox.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c 

b/drivers/mailbox/mtk-cmdq-mailbox.c

index 909eb23..f6174ca 100644
--- a/drivers/mailbox/mtk-cmdq-mailbox.c
+++ b/drivers/mailbox/mtk-cmdq-mailbox.c
@@ -17,7 +17,6 @@
  #include 

  #define CMDQ_OP_CODE_MASK (0xff << CMDQ_OP_CODE_SHIFT)
-#define CMDQ_IRQ_MASK  0x
  #define CMDQ_NUM_CMD(t)   (t->cmd_buf_size / 
CMDQ_INST_SIZE)

  #define CMDQ_CURR_IRQ_STATUS  0x10
@@ -71,6 +70,7 @@ struct cmdq {
void __iomem*base;
u32 irq;
u32 thread_nr;
+   u32 irq_mask;
struct cmdq_thread  *thread;
struct clk  *clock;
boolsuspended;
@@ -284,11 +284,11 @@ static irqreturn_t cmdq_irq_handler(int irq, 

void *dev)

unsigned long irq_status, flags = 0L;
int bit;

-   irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS) & CMDQ_IRQ_MASK;
-   if (!(irq_status ^ CMDQ_IRQ_MASK))
+   irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS) & cmdq->irq_mask;
+   if (!(irq_status ^ cmdq->irq_mask))
return IRQ_NONE;

-   for_each_clear_bit(bit, _status, fls(CMDQ_IRQ_MASK)) {
+   for_each_clear_bit(bit, _status, fls(cmdq->irq_mask)) {
struct cmdq_thread *thread = >thread[bit];

spin_lock_irqsave(>chan->lock, flags);
@@ -472,6 +472,9 @@ static int cmdq_probe(struct platform_device *pdev)
dev_err(dev, "failed to get irq\n");
return -EINVAL;
}
+
+   cmdq->thread_nr = (u32)(unsigned long)of_device_get_match_data(dev);
+   cmdq->irq_mask = GENMASK(cmdq->thread_nr - 1, 0);
err = devm_request_irq(dev, cmdq->irq, cmdq_irq_handler, IRQF_SHARED,
   "mtk_cmdq", cmdq);
if (err < 0) {
@@ -489,6 +492,7 @@ static int cmdq_probe(struct platform_device *pdev)
}

cmdq->thread_nr = (u32)(unsigned long)of_device_get_match_data(dev);
+   cmdq->irq_mask = GENMASK(cmdq->thread_nr - 1, 0);


The cmdq->thread_nr and cmdq->irq_mask are already set above, so these 
two lines can be removed.



cmdq->mbox.dev = dev;
cmdq->mbox.chans = devm_kcalloc(dev, cmdq->thread_nr,
sizeof(*cmdq->mbox.chans), GFP_KERNEL);

Re: Will the recent memory leak fixes be backported to longterm kernels?

2019-01-29 Thread Roman Gushchin

On Tue, Jan 29, 2019 at 07:23:56PM -0500, Sasha Levin wrote:
> On Fri, Dec 28, 2018 at 11:50:08AM +0100, Greg KH wrote:
> > On Mon, Nov 05, 2018 at 10:21:23AM +0100, Michal Hocko wrote:
> > > On Fri 02-11-18 19:38:35, Roman Gushchin wrote:
> > > > On Fri, Nov 02, 2018 at 06:48:23PM +0100, Michal Hocko wrote:
> > > > > On Fri 02-11-18 17:25:58, Roman Gushchin wrote:
> > > > > > On Fri, Nov 02, 2018 at 05:51:47PM +0100, Michal Hocko wrote:
> > > > > > > On Fri 02-11-18 16:22:41, Roman Gushchin wrote:
> > > > > [...]
> > > > > > > > 2) We do forget to scan the last page in the LRU list. So if we 
> > > > > > > > ended up with
> > > > > > > > 1-page long LRU, it can stay there basically forever.
> > > > > > >
> > > > > > > Why
> > > > > > >   /*
> > > > > > >* If the cgroup's already been deleted, make sure to
> > > > > > >* scrape out the remaining cache.
> > > > > > >*/
> > > > > > >   if (!scan && !mem_cgroup_online(memcg))
> > > > > > >   scan = min(size, SWAP_CLUSTER_MAX);
> > > > > > >
> > > > > > > in get_scan_count doesn't work for that case?
> > > > > >
> > > > > > No, it doesn't. Let's look at the whole picture:
> > > > > >
> > > > > > size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> > > > > > scan = size >> sc->priority;
> > > > > > /*
> > > > > >  * If the cgroup's already been deleted, make sure to
> > > > > >  * scrape out the remaining cache.
> > > > > >  */
> > > > > > if (!scan && !mem_cgroup_online(memcg))
> > > > > > scan = min(size, SWAP_CLUSTER_MAX);
> > > > > >
> > > > > > If size == 1, scan == 0 => scan = min(1, 32) == 1.
> > > > > > And after proportional adjustment we'll have 0.
> > > > >
> > > > > My friday brain hurst when looking at this but if it doesn't work as
> > > > > advertized then it should be fixed. I do not see any of your patches 
> > > > > to
> > > > > touch this logic so how come it would work after them applied?
> > > >
> > > > This part works as expected. But the following
> > > > scan = div64_u64(scan * fraction[file], denominator);
> > > > reliable turns 1 page to scan to 0 pages to scan.
> > > 
> > > OK, 68600f623d69 ("mm: don't miss the last page because of round-off
> > > error") sounds like a good and safe stable backport material.
> > 
> > Thanks for this, now queued up.
> > 
> > greg k-h
> 
> It seems that 172b06c32b949 ("mm: slowly shrink slabs with a relatively
> small number of objects") and a76cf1a474d ("mm: don't reclaim inodes
> with many attached pages") cause a regression reported against the 4.19
> stable tree: https://bugzilla.kernel.org/show_bug.cgi?id=202441 .
> 
> Given the history and complexity of these (and other patches from that
> series) it would be nice to understand if this is something that will be
> fixed soon or should we look into reverting the series for now?

In that thread I've just suggested to give a chance to Rik's patch, which
hopefully will mitigate or easy the regression (
https://lkml.org/lkml/2019/1/28/1865 ).

Of course, we can simple revert those changes, but this will re-introduce
the memory leak, so I'd leave it as a last option.

Thanks!

[PATCH 2/2] iommu/amd: Remove clear_flush_young notifier

2019-01-29 Thread Peter Xu

AMD IOMMU driver is using the clear_flush_young() to do cache flushing
but that's actually already covered by invalidate_range().  Remove the
extra notifier and the chunks.

Signed-off-by: Peter Xu 
---
 drivers/iommu/amd_iommu_v2.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 23dae9348ace..5d7ef750e4a0 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -370,29 +370,6 @@ static struct pasid_state *mn_to_state(struct mmu_notifier 
*mn)
return container_of(mn, struct pasid_state, mn);
 }
 
-static void __mn_flush_page(struct mmu_notifier *mn,
-   unsigned long address)
-{
-   struct pasid_state *pasid_state;
-   struct device_state *dev_state;
-
-   pasid_state = mn_to_state(mn);
-   dev_state   = pasid_state->device_state;
-
-   amd_iommu_flush_page(dev_state->domain, pasid_state->pasid, address);
-}
-
-static int mn_clear_flush_young(struct mmu_notifier *mn,
-   struct mm_struct *mm,
-   unsigned long start,
-   unsigned long end)
-{
-   for (; start < end; start += PAGE_SIZE)
-   __mn_flush_page(mn, start);
-
-   return 0;
-}
-
 static void mn_invalidate_range(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
@@ -430,7 +407,6 @@ static void mn_release(struct mmu_notifier *mn, struct 
mm_struct *mm)
 
 static const struct mmu_notifier_ops iommu_mn = {
.release= mn_release,
-   .clear_flush_young  = mn_clear_flush_young,
.invalidate_range   = mn_invalidate_range,
 };
 
-- 
2.17.1

[PATCH 1/2] iommu/vt-d: Remove change_pte notifier

2019-01-29 Thread Peter Xu

The change_pte() interface is tailored for PFN updates, while the
other notifier invalidate_range() should be enough for Intel IOMMU
cache flushing.  Actually we've done similar thing for AMD IOMMU
already in 8301da53fbc1 ("iommu/amd: Remove change_pte mmu_notifier
call-back", 2014-07-30) but the Intel IOMMU driver still have it.

Signed-off-by: Peter Xu 
---
 drivers/iommu/intel-svm.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a2a2aa4439aa..e9fd3ca057ac 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -180,14 +180,6 @@ static void intel_flush_svm_range(struct intel_svm *svm, 
unsigned long address,
rcu_read_unlock();
 }
 
-static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
-unsigned long address, pte_t pte)
-{
-   struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
-
-   intel_flush_svm_range(svm, address, 1, 1, 0);
-}
-
 /* Pages have been freed at this point */
 static void intel_invalidate_range(struct mmu_notifier *mn,
   struct mm_struct *mm,
@@ -227,7 +219,6 @@ static void intel_mm_release(struct mmu_notifier *mn, 
struct mm_struct *mm)
 
 static const struct mmu_notifier_ops intel_mmuops = {
.release = intel_mm_release,
-   .change_pte = intel_change_pte,
.invalidate_range = intel_invalidate_range,
 };
 
-- 
2.17.1

[PATCH 0/2] Some MMU notifier cleanups for Intel/AMD IOMMU

2019-01-29 Thread Peter Xu

Recently when I'm reading the mmu notifiers I noticed that both
Intel/AMD IOMMU drivers seem to have redundancies in using the MMU
notifiers.  It can also be seen as a follow up of commit 8301da53fbc1
("iommu/amd: Remove change_pte mmu_notifier call-back", 2014-07-30).

I don't have hardwares to test them, but they compile well.

Please have a look, thanks.

Peter Xu (2):
  iommu/vt-d: Remove change_pte notifier
  iommu/amd: Remove clear_flush_young notifier

 drivers/iommu/amd_iommu_v2.c | 24 
 drivers/iommu/intel-svm.c|  9 -
 2 files changed, 33 deletions(-)

-- 
2.17.1

Re: [PATCH] proc: calculate end pointer for /proc// lookup at compile time

2019-01-29 Thread Alexey Dobriyan

On Tue, Jan 29, 2019 at 02:18:48PM -0800, Andrew Morton wrote:
> On Mon, 14 Jan 2019 23:04:23 +0300 Alexey Dobriyan  
> wrote:
> 
> > Compilers like to transform loops like
> > 
> > for (i = 0; i < n; i++) {
> > [use p[i]]
> > }
> > 
> > into
> > for (p = p0; p < end; p++) {
> > ...
> > }
> > 
> > Do it by hand, so that it results in overall simpler loop
> > and smaller code.
> > 
> > Space savings:
> > 
> > $ ./scripts/bloat-o-meter ../vmlinux-001 ../obj/vmlinux
> > add/remove: 0/0 grow/shrink: 2/1 up/down: 4/-9 (-5)
> > Function old new   delta
> > proc_tid_base_lookup  17  19  +2
> > proc_tgid_base_lookup 17  19  +2
> > proc_pident_lookup   179 170  -9
> > 
> > Note: this trick bloats readdir, so don't do it :-\
> 
> I don't understand the Note:.  Can you please expand?

The same could be done to proc_pident_readdir(), but the code becomes
bigger for some reason.

RE: [PATCH 1/2] media: dt-bindings: media: xilinx: Add Xilinx MIPI CSI-2 Rx Subsystem

2019-01-29 Thread Vishal Sagar

Hi Sakari,

> -Original Message-
> From: Sakari Ailus [mailto:sakari.ai...@linux.intel.com]
> Sent: Monday, January 28, 2019 5:30 PM
> To: Vishal Sagar 
> Cc: Vishal Sagar ; Hyun Kwon ;
> laurent.pinch...@ideasonboard.com; Michal Simek ;
> linux-me...@vger.kernel.org; devicet...@vger.kernel.org;
> hans.verk...@cisco.com; mche...@kernel.org; robh...@kernel.org;
> mark.rutl...@arm.com; Dinesh Kumar ; linux-arm-
> ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/2] media: dt-bindings: media: xilinx: Add Xilinx MIPI 
> CSI-2
> Rx Subsystem
> 
> Hi Vishal,
> 
> On Mon, Jan 14, 2019 at 09:47:41AM +, Vishal Sagar wrote:
> > Hi Sakari,
> >
> > Thanks for reviewing this.
> >
> > > -Original Message-
> > > From: Sakari Ailus [mailto:sakari.ai...@linux.intel.com]
> > > Sent: Tuesday, January 08, 2019 6:35 PM
> > > To: Vishal Sagar 
> > > Cc: Hyun Kwon ; laurent.pinch...@ideasonboard.com;
> > > Michal Simek ; linux-me...@vger.kernel.org;
> > > devicet...@vger.kernel.org; hans.verk...@cisco.com; mche...@kernel.org;
> > > robh...@kernel.org; mark.rutl...@arm.com; Dinesh Kumar
> > > ; linux-arm-ker...@lists.infradead.org; linux-
> > > ker...@vger.kernel.org
> > > Subject: Re: [PATCH 1/2] media: dt-bindings: media: xilinx: Add Xilinx 
> > > MIPI
> CSI-2
> > > Rx Subsystem
> > >
> > > EXTERNAL EMAIL
> > >
> > > Hi Vishal,
> > >
> > > The patchset hard escaped me somehow earlier and your reply to Rob made
> me
> > > notice it again. Thanks. :-)
> > >
> > > On Wed, May 30, 2018 at 12:24:43AM +0530, Vishal Sagar wrote:
> > > > Add bindings documentation for Xilinx MIPI CSI-2 Rx Subsystem.
> > > >
> > > > The Xilinx MIPI CSI-2 Rx Subsystem consists of a DPHY, CSI-2 Rx, an
> > > > optional I2C controller and an optional Video Format Bridge (VFB). The
> > > > active lanes can be configured at run time if enabled in the IP. The
> > > > DPHY register interface may also be enabled.
> > > >
> > > > Signed-off-by: Vishal Sagar 
> > > > ---
> > > >  .../bindings/media/xilinx/xlnx,csi2rxss.txt| 117
> > > +
> > > >  1 file changed, 117 insertions(+)
> > > >  create mode 100644
> > > Documentation/devicetree/bindings/media/xilinx/xlnx,csi2rxss.txt
> > > >
> > > > diff --git
> a/Documentation/devicetree/bindings/media/xilinx/xlnx,csi2rxss.txt
> > > b/Documentation/devicetree/bindings/media/xilinx/xlnx,csi2rxss.txt
> > > > new file mode 100644
> > > > index 000..31ed721
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/media/xilinx/xlnx,csi2rxss.txt
> > > > @@ -0,0 +1,117 @@
> > > > +
> > >
> > > Extra newline.
> > >
> >
> > Will remove it in next version.
> >
> > > > +Xilinx MIPI CSI2 Receiver Subsystem Device Tree Bindings
> > > > +
> > > > +
> > > > +The Xilinx MIPI CSI2 Receiver Subsystem is used to capture MIPI CSI2
> traffic
> > > > +from compliant camera sensors and send the output as AXI4 Stream
> video
> > > data
> > > > +for image processing.
> > > > +
> > > > +The subsystem consists of a MIPI DPHY in slave mode which captures the
> > > > +data packets. This is passed along the MIPI CSI2 Rx IP which extracts 
> > > > the
> > > > +packet data. This data is taken in by the Video Format Bridge (VFB),
> > > > +if selected, and converted into AXI4 Stream video data at selected
> > > > +pixels per clock as per AXI4-Stream Video IP and System Design UG934.
> > > > +
> > > > +For more details, please refer to PG232 MIPI CSI-2 Receiver Subsystem.
> > > >
> > >
> +https://www.xilinx.com/support/documentation/ip_documentation/mipi_csi
> > > 2_rx_subsystem/v3_0/pg232-mipi-csi2-rx.pdf
> > > > +
> > > > +Required properties:
> > > > +
> > > > +- compatible: Must contain "xlnx,mipi-csi2-rx-subsystem-2.0" or
> > > > +  "xlnx,mipi-csi2-rx-subsystem-3.0"
> > > > +
> > > > +- reg: Physical base address and length of the registers set for the 
> > > > device.
> > > > +
> > > > +- interrupt-parent: specifies the phandle to the parent interrupt
> controller
> > > > +
> > > > +- interrupts: Property with a value describing the interrupt number.
> > > > +
> > > > +- xlnx,max-lanes: Maximum active lanes in the design.
> > > > +
> > > > +- xlnx,vc: Virtual Channel, specifies virtual channel number to be 
> > > > filtered.
> > > > +  If this is 4 then all virtual channels are allowed.
> > >
> > > This seems like something a driver should configure, based on the
> > > configuration of the connected device.
> > >
> >
> > The filtering of the Virtual channels is property of the hardware IP and is 
> > fixed
> in design.
> > This is not software controlled.
> 
> So... you have different IP blocks between which (one of) the difference(s)
> is the virtual channel?
> 

Your understanding is correct. 

The Xilinx CSI2 Rx subsystem has the 3 blocks -
1 - Xilinx CSI2 Rx controller
2 - Xilinx DPHY in Rx mode (whose register interface may be disabled/fixed 
configuration to reduce logic gate count).
3 - Xilinx

Re: [PATCH 0/2] [REGRESSION v4.19-20] mm: shrinkers are now way too aggressive

2019-01-29 Thread Roman Gushchin

Hi, Dave!

Instead of reverting (which will bring back the memcg memory leak),
can you, please, try Rik's patch: https://lkml.org/lkml/2019/1/28/1865 ?

It should protect small cgroups from being scanned too hard by the memory
pressure, however keeping the pressure big enough to avoid memory leaks.

Thanks!

On Wed, Jan 30, 2019 at 03:17:05PM +1100, Dave Chinner wrote:
> Hi mm-folks,
> 
> TL;DR: these two commits break system wide memory VFS cache reclaim
> balance badly, cause severe performance regressions in stable
> kernels and they need to be reverted ASAP.
> 
> For background, let's start with the bug reports that have come from
> desktop users on 4.19 stable kernels. First this one:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=202349
> 
> Whereby copying a large amount of data to files on an XFS filesystem
> would cause the desktop to freeze for multiple seconds and,
> apparently occasionally hang completely. Basically, GPU based
> GFP_KERNEL allocations getting stuck in shrinkers under realtively
> light memory loads killing desktop interactivity. Kernel 4.19.16
> 
> The second:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=202441
> 
> Whereby copying a large data set across NFS filesystems at the same
> time as running a kernel compile on a local XFS filesystem results
> in the kernel compile going from 3m30s to over an hour and file copy
> performance tanking.
> 
> We ran an abbreviated bisect from 4.18 through to 4.19.18, and found
> two things:
> 
>   1: there was a major change in page cache reclaim behaviour
>   introduced in 4.19-rc5. Basically the page cache would get
>   trashed periodically for no apparent reason, the
>   characteristic being a sawtooth cache usage pattern.
> 
>   2: in 4.19.3, kernel compile performance turned to crap.
> 
> The kernel compile regression is essentially caused by memory
> reclaim driving the XFS inode shrinker hard in to reclaiming dirty
> inodes and getting blocked, essentially slowing reclaim down to the
> rate at which a slow SATA drive could write back inodes. There were
> also indications of a similar GPU-based GFP_KERNEL allocation
> stalls, but most of the testing was done from the CLI with no X so
> that could be discounted.
> 
> It was reported that less severe slowdowns also occurred on ext2,
> ext3, ext4 and jfs, so XFS is really just the messenger here - it is
> most definitely not the cause of the problem being seen, so stop and
> thing before you go and blame XFS.
> 
> Looking at the change history of the mm/ subsystem after the first
> bug report, I noticed and red-flagged this commit for deeper
> analysis:
> 
> 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of 
> objects")
> 
> That "simple" change ran a read flag because it makes shrinker
> reclaim far, far more agressive at initial priority reclaims (ie..
> reclaim priority = 12). And it also means that small caches that
> don't need reclaim (because they are small) will be agressively
> scanned and reclaimed when there is very little memory pressure,
> too. It also means tha tlarge caches are reclaimed very agressively
> under light memory pressure - pressure that would have resulted in
> single digit scan count now gets run out to batch size, which for
> filesystems is 1024 objects. i.e. we increase reclaim filesystem
> superblock shrinker pressure by an order of 100x at light reclaim.
> 
> That's a *bad thing* because it means we can't retain working sets
> of small caches even under light memory pressure - they get
> excessively reclaimed in comparison to large caches instead of in
> proptortion to the rest of the system caches.
> 
> So, yeah, red flag. Big one. And the patch never got sent to
> linux-fsdevel so us filesystem people didn't ahve any idea that
> there were changes to VFS cache balances coming down the line. Hence
> our users reporting problems ar the first sign we get of a
> regression...
> 
> So when Roger reported that the page cache behaviour changed
> massively in 4.19-rc5, and I found that commit was between -rc4 and
> -rc5? Yeah, that kinda proved my theory that it changed the
> fundamental cache balance of the system and the red flag is real...
> 
> So, the second, performance killing change? Well, I think you all
> know what's coming:
> 
> a76cf1a474d7 mm: don't reclaim inodes with many attached pages
> 
> [ Yup, a "MM" tagged patch that changed code in fs/inode.c and wasn't
> cc'd to any fileystem list. There's a pattern emerging here. Did
> anyone think to cc the guy who originally designed ithe numa aware
> shrinker infrastucture and helped design the memcg shrinker
> infrastructure on fundamental changes? ]
> 
> So, that commit was an attempt to fix the shitty behaviour
> introduced by 172b06c32b94 - it's a bandaid over a symptom rather
> than something that attempts to correct the actual bug that was
> introduced. i.e. the increased inode cache reclaim pressure was now
> reclaiming inodes faster than the

Re: [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only

2019-01-29 Thread kbuild test robot

Hi Callum,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v5.0-rc4 next-20190129]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Callum-Sinclair/ipmr-ip6mr-Create-new-sockopt-to-clear-mfc-cache-only/20190130-104146
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.2.0 make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   net/ipv4/ipmr.c: In function 'mroute_clean_cache':
>> net/ipv4/ipmr.c:1312:3: error: 'cache' undeclared (first use in this 
>> function); did you mean 'cacheid'?
  cache = (struct mfc_cache *)c;
  ^
  cacheid
   net/ipv4/ipmr.c:1312:3: note: each undeclared identifier is reported only 
once for each function it appears in
   net/ipv4/ipmr.c:1313:33: error: 'net' undeclared (first use in this function)
  call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
^~~
   net/ipv4/ipmr.c: In function 'mroute_clean_tables':
   net/ipv4/ipmr.c:1334:14: warning: unused variable 'net' [-Wunused-variable]
 struct net *net = read_pnet(>net);
 ^~~

vim +1312 net/ipv4/ipmr.c

^1da177e4 Linus Torvalds  2005-04-16  1300  
7ba7b80d1 Callum Sinclair 2019-01-30  1301  /* Clear the vif tables */
7ba7b80d1 Callum Sinclair 2019-01-30  1302  static void 
mroute_clean_cache(struct mr_table *mrt, bool all)
^1da177e4 Linus Torvalds  2005-04-16  1303  {
494fff563 Yuval Mintz 2018-02-28  1304  struct mr_mfc *c, *tmp;
^1da177e4 Linus Torvalds  2005-04-16  1305  
a8cb16dd9 Eric Dumazet2010-10-01  1306  /* Wipe the cache */
8fb472c09 Nikolay Aleksandrov 2017-01-12  1307  
list_for_each_entry_safe(c, tmp, >mfc_cache_list, list) {
0e615e960 Nikolay Aleksandrov 2015-11-20  1308  if (!all && 
(c->mfc_flags & MFC_STATIC))
^1da177e4 Linus Torvalds  2005-04-16  1309  
continue;
8fb472c09 Nikolay Aleksandrov 2017-01-12  1310  
rhltable_remove(>mfc_hash, >mnode, ipmr_rht_params);
a8c9486b8 Eric Dumazet2010-10-01  1311  
list_del_rcu(>list);
494fff563 Yuval Mintz 2018-02-28 @1312  cache = (struct 
mfc_cache *)c;
494fff563 Yuval Mintz 2018-02-28  1313  
call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
b362053a7 Yotam Gigi  2017-09-27  1314  
  mrt->id);
494fff563 Yuval Mintz 2018-02-28  1315  
mroute_netlink_event(mrt, cache, RTM_DELROUTE);
8c13af2a2 Yuval Mintz 2018-03-26  1316  mr_cache_put(c);
^1da177e4 Linus Torvalds  2005-04-16  1317  }
^1da177e4 Linus Torvalds  2005-04-16  1318  
0c12295a7 Patrick McHardy 2010-04-13  1319  if 
(atomic_read(>cache_resolve_queue_len) != 0) {
^1da177e4 Linus Torvalds  2005-04-16  1320  
spin_lock_bh(_unres_lock);
8fb472c09 Nikolay Aleksandrov 2017-01-12  1321  
list_for_each_entry_safe(c, tmp, >mfc_unres_queue, list) {
862465f2e Patrick McHardy 2010-04-13  1322  
list_del(>list);
494fff563 Yuval Mintz 2018-02-28  1323  cache = 
(struct mfc_cache *)c;
494fff563 Yuval Mintz 2018-02-28  1324  
mroute_netlink_event(mrt, cache, RTM_DELROUTE);
494fff563 Yuval Mintz 2018-02-28  1325  
ipmr_destroy_unres(mrt, cache);
^1da177e4 Linus Torvalds  2005-04-16  1326  }
^1da177e4 Linus Torvalds  2005-04-16  1327  
spin_unlock_bh(_unres_lock);
^1da177e4 Linus Torvalds  2005-04-16  1328  }
^1da177e4 Linus Torvalds  2005-04-16  1329  }
^1da177e4 Linus Torvalds  2005-04-16  1330  

:: The code at line 1312 was first introduced by commit
:: 494fff56379c4ad5b8fe36a5b7ffede4044ca7bb ipmr, ip6mr: Make mfc_cache a 
common structure

:: TO: Yuval Mintz 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[PATCH v2] Bluetooth: Add NULL check for tiocmget() and tiocmset()

2019-01-29 Thread Myungho Jung

tiocmget() and tiocmset() operations are optional and some tty drivers
like pty miss the operations. We need NULL check to prevent from
dereference.

Signed-off-by: Myungho Jung 
---
 drivers/bluetooth/hci_ath.c   | 6 ++
 drivers/bluetooth/hci_ldisc.c | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/bluetooth/hci_ath.c b/drivers/bluetooth/hci_ath.c
index d568fbd94d6c..fb9f6323a911 100644
--- a/drivers/bluetooth/hci_ath.c
+++ b/drivers/bluetooth/hci_ath.c
@@ -185,8 +185,14 @@ static int ath_set_bdaddr(struct hci_dev *hdev, const 
bdaddr_t *bdaddr)
 
 static int ath_setup(struct hci_uart *hu)
 {
+   struct tty_struct *tty = hu->tty;
+
BT_DBG("hu %p", hu);
 
+   /* tty driver should support operations to set RTS */
+   if (!tty->driver->ops->tiocmget || !tty->driver->ops->tiocmset)
+   return -EOPNOTSUPP;
+
hu->hdev->set_bdaddr = ath_set_bdaddr;
 
return 0;
diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index fbf7b4df23ab..cb31c2d8d826 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -314,6 +314,10 @@ void hci_uart_set_flow_control(struct hci_uart *hu, bool 
enable)
return;
}
 
+   /* tiocmget() and tiocmset() operations are optional */
+   if (!tty->driver->ops->tiocmget || !tty->driver->ops->tiocmset)
+   return;
+
if (enable) {
/* Disable hardware flow control */
ktermios = tty->termios;
-- 
2.17.1

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-29 Thread Vivek Gautam

On Tue, Jan 29, 2019 at 8:34 PM Ard Biesheuvel
 wrote:
>
> (+ Bjorn)
>
> On Mon, 28 Jan 2019 at 12:27, Vivek Gautam  
> wrote:
> >
> > Hi Ard,
> >
> > On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel
> >  wrote:
> > >
> > > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  
> > > wrote:
> > > >
> > > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
> > > >  wrote:
> > > > >
> > > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  
> > > > > wrote:
> > > > > >
> > > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  
> > > > > > > wrote:
> > > > > > >>
> > > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > > > > >>>  wrote:
> > > > > > 
> > > > > >  Hi,
> > > > > > 
> > > > > > 
> > > > > >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > > > >   wrote:
> > > > > > >
> > > > > > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> Qualcomm SoCs have an additional level of cache called as
> > > > > > >> System cache, aka. Last level cache (LLC). This cache sits 
> > > > > > >> right
> > > > > > >> before the DDR, and is tightly coupled with the memory 
> > > > > > >> controller.
> > > > > > >> The clients using this cache request their slices from this
> > > > > > >> system cache, make it active, and can then start using it.
> > > > > > >> For these clients with smmu, to start using the system cache 
> > > > > > >> for
> > > > > > >> buffers and, related page tables [1], memory attributes need 
> > > > > > >> to be
> > > > > > >> set accordingly. This series add the required support.
> > > > > > >>
> > > > > > >
> > > > > > > Does this actually improve performance on reads from a 
> > > > > > > device? The
> > > > > > > non-cache coherent DMA routines perform an unconditional 
> > > > > > > D-cache
> > > > > > > invalidate by VA to the PoC before reading from the buffers 
> > > > > > > filled by
> > > > > > > the device, and I would expect the PoC to be defined as lying 
> > > > > > > beyond
> > > > > > > the LLC to still guarantee the architected behavior.
> > > > > > 
> > > > > >  We have seen performance improvements when running Manhattan
> > > > > >  GFXBench benchmarks.
> > > > > > 
> > > > > > >>>
> > > > > > >>> Ah ok, that makes sense, since in that case, the data flow is 
> > > > > > >>> mostly
> > > > > > >>> to the device, not from the device.
> > > > > > >>>
> > > > > >  As for the PoC, from my knowledge on sdm845 the system cache, 
> > > > > >  aka
> > > > > >  Last level cache (LLC) lies beyond the point of coherency.
> > > > > >  Non-cache coherent buffers will not be cached to system cache 
> > > > > >  also, and
> > > > > >  no additional software cache maintenance ops are required for 
> > > > > >  system cache.
> > > > > >  Pratik can add more if I am missing something.
> > > > > > 
> > > > > >  To take care of the memory attributes from DMA APIs side, we 
> > > > > >  can add a
> > > > > >  DMA_ATTR definition to take care of any dma non-coherent APIs 
> > > > > >  calls.
> > > > > > 
> > > > > > >>>
> > > > > > >>> So does the device use the correct inner non-cacheable, outer
> > > > > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > > > > >>>
> > > > > > >>> We have been looking into another use case where the fact that 
> > > > > > >>> the
> > > > > > >>> SMMU overrides memory attributes is causing issues (WC mappings 
> > > > > > >>> used
> > > > > > >>> by the radeon and amdgpu driver). So if the SMMU would honour 
> > > > > > >>> the
> > > > > > >>> existing attributes, would you still need the SMMU changes?
> > > > > > >>
> > > > > > >> Even if we could force a stage 2 mapping with the weakest 
> > > > > > >> pagetable
> > > > > > >> attributes (such that combining would work), there would still 
> > > > > > >> need to
> > > > > > >> be a way to set the TCR attributes appropriately if this 
> > > > > > >> behaviour is
> > > > > > >> wanted for the SMMU's own table walks as well.
> > > > > > >>
> > > > > > >
> > > > > > > Isn't that just a matter of implementing support for SMMUs that 
> > > > > > > lack
> > > > > > > the 'dma-coherent' attribute?
> > > > > >
> > > > > > Not quite - in general they need INC-ONC attributes in case there
> > > > > > actually is something in the architectural outer-cacheable domain.
> > > > >
> > > > > But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> > > > > chip? AIUI, the reason for the SMMU changes is to avoid the
> > > > > performance hit of snooping, which is more expensive than cache
> > > > > maintenance of SMMU page tables. So are you saying the by-VA cache
> > > > > maintenance is not relayed to this system cache, resulting in

Re: [PATCH] mtd: rawnand: meson: Fix linking error on 32-bit platforms

2019-01-29 Thread Liang Yang


Hello Nathan,

On 2019/1/30 5:46, Nathan Chancellor wrote:

On arm little endian allyesconfig:

   ld.lld: error: undefined symbol: __aeabi_uldivmod
   >>> referenced by meson_nand.c
   >>> mtd/nand/raw/meson_nand.o:(meson_nfc_setup_data_interface) in archive 
drivers/built-in.a

The dividend tBERS_max is u64, meaning we need to use DIV_ROUND_UP_ULL
(which wraps do_div) to prevent the compiler from emitting
__aebi_uldivmod.



ok. thanks for your time.


Fixes: 2d570b34b41a ("mtd: rawnand: meson: add support for Amlogic NAND flash 
controller")
Signed-off-by: Nathan Chancellor 
---
  drivers/mtd/nand/raw/meson_nand.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/meson_nand.c 
b/drivers/mtd/nand/raw/meson_nand.c
index e858d58d97b0..6f12a96195d1 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -1116,8 +1116,8 @@ int meson_nfc_setup_data_interface(struct nand_chip 
*nand, int csline,
   div * NFC_CLK_CYCLE);
meson_chip->tadl = DIV_ROUND_UP(PSEC_TO_NSEC(timings->tADL_min),
div * NFC_CLK_CYCLE);
-   tbers_clocks = DIV_ROUND_UP(PSEC_TO_NSEC(timings->tBERS_max),
-   div * NFC_CLK_CYCLE);
+   tbers_clocks = DIV_ROUND_UP_ULL(PSEC_TO_NSEC(timings->tBERS_max),
+   div * NFC_CLK_CYCLE);

ok.

meson_chip->tbers_max = ilog2(tbers_clocks);
if (!is_power_of_2(tbers_clocks))
meson_chip->tbers_max++;

Re: [PATCH][next] mtd: rawnand: meson: fix missing assignment of ret on a call to meson_chip_buffer_init

2019-01-29 Thread Liang Yang


Hello Colin,

On 2019/1/29 18:57, Colin King wrote:

From: Colin Ian King 

The call to meson_chip_buffer_init is not assigning ret, however, ret
is being checked for failure. Fix this by adding in the missing assignment.


ok. thanks for your time.


Fixes: 2d570b34b41a ("mtd: rawnand: meson: add support for Amlogic NAND flash 
controller")
Signed-off-by: Colin Ian King 
---
  drivers/mtd/nand/raw/meson_nand.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/raw/meson_nand.c 
b/drivers/mtd/nand/raw/meson_nand.c
index e858d58d97b0..b9c543d1054c 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -1206,7 +1206,7 @@ static int meson_nand_attach_chip(struct nand_chip *nand)
dev_err(nfc->dev, "16bits bus width not supported");
return -EINVAL;
}
-   meson_chip_buffer_init(nand);
+   ret = meson_chip_buffer_init(nand); >if (ret)
return -ENOMEM;

Re: [PATCH][next] mtd: rawnand: meson:: make several functions static

2019-01-29 Thread Liang Yang


Hello Colin,

On 2019/1/29 20:44, Colin King wrote:

From: Colin Ian King 

There are several functions that are local to the source and do
not need to be in global scope, so make them static.

Cleans up sparse warnings.


ok. thanks

Signed-off-by: Colin Ian King 
---
  drivers/mtd/nand/raw/meson_nand.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/raw/meson_nand.c 
b/drivers/mtd/nand/raw/meson_nand.c
index b9c543d1054c..9557bd94dcd2 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -829,14 +829,14 @@ static int meson_nfc_read_oob(struct nand_chip *nand, int 
page)
return meson_nfc_read_page_hwecc(nand, NULL, 1, page);
  }
  
-bool meson_nfc_is_buffer_dma_safe(const void *buffer)

+static bool meson_nfc_is_buffer_dma_safe(const void *buffer)
  {
if (virt_addr_valid(buffer) && (!object_is_on_stack(buffer)))
return true;
return false;
  }
  
-void *

+static void *
  meson_nand_op_get_dma_safe_input_buf(const struct nand_op_instr *instr)
  {
if (WARN_ON(instr->type != NAND_OP_DATA_IN_INSTR))
@@ -848,7 +848,7 @@ meson_nand_op_get_dma_safe_input_buf(const struct 
nand_op_instr *instr)
return kzalloc(instr->ctx.data.len, GFP_KERNEL);
  }
  
-void

+static void
  meson_nand_op_put_dma_safe_input_buf(const struct nand_op_instr *instr,
 void *buf)
  {
@@ -863,7 +863,7 @@ meson_nand_op_put_dma_safe_input_buf(const struct 
nand_op_instr *instr,
kfree(buf);
  }
  
-void *

+static void *
  meson_nand_op_get_dma_safe_output_buf(const struct nand_op_instr *instr)
  {
if (WARN_ON(instr->type != NAND_OP_DATA_OUT_INSTR))
@@ -876,7 +876,7 @@ meson_nand_op_get_dma_safe_output_buf(const struct 
nand_op_instr *instr)
   instr->ctx.data.len, GFP_KERNEL);
  }
  
-void

+static void
  meson_nand_op_put_dma_safe_output_buf(const struct nand_op_instr *instr,
  const void *buf)
  {

Re: [PATCH 0/3] drivers: Frequency constraint infrastructure

On 28-01-19, 14:04, Qais Yousef wrote:
> But we have no way to enforce this, no? I'm thinking if frequency can be
> constrained in PM QoS framework, then we will end up with some drivers that
> think it's a good idea to use it and potentially end up breaking this "should
> not work against schedutil and similar".
> 
> Or did I miss something?
> 
> My point is that if we introduce something too generic we might end up
> encouraging more users and end up with a complex set of rules/interactions and
> lose some determinism. But I could be reading too much into it :-)

People are free to use notifiers today as well and there is nobody
stopping them. A new framework/layer may actually make them more
accountable as we can easily record which all entities have requested
to impose a freq-limit on CPUs.

-- 
viresh

Re: [PATCH 0/3] drivers: Frequency constraint infrastructure

On 17-01-19, 14:16, Juri Lelli wrote:
> I was also wondering how this new framework is dealing with
> constraints/request imposed/generated by the scheduler and related
> interfaces (thinking about schedutil and Patrick's util_clamp).

I am not very sure about what constraints are imposed by schedutil or
util-clamp stuff that may not work well with this stuff.

As you are already aware of it, this series isn't doing anything new
as we already have thermal/user constraints available in kernel. I am
just trying to implement a better way to present those to the cpufreq
core.

-- 
viresh

[PATCH] cpufreq: Auto-register the driver as a thermal cooling device if asked

2019-01-29 Thread Amit Kucheria

All cpufreq drivers do similar things to register as a cooling device.
Provide a cpufreq driver flag so drivers can just ask the cpufreq core
to register the cooling device on their behalf. This allows us to get
rid of duplicated code in the drivers.

In order to allow this, we add a struct thermal_cooling_device pointer
to struct cpufreq_policy so that drivers don't need to store it in a
private data structure.

Suggested-by: Stephen Boyd 
Suggested-by: Viresh Kumar 
Signed-off-by: Amit Kucheria 
Reviewed-by: Matthias Kaehlcke 
Tested-by: Matthias Kaehlcke 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 11 +++
 include/linux/cpufreq.h   |  9 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a8fa684f5f90..cae730264bc0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -19,6 +19,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1316,6 +1317,10 @@ static int cpufreq_online(unsigned int cpu)
if (cpufreq_driver->ready)
cpufreq_driver->ready(policy);
 
+   if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+   cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV)
+   policy->cdev = of_cpufreq_cooling_register(policy);
+
pr_debug("initialization complete\n");
 
return 0;
@@ -1403,6 +1408,12 @@ static int cpufreq_offline(unsigned int cpu)
goto unlock;
}
 
+   if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+   cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV) {
+   cpufreq_cooling_unregister(policy->cdev);
+   policy->cdev = NULL;
+   }
+
if (cpufreq_driver->stop_cpu)
cpufreq_driver->stop_cpu(policy);
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index bd7fbd6a4478..6078eb07a7e4 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -151,6 +151,9 @@ struct cpufreq_policy {
 
/* For cpufreq driver's internal use */
void*driver_data;
+
+   /* Pointer to the cooling device if used for thermal mitigation */
+   struct thermal_cooling_device *cdev;
 };
 
 /* Only for ACPI */
@@ -386,6 +389,12 @@ struct cpufreq_driver {
  */
 #define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING  BIT(6)
 
+/*
+ * Set by drivers that want the core to automatically register the cpufreq
+ * driver as a thermal cooling device.
+ */
+#define CPUFREQ_IS_COOLING_DEV BIT(7)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
 
-- 
2.17.1

[PATCH v2] sched/fair: Fix insertion in rq->leaf_cfs_rq_list

2019-01-29 Thread Vincent Guittot

Sargun reported a crash:
  "I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
   infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
   and put it on top of 4.19.13. In addition to this, I uninlined
   list_add_leaf_cfs_rq for debugging.

   This revealed a new bug that we didn't get to because we kept getting
   crashes from the previous issue. When we are running with cgroups that
   are rapidly changing, with CFS bandwidth control, and in addition
   using the cpusets cgroup, we see this crash. Specifically, it seems to
   occur with cgroups that are throttled and we change the allowed
   cpuset."

The algorithm used to order cfs_rq in rq->leaf_cfs_rq_list assumes that
it will walk down to root the 1st time a cfs_rq is used and we will finish
to add either a cfs_rq without parent or a cfs_rq with a parent that is
already on the list. But this is not always true in presence of throttling.
Because a cfs_rq can be throttled even if it has never been used but other CPUs
of the cgroup have already used all the bandwdith, we are not sure to go down to
the root and add all cfs_rq in the list.

Ensure that all cfs_rq will be added in the list even if they are throttled.

Reported-by: Sargun Dhillon 
Fixes: 9c2791f936ef ("Fix hierarchical order in rq->leaf_cfs_rq_list")
Signed-off-by: Vincent Guittot 
---

v2:
- Added dummy function for !CONFIG_FAIR_GROUP_SCHED

This patch doesn't fix:
  a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
which has been reverted in v5.0-rc1. I'm working on an additonal patch
that should be similar to this one to fix a9e7f6544b9c.

 kernel/sched/fair.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2ff4b6..826fbe5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -352,6 +352,20 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq 
*cfs_rq)
}
 }
 
+static inline void list_add_branch_cfs_rq(struct sched_entity *se, struct rq 
*rq)
+{
+   struct cfs_rq *cfs_rq;
+
+   for_each_sched_entity(se) {
+   cfs_rq = cfs_rq_of(se);
+   list_add_leaf_cfs_rq(cfs_rq);
+
+   /* If parent is already in the list, we can stop */
+   if (rq->tmp_alone_branch == >leaf_cfs_rq_list)
+   break;
+   }
+}
+
 /* Iterate through all leaf cfs_rq's on a runqueue: */
 #define for_each_leaf_cfs_rq(rq, cfs_rq) \
list_for_each_entry_rcu(cfs_rq, >leaf_cfs_rq_list, leaf_cfs_rq_list)
@@ -446,6 +460,10 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq 
*cfs_rq)
 {
 }
 
+static inline void list_add_branch_cfs_rq(struct sched_entity *se, struct rq 
*rq)
+{
+}
+
 #define for_each_leaf_cfs_rq(rq, cfs_rq)   \
for (cfs_rq = >cfs; cfs_rq; cfs_rq = NULL)
 
@@ -5179,6 +5197,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, 
int flags)
 
}
 
+   /* Ensure that all cfs_rq have been added to the list */
+   list_add_branch_cfs_rq(se, rq);
+
hrtick_update(rq);
 }
 
-- 
2.7.4

Re: [PATCH 3/3] autofs: add ignore mount option

2019-01-29 Thread Ian Kent

On Tue, 2019-01-29 at 20:58 -0800, Andrew Morton wrote:
> On Wed, 30 Jan 2019 10:07:15 +0800 Ian Kent  wrote:
> 
> > On Tue, 2019-01-29 at 17:16 -0800, Andrew Morton wrote:
> > > On Sat, 12 Jan 2019 08:00:40 +0800 Ian Kent  wrote:
> > > 
> > > > Add an autofs file system mount option that can be used to provide
> > > > a generic indicator to applications that the mount entry should be
> > > > ignored when displaying mount information.
> > > 
> > > What is the reason for adding this feature?
> > 
> > In other OSes that provide autofs and that provide a mount list
> > to user space based on the kernel mount list a no-op mount option
> > ("ignore" is the one use on the most common OS) is allowed so that
> > autofs file system users can optionally use it.
> > 
> > The idea is that it be used by user space programs to exclude
> > autofs mounts from consideration when reading the mounts list.
> > 
> > Prior to the change to link /etc/mtab to /proc/self/mounts all
> > I needed to do to achieve this was to use mount(2) and not update
> > the mtab but now that no longer works.
> > 
> > I know the symlinking happened a long time ago and I considered
> > doing this then but, at the time I couldn't remember the commonly
> > used option name and thought persuading the various utility
> > maintainers would be too hard.
> > 
> > But now I have a RHEL request to do this for compatibility for a
> > widely used product so I want to go ahead with it and try and
> > enlist the help of some utility package maintainers.
> > 
> > Clearly, without the option nothing can be done so it's at least
> > a start.
> 
> OK.  I guess I can just paste the above into the changelog.

I thought this description would be too long but, by all means,
replace or add it to the changelog.

Or, if you prefer, I could try and come up with something more
succinct, the above explanation probably goes into more detail
than is really needed to get the message across.

> 
> Also, Documentation/filesystems/autofs*.txt are owed an update?

Yes, I think so, I'll have a look at it and get onto it, thanks
for the reminder.

Ian

Re: [PATCH 2/7] cpufreq: dt: Register an Energy Model

On 29-01-19, 09:15, Quentin Perret wrote:
> On Tuesday 29 Jan 2019 at 10:51:44 (+0530), Viresh Kumar wrote:
> > On 28-01-19, 11:36, Matthias Kaehlcke wrote:
> > > I think this patch will result in error messages at registration on
> > > platforms that use the cpufreq-dt driver and don't specify
> > > 'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
> > > a problem as long as the cpufreq initialization succeeds regardless,
> > > it could be seen as a not-so-gentle nudge to add the values.
> > 
> > That wouldn't be acceptable.
> 
> Fair enough. What I can propose in this case is to have in PM_OPP a
> helper called 'dev_pm_opp_of_register_em()' or something like this. This
> function will check all prerequisites are present (we have the right
> values in DT, and so on) and then call (or not) em_register_perf_domain().
> Then we can make the CPUFreq drivers use that instead of calling
> em_register_perf_domain() directly.

That should be fine.

> That would also make it easy to implement Matthias' suggestion to not
> call em_register_perf_domain() if an EM is already present.

So you will track registration state within the OPP core for that ?
Sorry but that doesn't sound right. What's wrong with having an
unregister helper in energy-model to keep proper code flow everywhere
?

-- 
viresh

[PATCH v9 1/3] mm: Shuffle initial free memory to improve memory-side-cache utilization

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:

It's been a problem in the HPC space:

http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software

and this abandoned patch series proposed that for the kernel:
https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.dani...@intel.com

Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel).  That's better than forcing
users to deploy remedies like:
"To eliminate this gradual degradation, we have added a Stream
 measurement to the Node Health Check that follows each job;
 nodes are rebooted whenever their measured memory bandwidth
 falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03e58f
"x86/numa_emulation: Introduce uniform split capability". With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes. A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable. While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

Here are some performance impact details of the patches:

1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
3X speedup in a contrived case that tries to force cache conflicts. The
contrived cased used the numa_emulation capability to force an instance
of the benchmark to be run in two of the near-memory sized numa nodes.
If both instances were placed on the same emulated they would fit and
cause zero conflicts.  While on separate emulated nodes without
randomization they underutilized the cache and conflicted unnecessarily
due to the in-order allocation per node.

2/ A well known Java server application benchmark was run with a heap
size that exceeded cache size by 3X. The cache conflict rate was 8% for
the first run and degraded to 21% after page allocator aging. With
randomization enabled the rate levelled out at 11%.

3/ A MongoDB workload did not observe measurable difference in
cache-conflict rates, but the overall throughput dropped by 7% with
randomization in one case.

4/ Mel Gorman ran his suite of performance workloads with randomization
enabled on platforms without a memory-side-cache and saw a mix of some
improvements and some losses [3].

While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads. For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected. Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those
systems.

Outside of memory-side-cache utilization concerns there is potentially
security benefit from randomization. Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator,
especially early in system boot, has predictable first-in-first out
behavior for physical pages. Pages are freed in physical address order
when first onlined.

Quoting Kees:
"While we already have a base-address randomization
 (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
 memory layouts would certainly be using the predictability of
 allocation ordering (i.e. for attacks where the base address isn't
 important: only the relative positions between allocated memory).
 This is

[PATCH v9 2/3] mm: Move buddy list manipulations into helpers

In preparation for runtime randomization of the zone lists, take all
(well, most of) the list_*() functions in the buddy allocator and put
them in helper functions. Provide a common control point for injecting
additional behavior when freeing pages.

Acked-by: Michal Hocko 
Cc: Dave Hansen 
Signed-off-by: Dan Williams 
---
 include/linux/mm.h   |3 --
 include/linux/mm_types.h |3 ++
 include/linux/mmzone.h   |   46 ++
 mm/compaction.c  |4 +--
 mm/page_alloc.c  |   70 ++
 5 files changed, 79 insertions(+), 47 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..1621acd10f83 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -500,9 +500,6 @@ static inline void vma_set_anonymous(struct vm_area_struct 
*vma)
 struct mmu_gather;
 struct inode;
 
-#define page_private(page) ((page)->private)
-#define set_page_private(page, v)  ((page)->private = (v))
-
 #if !defined(__HAVE_ARCH_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static inline int pmd_devmap(pmd_t pmd)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2c471a2c43fa..1c7dc7ffa288 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -214,6 +214,9 @@ struct page {
 #define PAGE_FRAG_CACHE_MAX_SIZE   __ALIGN_MASK(32768, ~PAGE_MASK)
 #define PAGE_FRAG_CACHE_MAX_ORDER  get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 
+#define page_private(page) ((page)->private)
+#define set_page_private(page, v)  ((page)->private = (v))
+
 struct page_frag_cache {
void * va;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 374e9d483382..6ab8b58c6481 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /* Free memory management - zoned buddy allocator.  */
@@ -98,6 +100,50 @@ struct free_area {
unsigned long   nr_free;
 };
 
+/* Used for pages not on another list */
+static inline void add_to_free_area(struct page *page, struct free_area *area,
+int migratetype)
+{
+   list_add(>lru, >free_list[migratetype]);
+   area->nr_free++;
+}
+
+/* Used for pages not on another list */
+static inline void add_to_free_area_tail(struct page *page, struct free_area 
*area,
+ int migratetype)
+{
+   list_add_tail(>lru, >free_list[migratetype]);
+   area->nr_free++;
+}
+
+/* Used for pages which are on another list */
+static inline void move_to_free_area(struct page *page, struct free_area *area,
+int migratetype)
+{
+   list_move(>lru, >free_list[migratetype]);
+}
+
+static inline struct page *get_page_from_free_area(struct free_area *area,
+   int migratetype)
+{
+   return list_first_entry_or_null(>free_list[migratetype],
+   struct page, lru);
+}
+
+static inline void del_page_from_free_area(struct page *page,
+   struct free_area *area, int migratetype)
+{
+   list_del(>lru);
+   __ClearPageBuddy(page);
+   set_page_private(page, 0);
+   area->nr_free--;
+}
+
+static inline bool free_area_empty(struct free_area *area, int migratetype)
+{
+   return list_empty(>free_list[migratetype]);
+}
+
 struct pglist_data;
 
 /*
diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..a22ac7ab65c5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1359,13 +1359,13 @@ static enum compact_result __compact_finished(struct 
zone *zone,
bool can_steal;
 
/* Job done if page is free of the right migratetype */
-   if (!list_empty(>free_list[migratetype]))
+   if (!free_area_empty(area, migratetype))
return COMPACT_SUCCESS;
 
 #ifdef CONFIG_CMA
/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
if (migratetype == MIGRATE_MOVABLE &&
-   !list_empty(>free_list[MIGRATE_CMA]))
+   !free_area_empty(area, MIGRATE_CMA))
return COMPACT_SUCCESS;
 #endif
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6208ff744b07..1cb9a467e451 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -743,12 +743,6 @@ static inline void set_page_order(struct page *page, 
unsigned int order)
__SetPageBuddy(page);
 }
 
-static inline void rmv_page_order(struct page *page)
-{
-   __ClearPageBuddy(page);
-   set_page_private(page, 0);
-}
-
 /*
  * This function checks whether a page is free && is the buddy
  * we can coalesce a page and its buddy if
@@ -849,13 +843,11 @@ static inline void __free_one_page(struct page *page,
 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC

[PATCH v9 0/3] mm: Randomize free memory

Changes since v8 [1]:
* Rework shuffle call sites from 3 locations to 2, i.e. one for the
initial memory online path, and one for the hotplug memory online path.
This simplification results in an incremental diffstat of "7 files
changed, 31 insertions(+), 82 deletions(-)". The consolidation of the
initial shuffle in page_alloc_init_late() leads to a beneficial increase
in the number of shuffles performed in a qemu-VM test. (Michal)

* Drop the CONFIG_SHUFFLE_PAGE_ORDER configuration option. If it turns out
that there is a use case to make the shuffle-order dynamic that can be
addressed in a follow on update, but no such case is known at present.
(Michal)

* Replace lkml.org links with lkml.kernel.org, where possible.
Unfortunately lkml.kernel.org failed to capture Mel's feedback, so the
lkml.org link remains for that one. (Michal)

* Fix definition of pfn_present() in the !sparsemem case. (Michal)

* Collect Michal's ack on patch2, and open code rmv_page_order() in its
only caller.

[1]:
https://lkml.kernel.org/r/154767945660.1983228.12167020940431682725.st...@dwillia2-desk3.amr.corp.intel.com

---

Hi Andrew,

As you can see the series is improved thanks to Michal's review. Please
await his ack, but I believe this version addresses all pending
feedback.

Still based on v5.0-rc1 for my tests, but it applies and builds cleanly
to current linux-next.

---

Quote Patch 1:

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [2].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [3], and I copy it here:

It's been a problem in the HPC space:

http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software

and this abandoned patch series proposed that for the kernel:
https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.dani...@intel.com

Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel). That's better than forcing
users to deploy remedies like:
"To eliminate this gradual degradation, we have added a Stream
measurement to the Node Health Check that follows each job;
nodes are rebooted whenever their measured memory bandwidth
falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03e58f
"x86/numa_emulation: Introduce uniform split capability". With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes. A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable. While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

See patch 1 for more details.

[2]:
https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[3]:
https://lkml.kernel.org/r/at5pr8401mb1169d656c8b5e121752fc0f8ab...@at5pr8401mb1169.namprd84.prod.outlook.com

---

Dan Williams (3):
mm: Shuffle initial free memory to improve memory-side-cache utilization
mm: Move buddy list manipulations into helpers
mm: Maintain randomization of page free lists

[PATCH v9 3/3] mm: Maintain randomization of page free lists