Re: [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

2021-04-18 Thread David Gibson
On Fri, Apr 09, 2021 at 12:31:03AM -0300, Leonardo Bras wrote:
> Hello David, thanks for commenting.
> 
> On Tue, 2021-03-23 at 10:45 +1100, David Gibson wrote:
> > > @@ -805,6 +808,10 @@ static int resize_hpt_for_hotplug(unsigned long 
> > > new_mem_size, bool shrinking)
> > >   if (shrinking) {
> > > 
> > > + /* When batch removing entries, only resizes HPT at the end. */
> > > + if (atomic_read_acquire(_resize_disable))
> > > + return 0;
> > > +
> > 
> > I'm not quite convinced by this locking.  Couldn't hpt_resize_disable
> > be set after this point, but while you're still inside
> > resize_hpt_for_hotplug()?  Probably better to use an explicit mutex
> > (and mutex_trylock()) to make the critical sections clearer.
> 
> Sure, I can do that for v2.
> 
> > Except... do we even need the fancy mechanics to suppress the resizes
> > in one place to do them elswhere.  Couldn't we just replace the
> > existing resize calls with the batched ones?
> 
> How do you think of having batched resizes-down in HPT?

I think it's a good idea.  We still have to have the loop to resize
bigger if we can't fit everything into the smallest target size, but
that still only makes the worst case as bad at the always-case is
currently.

> Other than the current approach, I could only think of a way that would
> touch a lot of generic code, and/or duplicate some functions, as
> dlpar_add_lmb() does a lot of other stuff.
> 
> > > +void hash_memory_batch_shrink_end(void)
> > > +{
> > > + unsigned long newsize;
> > > +
> > > + /* Re-enables HPT resize-down after hot-unplug */
> > > + atomic_set_release(_resize_disable, 0);
> > > +
> > > + newsize = memblock_phys_mem_size();
> > > + /* Resize to smallest SHIFT possible */
> > > + while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
> > > + newsize *= 2;
> > 
> > As noted earlier, doing this without an explicit cap on the new hpt
> > size (of the existing size) this makes me nervous. 
> > 
> 
> I can add a stop in v2.
> 
> >  Less so, but doing
> > the calculations on memory size, rather than explictly on HPT size /
> > HPT order also seems kinda clunky.
> 
> Agree, but at this point, it would seem kind of a waste to find the
> shift from newsize, then calculate (1 << shift) for each retry of
> resize_hpt_for_hotplug() only to point that we are retrying the order
> value.

Yeah, I see your poiint.

> 
> But sure, if you think it looks better, I can change that. 
> 
> > > +void memory_batch_shrink_begin(void)
> > > +{
> > > + if (!radix_enabled())
> > > + hash_memory_batch_shrink_begin();
> > > +}
> > > +
> > > +void memory_batch_shrink_end(void)
> > > +{
> > > + if (!radix_enabled())
> > > + hash_memory_batch_shrink_end();
> > > +}
> > 
> > Again, these wrappers don't seem particularly useful to me.
> 
> Options would be add 'if (!radix_enabled())' to hotplug-memory.c
> functions or to hash* functions, which look kind of wrong.

I think the if !radix_enabled in hotplug-memory.c isn't too bad, in
fact possibly helpful as a hint that this is HPT only logic.

> 
> > > + memory_batch_shrink_end();
> > 
> > remove_by_index only removes a single LMB, so there's no real point to
> > batching here.
> 
> Sure, will be fixed for v2.
> 
> > > @@ -700,6 +712,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
> > >   if (lmbs_added != lmbs_to_add) {
> > >   pr_err("Memory hot-add failed, removing any added LMBs\n");
> > > 
> > > + memory_batch_shrink_begin();
> > 
> > 
> > The effect of these on the memory grow path is far from clear.
> > 
> 
> On hotplug, HPT is resized-up before adding LMBs.
> On hotunplug, HPT is resized-down after removing LMBs.
> And each one has it's own mechanism to batch HPT resizes...
> 
> I can't understand exactly how using it on hotplug fail path can be any
> different than using it on hotunplug.
> > 
> 
> Can you please help me understanding this?
> 
> Best regards,
> Leonardo Bras
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-04-18 Thread David Gibson
On Thu, Apr 08, 2021 at 11:51:36PM -0300, Leonardo Bras wrote:
> Hello David, thanks for the feedback!
> 
> On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > > +{
> > > + /*
> > > +  * Resizing-up HPT should never fail, but there are some cases system 
> > > starts with higher
> > > +  * SHIFT than required, and we go through the funny case of resizing 
> > > HPT down while
> > > +  * adding memory
> > > +  */
> > > +
> > > + while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > > + newsize *= 2;
> > > + pr_warn("Hash collision while resizing HPT\n");
> > 
> > This unbounded increase in newsize makes me nervous - we should be
> > bounded by the current size of the HPT at least.  In practice we
> > should be fine, since the resize should always succeed by the time we
> > reach our current HPT size, but that's far from obvious from this
> > point in the code.
> 
> Sure, I will add bounds in v2.
> 
> > 
> > And... you're doubling newsize which is a value which might not be a
> > power of 2.  I'm wondering if there's an edge case where this could
> > actually cause us to skip the current size and erroneously resize to
> > one bigger than we have currently.
> 
> I also though that at the start, but it seems quite reliable.
> Before using this value, htab_shift_for_mem_size() will always round it
> to next power of 2. 
> Ex.
> Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for shift
> calculation. If we multiply it by 2 (same as << 1), we have that
> anything between 0b01010 and 0b1 will be rounded to 0b1. 

Ah, good point.

> This works just fine as long as we are multiplying. 
> Division may have the behavior you expect, as 0b0101 >> 1 would become
> 0b010 and skip a shift.
>   
> > > +void memory_batch_expand_prepare(unsigned long newsize)
> > 
> > This wrapper doesn't seem useful.
> 
> Yeah, it does little, but I can't just jump into hash_* functions
> directly from hotplug-memory.c, without even knowing if it's using hash
> pagetables. (in case the suggestion would be test for disable_radix
> inside hash_memory_batch*)
> 
> > 
> > > +{
> > > + if (!radix_enabled())
> > > + hash_memory_batch_expand_prepare(newsize);
> > > +}
> > >  #endif /* CONFIG_MEMORY_HOTPLUG */
> > >  
> > > 
> > > + memory_batch_expand_prepare(memblock_phys_mem_size() +
> > > +  drmem_info->n_lmbs * drmem_lmb_size());
> > 
> > This doesn't look right.  memory_add_by_index() is adding a *single*
> > LMB, I think using drmem_info->n_lmbs here means you're counting this
> > as adding again as much memory as you already have hotplugged.
> 
> Yeah, my mistake. This makes sense.
> I will change it to something like 
> memblock_phys_mem_size() + drmem_lmb_size()
> 
> > > 
> > > + memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * 
> > > drmem_lmb_size());
> > > +
> > >   for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> > >   if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > >   continue;
> > 
> > I don't see memory_batch_expand_prepare() suppressing any existing HPT
> > resizes.  Won't this just resize to the right size for the full add,
> > then resize several times again as we perform the add?  Or.. I guess
> > that will be suppressed by patch 1/3. 
> 
> Correct.
> 
> >  That's seems kinda fragile, though.
> 
> What do you mean by fragile here?
> What would you suggest doing different?
> 
> Best regards,
> Leonardo Bras
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] vfio/pci: Add missing range check in vfio_pci_mmap

2021-04-12 Thread David Gibson
On Mon, Apr 12, 2021 at 11:41:24PM +0200, Christian A. Ehrhardt wrote:
> 
> When mmaping an extra device region verify that the region index
> derived from the mmap offset is valid.
> 
> Fixes: a15b1883fee1 ("vfio_pci: Allow mapping extra regions")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christian A. Ehrhardt 

Reviewed-by: David Gibson 

> ---
>  drivers/vfio/pci/vfio_pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 65e7e6b44578..5023e23db3bc 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1656,6 +1656,8 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>  
>   index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>  
> + if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
> + return -EINVAL;
>   if (vma->vm_end < vma->vm_start)
>   return -EINVAL;
>   if ((vma->vm_flags & VM_SHARED) == 0)
> @@ -1664,7 +1666,7 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   int regnum = index - VFIO_PCI_NUM_REGIONS;
>   struct vfio_pci_region *region = vdev->region + regnum;
>  
> - if (region && region->ops && region->ops->mmap &&
> + if (region->ops && region->ops->mmap &&
>   (region->flags & VFIO_REGION_INFO_FLAG_MMAP))
>   return region->ops->mmap(vdev, region, vma);
>   return -EINVAL;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

2021-03-22 Thread David Gibson
le (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
> + newsize *= 2;

As noted earlier, doing this without an explicit cap on the new hpt
size (of the existing size) this makes me nervous.  Less so, but doing
the calculations on memory size, rather than explictly on HPT size /
HPT order also seems kinda clunky.

> + pr_warn("Hash collision while resizing HPT\n");
> + }
> +}
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  static void __init hash_init_partition_table(phys_addr_t hash_table,
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index f1cd8af0f67f..e01681e22e00 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -199,6 +199,18 @@ void memory_batch_expand_prepare(unsigned long newsize)
>   if (!radix_enabled())
>   hash_memory_batch_expand_prepare(newsize);
>  }
> +
> +void memory_batch_shrink_begin(void)
> +{
> + if (!radix_enabled())
> + hash_memory_batch_shrink_begin();
> +}
> +
> +void memory_batch_shrink_end(void)
> +{
> + if (!radix_enabled())
> + hash_memory_batch_shrink_end();
> +}

Again, these wrappers don't seem particularly useful to me.

>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  void __init mmu_partition_table_init(void)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 353c71249214..9182fb5b5c01 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -425,6 +425,8 @@ static int dlpar_memory_remove_by_count(u32 
> lmbs_to_remove)
>   return -EINVAL;
>   }
>  
> + memory_batch_shrink_begin();
> +
>   for_each_drmem_lmb(lmb) {
>   rc = dlpar_remove_lmb(lmb);
>   if (rc)
> @@ -470,6 +472,8 @@ static int dlpar_memory_remove_by_count(u32 
> lmbs_to_remove)
>   rc = 0;
>   }
>  
> + memory_batch_shrink_end();
> +
>   return rc;
>  }
>  
> @@ -481,6 +485,8 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
>  
>   pr_debug("Attempting to hot-remove LMB, drc index %x\n", drc_index);
>  
> + memory_batch_shrink_begin();
> +
>   lmb_found = 0;
>   for_each_drmem_lmb(lmb) {
>   if (lmb->drc_index == drc_index) {
> @@ -502,6 +508,8 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
>   else
>   pr_debug("Memory at %llx was hot-removed\n", lmb->base_addr);
>  
> + memory_batch_shrink_end();

remove_by_index only removes a single LMB, so there's no real point to
batching here.

>   return rc;
>  }
>  
> @@ -532,6 +540,8 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, 
> u32 drc_index)
>   if (lmbs_available < lmbs_to_remove)
>   return -EINVAL;
>  
> + memory_batch_shrink_begin();
> +
>   for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>   if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>   continue;
> @@ -572,6 +582,8 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, 
> u32 drc_index)
>   }
>   }
>  
> + memory_batch_shrink_end();
> +
>   return rc;
>  }
>  
> @@ -700,6 +712,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>   if (lmbs_added != lmbs_to_add) {
>   pr_err("Memory hot-add failed, removing any added LMBs\n");
>  
> + memory_batch_shrink_begin();


The effect of these on the memory grow path is far from clear.

>   for_each_drmem_lmb(lmb) {
>   if (!drmem_lmb_reserved(lmb))
>   continue;
> @@ -713,6 +726,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>  
>   drmem_remove_lmb_reservation(lmb);
>   }
> + memory_batch_shrink_end();
>   rc = -EINVAL;
>   } else {
>   for_each_drmem_lmb(lmb) {
> @@ -814,6 +828,7 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 
> drc_index)
>   if (rc) {
>   pr_err("Memory indexed-count-add failed, removing any added 
> LMBs\n");
>  
> + memory_batch_shrink_begin();
>   for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>   if (!drmem_lmb_reserved(lmb))
>   continue;
> @@ -827,6 +842,7 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 
> drc_index)
>  
>   drmem_remove_lmb_reservation(lmb);
>   }
> + memory_batch_shrink_end();
>   rc = -EINVAL;
>   } else {
>   for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug

2021-03-22 Thread David Gibson
tplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -671,6 +671,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
>   if (lmbs_available < lmbs_to_add)
>   return -EINVAL;
>  
> + memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * 
> drmem_lmb_size());
> +
>   for_each_drmem_lmb(lmb) {
>   if (lmb->flags & DRCONF_MEM_ASSIGNED)
>   continue;
> @@ -734,6 +736,8 @@ static int dlpar_memory_add_by_index(u32 drc_index)
>  
>   pr_info("Attempting to hot-add LMB, drc index %x\n", drc_index);
>  
> + memory_batch_expand_prepare(memblock_phys_mem_size() +
> +  drmem_info->n_lmbs * drmem_lmb_size());

This doesn't look right.  memory_add_by_index() is adding a *single*
LMB, I think using drmem_info->n_lmbs here means you're counting this
as adding again as much memory as you already have hotplugged.

>   lmb_found = 0;
>   for_each_drmem_lmb(lmb) {
>   if (lmb->drc_index == drc_index) {
> @@ -788,6 +792,8 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 
> drc_index)
>   if (lmbs_available < lmbs_to_add)
>   return -EINVAL;
>  
> + memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * 
> drmem_lmb_size());
> +
>   for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
>   if (lmb->flags & DRCONF_MEM_ASSIGNED)
>   continue;

I don't see memory_batch_expand_prepare() suppressing any existing HPT
resizes.  Won't this just resize to the right size for the full add,
then resize several times again as we perform the add?  Or.. I guess
that will be suppressed by patch 1/3.  That's seems kinda fragile,
though.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug

2021-03-22 Thread David Gibson
On Fri, Mar 12, 2021 at 04:29:39AM -0300, Leonardo Bras wrote:
> Because hypervisors may need to create HPTs without knowing the guest
> page size, the smallest used page-size (4k) may be chosen, resulting in
> a HPT that is possibly bigger than needed.
> 
> On a guest with bigger page-sizes, the amount of entries for HTP may be
> too high, causing the guest to ask for a HPT resize-down on the first
> hotplug.
> 
> This becomes a problem when HPT resize-down fails, and causes the
> HPT resize to be performed on every LMB added, until HPT size is
> compatible to guest memory size, causing a major slowdown.
> 
> So, avoiding HPT resizing-down on hot-add significantly improves memory
> hotplug times.
> 
> As an example, hotplugging 256GB on a 129GB guest took 710s without this
> patch, and 21s after applied.
> 
> Signed-off-by: Leonardo Bras 

I don't love this approach.  Adding the extra flag at this level seems
a bit inelegant, and it means we're passing up an easy opportunity to
reduce our resource footprint on the host.

But... maybe we'll have to do it.  I'd like to see if we can get
things to work well enough with just the "batching" to avoid multiple
resize attempts first.

> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 36 ---
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 73b06adb6eeb..cfb3ec164f56 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -794,7 +794,7 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -static int resize_hpt_for_hotplug(unsigned long new_mem_size)
> +static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  {
>   unsigned target_hpt_shift;
>  
> @@ -803,19 +803,25 @@ static int resize_hpt_for_hotplug(unsigned long 
> new_mem_size)
>  
>   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> - /*
> -  * To avoid lots of HPT resizes if memory size is fluctuating
> -  * across a boundary, we deliberately have some hysterisis
> -  * here: we immediately increase the HPT size if the target
> -  * shift exceeds the current shift, but we won't attempt to
> -  * reduce unless the target shift is at least 2 below the
> -  * current shift
> -  */
> - if (target_hpt_shift > ppc64_pft_size ||
> - target_hpt_shift < ppc64_pft_size - 1)
> - return mmu_hash_ops.resize_hpt(target_hpt_shift);
> + if (shrinking) {
>  
> - return 0;
> + /*
> +  * To avoid lots of HPT resizes if memory size is fluctuating
> +  * across a boundary, we deliberately have some hysterisis
> +  * here: we immediately increase the HPT size if the target
> +  * shift exceeds the current shift, but we won't attempt to
> +  * reduce unless the target shift is at least 2 below the
> +  * current shift
> +  */
> +
> + if (target_hpt_shift >= ppc64_pft_size - 1)
> + return 0;
> +
> + } else if (target_hpt_shift <= ppc64_pft_size) {
> + return 0;
> + }
> +
> + return mmu_hash_ops.resize_hpt(target_hpt_shift);
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end,
> @@ -828,7 +834,7 @@ int hash__create_section_mapping(unsigned long start, 
> unsigned long end,
>   return -1;
>   }
>  
> - resize_hpt_for_hotplug(memblock_phys_mem_size());
> + resize_hpt_for_hotplug(memblock_phys_mem_size(), false);
>  
>   rc = htab_bolt_mapping(start, end, __pa(start),
>  pgprot_val(prot), mmu_linear_psize,
> @@ -847,7 +853,7 @@ int hash__remove_section_mapping(unsigned long start, 
> unsigned long end)
>   int rc = htab_remove_mapping(start, end, mmu_linear_psize,
>        mmu_kernel_ssize);
>  
> - if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> + if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC)
>   pr_warn("Hash collision while resizing HPT\n");
>  
>   return rc;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V4 0/3] scripts: dtc: Build fdtoverlay

2021-02-21 Thread David Gibson
On Thu, Feb 04, 2021 at 08:25:23AM -0600, Rob Herring wrote:
> On Sun, Jan 31, 2021 at 10:39 PM David Gibson
>  wrote:
> >
> > On Mon, Jan 25, 2021 at 09:42:21PM -0600, Frank Rowand wrote:
> > > Hi David,
> > >
> > > On 1/22/21 12:34 AM, David Gibson wrote:
> > > > On Wed, Jan 20, 2021 at 10:47:40AM +0530, Viresh Kumar wrote:
> > > >> +David.
> > > >>
> > > >> On 19-01-21, 11:12, Frank Rowand wrote:
> > > >>> On 1/12/21 2:28 AM, Viresh Kumar wrote:
> > > >>>> We will start building overlays for platforms soon in the kernel and
> > > >>>> would need fdtoverlay tool going forward. Lets start fetching and
> > > >>>> building it.
> > > >>>>
> > > >>>> While at it, also remove fdtdump.c file, which isn't used by the 
> > > >>>> kernel.
> > > >>>>
> > > >>>> V4:
> > > >>>> - Don't fetch and build fdtdump.c
> > > >>>> - Remove fdtdump.c
> > > >>>>
> > > >>>> Viresh Kumar (3):
> > > >>>>   scripts: dtc: Add fdtoverlay.c to DTC_SOURCE
> > > >>>>   scripts: dtc: Build fdtoverlay tool
> > > >>>>   scripts: dtc: Remove the unused fdtdump.c file
> > > >>>>
> > > >>>>  scripts/dtc/Makefile |   6 +-
> > > >>>>  scripts/dtc/fdtdump.c| 163 
> > > >>>> ---
> > > >>>>  scripts/dtc/update-dtc-source.sh |   6 +-
> > > >>>>  3 files changed, 8 insertions(+), 167 deletions(-)
> > > >>>>  delete mode 100644 scripts/dtc/fdtdump.c
> > > >>>>
> > > >>>
> > > >>> My first inclination was to accept fdtoverlay, as is, from the 
> > > >>> upstream
> > > >>> project.
> > > >>>
> > > >>> But my experiences debugging use of fdtoverlay against the existing
> > > >>> unittest overlay files has me very wary of accepting fdtoverlay in
> > > >>> it's current form.
> > > >>>
> > > >>> As an exmple, adding an overlay that fails to reply results in the
> > > >>> following build messages:
> > > >>>
> > > >>>linux--5.11-rc> make zImage
> > > >>>make[1]: Entering directory 
> > > >>> '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> > > >>>  GEN Makefile
> > > >>>  CALL
> > > >>> /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/checksyscalls.sh
> > > >>>  CALL
> > > >>> /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/atomic/check-atomics.sh
> > > >>>  CHK include/generated/compile.h
> > > >>>  FDTOVERLAY drivers/of/unittest-data/static_test.dtb
> > > >>>
> > > >>>Failed to apply 'drivers/of/unittest-data/overlay.dtb': 
> > > >>> FDT_ERR_NOTFOUND
> > > >>>make[4]: *** 
> > > >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/drivers/of/unittest-data/Makefile:96:
> > > >>>  drivers/of/unittest-data/static_test.dtb] Error 1
> > > >>>make[3]: *** 
> > > >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> > > >>>  drivers/of/unittest-data] Error 2
> > > >>>make[2]: *** 
> > > >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> > > >>>  drivers/of] Error 2
> > > >>>make[1]: *** 
> > > >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/Makefile:1805: 
> > > >>> drivers] Error 2
> > > >>>make[1]: Leaving directory 
> > > >>> '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> > > >>>make: *** [Makefile:185: __sub-make] Error 2
> > > >>>
> > > >>>
> > > >>> The specific error message (copied from above) is:
> > > >>>
> > > >>>Failed to apply 'drivers/of/unittest-data/overlay.dtb': 
> > > >>> FDT_ERR_NOTFOUND
> > > >>>
> > > >>> which is cryptic an

Re: [PATCH V7 4/6] kbuild: Add support to build overlays (%.dtbo)

2021-02-05 Thread David Gibson
On Fri, Feb 05, 2021 at 02:55:07PM +0530, Viresh Kumar wrote:
> On 05-02-21, 10:02, Geert Uytterhoeven wrote:
> > Hi Viresh,
> > 
> > Thanks for your patch
> > (which I only noticed because it appeared in dt-rh/for-next ;-)
> > 
> > On Fri, Jan 29, 2021 at 8:31 AM Viresh Kumar  
> > wrote:
> > > Add support for building DT overlays (%.dtbo). The overlay's source file
> > > will have the usual extension, i.e. .dts, though the blob will have
> > 
> > Why use .dts and not .dtso for overlays?
> > Because you originally (until v5) had a single rule for building .dtb
> > and .dtbo files?
> 
> I am fine with doing that as well if Rob and David agree to it. Rob
> did suggest that at one point but we didn't do much about it later on
> for some reason.
> 
> FWIW, this will also require a change in the DTC compiler.

Not really.  It would need a change to automatically recognize that
extension, but you can easily work around that by explicitly giving
the -I option to specify the input type.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V4 0/3] scripts: dtc: Build fdtoverlay

2021-01-31 Thread David Gibson
On Mon, Jan 25, 2021 at 09:42:21PM -0600, Frank Rowand wrote:
> Hi David,
> 
> On 1/22/21 12:34 AM, David Gibson wrote:
> > On Wed, Jan 20, 2021 at 10:47:40AM +0530, Viresh Kumar wrote:
> >> +David.
> >>
> >> On 19-01-21, 11:12, Frank Rowand wrote:
> >>> On 1/12/21 2:28 AM, Viresh Kumar wrote:
> >>>> We will start building overlays for platforms soon in the kernel and
> >>>> would need fdtoverlay tool going forward. Lets start fetching and
> >>>> building it.
> >>>>
> >>>> While at it, also remove fdtdump.c file, which isn't used by the kernel.
> >>>>
> >>>> V4:
> >>>> - Don't fetch and build fdtdump.c
> >>>> - Remove fdtdump.c
> >>>>
> >>>> Viresh Kumar (3):
> >>>>   scripts: dtc: Add fdtoverlay.c to DTC_SOURCE
> >>>>   scripts: dtc: Build fdtoverlay tool
> >>>>   scripts: dtc: Remove the unused fdtdump.c file
> >>>>
> >>>>  scripts/dtc/Makefile |   6 +-
> >>>>  scripts/dtc/fdtdump.c| 163 ---
> >>>>  scripts/dtc/update-dtc-source.sh |   6 +-
> >>>>  3 files changed, 8 insertions(+), 167 deletions(-)
> >>>>  delete mode 100644 scripts/dtc/fdtdump.c
> >>>>
> >>>
> >>> My first inclination was to accept fdtoverlay, as is, from the upstream
> >>> project.
> >>>
> >>> But my experiences debugging use of fdtoverlay against the existing
> >>> unittest overlay files has me very wary of accepting fdtoverlay in
> >>> it's current form.
> >>>
> >>> As an exmple, adding an overlay that fails to reply results in the
> >>> following build messages:
> >>>
> >>>linux--5.11-rc> make zImage
> >>>make[1]: Entering directory 
> >>> '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> >>>  GEN Makefile
> >>>  CALL
> >>> /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/checksyscalls.sh
> >>>  CALL
> >>> /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/atomic/check-atomics.sh
> >>>  CHK include/generated/compile.h
> >>>  FDTOVERLAY drivers/of/unittest-data/static_test.dtb
> >>>
> >>>Failed to apply 'drivers/of/unittest-data/overlay.dtb': 
> >>> FDT_ERR_NOTFOUND
> >>>make[4]: *** 
> >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/drivers/of/unittest-data/Makefile:96:
> >>>  drivers/of/unittest-data/static_test.dtb] Error 1
> >>>make[3]: *** 
> >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> >>>  drivers/of/unittest-data] Error 2
> >>>make[2]: *** 
> >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> >>>  drivers/of] Error 2
> >>>make[1]: *** 
> >>> [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/Makefile:1805: 
> >>> drivers] Error 2
> >>>make[1]: Leaving directory 
> >>> '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> >>>make: *** [Makefile:185: __sub-make] Error 2
> >>>
> >>>
> >>> The specific error message (copied from above) is:
> >>>
> >>>Failed to apply 'drivers/of/unittest-data/overlay.dtb': 
> >>> FDT_ERR_NOTFOUND
> >>>
> >>> which is cryptic and does not even point to the location in the overlay 
> >>> that
> >>> is problematic.  If you look at the source of fdtoverlay / libfdt, you 
> >>> will
> >>> find that FDT_ERR_NOTFOUND may be generated in one of many places.
> >>>
> >>> I do _not_ want to do a full review of fdtoverlay, but I think that it is
> >>> reasonable to request enhancing fdtoverlay in the parent project to 
> >>> generate
> >>> usable error messages before enabling fdtoverlay in the Linux kernel tree.
> > 
> 
> > That's... actually much harder than it sounds.  fdtoverlay is
> > basically a trivial wrapper around the fdt_overlay_apply() function in
> > libfdt.  Matching the conventions of the rest of the library, really
> > it's only way to report errors is a single error code.
> > 
> > Returning richer errors is not an easy problem in a C library,
>

Re: [PATCH V6 5/6] of: unittest: Create overlay_common.dtsi and testcases_common.dtsi

2021-01-22 Thread David Gibson
On Fri, Jan 22, 2021 at 04:20:35PM +0530, Viresh Kumar wrote:
> In order to build-test the same unit-test files using fdtoverlay tool,
> move the device nodes from the existing overlay_base.dts and
> testcases_common.dts files to .dtsi files. The .dts files now include
> the new .dtsi files, resulting in exactly the same behavior as earlier.
> 
> The .dtsi files can now be reused for compile time tests using
> fdtoverlay (will be done in a later patch).
> 
> This is required because the base files passed to fdtoverlay tool
> shouldn't be overlays themselves (i.e. shouldn't have the /plugin/;
> tag).
> 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/of/unittest-data/overlay_base.dts | 90 +-
>  drivers/of/unittest-data/overlay_common.dtsi  | 91 +++
>  drivers/of/unittest-data/testcases.dts| 17 +---
>  .../of/unittest-data/testcases_common.dtsi| 18 
>  4 files changed, 111 insertions(+), 105 deletions(-)
>  create mode 100644 drivers/of/unittest-data/overlay_common.dtsi
>  create mode 100644 drivers/of/unittest-data/testcases_common.dtsi
> 
> diff --git a/drivers/of/unittest-data/overlay_base.dts 
> b/drivers/of/unittest-data/overlay_base.dts
> index 99ab9d12d00b..ab9014589c5d 100644
> --- a/drivers/of/unittest-data/overlay_base.dts
> +++ b/drivers/of/unittest-data/overlay_base.dts
> @@ -2,92 +2,4 @@
>  /dts-v1/;
>  /plugin/;

This still makes no sense to me.  Is this data intended as a base
tree, or as an overlay?  If it's an overlay, what are the constraints
on the base tree it's supposed to apply to.

This patch is treating it as both in different places, but that's such
a bizarre usecase it needs detailed justification.  It really looks
like the unittest stuff is doing some very bogus stuff that should be
fixed first, before trying to do this on top.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V6 2/6] scripts: dtc: Build fdtoverlay tool

2021-01-22 Thread David Gibson
On Fri, Jan 22, 2021 at 04:20:32PM +0530, Viresh Kumar wrote:
> We will start building overlays for platforms soon in the kernel and
> would need fdtoverlay going forward. Lets start building it.
> 
> The fdtoverlay program applies (or merges) one or more overlay dtb

Saying "merges" here is probably misleading, since as I mentioned
elsewhere fdtoverlay can *not* merge overlays, only apply them to a
base tree.

> blobs to a base dtb blob. The kernel build system would later use
> fdtoverlay to generate the overlaid blobs based on platform specific
> configurations.
> 
> Signed-off-by: Viresh Kumar 
> ---
>  scripts/dtc/Makefile | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/dtc/Makefile b/scripts/dtc/Makefile
> index 4852bf44e913..5f19386a49eb 100644
> --- a/scripts/dtc/Makefile
> +++ b/scripts/dtc/Makefile
> @@ -1,13 +1,17 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # scripts/dtc makefile
>  
> -hostprogs-always-$(CONFIG_DTC)   += dtc
> +hostprogs-always-$(CONFIG_DTC)   += dtc fdtoverlay
>  hostprogs-always-$(CHECK_DT_BINDING) += dtc
>  
>  dtc-objs := dtc.o flattree.o fstree.o data.o livetree.o treesource.o \
>  srcpos.o checks.o util.o
>  dtc-objs += dtc-lexer.lex.o dtc-parser.tab.o
>  
> +libfdt-objs  := fdt.o fdt_ro.o fdt_wip.o fdt_sw.o fdt_rw.o fdt_strerror.o 
> fdt_empty_tree.o fdt_addresses.o fdt_overlay.o
> +libfdt   = $(addprefix libfdt/,$(libfdt-objs))
> +fdtoverlay-objs  := $(libfdt) fdtoverlay.o util.o
> +
>  # Source files need to get at the userspace version of libfdt_env.h to 
> compile
>  HOST_EXTRACFLAGS += -I $(srctree)/$(src)/libfdt
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V4 0/3] scripts: dtc: Build fdtoverlay

2021-01-21 Thread David Gibson
On Wed, Jan 20, 2021 at 10:47:40AM +0530, Viresh Kumar wrote:
> +David.
> 
> On 19-01-21, 11:12, Frank Rowand wrote:
> > On 1/12/21 2:28 AM, Viresh Kumar wrote:
> > > We will start building overlays for platforms soon in the kernel and
> > > would need fdtoverlay tool going forward. Lets start fetching and
> > > building it.
> > > 
> > > While at it, also remove fdtdump.c file, which isn't used by the kernel.
> > > 
> > > V4:
> > > - Don't fetch and build fdtdump.c
> > > - Remove fdtdump.c
> > > 
> > > Viresh Kumar (3):
> > >   scripts: dtc: Add fdtoverlay.c to DTC_SOURCE
> > >   scripts: dtc: Build fdtoverlay tool
> > >   scripts: dtc: Remove the unused fdtdump.c file
> > > 
> > >  scripts/dtc/Makefile |   6 +-
> > >  scripts/dtc/fdtdump.c| 163 ---
> > >  scripts/dtc/update-dtc-source.sh |   6 +-
> > >  3 files changed, 8 insertions(+), 167 deletions(-)
> > >  delete mode 100644 scripts/dtc/fdtdump.c
> > > 
> > 
> > My first inclination was to accept fdtoverlay, as is, from the upstream
> > project.
> > 
> > But my experiences debugging use of fdtoverlay against the existing
> > unittest overlay files has me very wary of accepting fdtoverlay in
> > it's current form.
> > 
> > As an exmple, adding an overlay that fails to reply results in the
> > following build messages:
> > 
> >linux--5.11-rc> make zImage
> >make[1]: Entering directory 
> > '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> >  GEN Makefile
> >  CALL
> > /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/checksyscalls.sh
> >  CALL
> > /local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/atomic/check-atomics.sh
> >  CHK include/generated/compile.h
> >  FDTOVERLAY drivers/of/unittest-data/static_test.dtb
> > 
> >Failed to apply 'drivers/of/unittest-data/overlay.dtb': FDT_ERR_NOTFOUND
> >make[4]: *** 
> > [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/drivers/of/unittest-data/Makefile:96:
> >  drivers/of/unittest-data/static_test.dtb] Error 1
> >make[3]: *** 
> > [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> >  drivers/of/unittest-data] Error 2
> >make[2]: *** 
> > [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/scripts/Makefile.build:496:
> >  drivers/of] Error 2
> >make[1]: *** 
> > [/local/frowand_nobackup/src/git_linus/linux--5.11-rc/Makefile:1805: 
> > drivers] Error 2
> >make[1]: Leaving directory 
> > '/local/frowand_nobackup/src/git_linus/build/dragon_linus_5.11-rc'
> >make: *** [Makefile:185: __sub-make] Error 2
> > 
> > 
> > The specific error message (copied from above) is:
> > 
> >Failed to apply 'drivers/of/unittest-data/overlay.dtb': FDT_ERR_NOTFOUND
> > 
> > which is cryptic and does not even point to the location in the overlay that
> > is problematic.  If you look at the source of fdtoverlay / libfdt, you will
> > find that FDT_ERR_NOTFOUND may be generated in one of many places.
> > 
> > I do _not_ want to do a full review of fdtoverlay, but I think that it is
> > reasonable to request enhancing fdtoverlay in the parent project to generate
> > usable error messages before enabling fdtoverlay in the Linux kernel tree.

That's... actually much harder than it sounds.  fdtoverlay is
basically a trivial wrapper around the fdt_overlay_apply() function in
libfdt.  Matching the conventions of the rest of the library, really
it's only way to report errors is a single error code.

Returning richer errors is not an easy problem in a C library,
especially one designed to be usable in embedded systems, without an
allocator or much else available.

Of course it would be possible to write a friendly command line tool
specifically for applying overlays, which could give better errors.
fdtoverlay as it stands isn't really that - it was pretty much written
just to invoke fdt_overlay_apply() in testcases.

> > fdtoverlay in it's current form adds a potential maintenance burden to me
> > (as the overlay maintainer).  I now have the experience of how difficult it
> > was to debug the use of fdtoverlay in the context of the proposed patch to
> > use it with the devicetree unittest overlay .dtb files.
> > 
> > -Frank
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-21 Thread David Gibson
On Fri, Jan 22, 2021 at 08:40:49AM +0530, Viresh Kumar wrote:
> On 22-01-21, 10:39, David Gibson wrote:
> > No, it definitely will not work in general.  It might kinda work in a
> > few trivial cases, but it absolutely will not do the neccessary
> > handling in some cases.
> > 
> > > I
> > > did inspect the output dtb (made by merging two overlays) using
> > > fdtdump and it looked okay.
> > 
> > Ok.. but if you're using these bizarre messed up "dtbs" that this test
> > code seems to be, I don't really trust that tells you much.
> 
> I only looked if the changes from the second overlay were present in
> the merge and they were. And so I assumed that it must have worked.
> 
> What about checking the base dtb for /plugin/; in fdtoverlay and fail
> the whole thing in case it is present ? I think it is possible for
> people to get confused otherwise, like I did.

/plugin/ doesn't exist in the dtb, only in the dts.  From the dtb
encoding point of view, there's no difference between a dtb and a
dtbo, a dtbo is just a dtb that follows some conventions for its
content.

If we were doing this from scratch, it would be better for dtbos to
have a different magic number from regular dtbs.  I think I actually
suggested that sometime in the past, but by the time that came up,
dtbos were already in pretty widespread use with the existing format.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-21 Thread David Gibson
On Thu, Jan 21, 2021 at 12:27:28PM +0530, Viresh Kumar wrote:
> On 21-01-21, 17:34, David Gibson wrote:
> > No, this is the wrong way around.  The expected operation here is that
> > you apply overlay (1) to the base tree, giving you, say, output1.dtb.
> > output1.dtb is (effectively) a base tree itself, to which you can then
> > apply overlay-(2).
> 
> Thanks for the confirmation about this.
> 
> > Merging overlays is
> > something that could make sense, but fdtoverlay will not do it at
> > present.
> 
> FWIW, I think it works fine right now even if it not intentional.

No, it definitely will not work in general.  It might kinda work in a
few trivial cases, but it absolutely will not do the neccessary
handling in some cases.

> I
> did inspect the output dtb (made by merging two overlays) using
> fdtdump and it looked okay.

Ok.. but if you're using these bizarre messed up "dtbs" that this test
code seems to be, I don't really trust that tells you much.

> But yeah, I understand that we shouldn't
> do it.


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-21 Thread David Gibson
On Thu, Jan 21, 2021 at 01:04:46AM -0600, Frank Rowand wrote:
> On 1/20/21 11:58 PM, Viresh Kumar wrote:
> > On 20-01-21, 23:55, Frank Rowand wrote:
> >> yes, but using the modified versions ("/plugin/;" removed) of
> >> testcases.dtb and overlay_base.dtb.
> > 
> > Okay, that would work fine I guess. I will try to implement this in
> > the new version.
> > 
> >> I apologize in advance if I get confused in these threads or cause 
> >> confusion.
> >> I find myself going in circles and losing track of how things fit together 
> >> as
> >> I move through the various pieces of unittest.
> > 
> > Me too :)
> > 
> > Today is the first time where we have some overlap in our work hours
> > (probably you working late :)), and we are able to get this sorted out
> > quickly enough.
> 
> Working quite late.  I swear I stopped working 3 hours ago and that was
> already late.
> 
> I reacted quite negatively to the attempt to restructure the unittest
> .dts file in the original patch.  Now after walking around the problem
> space a bit, and reviewing the ugly things that unittest.c does, and
> coming up with the approach of sed to copy and modify the two base
> .dts files, I think I finally have my head wrapped around an easier
> and cleaner approach than sed.
> 
> I'll describe it for just one of the two base files, but the same
> concept would apply to the other.  Don't take my file names as
> good suggestions, I am avoiding using the brain power to come up
> with good names at the moment.
> 
> 1) rename overlay_base.dts to overlay_base.dtsi
> 
> 2) remove "/dtgs-v1/" and "/plugin/:" from overlay_base.dtsi
> 
> 3) create a new overlay_base.dts:
>// SPDX-License-Identifier: GPL-2.0
>/dts-v1/;
>/plugin/;
>#include overlay_base_dtsi

"overlay_base" is a terrible name - it's not clear if it's supposed to
be a base tree or an overlay.  I'm still not seeing any reasonable
use for the plugin version either, fwiw.

> 
> 4) create a new file overlay_base_static.dts:
>// SPDX-License-Identifier: GPL-2.0
>    /dts-v1/;
>#include overlay_base_dtsi
> 
> 5) then use overlay_base_static.dtb as the base blob for ftdoverlay
>applying overlay.dtb
> 
> Untested, hand typed, hopefully no typos.
> 
> -Frank
> 
> > 
> > Thanks for your feedback Frank.
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-20 Thread David Gibson
On Wed, Jan 20, 2021 at 11:55:08PM -0600, Frank Rowand wrote:
> On 1/20/21 11:43 PM, Viresh Kumar wrote:
> > Hi Frank,
> > 
> > On 20-01-21, 23:34, Frank Rowand wrote:
> >> It should be possible to apply this same concept to copying 
> >> overlay_base.dts
> >> to overlay_base_base.dts, removing the "/plugin/;" from 
> >> overlay_base_base.dts
> >> and using an additional rule to use fdtoverlay to apply overlay.dtb on top
> >> of overlay_base_base.dtb.
> > 
> > Are you suggesting to then merge this with testcases.dtb to get
> > static_test.dtb
> 
> no
> 
> > or keep two output files (static_test.dtb from
> > testcases.dtb + overlays and static_test2.dtb from overlay_base.dtb
> > and overlay.dtb) ?
> 
> yes, but using the modified versions ("/plugin/;" removed) of
> testcases.dtb and overlay_base.dtb.

I really don't understand why you want /plugin/ in *any* version of
testcases.dtb.
> > 
> > Asking because as I mentioned earlier, overlay_base.dtb doesn't have
> > __overlay__ property for its nodes and we can't apply that to
> > testcases.dtb using fdtoverlay.
> 
> Correct.
> 
> I apologize in advance if I get confused in these threads or cause confusion.
> I find myself going in circles and losing track of how things fit together as
> I move through the various pieces of unittest.
> 
> -Frank
> 
> > 
> >> Yes, overlay_base_base is a terrible name.  Just used to illustrate the 
> >> point.
> >>
> >> I tried this by hand and am failing miserably.  But I am not using the 
> >> proper
> >> environment (just a quick hack to see if the method might work).  So I 
> >> would
> >> have to set things up properly to really test this.
> >>
> >> If this does work, it would remove my objections to you trying to transform
> >> the existing unittest .dts test data files (because you would not have to
> >> actually modify the existing .dts files).
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] of: unittest: Statically apply overlays using fdtoverlay

2021-01-20 Thread David Gibson
On Wed, Jan 20, 2021 at 11:00:17PM -0600, Frank Rowand wrote:
> 
> +David
> 
> so I don't have to repeat this in another thread
> 
> On 1/19/21 11:06 PM, Viresh Kumar wrote:
> > On 19-01-21, 09:44, Frank Rowand wrote:
> >> No.  overlay_base.dts is intentionally compiled into a base FDT, not
> >> an overlay.  Unittest intentionally unflattens this FDT in early boot,
> >> in association with unflattening the system FDT.  One key intent
> >> behind this is to use the same memory allocation method that is
> >> used for the system FDT.
> >>
> >> Do not try to convert overlay_base.dts into an overlay.
> > 
> > Okay, but why does it have /plugin/; specified in it then ?
> 
> OK, so I sortof lied about overlay_base.dts not being an overlay.  It is
> a Frankenstein monster or a Schrodinger's dts/dtb.  It is not a normal
> object.  Nobody who is not looking at how it is abused inside unittest.c
> should be trying to touch it or understand it.

In that case, it absolutely should not be used as your standard base
dtb.

Note that overlays in general rely on particular details of the base
dtb they apply to - they'll need certain symbols and expect certain
paths to be there.  So applying random overlays to a "standard" base
dtb sounds destined to failure anyway.

Also, whatever they hell you're doing with testcases.dts sounds like a
terrible idea to begin with.

> unittest.c first unflattens overlay_base.dtb during early boot.  Then later
> it does some phandle resolution using the overlay metadata from overlay_base.
> Then it removes the overlay metadata from the in kernel devicetree data
> structure.  It is a hack, it is ugly, but it enables some overlay unit
> tests.
> 
> Quit trying to change overlay_base.dts.
> 
> In my suggested changes to the base patch I put overlay_base.dtb in the
> list of overlays for fdtoverlay to apply (apply_static_overlay in the
> Makefile) because overlay_base.dts is compiled as an overlay into
> overlay_base.dtb and it can be applied on top of the base tree
> testcases.dtb.  This gives a little bit more testcase data for
> fdtoverlay from an existing dtb.
> 
> If you keep trying to change overlay_base.dts I will just tell you
> to remove overlay_base.dtb from apply_static_overlay, and then the
> test coverage will become smaller.  I do not see that as a good change.
> 
> If you want more extensive testing of fdtoverlay, then create your
> own specific test cases from scratch and submit patches for them
> to the kernel or to the dtc compiler project.
> 
> > 
> > And shouldn't we create two separate dtb-s now, static_test.dtb and
> > static_overlay_test.dtb ? As fdtoverlay will not be able to merge it with
> > testcase.dtb anyway.
> > 
> > Or maybe we can create another file static_overlay.dts (like testcases.dts)
> > which can include both testcases.dts and overlay_base.dts, and then we can
> > create static_test.dtb out of it ? That won't impact the runtime tests at 
> > all.
> > 
> 
> Stop trying to use all of the unittest .dts test data files.  It is convenient
> that so many of them can be used in their current form.  That is goodness
> and nice leveraging.  Just ignore the .dts test data files that are not
> easily consumed by fdtoverlay.
> 
> The email threads around the various versions of this patch series show how
> normal devicetree knowledgeable people look at the contents of some of the
> .dts test data files and think that they are incorrect.  That is because
> the way that unittest uses them is not normal.  Trying to modify one or two
> of the many unittest .dts test data files so that they are usable by both
> the static fdtoverlay and the run time unittest is not worth it.
> 
> -Frank
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 4/5] kbuild: Add support to build overlays (%.dtbo)

2021-01-20 Thread David Gibson
On Thu, Jan 21, 2021 at 09:43:00AM +0530, Viresh Kumar wrote:
> On 21-01-21, 11:49, David Gibson wrote:
> > If you're using overlays, you probably need the -@ flag, for both the
> > base file and the overlays, which AFAICT is not already the case.
> 
> I think the idea was to do that in the platform specific Makefiles,
> unless I have misunderstood that from earlier discussions. So a
> platform may want to do that per-file or just enable it for the entire
> platform.

Hm, ok.  Any platform that does anything with dtbo files is likely to
want it, though.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-20 Thread David Gibson
On Thu, Jan 21, 2021 at 11:20:40AM +0530, Viresh Kumar wrote:
> On 20-01-21, 23:45, Frank Rowand wrote:
> > I have only the most surface knowledge of fdtoverlay, mostly from
> > "fdtoverlay --help", but you can apply multiple overlays with a
> > single invocation of fdtoverlay.  My _assumption_ was that the
> > overlays would be applied in order, and after any given overlay
> > was applied, subsequent overlays could reference the previously
> > applied overlay.
> > 
> > Is my assumption incorrect?
> 
> I think yes, if everything is in order then it should work just fine.
> 
> I was only suggesting that fdtoverlay accepting the base overlay with
> /plugin/; may well be a requirement and so intentionally done.

No.  It's simply the result of the fact that a dtbo is still a dtb.
So, you can still apply an overlay to it.  However, a dtbo is a
weirdly structured dtb, so applying an overlay to it is very unlikely
to give you something useful.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-20 Thread David Gibson
On Thu, Jan 21, 2021 at 11:04:26AM +0530, Viresh Kumar wrote:
> On 20-01-21, 23:14, Frank Rowand wrote:
> > It is a convenient FDT to use because it provides the frame that the 
> > overlays
> > require to be applied.  It is fortunate that fdtoverlay does not reject the 
> > use
> > of an FDT with overlay metadata as the base blob.
> 
> > This is probably a good idea instead of depending on the leniency of 
> > fdtoverlay.
> 
> I believe fdtoverlay allows that intentionally, that would be required
> for the cases where we have a hierarchy of extension boards or
> overlays.

Um.. no.

> A platform can have a base dtb (with /plugin/;), then we can have an
> overlay (1) for an extension board (with /plugin/;) and then an
> overlay (2) for an extension board for the previous extension board.
> 
> In such a case overlay-(2) can't be applied directly to the base dtb
> as it may not find all the nodes it is trying to update. And so
> overlay-(2) needs to be applied to overlay-(1) and then the output of
> this can be applied to the base dtb.

No, this is the wrong way around.  The expected operation here is that
you apply overlay (1) to the base tree, giving you, say, output1.dtb.
output1.dtb is (effectively) a base tree itself, to which you can then
apply overlay-(2).

What you're talking about is "merging" overlays: combingin overlay (1)
and (2) into overlay-(X) which would have the same effect applied to
base.dtb as (1) and (2) applied in sequence.  Merging overlays is
something that could make sense, but fdtoverlay will not do it at
present.

> This is very similar to what I tried with the intermediate.dtb
> earlier.
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 3/5] scripts: dtc: Remove the unused fdtdump.c file

2021-01-20 Thread David Gibson
On Thu, Jan 21, 2021 at 09:47:57AM +0530, Viresh Kumar wrote:
> On 21-01-21, 11:44, David Gibson wrote:
> > On Wed, Jan 20, 2021 at 12:36:45PM +0530, Viresh Kumar wrote:
> > > This was copied from external DTC repository long back and isn't used
> > > anymore. Over that the dtc tool can be used to generate the dts source
> > > back from the dtb. Remove the unused fdtdump.c file.
> > > 
> > > Signed-off-by: Viresh Kumar 
> > 
> > Doesn't this make updating the kernel dtc from upstream needlessly
> > more difficult?
> 
> Hmm, I am not sure I understand the concern well. The kernel keeps a
> list of files[1] it needs to automatically copy (using a script) from
> the upstream dtc repo and fdtdump.c was never part of that. Keeping it
> there isn't going to make any difficulty I believe.

Hm, ok.  Seems a bit clunky compared to embedding the whole directory,
but whatever.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 4/5] kbuild: Add support to build overlays (%.dtbo)

2021-01-20 Thread David Gibson
On Wed, Jan 20, 2021 at 12:36:46PM +0530, Viresh Kumar wrote:
> Add support for building DT overlays (%.dtbo). The overlay's source file
> will have the usual extension, i.e. .dts, though the blob will have
> .dtbo extension to distinguish it from normal blobs.
> 
> Signed-off-by: Viresh Kumar 
> ---
>  .gitignore   | 3 +--
>  Makefile | 4 ++--
>  scripts/Makefile.dtbinst | 3 +++
>  scripts/Makefile.lib | 4 +++-
>  4 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/.gitignore b/.gitignore
> index d01cda8e1177..0458c36f3cb2 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -17,8 +17,7 @@
>  *.bz2
>  *.c.[012]*.*
>  *.dt.yaml
> -*.dtb
> -*.dtb.S
> +*.dtb*
>  *.dwo
>  *.elf
>  *.gcno
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d86..b84f5e0b46ab 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1334,7 +1334,7 @@ endif
>  
>  ifneq ($(dtstree),)
>  
> -%.dtb: include/config/kernel.release scripts_dtc
> +%.dtb %.dtbo: include/config/kernel.release scripts_dtc
>   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
>  
>  PHONY += dtbs dtbs_install dtbs_check
> @@ -1816,7 +1816,7 @@ clean: $(clean-dirs)
>   @find $(if $(KBUILD_EXTMOD), $(KBUILD_EXTMOD), .) $(RCS_FIND_IGNORE) \
>   \( -name '*.[aios]' -o -name '*.ko' -o -name '.*.cmd' \
>   -o -name '*.ko.*' \
> - -o -name '*.dtb' -o -name '*.dtb.S' -o -name '*.dt.yaml' \
> + -o -name '*.dtb' -o -name '*.dtbo' -o -name '*.dtb.S' -o -name 
> '*.dt.yaml' \
>   -o -name '*.dwo' -o -name '*.lst' \
>   -o -name '*.su' -o -name '*.mod' \
>   -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
> diff --git a/scripts/Makefile.dtbinst b/scripts/Makefile.dtbinst
> index 50d580d77ae9..ba01f5ba2517 100644
> --- a/scripts/Makefile.dtbinst
> +++ b/scripts/Makefile.dtbinst
> @@ -29,6 +29,9 @@ quiet_cmd_dtb_install = INSTALL $@
>  $(dst)/%.dtb: $(obj)/%.dtb
>   $(call cmd,dtb_install)
>  
> +$(dst)/%.dtbo: $(obj)/%.dtbo
> + $(call cmd,dtb_install)
> +
>  PHONY += $(subdirs)
>  $(subdirs):
>   $(Q)$(MAKE) $(dtbinst)=$@ dst=$(patsubst $(obj)/%,$(dst)/%,$@)
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..30bc0a8e0087 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -86,7 +86,9 @@ extra-$(CONFIG_OF_ALL_DTBS) += $(dtb-)
>  
>  ifneq ($(CHECK_DTBS),)
>  extra-y += $(patsubst %.dtb,%.dt.yaml, $(dtb-y))
> +extra-y += $(patsubst %.dtbo,%.dt.yaml, $(dtb-y))
>  extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtb,%.dt.yaml, $(dtb-))
> +extra-$(CONFIG_OF_ALL_DTBS) += $(patsubst %.dtbo,%.dt.yaml, $(dtb-))
>  endif
>  
>  # Add subdir path
> @@ -324,7 +326,7 @@ cmd_dtc = $(HOSTCC) -E $(dtc_cpp_flags) -x 
> assembler-with-cpp -o $(dtc-tmp) $< ;
>   -d $(depfile).dtc.tmp $(dtc-tmp) ; \
>   cat $(depfile).pre.tmp $(depfile).dtc.tmp > $(depfile)
>  
> -$(obj)/%.dtb: $(src)/%.dts $(DTC) FORCE
> +$(obj)/%.dtb $(obj)/%.dtbo: $(src)/%.dts $(DTC) FORCE
>   $(call if_changed_dep,dtc)

If you're using overlays, you probably need the -@ flag, for both the
base file and the overlays, which AFAICT is not already the case.

>  
>  DT_CHECKER ?= dt-validate

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 5/5] of: unittest: Statically apply overlays using fdtoverlay

2021-01-20 Thread David Gibson
On Wed, Jan 20, 2021 at 12:36:47PM +0530, Viresh Kumar wrote:
> Now that fdtoverlay is part of the kernel build, start using it to test
> the unitest overlays we have by applying them statically.
> 
> Some unittest overlays deliberately contain errors that unittest checks
> for. These overlays will cause fdtoverlay to fail, and are thus not
> included in the static_test.dtb.
> 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/of/unittest-data/Makefile | 50 +++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/drivers/of/unittest-data/Makefile 
> b/drivers/of/unittest-data/Makefile
> index 009f4045c8e4..ece7dfd5cafa 100644
> --- a/drivers/of/unittest-data/Makefile
> +++ b/drivers/of/unittest-data/Makefile
> @@ -38,3 +38,53 @@ DTC_FLAGS_testcases += -@
>  
>  # suppress warnings about intentional errors
>  DTC_FLAGS_testcases += -Wno-interrupts_property
> +
> +# Apply overlays statically with fdtoverlay.  This is a build time test that
> +# the overlays can be applied successfully by fdtoverlay.  This does not
> +# guarantee that the overlays can be applied successfully at run time by
> +# unittest, but it provides a bit of build time test coverage for those
> +# who do not execute unittest.
> +#
> +# The overlays are applied on top of testcases.dtb to create static_test.dtb
> +# If fdtoverlay detects an error than the kernel build will fail.
> +# static_test.dtb is not consumed by unittest.
> +#
> +# Some unittest overlays deliberately contain errors that unittest checks 
> for.
> +# These overlays will cause fdtoverlay to fail, and are thus not included
> +# in the static test:
> +#overlay.dtb \
> +#overlay_bad_add_dup_node.dtb \
> +#overlay_bad_add_dup_prop.dtb \
> +#overlay_bad_phandle.dtb \
> +#overlay_bad_symbol.dtb \
> +#overlay_base.dtb \
> +
> +apply_static_overlay := overlay_0.dtb \
> + overlay_1.dtb \
> + overlay_2.dtb \
> + overlay_3.dtb \
> + overlay_4.dtb \
> + overlay_5.dtb \
> + overlay_6.dtb \
> + overlay_7.dtb \
> + overlay_8.dtb \
> + overlay_9.dtb \
> + overlay_10.dtb \
> + overlay_11.dtb \
> + overlay_12.dtb \
> + overlay_13.dtb \
> + overlay_15.dtb \
> + overlay_gpio_01.dtb \
> + overlay_gpio_02a.dtb \
> + overlay_gpio_02b.dtb \
> + overlay_gpio_03.dtb \
> + overlay_gpio_04a.dtb \
> + overlay_gpio_04b.dtb
> +
> +quiet_cmd_fdtoverlay = FDTOVERLAY $@
> +  cmd_fdtoverlay = $(objtree)/scripts/dtc/fdtoverlay -o $@ -i $^
> +
> +$(obj)/static_test.dtb: $(obj)/testcases.dtb $(addprefix 
> $(obj)/,$(apply_static_overlay))
> + $(call if_changed,fdtoverlay)
> +
> +always-$(CONFIG_OF_OVERLAY) += static_test.dtb

The fact that testcases.dts includes /plugin/ still seems completely
wrong, if it's being used as the base tree.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V5 3/5] scripts: dtc: Remove the unused fdtdump.c file

2021-01-20 Thread David Gibson
> - p = PALIGN(p + strlen(s) + 1, 4);
> -
> - if (*s == '\0')
> - s = "/";
> -
> - printf("%*s%s {\n", depth * shift, "", s);
> -
> - depth++;
> - continue;
> - }
> -
> - if (tag == FDT_END_NODE) {
> - depth--;
> -
> - printf("%*s};\n", depth * shift, "");
> - continue;
> - }
> -
> - if (tag == FDT_NOP) {
> - printf("%*s// [NOP]\n", depth * shift, "");
> - continue;
> - }
> -
> - if (tag != FDT_PROP) {
> - fprintf(stderr, "%*s ** Unknown tag 0x%08x\n", depth * 
> shift, "", tag);
> - break;
> - }
> - sz = fdt32_to_cpu(GET_CELL(p));
> - s = p_strings + fdt32_to_cpu(GET_CELL(p));
> - if (version < 16 && sz >= 8)
> - p = PALIGN(p, 8);
> - t = p;
> -
> - p = PALIGN(p + sz, 4);
> -
> - printf("%*s%s", depth * shift, "", s);
> - print_data(t, sz);
> - printf(";\n");
> - }
> -}
> -
> -
> -int main(int argc, char *argv[])
> -{
> - char *buf;
> -
> - if (argc < 2) {
> - fprintf(stderr, "supply input filename\n");
> - return 5;
> - }
> -
> - buf = utilfdt_read(argv[1]);
> - if (buf)
> - dump_blob(buf);
> - else
> - return 10;
> -
> - return 0;
> -}

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] of: unittest: Statically apply overlays using fdtoverlay

2021-01-17 Thread David Gibson
> 
> -8<-
> 
> diff --git a/drivers/of/unittest-data/Makefile 
> b/drivers/of/unittest-data/Makefile
> index 9f3eb30b78f1..8cc23311b778 100644
> --- a/drivers/of/unittest-data/Makefile
> +++ b/drivers/of/unittest-data/Makefile
> @@ -41,7 +41,6 @@ DTC_FLAGS_testcases += -Wno-interrupts_property
>  
>  # Apply all overlays (except overlay_bad_* as they are not supposed to apply 
> and
>  # fail build) statically with fdtoverlay
> -intermediate-overlay   := overlay.dtb
>  master := overlay_0.dtb overlay_1.dtb overlay_2.dtb \
>overlay_3.dtb overlay_4.dtb overlay_5.dtb \
>overlay_6.dtb overlay_7.dtb overlay_8.dtb \
> @@ -50,15 +49,12 @@ master  := overlay_0.dtb 
> overlay_1.dtb overlay_2.dtb \
>overlay_gpio_01.dtb overlay_gpio_02a.dtb \
>overlay_gpio_02b.dtb overlay_gpio_03.dtb \
>overlay_gpio_04a.dtb overlay_gpio_04b.dtb \
> -  intermediate-overlay.dtb
> +  overlay_base.dtb overlay.dtb
>  
>  quiet_cmd_fdtoverlay = fdtoverlay $@
>cmd_fdtoverlay = $(objtree)/scripts/dtc/fdtoverlay -o $@ -i $^
>  
> -$(obj)/intermediate-overlay.dtb: $(obj)/overlay_base.dtb $(addprefix 
> $(obj)/,$(intermediate-overlay))
> -   $(call if_changed,fdtoverlay)
> -
>  $(obj)/master.dtb: $(obj)/testcases.dtb $(addprefix $(obj)/,$(master))
> $(call if_changed,fdtoverlay)
>  
> -always-$(CONFIG_OF_OVERLAY) += intermediate-overlay.dtb master.dtb
> +always-$(CONFIG_OF_OVERLAY) += master.dtb
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] dtc: Allow overlays to have .dtbo extension

2021-01-11 Thread David Gibson
On Wed, Jan 06, 2021 at 03:26:08PM +0530, Viresh Kumar wrote:
> Allow the overlays to have .dtbo extension instead of just .dtb. This
> allows them to be identified easily by tools as well as humans.
> 
> Allow the dtbo outform in dtc.c for the same.
> 
> Signed-off-by: Viresh Kumar 

Seems reasonable.  Applied.

> 
> ---
> Hello,
> 
> This was earlier posted for the Linux Kernel and here is the thread
> where Rob gave his feedback:
> 
> https://lore.kernel.org/lkml/CAL_Jsq+0dL=lho8r9my_webp_bq97uobnt596j3jovhwgni...@mail.gmail.com/
> ---
>  dtc.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/dtc.c b/dtc.c
> index bdb3f5945699..838c5df96c00 100644
> --- a/dtc.c
> +++ b/dtc.c
> @@ -122,6 +122,8 @@ static const char *guess_type_by_name(const char *fname, 
> const char *fallback)
>   return "dts";
>   if (!strcasecmp(s, ".yaml"))
>   return "yaml";
> + if (!strcasecmp(s, ".dtbo"))
> + return "dtb";
>   if (!strcasecmp(s, ".dtb"))
>   return "dtb";
>   return fallback;
> @@ -357,6 +359,8 @@ int main(int argc, char *argv[])
>  #endif
>   } else if (streq(outform, "dtb")) {
>   dt_to_blob(outf, dti, outversion);
> + } else if (streq(outform, "dtbo")) {
> + dt_to_blob(outf, dti, outversion);
>   } else if (streq(outform, "asm")) {
>   dt_to_asm(outf, dti, outversion);
>   } else if (streq(outform, "null")) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH] powerpc: kvm: Increase HDEC threshold to enter guest

2020-08-10 Thread David Gibson
Before entering a guest, we need to set the HDEC to pull us out again
when the guest's time is up.  This needs some care, though, because the
HDEC is edge triggered, which means that if it expires before entering the
guest, the interrupt will be lost, meaning we stay in the guest
indefinitely (in practice, until the the hard lockup detector pulls us out
with an NMI).

For the POWER9, independent threads mode specific path, we attempt to
prevent that, by testing time has already expired before setting the HDEC
in kvmhv_load_regs_and_go().  However, that doesn't account for the case
where the timer expires between that test and the actual guest entry.
Preliminary instrumentation suggests that can take as long as 1.5µs under
certain load conditions, and simply checking the HDEC value we're going to
load is positive isn't enough to guarantee that leeway.

That test here is sometimes masked by a test in kvmhv_p9_guest_entry(), its
caller.  That checks that the remaining time is at 1µs.  However as noted
above that doesn't appear to be sufficient in all circumstances even
from the point HDEC is set, let alone this earlier point.

Therefore, increase the threshold we check for in both locations to 4µs
(2048 timebase ticks).  This is a pretty crude approach, but it addresses
a real problem where guest load can trigger a host hard lockup.

We're hoping to refine this in future by gathering more data on exactly
how long these paths can take, and possibly by moving the check closer to
the actual guest entry point to reduce the variance.  Getting the details
for that might take some time however.

NOTE: For reasons I haven't yet tracked down yet, I haven't actually
managed to reproduce this on current upstream.  I have reproduced it on
RHEL kernels without obvious differences in this area.  I'm still trying
to determine what the cause of that difference is, but I think it's worth
applying this change as a precaution in the interim.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0f83f39a2bd2..65a92dd890cb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3435,7 +3435,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
unsigned long host_pidr = mfspr(SPRN_PID);
 
hdec = time_limit - mftb();
-   if (hdec < 0)
+   if (hdec < 2048)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
mtspr(SPRN_HDEC, hdec);
 
@@ -3564,7 +3564,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
 
dec = mfspr(SPRN_DEC);
tb = mftb();
-   if (dec < 512)
+   if (dec < 2048)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
local_paca->kvm_hstate.dec_expires = dec + tb;
if (local_paca->kvm_hstate.dec_expires < time_limit)
-- 
2.26.2



Re: [PATCH v4 0/2] dtc: Improve checks for i2c reg properties

2020-06-22 Thread David Gibson
On Mon, Jun 22, 2020 at 12:40:03PM +0930, Joel Stanley wrote:
> This is to fix a build warning in the Linux kernel caused by dtc
> incorrectly warning about I2C_OWN_SLAVE_ADDRESS.

Applied, thanks.

> 
> v4 adds a U to the defines
> v3 fixes the 10 bit size check
> v2 contains a second patch to check for 10 bit vs 7 bit addresses.
> 
> Joel Stanley (2):
>   checks: Remove warning for I2C_OWN_SLAVE_ADDRESS
>   checks: Improve i2c reg property checking
> 
>  checks.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes

2020-06-18 Thread David Gibson
The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
which will need the use of the internal command/response buffer.  But,
we're issuing this *before* we've waited to make sure that buffer is
allocated.

This can result in intermittent failures to probe if the hypervisor / TPM
implementation doesn't respond quickly enough.  I find it fails almost
every time with an 8 vcpu guest under KVM with software emulated TPM.

To fix it, just move the tpm2_get_cc_attrs_tlb() call after the
existing code to wait for initialization, which will ensure the buffer
is allocated.

Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
Signed-off-by: David Gibson 
---

Changes from v1:
 * Fixed a formatting error in the commit message
 * Added some more detail to the commit message
 
drivers/char/tpm/tpm_ibmvtpm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 09fe45246b8cc..994385bf37c0c 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
if (rc)
goto init_irq_cleanup;
 
-   if (!strcmp(id->compat, "IBM,vtpm20")) {
-   chip->flags |= TPM_CHIP_FLAG_TPM2;
-   rc = tpm2_get_cc_attrs_tbl(chip);
-   if (rc)
-   goto init_irq_cleanup;
-   }
-
if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
ibmvtpm->rtce_buf != NULL,
HZ)) {
@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
goto init_irq_cleanup;
}
 
+   if (!strcmp(id->compat, "IBM,vtpm20")) {
+   chip->flags |= TPM_CHIP_FLAG_TPM2;
+   rc = tpm2_get_cc_attrs_tbl(chip);
+   if (rc)
+   goto init_irq_cleanup;
+   }
+
return tpm_chip_register(chip);
 init_irq_cleanup:
do {
-- 
2.26.2



[PATCH] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes

2020-06-05 Thread David Gibson
The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
which will need the use of the internal command/response buffer.  But,
we're issuing this *before* we've waited to make sure that buffer is
allocated.

This can result in intermittent failures to probe if the hypervisor / TPM
implementation doesn't respond quickly enough.  I find it fails almost
every time with an 8 vcpu guest under KVM with software emulated TPM.

Fixes: 18b3670d79ae9 "tpm: ibmvtpm: Add support for TPM2"
Signed-off-by: David Gibson 
---
 drivers/char/tpm/tpm_ibmvtpm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 09fe45246b8c..994385bf37c0 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
if (rc)
goto init_irq_cleanup;
 
-   if (!strcmp(id->compat, "IBM,vtpm20")) {
-   chip->flags |= TPM_CHIP_FLAG_TPM2;
-   rc = tpm2_get_cc_attrs_tbl(chip);
-   if (rc)
-   goto init_irq_cleanup;
-   }
-
if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
ibmvtpm->rtce_buf != NULL,
HZ)) {
@@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
goto init_irq_cleanup;
}
 
+   if (!strcmp(id->compat, "IBM,vtpm20")) {
+   chip->flags |= TPM_CHIP_FLAG_TPM2;
+   rc = tpm2_get_cc_attrs_tbl(chip);
+   if (rc)
+   goto init_irq_cleanup;
+   }
+
return tpm_chip_register(chip);
 init_irq_cleanup:
do {
-- 
2.26.2



Re: [PATCH v3 2/2] checks: Improve i2c reg property checking

2020-06-04 Thread David Gibson
On Tue, Jun 02, 2020 at 11:28:05AM +0300, Andy Shevchenko wrote:
> On Tue, Jun 2, 2020 at 11:03 AM Serge Semin
>  wrote:
> > On Thu, May 28, 2020 at 06:26:50PM +0930, Joel Stanley wrote:
> 
> ...
> 
> > > +#define I2C_TEN_BIT_ADDRESS  (1 << 31)
> >
> > As Andy neatly pointed out here:
> > https://lore.kernel.org/lkml/20200527133656.gv1634...@smile.fi.intel.com/
> > (1 << 31) is UB.
> 
> Thanks, Serge. Yes, we have to use 1U in the definitions (for 31 is
> necessary, for the rest is for the sake of consistency).

Joel, I know it seems trivial, but I'm a bit flat out right now.  Can
you please resend with the 1U fix applied.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: SPDX conversion under scripts/dtc/ of Linux Kernel

2019-06-20 Thread David Gibson
On Wed, Jun 19, 2019 at 09:39:13AM -0600, Rob Herring wrote:
> On Wed, Jun 19, 2019 at 6:59 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Wed, Jun 19, 2019 at 07:23:19PM +0900, Masahiro Yamada wrote:
> > > Hi.
> > >
> > > In this development cycle of Linux kernel,
> > > lots of files were converted to use SPDX
> > > instead of the license boilerplate.
> > >
> > > However.
> > >
> > > Some files were imported from a different project,
> > > and are periodically synchronized with the upstream.
> > > Have we discussed what to do about this case?
> > >
> > >
> > > For example, scripts/dtc/ is the case.
> > >
> > > The files in scripts/dtc/ are synced with the upstream
> > > device tree compiler.
> > >
> > > Rob Herring periodically runs scripts/dtc/update-dtc-source.sh
> > > to import outcome from the upstream.
> > >
> > >
> > > The upstream DTC has not adopted SPDX yet.
> > >
> > > Some files in Linux (e.g. scripts/dtc/dtc.c)
> > > have been converted to SPDX.
> > >
> > > So, they are out of sync now.
> > >
> > > The license boilerplate will come back
> > > when Rob runs scripts/dtc/update-dtc-source.sh
> > > next time.
> 
> Already has. It just happened and is in next. The policy is everything
> is upstream first and any changes to dtc in the kernel are rejected.
> 
> > >
> > > What shall we do?
> > >
> > > [1] Convert upstream DTC to SPDX
> > >
> > > This will be a happy solution if it is acceptable in DTC.
> > > Since we cannot push the decision of the kernel to a different
> > > project, this is totally up to David Gibson.
> >
> > That's fine with me :)
> 
> I'll do the work if David is okay with it.

I have no objection.


> > > [2] Change scripts/dtc/update-dtc-source.sh to
> > > take care of the license block somehow
> >
> > That would also be good.
> >
> > > [3] Go back to license boilerplate, and keep the files
> > > synced with the upstream
> > > (and scripts/dtc/ should be excluded from the
> > >  SPDX conversion tool.)
> >
> > nothing is being excluded from the SPDX conversions, sorry.  The goal is
> > to do this for every file in the kernel tree.  Otherwise it's pointless.
> >
> > > Or, what else?
> >
> > Rob remembers to keep those first lines of the files intact when doing
> > the next sync?
> 
> Patches to the import script are welcome. The only thing I have to
> remember running the script is to add any new files. Otherwise, it's
> scripted so I don't have to remember anything.
> 
> Rob
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/mm: move warning from resize_hpt_for_hotplug()

2019-03-07 Thread David Gibson
On Thu, Mar 07, 2019 at 03:50:50PM +0100, Laurent Vivier wrote:
> resize_hpt_for_hotplug() reports a warning when it cannot
> resize the hash page table ("Unable to resize hash page
> table to target order") but in some cases it's not a problem
> and can make user thinks something has not worked properly.
> 
> This patch moves the warning to arch_remove_memory() to
> only report the problem when it is needed.
> 
> Signed-off-by: Laurent Vivier 

Looks ok, but apart from the H_PTEG_FULL case this does allow for some
error paths that previously warned to pass without error, which I
don't think is wise.

Specifically that will happen the prepare hcall returns H_PARAMETER or
H_RESOURCE.  Thse will result in -EINVAL or -EPERM from
pseries_lpar_resize_hpt() which previously would have tripped the
warning in resize_hpt_for_hotplug() but now will be silently ignored.

> ---
>  arch/powerpc/include/asm/sparsemem.h  |  4 ++--
>  arch/powerpc/mm/hash_utils_64.c   | 17 ++---
>  arch/powerpc/mm/mem.c |  3 ++-
>  arch/powerpc/platforms/pseries/lpar.c |  1 -
>  4 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/sparsemem.h 
> b/arch/powerpc/include/asm/sparsemem.h
> index 68da49320592..3192d454a733 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -17,9 +17,9 @@ extern int create_section_mapping(unsigned long start, 
> unsigned long end, int ni
>  extern int remove_section_mapping(unsigned long start, unsigned long end);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> -extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
> +extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
>  #else
> -static inline void resize_hpt_for_hotplug(unsigned long new_mem_size) { }
> +static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { 
> return 0; }
>  #endif
>  
>  #ifdef CONFIG_NUMA
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0cc7fbc3bd1c..40bb2a8326bb 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -755,12 +755,12 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -void resize_hpt_for_hotplug(unsigned long new_mem_size)
> +int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  {
>   unsigned target_hpt_shift;
>  
>   if (!mmu_hash_ops.resize_hpt)
> - return;
> + return 0;
>  
>   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> @@ -773,15 +773,10 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
>* current shift
>*/
>   if ((target_hpt_shift > ppc64_pft_size)
> - || (target_hpt_shift < (ppc64_pft_size - 1))) {
> - int rc;
> -
> - rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
> - if (rc && (rc != -ENODEV))
> - printk(KERN_WARNING
> -"Unable to resize hash page table to target 
> order %d: %d\n",
> -target_hpt_shift, rc);
> - }
> + || (target_hpt_shift < (ppc64_pft_size - 1)))
> + return mmu_hash_ops.resize_hpt(target_hpt_shift);
> +
> + return 0;
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end, int 
> nid)
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 33cc6f676fa6..0d40d970cf4a 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -169,7 +169,8 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 
> size,
>*/
>   vm_unmap_aliases();
>  
> - resize_hpt_for_hotplug(memblock_phys_mem_size());
> + if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> + pr_warn("Hash collision while resizing HPT\n");
>  
>   return ret;
>  }
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index f2a9f0adc2d3..b407034a80ba 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -918,7 +918,6 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>   if (rc != 0) {
>   switch (state.commit_rc) {
>   case H_PTEG_FULL:
> - pr_warn("Hash collision while resizing HPT\n");
>   return -ENOSPC;
>  
>   default:

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2] powerpc/mm: move a KERN_WARNING message to pr_debug()

2019-02-06 Thread David Gibson
On Tue, Feb 05, 2019 at 09:21:33PM +0100, Laurent Vivier wrote:
> resize_hpt_for_hotplug() reports a warning when it cannot
> increase the hash page table ("Unable to resize hash page
> table to target order") but this is not blocking and
> can make user thinks something has not worked properly.
> As we move the message to the debug area, report again the
> ENODEV error.
> 
> If the operation cannot be done the real error message
> will be reported by arch_add_memory() if create_section_mapping()
> fails.
> 
> Fixes: 7339390d772dd
>powerpc/pseries: Don't give a warning when HPT resizing isn't available
> Signed-off-by: Laurent Vivier 

Sorry, I'm pretty dubious about this.  It's true that in the case for
which this bug was filed this is a harmless situation which deserves a
pr_debug() at most.

But that's not necessarily true in all paths leading to this message.
It will also trip if we fail to reshrink the HPT after genuinely
hotunplugging a bunch of memory, in which case failing to release
expected resources does deserve a warning.

> ---
> 
> Notes:
> v2:
>  - use pr_debug instead of printk(KERN_DEBUG
>  - remove check for ENODEV
> 
>  arch/powerpc/mm/hash_utils_64.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0cc7fbc3bd1c..6a0cc4eb2c83 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -777,10 +777,9 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
>   int rc;
>  
>   rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
> - if (rc && (rc != -ENODEV))
> - printk(KERN_WARNING
> -"Unable to resize hash page table to target 
> order %d: %d\n",
> -target_hpt_shift, rc);
> + if (rc)
> + pr_debug("Unable to resize hash page table to target 
> order %d: %d\n",
> +  target_hpt_shift, rc);
>   }
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH V6 0/4] mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region

2019-01-10 Thread David Gibson
On Wed, Jan 09, 2019 at 02:11:25PM +0530, Aneesh Kumar K.V wrote:
> Andrew Morton  writes:
> 
> > On Tue,  8 Jan 2019 10:21:06 +0530 "Aneesh Kumar K.V" 
> >  wrote:
> >
> >> ppc64 use CMA area for the allocation of guest page table (hash page 
> >> table). We won't
> >> be able to start guest if we fail to allocate hash page table. We have 
> >> observed
> >> hash table allocation failure because we failed to migrate pages out of 
> >> CMA region
> >> because they were pinned. This happen when we are using VFIO. VFIO on 
> >> ppc64 pins
> >> the entire guest RAM. If the guest RAM pages get allocated out of CMA 
> >> region, we
> >> won't be able to migrate those pages. The pages are also pinned for the 
> >> lifetime of the
> >> guest.
> >> 
> >> Currently we support migration of non-compound pages. With THP and with 
> >> the addition of
> >>  hugetlb migration we can end up allocating compound pages from CMA 
> >> region. This
> >> patch series add support for migrating compound pages. The first path adds 
> >> the helper
> >> get_user_pages_cma_migrate() which pin the page making sure we migrate 
> >> them out of
> >> CMA region before incrementing the reference count. 
> >
> > Does this code do anything for architectures other than powerpc?  If
> > not, should we be adding the ifdefs to avoid burdening other
> > architectures with unused code?
> 
> Any architecture enabling CMA may need this. I will move most of this below
> CONFIG_CMA.

In theory it could affect any architecture using CMA.  I suspect it's
much less likely to bite in practice on architectures other than ppc.
IIUC the main use of CMA there is to allocate things like framebuffers
or other large contiguous blocks used for hardware devices.  That's
usually going to happen rarely and during boot up.  What makes ppc
different is that we need a substantial CMA allocation every time we
start a (POWER8) guest for the HPT.  It's the fact that running guests
on a system both means we need the CMA unfragment and (with vfio added
in) can cause CMA fragmentation which makes this particularly
problematic.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-08 Thread David Gibson
On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
> > 
> > > In a very cryptic way that requires manual parsing using non-public
> > > docs sadly but yes. From the look of it, it's a completion timeout.
> > > 
> > > Looks to me like we don't get a response to a config space access
> > > during the change of D state. I don't know if it's the write of the D3
> > > state itself or the read back though (it's probably detected on the
> > > read back or a subsequent read, but that doesn't tell me which specific
> > > one failed).
> > 
> > If it is just one card doing it (again, check you have latest
> > firmware) I wonder if it is a sketchy PCI-E electrical link that is
> > causing a long re-training cycle? Can you tell if the PCI-E link is
> > permanently gone or does it eventually return?
> 
> No, it's 100% reproducable on systems with that specific card model,
> not card instance, and maybe different systems/cards as well, I'll let
> David & Alexey comment further on that.

Well, it's 100% reproducable on a particular model of system
(garrison) with a particular model of card.  I've had some suggestions
that it fails with some other systems card card models, but nothing
confirmed - the one other system model I've been able to try, which
also had a newer card model didn't reproduce the problem.

> > Does the card work in Gen 3 when it starts? Is there any indication of
> > PCI-E link errors?
> 
> Nope.
> 
> > Everytime or sometimes?
> > 
> > POWER 8 firmware is good? If the link does eventually come back, is
> > the POWER8's D3 resumption timeout long enough?
> > 
> > If this doesn't lead to an obvious conclusion you'll probably need to
> > connect to IBM's Mellanox support team to get more information from
> > the card side.
> 
> We are IBM :-) So far, it seems to be that the card is doing something
> not quite right, but we don't know what. We might need to engage
> Mellanox themselves.

Possibly.  On the other hand, I've had it reported that this is a
software regression at least with downstream red hat kernels.  I
haven't yet been able to eliminate factors that might be confusing
that, or try to find a working version upstream.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-03 Thread David Gibson
On Thu, Dec 06, 2018 at 08:45:09AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote:
> > Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when
> > unbound from their regular driver and attached to vfio-pci in order to pass
> > them through to a guest.
> >
> > This goes away if the disable_idle_d3 option is used, so it looks like a
> > problem with the hardware handling D3 state.  To fix that more permanently,
> > use a device quirk to disable D3 state for these devices.
> >
> > We do this by renaming the existing quirk_no_ata_d3() more generally and
> > attaching it to the ConnectX-[45] devices (0x15b3:0x1013).
> >
> > Signed-off-by: David Gibson 
> > ---
> >  drivers/pci/quirks.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> >
> 
> Hi David,
> 
> Thank for your patch,
> 
> I would like to reproduce the calltrace before moving forward,
> but have trouble to reproduce the original issue.
> 
> I'm working with vfio-pci and CX-4/5 cards on daily basis,
> tried manually enter into D3 state now, and it worked for me.

Interesting.  I've investigated this further, though I don't have as
many new clues as I'd like.  The problem occurs reliably, at least on
one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
I don't yet know if it occurs with other machines, I'm having trouble
getting access to other machines with a suitable card.  I didn't
manage to reproduce it on a different POWER8 machine with a
ConnectX-5, but I don't know if it's the difference in machine or
difference in card revision that's important.

So possibilities that occur to me:
  * It's something specific about how the vfio-pci driver uses D3
state - have you tried rebinding your device to vfio-pci?
  * It's something specific about POWER, either the kernel or the PCI
bridge hardware
  * It's something specific about this particular type of machine

> Can you please post your full calltrace, and "lspci -s PCI_ID -vv"
> output?

[root@ibm-p8-garrison-01 ~]# lspci -vv -s 0008:01:00
0008:01:00.0 Infiniband controller: Mellanox Technologies MT27700 Family 
[ConnectX-4]
Subsystem: IBM Device 04f1
Physical Slot: Slot1
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Device Serial Number ba-da-ce-55-de-ad-ca-fe
Capabilities: [110 v1] Advanced Error Reporting
UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ 
ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog:    
Capabilities: [170 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [1c0 v1] #19
Kernel driver in use: vfio-pci
Kernel modules: mlx5_core

0008:01:00.1 Infiniband controller: Mellanox Technologies MT27700 Family 
[ConnectX-4]
Subsystem: IBM Device 04f1
Physical Slot: Slot1
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Device Serial Number ba-da-ce-55-de-ad-ca-fe
Capabilities: [110 v1] Advanced Error Reporting
UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF+ MalfTLP+ ECRC- Uns

Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread David Gibson
On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> Hi David,
> 
> I see you're still working on this, but if you do end up going this
> direction eventually, would you mind splitting this into two patches:
> 1) rename the quirk to make it more generic (but not changing any
> behavior), and 2) add the ConnectX devices to the quirk.  That way
> the ConnectX change is smaller and more easily
> understood/reverted/etc.

Sure.  Would it make sense to send (1) as an independent cleanup,
while I'm still working out exactly what (if anything) we need for
(2)?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-10 Thread David Gibson
On Thu, Dec 06, 2018 at 08:45:09AM +0200, Leon Romanovsky wrote:
> On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote:
> > Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when
> > unbound from their regular driver and attached to vfio-pci in order to pass
> > them through to a guest.
> >
> > This goes away if the disable_idle_d3 option is used, so it looks like a
> > problem with the hardware handling D3 state.  To fix that more permanently,
> > use a device quirk to disable D3 state for these devices.
> >
> > We do this by renaming the existing quirk_no_ata_d3() more generally and
> > attaching it to the ConnectX-[45] devices (0x15b3:0x1013).
> >
> > Signed-off-by: David Gibson 
> > ---
> >  drivers/pci/quirks.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> >
> 
> Hi David,
> 
> Thank for your patch,
> 
> I would like to reproduce the calltrace before moving forward,
> but have trouble to reproduce the original issue.
> 
> I'm working with vfio-pci and CX-4/5 cards on daily basis,
> tried manually enter into D3 state now, and it worked for me.
> 
> Can you please post your full calltrace, and "lspci -s PCI_ID -vv"
> output?

Sorry, I may have jumped the gun on this.  Using disable_idle_d3 seems
to do _something_ for these cards, but there are some other things
going wrong which are confusing the issue.  This is on POWER, which
might affect the situation.  I'll get back to you once I have some
more information.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4] drivers/vfio: Fix a redundant copy bug

2018-11-18 Thread David Gibson
On Mon, Oct 29, 2018 at 03:32:34PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 13:56:54 -0500
> Wenwen Wang  wrote:
> 
> > Hello,
> > 
> > Could you please apply this patch? Thanks!
> 
> I'd like to see testing and/or review from David or Alexey since I also
> don't have an environment for spapr/eeh.  We're already late into the
> v4.20 merge window so this is probably v4.21 material.  Thanks,

I didn't spot this for ages, since I wasn't CCed, I'm guessing the
same is true for Alexey.

TBH, I don't think it's worth bothering with this.  It's a tiny amount
of extra work on a rare debug path.  A couple of code lines simplicity
is worth more than a few bytes worth of redundant copy here.

Testing is.. difficult, since the EEH is already so broken it's hard
to know what to compare against.  Sam Bobroff is already looking
medium-long term at a bunch of EEH rework, so it's quite possible this
will be rewritten then anyway.

> 
> Alex
> 
> > On Wed, Oct 17, 2018 at 2:18 PM Wenwen Wang  wrote:
> > >
> > > In vfio_spapr_iommu_eeh_ioctl(), if the ioctl command is VFIO_EEH_PE_OP,
> > > the user-space buffer 'arg' is copied to the kernel object 'op' and the
> > > 'argsz' and 'flags' fields of 'op' are checked. If the check fails, an
> > > error code EINVAL is returned. Otherwise, 'op.op' is further checked
> > > through a switch statement to invoke related handlers. If 'op.op' is
> > > VFIO_EEH_PE_INJECT_ERR, the whole user-space buffer 'arg' is copied again
> > > to 'op' to obtain the err information. However, in the following execution
> > > of this case, the fields of 'op', except the field 'err', are actually not
> > > used. That is, the second copy has a redundant part. Therefore, for
> > > performance consideration, the redundant part of the second copy should be
> > > removed.
> > >
> > > This patch removes such a part in the second copy. It only copies from
> > > 'err.type' to 'err.mask', which is exactly required by the
> > > VFIO_EEH_PE_INJECT_ERR op.
> > >
> > > Signed-off-by: Wenwen Wang 
> > > ---
> > >  drivers/vfio/vfio_spapr_eeh.c | 9 ++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
> > > index 38edeb4..66634c6 100644
> > > --- a/drivers/vfio/vfio_spapr_eeh.c
> > > +++ b/drivers/vfio/vfio_spapr_eeh.c
> > > @@ -37,6 +37,7 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > struct eeh_pe *pe;
> > > struct vfio_eeh_pe_op op;
> > > unsigned long minsz;
> > > +   unsigned long start, end;
> > > long ret = -EINVAL;
> > >
> > > switch (cmd) {
> > > @@ -86,10 +87,12 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > ret = eeh_pe_configure(pe);
> > > break;
> > > case VFIO_EEH_PE_INJECT_ERR:
> > > -   minsz = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > -   if (op.argsz < minsz)
> > > +   start = offsetof(struct vfio_eeh_pe_op, err.type);
> > > +   end = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > +       if (op.argsz < end)
> > > return -EINVAL;
> > > -   if (copy_from_user(, (void __user *)arg, 
> > > minsz))
> > > +   if (copy_from_user(, (char __user *)arg +
> > > +   start, end - start))
> > > return -EFAULT;
> > >
> > > ret = eeh_pe_inject_err(pe, op.err.type, 
> > > op.err.func,
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4] drivers/vfio: Fix a redundant copy bug

2018-11-18 Thread David Gibson
On Mon, Oct 29, 2018 at 03:32:34PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 13:56:54 -0500
> Wenwen Wang  wrote:
> 
> > Hello,
> > 
> > Could you please apply this patch? Thanks!
> 
> I'd like to see testing and/or review from David or Alexey since I also
> don't have an environment for spapr/eeh.  We're already late into the
> v4.20 merge window so this is probably v4.21 material.  Thanks,

I didn't spot this for ages, since I wasn't CCed, I'm guessing the
same is true for Alexey.

TBH, I don't think it's worth bothering with this.  It's a tiny amount
of extra work on a rare debug path.  A couple of code lines simplicity
is worth more than a few bytes worth of redundant copy here.

Testing is.. difficult, since the EEH is already so broken it's hard
to know what to compare against.  Sam Bobroff is already looking
medium-long term at a bunch of EEH rework, so it's quite possible this
will be rewritten then anyway.

> 
> Alex
> 
> > On Wed, Oct 17, 2018 at 2:18 PM Wenwen Wang  wrote:
> > >
> > > In vfio_spapr_iommu_eeh_ioctl(), if the ioctl command is VFIO_EEH_PE_OP,
> > > the user-space buffer 'arg' is copied to the kernel object 'op' and the
> > > 'argsz' and 'flags' fields of 'op' are checked. If the check fails, an
> > > error code EINVAL is returned. Otherwise, 'op.op' is further checked
> > > through a switch statement to invoke related handlers. If 'op.op' is
> > > VFIO_EEH_PE_INJECT_ERR, the whole user-space buffer 'arg' is copied again
> > > to 'op' to obtain the err information. However, in the following execution
> > > of this case, the fields of 'op', except the field 'err', are actually not
> > > used. That is, the second copy has a redundant part. Therefore, for
> > > performance consideration, the redundant part of the second copy should be
> > > removed.
> > >
> > > This patch removes such a part in the second copy. It only copies from
> > > 'err.type' to 'err.mask', which is exactly required by the
> > > VFIO_EEH_PE_INJECT_ERR op.
> > >
> > > Signed-off-by: Wenwen Wang 
> > > ---
> > >  drivers/vfio/vfio_spapr_eeh.c | 9 ++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
> > > index 38edeb4..66634c6 100644
> > > --- a/drivers/vfio/vfio_spapr_eeh.c
> > > +++ b/drivers/vfio/vfio_spapr_eeh.c
> > > @@ -37,6 +37,7 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > struct eeh_pe *pe;
> > > struct vfio_eeh_pe_op op;
> > > unsigned long minsz;
> > > +   unsigned long start, end;
> > > long ret = -EINVAL;
> > >
> > > switch (cmd) {
> > > @@ -86,10 +87,12 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group 
> > > *group,
> > > ret = eeh_pe_configure(pe);
> > > break;
> > > case VFIO_EEH_PE_INJECT_ERR:
> > > -   minsz = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > -   if (op.argsz < minsz)
> > > +   start = offsetof(struct vfio_eeh_pe_op, err.type);
> > > +   end = offsetofend(struct vfio_eeh_pe_op, 
> > > err.mask);
> > > +       if (op.argsz < end)
> > > return -EINVAL;
> > > -   if (copy_from_user(, (void __user *)arg, 
> > > minsz))
> > > +   if (copy_from_user(, (char __user *)arg +
> > > +   start, end - start))
> > > return -EFAULT;
> > >
> > > ret = eeh_pe_inject_err(pe, op.err.type, 
> > > op.err.func,
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: Portable DT Connectors with regard to FPGAs

2018-09-24 Thread David Gibson
On Mon, Sep 24, 2018 at 03:32:44PM -0500, Alan Tull wrote:
> My interest here was in having some discussion on whether connectors
> are a good match for handling FPGAs.
> 
> The relevant use model is where a user applies a DT overlay targeting
> an FPGA region after the kernel has booted.  That overlay initiates
> FPGA programming and then adds nodes for the new FPGA hardware. This
> is discussed more completely in the FPGA manager DT binding document
> [1].  The main deal here is that I'd like to be able to add nodes
> in/below a FPGA region node to support devices in the FPGA (and be
> able also to remove them if we are going to reconfigure the FPGA.)
> 
> Previous discussions about DT connectors focused on the types of
> things likely to be on a physical connector. GPIO and SPI got named as
> good examples for discussion while MMIO specifically was dismissed
> [2].  That's problematic for embedded FPGAs for example since the FPGA
> is on a mmio bus and hardware that is programmed into the FPGA lives
> on that mmio bus similar to any embedded peripherals.  So there's a
> question - are mmio busses intended to be left un-connectorizable?

I don't see any particular reason that a connector couldn't be used
for mmio devices.  I think you'd want to treat the connection point as
a bridge on the mmio bus - that can have a 'ranges' property mapping
the connected device into the parent bus's address space (as an
identity mapping or otherwise).

> 
> Alan
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/fpga/fpga-region.txt
> 
> [2] https://lkml.org/lkml/2016/7/20/560
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: Portable DT Connectors with regard to FPGAs

2018-09-24 Thread David Gibson
On Mon, Sep 24, 2018 at 03:32:44PM -0500, Alan Tull wrote:
> My interest here was in having some discussion on whether connectors
> are a good match for handling FPGAs.
> 
> The relevant use model is where a user applies a DT overlay targeting
> an FPGA region after the kernel has booted.  That overlay initiates
> FPGA programming and then adds nodes for the new FPGA hardware. This
> is discussed more completely in the FPGA manager DT binding document
> [1].  The main deal here is that I'd like to be able to add nodes
> in/below a FPGA region node to support devices in the FPGA (and be
> able also to remove them if we are going to reconfigure the FPGA.)
> 
> Previous discussions about DT connectors focused on the types of
> things likely to be on a physical connector. GPIO and SPI got named as
> good examples for discussion while MMIO specifically was dismissed
> [2].  That's problematic for embedded FPGAs for example since the FPGA
> is on a mmio bus and hardware that is programmed into the FPGA lives
> on that mmio bus similar to any embedded peripherals.  So there's a
> question - are mmio busses intended to be left un-connectorizable?

I don't see any particular reason that a connector couldn't be used
for mmio devices.  I think you'd want to treat the connection point as
a bridge on the mmio bus - that can have a 'ranges' property mapping
the connected device into the parent bus's address space (as an
identity mapping or otherwise).

> 
> Alan
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/fpga/fpga-region.txt
> 
> [2] https://lkml.org/lkml/2016/7/20/560
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled

2018-07-12 Thread David Gibson
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote:
> We expect to receive PFs with SR-IOV disabled, however some host
> drivers leave SR-IOV enabled at unbind.  This puts us in a state where
> we can potentially assign both the PF and the VF, leading to both
> functionality as well as security concerns due to lack of managing the
> SR-IOV state as well as vendor dependent isolation from the PF to VF.
> If we were to attempt to actively disable SR-IOV on driver probe, we
> risk VF bound drivers blocking, potentially risking live lock
> scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
> with a warning message indicating the issue.  Users can resolve this
> by re-binding to the host driver and disabling SR-IOV before
> attempting to use the device with vfio-pci.
> 
> Signed-off-by: Alex Williamson 

Reviewed-by: David Gibson 

> ---
>  drivers/vfio/pci/vfio_pci.c |   13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b423a309a6e0..f372f209c5c2 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1189,6 +1189,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>   return -EINVAL;
>  
> + /*
> +  * Prevent binding to PFs with VFs enabled, this too easily allows
> +  * userspace instance with VFs and PFs from the same device, which
> +  * cannot work.  Disabling SR-IOV here would initiate removing the
> +  * VFs, which would unbind the driver, which is prone to blocking
> +  * if that VF is also in use by vfio-pci.  Just reject these PFs
> +  * and let the user sort it out.
> +  */
> + if (pci_num_vf(pdev)) {
> + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
> + return -EBUSY;
> + }
> +
>   group = vfio_iommu_group_get(>dev);
>   if (!group)
>   return -EINVAL;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled

2018-07-12 Thread David Gibson
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote:
> We expect to receive PFs with SR-IOV disabled, however some host
> drivers leave SR-IOV enabled at unbind.  This puts us in a state where
> we can potentially assign both the PF and the VF, leading to both
> functionality as well as security concerns due to lack of managing the
> SR-IOV state as well as vendor dependent isolation from the PF to VF.
> If we were to attempt to actively disable SR-IOV on driver probe, we
> risk VF bound drivers blocking, potentially risking live lock
> scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
> with a warning message indicating the issue.  Users can resolve this
> by re-binding to the host driver and disabling SR-IOV before
> attempting to use the device with vfio-pci.
> 
> Signed-off-by: Alex Williamson 

Reviewed-by: David Gibson 

> ---
>  drivers/vfio/pci/vfio_pci.c |   13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b423a309a6e0..f372f209c5c2 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1189,6 +1189,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>   return -EINVAL;
>  
> + /*
> +  * Prevent binding to PFs with VFs enabled, this too easily allows
> +  * userspace instance with VFs and PFs from the same device, which
> +  * cannot work.  Disabling SR-IOV here would initiate removing the
> +  * VFs, which would unbind the driver, which is prone to blocking
> +  * if that VF is also in use by vfio-pci.  Just reject these PFs
> +  * and let the user sort it out.
> +  */
> + if (pci_num_vf(pdev)) {
> + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
> + return -EBUSY;
> + }
> +
>   group = vfio_iommu_group_get(>dev);
>   if (!group)
>   return -EINVAL;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/pseries: include linux/types.h in asm/hvcall.h

2018-01-15 Thread David Gibson
On Tue, Jan 16, 2018 at 02:16:58PM +1100, Michael Ellerman wrote:
> Michal Suchanek <msucha...@suse.de> writes:
> 
> > Commit 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush
> > settings") uses u64 in asm/hvcall.h without including linux/types.h
> >
> > This breaks hvcall.h users that do not include the header themselves.
> >
> > Fixes: 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush
> > settings")
> >
> > Signed-off-by: Michal Suchanek <msucha...@suse.de>
> > ---
> >  arch/powerpc/include/asm/hvcall.h | 1 +
> >  1 file changed, 1 insertion(+)
> 
> Thanks. None of my ~250 defconfig test builds hit this, what config are
> you using?

I also hit this, but only when I backported the change to RH's 3.10
kernel.  I assumed something since then had added an indirect include.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/pseries: include linux/types.h in asm/hvcall.h

2018-01-15 Thread David Gibson
On Tue, Jan 16, 2018 at 02:16:58PM +1100, Michael Ellerman wrote:
> Michal Suchanek  writes:
> 
> > Commit 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush
> > settings") uses u64 in asm/hvcall.h without including linux/types.h
> >
> > This breaks hvcall.h users that do not include the header themselves.
> >
> > Fixes: 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush
> > settings")
> >
> > Signed-off-by: Michal Suchanek 
> > ---
> >  arch/powerpc/include/asm/hvcall.h | 1 +
> >  1 file changed, 1 insertion(+)
> 
> Thanks. None of my ~250 defconfig test builds hit this, what config are
> you using?

I also hit this, but only when I backported the change to RH's 3.10
kernel.  I assumed something since then had added an indirect include.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH] KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()

2018-01-09 Thread David Gibson
The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
allocating or re-allocating it if necessary.

In the case where an HPT of the right size already exists and it just
zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
entries loaded from the old HPT.

However, that situation can arise when the HPT is resizing as well - or
even when switching from an RPT to HPT - so those cases need a TLB flush as
well.

So, move the TLB flush to trigger in all cases except for errors.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Paul, this is based on Paolo's KVM tree, but it should apply without
modification to pretty much any vaguely current tree.  It's a pretty
nasty bug - the case we've found hitting it in the wild is a bit
esoteric, but it could in theory affect other situations as well.

Please apply ASAP, and should probably be queued for the stable
branches as well.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 966097232d21..51a275cc8a4d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -159,8 +159,6 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 * Reset all the reverse-mapping chains for all memslots
 */
kvmppc_rmap_reset(kvm);
-   /* Ensure that each vcpu will flush its TLB on next entry. */
-   cpumask_setall(>arch.need_tlb_flush);
err = 0;
goto out;
}
@@ -176,6 +174,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
kvmppc_set_hpt(kvm, );
 
 out:
+   if (err == 0)
+   /* Ensure that each vcpu will flush its TLB on next entry. */
+   cpumask_setall(>arch.need_tlb_flush);
+
mutex_unlock(>lock);
return err;
 }
-- 
2.14.3



[PATCH] KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()

2018-01-09 Thread David Gibson
The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
allocating or re-allocating it if necessary.

In the case where an HPT of the right size already exists and it just
zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
entries loaded from the old HPT.

However, that situation can arise when the HPT is resizing as well - or
even when switching from an RPT to HPT - so those cases need a TLB flush as
well.

So, move the TLB flush to trigger in all cases except for errors.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Paul, this is based on Paolo's KVM tree, but it should apply without
modification to pretty much any vaguely current tree.  It's a pretty
nasty bug - the case we've found hitting it in the wild is a bit
esoteric, but it could in theory affect other situations as well.

Please apply ASAP, and should probably be queued for the stable
branches as well.

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 966097232d21..51a275cc8a4d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -159,8 +159,6 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 * Reset all the reverse-mapping chains for all memslots
 */
kvmppc_rmap_reset(kvm);
-   /* Ensure that each vcpu will flush its TLB on next entry. */
-   cpumask_setall(>arch.need_tlb_flush);
err = 0;
goto out;
}
@@ -176,6 +174,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
kvmppc_set_hpt(kvm, );
 
 out:
+   if (err == 0)
+   /* Ensure that each vcpu will flush its TLB on next entry. */
+   cpumask_setall(>arch.need_tlb_flush);
+
mutex_unlock(>lock);
return err;
 }
-- 
2.14.3



Re: [PATCH 0/4] Fix use after free in HPT resizing code and related minor improvements

2017-12-03 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:22AM -0500, Serhii Popovych wrote:
> It is possible to trigger use after free during HPT resize
> causing host kernel to crash. More details and analysis of
> the problem can be found in change with corresponding subject
> (KVM: PPC: Book3S HV: Fix use after free in case of multiple
> resize requests).
> 
> We need some changes to prepare for the fix, especially
> make ->error in HPT resize instance single point for
> tracking allocation state, improve kvmppc_allocate_hpt()
> and kvmppc_free_hpt() so they can be used more safely.
> 
> See individual commit description message to get more
> information on changes presented.

I spoke with Paul Mackerras about these patches on IRC today.  We want
this as a fix, ASAP, in 4.15.  However, he's uncomfortable with
pushing some of extra cleanups which aren't necessary for the bug fix
this late for 4.15, and was having trouble following what was the core
of the fix.  He was also nervous about the addition of more BUG_ON()s.

To avoid the round trip to Ukraine time and back, I've made revised
versions of patches 1 & 3 which should apply standalone, replaced the
BUG_ON()s with WARN_ON()s and revised the commit messages to better
explain the crucial part of the fix.

However, I've run out of time to test them.

Serhii,  I'll send you my revised patches shortly.  Can you please
test them and repost.  Then you can rebase patches 2 & 4 from this
series on top of the revised patches and post those separately (as a
cleanup with less urgency than the actual fix).

A couple of people have also suggested CCing k...@vger.kernel.org on
the next round in addition to the lists already included.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 0/4] Fix use after free in HPT resizing code and related minor improvements

2017-12-03 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:22AM -0500, Serhii Popovych wrote:
> It is possible to trigger use after free during HPT resize
> causing host kernel to crash. More details and analysis of
> the problem can be found in change with corresponding subject
> (KVM: PPC: Book3S HV: Fix use after free in case of multiple
> resize requests).
> 
> We need some changes to prepare for the fix, especially
> make ->error in HPT resize instance single point for
> tracking allocation state, improve kvmppc_allocate_hpt()
> and kvmppc_free_hpt() so they can be used more safely.
> 
> See individual commit description message to get more
> information on changes presented.

I spoke with Paul Mackerras about these patches on IRC today.  We want
this as a fix, ASAP, in 4.15.  However, he's uncomfortable with
pushing some of extra cleanups which aren't necessary for the bug fix
this late for 4.15, and was having trouble following what was the core
of the fix.  He was also nervous about the addition of more BUG_ON()s.

To avoid the round trip to Ukraine time and back, I've made revised
versions of patches 1 & 3 which should apply standalone, replaced the
BUG_ON()s with WARN_ON()s and revised the commit messages to better
explain the crucial part of the fix.

However, I've run out of time to test them.

Serhii,  I'll send you my revised patches shortly.  Can you please
test them and repost.  Then you can rebase patches 2 & 4 from this
series on top of the revised patches and post those separately (as a
cleanup with less urgency than the actual fix).

A couple of people have also suggested CCing k...@vger.kernel.org on
the next round in addition to the lists already included.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH] Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"

2017-12-03 Thread David Gibson
This reverts commit a3b2cb30f252b21a6f962e0dd107c8b897ca65e4.

The earlier patch tried to fix problems with panic on powerpc in
certain circumstances, where some output from the generic panic code
was being dropped.

Unfortunately, it breaks things worse in other circumstances.  In
particular when running a PAPR guest, it will now attempt to reboot
instead of informing the hypervisor (KVM or PowerVM) that the guest
has crashed.  The crash notification is important to some
virtualization management layers.

Since the circumstances in which the original patch helped are
somewhat obscure, revert it for now until we figure out how to do it
properly.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/include/asm/setup.h   |  1 +
 arch/powerpc/kernel/fadump.c   | 22 --
 arch/powerpc/kernel/setup-common.c | 27 +++
 arch/powerpc/platforms/ps3/setup.c | 15 +++
 arch/powerpc/platforms/pseries/setup.c |  1 +
 6 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 73b92017b6d7..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -76,6 +76,7 @@ struct machdep_calls {
 
void __noreturn (*restart)(char *cmd);
void __noreturn (*halt)(void);
+   void(*panic)(char *str);
void(*cpu_die)(void);
 
long(*time_init)(void); /* Optional, may be NULL */
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 257d23dbf55d..cf00ec26303a 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -24,6 +24,7 @@ extern void reloc_got2(unsigned long);
 
 void check_for_initrd(void);
 void initmem_init(void);
+void setup_panic(void);
 #define ARCH_PANIC_TIMEOUT 180
 
 #ifdef CONFIG_PPC_PSERIES
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 04ea5c04fd24..3c2c2688918f 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1462,25 +1462,6 @@ static void fadump_init_files(void)
return;
 }
 
-static int fadump_panic_event(struct notifier_block *this,
- unsigned long event, void *ptr)
-{
-   /*
-* If firmware-assisted dump has been registered then trigger
-* firmware-assisted dump and let firmware handle everything
-* else. If this returns, then fadump was not registered, so
-* go through the rest of the panic path.
-*/
-   crash_fadump(NULL, ptr);
-
-   return NOTIFY_DONE;
-}
-
-static struct notifier_block fadump_panic_block = {
-   .notifier_call = fadump_panic_event,
-   .priority = INT_MIN /* may not return; must be done last */
-};
-
 /*
  * Prepare for firmware-assisted dump.
  */
@@ -1513,9 +1494,6 @@ int __init setup_fadump(void)
init_fadump_mem_struct(, fw_dump.reserve_dump_area_start);
fadump_init_files();
 
-   atomic_notifier_chain_register(_notifier_list,
-   _panic_block);
-
return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 2075322cd225..9d213542a48b 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -704,6 +704,30 @@ int check_legacy_ioport(unsigned long base_port)
 }
 EXPORT_SYMBOL(check_legacy_ioport);
 
+static int ppc_panic_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+   /*
+* If firmware-assisted dump has been registered then trigger
+* firmware-assisted dump and let firmware handle everything else.
+*/
+   crash_fadump(NULL, ptr);
+   ppc_md.panic(ptr);  /* May not return */
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ppc_panic_block = {
+   .notifier_call = ppc_panic_event,
+   .priority = INT_MIN /* may not return; must be done last */
+};
+
+void __init setup_panic(void)
+{
+   if (!ppc_md.panic)
+   return;
+   atomic_notifier_chain_register(_notifier_list, _panic_block);
+}
+
 #ifdef CONFIG_CHECK_CACHE_COHERENCY
 /*
  * For platforms that have configurable cache-coherency.  This function
@@ -848,6 +872,9 @@ void __init setup_arch(char **cmdline_p)
/* Probe the machine type, establish ppc_md. */
probe_machine();
 
+   /* Setup panic notifier if requested by the platform. */
+   setup_panic();
+
/*
 * Configure ppc_md.power_save (ppc32 only, 64-bit machines do
 * it from their respective probe() function.
diff --git a/arch/powerpc/platforms/ps3/setup.c 
b/arch/powerpc/platforms/ps3/setup.c
index 9dabea6e1443..6244bc849469 100644
--- a/arch/powerpc/platforms/ps3/s

[PATCH] Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"

2017-12-03 Thread David Gibson
This reverts commit a3b2cb30f252b21a6f962e0dd107c8b897ca65e4.

The earlier patch tried to fix problems with panic on powerpc in
certain circumstances, where some output from the generic panic code
was being dropped.

Unfortunately, it breaks things worse in other circumstances.  In
particular when running a PAPR guest, it will now attempt to reboot
instead of informing the hypervisor (KVM or PowerVM) that the guest
has crashed.  The crash notification is important to some
virtualization management layers.

Since the circumstances in which the original patch helped are
somewhat obscure, revert it for now until we figure out how to do it
properly.

Signed-off-by: David Gibson 
---
 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/include/asm/setup.h   |  1 +
 arch/powerpc/kernel/fadump.c   | 22 --
 arch/powerpc/kernel/setup-common.c | 27 +++
 arch/powerpc/platforms/ps3/setup.c | 15 +++
 arch/powerpc/platforms/pseries/setup.c |  1 +
 6 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 73b92017b6d7..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -76,6 +76,7 @@ struct machdep_calls {
 
void __noreturn (*restart)(char *cmd);
void __noreturn (*halt)(void);
+   void(*panic)(char *str);
void(*cpu_die)(void);
 
long(*time_init)(void); /* Optional, may be NULL */
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 257d23dbf55d..cf00ec26303a 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -24,6 +24,7 @@ extern void reloc_got2(unsigned long);
 
 void check_for_initrd(void);
 void initmem_init(void);
+void setup_panic(void);
 #define ARCH_PANIC_TIMEOUT 180
 
 #ifdef CONFIG_PPC_PSERIES
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 04ea5c04fd24..3c2c2688918f 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1462,25 +1462,6 @@ static void fadump_init_files(void)
return;
 }
 
-static int fadump_panic_event(struct notifier_block *this,
- unsigned long event, void *ptr)
-{
-   /*
-* If firmware-assisted dump has been registered then trigger
-* firmware-assisted dump and let firmware handle everything
-* else. If this returns, then fadump was not registered, so
-* go through the rest of the panic path.
-*/
-   crash_fadump(NULL, ptr);
-
-   return NOTIFY_DONE;
-}
-
-static struct notifier_block fadump_panic_block = {
-   .notifier_call = fadump_panic_event,
-   .priority = INT_MIN /* may not return; must be done last */
-};
-
 /*
  * Prepare for firmware-assisted dump.
  */
@@ -1513,9 +1494,6 @@ int __init setup_fadump(void)
init_fadump_mem_struct(, fw_dump.reserve_dump_area_start);
fadump_init_files();
 
-   atomic_notifier_chain_register(_notifier_list,
-   _panic_block);
-
return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 2075322cd225..9d213542a48b 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -704,6 +704,30 @@ int check_legacy_ioport(unsigned long base_port)
 }
 EXPORT_SYMBOL(check_legacy_ioport);
 
+static int ppc_panic_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+   /*
+* If firmware-assisted dump has been registered then trigger
+* firmware-assisted dump and let firmware handle everything else.
+*/
+   crash_fadump(NULL, ptr);
+   ppc_md.panic(ptr);  /* May not return */
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ppc_panic_block = {
+   .notifier_call = ppc_panic_event,
+   .priority = INT_MIN /* may not return; must be done last */
+};
+
+void __init setup_panic(void)
+{
+   if (!ppc_md.panic)
+   return;
+   atomic_notifier_chain_register(_notifier_list, _panic_block);
+}
+
 #ifdef CONFIG_CHECK_CACHE_COHERENCY
 /*
  * For platforms that have configurable cache-coherency.  This function
@@ -848,6 +872,9 @@ void __init setup_arch(char **cmdline_p)
/* Probe the machine type, establish ppc_md. */
probe_machine();
 
+   /* Setup panic notifier if requested by the platform. */
+   setup_panic();
+
/*
 * Configure ppc_md.power_save (ppc32 only, 64-bit machines do
 * it from their respective probe() function.
diff --git a/arch/powerpc/platforms/ps3/setup.c 
b/arch/powerpc/platforms/ps3/setup.c
index 9dabea6e1443..6244bc849469 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3

Re: [PATCH 4/4] KVM: PPC: Book3S HV: Remove redundant parameter from resize_hpt_release()

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:26AM -0500, Serhii Popovych wrote:
> There is no need to pass it explicitly from the caller:
> struct kvm_resize_hpt already contains it.
> 
> Additional benefit from this change is that BUG_ON()
> assertion now checks that mutex is held on kvm instance
> associated with resize structure we going to release.
> 
> Also kill check for resize being NULL to make code
> simpler and we called with resize != NULL in all
> places except kvm_vm_ioctl_resize_hpt_commit().
> 
> Signed-off-by: Serhii Popovych <spopo...@redhat.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 690f061..a74a0ad 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1415,12 +1415,11 @@ static void resize_hpt_pivot(struct kvm_resize_hpt 
> *resize)
>   resize_hpt_debug(resize, "resize_hpt_pivot() done\n");
>  }
>  
> -static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt 
> *resize)
> +static void resize_hpt_release(struct kvm_resize_hpt *resize)
>  {
> - BUG_ON(!mutex_is_locked(>lock));
> + struct kvm *kvm = resize->kvm;
>  
> - if (!resize)
> - return;
> + BUG_ON(!mutex_is_locked(>lock));
>  
>   if (resize->error != -EBUSY) {
>   kvmppc_free_hpt(>hpt);
> @@ -1469,7 +1468,7 @@ static void resize_hpt_prepare_work(struct work_struct 
> *work)
>   resize->error = err;
>  
>   if (kvm->arch.resize_hpt != resize)
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>  
>   mutex_unlock(>lock);
>  }
> @@ -1499,13 +1498,13 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>   if (ret == -EBUSY)
>   ret = 100; /* estimated time in ms */
>   else if (ret)
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>  
>   goto out;
>   }
>  
>   /* not suitable, cancel it */
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>   }
>  
>   ret = 0;
> @@ -1590,7 +1589,8 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
>   kvm->arch.mmu_ready = 1;
>   smp_mb();
>  out_no_hpt:
> - resize_hpt_release(kvm, resize);
> + if (resize)
> + resize_hpt_release(resize);
>   mutex_unlock(>lock);
>   return ret;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 0/4] Fix use after free in HPT resizing code and related minor improvements

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:22AM -0500, Serhii Popovych wrote:
> It is possible to trigger use after free during HPT resize
> causing host kernel to crash. More details and analysis of
> the problem can be found in change with corresponding subject
> (KVM: PPC: Book3S HV: Fix use after free in case of multiple
> resize requests).
> 
> We need some changes to prepare for the fix, especially
> make ->error in HPT resize instance single point for
> tracking allocation state, improve kvmppc_allocate_hpt()
> and kvmppc_free_hpt() so they can be used more safely.
> 
> See individual commit description message to get more
> information on changes presented.
> 
> Serhii Popovych (4):
>   KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt and
> cleanups
>   KVM: PPC: Book3S HV: Improve kvmppc_allocate_hpt()/kvmppc_free_hpt()
>   KVM: PPC: Book3S HV: Fix use after free in case of multiple resize
> requests
>   KVM: PPC: Book3S HV: Remove redundant parameter from
> resize_hpt_release()
> 
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 139 
> +---
>  1 file changed, 82 insertions(+), 57 deletions(-)

Paul, these (at least 1-3) fix (another :() host crash bug which can
be triggered by guest and/or userspace actions.  Please merge ASAP.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests

2017-11-29 Thread David Gibson
rk+0xe4/0x1c0 [kvm_hv]
> [  635.280507] [c007e9183c40] [c0113c0c] 
> process_one_work+0x1dc/0x680
> [  635.280587] [c007e9183ce0] [c0114250] worker_thread+0x1a0/0x520
> [  635.280655] [c007e9183d80] [c012010c] kthread+0xec/0x100
> [  635.280724] [c007e9183e30] [c000a4b8] 
> ret_from_kernel_thread+0x5c/0xa4
> [  635.280814] Instruction dump:
> [  635.280880] 7c0802a6 fba1ffe8 fbc1fff0 7cbd2b78 fbe1fff8 7c9e2378 7c7f1b78
> f8010010
> [  635.281099] f821ff81 e8a50008 7fa52040 40de00b8  7fbd2840 
> 40de008c
> 7fbff040
> [  635.281324] ---[ end trace b628b73449719b9d ]---
> 
> Signed-off-by: Serhii Popovych <spopo...@redhat.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 45 
> ++---
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 3e9abd9..690f061 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1417,15 +1417,18 @@ static void resize_hpt_pivot(struct kvm_resize_hpt 
> *resize)
>  
>  static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt 
> *resize)
>  {
> - BUG_ON(kvm->arch.resize_hpt != resize);
> + BUG_ON(!mutex_is_locked(>lock));
>  
>   if (!resize)
>   return;
>  
> - kvmppc_free_hpt(>hpt);
> + if (resize->error != -EBUSY) {
> + kvmppc_free_hpt(>hpt);
> + kfree(resize);
> + }
>  
> - kvm->arch.resize_hpt = NULL;
> - kfree(resize);
> + if (kvm->arch.resize_hpt == resize)
> + kvm->arch.resize_hpt = NULL;
>  }
>  
>  static void resize_hpt_prepare_work(struct work_struct *work)
> @@ -1434,24 +1437,40 @@ static void resize_hpt_prepare_work(struct 
> work_struct *work)
>struct kvm_resize_hpt,
>work);
>   struct kvm *kvm = resize->kvm;
> - int err;
> + int err = 0;
>  
>   BUG_ON(resize->error != -EBUSY);
>  
> - resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
> -  resize->order);
> + mutex_lock(>lock);
> +
> + /* Request is still current? */
> + if (kvm->arch.resize_hpt == resize) {
> + /* We may request large allocations here:
> +  * do not sleep with kvm->lock held for a while.
> +  */
> + mutex_unlock(>lock);
>  
> - err = resize_hpt_allocate(resize);
> + resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = 
> %d\n",
> +  resize->order);
>  
> - /* We have strict assumption about -EBUSY
> -  * when preparing for HPT resize.
> -  */
> - BUG_ON(err == -EBUSY);
> + err = resize_hpt_allocate(resize);
>  
> - mutex_lock(>lock);
> + /* We have strict assumption about -EBUSY
> +  * when preparing for HPT resize.
> +  */
> + BUG_ON(err == -EBUSY);
> +
> + mutex_lock(>lock);
> + /* It is possible that kvm->arch.resize_hpt != resize
> +  * after we grab kvm->lock again.
> +  */
> + }
>  
>   resize->error = err;
>  
> + if (kvm->arch.resize_hpt != resize)
> + resize_hpt_release(kvm, resize);
> +
>   mutex_unlock(>lock);
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 4/4] KVM: PPC: Book3S HV: Remove redundant parameter from resize_hpt_release()

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:26AM -0500, Serhii Popovych wrote:
> There is no need to pass it explicitly from the caller:
> struct kvm_resize_hpt already contains it.
> 
> Additional benefit from this change is that BUG_ON()
> assertion now checks that mutex is held on kvm instance
> associated with resize structure we going to release.
> 
> Also kill check for resize being NULL to make code
> simpler and we called with resize != NULL in all
> places except kvm_vm_ioctl_resize_hpt_commit().
> 
> Signed-off-by: Serhii Popovych 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 690f061..a74a0ad 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1415,12 +1415,11 @@ static void resize_hpt_pivot(struct kvm_resize_hpt 
> *resize)
>   resize_hpt_debug(resize, "resize_hpt_pivot() done\n");
>  }
>  
> -static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt 
> *resize)
> +static void resize_hpt_release(struct kvm_resize_hpt *resize)
>  {
> - BUG_ON(!mutex_is_locked(>lock));
> + struct kvm *kvm = resize->kvm;
>  
> - if (!resize)
> - return;
> + BUG_ON(!mutex_is_locked(>lock));
>  
>   if (resize->error != -EBUSY) {
>   kvmppc_free_hpt(>hpt);
> @@ -1469,7 +1468,7 @@ static void resize_hpt_prepare_work(struct work_struct 
> *work)
>   resize->error = err;
>  
>   if (kvm->arch.resize_hpt != resize)
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>  
>   mutex_unlock(>lock);
>  }
> @@ -1499,13 +1498,13 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>   if (ret == -EBUSY)
>   ret = 100; /* estimated time in ms */
>   else if (ret)
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>  
>   goto out;
>   }
>  
>   /* not suitable, cancel it */
> - resize_hpt_release(kvm, resize);
> + resize_hpt_release(resize);
>   }
>  
>   ret = 0;
> @@ -1590,7 +1589,8 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
>   kvm->arch.mmu_ready = 1;
>   smp_mb();
>  out_no_hpt:
> - resize_hpt_release(kvm, resize);
> + if (resize)
> + resize_hpt_release(resize);
>   mutex_unlock(>lock);
>   return ret;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 0/4] Fix use after free in HPT resizing code and related minor improvements

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:22AM -0500, Serhii Popovych wrote:
> It is possible to trigger use after free during HPT resize
> causing host kernel to crash. More details and analysis of
> the problem can be found in change with corresponding subject
> (KVM: PPC: Book3S HV: Fix use after free in case of multiple
> resize requests).
> 
> We need some changes to prepare for the fix, especially
> make ->error in HPT resize instance single point for
> tracking allocation state, improve kvmppc_allocate_hpt()
> and kvmppc_free_hpt() so they can be used more safely.
> 
> See individual commit description message to get more
> information on changes presented.
> 
> Serhii Popovych (4):
>   KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt and
> cleanups
>   KVM: PPC: Book3S HV: Improve kvmppc_allocate_hpt()/kvmppc_free_hpt()
>   KVM: PPC: Book3S HV: Fix use after free in case of multiple resize
> requests
>   KVM: PPC: Book3S HV: Remove redundant parameter from
> resize_hpt_release()
> 
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 139 
> +---
>  1 file changed, 82 insertions(+), 57 deletions(-)

Paul, these (at least 1-3) fix (another :() host crash bug which can
be triggered by guest and/or userspace actions.  Please merge ASAP.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 3/4] KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests

2017-11-29 Thread David Gibson
[c007e9183c40] [c0113c0c] 
> process_one_work+0x1dc/0x680
> [  635.280587] [c007e9183ce0] [c0114250] worker_thread+0x1a0/0x520
> [  635.280655] [c007e9183d80] [c012010c] kthread+0xec/0x100
> [  635.280724] [c007e9183e30] [c000a4b8] 
> ret_from_kernel_thread+0x5c/0xa4
> [  635.280814] Instruction dump:
> [  635.280880] 7c0802a6 fba1ffe8 fbc1fff0 7cbd2b78 fbe1fff8 7c9e2378 7c7f1b78
> f8010010
> [  635.281099] f821ff81 e8a50008 7fa52040 40de00b8  7fbd2840 
> 40de008c
> 7fbff040
> [  635.281324] ---[ end trace b628b73449719b9d ]---
> 
> Signed-off-by: Serhii Popovych 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 45 
> ++---
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 3e9abd9..690f061 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1417,15 +1417,18 @@ static void resize_hpt_pivot(struct kvm_resize_hpt 
> *resize)
>  
>  static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt 
> *resize)
>  {
> - BUG_ON(kvm->arch.resize_hpt != resize);
> + BUG_ON(!mutex_is_locked(>lock));
>  
>   if (!resize)
>   return;
>  
> - kvmppc_free_hpt(>hpt);
> + if (resize->error != -EBUSY) {
> + kvmppc_free_hpt(>hpt);
> + kfree(resize);
> + }
>  
> - kvm->arch.resize_hpt = NULL;
> - kfree(resize);
> + if (kvm->arch.resize_hpt == resize)
> + kvm->arch.resize_hpt = NULL;
>  }
>  
>  static void resize_hpt_prepare_work(struct work_struct *work)
> @@ -1434,24 +1437,40 @@ static void resize_hpt_prepare_work(struct 
> work_struct *work)
>struct kvm_resize_hpt,
>work);
>   struct kvm *kvm = resize->kvm;
> - int err;
> + int err = 0;
>  
>   BUG_ON(resize->error != -EBUSY);
>  
> - resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
> -  resize->order);
> + mutex_lock(>lock);
> +
> + /* Request is still current? */
> + if (kvm->arch.resize_hpt == resize) {
> + /* We may request large allocations here:
> +  * do not sleep with kvm->lock held for a while.
> +  */
> + mutex_unlock(>lock);
>  
> - err = resize_hpt_allocate(resize);
> + resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = 
> %d\n",
> +  resize->order);
>  
> - /* We have strict assumption about -EBUSY
> -  * when preparing for HPT resize.
> -  */
> - BUG_ON(err == -EBUSY);
> + err = resize_hpt_allocate(resize);
>  
> - mutex_lock(>lock);
> + /* We have strict assumption about -EBUSY
> +  * when preparing for HPT resize.
> +  */
> + BUG_ON(err == -EBUSY);
> +
> + mutex_lock(>lock);
> + /* It is possible that kvm->arch.resize_hpt != resize
> +  * after we grab kvm->lock again.
> +  */
> + }
>  
>   resize->error = err;
>  
> + if (kvm->arch.resize_hpt != resize)
> + resize_hpt_release(kvm, resize);
> +
>   mutex_unlock(>lock);
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 2/4] KVM: PPC: Book3S HV: Improve kvmppc_allocate_hpt()/kvmppc_free_hpt()

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:24AM -0500, Serhii Popovych wrote:
> There are several points of improvements:
> 
>   1) Make kvmppc_free_hpt() check if allocation is made before attempt
>  to release. This follows kfree(p) semantics where p == NULL.
> 
>   2) Return initialized @info parameter from kvmppc_allocate_hpt()
>  even if allocation fails.
> 
>  This allows to use kvmppc_free_hpt() in the caller without
>  checking that preceded kvmppc_allocate_hpt() was successful
> 
>  p = kmalloc(size, gfp);
>  kfree(p);
> 
>  which is correct for both p != NULL and p == NULL. Followup
>  change will rely on this behaviour.
> 
>   3) Better code reuse: kvmppc_free_hpt() can be reused on error
>  path in kvmppc_allocate_hpt() to avoid code duplication.
> 
>   4) No need to check for !hpt if allocated from CMA: neither
>  pfn_to_kaddr() nor page_to_pfn() is 0 in case of page != NULL.
> 
> Signed-off-by: Serhii Popovych <spopo...@redhat.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 54 
> ++---
>  1 file changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 0534aab..3e9abd9 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -82,47 +82,44 @@ struct kvm_resize_hpt {
>  int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
>  {
>   unsigned long hpt = 0;
> - int cma = 0;
> - struct page *page = NULL;
> - struct revmap_entry *rev;
> + int err, cma = 0;
> + struct page *page;
> + struct revmap_entry *rev = NULL;
>   unsigned long npte;
>  
> + err = -EINVAL;
>   if ((order < PPC_MIN_HPT_ORDER) || (order > PPC_MAX_HPT_ORDER))
> - return -EINVAL;
> + goto out;
>  
> + err = -ENOMEM;
>   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>   if (page) {
>   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>   memset((void *)hpt, 0, (1ul << order));
>   cma = 1;
> - }
> -
> - if (!hpt)
> + } else {
>   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_RETRY_MAYFAIL
>  |__GFP_NOWARN, order - PAGE_SHIFT);
> -
> - if (!hpt)
> - return -ENOMEM;
> + if (!hpt)
> + goto out;
> + }
>  
>   /* HPTEs are 2**4 bytes long */
>   npte = 1ul << (order - 4);
>  
>   /* Allocate reverse map array */
> - rev = vmalloc(sizeof(struct revmap_entry) * npte);
> - if (!rev) {
> - if (cma)
> - kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
> - else
> - free_pages(hpt, order - PAGE_SHIFT);
> - return -ENOMEM;
> - }
> -
> + rev = vmalloc(sizeof(*rev) * npte);
> + if (rev)
> + err = 0;
> +out:
>   info->order = order;
>   info->virt = hpt;
>   info->cma = cma;
>   info->rev = rev;
>  
> - return 0;
> + if (err)
> + kvmppc_free_hpt(info);
> + return err;
>  }
>  
>  void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> @@ -190,12 +187,14 @@ void kvmppc_free_hpt(struct kvm_hpt_info *info)
>  {
>   vfree(info->rev);
>   info->rev = NULL;
> - if (info->cma)
> - kvm_free_hpt_cma(virt_to_page(info->virt),
> -  1 << (info->order - PAGE_SHIFT));
> - else if (info->virt)
> - free_pages(info->virt, info->order - PAGE_SHIFT);
> - info->virt = 0;
> + if (info->virt) {
> + if (info->cma)
> + kvm_free_hpt_cma(virt_to_page(info->virt),
> +  1 << (info->order - PAGE_SHIFT));
> + else
> + free_pages(info->virt, info->order - PAGE_SHIFT);
> + info->virt = 0;
> + }
>   info->order = 0;
>  }
>  
> @@ -1423,8 +1422,7 @@ static void resize_hpt_release(struct kvm *kvm, struct 
> kvm_resize_hpt *resize)
>   if (!resize)
>   return;
>  
> - if (resize->hpt.virt)
> - kvmppc_free_hpt(>hpt);
> + kvmppc_free_hpt(>hpt);
>  
>   kvm->arch.resize_hpt = NULL;
>   kfree(resize);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 2/4] KVM: PPC: Book3S HV: Improve kvmppc_allocate_hpt()/kvmppc_free_hpt()

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:24AM -0500, Serhii Popovych wrote:
> There are several points of improvements:
> 
>   1) Make kvmppc_free_hpt() check if allocation is made before attempt
>  to release. This follows kfree(p) semantics where p == NULL.
> 
>   2) Return initialized @info parameter from kvmppc_allocate_hpt()
>  even if allocation fails.
> 
>  This allows to use kvmppc_free_hpt() in the caller without
>  checking that preceded kvmppc_allocate_hpt() was successful
> 
>  p = kmalloc(size, gfp);
>  kfree(p);
> 
>  which is correct for both p != NULL and p == NULL. Followup
>  change will rely on this behaviour.
> 
>   3) Better code reuse: kvmppc_free_hpt() can be reused on error
>  path in kvmppc_allocate_hpt() to avoid code duplication.
> 
>   4) No need to check for !hpt if allocated from CMA: neither
>  pfn_to_kaddr() nor page_to_pfn() is 0 in case of page != NULL.
> 
> Signed-off-by: Serhii Popovych 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 54 
> ++---
>  1 file changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 0534aab..3e9abd9 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -82,47 +82,44 @@ struct kvm_resize_hpt {
>  int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
>  {
>   unsigned long hpt = 0;
> - int cma = 0;
> - struct page *page = NULL;
> - struct revmap_entry *rev;
> + int err, cma = 0;
> + struct page *page;
> + struct revmap_entry *rev = NULL;
>   unsigned long npte;
>  
> + err = -EINVAL;
>   if ((order < PPC_MIN_HPT_ORDER) || (order > PPC_MAX_HPT_ORDER))
> - return -EINVAL;
> + goto out;
>  
> + err = -ENOMEM;
>   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>   if (page) {
>   hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>   memset((void *)hpt, 0, (1ul << order));
>   cma = 1;
> - }
> -
> - if (!hpt)
> + } else {
>   hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_RETRY_MAYFAIL
>  |__GFP_NOWARN, order - PAGE_SHIFT);
> -
> - if (!hpt)
> - return -ENOMEM;
> + if (!hpt)
> + goto out;
> + }
>  
>   /* HPTEs are 2**4 bytes long */
>   npte = 1ul << (order - 4);
>  
>   /* Allocate reverse map array */
> - rev = vmalloc(sizeof(struct revmap_entry) * npte);
> - if (!rev) {
> - if (cma)
> - kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
> - else
> - free_pages(hpt, order - PAGE_SHIFT);
> - return -ENOMEM;
> - }
> -
> + rev = vmalloc(sizeof(*rev) * npte);
> + if (rev)
> + err = 0;
> +out:
>   info->order = order;
>   info->virt = hpt;
>   info->cma = cma;
>   info->rev = rev;
>  
> - return 0;
> + if (err)
> + kvmppc_free_hpt(info);
> + return err;
>  }
>  
>  void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> @@ -190,12 +187,14 @@ void kvmppc_free_hpt(struct kvm_hpt_info *info)
>  {
>   vfree(info->rev);
>   info->rev = NULL;
> - if (info->cma)
> - kvm_free_hpt_cma(virt_to_page(info->virt),
> -  1 << (info->order - PAGE_SHIFT));
> - else if (info->virt)
> - free_pages(info->virt, info->order - PAGE_SHIFT);
> - info->virt = 0;
> + if (info->virt) {
> + if (info->cma)
> + kvm_free_hpt_cma(virt_to_page(info->virt),
> +  1 << (info->order - PAGE_SHIFT));
> + else
> + free_pages(info->virt, info->order - PAGE_SHIFT);
> + info->virt = 0;
> + }
>   info->order = 0;
>  }
>  
> @@ -1423,8 +1422,7 @@ static void resize_hpt_release(struct kvm *kvm, struct 
> kvm_resize_hpt *resize)
>   if (!resize)
>   return;
>  
> - if (resize->hpt.virt)
> - kvmppc_free_hpt(>hpt);
> + kvmppc_free_hpt(>hpt);
>  
>   kvm->arch.resize_hpt = NULL;
>   kfree(resize);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 1/4] KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt and cleanups

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:23AM -0500, Serhii Popovych wrote:
> Replace ->prepare_done flag functionality with special handling
> of -EBUSY in ->error as indicator that allocation work is running.
> 
> Besides cosmetics this reduces size of struct kvm_resize_hpt by
> __alignof__(struct kvm_hpt_info) and saves few bytes of code.
> 
> While there correct comment in struct kvm_resize_hpt about locking
> used to protect access to certain fields.
> 
> Assert with BUG_ON() in case of HPT allocation thread work runs
> more than once for resize request or resize_hpt_allocate()
> returns -EBUSY that is treated specially.
> 
> Change comparison against zero to make checkpatch.pl happy.
> 
> Signed-off-by: Serhii Popovych <spopo...@redhat.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 42 
> ++---
>  1 file changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 235319c..0534aab 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -65,11 +65,17 @@ struct kvm_resize_hpt {
>   u32 order;
>  
>   /* These fields protected by kvm->lock */
> +
> + /* Possible values and their usage:
> +  *  <0 an error occurred during allocation,
> +   * -EBUSY allocation is in the progress,
> +  *  0  allocation made successfuly.
> +  */
>   int error;
> - bool prepare_done;
>  
> - /* Private to the work thread, until prepare_done is true,
> -  * then protected by kvm->resize_hpt_sem */
> + /* Private to the work thread, until error != -EBUSY,
> +  * then protected by kvm->lock.
> +  */
>   struct kvm_hpt_info hpt;
>  };
>  
> @@ -1432,15 +1438,21 @@ static void resize_hpt_prepare_work(struct 
> work_struct *work)
>   struct kvm *kvm = resize->kvm;
>   int err;
>  
> + BUG_ON(resize->error != -EBUSY);
> +
>   resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
>resize->order);
>  
>   err = resize_hpt_allocate(resize);
>  
> + /* We have strict assumption about -EBUSY
> +  * when preparing for HPT resize.
> +  */
> + BUG_ON(err == -EBUSY);
> +
>   mutex_lock(>lock);
>  
>   resize->error = err;
> - resize->prepare_done = true;
>  
>   mutex_unlock(>lock);
>  }
> @@ -1465,14 +1477,12 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>  
>   if (resize) {
>   if (resize->order == shift) {
> - /* Suitable resize in progress */
> - if (resize->prepare_done) {
> - ret = resize->error;
> - if (ret != 0)
> - resize_hpt_release(kvm, resize);
> - } else {
> + /* Suitable resize in progress? */
> + ret = resize->error;
> + if (ret == -EBUSY)
>   ret = 100; /* estimated time in ms */
> - }
> + else if (ret)
> + resize_hpt_release(kvm, resize);
>  
>   goto out;
>   }
> @@ -1492,6 +1502,8 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>   ret = -ENOMEM;
>   goto out;
>   }
> +
> + resize->error = -EBUSY;
>   resize->order = shift;
>   resize->kvm = kvm;
>   INIT_WORK(>work, resize_hpt_prepare_work);
> @@ -1546,16 +1558,12 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
>   if (!resize || (resize->order != shift))
>   goto out;
>  
> - ret = -EBUSY;
> - if (!resize->prepare_done)
> - goto out;
> -
>   ret = resize->error;
> - if (ret != 0)
> + if (ret)
>   goto out;
>  
>   ret = resize_hpt_rehash(resize);
> - if (ret != 0)
> + if (ret)
>   goto out;
>  
>   resize_hpt_pivot(resize);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 1/4] KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt and cleanups

2017-11-29 Thread David Gibson
On Wed, Nov 29, 2017 at 11:38:23AM -0500, Serhii Popovych wrote:
> Replace ->prepare_done flag functionality with special handling
> of -EBUSY in ->error as indicator that allocation work is running.
> 
> Besides cosmetics this reduces size of struct kvm_resize_hpt by
> __alignof__(struct kvm_hpt_info) and saves few bytes of code.
> 
> While there correct comment in struct kvm_resize_hpt about locking
> used to protect access to certain fields.
> 
> Assert with BUG_ON() in case of HPT allocation thread work runs
> more than once for resize request or resize_hpt_allocate()
> returns -EBUSY that is treated specially.
> 
> Change comparison against zero to make checkpatch.pl happy.
> 
> Signed-off-by: Serhii Popovych 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 42 
> ++---
>  1 file changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 235319c..0534aab 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -65,11 +65,17 @@ struct kvm_resize_hpt {
>   u32 order;
>  
>   /* These fields protected by kvm->lock */
> +
> + /* Possible values and their usage:
> +  *  <0 an error occurred during allocation,
> +   * -EBUSY allocation is in the progress,
> +  *  0  allocation made successfuly.
> +  */
>   int error;
> - bool prepare_done;
>  
> - /* Private to the work thread, until prepare_done is true,
> -  * then protected by kvm->resize_hpt_sem */
> + /* Private to the work thread, until error != -EBUSY,
> +  * then protected by kvm->lock.
> +  */
>   struct kvm_hpt_info hpt;
>  };
>  
> @@ -1432,15 +1438,21 @@ static void resize_hpt_prepare_work(struct 
> work_struct *work)
>   struct kvm *kvm = resize->kvm;
>   int err;
>  
> + BUG_ON(resize->error != -EBUSY);
> +
>   resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
>resize->order);
>  
>   err = resize_hpt_allocate(resize);
>  
> + /* We have strict assumption about -EBUSY
> +  * when preparing for HPT resize.
> +  */
> + BUG_ON(err == -EBUSY);
> +
>   mutex_lock(>lock);
>  
>   resize->error = err;
> - resize->prepare_done = true;
>  
>   mutex_unlock(>lock);
>  }
> @@ -1465,14 +1477,12 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>  
>   if (resize) {
>   if (resize->order == shift) {
> - /* Suitable resize in progress */
> - if (resize->prepare_done) {
> - ret = resize->error;
> - if (ret != 0)
> - resize_hpt_release(kvm, resize);
> - } else {
> + /* Suitable resize in progress? */
> + ret = resize->error;
> + if (ret == -EBUSY)
>   ret = 100; /* estimated time in ms */
> - }
> + else if (ret)
> + resize_hpt_release(kvm, resize);
>  
>   goto out;
>   }
> @@ -1492,6 +1502,8 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
>   ret = -ENOMEM;
>   goto out;
>   }
> +
> + resize->error = -EBUSY;
>   resize->order = shift;
>   resize->kvm = kvm;
>   INIT_WORK(>work, resize_hpt_prepare_work);
> @@ -1546,16 +1558,12 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
>   if (!resize || (resize->order != shift))
>   goto out;
>  
> - ret = -EBUSY;
> - if (!resize->prepare_done)
> - goto out;
> -
>   ret = resize->error;
> - if (ret != 0)
> + if (ret)
>   goto out;
>  
>   ret = resize_hpt_rehash(resize);
> - if (ret != 0)
> + if (ret)
>   goto out;
>  
>   resize_hpt_pivot(resize);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 3/5] powerpc-pseries: Delete an unnecessary variable initialisation in iommu_pseries_alloc_group()

2017-10-19 Thread David Gibson
On Thu, Oct 19, 2017 at 03:55:59PM +0300, Dan Carpenter wrote:
> On Thu, Oct 19, 2017 at 01:37:18PM +0200, Michal Suchánek wrote:
> > Hello,
> > 
> > On Wed, 18 Oct 2017 21:24:25 +0200
> > SF Markus Elfring <elfr...@users.sourceforge.net> wrote:
> > 
> > > From: Markus Elfring <elfr...@users.sourceforge.net>
> > > Date: Wed, 18 Oct 2017 19:14:39 +0200
> > > 
> > > The variable "table_group" will be set to an appropriate pointer.
> > > Thus omit the explicit initialisation at the beginning.
> > > 
> > > Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
> > > ---
> > >  arch/powerpc/platforms/pseries/iommu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > > b/arch/powerpc/platforms/pseries/iommu.c index
> > > b37d4fb20d1c..b6c12b8e3ace 100644 ---
> > > a/arch/powerpc/platforms/pseries/iommu.c +++
> > > b/arch/powerpc/platforms/pseries/iommu.c @@ -55,7 +55,7 @@
> > >  
> > >  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> > >  {
> > > - struct iommu_table_group *table_group = NULL;
> > > + struct iommu_table_group *table_group;
> > >   struct iommu_table *tbl = NULL;
> > >   struct iommu_table_group_link *tgl = NULL;
> > >  
> > 
> > I think initializing pointers to NULL is generally a good idea.
> > 
> > If there is no use of the variable before it is reinitialized by
> > allocation gcc is free to optimize out the variable and its initial
> > value.
> > 
> > On the other hand, if the code is changed later and use of the variable
> > becomes possible you may crash (and get a gcc warning, too).
> 
> No, it's the opposite. GCC doesn't warn about potential NULL
> dereferences, it warns about uninitialized variables.  By initializing
> it to a bogus value, you're deliberately disabling static analysis.
> We do see bugs where, if only people didn't initialize stuff to bogus
> values, then the bug would have been caught before it was merged.

Seconded, I've seen this a number of times.  I think this alone is a
reason not to initiaize locals if they don't require it.
 
> You might imagine that static analysis tools would catch NULL
> dereferences but it's actually really really hard.  We used to have
> an __uninitialized_var() macro which was used to silence GCC false
> positives, but now we initialize the pointers to NULL instead.  So
> most of the code that you're dealing with is stuff that was marked as
> too hard for GCC to understand.  It's tricky.
> 
> regards,
> dan carpenter
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 3/5] powerpc-pseries: Delete an unnecessary variable initialisation in iommu_pseries_alloc_group()

2017-10-19 Thread David Gibson
On Thu, Oct 19, 2017 at 03:55:59PM +0300, Dan Carpenter wrote:
> On Thu, Oct 19, 2017 at 01:37:18PM +0200, Michal Suchánek wrote:
> > Hello,
> > 
> > On Wed, 18 Oct 2017 21:24:25 +0200
> > SF Markus Elfring  wrote:
> > 
> > > From: Markus Elfring 
> > > Date: Wed, 18 Oct 2017 19:14:39 +0200
> > > 
> > > The variable "table_group" will be set to an appropriate pointer.
> > > Thus omit the explicit initialisation at the beginning.
> > > 
> > > Signed-off-by: Markus Elfring 
> > > ---
> > >  arch/powerpc/platforms/pseries/iommu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > > b/arch/powerpc/platforms/pseries/iommu.c index
> > > b37d4fb20d1c..b6c12b8e3ace 100644 ---
> > > a/arch/powerpc/platforms/pseries/iommu.c +++
> > > b/arch/powerpc/platforms/pseries/iommu.c @@ -55,7 +55,7 @@
> > >  
> > >  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> > >  {
> > > - struct iommu_table_group *table_group = NULL;
> > > + struct iommu_table_group *table_group;
> > >   struct iommu_table *tbl = NULL;
> > >   struct iommu_table_group_link *tgl = NULL;
> > >  
> > 
> > I think initializing pointers to NULL is generally a good idea.
> > 
> > If there is no use of the variable before it is reinitialized by
> > allocation gcc is free to optimize out the variable and its initial
> > value.
> > 
> > On the other hand, if the code is changed later and use of the variable
> > becomes possible you may crash (and get a gcc warning, too).
> 
> No, it's the opposite. GCC doesn't warn about potential NULL
> dereferences, it warns about uninitialized variables.  By initializing
> it to a bogus value, you're deliberately disabling static analysis.
> We do see bugs where, if only people didn't initialize stuff to bogus
> values, then the bug would have been caught before it was merged.

Seconded, I've seen this a number of times.  I think this alone is a
reason not to initiaize locals if they don't require it.
 
> You might imagine that static analysis tools would catch NULL
> dereferences but it's actually really really hard.  We used to have
> an __uninitialized_var() macro which was used to silence GCC false
> positives, but now we initialize the pointers to NULL instead.  So
> most of the code that you're dealing with is stuff that was marked as
> too hard for GCC to understand.  It's tricky.
> 
> regards,
> dan carpenter
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
lem unless you have the base DT.

Right.  You can check some things in overlays, but when parsing
something requires information from the base tree, there is indeed not
much that can be done.

> > You can avoid the problem in your example dts with 
> > "-Wno-interrupts_property"
> >
> >   dtc -Wno-interrupts_property fpga_01_a.dts
> >
> > The larger set of other checks that might trigger the assert is too large
> > for me to want to add "-Wno-" flags for all of them to the command line
> > (as temporary workarounds).
> 
> David thought more switches were better.

Right, as a rule, I like fine configurability of the checks.

But.. there's a trick to make things easier to configure in batches:
when you enable a warning, you automatically enable all warnings that
warning has as prerequisites.  And when you disable a warning, every
other warning that has it as a prerequisite is also disabled.

So by adding some dummy checks with the right depencies, you can allow
enabling and disabling of groups of related checks with a single
command line option.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
ght.  You can check some things in overlays, but when parsing
something requires information from the base tree, there is indeed not
much that can be done.

> > You can avoid the problem in your example dts with 
> > "-Wno-interrupts_property"
> >
> >   dtc -Wno-interrupts_property fpga_01_a.dts
> >
> > The larger set of other checks that might trigger the assert is too large
> > for me to want to add "-Wno-" flags for all of them to the command line
> > (as temporary workarounds).
> 
> David thought more switches were better.

Right, as a rule, I like fine configurability of the checks.

But.. there's a trick to make things easier to configure in batches:
when you enable a warning, you automatically enable all warnings that
warning has as prerequisites.  And when you disable a warning, every
other warning that has it as a prerequisite is also disabled.

So by adding some dummy checks with the right depencies, you can allow
enabling and disabling of groups of related checks with a single
command line option.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
a78ec0
> >>>checks: add interrupts property check
> >>>
> >>> There are a bunch of other new checks that call get_node_by_phandle(),
> >>> and thus could trigger the assertion.
> >>>
> >>> I'm guessing that those checks would also trigger the assert if an
> >>> overlay contained something that would lead to one of the other checks
> >>> being processed.
> >>
> >> You won't get an assert because I check for 0 or -1 and skip the check
> >> in those cases. The interrupts check missed that condition.
> >
> > Awesome, thank for confirming that.  That means the temporary work around
> > is quite easy.
> >
> >>
> >> However, as shown above, you will get an erroneous warning because it
> >> just skips 1 cell and goes to the next to handle the case where the
> >> phandle is optional and you want a fixed number of elements.
> >
> > Just for those reading along at home, with the one warning disabled, the
> > messages become:
> >
> >$  dtc -Wno-interrupts_property 
> > socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts >junk.dtb
> >: Warning (unit_address_vs_reg): Node /fragment@0 has a unit 
> > name, but no reg property
> >: Warning (unit_address_vs_reg): Node /fragment@0/__overlay__ 
> > has a reg or ranges property, but no unit name
> 
> Look like errors to me.
> 
> >> I guess we can't validate overlays which is unfortunate. I don't think
> >> that's a solvable problem unless you have the base DT.
> >
> > Yes, maybe.  But there are still some useful warnings for overlays, maybe.  
> > I'm
> > not sure what the implications of the "range_format" warning that I will 
> > show
> > below are (I'd actually have to spend a few minutes thinking about ranges, 
> > and
> > I don't have the cycles right now).
> 
> Well, yes all the simple localized checks should work, but as we add
> more complex checks with either dtc or the schema we'll have after
> ELCE next week there's a lot we can't validate. Perhaps we have to
> place some restrictions on overlays so we can validate them better.

Reworking overlay handling in dtc will help at least insofar as making
it easier to work out which checks can be applied to overlay fragments
and which only make sense on a fully assembled tree.  There will still
be plenty of things that can't be checked on overlays, since they
absolitely require information from the base tree.

This is a reason I really prefer the (alas, stalled) connector
proposal, since it makes it much more explicit what the external
dependencies of the fragments are.

> I do wonder if we could add tags to phandles to make them
> identifiable. Perhaps making them all 0xFF?? instead of starting
> at 1. 2^24 phandles should be enough for anyone(TM).

True, in a sense.  However this is a problem for systems generating
FDTs from a real OF.  On real OF, the phandles are usually pointers
inside OF, and if it happens to allocate it's memory at the top of the
address space, phandles in the 0xff?? range are entirely
plausible.

0 and -1 are the only values explicitly disallowed by IEEE1275, so
they're the only safe values to use as markers.

> That would not
> completely solve this problem, but then we could at least find the
> phandles in a property.

No, it wouldn't do that reliably, because a 0xff?? value could
just as well appear as another integer value in a property.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
> >>>
> >>> There are a bunch of other new checks that call get_node_by_phandle(),
> >>> and thus could trigger the assertion.
> >>>
> >>> I'm guessing that those checks would also trigger the assert if an
> >>> overlay contained something that would lead to one of the other checks
> >>> being processed.
> >>
> >> You won't get an assert because I check for 0 or -1 and skip the check
> >> in those cases. The interrupts check missed that condition.
> >
> > Awesome, thank for confirming that.  That means the temporary work around
> > is quite easy.
> >
> >>
> >> However, as shown above, you will get an erroneous warning because it
> >> just skips 1 cell and goes to the next to handle the case where the
> >> phandle is optional and you want a fixed number of elements.
> >
> > Just for those reading along at home, with the one warning disabled, the
> > messages become:
> >
> >$  dtc -Wno-interrupts_property 
> > socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts >junk.dtb
> >: Warning (unit_address_vs_reg): Node /fragment@0 has a unit 
> > name, but no reg property
> >: Warning (unit_address_vs_reg): Node /fragment@0/__overlay__ 
> > has a reg or ranges property, but no unit name
> 
> Look like errors to me.
> 
> >> I guess we can't validate overlays which is unfortunate. I don't think
> >> that's a solvable problem unless you have the base DT.
> >
> > Yes, maybe.  But there are still some useful warnings for overlays, maybe.  
> > I'm
> > not sure what the implications of the "range_format" warning that I will 
> > show
> > below are (I'd actually have to spend a few minutes thinking about ranges, 
> > and
> > I don't have the cycles right now).
> 
> Well, yes all the simple localized checks should work, but as we add
> more complex checks with either dtc or the schema we'll have after
> ELCE next week there's a lot we can't validate. Perhaps we have to
> place some restrictions on overlays so we can validate them better.

Reworking overlay handling in dtc will help at least insofar as making
it easier to work out which checks can be applied to overlay fragments
and which only make sense on a fully assembled tree.  There will still
be plenty of things that can't be checked on overlays, since they
absolitely require information from the base tree.

This is a reason I really prefer the (alas, stalled) connector
proposal, since it makes it much more explicit what the external
dependencies of the fragments are.

> I do wonder if we could add tags to phandles to make them
> identifiable. Perhaps making them all 0xFF?? instead of starting
> at 1. 2^24 phandles should be enough for anyone(TM).

True, in a sense.  However this is a problem for systems generating
FDTs from a real OF.  On real OF, the phandles are usually pointers
inside OF, and if it happens to allocate it's memory at the top of the
address space, phandles in the 0xff?? range are entirely
plausible.

0 and -1 are the only values explicitly disallowed by IEEE1275, so
they're the only safe values to use as markers.

> That would not
> completely solve this problem, but then we could at least find the
> phandles in a property.

No, it wouldn't do that reliably, because a 0xff?? value could
just as well appear as another integer value in a property.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
tion is "/soc/base_fpga_region"
> _region {
> 
>   ranges = <0x 0x 0xc000 0x0004>,
><0x0001 0x 0xff20 0x1000>;
> 
>   external-fpga-config;
> 
>   #address-cells = <2>;
>   #size-cells = <1>;
> 
>   fpga_pr_region0 {
>   compatible = "fpga-region";
>   fpga-bridges = <_controller_0>;
>   ranges;
>   };
> 
>   freeze_controller_0: freeze_controller@10450 {
>   compatible = "altr,freeze-bridge-controller";
>   reg = <0x0001 0x0450 0x0010>;
>   interrupt-parent = <>;  /* <-- remove to 
> fix build */
>   interrupts = <0 21 4>;
>   };
> };
> 
> Of course, if this was a real transformation, I would remove all the
> extra tabbing.  But the current form makes it easier to see that all
> of the stuff that is highly indented is unchanged from the original
> dts file.
> 
> $ diff -u socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts fpga_01_a.dts
> --- socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts  2017-10-18 
> 15:18:43.123137508 -0700
> +++ fpga_01_a.dts 2017-10-18 15:18:11.587136897 -0700
> @@ -1,12 +1,10 @@
>  /dts-v1/;
>  /plugin/;
> -/ {
> - fragment@0 {
> - target-path = "/soc/base_fpga_region";
> - #address-cells = <1>;
> - #size-cells = <1>;
>  
> - __overlay__ {
> +// This assumes an existing label in the base dts
> +// whose location is "/soc/base_fpga_region"
> +_region {
> +
>   ranges = <0x 0x 0xc000 0x0004>,
><0x0001 0x 0xff20 0x1000>;
>  
> @@ -27,6 +25,4 @@
>   interrupt-parent = <>;  /* <-- remove to 
> fix build */
>   interrupts = <0 21 4>;
>   };
> - };
> - };
>  };
> 
> 
> -Frank

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: dtc issue with overlays starting in next-20171009

2017-10-18 Thread David Gibson
> _region {
> 
>   ranges = <0x 0x 0xc000 0x0004>,
><0x0001 0x 0xff20 0x1000>;
> 
>   external-fpga-config;
> 
>   #address-cells = <2>;
>   #size-cells = <1>;
> 
>   fpga_pr_region0 {
>   compatible = "fpga-region";
>   fpga-bridges = <_controller_0>;
>       ranges;
>   };
> 
>   freeze_controller_0: freeze_controller@10450 {
>   compatible = "altr,freeze-bridge-controller";
>   reg = <0x0001 0x0450 0x0010>;
>   interrupt-parent = <>;  /* <-- remove to 
> fix build */
>   interrupts = <0 21 4>;
>   };
> };
> 
> Of course, if this was a real transformation, I would remove all the
> extra tabbing.  But the current form makes it easier to see that all
> of the stuff that is highly indented is unchanged from the original
> dts file.
> 
> $ diff -u socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts fpga_01_a.dts
> --- socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dts  2017-10-18 
> 15:18:43.123137508 -0700
> +++ fpga_01_a.dts 2017-10-18 15:18:11.587136897 -0700
> @@ -1,12 +1,10 @@
>  /dts-v1/;
>  /plugin/;
> -/ {
> - fragment@0 {
> - target-path = "/soc/base_fpga_region";
> - #address-cells = <1>;
> - #size-cells = <1>;
>  
> - __overlay__ {
> +// This assumes an existing label in the base dts
> +// whose location is "/soc/base_fpga_region"
> +_region {
> +
>   ranges = <0x 0x 0xc000 0x0004>,
><0x0001 0x 0xff20 0x1000>;
>  
> @@ -27,6 +25,4 @@
>   interrupt-parent = <>;  /* <-- remove to 
> fix build */
>   interrupts = <0 21 4>;
>   };
> - };
> - };
>  };
> 
> 
> -Frank

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-10 Thread David Gibson
On Tue, Oct 10, 2017 at 06:19:03PM +0300, Pantelis Antoniou wrote:
> Hi David,
> 
> > On Oct 10, 2017, at 04:50 , David Gibson <da...@gibson.dropbear.id.au> 
> > wrote:
> > 
> > On Mon, Oct 09, 2017 at 06:07:28PM +0300, Pantelis Antoniou wrote:
> >> Hi David,
> >> 
> >>> On Oct 9, 2017, at 03:00 , David Gibson <da...@gibson.dropbear.id.au> 
> >>> wrote:
> >>> 
> >>> On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> >>>> On 10/07/17 03:23, Pantelis Antoniou wrote:
> >>>>> Hi Rob,
> >>>>> 
> >>>>>> On Oct 6, 2017, at 16:55 , Rob Herring <robherri...@gmail.com> wrote:
> >>>>>> 
> >>>>>> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >>>>>> <pantelis.anton...@konsulko.com> wrote:
> >>>>>>> Hi Rob,
> >>>> 
> >>>> < snip >
> >>>> 
> >>>>>>> eBPF is portable, can be serialized after compiling in the schema file
> >>>>>>> and can be executed in the kernel.
> >>>>>> 
> >>>>>> Executing in the kernel is a non-goal for me.
> >>>> 
> >>>> Executing in the kernel is an anti-goal for me.
> >>>> 
> >>>> We are trying to reduce the device tree footprint inside the kernel,
> >>>> not increase it.
> >>>> 
> >>>> 99.99% of the validation should be possible statically, in the compile
> >>>> phase.
> >>>> 
> >>>> 
> >>>>>>> By stripping out all documentation related properties and nodes 
> >>>>>>> keeping
> >>>>>>> only the compiled filters you can generate a dtb blob that passed to
> >>>>>>> kernel can be used for verification of all runtime changes in the
> >>>>>>> kernel's live tree. eBPF is enforcing an execution model that is 
> >>>>>>> 'safe'
> >>>>>>> so we can be sure that no foul play is possible.
> >>>> 
> >>>> Run time changes can be assumed correct (short of bugs in the overlay
> >>>> application code), if the base tree is validated, the overlay is 
> >>>> validated,
> >>>> and the interface between the live tree and the overlay is a
> >>>> connector.
> >>> 
> >>> In addition, no amount of schema validation can really protect the
> >>> kernel from a bad DT.  Even if the schemas can 100% verify that the DT
> >>> is "syntactically" correct, which is ambitious, it can't protect
> >>> against a DT which is in the right form, but contains information that
> >>> is simply wrong for the hardware in question.  That can stuff the
> >>> kernel at least as easily as an incorrectly formatted DT.
> >>> 
> >> 
> >> I disagree.
> >> 
> >> There are multiple levels of validation. For now we’re only talking about
> >> binding validation. There can be SoC level validation, board level 
> >> validation,
> >> revision level validation and finally application specific validation.
> >> 
> >> Binding validation is making sure properties/nodes follow the binding 
> >> document.
> >> For instance that for a foo device there’s a mandatory interrupt property.
> >> 
> >> Simplified
> >> 
> >> interrupt = ;
> >> 
> >> Binding validation would ‘catch’ errors like assigning a string or not 
> >> having the
> >> interrupt property available.
> >> 
> >> SoC level validation would list the available interrupt number that a given
> >> SoC would support for that device.
> >> 
> >> For example that interrupt can only take the values 10 or 99 in a given 
> >> SoC.
> >> 
> >> Board level validation would narrow this down even further to a value of 
> >> 10 for
> >> a given board model.
> > 
> >> Similar revision level validation would place further restriction on the 
> >> allowed
> >> configuration.
> >> 
> >> Finally application specific validation could place restriction based on 
> >> the intended
> >> application that piece of hardware is used for. For instance devices that 
> >> should not
> >> exceed a given power budget would have restrictions on the clock frequency

Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-10 Thread David Gibson
On Tue, Oct 10, 2017 at 06:19:03PM +0300, Pantelis Antoniou wrote:
> Hi David,
> 
> > On Oct 10, 2017, at 04:50 , David Gibson  
> > wrote:
> > 
> > On Mon, Oct 09, 2017 at 06:07:28PM +0300, Pantelis Antoniou wrote:
> >> Hi David,
> >> 
> >>> On Oct 9, 2017, at 03:00 , David Gibson  
> >>> wrote:
> >>> 
> >>> On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> >>>> On 10/07/17 03:23, Pantelis Antoniou wrote:
> >>>>> Hi Rob,
> >>>>> 
> >>>>>> On Oct 6, 2017, at 16:55 , Rob Herring  wrote:
> >>>>>> 
> >>>>>> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >>>>>>  wrote:
> >>>>>>> Hi Rob,
> >>>> 
> >>>> < snip >
> >>>> 
> >>>>>>> eBPF is portable, can be serialized after compiling in the schema file
> >>>>>>> and can be executed in the kernel.
> >>>>>> 
> >>>>>> Executing in the kernel is a non-goal for me.
> >>>> 
> >>>> Executing in the kernel is an anti-goal for me.
> >>>> 
> >>>> We are trying to reduce the device tree footprint inside the kernel,
> >>>> not increase it.
> >>>> 
> >>>> 99.99% of the validation should be possible statically, in the compile
> >>>> phase.
> >>>> 
> >>>> 
> >>>>>>> By stripping out all documentation related properties and nodes 
> >>>>>>> keeping
> >>>>>>> only the compiled filters you can generate a dtb blob that passed to
> >>>>>>> kernel can be used for verification of all runtime changes in the
> >>>>>>> kernel's live tree. eBPF is enforcing an execution model that is 
> >>>>>>> 'safe'
> >>>>>>> so we can be sure that no foul play is possible.
> >>>> 
> >>>> Run time changes can be assumed correct (short of bugs in the overlay
> >>>> application code), if the base tree is validated, the overlay is 
> >>>> validated,
> >>>> and the interface between the live tree and the overlay is a
> >>>> connector.
> >>> 
> >>> In addition, no amount of schema validation can really protect the
> >>> kernel from a bad DT.  Even if the schemas can 100% verify that the DT
> >>> is "syntactically" correct, which is ambitious, it can't protect
> >>> against a DT which is in the right form, but contains information that
> >>> is simply wrong for the hardware in question.  That can stuff the
> >>> kernel at least as easily as an incorrectly formatted DT.
> >>> 
> >> 
> >> I disagree.
> >> 
> >> There are multiple levels of validation. For now we’re only talking about
> >> binding validation. There can be SoC level validation, board level 
> >> validation,
> >> revision level validation and finally application specific validation.
> >> 
> >> Binding validation is making sure properties/nodes follow the binding 
> >> document.
> >> For instance that for a foo device there’s a mandatory interrupt property.
> >> 
> >> Simplified
> >> 
> >> interrupt = ;
> >> 
> >> Binding validation would ‘catch’ errors like assigning a string or not 
> >> having the
> >> interrupt property available.
> >> 
> >> SoC level validation would list the available interrupt number that a given
> >> SoC would support for that device.
> >> 
> >> For example that interrupt can only take the values 10 or 99 in a given 
> >> SoC.
> >> 
> >> Board level validation would narrow this down even further to a value of 
> >> 10 for
> >> a given board model.
> > 
> >> Similar revision level validation would place further restriction on the 
> >> allowed
> >> configuration.
> >> 
> >> Finally application specific validation could place restriction based on 
> >> the intended
> >> application that piece of hardware is used for. For instance devices that 
> >> should not
> >> exceed a given power budget would have restrictions on the clock frequency 
> >> of the processor
> >> or bus frequencies etc.
> > 
> > This doesn't help.  In order to do this, th

Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-09 Thread David Gibson
On Mon, Oct 09, 2017 at 06:07:28PM +0300, Pantelis Antoniou wrote:
> Hi David,
> 
> > On Oct 9, 2017, at 03:00 , David Gibson <da...@gibson.dropbear.id.au> wrote:
> > 
> > On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> >> On 10/07/17 03:23, Pantelis Antoniou wrote:
> >>> Hi Rob,
> >>> 
> >>>> On Oct 6, 2017, at 16:55 , Rob Herring <robherri...@gmail.com> wrote:
> >>>> 
> >>>> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >>>> <pantelis.anton...@konsulko.com> wrote:
> >>>>> Hi Rob,
> >> 
> >> < snip >
> >> 
> >>>>> eBPF is portable, can be serialized after compiling in the schema file
> >>>>> and can be executed in the kernel.
> >>>> 
> >>>> Executing in the kernel is a non-goal for me.
> >> 
> >> Executing in the kernel is an anti-goal for me.
> >> 
> >> We are trying to reduce the device tree footprint inside the kernel,
> >> not increase it.
> >> 
> >> 99.99% of the validation should be possible statically, in the compile
> >> phase.
> >> 
> >> 
> >>>>> By stripping out all documentation related properties and nodes keeping
> >>>>> only the compiled filters you can generate a dtb blob that passed to
> >>>>> kernel can be used for verification of all runtime changes in the
> >>>>> kernel's live tree. eBPF is enforcing an execution model that is 'safe'
> >>>>> so we can be sure that no foul play is possible.
> >> 
> >> Run time changes can be assumed correct (short of bugs in the overlay
> >> application code), if the base tree is validated, the overlay is validated,
> >> and the interface between the live tree and the overlay is a
> >> connector.
> > 
> > In addition, no amount of schema validation can really protect the
> > kernel from a bad DT.  Even if the schemas can 100% verify that the DT
> > is "syntactically" correct, which is ambitious, it can't protect
> > against a DT which is in the right form, but contains information that
> > is simply wrong for the hardware in question.  That can stuff the
> > kernel at least as easily as an incorrectly formatted DT.
> > 
> 
> I disagree.
> 
> There are multiple levels of validation. For now we’re only talking about
> binding validation. There can be SoC level validation, board level validation,
> revision level validation and finally application specific validation.
> 
> Binding validation is making sure properties/nodes follow the binding 
> document.
> For instance that for a foo device there’s a mandatory interrupt property.
> 
> Simplified
> 
> interrupt = ;
> 
> Binding validation would ‘catch’ errors like assigning a string or not having 
> the
> interrupt property available.
> 
> SoC level validation would list the available interrupt number that a given
> SoC would support for that device.
> 
> For example that interrupt can only take the values 10 or 99 in a given SoC.
> 
> Board level validation would narrow this down even further to a value of 10 
> for
> a given board model.

> Similar revision level validation would place further restriction on the 
> allowed
> configuration.
> 
> Finally application specific validation could place restriction based on the 
> intended
> application that piece of hardware is used for. For instance devices that 
> should not
> exceed a given power budget would have restrictions on the clock frequency of 
> the processor
> or bus frequencies etc.

This doesn't help.  In order to do this, the validator would need
information that's essentially equivalent to the content of DT, at
which point there's no point to the DT at all - and you're left with
the problem of validating the information that the validator has.

Fundamentally a validator that's useful *cannot* tell the difference
between a correct tree and one which _could_ be correct for some
theoretical hardware, but isn't for this particular hardware.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-09 Thread David Gibson
On Mon, Oct 09, 2017 at 06:07:28PM +0300, Pantelis Antoniou wrote:
> Hi David,
> 
> > On Oct 9, 2017, at 03:00 , David Gibson  wrote:
> > 
> > On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> >> On 10/07/17 03:23, Pantelis Antoniou wrote:
> >>> Hi Rob,
> >>> 
> >>>> On Oct 6, 2017, at 16:55 , Rob Herring  wrote:
> >>>> 
> >>>> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >>>>  wrote:
> >>>>> Hi Rob,
> >> 
> >> < snip >
> >> 
> >>>>> eBPF is portable, can be serialized after compiling in the schema file
> >>>>> and can be executed in the kernel.
> >>>> 
> >>>> Executing in the kernel is a non-goal for me.
> >> 
> >> Executing in the kernel is an anti-goal for me.
> >> 
> >> We are trying to reduce the device tree footprint inside the kernel,
> >> not increase it.
> >> 
> >> 99.99% of the validation should be possible statically, in the compile
> >> phase.
> >> 
> >> 
> >>>>> By stripping out all documentation related properties and nodes keeping
> >>>>> only the compiled filters you can generate a dtb blob that passed to
> >>>>> kernel can be used for verification of all runtime changes in the
> >>>>> kernel's live tree. eBPF is enforcing an execution model that is 'safe'
> >>>>> so we can be sure that no foul play is possible.
> >> 
> >> Run time changes can be assumed correct (short of bugs in the overlay
> >> application code), if the base tree is validated, the overlay is validated,
> >> and the interface between the live tree and the overlay is a
> >> connector.
> > 
> > In addition, no amount of schema validation can really protect the
> > kernel from a bad DT.  Even if the schemas can 100% verify that the DT
> > is "syntactically" correct, which is ambitious, it can't protect
> > against a DT which is in the right form, but contains information that
> > is simply wrong for the hardware in question.  That can stuff the
> > kernel at least as easily as an incorrectly formatted DT.
> > 
> 
> I disagree.
> 
> There are multiple levels of validation. For now we’re only talking about
> binding validation. There can be SoC level validation, board level validation,
> revision level validation and finally application specific validation.
> 
> Binding validation is making sure properties/nodes follow the binding 
> document.
> For instance that for a foo device there’s a mandatory interrupt property.
> 
> Simplified
> 
> interrupt = ;
> 
> Binding validation would ‘catch’ errors like assigning a string or not having 
> the
> interrupt property available.
> 
> SoC level validation would list the available interrupt number that a given
> SoC would support for that device.
> 
> For example that interrupt can only take the values 10 or 99 in a given SoC.
> 
> Board level validation would narrow this down even further to a value of 10 
> for
> a given board model.

> Similar revision level validation would place further restriction on the 
> allowed
> configuration.
> 
> Finally application specific validation could place restriction based on the 
> intended
> application that piece of hardware is used for. For instance devices that 
> should not
> exceed a given power budget would have restrictions on the clock frequency of 
> the processor
> or bus frequencies etc.

This doesn't help.  In order to do this, the validator would need
information that's essentially equivalent to the content of DT, at
which point there's no point to the DT at all - and you're left with
the problem of validating the information that the validator has.

Fundamentally a validator that's useful *cannot* tell the difference
between a correct tree and one which _could_ be correct for some
theoretical hardware, but isn't for this particular hardware.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-08 Thread David Gibson
On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> On 10/07/17 03:23, Pantelis Antoniou wrote:
> > Hi Rob,
> > 
> >> On Oct 6, 2017, at 16:55 , Rob Herring <robherri...@gmail.com> wrote:
> >>
> >> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >> <pantelis.anton...@konsulko.com> wrote:
> >>> Hi Rob,
> 
> < snip >
> 
> >>> eBPF is portable, can be serialized after compiling in the schema file
> >>> and can be executed in the kernel.
> >>
> >> Executing in the kernel is a non-goal for me.
> 
> Executing in the kernel is an anti-goal for me.
> 
> We are trying to reduce the device tree footprint inside the kernel,
> not increase it.
> 
> 99.99% of the validation should be possible statically, in the compile
> phase.
> 
> 
> >>> By stripping out all documentation related properties and nodes keeping
> >>> only the compiled filters you can generate a dtb blob that passed to
> >>> kernel can be used for verification of all runtime changes in the
> >>> kernel's live tree. eBPF is enforcing an execution model that is 'safe'
> >>> so we can be sure that no foul play is possible.
> 
> Run time changes can be assumed correct (short of bugs in the overlay
> application code), if the base tree is validated, the overlay is validated,
> and the interface between the live tree and the overlay is a
> connector.

In addition, no amount of schema validation can really protect the
kernel from a bad DT.  Even if the schemas can 100% verify that the DT
is "syntactically" correct, which is ambitious, it can't protect
against a DT which is in the right form, but contains information that
is simply wrong for the hardware in question.  That can stuff the
kernel at least as easily as an incorrectly formatted DT.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC] yamldt v0.5, now a DTS compiler too

2017-10-08 Thread David Gibson
On Sun, Oct 08, 2017 at 04:08:03PM -0700, Frank Rowand wrote:
> On 10/07/17 03:23, Pantelis Antoniou wrote:
> > Hi Rob,
> > 
> >> On Oct 6, 2017, at 16:55 , Rob Herring  wrote:
> >>
> >> On Tue, Oct 3, 2017 at 12:39 PM, Pantelis Antoniou
> >>  wrote:
> >>> Hi Rob,
> 
> < snip >
> 
> >>> eBPF is portable, can be serialized after compiling in the schema file
> >>> and can be executed in the kernel.
> >>
> >> Executing in the kernel is a non-goal for me.
> 
> Executing in the kernel is an anti-goal for me.
> 
> We are trying to reduce the device tree footprint inside the kernel,
> not increase it.
> 
> 99.99% of the validation should be possible statically, in the compile
> phase.
> 
> 
> >>> By stripping out all documentation related properties and nodes keeping
> >>> only the compiled filters you can generate a dtb blob that passed to
> >>> kernel can be used for verification of all runtime changes in the
> >>> kernel's live tree. eBPF is enforcing an execution model that is 'safe'
> >>> so we can be sure that no foul play is possible.
> 
> Run time changes can be assumed correct (short of bugs in the overlay
> application code), if the base tree is validated, the overlay is validated,
> and the interface between the live tree and the overlay is a
> connector.

In addition, no amount of schema validation can really protect the
kernel from a bad DT.  Even if the schemas can 100% verify that the DT
is "syntactically" correct, which is ambitious, it can't protect
against a DT which is in the right form, but contains information that
is simply wrong for the hardware in question.  That can stuff the
kernel at least as easily as an incorrectly formatted DT.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC PATCH v5 1/5] iommu: Add capabilities to a group

2017-08-09 Thread David Gibson
On Mon, Aug 07, 2017 at 05:25:44PM +1000, Alexey Kardashevskiy wrote:
> This introduces capabilities to IOMMU groups. The first defined
> capability is IOMMU_GROUP_CAP_ISOLATE_MSIX which tells the IOMMU
> group users that a particular IOMMU group is capable of MSIX message
> filtering; this is useful when deciding whether or not to allow mapping
> of MSIX table to the userspace. Various architectures will enable it
> when they decide that it is safe to do so.
> 
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

This seems like a reasonable concept that's probably useful for
something, whether or not it's the best approach for the problem at
hand.

> ---
>  include/linux/iommu.h | 20 
>  drivers/iommu/iommu.c | 28 
>  2 files changed, 48 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2cb54adc4a33..6b6f3c2f4904 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -155,6 +155,9 @@ struct iommu_resv_region {
>   enum iommu_resv_typetype;
>  };
>  
> +/* IOMMU group capabilities */
> +#define IOMMU_GROUP_CAP_ISOLATE_MSIX (1U)
> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -312,6 +315,11 @@ extern void *iommu_group_get_iommudata(struct 
> iommu_group *group);
>  extern void iommu_group_set_iommudata(struct iommu_group *group,
> void *iommu_data,
> void (*release)(void *iommu_data));
> +extern void iommu_group_set_caps(struct iommu_group *group,
> +  unsigned long clearcaps,
> +  unsigned long setcaps);
> +extern bool iommu_group_is_capable(struct iommu_group *group,
> +unsigned long cap);
>  extern int iommu_group_set_name(struct iommu_group *group, const char *name);
>  extern int iommu_group_add_device(struct iommu_group *group,
> struct device *dev);
> @@ -513,6 +521,18 @@ static inline void iommu_group_set_iommudata(struct 
> iommu_group *group,
>  {
>  }
>  
> +static inline void iommu_group_set_caps(struct iommu_group *group,
> + unsigned long clearcaps,
> + unsigned long setcaps)
> +{
> +}
> +
> +static inline bool iommu_group_is_capable(struct iommu_group *group,
> +   unsigned long cap)
> +{
> + return false;
> +}
> +
>  static inline int iommu_group_set_name(struct iommu_group *group,
>  const char *name)
>  {
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f6ea160afed..6b2c34fe2c3d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -52,6 +52,7 @@ struct iommu_group {
>   void (*iommu_data_release)(void *iommu_data);
>   char *name;
>   int id;
> + unsigned long caps;
>   struct iommu_domain *default_domain;
>   struct iommu_domain *domain;
>  };
> @@ -447,6 +448,33 @@ void iommu_group_set_iommudata(struct iommu_group 
> *group, void *iommu_data,
>  EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
>  
>  /**
> + * iommu_group_set_caps - Change the group capabilities
> + * @group: the group
> + * @clearcaps: capabilities mask to remove
> + * @setcaps: capabilities mask to add
> + *
> + * IOMMU groups can be capable of various features which device drivers
> + * may read and adjust the behavior.
> + */
> +void iommu_group_set_caps(struct iommu_group *group,
> + unsigned long clearcaps, unsigned long setcaps)
> +{
> + group->caps &= ~clearcaps;
> + group->caps |= setcaps;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_set_caps);
> +
> +/**
> + * iommu_group_is_capable - Returns if a group capability is present
> + * @group: the group
> + */
> +bool iommu_group_is_capable(struct iommu_group *group, unsigned long cap)
> +{
> + return !!(group->caps & cap);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_is_capable);
> +
> +/**
>   * iommu_group_set_name - set name for a group
>   * @group: the group
>   * @name: name

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC PATCH v5 1/5] iommu: Add capabilities to a group

2017-08-09 Thread David Gibson
On Mon, Aug 07, 2017 at 05:25:44PM +1000, Alexey Kardashevskiy wrote:
> This introduces capabilities to IOMMU groups. The first defined
> capability is IOMMU_GROUP_CAP_ISOLATE_MSIX which tells the IOMMU
> group users that a particular IOMMU group is capable of MSIX message
> filtering; this is useful when deciding whether or not to allow mapping
> of MSIX table to the userspace. Various architectures will enable it
> when they decide that it is safe to do so.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

This seems like a reasonable concept that's probably useful for
something, whether or not it's the best approach for the problem at
hand.

> ---
>  include/linux/iommu.h | 20 
>  drivers/iommu/iommu.c | 28 
>  2 files changed, 48 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2cb54adc4a33..6b6f3c2f4904 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -155,6 +155,9 @@ struct iommu_resv_region {
>   enum iommu_resv_typetype;
>  };
>  
> +/* IOMMU group capabilities */
> +#define IOMMU_GROUP_CAP_ISOLATE_MSIX (1U)
> +
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> @@ -312,6 +315,11 @@ extern void *iommu_group_get_iommudata(struct 
> iommu_group *group);
>  extern void iommu_group_set_iommudata(struct iommu_group *group,
> void *iommu_data,
> void (*release)(void *iommu_data));
> +extern void iommu_group_set_caps(struct iommu_group *group,
> +  unsigned long clearcaps,
> +  unsigned long setcaps);
> +extern bool iommu_group_is_capable(struct iommu_group *group,
> +unsigned long cap);
>  extern int iommu_group_set_name(struct iommu_group *group, const char *name);
>  extern int iommu_group_add_device(struct iommu_group *group,
> struct device *dev);
> @@ -513,6 +521,18 @@ static inline void iommu_group_set_iommudata(struct 
> iommu_group *group,
>  {
>  }
>  
> +static inline void iommu_group_set_caps(struct iommu_group *group,
> + unsigned long clearcaps,
> + unsigned long setcaps)
> +{
> +}
> +
> +static inline bool iommu_group_is_capable(struct iommu_group *group,
> +   unsigned long cap)
> +{
> + return false;
> +}
> +
>  static inline int iommu_group_set_name(struct iommu_group *group,
>  const char *name)
>  {
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f6ea160afed..6b2c34fe2c3d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -52,6 +52,7 @@ struct iommu_group {
>   void (*iommu_data_release)(void *iommu_data);
>   char *name;
>   int id;
> + unsigned long caps;
>   struct iommu_domain *default_domain;
>   struct iommu_domain *domain;
>  };
> @@ -447,6 +448,33 @@ void iommu_group_set_iommudata(struct iommu_group 
> *group, void *iommu_data,
>  EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
>  
>  /**
> + * iommu_group_set_caps - Change the group capabilities
> + * @group: the group
> + * @clearcaps: capabilities mask to remove
> + * @setcaps: capabilities mask to add
> + *
> + * IOMMU groups can be capable of various features which device drivers
> + * may read and adjust the behavior.
> + */
> +void iommu_group_set_caps(struct iommu_group *group,
> + unsigned long clearcaps, unsigned long setcaps)
> +{
> + group->caps &= ~clearcaps;
> + group->caps |= setcaps;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_set_caps);
> +
> +/**
> + * iommu_group_is_capable - Returns if a group capability is present
> + * @group: the group
> + */
> +bool iommu_group_is_capable(struct iommu_group *group, unsigned long cap)
> +{
> + return !!(group->caps & cap);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_is_capable);
> +
> +/**
>   * iommu_group_set_name - set name for a group
>   * @group: the group
>   * @name: name

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC PATCH v5 5/5] vfio-pci: Allow to expose MSI-X table to userspace when safe

2017-08-09 Thread David Gibson
On Mon, Aug 07, 2017 at 05:25:48PM +1000, Alexey Kardashevskiy wrote:
1;4803;0c> Some devices have a MSIX BAR not aligned to the system page size
> greater than 4K (like 64k for ppc64) which at the moment prevents
> such MMIO pages from being mapped to the userspace for the sake of
> the MSIX BAR content protection. If such page happens to share
> the same system page with some frequently accessed registers,
> the entire system page will be emulated which can seriously affect
> performance.
> 
> This allows mapping of MSI-X tables to userspace if hardware provides
> MSIX isolation via interrupt remapping or filtering; in other words
> allowing direct access to the MSIX BAR won't do any harm to other devices
> or cause spurious interrupts visible to the kernel.
> 
> This adds a wrapping helper to check if a capability is supported by
> an IOMMU group.
> 
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  include/linux/vfio.h |  1 +
>  drivers/vfio/pci/vfio_pci.c  | 20 +---
>  drivers/vfio/pci/vfio_pci_rdwr.c |  5 -
>  drivers/vfio/vfio.c  | 15 +++
>  4 files changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 586809abb273..7110bca2fb60 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -46,6 +46,7 @@ struct vfio_device_ops {
>  
>  extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
>  extern void vfio_iommu_group_put(struct iommu_group *group, struct device 
> *dev);
> +extern bool vfio_iommu_group_is_capable(struct device *dev, unsigned long 
> cap);

This diff probably belongs in the earlier patch adding the function,
rather than here where it's first used.  Not worth respinning just for
that, though.

>  extern int vfio_add_group_dev(struct device *dev,
> const struct vfio_device_ops *ops,
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index d87a0a3cda14..c4c39ed64b1e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -561,11 +561,17 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
> *vdev,
>   struct vfio_region_info_cap_sparse_mmap *sparse;
>   size_t end, size;
>   int nr_areas = 2, i = 0, ret;
> + bool is_msix_isolated = vfio_iommu_group_is_capable(>pdev->dev,
> + IOMMU_GROUP_CAP_ISOLATE_MSIX);
>  
>   end = pci_resource_len(vdev->pdev, vdev->msix_bar);
>  
> - /* If MSI-X table is aligned to the start or end, only one area */
> - if (((vdev->msix_offset & PAGE_MASK) == 0) ||
> + /*
> +  * If MSI-X table is allowed to mmap because of the capability
> +  * of IRQ remapping or aligned to the start or end, only one area
> +  */
> + if (is_msix_isolated ||
> + ((vdev->msix_offset & PAGE_MASK) == 0) ||
>   (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
>   nr_areas = 1;
>  
> @@ -577,6 +583,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
> *vdev,
>  
>   sparse->nr_areas = nr_areas;
>  
> + if (is_msix_isolated) {
> + sparse->areas[i].offset = 0;
> + sparse->areas[i].size = end;
> + return 0;
> + }
> +
>   if (vdev->msix_offset & PAGE_MASK) {
>   sparse->areas[i].offset = 0;
>   sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
> @@ -1094,6 +1106,8 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   unsigned int index;
>   u64 phys_len, req_len, pgoff, req_start;
>   int ret;
> + bool is_msix_isolated = vfio_iommu_group_is_capable(>pdev->dev,
> + IOMMU_GROUP_CAP_ISOLATE_MSIX);
>  
>   index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>  
> @@ -1115,7 +1129,7 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   if (req_start + req_len > phys_len)
>   return -EINVAL;
>  
> - if (index == vdev->msix_bar) {
> + if (index == vdev->msix_bar && !is_msix_isolated) {
>   /*
>* Disallow mmaps overlapping the MSI-X table; users don't
>* get to touch this directly.  We could find somewhere
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
> b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 357243d76f10..7514206a5ea7 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c

Re: [RFC PATCH v5 5/5] vfio-pci: Allow to expose MSI-X table to userspace when safe

2017-08-09 Thread David Gibson
On Mon, Aug 07, 2017 at 05:25:48PM +1000, Alexey Kardashevskiy wrote:
1;4803;0c> Some devices have a MSIX BAR not aligned to the system page size
> greater than 4K (like 64k for ppc64) which at the moment prevents
> such MMIO pages from being mapped to the userspace for the sake of
> the MSIX BAR content protection. If such page happens to share
> the same system page with some frequently accessed registers,
> the entire system page will be emulated which can seriously affect
> performance.
> 
> This allows mapping of MSI-X tables to userspace if hardware provides
> MSIX isolation via interrupt remapping or filtering; in other words
> allowing direct access to the MSIX BAR won't do any harm to other devices
> or cause spurious interrupts visible to the kernel.
> 
> This adds a wrapping helper to check if a capability is supported by
> an IOMMU group.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  include/linux/vfio.h |  1 +
>  drivers/vfio/pci/vfio_pci.c  | 20 +---
>  drivers/vfio/pci/vfio_pci_rdwr.c |  5 -
>  drivers/vfio/vfio.c  | 15 +++
>  4 files changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 586809abb273..7110bca2fb60 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -46,6 +46,7 @@ struct vfio_device_ops {
>  
>  extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
>  extern void vfio_iommu_group_put(struct iommu_group *group, struct device 
> *dev);
> +extern bool vfio_iommu_group_is_capable(struct device *dev, unsigned long 
> cap);

This diff probably belongs in the earlier patch adding the function,
rather than here where it's first used.  Not worth respinning just for
that, though.

>  extern int vfio_add_group_dev(struct device *dev,
> const struct vfio_device_ops *ops,
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index d87a0a3cda14..c4c39ed64b1e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -561,11 +561,17 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
> *vdev,
>   struct vfio_region_info_cap_sparse_mmap *sparse;
>   size_t end, size;
>   int nr_areas = 2, i = 0, ret;
> + bool is_msix_isolated = vfio_iommu_group_is_capable(>pdev->dev,
> + IOMMU_GROUP_CAP_ISOLATE_MSIX);
>  
>   end = pci_resource_len(vdev->pdev, vdev->msix_bar);
>  
> - /* If MSI-X table is aligned to the start or end, only one area */
> - if (((vdev->msix_offset & PAGE_MASK) == 0) ||
> + /*
> +  * If MSI-X table is allowed to mmap because of the capability
> +  * of IRQ remapping or aligned to the start or end, only one area
> +  */
> + if (is_msix_isolated ||
> + ((vdev->msix_offset & PAGE_MASK) == 0) ||
>   (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
>   nr_areas = 1;
>  
> @@ -577,6 +583,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
> *vdev,
>  
>   sparse->nr_areas = nr_areas;
>  
> + if (is_msix_isolated) {
> + sparse->areas[i].offset = 0;
> + sparse->areas[i].size = end;
> + return 0;
> + }
> +
>   if (vdev->msix_offset & PAGE_MASK) {
>   sparse->areas[i].offset = 0;
>   sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
> @@ -1094,6 +1106,8 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   unsigned int index;
>   u64 phys_len, req_len, pgoff, req_start;
>   int ret;
> + bool is_msix_isolated = vfio_iommu_group_is_capable(>pdev->dev,
> + IOMMU_GROUP_CAP_ISOLATE_MSIX);
>  
>   index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>  
> @@ -1115,7 +1129,7 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   if (req_start + req_len > phys_len)
>   return -EINVAL;
>  
> - if (index == vdev->msix_bar) {
> + if (index == vdev->msix_bar && !is_msix_isolated) {
>   /*
>* Disallow mmaps overlapping the MSI-X table; users don't
>* get to touch this directly.  We could find somewhere
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c 
> b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 357243d76f10..7514206a5ea7 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>

Re: [PATCH v2] include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH

2017-07-25 Thread David Gibson
On Tue, Jul 18, 2017 at 02:22:20PM -0300, Murilo Opsfelder Araujo wrote:
> When CONFIG_EEH=y and CONFIG_VFIO_SPAPR_EEH=n, build fails with the
> following:
> 
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_release':
> vfio_pci.c:(.text+0xa98): undefined reference to 
> `.vfio_spapr_pci_eeh_release'
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_open':
> vfio_pci.c:(.text+0x1420): undefined reference to 
> `.vfio_spapr_pci_eeh_open'
> 
> In this case, vfio_pci.c should use the empty definitions of
> vfio_spapr_pci_eeh_open and vfio_spapr_pci_eeh_release functions.
> 
> This patch fixes it by guarding these function definitions with
> CONFIG_VFIO_SPAPR_EEH, the symbol that controls whether vfio_spapr_eeh.c is
> built, which is where the non-empty versions of these functions are. We need 
> to
> make use of IS_ENABLED() macro because CONFIG_VFIO_SPAPR_EEH is a tristate
> option.
> 
> This issue was found during a randconfig build. Logs are here:
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12982362/
> 
> Signed-off-by: Murilo Opsfelder Araujo <mopsfel...@gmail.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
> 
> Changes from v1:
> - Rebased on top of next-20170718.
> 
>  include/linux/vfio.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 586809a..a47b985 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -152,7 +152,7 @@ extern int vfio_set_irqs_validate_and_prepare(struct 
> vfio_irq_set *hdr,
> size_t *data_size);
> 
>  struct pci_dev;
> -#ifdef CONFIG_EEH
> +#if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH)
>  extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
>  extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
>  extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> @@ -173,7 +173,7 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
> iommu_group *group,
>  {
>   return -ENOTTY;
>  }
> -#endif /* CONFIG_EEH */
> +#endif /* CONFIG_VFIO_SPAPR_EEH */
> 
>  /*
>   * IRQfd - generic

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2] include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH

2017-07-25 Thread David Gibson
On Tue, Jul 18, 2017 at 02:22:20PM -0300, Murilo Opsfelder Araujo wrote:
> When CONFIG_EEH=y and CONFIG_VFIO_SPAPR_EEH=n, build fails with the
> following:
> 
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_release':
> vfio_pci.c:(.text+0xa98): undefined reference to 
> `.vfio_spapr_pci_eeh_release'
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_open':
> vfio_pci.c:(.text+0x1420): undefined reference to 
> `.vfio_spapr_pci_eeh_open'
> 
> In this case, vfio_pci.c should use the empty definitions of
> vfio_spapr_pci_eeh_open and vfio_spapr_pci_eeh_release functions.
> 
> This patch fixes it by guarding these function definitions with
> CONFIG_VFIO_SPAPR_EEH, the symbol that controls whether vfio_spapr_eeh.c is
> built, which is where the non-empty versions of these functions are. We need 
> to
> make use of IS_ENABLED() macro because CONFIG_VFIO_SPAPR_EEH is a tristate
> option.
> 
> This issue was found during a randconfig build. Logs are here:
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12982362/
> 
> Signed-off-by: Murilo Opsfelder Araujo 

Reviewed-by: David Gibson 

> ---
> 
> Changes from v1:
> - Rebased on top of next-20170718.
> 
>  include/linux/vfio.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 586809a..a47b985 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -152,7 +152,7 @@ extern int vfio_set_irqs_validate_and_prepare(struct 
> vfio_irq_set *hdr,
> size_t *data_size);
> 
>  struct pci_dev;
> -#ifdef CONFIG_EEH
> +#if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH)
>  extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
>  extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
>  extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> @@ -173,7 +173,7 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
> iommu_group *group,
>  {
>   return -ENOTTY;
>  }
> -#endif /* CONFIG_EEH */
> +#endif /* CONFIG_VFIO_SPAPR_EEH */
> 
>  /*
>   * IRQfd - generic

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/pseries: Fix of_node_put() underflow during pseries remove

2017-07-23 Thread David Gibson
On Fri, Jul 21, 2017 at 04:51:39PM +0200, Laurent Vivier wrote:
> As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put()
> underflow during DLPAR remove"), the call to of_node_put()
> must be removed from pSeries_reconfig_remove_node().
> 
> dlpar_detach_node() and pSeries_reconfig_remove_node() call
> of_detach_node(), and thus the node should not be released
> in this case too.
> 
> Signed-off-by: Laurent Vivier <lviv...@redhat.com>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
> b/arch/powerpc/platforms/pseries/reconfig.c
> index e5bf1e8..011ef21 100644
> --- a/arch/powerpc/platforms/pseries/reconfig.c
> +++ b/arch/powerpc/platforms/pseries/reconfig.c
> @@ -82,7 +82,6 @@ static int pSeries_reconfig_remove_node(struct device_node 
> *np)
>  
>   of_detach_node(np);
>   of_node_put(parent);
> - of_node_put(np); /* Must decrement the refcount */
>   return 0;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/pseries: Fix of_node_put() underflow during pseries remove

2017-07-23 Thread David Gibson
On Fri, Jul 21, 2017 at 04:51:39PM +0200, Laurent Vivier wrote:
> As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put()
> underflow during DLPAR remove"), the call to of_node_put()
> must be removed from pSeries_reconfig_remove_node().
> 
> dlpar_detach_node() and pSeries_reconfig_remove_node() call
> of_detach_node(), and thus the node should not be released
> in this case too.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
> b/arch/powerpc/platforms/pseries/reconfig.c
> index e5bf1e8..011ef21 100644
> --- a/arch/powerpc/platforms/pseries/reconfig.c
> +++ b/arch/powerpc/platforms/pseries/reconfig.c
> @@ -82,7 +82,6 @@ static int pSeries_reconfig_remove_node(struct device_node 
> *np)
>  
>   of_detach_node(np);
>   of_node_put(parent);
> - of_node_put(np); /* Must decrement the refcount */
>   return 0;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH] virtio_scsi: Always try to read VPD pages

2017-04-12 Thread David Gibson
Passed through SCSI targets may have transfer limits which come from the
host SCSI controller something on the host side other than the target
itself.

To make this work properly, the hypervisor can adjust the target's VPD
information to advertise these limits.  But for that to work, the guest has
to look at the VPD pages, which we won't do by default if it is an SPC-2
device, even if it does actually support it.

This adds a workaround to address this, forcing devices attached to a
virtio-scsi controller to always check the VPD pages.  This is modelled on
a similar workaround for the storvsc (Hyper-V) SCSI controller, although
that exists for slightly different reasons.

A specific case which causes this is a volume from IBM's IPR RAID
controller (which presents as an SPC-2 device, although it does support
VPD) passed through with qemu's 'scsi-block' device.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 drivers/scsi/virtio_scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 939c47d..961a1fc 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -705,6 +706,28 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return virtscsi_tmf(vscsi, cmd);
 }
 
+static int virtscsi_device_alloc(struct scsi_device *sdevice)
+{
+   /*
+* Passed through SCSI targets (e.g. with qemu's 'scsi-block')
+* may have transfer limits which come from the host SCSI
+* controller something on the host side other than the target
+* itself.
+*
+* To make this work properly, the hypervisor can adjust the
+* target's VPD information to advertise these limits.  But
+* for that to work, the guest has to look at the VPD pages,
+* which we won't do by default if it is an SPC-2 device, even
+* if it does actually support it.
+*
+* So, set the blist to always try to read the VPD pages.
+*/
+   sdevice->sdev_bflags = BLIST_TRY_VPD_PAGES;
+
+   return 0;
+}
+
+
 /**
  * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth
  * @sdev:  Virtscsi target whose queue depth to change
@@ -783,6 +806,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
+   .slave_alloc = virtscsi_device_alloc,
 
.can_queue = 1024,
.dma_boundary = UINT_MAX,
-- 
2.9.3



[PATCH] virtio_scsi: Always try to read VPD pages

2017-04-12 Thread David Gibson
Passed through SCSI targets may have transfer limits which come from the
host SCSI controller something on the host side other than the target
itself.

To make this work properly, the hypervisor can adjust the target's VPD
information to advertise these limits.  But for that to work, the guest has
to look at the VPD pages, which we won't do by default if it is an SPC-2
device, even if it does actually support it.

This adds a workaround to address this, forcing devices attached to a
virtio-scsi controller to always check the VPD pages.  This is modelled on
a similar workaround for the storvsc (Hyper-V) SCSI controller, although
that exists for slightly different reasons.

A specific case which causes this is a volume from IBM's IPR RAID
controller (which presents as an SPC-2 device, although it does support
VPD) passed through with qemu's 'scsi-block' device.

Signed-off-by: David Gibson 
---
 drivers/scsi/virtio_scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 939c47d..961a1fc 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -705,6 +706,28 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return virtscsi_tmf(vscsi, cmd);
 }
 
+static int virtscsi_device_alloc(struct scsi_device *sdevice)
+{
+   /*
+* Passed through SCSI targets (e.g. with qemu's 'scsi-block')
+* may have transfer limits which come from the host SCSI
+* controller something on the host side other than the target
+* itself.
+*
+* To make this work properly, the hypervisor can adjust the
+* target's VPD information to advertise these limits.  But
+* for that to work, the guest has to look at the VPD pages,
+* which we won't do by default if it is an SPC-2 device, even
+* if it does actually support it.
+*
+* So, set the blist to always try to read the VPD pages.
+*/
+   sdevice->sdev_bflags = BLIST_TRY_VPD_PAGES;
+
+   return 0;
+}
+
+
 /**
  * virtscsi_change_queue_depth() - Change a virtscsi target's queue depth
  * @sdev:  Virtscsi target whose queue depth to change
@@ -783,6 +806,7 @@ static struct scsi_host_template 
virtscsi_host_template_single = {
.change_queue_depth = virtscsi_change_queue_depth,
.eh_abort_handler = virtscsi_abort,
.eh_device_reset_handler = virtscsi_device_reset,
+   .slave_alloc = virtscsi_device_alloc,
 
.can_queue = 1024,
.dma_boundary = UINT_MAX,
-- 
2.9.3



[PATCH] powerpc/pseries: Don't give a warning when HPT resizing isn't available

2017-03-16 Thread David Gibson
As of 438cc81a41 "powerpc/pseries: Automatically resize HPT for memory hot
add/remove" when running on the pseries platform, we always attempt to
use the PAPR extension to resize the hashed page table (HPT) when we add
or remove memory.

This is fine, but when the extension is available we'll give a harmless,
but scary warning.  This patch suppresses the warning in this case.  It
will still warn if the feature is supposed to be available, but didn't
work.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/mm/hash_utils_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Kind of cosmetic, but getting an error message on every memory hot
plug/unplug attempt if your host doesn't support HPT resizing is
pretty ugly.  So I think this is a candidate for quick inclusion.

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index c554768..cc16e4f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -771,7 +771,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
int rc;
 
rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
-   if (rc)
+   if (rc && (rc != -ENODEV))
printk(KERN_WARNING
   "Unable to resize hash page table to target 
order %d: %d\n",
   target_hpt_shift, rc);
-- 
2.9.3



[PATCH] powerpc/pseries: Don't give a warning when HPT resizing isn't available

2017-03-16 Thread David Gibson
As of 438cc81a41 "powerpc/pseries: Automatically resize HPT for memory hot
add/remove" when running on the pseries platform, we always attempt to
use the PAPR extension to resize the hashed page table (HPT) when we add
or remove memory.

This is fine, but when the extension is available we'll give a harmless,
but scary warning.  This patch suppresses the warning in this case.  It
will still warn if the feature is supposed to be available, but didn't
work.

Signed-off-by: David Gibson 
---
 arch/powerpc/mm/hash_utils_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Kind of cosmetic, but getting an error message on every memory hot
plug/unplug attempt if your host doesn't support HPT resizing is
pretty ugly.  So I think this is a candidate for quick inclusion.

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index c554768..cc16e4f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -771,7 +771,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
int rc;
 
rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
-   if (rc)
+   if (rc && (rc != -ENODEV))
printk(KERN_WARNING
   "Unable to resize hash page table to target 
order %d: %d\n",
   target_hpt_shift, rc);
-- 
2.9.3



Re: [PATCH] KVM: Prevent double-free on HPT resize commit path

2017-02-27 Thread David Gibson
On Wed, Feb 15, 2017 at 02:40:04PM +1100, David Gibson wrote:
> resize_hpt_release(), called once the HPT resize of a KVM guest is
> completed (successfully or unsuccessfully) free()s the state structure for
> the resize.  It is currently not safe to call with a NULL pointer.
> 
> However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can
> invoke it with a NULL pointer.  This will occur if userspace improperly
> invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling
> KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an
> intervening PREPARE.
> 
> To fix this potential crash bug - and maybe others like it, make it safe
> (and a no-op) to call resize_hpt_release() with a NULL resize pointer.
> 
> Found by Dan Carpenter with a static checker.
> 
> Reported-by: Dan Carpenter <dan.carpen...@oracle.com>
> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>

Ping,

Paul have you taken this one?

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 013552f..72ccac2 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1407,6 +1407,9 @@ static void resize_hpt_release(struct kvm *kvm, struct 
> kvm_resize_hpt *resize)
>  {
>   BUG_ON(kvm->arch.resize_hpt != resize);
>  
> + if (!resize)
> + return;
> +
>   if (resize->hpt.virt)
>   kvmppc_free_hpt(>hpt);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] KVM: Prevent double-free on HPT resize commit path

2017-02-27 Thread David Gibson
On Wed, Feb 15, 2017 at 02:40:04PM +1100, David Gibson wrote:
> resize_hpt_release(), called once the HPT resize of a KVM guest is
> completed (successfully or unsuccessfully) free()s the state structure for
> the resize.  It is currently not safe to call with a NULL pointer.
> 
> However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can
> invoke it with a NULL pointer.  This will occur if userspace improperly
> invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling
> KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an
> intervening PREPARE.
> 
> To fix this potential crash bug - and maybe others like it, make it safe
> (and a no-op) to call resize_hpt_release() with a NULL resize pointer.
> 
> Found by Dan Carpenter with a static checker.
> 
> Reported-by: Dan Carpenter 
> Signed-off-by: David Gibson 

Ping,

Paul have you taken this one?

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 013552f..72ccac2 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -1407,6 +1407,9 @@ static void resize_hpt_release(struct kvm *kvm, struct 
> kvm_resize_hpt *resize)
>  {
>   BUG_ON(kvm->arch.resize_hpt != resize);
>  
> + if (!resize)
> +     return;
> +
>   if (resize->hpt.virt)
>   kvmppc_free_hpt(>hpt);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH] KVM: Prevent double-free on HPT resize commit path

2017-02-14 Thread David Gibson
resize_hpt_release(), called once the HPT resize of a KVM guest is
completed (successfully or unsuccessfully) free()s the state structure for
the resize.  It is currently not safe to call with a NULL pointer.

However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can
invoke it with a NULL pointer.  This will occur if userspace improperly
invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling
KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an
intervening PREPARE.

To fix this potential crash bug - and maybe others like it, make it safe
(and a no-op) to call resize_hpt_release() with a NULL resize pointer.

Found by Dan Carpenter with a static checker.

Reported-by: Dan Carpenter <dan.carpen...@oracle.com>
Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 013552f..72ccac2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1407,6 +1407,9 @@ static void resize_hpt_release(struct kvm *kvm, struct 
kvm_resize_hpt *resize)
 {
BUG_ON(kvm->arch.resize_hpt != resize);
 
+   if (!resize)
+   return;
+
if (resize->hpt.virt)
kvmppc_free_hpt(>hpt);
 
-- 
2.9.3



[PATCH] KVM: Prevent double-free on HPT resize commit path

2017-02-14 Thread David Gibson
resize_hpt_release(), called once the HPT resize of a KVM guest is
completed (successfully or unsuccessfully) free()s the state structure for
the resize.  It is currently not safe to call with a NULL pointer.

However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can
invoke it with a NULL pointer.  This will occur if userspace improperly
invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling
KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an
intervening PREPARE.

To fix this potential crash bug - and maybe others like it, make it safe
(and a no-op) to call resize_hpt_release() with a NULL resize pointer.

Found by Dan Carpenter with a static checker.

Reported-by: Dan Carpenter 
Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 013552f..72ccac2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1407,6 +1407,9 @@ static void resize_hpt_release(struct kvm *kvm, struct 
kvm_resize_hpt *resize)
 {
BUG_ON(kvm->arch.resize_hpt != resize);
 
+   if (!resize)
+   return;
+
if (resize->hpt.virt)
kvmppc_free_hpt(>hpt);
 
-- 
2.9.3



Re: [PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

2017-01-31 Thread David Gibson
On Wed, Feb 01, 2017 at 09:18:13AM +1100, Paul Mackerras wrote:
> On Tue, Dec 20, 2016 at 04:48:56PM +1100, David Gibson wrote:
> > Here is the KVM implementation for the proposed PAPR extension which
> > allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).
> > 
> > Using this requires a guest kernel with support for the extension.
> > Patches for guest side support in Linux were posted earlier:
> >   https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html
> > 
> > It also requires userspace (i.e. qemu) to intercept the HPT resizing
> > hypercalls and invoke the KVM ioctl()s to implement them.  This is
> > done instead of having KVM direclty intercept the hypercalls, so that
> > userspace can, if useful, impose additional restrictions on resizes:
> > for example it could refuse them entirely if policy for the VM
> > precludes resizing, or it could limit the size of HPT the guest can
> > request to meet resource limits.
> > 
> > Patches to implement the userspace part of HPT resizing are proposed
> > for qemu-2.9, and can be found at:
> >   https://github.com/dgibson/qemu/tree/hpt-resize
> > 
> > I'm posting these now, in the hopes that both these and the
> > corresponding guest side patches can be staged and merged for the 4.11
> > window.
> 
> Thanks, series applied to my kvm-ppc-next branch.

Great.  Where can I grab that tree?

> Next time, please cc kvm-...@vger.kernel.org so I shows up in
> patchwork and I don't miss it.

Sorry, I hadn't realized that list existed.  Noted for future
reference.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

2017-01-31 Thread David Gibson
On Wed, Feb 01, 2017 at 09:18:13AM +1100, Paul Mackerras wrote:
> On Tue, Dec 20, 2016 at 04:48:56PM +1100, David Gibson wrote:
> > Here is the KVM implementation for the proposed PAPR extension which
> > allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).
> > 
> > Using this requires a guest kernel with support for the extension.
> > Patches for guest side support in Linux were posted earlier:
> >   https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html
> > 
> > It also requires userspace (i.e. qemu) to intercept the HPT resizing
> > hypercalls and invoke the KVM ioctl()s to implement them.  This is
> > done instead of having KVM direclty intercept the hypercalls, so that
> > userspace can, if useful, impose additional restrictions on resizes:
> > for example it could refuse them entirely if policy for the VM
> > precludes resizing, or it could limit the size of HPT the guest can
> > request to meet resource limits.
> > 
> > Patches to implement the userspace part of HPT resizing are proposed
> > for qemu-2.9, and can be found at:
> >   https://github.com/dgibson/qemu/tree/hpt-resize
> > 
> > I'm posting these now, in the hopes that both these and the
> > corresponding guest side patches can be staged and merged for the 4.11
> > window.
> 
> Thanks, series applied to my kvm-ppc-next branch.

Great.  Where can I grab that tree?

> Next time, please cc kvm-...@vger.kernel.org so I shows up in
> patchwork and I don't miss it.

Sorry, I hadn't realized that list existed.  Noted for future
reference.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null

2017-01-30 Thread David Gibson
On Tue, Jan 24, 2017 at 11:07:59AM -0700, Alex Williamson wrote:
> On Tue, 24 Jan 2017 17:50:26 +0100
> Greg Kurz <gr...@kaod.org> wrote:
> 
> > The recently added mediated VFIO driver doesn't know about powerpc iommu.
> > It thus doesn't register a struct iommu_table_group in the iommu group
> > upon device creation. The iommu_data pointer hence remains null.
> > 
> > This causes a kernel oops when userspace tries to set the iommu type of a
> > container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU.
> > 
> > [   82.585440] mtty mtty: MDEV: Registered
> > [   87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to 
> > group 10
> > [   87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: 
> > group_id = 10
> > [  116.297184] Unable to handle kernel paging request for data at address 
> > 0x0030
> > [  116.297389] Faulting instruction address: 0xd7870524
> > [  116.297465] Oops: Kernel access of bad area, sig: 11 [#1]
> > [  116.297611] SMP NR_CPUS=2048
> > [  116.297611] NUMA
> > [  116.297627] PowerNV
> > ...
> > [  116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 
> > 4.10.0-rc5-mdev-test #8
> > [  116.297993] task: c00e7718b680 task.stack: c00e77214000
> > [  116.298025] NIP: d7870524 LR: d7870518 CTR: 
> > 
> > [  116.298064] REGS: c00e77217990 TRAP: 0300   Not tainted  
> > (4.10.0-rc5-mdev-test)
> > [  116.298103] MSR: 90009033 <SF,HV,EE,ME,IR,DR,RI,LE>
> > [  116.298107]   CR: 8400  XER: 
> > [  116.298154] CFAR: c000888c DAR: 0030 DSISR: 4000 
> > SOFTE: 1
> >GPR00: d7870518 c00e77217c10 d787b0ed 
> > c00eed2103c0
> >GPR04:   c00eed2103e0 
> > 000f2432
> >GPR08: 0104 0001  
> > d78729b0
> >GPR12: c025b7e0 cfe08400 0001 
> > 01002d31d100
> >GPR16: 01002c22c850 3315c750 43145680 
> > 43141bc0
> >GPR20: ffed f000 20003b65 
> > d7706018
> >GPR24: c00f16cf0d98 d7706000 c3f42980 
> > c3f42980
> >GPR28: c00f1575ac00 c3f429c8  
> > c00eed2103c0
> > [  116.298504] NIP [d7870524] tce_iommu_attach_group+0x10c/0x360 
> > [vfio_iommu_spapr_tce]
> > [  116.298555] LR [d7870518] tce_iommu_attach_group+0x100/0x360 
> > [vfio_iommu_spapr_tce]
> > [  116.298601] Call Trace:
> > [  116.298610] [c00e77217c10] [d7870518] 
> > tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable)
> > [  116.298671] [c00e77217cb0] [d77033a0] 
> > vfio_fops_unl_ioctl+0x278/0x3e0 [vfio]
> > [  116.298713] [c00e77217d40] [c02a3ebc] do_vfs_ioctl+0xcc/0x8b0
> > [  116.298745] [c00e77217de0] [c02a4700] SyS_ioctl+0x60/0xc0
> > [  116.298782] [c00e77217e30] [c000b220] system_call+0x38/0xfc
> > [  116.298812] Instruction dump:
> > [  116.298828] 7d3f4b78 409effc8 3d22 e9298020 3c800140 38a00018 
> > 608480c0 e8690028
> > [  116.298869] 4800249d e8410018 7c7f1b79 41820230  2fa9 
> > 419e0114 e9090020
> > [  116.298914] ---[ end trace 1e10b0ced08b9120 ]---
> > 
> > This patch fixes the oops.
> > 
> > Reported-by: Vaibhav Jain <vaib...@linux.vnet.ibm.com>
> > Signed-off-by: Greg Kurz <gr...@kaod.org>
> > ---
> >  drivers/vfio/vfio_iommu_spapr_tce.c |4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> > b/drivers/vfio/vfio_iommu_spapr_tce.c
> > index c8823578a1b2..128d10282d16 100644
> > --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> > @@ -1270,6 +1270,10 @@ static int tce_iommu_attach_group(void *iommu_data,
> > /* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
> > iommu_group_id(iommu_group), iommu_group); */
> > table_group = iommu_group_get_iommudata(iommu_group);
> > +   if (!table_group) {
> > +   ret = -ENODEV;
> > +   goto unlock_exit;
> > +   }
> >  
> > if (tce_groups_attached(container) && (!table_group->ops ||
> > !table_group->ops->take_ownership ||
> > 
> 
> Seems sane to me.
> 
> David/Alexey, please review.  Thanks,

Seems reasonable.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null

2017-01-30 Thread David Gibson
On Tue, Jan 24, 2017 at 11:07:59AM -0700, Alex Williamson wrote:
> On Tue, 24 Jan 2017 17:50:26 +0100
> Greg Kurz  wrote:
> 
> > The recently added mediated VFIO driver doesn't know about powerpc iommu.
> > It thus doesn't register a struct iommu_table_group in the iommu group
> > upon device creation. The iommu_data pointer hence remains null.
> > 
> > This causes a kernel oops when userspace tries to set the iommu type of a
> > container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU.
> > 
> > [   82.585440] mtty mtty: MDEV: Registered
> > [   87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to 
> > group 10
> > [   87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: 
> > group_id = 10
> > [  116.297184] Unable to handle kernel paging request for data at address 
> > 0x0030
> > [  116.297389] Faulting instruction address: 0xd7870524
> > [  116.297465] Oops: Kernel access of bad area, sig: 11 [#1]
> > [  116.297611] SMP NR_CPUS=2048
> > [  116.297611] NUMA
> > [  116.297627] PowerNV
> > ...
> > [  116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 
> > 4.10.0-rc5-mdev-test #8
> > [  116.297993] task: c00e7718b680 task.stack: c00e77214000
> > [  116.298025] NIP: d7870524 LR: d7870518 CTR: 
> > 
> > [  116.298064] REGS: c00e77217990 TRAP: 0300   Not tainted  
> > (4.10.0-rc5-mdev-test)
> > [  116.298103] MSR: 90009033 
> > [  116.298107]   CR: 8400  XER: 
> > [  116.298154] CFAR: c000888c DAR: 0030 DSISR: 4000 
> > SOFTE: 1
> >GPR00: d7870518 c00e77217c10 d787b0ed 
> > c00eed2103c0
> >GPR04:   c00eed2103e0 
> > 000f2432
> >GPR08: 0104 0001  
> > d78729b0
> >GPR12: c025b7e0 cfe08400 0001 
> > 01002d31d100
> >GPR16: 01002c22c850 3315c750 43145680 
> > 43141bc0
> >GPR20: ffed f000 20003b65 
> > d7706018
> >GPR24: c00f16cf0d98 d7706000 c3f42980 
> > c3f42980
> >GPR28: c00f1575ac00 c3f429c8  
> > c00eed2103c0
> > [  116.298504] NIP [d7870524] tce_iommu_attach_group+0x10c/0x360 
> > [vfio_iommu_spapr_tce]
> > [  116.298555] LR [d7870518] tce_iommu_attach_group+0x100/0x360 
> > [vfio_iommu_spapr_tce]
> > [  116.298601] Call Trace:
> > [  116.298610] [c00e77217c10] [d7870518] 
> > tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable)
> > [  116.298671] [c00e77217cb0] [d77033a0] 
> > vfio_fops_unl_ioctl+0x278/0x3e0 [vfio]
> > [  116.298713] [c00e77217d40] [c02a3ebc] do_vfs_ioctl+0xcc/0x8b0
> > [  116.298745] [c00e77217de0] [c02a4700] SyS_ioctl+0x60/0xc0
> > [  116.298782] [c00e77217e30] [c000b220] system_call+0x38/0xfc
> > [  116.298812] Instruction dump:
> > [  116.298828] 7d3f4b78 409effc8 3d22 e9298020 3c800140 38a00018 
> > 608480c0 e8690028
> > [  116.298869] 4800249d e8410018 7c7f1b79 41820230  2fa9 
> > 419e0114 e9090020
> > [  116.298914] ---[ end trace 1e10b0ced08b9120 ]---
> > 
> > This patch fixes the oops.
> > 
> > Reported-by: Vaibhav Jain 
> > Signed-off-by: Greg Kurz 
> > ---
> >  drivers/vfio/vfio_iommu_spapr_tce.c |4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> > b/drivers/vfio/vfio_iommu_spapr_tce.c
> > index c8823578a1b2..128d10282d16 100644
> > --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> > @@ -1270,6 +1270,10 @@ static int tce_iommu_attach_group(void *iommu_data,
> > /* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
> > iommu_group_id(iommu_group), iommu_group); */
> > table_group = iommu_group_get_iommudata(iommu_group);
> > +   if (!table_group) {
> > +   ret = -ENODEV;
> > +   goto unlock_exit;
> > +   }
> >  
> > if (tce_groups_attached(container) && (!table_group->ops ||
> > !table_group->ops->take_ownership ||
> > 
> 
> Seems sane to me.
> 
> David/Alexey, please review.  Thanks,

Seems reasonable.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCHv2 05/11] powerpc/kvm: Don't store values derivable from HPT order

2016-12-19 Thread David Gibson
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 
 arch/powerpc/include/asm/kvm_host.h  |  2 --
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 28 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 18 +-
 4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8482921..8810327 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -350,6 +350,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
 extern void kvmhv_rm_send_ipi(int cpu);
 
+static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
+{
+   /* HPTEs are 2**4 bytes long */
+   return 1UL << (hpt->order - 4);
+}
+
+static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
+{
+   /* 128 (2**7) bytes in each HPTEG */
+   return (1UL << (hpt->order - 7)) - 1;
+}
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 074b17f..7576e0e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -246,8 +246,6 @@ struct kvm_hpt_info {
unsigned long virt;
/* Array of reverse mapping entries for each guest HPTE */
struct revmap_entry *rev;
-   unsigned long npte;
-   unsigned long mask;
/* Guest HPT size is 2**(order) bytes */
u32 order;
/* 1 if HPT allocated with CMA, 0 otherwise */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b5799d1..fe88132 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
kvm->arch.hpt.virt = hpt;
kvm->arch.hpt.order = order;
-   /* HPTEs are 2**4 bytes long */
-   kvm->arch.hpt.npte = 1ul << (order - 4);
-   /* 128 (2**7) bytes in each HPTEG */
-   kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
atomic64_set(>arch.mmio_update, 0);
 
/* Allocate reverse map array */
-   rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
+   rev = vmalloc(sizeof(struct revmap_entry) * 
kvmppc_hpt_npte(>arch.hpt));
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
@@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
if (npages > 1ul << (40 - porder))
npages = 1ul << (40 - porder);
/* Can't use more than 1 HPTE per HPTEG */
-   if (npages > kvm->arch.hpt.mask + 1)
-   npages = kvm->arch.hpt.mask + 1;
+   if (npages > kvmppc_hpt_mask(>arch.hpt) + 1)
+   npages = kvmppc_hpt_mask(>arch.hpt) + 1;
 
hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
kvm_memory_slot *memslot,
for (i = 0; i < npages; ++i) {
addr = i << porder;
/* can't use hpt_hash since va > 64 bits */
-   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & 
kvm->arch.hpt.mask;
+   hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
+   & kvmppc_hpt_mask(>arch.hpt);
/*
 * We assume that the hash table is empty and no
 * vcpus are using it at this stage.  Since we create
@@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
 
/* Skip uninteresting entries, i.e. clean on not-first pass */
if (!first_pass) {
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(>arch.hpt) &&
   !hpte_dirty(revp, hptp)) {
++i;
hptp += 2;
@@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
__user *buf,
hdr.index = i;
 
/* Grab a series of valid entries */
-   while (i < kvm->arch.hpt.npte &&
+   while (i < kvmppc_hpt_npte(>

[PATCHv2 03/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity

2016-12-19 Thread David Gibson
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 
 arch/powerpc/kvm/book3s_hv_builtin.c | 8 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2da67bf..3db6be9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba);
-extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
-extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
+extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
+extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
 extern int kvmppc_core_init_vm(struct kvm *kvm);
 extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b795dd1..ae17cdd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
 
kvm->arch.hpt_cma_alloc = 0;
-   page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
+   page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
@@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
  out_freehpt:
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
+   kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
return -ENOMEM;
@@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
kvmppc_free_lpid(kvm->arch.lpid);
vfree(kvm->arch.revmap);
if (kvm->arch.hpt_cma_alloc)
-   kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
-   1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+   kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
+1 << (kvm->arch.hpt_order - PAGE_SHIFT));
else
free_pages(kvm->arch.hpt_virt,
   kvm->arch.hpt_order - PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 5bb24be..4c4aa47 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
 }
 early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
 
-struct page *kvm_alloc_hpt(unsigned long nr_pages)
+struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
 {
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
+EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
-void kvm_release_hpt(struct page *page, unsigned long nr_pages)
+void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
 {
cma_release(kvm_cma, page, nr_pages);
 }
-EXPORT_SYMBOL_GPL(kvm_release_hpt);
+EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
 
 /**
  * kvm_cma_reserve() - reserve area for kvm hash pagetable
-- 
2.9.3



[PATCHv2 11/11] powerpc/kvm: Advertise availablity of HPT resizing on KVM HV

2016-12-19 Thread David Gibson
This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the
presence of in-kernel HPT resizing on KVM HV.

Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index efd1183..1f32235 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SPAPR_MULTITCE:
r = 1;
break;
+   case KVM_CAP_SPAPR_RESIZE_HPT:
+   r = !!hv_enabled;
+   break;
 #endif
case KVM_CAP_PPC_HTM:
r = cpu_has_feature(CPU_FTR_TM_COMP) &&
-- 
2.9.3



  1   2   3   4   5   6   7   8   9   10   >