Re: [PATCH] cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

2016-06-29 Thread Andrew Donnellan

On 30/06/16 15:00, Michael Ellerman wrote:

On Thu, 2016-06-30 at 08:28 +1000, Andrew Donnellan wrote:

On 30/06/16 04:55, Ian Munsie wrote:


From: Ian Munsie 

If a kernel context is initialised and does not have any AFU interrupts
allocated it will cause a NULL pointer dereference when the context is
detached since the irq_names list will not have been initialised.

Move the initialisation of the irq_names list into the cxl_context_init
routine so that it will be valid for the entire lifetime of the context
and will not cause a NULL pointer dereference.

Signed-off-by: Ian Munsie 



As it's nice having your machine not crash on every shutdown...


Fixes: 


Ian can correct me if I'm wrong, but I suspect this doesn't affect 
cxlflash (the only current user of the cxl kernel API) - this issue was 
hit while working on CAPI support for mlx5.
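
For readers following the thread, the failure mode in the patch description can be modelled in user space. This is only a sketch: `struct ctx`, `ctx_init` and `ctx_count_irq_names` are hypothetical names, and the list type is a minimal stand-in for the kernel's `struct list_head`, not the cxl code itself.

```c
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical context; only the irq_names list from the thread is modelled. */
struct ctx {
	struct list_head irq_names;
};

/* Initialise the list when the context is created, as the patch proposes. */
void ctx_init(struct ctx *ctx)
{
	init_list_head(&ctx->irq_names);
}

/*
 * Walk the list the way a detach path might. With an initialised head and
 * no entries, the loop body never runs and 0 is returned. With a zeroed,
 * never-initialised head, head->next is NULL and the first pos->next
 * access would dereference it.
 */
int ctx_count_irq_names(const struct ctx *ctx)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = ctx->irq_names.next; pos != &ctx->irq_names; pos = pos->next)
		n++;
	return n;
}
```

Initialising the head at context creation means every later walker sees a valid, possibly empty, list regardless of whether any interrupts were ever allocated.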


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

2016-06-29 Thread Michael Ellerman
On Thu, 2016-06-30 at 08:28 +1000, Andrew Donnellan wrote:
> On 30/06/16 04:55, Ian Munsie wrote:
> > 
> > From: Ian Munsie 
> > 
> > If a kernel context is initialised and does not have any AFU interrupts
> > allocated it will cause a NULL pointer dereference when the context is
> > detached since the irq_names list will not have been initialised.
> > 
> > Move the initialisation of the irq_names list into the cxl_context_init
> > routine so that it will be valid for the entire lifetime of the context
> > and will not cause a NULL pointer dereference.
> > 
> > Signed-off-by: Ian Munsie 

> As it's nice having your machine not crash on every shutdown...

Fixes: 

cheers


Re: [PATCH v2 02/12] genhd: Honor gen_uevent and add disk_gen_uevents

2016-06-29 Thread kbuild test robot
Hi,

[auto build test WARNING on block/for-next]
[also build test WARNING on v4.7-rc5]
[cannot apply to next-20160629]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Fam-Zheng/gendisk-Generate-uevent-after-attribute-available/20160630-100720
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git 
for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   lib/crc32.c:148: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:148: warning: Excess function parameter 'tab' description in 
'crc32_le_generic'
   lib/crc32.c:293: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:293: warning: Excess function parameter 'tab' description in 
'crc32_be_generic'
   lib/crc32.c:1: warning: no structured comments found
   mm/memory.c:2881: warning: No description found for parameter 'old'
>> block/genhd.c:575: warning: No description found for parameter 'disk'
>> block/genhd.c:575: warning: No description found for parameter 'disk'

vim +/disk +575 block/genhd.c

   559  blkdev_put(bdev, FMODE_READ);
   560  
   561  exit:
   562  /* announce disk after possible partitions are created */
   563  dev_set_uevent_suppress(ddev, 0);
   564  if (gen_uevent)
   565  disk_gen_uevents(disk);
   566  }
   567  
   568  /**
   569   * disk_gen_uevents
   570   * @disk - the disk to generate uevent
   571   *
   572   * Generate KOBJ_ADD uevents on the disk and partitions.
   573   */
   574  void disk_gen_uevents(struct gendisk *disk)
 > 575  {
   576  struct device *ddev = disk_to_dev(disk);
   577  struct disk_part_iter piter;
   578  struct hd_struct *part;
   579  
   580  kobject_uevent(&ddev->kobj, KOBJ_ADD);
   581  
   582  /* announce possible partitions */
   583  disk_part_iter_init(&piter, disk, 0);
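
The warning above most likely comes from the comment syntax: kernel-doc expects a parameter to be written as `@name:` with a colon, and the first line to carry the function name followed by a short description. A sketch of the comment in that form is below. The stub `struct gendisk` and its `uevents_sent` counter are hypothetical and exist only so the fragment compiles outside the kernel; the wording of the short description is also an assumption.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct gendisk; only here so the sketch compiles. */
struct gendisk {
	int uevents_sent;	/* hypothetical counter, not a real field */
};

/**
 * disk_gen_uevents - generate KOBJ_ADD uevents for a disk
 * @disk: the disk to generate uevents for
 *
 * Generate KOBJ_ADD uevents on the disk and partitions.
 */
void disk_gen_uevents(struct gendisk *disk)
{
	/* The real body is in the patch; this stub only records the call. */
	if (disk)
		disk->uevents_sent++;
}
```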

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH V2 7/7] thermal: qoriq: Add thermal management support

2016-06-29 Thread Jia Hongtao
This driver adds thermal management support by enabling the TMU (Thermal
Monitoring Unit) on QorIQ platforms.

It's based on the thermal OF framework:
- Trip points defined in device tree.
- Cpufreq as cooling device registered in qoriq cpufreq driver.

Signed-off-by: Jia Hongtao 
---
Changes of V2:
* Add HAS_IOMEM dependency to fix build error on UM

 drivers/thermal/Kconfig |  10 ++
 drivers/thermal/Makefile|   1 +
 drivers/thermal/qoriq_thermal.c | 328 
 3 files changed, 339 insertions(+)
 create mode 100644 drivers/thermal/qoriq_thermal.c

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 2d702ca..56ef30d 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -195,6 +195,16 @@ config IMX_THERMAL
  cpufreq is used as the cooling device to throttle CPUs when the
  passive trip is crossed.

+config QORIQ_THERMAL
+   tristate "QorIQ Thermal Monitoring Unit"
+   depends on THERMAL_OF
+   depends on HAS_IOMEM
+   help
+ Support for Thermal Monitoring Unit (TMU) found on QorIQ platforms.
+ It supports one critical trip point and one passive trip point. The
+ cpufreq is used as the cooling device to throttle CPUs when the
+ passive trip is crossed.
+
 config SPEAR_THERMAL
tristate "SPEAr thermal sensor driver"
depends on PLAT_SPEAR || COMPILE_TEST
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 10b07c1..6662232 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_DB8500_THERMAL)  += db8500_thermal.o
 obj-$(CONFIG_ARMADA_THERMAL)   += armada_thermal.o
 obj-$(CONFIG_TANGO_THERMAL)+= tango_thermal.o
 obj-$(CONFIG_IMX_THERMAL)  += imx_thermal.o
+obj-$(CONFIG_QORIQ_THERMAL)+= qoriq_thermal.o
 obj-$(CONFIG_DB8500_CPUFREQ_COOLING)   += db8500_cpufreq_cooling.o
 obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
 obj-$(CONFIG_X86_PKG_TEMP_THERMAL) += x86_pkg_temp_thermal.o
diff --git a/drivers/thermal/qoriq_thermal.c b/drivers/thermal/qoriq_thermal.c
new file mode 100644
index 0000000..644ba52
--- /dev/null
+++ b/drivers/thermal/qoriq_thermal.c
@@ -0,0 +1,328 @@
+/*
+ * Copyright 2016 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "thermal_core.h"
+
+#define SITES_MAX  16
+
+/*
+ * QorIQ TMU Registers
+ */
+struct qoriq_tmu_site_regs {
+   u32 tritsr; /* Immediate Temperature Site Register */
+   u32 tratsr; /* Average Temperature Site Register */
+   u8 res0[0x8];
+};
+
+struct qoriq_tmu_regs {
+   u32 tmr;/* Mode Register */
+#define TMR_DISABLE0x0
+#define TMR_ME 0x8000
+#define TMR_ALPF   0x0c00
+   u32 tsr;/* Status Register */
+   u32 tmtmir; /* Temperature measurement interval Register */
+#define TMTMIR_DEFAULT 0x000f
+   u8 res0[0x14];
+   u32 tier;   /* Interrupt Enable Register */
+#define TIER_DISABLE   0x0
+   u32 tidr;   /* Interrupt Detect Register */
+   u32 tiscr;  /* Interrupt Site Capture Register */
+   u32 ticscr; /* Interrupt Critical Site Capture Register */
+   u8 res1[0x10];
+   u32 tmhtcrh;/* High Temperature Capture Register */
+   u32 tmhtcrl;/* Low Temperature Capture Register */
+   u8 res2[0x8];
+   u32 tmhtitr;/* High Temperature Immediate Threshold */
+   u32 tmhtatr;/* High Temperature Average Threshold */
+   u32 tmhtactr;   /* High Temperature Average Crit Threshold */
+   u8 res3[0x24];
+   u32 ttcfgr; /* Temperature Configuration Register */
+   u32 tscfgr; /* Sensor Configuration Register */
+   u8 res4[0x78];
+   struct qoriq_tmu_site_regs site[SITES_MAX];
+   u8 res5[0x9f8];
+   u32 ipbrr0; /* IP Block Revision Register 0 */
+   u32 ipbrr1; /* IP Block Revision Register 1 */
+   u8 res6[0x310];
+   u32 ttr0cr; /* Temperature Range 0 Control Register */
+   u32 ttr1cr; /* Temperature Range 1 Control Register */
+   u32 ttr2cr; /* Temperature Range 2 Control Register */
+   u32 ttr3cr; /* Temperature Range 3 Control Register */
+};
+
+/*
+ * Thermal zone data
+ */
+struct 
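
The reserved arrays in the register structs above are what place each register at its offset. A user-space sketch that reproduces the layout and checks the offsets it implies is shown below. It assumes `u32` and `u8` correspond to `uint32_t` and `uint8_t`, and the offsets are derived purely from the struct as posted, not from a hardware manual. `tmu_site_offset` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define SITES_MAX 16

struct qoriq_tmu_site_regs {
	uint32_t tritsr;
	uint32_t tratsr;
	uint8_t res0[0x8];
};

struct qoriq_tmu_regs {
	uint32_t tmr;
	uint32_t tsr;
	uint32_t tmtmir;
	uint8_t res0[0x14];
	uint32_t tier;
	uint32_t tidr;
	uint32_t tiscr;
	uint32_t ticscr;
	uint8_t res1[0x10];
	uint32_t tmhtcrh;
	uint32_t tmhtcrl;
	uint8_t res2[0x8];
	uint32_t tmhtitr;
	uint32_t tmhtatr;
	uint32_t tmhtactr;
	uint8_t res3[0x24];
	uint32_t ttcfgr;
	uint32_t tscfgr;
	uint8_t res4[0x78];
	struct qoriq_tmu_site_regs site[SITES_MAX];
	uint8_t res5[0x9f8];
	uint32_t ipbrr0;
	uint32_t ipbrr1;
	uint8_t res6[0x310];
	uint32_t ttr0cr;
	uint32_t ttr1cr;
	uint32_t ttr2cr;
	uint32_t ttr3cr;
};

/* Offsets implied by the padding: 3 * 4 + 0x14 = 0x20, and so on. */
_Static_assert(offsetof(struct qoriq_tmu_regs, tier) == 0x20, "tier");
_Static_assert(offsetof(struct qoriq_tmu_regs, tmhtcrh) == 0x40, "tmhtcrh");
_Static_assert(offsetof(struct qoriq_tmu_regs, ttcfgr) == 0x80, "ttcfgr");
_Static_assert(offsetof(struct qoriq_tmu_regs, site) == 0x100, "site");
_Static_assert(offsetof(struct qoriq_tmu_regs, ipbrr0) == 0xbf8, "ipbrr0");
_Static_assert(offsetof(struct qoriq_tmu_regs, ttr0cr) == 0xf10, "ttr0cr");

/* Hypothetical helper: byte offset of the register pair for site n. */
size_t tmu_site_offset(unsigned int n)
{
	return offsetof(struct qoriq_tmu_regs, site) +
	       n * sizeof(struct qoriq_tmu_site_regs);
}
```

Checks of this kind catch a miscounted reserved array at build time, before a wrong offset turns into a read of the wrong register.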

RE: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-29 Thread Yuantian Tang
> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Sent: Thursday, June 30, 2016 10:24 AM
> To: Yuantian Tang 
> Cc: Scott Wood ; Russell King ;
> Michael Turquette ; Stephen Boyd
> ; Viresh Kumar ; linux-
> c...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Yang-Leo Li ; Xiaofeng Ren
> ; Scott Wood 
> Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
> parent clocks
> 
> On Thursday, June 30, 2016 01:47:09 AM Yuantian Tang wrote:
> > > -Original Message-
> > > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > > Sent: Thursday, June 30, 2016 9:47 AM
> > > To: Yuantian Tang 
> > > Cc: Scott Wood ; Russell King
> > > ; Michael Turquette
> > > ; Stephen Boyd ;
> > > Viresh Kumar ; linux- c...@vger.kernel.org;
> > > linux...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Yang-Leo
> > > Li ; Xiaofeng Ren ; Scott
> > > Wood 
> > > Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering
> > > possible parent clocks
> > >
> > > On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> > > > Hi,
> > > >
> > > > This patch is acked by clock maintainer. If no comments from
> > > > anyone else,
> > > we will merge it in next week.
> > >
> > > There is a cpufreq commit depending on it.  Are you going to handle
> > > that one too?
> > >
> > That one has been acked by cpufreq maintainer. You can get this from
> patch comments.
> 
> I know that it has been ACKed.
> 
> My question is whether or not you are going to apply it along the [1/2].
> 
> If not, it will have to be deferred until the [1/2] is merged and then applied
> which may not be desirable.
> 
I hope we can apply both at the same time. It seems Scott has a few concerns.

What do you think about this patch? Can you apply it?
If you apply this patch, I can then ask the cpufreq maintainer to apply 
the other one, which is delayed until then.

Regards,
Yuantian

> Thanks,
> Rafael


[PATCH v2 12/12] nvme: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd70894..2655521 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1462,11 +1462,12 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
if (ns->type == NVME_NS_LIGHTNVM)
return;
 
-   add_disk(ns->disk, true);
+   add_disk(ns->disk, false);
if (sysfs_create_group(&disk_to_dev(ns->disk)->kobj,
&nvme_ns_attr_group))
pr_warn("%s: failed to create sysfs group for identification\n",
ns->disk->disk_name);
+   disk_gen_uevents(ns->disk);
return;
  out_free_disk:
kfree(disk);
-- 
2.9.0


[PATCH v2 11/12] mtd: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/mtd/mtd_blkdevs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index ab3bc22..6848141 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -436,13 +436,14 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
if (new->readonly)
set_disk_ro(gd, 1);
 
-   add_disk(gd, true);
+   add_disk(gd, false);
 
if (new->disk_attributes) {
ret = sysfs_create_group(&disk_to_dev(gd)->kobj,
new->disk_attributes);
WARN_ON(ret);
}
+   disk_gen_uevents(gd);
return 0;
 error4:
blk_cleanup_queue(new->rq);
-- 
2.9.0


[PATCH v2 10/12] mmc: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/mmc/card/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 94cf51e..4007106 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -2457,7 +2457,7 @@ static int mmc_add_disk(struct mmc_blk_data *md)
int ret;
struct mmc_card *card = md->queue.card;
 
-   add_disk(md->disk, true);
+   add_disk(md->disk, false);
md->force_ro.show = force_ro_show;
md->force_ro.store = force_ro_store;
sysfs_attr_init(&md->force_ro.attr);
@@ -2466,6 +2466,7 @@ static int mmc_add_disk(struct mmc_blk_data *md)
ret = device_create_file(disk_to_dev(md->disk), &md->force_ro);
if (ret)
goto force_ro_fail;
+   disk_gen_uevents(md->disk);
 
if ((md->area_type & MMC_BLK_DATA_AREA_BOOT) &&
 card->ext_csd.boot_ro_lockable) {
-- 
2.9.0


[PATCH v2 09/12] md: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/md/md.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1391c72..dcd09ea 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5055,7 +5055,7 @@ static int md_alloc(dev_t dev, char *name)
 * through to md_open, so make sure it doesn't get too far
 */
mutex_lock(&mddev->open_mutex);
-   add_disk(disk, true);
+   add_disk(disk, false);
 
error = kobject_init_and_add(&mddev->kobj, &md_ktype,
 &disk_to_dev(disk)->kobj, "%s", "md");
@@ -5070,6 +5070,7 @@ static int md_alloc(dev_t dev, char *name)
if (mddev->kobj.sd &&
sysfs_create_group(&mddev->kobj, &md_bitmap_group))
printk(KERN_DEBUG "pointless warning\n");
+   disk_gen_uevents(disk);
mutex_unlock(&mddev->open_mutex);
  abort:
mutex_unlock(&disks_mutex);
-- 
2.9.0


[PATCH v2 08/12] zram: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/block/zram/zram_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d735513..83f10a0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1287,7 +1287,7 @@ static int zram_add(void)
zram->disk->queue->limits.discard_zeroes_data = 0;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
 
-   add_disk(zram->disk, true);
+   add_disk(zram->disk, false);
 
ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
&zram_disk_attr_group);
@@ -1296,6 +1296,7 @@ static int zram_add(void)
device_id);
goto out_free_disk;
}
+   disk_gen_uevents(zram->disk);
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
zram->meta = NULL;
 
-- 
2.9.0


[PATCH v2 07/12] pktcdvd: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 00928406..a4e6bb7 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2785,11 +2785,13 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
disk->events = pd->bdev->bd_disk->events;
disk->async_events = pd->bdev->bd_disk->async_events;
 
-   add_disk(disk, true);
+   add_disk(disk, false);
 
pkt_sysfs_dev_new(pd);
pkt_debugfs_dev_new(pd);
 
+   disk_gen_uevents(disk);
+
pkt_devs[idx] = pd;
if (pkt_dev)
*pkt_dev = pd->pkt_dev;
-- 
2.9.0


[PATCH v2 06/12] mtip32xx: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/block/mtip32xx/mtip32xx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 2d09fae..8c1cf03 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4042,7 +4042,7 @@ skip_create_disk:
set_capacity(dd->disk, capacity);
 
/* Enable the block device and add it to /dev */
-   add_disk(dd->disk, true);
+   add_disk(dd->disk, false);
 
dd->bdev = bdget_disk(dd->disk, 0);
/*
@@ -4054,6 +4054,7 @@ skip_create_disk:
mtip_hw_sysfs_init(dd, kobj);
kobject_put(kobj);
}
+   disk_gen_uevents(dd->disk);
 
if (dd->mtip_svc_handler) {
set_bit(MTIP_DDF_INIT_DONE_BIT, &dd->dd_flag);
-- 
2.9.0


[PATCH v2 05/12] aoeblk: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 drivers/block/aoe/aoeblk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index e91c5f1..f0cf4d6 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -417,9 +417,10 @@ aoeblk_gdalloc(void *vp)
 
spin_unlock_irqrestore(&d->lock, flags);
 
-   add_disk(gd, true);
+   add_disk(gd, false);
aoedisk_add_sysfs(d);
aoedisk_add_debugfs(d);
+   disk_gen_uevents(gd);
 
spin_lock_irqsave(&d->lock, flags);
WARN_ON(!(d->flags & DEVFL_GD_NOW));
-- 
2.9.0


[PATCH v2 04/12] axonram: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
It is documented that KOBJ_ADD should be generated after the object's
attributes and children are ready.  We can achieve this with the new
disk_gen_uevents interface.

Signed-off-by: Fam Zheng 
---
 arch/powerpc/sysdev/axonram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 4efd69b..27e7175 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -238,7 +238,7 @@ static int axon_ram_probe(struct platform_device *device)
set_capacity(bank->disk, bank->size >> AXON_RAM_SECTOR_SHIFT);
blk_queue_make_request(bank->disk->queue, axon_ram_make_request);
blk_queue_logical_block_size(bank->disk->queue, AXON_RAM_SECTOR_SIZE);
-   add_disk(bank->disk, true);
+   add_disk(bank->disk, false);
 
bank->irq_id = irq_of_parse_and_map(device->dev.of_node, 0);
if (bank->irq_id == NO_IRQ) {
@@ -262,6 +262,7 @@ static int axon_ram_probe(struct platform_device *device)
rc = -EFAULT;
goto failed;
}
+   disk_gen_uevents(bank->disk);
 
azfs_minor += bank->disk->minors;
 
-- 
2.9.0


[PATCH v2 03/12] virtio-blk: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
Userspace listens to the KOBJ_ADD uevent generated in add_disk. At that
point we haven't created the serial attribute file, therefore depending
on how fast udev reacts, the /dev/disk/by-id/ entry doesn't always get
created.

This race condition can be easily reproduced by hot plugging a number of
virtio-blk disks.

Also in systemd, there used to be a related workaround in udev rules
called 'WAIT_FOR="serial"', but it is removed in later versions.

Now let's defer the KOBJ_ADD event until after the attributes are ready.

Signed-off-by: Fam Zheng 
---
 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f3a59f9..cd9a036 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -733,7 +733,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
-   add_disk(vblk->disk, true);
+   add_disk(vblk->disk, false);
err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_serial);
if (err)
goto out_del_disk;
@@ -746,6 +746,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 &dev_attr_cache_type_ro);
if (err)
goto out_del_disk;
+   disk_gen_uevents(vblk->disk);
return 0;
 
 out_del_disk:
-- 
2.9.0


[PATCH v2 02/12] genhd: Honor gen_uevent and add disk_gen_uevents

2016-06-29 Thread Fam Zheng
In add_disk(), don't send uevent to userspace when gen_uevent is false;
also export the refactored function disk_gen_uevents for later use.

Signed-off-by: Fam Zheng 
---
 block/genhd.c | 23 +++
 include/linux/genhd.h |  1 +
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 8e1bfa1..9b66953 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -506,12 +506,10 @@ static int exact_lock(dev_t devt, void *data)
return 0;
 }
 
-static void register_disk(struct gendisk *disk)
+static void register_disk(struct gendisk *disk, bool gen_uevent)
 {
struct device *ddev = disk_to_dev(disk);
struct block_device *bdev;
-   struct disk_part_iter piter;
-   struct hd_struct *part;
int err;
 
ddev->parent = disk->driverfs_dev;
@@ -563,6 +561,22 @@ static void register_disk(struct gendisk *disk)
 exit:
/* announce disk after possible partitions are created */
dev_set_uevent_suppress(ddev, 0);
+   if (gen_uevent)
+   disk_gen_uevents(disk);
+}
+
+/**
+ * disk_gen_uevents
+ * @disk - the disk to generate uevent
+ *
+ * Generate KOBJ_ADD uevents on the disk and partitions.
+ */
+void disk_gen_uevents(struct gendisk *disk)
+{
+   struct device *ddev = disk_to_dev(disk);
+   struct disk_part_iter piter;
+   struct hd_struct *part;
+
kobject_uevent(&ddev->kobj, KOBJ_ADD);
 
/* announce possible partitions */
@@ -571,6 +585,7 @@ exit:
kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
disk_part_iter_exit(&piter);
 }
+EXPORT_SYMBOL(disk_gen_uevents);
 
 /**
  * add_disk - add partitioning information to kernel list
@@ -618,7 +633,7 @@ void add_disk(struct gendisk *disk, bool gen_uevent)
 
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);
-   register_disk(disk);
+   register_disk(disk, gen_uevent);
blk_register_queue(disk);
 
/*
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 038be80..87ad9e5 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -416,6 +416,7 @@ extern void part_round_stats(int cpu, struct hd_struct 
*part);
 /* block/genhd.c */
 extern void add_disk(struct gendisk *disk, bool gen_uevent);
 extern void del_gendisk(struct gendisk *gp);
+extern void disk_gen_uevents(struct gendisk *disk);
 extern struct gendisk *get_gendisk(dev_t dev, int *partno);
 extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
 
-- 
2.9.0
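
As a rough illustration of the split this patch makes, the sketch below models it in user space: registration always happens, while the KOBJ_ADD events are emitted either by add_disk() itself or later by the exported helper. `struct fake_disk`, `fake_add_disk` and `fake_disk_gen_uevents` are hypothetical names and the event counter stands in for kobject_uevent(); none of this is the genhd code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct gendisk; fields are illustrative only. */
struct fake_disk {
	int nr_parts;		/* partitions found while registering */
	int add_events;		/* KOBJ_ADD events emitted so far */
};

/* Models disk_gen_uevents(): one event for the disk, one per partition. */
void fake_disk_gen_uevents(struct fake_disk *disk)
{
	int i;

	disk->add_events++;			/* the disk itself */
	for (i = 0; i < disk->nr_parts; i++)
		disk->add_events++;		/* each partition */
}

/* Models add_disk(): registration always happens, the event is optional. */
void fake_add_disk(struct fake_disk *disk, bool gen_uevent)
{
	/* ... registration work would go here ... */
	if (gen_uevent)
		fake_disk_gen_uevents(disk);
}
```

A caller passing false is then responsible for calling the helper exactly once after its attributes exist; a caller passing true gets the previous behaviour.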


[PATCH v2 01/12] genhd: Add "gen_uevent" parameter to add_disk

2016-06-29 Thread Fam Zheng
The parameter will control whether add_disk() itself generates the
KOBJ_ADD uevent.

Signed-off-by: Fam Zheng 
---
 arch/m68k/emu/nfblock.c | 2 +-
 arch/powerpc/sysdev/axonram.c   | 2 +-
 arch/um/drivers/ubd_kern.c  | 2 +-
 arch/xtensa/platforms/iss/simdisk.c | 2 +-
 block/genhd.c   | 3 ++-
 drivers/block/DAC960.c  | 2 +-
 drivers/block/amiflop.c | 2 +-
 drivers/block/aoe/aoeblk.c  | 2 +-
 drivers/block/ataflop.c | 2 +-
 drivers/block/brd.c | 4 ++--
 drivers/block/cciss.c   | 2 +-
 drivers/block/drbd/drbd_main.c  | 2 +-
 drivers/block/floppy.c  | 2 +-
 drivers/block/hd.c  | 2 +-
 drivers/block/loop.c| 2 +-
 drivers/block/mg_disk.c | 2 +-
 drivers/block/mtip32xx/mtip32xx.c   | 2 +-
 drivers/block/nbd.c | 2 +-
 drivers/block/null_blk.c| 2 +-
 drivers/block/osdblk.c  | 2 +-
 drivers/block/paride/pcd.c  | 2 +-
 drivers/block/paride/pd.c   | 2 +-
 drivers/block/paride/pf.c   | 2 +-
 drivers/block/pktcdvd.c | 2 +-
 drivers/block/ps3disk.c | 2 +-
 drivers/block/ps3vram.c | 2 +-
 drivers/block/rbd.c | 2 +-
 drivers/block/rsxx/dev.c| 2 +-
 drivers/block/skd_main.c| 2 +-
 drivers/block/sunvdc.c  | 2 +-
 drivers/block/swim.c| 2 +-
 drivers/block/swim3.c   | 2 +-
 drivers/block/sx8.c | 2 +-
 drivers/block/umem.c| 2 +-
 drivers/block/virtio_blk.c  | 2 +-
 drivers/block/xen-blkfront.c| 2 +-
 drivers/block/xsysace.c | 2 +-
 drivers/block/z2ram.c   | 2 +-
 drivers/block/zram/zram_drv.c   | 2 +-
 drivers/cdrom/gdrom.c   | 2 +-
 drivers/ide/ide-cd.c| 2 +-
 drivers/ide/ide-gd.c| 2 +-
 drivers/lightnvm/core.c | 2 +-
 drivers/md/bcache/super.c   | 4 ++--
 drivers/md/dm.c | 2 +-
 drivers/md/md.c | 2 +-
 drivers/memstick/core/ms_block.c| 2 +-
 drivers/memstick/core/mspro_block.c | 2 +-
 drivers/mmc/card/block.c| 2 +-
 drivers/mtd/mtd_blkdevs.c   | 2 +-
 drivers/mtd/ubi/block.c | 2 +-
 drivers/nvdimm/blk.c| 2 +-
 drivers/nvdimm/btt.c| 2 +-
 drivers/nvdimm/pmem.c   | 2 +-
 drivers/nvme/host/core.c| 2 +-
 drivers/s390/block/dasd_genhd.c | 2 +-
 drivers/s390/block/dcssblk.c| 2 +-
 drivers/s390/block/scm_blk.c| 2 +-
 drivers/s390/block/xpram.c  | 2 +-
 drivers/sbus/char/jsflash.c | 2 +-
 drivers/scsi/sd.c   | 2 +-
 drivers/scsi/sr.c   | 2 +-
 drivers/staging/lustre/lustre/llite/lloop.c | 2 +-
 include/linux/genhd.h   | 2 +-
 64 files changed, 67 insertions(+), 66 deletions(-)

diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index e9110b9..4252568 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -138,7 +138,7 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 
bsize)
set_capacity(dev->disk, (sector_t)blocks * (bsize / 512));
dev->disk->queue = dev->queue;
 
-   add_disk(dev->disk);
+   add_disk(dev->disk, true);
 
list_add_tail(&dev->list, &nfhd_list);
 
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ff75d70..4efd69b 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -238,7 +238,7 @@ static int axon_ram_probe(struct platform_device *device)
set_capacity(bank->disk, bank->size >> AXON_RAM_SECTOR_SHIFT);
blk_queue_make_request(bank->disk->queue, axon_ram_make_request);
blk_queue_logical_block_size(bank->disk->queue, AXON_RAM_SECTOR_SIZE);
-   add_disk(bank->disk);
+   add_disk(bank->disk, true);
 
bank->irq_id = irq_of_parse_and_map(device->dev.of_node, 0);
if (bank->irq_id == NO_IRQ) {
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 17e96dc..c2eea65 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -828,7 +828,7 @@ static int ubd_disk_register(int major, u64 size, int unit,
 
disk->private_data = &ubd_devs[unit];
disk->queue = ubd_devs[unit].queue;
-   add_disk(disk);
+  

[PATCH v2 00/12] gendisk: Generate uevent after attribute available

2016-06-29 Thread Fam Zheng
A race condition was noticed between add_disk() and the creation of disk
attributes, on virtio-blk hotplug.

Userspace listens to the KOBJ_ADD uevent generated in add_disk(). At that
point we haven't created the serial attribute file, therefore depending
on how fast udev reacts, the /dev/disk/by-id/ entry doesn't always get
created.

As pointed out by Christoph Hellwig in the specific fix [1], virtio-blk is not
the only driver that suffers from this, so we cannot count on every driver to
send events manually. Moreover as suggested in uevent documentation, it is
advised to defer the KOBJ_ADD event until all attributes are ready:

Documentation/kobject.txt:
> Use the KOBJ_ADD action for when the kobject is first added to the kernel.
> This should be done only after any attributes or children of the kobject
> have been initialized properly, as userspace will instantly start to look
> for them when this call happens.

Unfortunately it seems impossible to fix this generally without touching the
offending callers.  The approach I'm proposing here is adding a flag to
suppress uevent in add_disk(), which is patch 1, then in later patches, convert
any caller to only trigger the uevent when attributes are added.
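
The ordering problem and the proposed conversion can be modelled in user space as below. This is a sketch under a worst-case assumption, namely that the listener reacts the instant the event is sent; in practice it is a race, so the old ordering fails only some of the time. `struct model_disk` and the `model_*`/`probe_*` functions are hypothetical and only mirror the call order described above.

```c
#include <stdbool.h>

/* Hypothetical model of a disk as seen by a udev-like listener. */
struct model_disk {
	bool serial_attr_present;	/* has the "serial" attribute been created? */
	bool listener_saw_serial;	/* what the listener observed on KOBJ_ADD */
	bool event_sent;
};

/* The listener looks for the attribute as soon as the add event arrives. */
void model_send_add_event(struct model_disk *d)
{
	d->event_sent = true;
	d->listener_saw_serial = d->serial_attr_present;
}

void model_create_serial_attr(struct model_disk *d)
{
	d->serial_attr_present = true;
}

/* Old ordering: the event goes out first, the attribute appears afterwards. */
void probe_old(struct model_disk *d)
{
	model_send_add_event(d);	/* add_disk() announcing the disk */
	model_create_serial_attr(d);
}

/* Proposed ordering: attribute first, event last. */
void probe_new(struct model_disk *d)
{
	/* add_disk(disk, false) would register here without announcing */
	model_create_serial_attr(d);
	model_send_add_event(d);	/* disk_gen_uevents() */
}
```

In the model, probe_old leaves the listener without the attribute, which corresponds to the missing /dev/disk/by-id/ entry, while probe_new always announces a disk whose attribute is already in place.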

[1] https://lkml.org/lkml/2016/6/28/550

Fam Zheng (12):
  genhd: Add "gen_uevent" parameter to add_disk
  genhd: Honor gen_uevent and add disk_gen_uevents
  virtio-blk: Generate uevent after attribute available
  axonram: Generate uevent after attribute available
  aoeblk: Generate uevent after attribute available
  mtip32xx: Generate uevent after attribute available
  pktcdvd: Generate uevent after attribute available
  zram: Generate uevent after attribute available
  md: Generate uevent after attribute available
  mmc: Generate uevent after attribute available
  mtd: Generate uevent after attribute available
  nvme: Generate uevent after attribute available

 arch/m68k/emu/nfblock.c             |  2 +-
 arch/powerpc/sysdev/axonram.c       |  3 ++-
 arch/um/drivers/ubd_kern.c          |  2 +-
 arch/xtensa/platforms/iss/simdisk.c |  2 +-
 block/genhd.c                       | 26 +-
 drivers/block/DAC960.c              |  2 +-
 drivers/block/amiflop.c             |  2 +-
 drivers/block/aoe/aoeblk.c          |  3 ++-
 drivers/block/ataflop.c             |  2 +-
 drivers/block/brd.c                 |  4 ++--
 drivers/block/cciss.c               |  2 +-
 drivers/block/drbd/drbd_main.c      |  2 +-
 drivers/block/floppy.c              |  2 +-
 drivers/block/hd.c                  |  2 +-
 drivers/block/loop.c                |  2 +-
 drivers/block/mg_disk.c             |  2 +-
 drivers/block/mtip32xx/mtip32xx.c   |  3 ++-
 drivers/block/nbd.c                 |  2 +-
 drivers/block/null_blk.c            |  2 +-
 drivers/block/osdblk.c              |  2 +-
 drivers/block/paride/pcd.c          |  2 +-
 drivers/block/paride/pd.c           |  2 +-
 drivers/block/paride/pf.c           |  2 +-
 drivers/block/pktcdvd.c             |  4 +++-
 drivers/block/ps3disk.c             |  2 +-
 drivers/block/ps3vram.c             |  2 +-
 drivers/block/rbd.c                 |  2 +-
 drivers/block/rsxx/dev.c            |  2 +-
 drivers/block/skd_main.c            |  2 +-
 drivers/block/sunvdc.c              |  2 +-
 drivers/block/swim.c                |  2 +-
 drivers/block/swim3.c               |  2 +-
 drivers/block/sx8.c                 |  2 +-
 drivers/block/umem.c                |  2 +-
 drivers/block/virtio_blk.c          |  3 ++-
 drivers/block/xen-blkfront.c        |  2 +-
 drivers/block/xsysace.c             |  2 +-
 drivers/block/z2ram.c               |  2 +-
 drivers/block/zram/zram_drv.c       |  3 ++-
 drivers/cdrom/gdrom.c               |  2 +-
 drivers/ide/ide-cd.c                |  2 +-
 drivers/ide/ide-gd.c                |  2 +-
 drivers/lightnvm/core.c             |  2 +-
 drivers/md/bcache/super.c           |  4 ++--
 drivers/md/dm.c                     |  2 +-
 drivers/md/md.c                     |  3 ++-
 drivers/memstick/core/ms_block.c    |  2 +-
 drivers/memstick/core/mspro_block.c |  2 +-
 drivers/mmc/card/block.c            |  3 ++-
 drivers/mtd/mtd_blkdevs.c           |  3 ++-
 drivers/mtd/ubi/block.c             |  2 +-
 drivers/nvdimm/blk.c                |  2 +-
 drivers/nvdimm/btt.c                |  2 +-
 drivers/nvdimm/pmem.c               |  2 +-
 drivers/nvme/host/core.c            |  3 ++-
 drivers/s390/block/dasd_genhd.c     |  2 +-
 drivers/s390/block/dcssblk.c        |  2 +-
 drivers/s390/block/scm_blk.c        |  2 +-
 

Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-29 Thread Rafael J. Wysocki
On Thursday, June 30, 2016 01:47:09 AM Yuantian Tang wrote:
> > -Original Message-
> > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> > Sent: Thursday, June 30, 2016 9:47 AM
> > To: Yuantian Tang 
> > Cc: Scott Wood ; Russell King ;
> > Michael Turquette ; Stephen Boyd
> > ; Viresh Kumar ; linux-
> > c...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
> > d...@lists.ozlabs.org; Yang-Leo Li ; Xiaofeng Ren
> > ; Scott Wood 
> > Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
> > parent clocks
> > 
> > On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> > > Hi,
> > >
> > > This patch is acked by clock maintainer. If no comments from anyone else,
> > we will merge it in next week.
> > 
> > There is a cpufreq commit depending on it.  Are you going to handle that one
> > too?
> > 
> That one has been acked by cpufreq maintainer. You can get this from patch 
> comments.

I know that it has been ACKed.

My question is whether or not you are going to apply it along the [1/2].

If not, it will have to be deferred until the [1/2] is merged and then applied
which may not be desirable.

Thanks,
Rafael


RE: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-29 Thread Yuantian Tang
> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Sent: Thursday, June 30, 2016 9:47 AM
> To: Yuantian Tang 
> Cc: Scott Wood ; Russell King ;
> Michael Turquette ; Stephen Boyd
> ; Viresh Kumar ; linux-
> c...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Yang-Leo Li ; Xiaofeng Ren
> ; Scott Wood 
> Subject: Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
> parent clocks
> 
> On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> > Hi,
> >
> > This patch is acked by clock maintainer. If no comments from anyone else,
> we will merge it in next week.
> 
> There is a cpufreq commit depending on it.  Are you going to handle that one
> too?
> 
That one has been acked by cpufreq maintainer. You can get this from patch 
comments.

Regards,
Yuantian

> Thanks,
> Rafael


Re: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-29 Thread Rafael J. Wysocki
On Wednesday, June 29, 2016 05:50:26 AM Yuantian Tang wrote:
> Hi,
> 
> This patch is acked by clock maintainer. If no comments from anyone else, we 
> will merge it in next week.

There is a cpufreq commit depending on it.  Are you going to handle that one 
too?

Thanks,
Rafael


Re: [PATCH v2 RFC] pasemi: Fix boot failure on 4.7-rc1

2016-06-29 Thread Aneesh Kumar K.V
Darren Stevens  writes:

> Commit:d6a9996e84ac4beb7713e9485f4563e100a9b03e (powerpc/mm:
> vmalloc abstraction in preparation for radix) turned kernel memory
> and IO addresses from #defined constants to variables initialised
> at runtime.
> 
> On PA6T systems the setup_arch machine call initialises the onboard
> PCI-e root-ports, and uses pci_io_base to do this, which is now before
> its value has been set resulting in a panic right after 'booting 
> linux via __start()'
>
> Move the pci_io_base initialisation to the same place as vmalloc
> ranges are set (hash__early_init_mmu()/radix__early_init_mmu())
> 
> Reported-by: Christian Zigotzky 
> Signed-off-by: Darren Stevens 


Reviewed-by: Aneesh Kumar K.V 

> 
> ---
>
> Tested on my AmigaOneX1000, I don't have access to a reference board system,
> and our developer with one is on honeymoon.
>
> I am hoping to follow this patch with others to reduce the size of the nemo
> patch we apply, eventually I'd like to see the patch gone, but that is a 
> very big job.
>
> Kind regards
> Darren
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 88a5eca..ab84c89 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -230,6 +230,7 @@ extern unsigned long __kernel_virt_size;
>  #define KERN_VIRT_SIZE  __kernel_virt_size
>  extern struct page *vmemmap;
>  extern unsigned long ioremap_bot;
> +extern unsigned long pci_io_base;
>  #endif /* __ASSEMBLY__ */
>
>  #include 
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 3759df5..a5ae49a 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -47,7 +47,6 @@ static int __init pcibios_init(void)
>
>   printk(KERN_INFO "PCI: Probing PCI hardware\n");
>
> - pci_io_base = ISA_IO_BASE;
>   /* For now, override phys_mem_access_prot. If we need it,g
>* later, we may move that initialization to each ppc_md
>*/
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 5b22ba0..b5b5fe6 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -922,6 +922,8 @@ void __init hash__early_init_mmu(void)
>   vmemmap = (struct page *)H_VMEMMAP_BASE;
>   ioremap_bot = IOREMAP_BASE;
>
> + pci_io_base = ISA_IO_BASE;
> +
>   /* Initialize the MMU Hash table and create the linear mapping
>* of memory. Has to be done before SLB initialization as this is
>* currently where the page size encoding is obtained.
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index e58707d..095fbfa 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -328,6 +328,9 @@ void __init radix__early_init_mmu(void)
>   __vmalloc_end = RADIX_VMALLOC_END;
>   vmemmap = (struct page *)RADIX_VMEMMAP_BASE;
>   ioremap_bot = IOREMAP_BASE;
> +
> + pci_io_base = ISA_IO_BASE;
> +
>   /*
>* For now radix also use the same frag size
>*/


Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers

2016-06-29 Thread Cyril Bur
On Tue, 28 Jun 2016 11:53:13 +0800
Simon Guo  wrote:

> hi Cyril,
> 
> On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> > @@ -1108,11 +1084,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
> >  */
> > save_sprs(&prev->thread);
> >  
> > -   __switch_to_tm(prev);
> > -
> > /* Save FPU, Altivec, VSX and SPE state */
> > giveup_all(prev);
> >  
> > +   __switch_to_tm(prev);
> > +  
> 

Hi Simon,

> There should be a bug.
> giveup_all() will clear MSR[FP] bit. 
> __switch_to_tm() reads that bit to decide whether the FP 
> register needs to be flushed to thread_struct.
> === tm_reclaim() (invoked by __switch_to_tm)
> andi.   r0, r4, MSR_FP
> beq     dont_backup_fp
> 
> addi    r7, r3, THREAD_CKFPSTATE
> SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 transact fp state */
> 
> mffs    fr0
> stfd    fr0,FPSTATE_FPSCR(r7)
> 
> dont_backup_fp:
> =
> 
> But now the __switch_to_tm() is moved behind giveup_all().
> So __switch_to_tm() loses MSR[FP] and cannot decide whether saving ckpt FPU 
> or not.
> 

Good catch! Yes it looks that way indeed. I thought I had a test to catch this
because this is the big risk here but upon reflection it looks like I don't
(mostly because it seems a condition to catch that is hard to craft).

I'll add a test and a fix.

Thanks.

Cyril

> The same applies to VMX/VSX.

Yeah.

> 
> Thanks,
> - Simon
> 
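For readers following the ordering problem Simon describes, here is a minimal user-space sketch of it. It is not the kernel code: the toy_* names, the struct and the bit value are all hypothetical, and the only behaviour modelled is the one stated in the thread (giveup_all() clears MSR[FP], and the reclaim path checks MSR[FP] before backing up FP state).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MSR_FP 0x2000UL	/* placeholder bit standing in for MSR[FP] */

/* Hypothetical, heavily reduced thread state. */
struct toy_thread {
	unsigned long msr;
	bool ckpt_fp_saved;	/* set once checkpointed FP state is backed up */
};

/* Models giveup_all(): after saving live state, the facility bit is dropped. */
void toy_giveup_all(struct toy_thread *t)
{
	t->msr &= ~TOY_MSR_FP;
}

/* Models the reclaim path: FP state is backed up only if MSR[FP] is set. */
void toy_switch_to_tm(struct toy_thread *t)
{
	if (t->msr & TOY_MSR_FP)
		t->ckpt_fp_saved = true;
}

/* Runs one context switch and reports whether checkpointed FP was saved. */
bool toy_context_switch(bool tm_first)
{
	struct toy_thread t = { .msr = TOY_MSR_FP, .ckpt_fp_saved = false };

	if (tm_first) {
		toy_switch_to_tm(&t);
		toy_giveup_all(&t);
	} else {
		toy_giveup_all(&t);
		toy_switch_to_tm(&t);
	}
	return t.ckpt_fp_saved;
}
```

Calling toy_context_switch(true) models the original ordering and reports the state as saved; toy_context_switch(false) models the ordering in the patch and reports it as skipped, which is the bug under discussion.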


Re: [PATCH 2/2] cxl: Fix allocating a minimum of 2 pages for the SPA

2016-06-29 Thread Andrew Donnellan

On 29/06/16 22:16, Ian Munsie wrote:

From: Ian Munsie 

The Scheduled Process Area is allocated dynamically with enough pages to
fit at least as many processes as the AFU descriptor indicated. Since
the calculation is non-trivial, it does this by calculating how many
processes could fit in an allocation of a given order, and increasing
that order until it can fit enough processes or hits the maximum
supported size.

Currently, it will start this search using a SPA of 2 pages instead of
1. This can waste a page of memory if the AFU's maximum number of
supported processes was small enough to fit in one page.

Fix the algorithm to start the search at 1 page.

Signed-off-by: Ian Munsie 


Makes sense.

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
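The search described in the commit message can be sketched in user space as follows. This is an illustration under stated assumptions, not the driver code: the 64 KiB page size and the capacity formula are assumptions, chosen because together they reproduce the figure of 958 processes for a 2-page SPA quoted in patch 1/2 of this series. All toy_* names are hypothetical.

```c
#include <assert.h>

/* Assumed page size (64 KiB); not stated in this patch. */
#define TOY_PAGE_SIZE 65536UL

/*
 * Assumed capacity formula, for illustration only: the number of process
 * elements that fit in an SPA of (TOY_PAGE_SIZE << order) bytes.
 */
unsigned long toy_spa_max_procs(unsigned int order)
{
	unsigned long spa_size = TOY_PAGE_SIZE << order;

	return ((spa_size / 8) - 96) / 17;
}

/*
 * Find the smallest order whose capacity covers num_procs, starting the
 * search at order 0 (one page). Stops at max_order and returns it if no
 * smaller order is large enough.
 */
unsigned int toy_spa_order(unsigned long num_procs, unsigned int max_order)
{
	unsigned int order = 0;

	while (order < max_order && toy_spa_max_procs(order) < num_procs)
		order++;
	return order;
}
```

Under these assumptions an AFU that supports 100 processes gets order 0 (one page). A search that started at order 1 would have handed it two pages.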


Re: [PATCH 1/2] cxl: Fix allowing bogus AFU descriptors with 0 maximum processes

2016-06-29 Thread Andrew Donnellan

On 29/06/16 22:16, Ian Munsie wrote:

From: Ian Munsie 

If the AFU descriptor of an AFU directed AFU indicates that it supports
0 maximum processes, we will accept that value and attempt to use it.
The SPA will still be allocated (with 2 pages due to another minor bug
and room for 958 processes), and when a context is allocated we will
pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will
treat that as meaning no maximum and will allocate a context number and
we return a valid context.

Conceivably, this could lead to a buffer overflow of the SPA if more
than 958 contexts were allocated, however this is mitigated by the fact
that there are no known AFUs in the wild with a bogus AFU descriptor
like this, and that only the root user is allowed to flash an AFU image
to a card.

Add a check when validating the AFU descriptor to reject any with 0
maximum processes.

We do still allow a dedicated process only AFU to indicate that it
supports 0 contexts even though that is forbidden in the architecture,
as in that case we ignore the value and use 1 instead. This is just on
the off-chance that such a dedicated process AFU may exist (not that I
am aware of any), since their developers are less likely to have cared
about this value at all.

Signed-off-by: Ian Munsie 


Looks good to me.

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
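To illustrate why a maximum of 0 is dangerous, here is a small user-space sketch. The toy allocator only imitates the convention described in the commit message (an upper bound of 0 is treated as "no maximum"); it is not idr_alloc, and the validation helper is a hypothetical stand-in for the check the patch adds.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Toy ID allocator: 'end' is an exclusive upper bound, and a value of 0
 * (or less) is interpreted as "no maximum", as described above.
 * Returns the allocated ID, or -1 when the range is exhausted.
 */
int toy_id_alloc(int *next_id, int end)
{
	int limit = (end <= 0) ? INT_MAX : end;

	if (*next_id >= limit)
		return -1;
	return (*next_id)++;
}

/*
 * Hypothetical descriptor check: an AFU-directed AFU must advertise at
 * least one process. Dedicated-process AFUs are let through, since the
 * value is ignored for them.
 */
bool toy_afu_descriptor_ok(bool afu_directed, int num_procs)
{
	if (afu_directed && num_procs == 0)
		return false;
	return true;
}
```

With a bound of 2 the allocator hands out IDs 0 and 1 and then refuses. With a bound of 0 it keeps handing out IDs, which is how contexts could outgrow the SPA unless the descriptor is rejected up front.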


Re: [PATCH] cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

2016-06-29 Thread Andrew Donnellan

On 30/06/16 04:55, Ian Munsie wrote:

From: Ian Munsie 

If a kernel context is initialised and does not have any AFU interrupts
allocated it will cause a NULL pointer dereference when the context is
detached since the irq_names list will not have been initialised.

Move the initialisation of the irq_names list into the cxl_context_init
routine so that it will be valid for the entire lifetime of the context
and will not cause a NULL pointer dereference.

Signed-off-by: Ian Munsie 


As it's nice having your machine not crash on every shutdown...

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited


Re: Proposed: Patch to fix boot on PA6T

2016-06-29 Thread Darren Stevens
Hello Aneesh

On 28/06/2016, Aneesh Kumar K.V wrote:
> Another option is to init it along with rest of the variables as done in
> hash__early_init_mmu(void)/radix__early_init_mmu(void)

*FACEPALM* Why didn't I think of that! I've made this change and seems to work
- obviously I can't test on a Radix system though, as I don't have access to
one.

Patch coming shortly

Regards
Darren
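A minimal user-space sketch of the ordering issue behind this suggestion, assuming only what the thread states: the base address is now a variable assigned at runtime, and platform setup reads it before the old assignment point. The toy_* names and the constant are placeholders, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value; the real ISA_IO_BASE is not shown in this thread. */
#define TOY_ISA_IO_BASE UINT64_C(0xd000000000000000)

/* A runtime-initialised variable: zero until something assigns it. */
uint64_t toy_pci_io_base;

/* Stand-in for the early MMU init hook, which runs before platform setup. */
void toy_early_init_mmu(void)
{
	toy_pci_io_base = TOY_ISA_IO_BASE;
}

/* Stand-in for platform setup code that computes an I/O address. */
uint64_t toy_platform_io_addr(uint64_t offset)
{
	return toy_pci_io_base + offset;
}
```

If toy_platform_io_addr() runs before toy_early_init_mmu(), it computes an address relative to 0. Assigning the variable in the early init hook means every later consumer sees the intended base.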


Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-29 Thread Thiago Jung Bauermann
On Wednesday, 29 June 2016, 15:47:51, Dave Young wrote:
> On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index e8acb2b43dd9..e16d845d587f 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -146,7 +146,30 @@ struct kexec_file_ops {
> > 
> >   kexec_verify_sig_t *verify_sig;
> >  
> >  #endif
> >  };
> > 
> > -#endif
> > +
> > +/**
> > + * struct kexec_buf - parameters for finding a place for a buffer in memory
> > + * @image:   kexec image in which memory to search.
> > + * @mem: On return will have address of the buffer in memory.
> > + * @memsz:   Size for the buffer in memory.
> > + * @buf_align:   Minimum alignment needed.
> > + * @buf_min: The buffer can't be placed below this address.
> > + * @buf_max: The buffer can't be placed above this address.
> > + * @top_down:Allocate from top of memory.
> > + */
> > +struct kexec_buf {
> > + struct kimage *image;
> > + unsigned long mem;
> > + unsigned long memsz;
> > + unsigned long buf_align;
> > + unsigned long buf_min;
> > + unsigned long buf_max;
> > + bool top_down;
> > +};
> 
> Rethink about the first patch, you dropped the user buffer in kexec_buf
> But later your passing IMA digests buffer patchset may need use it.
> 
> So keep it in kexec_buf should be better.

I'm not following. The IMA buffer patchset doesn't use kexec_locate_mem_hole 
nor struct kexec_buf.

> For the IMA buffer patchset I'm still reading and learning the
> background, will reply them later.

Thank you!

[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: Proposed: Patch to fix boot on PA6T

2016-06-29 Thread Darren Stevens
Hello Michael

> I can't merge this because you didn't sign it off.

TBH I wasn't really expecting you to.

> See section 11 of:
>
>   https://www.kernel.org/doc/Documentation/SubmittingPatches

Read and understood (I hope)..

Regards
Darren


[PATCH v2 RFC] pasemi: Fix boot failure on 4.7-rc1

2016-06-29 Thread Darren Stevens

Commit:d6a9996e84ac4beb7713e9485f4563e100a9b03e (powerpc/mm:
vmalloc abstraction in preparation for radix) turned kernel memory
and IO addresses from #defined constants to variables initialised
at runtime.

On PA6T systems the setup_arch machine call initialises the onboard
PCI-e root-ports, and uses pci_io_base to do this, which is now before
its value has been set resulting in a panic right after 'booting 
linux via __start()'

Move the pci_io_base initialisation to the same place as vmalloc
ranges are set (hash__early_init_mmu()/radix__early_init_mmu())

Reported-by: Christian Zigotzky 
Signed-off-by: Darren Stevens 

---

Tested on my AmigaOneX1000, I don't have access to a reference board system,
and our developer with one is on honeymoon.

I am hoping to follow this patch with others to reduce the size of the nemo
patch we apply, eventually I'd like to see the patch gone, but that is a 
very big job.

Kind regards
Darren
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 88a5eca..ab84c89 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -230,6 +230,7 @@ extern unsigned long __kernel_virt_size;
 #define KERN_VIRT_SIZE  __kernel_virt_size
 extern struct page *vmemmap;
 extern unsigned long ioremap_bot;
+extern unsigned long pci_io_base;
 #endif /* __ASSEMBLY__ */
 
 #include 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 3759df5..a5ae49a 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -47,7 +47,6 @@ static int __init pcibios_init(void)
 
printk(KERN_INFO "PCI: Probing PCI hardware\n");
 
-   pci_io_base = ISA_IO_BASE;
/* For now, override phys_mem_access_prot. If we need it,g
 * later, we may move that initialization to each ppc_md
 */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5b22ba0..b5b5fe6 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -922,6 +922,8 @@ void __init hash__early_init_mmu(void)
vmemmap = (struct page *)H_VMEMMAP_BASE;
ioremap_bot = IOREMAP_BASE;
 
+   pci_io_base = ISA_IO_BASE;
+
/* Initialize the MMU Hash table and create the linear mapping
 * of memory. Has to be done before SLB initialization as this is
 * currently where the page size encoding is obtained.
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index e58707d..095fbfa 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -328,6 +328,9 @@ void __init radix__early_init_mmu(void)
__vmalloc_end = RADIX_VMALLOC_END;
vmemmap = (struct page *)RADIX_VMEMMAP_BASE;
ioremap_bot = IOREMAP_BASE;
+
+   pci_io_base = ISA_IO_BASE;
+
/*
 * For now radix also use the same frag size
 */

Re: [PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-06-29 Thread Thiago Jung Bauermann
On Wednesday, 29 June 2016, 15:45:18, Dave Young wrote:
> On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> > On Tuesday, 28 June 2016, 15:20:55, Dave Young wrote:
> > > On 06/27/16 at 04:21pm, Dave Young wrote:
> > > Using one argument for both sounds more reasonable than using a
> > > separate
> > > argument for memory walk..
> > 
> > I agree. This patch doesn't use a separate top_down argument, it's the
> > same patch I sent earlier except that the comments to struct kexec_buf
> > are in patch 2/9. What do you think?
> 
> It looks good except one nitpick inline..
> 
>
> > +/**
> > + * kexec_locate_mem_hole - find free memory to load segment or use in purgatory
 
> It is not necessary to use only for purgatory load..

Ok, what about this?

/**
 * kexec_locate_mem_hole - find free memory in a given kimage.


[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: Proposed: Patch to fix boot on PA6T

2016-06-29 Thread Darren Stevens
Hello Benjamin

On 27/06/2016, Benjamin Herrenschmidt wrote:
> Tell me more, when is that mapping done ? I'm changing things so that
> platform probe is called much later so that might have an impact.
>
> What consumes pci_io_base before it's been initialized ?

pas_pci_init() is the culprit. Following on from Aneesh's suggestion an
improved patch will follow shortly.

Regards
Darren


Re: [PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.

2016-06-29 Thread Dave Young
On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> On Thursday, 23 June 2016, 10:25:06, Dave Young wrote:
> > On 06/22/16 at 08:30pm, Thiago Jung Bauermann wrote:
> > > On Wednesday, 22 June 2016, 18:20:47, Dave Young wrote:
> > > > The patch looks good, but could the subject be more specific?
> > > > 
> > > > For example just like the first sentence of the patch description:
> > > > Allow architectures to specify their own memory walking function
> > > 
> > > Ok, What about this? I also changed the description to refer to x86 arch
> > > instead of Intel arch.
> > 
> > It looks good to me.
> 
> This version has the struct kexec_buf documentation comments that were
> in patch 3/9. I fixed the names of the struct members, and changed their
> descriptions to try to be clearer.
> -- 
> []'s
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Subject: [PATCH 2/9] kexec_file: Allow arch-specific memory walking for
>  kexec_add_buffer
> 
> Allow architectures to specify a different memory walking function for
> kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but
> PowerPC uses the memblock subsystem.
> 
> Signed-off-by: Thiago Jung Bauermann 
> Cc: Eric Biederman 
> Cc: Dave Young 
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  include/linux/kexec.h   | 25 -
>  kernel/kexec_file.c | 30 ++
>  kernel/kexec_internal.h | 14 --
>  3 files changed, 46 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index e8acb2b43dd9..e16d845d587f 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -146,7 +146,30 @@ struct kexec_file_ops {
>   kexec_verify_sig_t *verify_sig;
>  #endif
>  };
> -#endif
> +
> +/**
> + * struct kexec_buf - parameters for finding a place for a buffer in memory
> + * @image:   kexec image in which memory to search.
> + * @mem: On return will have address of the buffer in memory.
> + * @memsz:   Size for the buffer in memory.
> + * @buf_align:   Minimum alignment needed.
> + * @buf_min: The buffer can't be placed below this address.
> + * @buf_max: The buffer can't be placed above this address.
> + * @top_down:Allocate from top of memory.
> + */
> +struct kexec_buf {
> + struct kimage *image;
> + unsigned long mem;
> + unsigned long memsz;
> + unsigned long buf_align;
> + unsigned long buf_min;
> + unsigned long buf_max;
> + bool top_down;
> +};

Rethink about the first patch, you dropped the user buffer in kexec_buf
But later your passing IMA digests buffer patchset may need use it.

So keep it in kexec_buf should be better.

For the IMA buffer patchset I'm still reading and learning the
background, will reply them later.

> +
> +int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
> +int (*func)(u64, u64, void *));
> +#endif /* CONFIG_KEXEC_FILE */
>  
>  struct kimage {
>   kimage_entry_t head;
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index b6eec7527e9f..b1f1f6402518 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
>   return locate_mem_hole_bottom_up(start, end, kbuf);
>  }
>  
> +/**
> + * arch_kexec_walk_mem - call func(data) on free memory regions
> + * @kbuf:Context info for the search. Also passed to @func.
> + * @func:Function to call for each memory region.
> + *
> + * Return: The memory walk will stop when func returns a non-zero value
> + * and that value will be returned. If all free regions are visited without
> + * func returning non-zero, then zero will be returned.
> + */
> +int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
> +int (*func)(u64, u64, void *))
> +{
> + if (kbuf->image->type == KEXEC_TYPE_CRASH)
> + return walk_iomem_res_desc(crashk_res.desc,
> +IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> +crashk_res.start, crashk_res.end,
> +kbuf, func);
> + else
> + return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
> +}
> +
>  /*
>   * Helper function for placing a buffer in a kexec segment. This assumes
>   * that kexec_mutex is held.
> @@ -472,14 +493,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
>   kbuf->top_down = top_down;
>  
>   /* Walk the RAM ranges and allocate a suitable range for the buffer */
> - if (image->type == KEXEC_TYPE_CRASH)
> - ret = walk_iomem_res_desc(crashk_res.desc,
> - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> - crashk_res.start, crashk_res.end, kbuf,
> -

Re: [PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-06-29 Thread Dave Young
On 06/28/16 at 07:18pm, Thiago Jung Bauermann wrote:
> On Tuesday, 28 June 2016, 15:20:55, Dave Young wrote:
> > On 06/27/16 at 04:21pm, Dave Young wrote:
> > > Please ignore previous reply, I mistakenly send a broken mail without
> > > subject, sorry about it. Resend the reply here.
> > > 
> > > On 06/27/16 at 01:37pm, Thiago Jung Bauermann wrote:
> > > > On Tuesday, 28 June 2016, 00:19:48, Dave Young wrote:
> > > > > On 06/23/16 at 12:37pm, Thiago Jung Bauermann wrote:
> > > > > > On Thursday, 23 June 2016, 01:44:07, Dave Young wrote:
> > > > > > What is bad about the description of top_down?
> > > > > 
> > > > > It is not clear enough to me, I personally think the original one in
> > > > > source code is better:
> > > > > /* allocate from top of memory hole */
> > > > 
> > > > Actually I realized there's some discrepancy in how the x86 code uses
> > > > top_down and how I need it to work in powerpc. This may be what is
> > > > confusing about my comment and the existing comment.
> > > > 
> > > > x86 always walks memory from bottom to top but if top_down is true, in
> > > > each memory region it will allocate the memory hole in the highest
> > > > address within that region. I don't know why it is done that way,
> > > > though.
> > > 
> > > I think we did not meaning to do this, considering kdump we have only
> > > one crashkernel region for searching (crashk_res) so it is fine.
> > > For kexec maybe changing the walking function to accept top_down is
> > > reasonable.
> > > 
> > > Ccing Vivek see if he can remember something..
> > > 
> > > > On powerpc, the memory walk itself should be from top to bottom, as
> > > > well as the memory hole allocation within each memory region.
> > 
> > What is the particular reason in powerpc for a mandatory top to bottom
> > walking?
> 
> I'm walking unreserved memory ranges, so reservations made low in memory 
> (such as the reservation for the initrd) may create a memory hole that is a 
> lot lower than the true memory limit where I want to allocate from (768 MB). 
> In this situation, allocating at the highest address in the lowest free 
> memory range will allocate the buffer very low in memory, and in that case 
> top_down doesn't mean much.
> 
> Walking memory from lowest to highest address but then allocating memory at 
> the highest address inside the memory range is peculiar and surprising. Is 
> there a particular reason for it?

I do not know if there's some historic reason, personally I think it
should be an accident.

> 
> If it's an accident and doesn't affect x86, I'd suggest that top_down should
> have its expected behavior, which (at least for me) is: allocate from the
> highest available memory address within the desired range.

I tend to agree, but we need test it first to see if it breaks something.
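The two behaviours being compared can be sketched in user space as below. This is only an illustration of the discussion, not the kexec code: the region table, the toy_* helpers and the use of 0 as a "not found" value are assumptions, and regions are assumed to be sorted by address and not to start at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A free memory range; 'end' is inclusive. */
struct toy_region {
	uint64_t start;
	uint64_t end;
};

/*
 * Highest aligned address inside r where 'size' bytes fit, or 0 if the
 * region is too small. 'align' must be a power of two.
 */
uint64_t toy_top_of_region(const struct toy_region *r, uint64_t size,
			   uint64_t align)
{
	uint64_t addr;

	if (r->end - r->start + 1 < size)
		return 0;
	addr = (r->end + 1 - size) & ~(align - 1);
	return addr >= r->start ? addr : 0;
}

/* Walk lowest to highest, allocating at the top of the first region that fits. */
uint64_t toy_walk_bottom_up(const struct toy_region *regs, size_t n,
			    uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t addr = toy_top_of_region(&regs[i], size, align);

		if (addr)
			return addr;
	}
	return 0;
}

/* Walk highest to lowest, allocating at the top of the first region that fits. */
uint64_t toy_walk_top_down(const struct toy_region *regs, size_t n,
			   uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {
		uint64_t addr = toy_top_of_region(&regs[i], size, align);

		if (addr)
			return addr;
	}
	return 0;
}
```

With a small free range low in memory and a large one higher up, the bottom-up walk places the buffer at the top of the low range, so "top down" only applies within that range. The top-down walk places it at the highest available address overall, which is the behaviour Thiago describes as expected.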

> 
> In any case, my patch series allows each architecture to define what
> top_down should mean. It doesn't change the behavior in x86, since
> the default implementation of arch_kexec_walk_mem ignores
> kexec_buf.top_down, and allows powerpc to take top_down into account
> when walking memory.
> 
> > > > Should I add a separate top_down argument to kexec_locate_mem_hole to
> > > > control if the memory walk should be from top to bottom, and then the
> > > > bottom_up member of struct kexec_buf controls where inside each memory
> > > > region the memory hole will be allocated?
> > 
> > Using one argument for both sounds more reasonable than using a separate
> > argument for memory walk..
> 
> I agree. This patch doesn't use a separate top_down argument, it's the same
> patch I sent earlier except that the comments to struct kexec_buf are in
> patch 2/9. What do you think?

It looks good except one nitpick inline..

> 
> -- 
> []'s
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Subject: [PATCH 3/9] kexec_file: Factor out kexec_locate_mem_hole from
>  kexec_add_buffer.
> 
> kexec_locate_mem_hole will be used by the PowerPC kexec_file_load
> implementation to find free memory for the purgatory stack.
> 
> Signed-off-by: Thiago Jung Bauermann 
> Cc: Eric Biederman 
> Cc: Dave Young 
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  include/linux/kexec.h |  1 +
>  kernel/kexec_file.c   | 25 -
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index e16d845d587f..2b34e69db679 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -169,6 +169,7 @@ struct kexec_buf {
>  
>  int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  int (*func)(u64, u64, void *));
> +int kexec_locate_mem_hole(struct kexec_buf *kbuf);
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  struct kimage {
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index b1f1f6402518..445d66add8ca 100644
> --- a/kernel/kexec_file.c
> +++ 

[PATCH] cxl: Fix NULL pointer dereference on kernel contexts with no AFU interrupts

2016-06-29 Thread Ian Munsie
From: Ian Munsie 

If a kernel context is initialised and does not have any AFU interrupts
allocated it will cause a NULL pointer dereference when the context is
detached since the irq_names list will not have been initialised.

Move the initialisation of the irq_names list into the cxl_context_init
routine so that it will be valid for the entire lifetime of the context
and will not cause a NULL pointer dereference.

Signed-off-by: Ian Munsie 
---
 drivers/misc/cxl/context.c | 2 ++
 drivers/misc/cxl/irq.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 26d206b..edbb99e 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -67,6 +67,8 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
ctx->pending_fault = false;
ctx->pending_afu_err = false;
 
+   INIT_LIST_HEAD(&ctx->irq_names);
+
/*
 * When we have to destroy all contexts in cxl_context_detach_all() we
 * end up with afu_release_irqs() called from inside a
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 8def455..f3a7d4a 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -260,9 +260,6 @@ int afu_allocate_irqs(struct cxl_context *ctx, u32 count)
else
alloc_count = count + 1;
 
-   /* Initialize the list head to hold irq names */
-   INIT_LIST_HEAD(&ctx->irq_names);
-
if ((rc = cxl_ops->alloc_irq_ranges(&ctx->irqs, ctx->afu->adapter,
alloc_count)))
return rc;
-- 
2.8.1
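For readers unfamiliar with why an uninitialised list head is fatal, here is a minimal user-space sketch. It uses a stand-in for the kernel's circular list head and a hypothetical toy_context; it is not the cxl code. The NULL guard in the counting helper exists only so the sketch can report the bad state instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical context holding only the field relevant to this fix. */
struct toy_context {
	struct list_head irq_names;
};

/*
 * Initialise the list when the context is created, so it is valid even
 * if no interrupts are ever allocated.
 */
void toy_context_init(struct toy_context *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
	init_list_head(&ctx->irq_names);
}

/*
 * Walk the list the way a release path would. Returns the entry count,
 * or -1 if the head was never initialised. Code without this guard
 * would follow the NULL 'next' pointer and fault.
 */
int toy_count_irq_names(const struct toy_context *ctx)
{
	const struct list_head *pos;
	int n = 0;

	if (ctx->irq_names.next == NULL)
		return -1;
	for (pos = ctx->irq_names.next; pos != &ctx->irq_names; pos = pos->next)
		n++;
	return n;
}
```

A context set up with toy_context_init() reports an empty list (0 entries), while a merely zeroed context reports -1, which corresponds to the detach-time crash described above.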


[PATCH 2/2] cxl: Workaround XSL bug that does not clear the RA bit after a reset

2016-06-29 Thread Ian Munsie
From: Ian Munsie 

An issue was noted in our debug logs where the XSL would leave the RA
bit asserted after an AFU reset operation, which would effectively
prevent further AFU reset operations from working.

Workaround the issue by clearing the RA bit with an MMIO write if it is
still asserted after any AFU control operation.

Signed-off-by: Ian Munsie 
---
 drivers/misc/cxl/native.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 9479bfc..bc79be8 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -55,6 +55,16 @@ static int afu_control(struct cxl_afu *afu, u64 command, u64 clear,
cpu_relax();
AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
};
+
+   if (AFU_Cntl & CXL_AFU_Cntl_An_RA) {
+   /*
+* Workaround for a bug in the XSL used in the Mellanox CX4
+* that fails to clear the RA bit after an AFU reset,
+* preventing subsequent AFU resets from working.
+*/
+   cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl & ~CXL_AFU_Cntl_An_RA);
+   }
+
pr_devel("AFU command complete: %llx\n", command);
afu->enabled = enabled;
 out:
-- 
2.8.1


[PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect

2016-06-29 Thread Ian Munsie
From: Ian Munsie 

The AFU disable operation has a bug where it will not clear the enable
bit and therefore will have no effect. To date this has likely been
masked by the fact that we perform an AFU reset before the disable, which
also has the effect of clearing the enable bit, making the following
disable operation effectively a noop on most hardware. This patch
modifies the afu_control function to take a parameter to clear from the
AFU control register so that the disable operation can clear the
appropriate bit.

This bug was uncovered on the Mellanox CX4, which uses an XSL rather
than a PSL. On the XSL the reset operation will not complete while the
AFU is enabled, meaning the enable bit was still set at the start of the
disable and as a result this bug was hit and the disable also timed out.

Because of this difference in behaviour between the PSL and XSL, this
patch now makes the reset dependent on the card using a PSL to avoid
waiting for a timeout on the XSL. It is entirely possible that we may be
able to drop the reset altogether if it turns out we only ever needed it
due to this bug - however I am not willing to drop it without further
regression testing.

This also fixes a small issue where the AFU_Cntl register was read
outside of the lock that protects it.

Signed-off-by: Ian Munsie 
---
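To make the bug concrete, here is a minimal userspace sketch of the
register update before and after this change. The bit position of the
enable bit and the function names are assumptions made for illustration;
only the two expressions mirror the diff below.

```c
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define SKETCH_CNTL_E	(1ULL << 0)	/* enable request */

/* Old behaviour: bits could only be set, never cleared. */
static uint64_t cntl_old(uint64_t cntl, uint64_t command)
{
	return cntl | command;
}

/* New behaviour: drop the 'clear' bits first, then set the command bits. */
static uint64_t cntl_new(uint64_t cntl, uint64_t command, uint64_t clear)
{
	return (cntl & ~clear) | command;
}
```

With the old form a disable passes command == 0, so the value written
back is identical to the value read and the enable bit survives.
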
 drivers/misc/cxl/cxl.h|  1 +
 drivers/misc/cxl/native.c | 36 
 drivers/misc/cxl/pci.c|  1 +
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index ce2b9d5..bab8dfd 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -544,6 +544,7 @@ struct cxl_service_layer_ops {
void (*write_timebase_ctrl)(struct cxl *adapter);
u64 (*timebase_read)(struct cxl *adapter);
int capi_mode;
+   bool needs_reset_before_disable;
 };
 
 struct cxl_native {
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 120c468..9479bfc 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -21,10 +21,10 @@
 #include "cxl.h"
 #include "trace.h"
 
-static int afu_control(struct cxl_afu *afu, u64 command,
+static int afu_control(struct cxl_afu *afu, u64 command, u64 clear,
   u64 result, u64 mask, bool enabled)
 {
-   u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+   u64 AFU_Cntl;
unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
int rc = 0;
 
@@ -33,7 +33,8 @@ static int afu_control(struct cxl_afu *afu, u64 command,
 
trace_cxl_afu_ctrl(afu, command);
 
-   cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
+   AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
+   cxl_p2n_write(afu, CXL_AFU_Cntl_An, (AFU_Cntl & ~clear) | command);
 
AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
while ((AFU_Cntl & mask) != result) {
@@ -67,7 +68,7 @@ static int afu_enable(struct cxl_afu *afu)
 {
pr_devel("AFU enable request\n");
 
-   return afu_control(afu, CXL_AFU_Cntl_An_E,
+   return afu_control(afu, CXL_AFU_Cntl_An_E, 0,
   CXL_AFU_Cntl_An_ES_Enabled,
   CXL_AFU_Cntl_An_ES_MASK, true);
 }
@@ -76,7 +77,8 @@ int cxl_afu_disable(struct cxl_afu *afu)
 {
pr_devel("AFU disable request\n");
 
-   return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
+   return afu_control(afu, 0, CXL_AFU_Cntl_An_E,
+  CXL_AFU_Cntl_An_ES_Disabled,
   CXL_AFU_Cntl_An_ES_MASK, false);
 }
 
@@ -85,7 +87,7 @@ static int native_afu_reset(struct cxl_afu *afu)
 {
pr_devel("AFU reset request\n");
 
-   return afu_control(afu, CXL_AFU_Cntl_An_RA,
+   return afu_control(afu, CXL_AFU_Cntl_An_RA, 0,
   CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled,
   CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
   false);
@@ -595,7 +597,16 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
cxl_sysfs_afu_m_remove(afu);
cxl_chardev_afu_remove(afu);
 
-   cxl_ops->afu_reset(afu);
+   if (afu->adapter->native->sl_ops->needs_reset_before_disable) {
+   /*
+* XXX: We may be able to do away with this entirely - it is
+* possible that this was only ever needed due to a bug where
+* the disable operation did not clear the enable bit, however
+* I will only consider dropping this after more regression
+* testing on earlier PSL images.
+*/
+   cxl_ops->afu_reset(afu);
+   }
cxl_afu_disable(afu);
cxl_psl_purge(afu);
 
@@ -735,7 +746,16 @@ static int native_attach_process(struct cxl_context *ctx, bool kernel,
 
 static inline int detach_process_native_dedicated(struct cxl_context *ctx)
 {
-   

[PATCH v7] powerpc/pci: Assign fixed PHB number based on device-tree properties

2016-06-29 Thread Guilherme G. Piccoli
The domain/PHB field of PCI addresses has its value obtained from a
global variable, incremented each time a new domain (represented by
struct pci_controller) is added on the system. The domain addition
process happens during boot or due to PHB hotplug add.

As recent kernels are using predictable naming for network interfaces,
the network stack is more tied to PCI naming. This can be a problem in
hotplug scenarios, because PCI addresses will change if devices are
removed and then re-added. This situation seems unusual, but it can
happen if a user wants to replace a NIC without rebooting the machine,
for example.

This patch changes the way PCI domain values are generated: now, we use
device-tree properties to assign fixed PHB numbers to PCI addresses
when available (meaning pSeries and PowerNV cases). We also use a bitmap
to allow dynamic PHB numbering when device-tree properties are not
used. This bitmap keeps track of used PHB numbers and, if a PHB is
released (by hotplug operations, for example), it allows the reuse of
this PHB number, so that PCI addresses do not change when a device is
removed and re-added soon after. No functional changes were introduced.

Signed-off-by: Guilherme G. Piccoli 
Reviewed-by: Gavin Shan 
Reviewed-by: Ian Munsie 
---
v7:
* Removed the goto as per Michael's suggestion;

* Changed of_property_read_u32_array() to of_property_read_u32_index(),
as per Gavin's suggestion. This way, we end up using buid_low as the index
of PHB in pSeries, which is expected but was not being achieved in v6,
as per my mistake.

* Didn't remove the machine check for pSeries on the "reg" property lookup.
It's worth keeping, since almost every platform (if not all of them)
contains the "reg" property on the PHB node in the device-tree, but only on
pSeries are we 100% sure it can be used as the PHB unique identifier.
Since the patch has a dynamic PHB numbering mechanism, the other platforms
won't have trouble with it.
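
For anyone skimming the thread, the numbering policy can be sketched in
userspace C as below. This is only an illustration under stated
assumptions: the limit of 64, the bool array in place of the kernel
bitmap, and all sketch_* names are made up here, and locking is left out
entirely.

```c
#include <stdbool.h>

/* Hypothetical, deliberately small limit for the sketch. */
#define SKETCH_MAX_PHBS 64

static bool phb_used[SKETCH_MAX_PHBS];

/*
 * fixed_id < 0 means "no usable device-tree property".
 * Returns the PHB number that was claimed, or -1 if none is free.
 */
static int sketch_get_phb_number(int fixed_id)
{
	int id;

	/* Prefer the fixed number, but never hand out the same one twice. */
	if (fixed_id >= 0) {
		id = fixed_id & (SKETCH_MAX_PHBS - 1);
		if (!phb_used[id]) {
			phb_used[id] = true;
			return id;
		}
	}

	/* Fall back to dynamic numbering: first free slot. */
	for (id = 0; id < SKETCH_MAX_PHBS; id++) {
		if (!phb_used[id]) {
			phb_used[id] = true;
			return id;
		}
	}
	return -1;	/* the patch uses BUG_ON() here instead */
}

/* Release a number so that a later re-add can reuse it. */
static void sketch_free_phb_number(int id)
{
	if (id >= 0 && id < SKETCH_MAX_PHBS)
		phb_used[id] = false;
}
```

A remove followed by a re-add of the same PHB gets its old number back,
which is the property the predictable interface naming relies on.
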

 arch/powerpc/kernel/pci-common.c | 53 +---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..c87545b 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -41,11 +41,18 @@
 #include 
 #include 
 
+/* hose_spinlock protects accesses to the phb_bitmap. */
 static DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
-/* XXX kill that some day ... */
-static int global_phb_number;  /* Global phb counter */
+/* For dynamic PHB numbering on get_phb_number(): max number of PHBs. */
+#define MAX_PHBS 0x10000
+
+/*
+ * For dynamic PHB numbering: used/free PHBs tracking bitmap.
+ * Accesses to this bitmap should be protected by hose_spinlock.
+ */
+static DECLARE_BITMAP(phb_bitmap, MAX_PHBS);
 
 /* ISA Memory physical address */
 resource_size_t isa_mem_base;
@@ -64,6 +71,41 @@ struct dma_map_ops *get_pci_dma_ops(void)
 }
 EXPORT_SYMBOL(get_pci_dma_ops);
 
+/*
+ * This function should run under locking protection, specifically
+ * hose_spinlock.
+ */
+static int get_phb_number(struct device_node *dn)
+{
+   u64 prop;
+   int ret, phb_id = -1;
+
+   /*
+* Try fixed PHB numbering first, by checking archs and reading
+* the respective device-tree properties. Firstly, try PowerNV by
+* reading "ibm,opal-phbid", only present in OPAL environment.
+*/
+   ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
+   if (ret && machine_is(pseries))
+   ret = of_property_read_u32_index(dn, "reg", 1, (u32 *)&prop);
+   if (!ret)
+   phb_id = (int)(prop & (MAX_PHBS - 1));
+
+   /* We need to be sure to not use the same PHB number twice. */
+   if ((phb_id >= 0) && !test_and_set_bit(phb_id, phb_bitmap))
+   return phb_id;
+
+   /*
+* If not pSeries nor PowerNV, or if fixed PHB numbering tried to add
+* the same PHB number twice, then fallback to dynamic PHB numbering.
+*/
+   phb_id = find_first_zero_bit(phb_bitmap, MAX_PHBS);
+   BUG_ON(phb_id >= MAX_PHBS);
+   set_bit(phb_id, phb_bitmap);
+
+   return phb_id;
+}
+
 struct pci_controller *pcibios_alloc_controller(struct device_node *dev)
 {
struct pci_controller *phb;
@@ -72,7 +114,7 @@ struct pci_controller *pcibios_alloc_controller(struct device_node *dev)
if (phb == NULL)
return NULL;
spin_lock(&hose_spinlock);
-   phb->global_number = global_phb_number++;
+   phb->global_number = get_phb_number(dev);
list_add_tail(&phb->list_node, &hose_list);
spin_unlock(&hose_spinlock);
phb->dn = dev;
@@ -94,6 +136,11 @@ EXPORT_SYMBOL_GPL(pcibios_alloc_controller);
 void pcibios_free_controller(struct pci_controller *phb)
 {
spin_lock(&hose_spinlock);
+
+   /* Clear bit of phb_bitmap to allow reuse of this PHB number. */
+   if 

Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0

2016-06-29 Thread Brian Norris
Hi,

On Wed, Jun 29, 2016 at 02:53:03PM +, Raghav Dogra wrote:
> 
> 
> > -Original Message-
> > From: Leo Li [mailto:pku@gmail.com]
> > Sent: Saturday, May 28, 2016 3:34 AM

1 month delay? So much for the rush...

> > To: Brian Norris ; Raghav Dogra
> > 
> > Cc: Boris Brezillon ; Yang-Leo Li
> > ; Prabhakar Kushwaha
> > ; Scott Wood ; linux-
> > m...@lists.infradead.org; linuxppc-dev ;
> > Raghav Dogra ; Jaiprakash Singh
> > 
> > Subject: Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0
> > 
> > On Fri, May 27, 2016 at 4:12 PM, Brian Norris
> >  wrote:
> > > On Fri, May 27, 2016 at 10:44:01PM +0200, Boris Brezillon wrote:
> > >> On Fri, 27 May 2016 15:15:00 -0500
> > >> Leo Li  wrote:
> > >> >
> > >> > The pull request does have patch "mtd/ifc: Add support for IFC
> > >> > controller version 2.0", but it doesn't have another patch
> > >> > "driver/memory: Update dependency of IFC for
> > >> > Layerscape"(https://patchwork.ozlabs.org/patch/557389/) needed to
> > >> > make the driver selectable on new hardware.
> > >
[...]

> > >> BTW, in the patch description you say you're only modifying a Kconfig
> > >> dependency, but you're actually doing more than that: you're removing
> > >> an asm header inclusion and manually include several other headers
> > >> (which I guess were previously included by asm/prom.h).
> > >
> > > Please resend this patch with a more complete commit description; I'd
> > > like it to get actual review (and time in linux-next) before it gets
> > > merged, so at best, it'll wait a few -rc's. I also suspect the patch
> > > isn't optimal. I believe Scott has suggested [1] that we didn't need
> > > the FSL_SOC dependency on the LBC driver. I think IFC looks like a
> > > similar case?
> 
> Hi Brian,
> 
> The patch being talked about does not add a FSL_SOC dependency on the IFC 
> driver.
> It uses a generic ARCH_LAYERSCAPE macro to enable IFC. This should be Ok? 

Maybe... but if we know that this driver doesn't actually have an
FSL_SOC dependency, and the FSL maintainers don't really want it in the
first place, then a simpler patch is to just remove the FSL_SOC
dependency, rather than making the deps more complicated.

But anyway, if you resend with the comments addressed (e.g., better
commit description), then we can consider applying it. If the FSL folks
have nothing to contribute here, then I don't see why we wouldn't take
your patch.

Regards,
Brian

[PATCH v2 2/2] powerpc/pseries: Remove call to memblock_add()

2016-06-29 Thread Nathan Fontenot
The call to memblock_add() is not needed; this is already done by
add_memory(). This patch removes the call, which shrinks
dlpar_add_lmb_memory() enough that it can be merged into dlpar_add_lmb().

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   37 ++-
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 076cd7e..43c8691 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -533,36 +533,11 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
 
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static int dlpar_add_lmb_memory(struct of_drconf_cell *lmb)
+static int dlpar_add_lmb(struct of_drconf_cell *lmb)
 {
unsigned long block_sz;
int nid, rc;
 
-   block_sz = memory_block_size_bytes();
-
-   /* Find the node id for this address */
-   nid = memory_add_physaddr_to_nid(lmb->base_addr);
-
-   /* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
-   if (rc)
-   return rc;
-
-   /* Register this block of memory */
-   rc = memblock_add(lmb->base_addr, block_sz);
-   if (rc) {
-   remove_memory(nid, lmb->base_addr, block_sz);
-   return rc;
-   }
-
-   lmb->flags |= DRCONF_MEM_ASSIGNED;
-   return 0;
-}
-
-static int dlpar_add_lmb(struct of_drconf_cell *lmb)
-{
-   int rc;
-
if (lmb->flags & DRCONF_MEM_ASSIGNED)
return -EINVAL;
 
@@ -578,10 +553,18 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)
return rc;
}
 
-   rc = dlpar_add_lmb_memory(lmb);
+   block_sz = memory_block_size_bytes();
+
+   /* Find the node id for this address */
+   nid = memory_add_physaddr_to_nid(lmb->base_addr);
+
+   /* Add the memory */
+   rc = add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
dlpar_release_drc(lmb->drc_index);
+   } else {
+   lmb->flags |= DRCONF_MEM_ASSIGNED;
}
 
return rc;


[PATCH v2 1/2] powerpc/pseries: Auto-online hotplugged memory

2016-06-29 Thread Nathan Fontenot
A recent update (commit id 31bc3858ea3) allows for automatically
onlining memory that is added. This patch sets the config option
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y for pseries and updates the
pseries memory hotplug code so that DLPAR added memory can be
automatically onlined instead of explicitly onlining the memory.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/configs/pseries_defconfig  |1 +
 arch/powerpc/platforms/pseries/hotplug-memory.c |   14 --
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 36871a4..725f411 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -53,6 +53,7 @@ CONFIG_KEXEC=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_PPC_64K_PAGES=y
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2ce1385..076cd7e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -535,7 +535,6 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
 
 static int dlpar_add_lmb_memory(struct of_drconf_cell *lmb)
 {
-   struct memory_block *mem_block;
unsigned long block_sz;
int nid, rc;
 
@@ -556,19 +555,6 @@ static int dlpar_add_lmb_memory(struct of_drconf_cell *lmb)
return rc;
}
 
-   mem_block = lmb_to_memblock(lmb);
-   if (!mem_block) {
-   remove_memory(nid, lmb->base_addr, block_sz);
-   return -EINVAL;
-   }
-
-   rc = device_online(&mem_block->dev);
-   put_device(&mem_block->dev);
-   if (rc) {
-   remove_memory(nid, lmb->base_addr, block_sz);
-   return rc;
-   }
-
lmb->flags |= DRCONF_MEM_ASSIGNED;
return 0;
 }


[PATCH v2 0/2] powerpc/pseries: Auto-online hotplugged memory

2016-06-29 Thread Nathan Fontenot
Recent updates to the core mm code (commit id 31bc3858ea3) allow for
auto-onlining of added memory. This update to the pseries hotplug
memory code takes advantage of this by setting the config option
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y so that we online added memory
by default. This maintains the current functionality and removes
the code that explicitly onlines added memory.

Additionally, the call to memblock_add() is removed; this is already
done in add_memory().

Changes from v1:
- The memhp_auto_online variable is no longer explicitly set.
- The patch is split into two patches
1/2 - Update to use auto-onlining capabilities
2/2 - Remove the call to memblock_add


-Nathan
---
 configs/pseries_defconfig  |1 
 platforms/pseries/hotplug-memory.c |   51 +++--
 2 files changed, 11 insertions(+), 41 deletions(-)


Re: [PATCH 2/2] cxl: Fix allocating a minimum of 2 pages for the SPA

2016-06-29 Thread Frederic Barrat



On 29/06/2016 14:16, Ian Munsie wrote:

From: Ian Munsie 

The Scheduled Process Area is allocated dynamically with enough pages to
fit at least as many processes as the AFU descriptor indicated. Since
the calculation is non-trivial, it does this by calculating how many
processes could fit in an allocation of a given order, and increasing
that order until it can fit enough processes or hits the maximum
supported size.

Currently, it will start this search using a SPA of 2 pages instead of
1. This can waste a page of memory if the AFU's maximum number of
supported processes was small enough to fit in one page.

Fix the algorithm to start the search at 1 page.

Signed-off-by: Ian Munsie 



Reviewed-by: Frederic Barrat 
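
The search described in the commit message can be sketched in userspace
C along these lines. This is a hedged illustration, not the driver code:
the capacity formula is recalled from the driver rather than quoted in
this thread (it does reproduce the figure of 958 processes for two 64K
pages mentioned in patch 1/2), and the function names and the max_order
parameter are assumptions. Pages are assumed to be large enough that the
subtraction cannot underflow.

```c
/*
 * Capacity of an SPA of 'spa_size' bytes. Assumption: this is the
 * formula as recalled from the driver, not something quoted in the thread.
 */
static unsigned long spa_max_procs(unsigned long spa_size)
{
	return ((spa_size / 8) - 96) / 17;
}

/*
 * Smallest allocation order (0 = one page) whose SPA holds num_procs
 * processes, capped at max_order.
 */
static int spa_order_for(unsigned long num_procs, unsigned long page_size,
			 int max_order)
{
	int order = 0;	/* start the search at one page, not two */

	while (order < max_order &&
	       spa_max_procs(page_size << order) < num_procs)
		order++;
	return order;
}
```

Starting at order 0 means an AFU that needs only a few hundred contexts
is served by a single page where the old search would have taken two.
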


Re: [PATCH 1/2] cxl: Fix allowing bogus AFU descriptors with 0 maximum processes

2016-06-29 Thread Frederic Barrat


On 29/06/2016 14:16, Ian Munsie wrote:

From: Ian Munsie 

If the AFU descriptor of an AFU directed AFU indicates that it supports
0 maximum processes, we will accept that value and attempt to use it.
The SPA will still be allocated (with 2 pages due to another minor bug
and room for 958 processes), and when a context is allocated we will
pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will
treat that as meaning no maximum and will allocate a context number and
we return a valid context.

Conceivably, this could lead to a buffer overflow of the SPA if more
than 958 contexts were allocated, however this is mitigated by the fact
that there are no known AFUs in the wild with a bogus AFU descriptor
like this, and that only the root user is allowed to flash an AFU image
to a card.

Add a check when validating the AFU descriptor to reject any with 0
maximum processes.

We do still allow a dedicated process only AFU to indicate that it
supports 0 contexts even though that is forbidden in the architecture,
as in that case we ignore the value and use 1 instead. This is just on
the off-chance that such a dedicated process AFU may exist (not that I
am aware of any), since their developers are less likely to have cared
about this value at all.

Signed-off-by: Ian Munsie 



Reviewed-by: Frederic Barrat 
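
A rough userspace sketch of the validation rule described above, for
illustration only. The enum, the function name and the idea of treating
the mode as a single value are simplifications made up here; the real
descriptor check lives in the driver and is not shown in this message.

```c
#include <errno.h>

enum sketch_afu_mode { SKETCH_DEDICATED, SKETCH_DIRECTED };

/*
 * Returns 0 and leaves a usable value in *max_procs, or -EINVAL if the
 * descriptor must be rejected.
 */
static int sketch_check_max_procs(enum sketch_afu_mode mode,
				  unsigned int *max_procs)
{
	if (mode == SKETCH_DEDICATED) {
		*max_procs = 1;	/* descriptor value is ignored */
		return 0;
	}
	if (*max_procs == 0)
		return -EINVAL;	/* 0 would mean "no limit" to idr_alloc() */
	return 0;
}
```

Rejecting the descriptor up front keeps the "0 means no maximum"
convention of idr_alloc() from ever being reached with a bogus value.
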


Re: [PATCH v3] cpuidle: Fix last_residency division

2016-06-29 Thread Nicolas Pitre
On Wed, 29 Jun 2016, Nicolas Pitre wrote:

> On Wed, 29 Jun 2016, Daniel Lezcano wrote:
> 
> > On 06/29/2016 09:06 AM, Shreyas B. Prabhu wrote:
> > > diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
> > > index f87f399..c8ea5ad 100644
> > > --- a/drivers/cpuidle/cpuidle.h
> > > +++ b/drivers/cpuidle/cpuidle.h
> > > @@ -68,4 +68,27 @@ static inline void
> > > cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
> > >   }
> > >   #endif
> > >
> > > +/*
> > > + * Used for calculating last_residency in usec. Optimized for case
> > > + * where last_residency in nsecs is < INT_MAX/2 by using faster
> > > + * approximation. Approximated value has less than 1% error.
> > > + */
> > > +static inline int convert_nsec_to_usec(u64 nsec)
> > > +{
> > > + if (likely(nsec < INT_MAX / 2)) {
> > 
> > UINT_MAX ?
> 
> Actually this can be better than that.
> 
> > > + int usec = (int)nsec;
> 
> First, you'll want an unsigned type. Given the provided argument is u64, 
> we can assume there won't be any negative values here.
> 
> Then it would be wise to use a type with an explicit width, like U32.
> 
> > > + usec += usec >> 5;
> > > + usec = usec >> 10;
> > > + return usec;
> 
> And now you want to maximize the available range. So not to overflow the 
> first addition, we must respect:
> 
>   usec + (usec >> 5) <= 0xffffffff
>   usec + usec/32 <= 0xffffffff
>   usec <= (0xffffffff * 32) / 33
> 
> Therefore:
> 
>   nsec <= 0xf83e0f82

And to be sure, you should use 0xf83e0f82UL to avoid any potential sign 
extension.


Nicolas
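
The bound worked out in this thread can be checked with a small
userspace sketch. It is not the patch that was eventually merged: it
uses uint32_t/uint64_t and plain division in place of the kernel's
u32/u64 and div_u64(), and the constant name NSEC_FAST_PATH_MAX is made
up here for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical name: largest nsec for which nsec + (nsec >> 5) fits in 32 bits. */
#define NSEC_FAST_PATH_MAX 0xf83e0f82UL

/*
 * Userspace sketch of the nsec -> usec conversion discussed above.
 * Fast path: (nsec + nsec/32) / 1024 approximates nsec / 1000 with
 * less than 1% error and avoids a 64-bit division.
 */
static inline int convert_nsec_to_usec(uint64_t nsec)
{
	if (nsec <= NSEC_FAST_PATH_MAX) {
		uint32_t usec = (uint32_t)nsec;

		usec += usec >> 5;	/* cannot overflow below the bound */
		usec >>= 10;
		return (int)usec;
	} else {
		uint64_t usec = nsec / 1000;	/* div_u64() in the kernel */

		if (usec > INT_MAX)
			usec = INT_MAX;
		return (int)usec;
	}
}
```

For example, 1000000 ns comes out as 1007 us on the fast path, about
0.7% above the exact 1000.
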

Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-29 Thread Reza Arbab

On Tue, Jun 28, 2016 at 09:21:05PM +1000, Michael Ellerman wrote:

> No, you need to use mmu_linear_psize for the hotplug case.
>
> But you can probably factor out a common routine that both cases use, and hide
> the hash vs radix check in that.


Okay, I'm trying to refactor {create,remove}_section_mapping() into
hash__ and radix__ variants. This led to a couple of questions.


Pseudocode for radix__create_section_mapping(start, end):

page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
start = _ALIGN_DOWN(start, page_size);

for (; start < end; start += page_size) {
radix__map_kernel_page(start, __pa(start),
   PAGE_KERNEL, page_size);
}

Should the above use PAGE_KERNEL, like the hash table bolt, or 
(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_KERNEL_RW), like in the radix 
vmemmap creation?


The other question is what radix__remove_section_mapping() should do.
I don't know offhand what the opposite of map_kernel_page() is. As 
Aneesh mentioned, radix vmemmap removal is currently stubbed as a FIXME 
so I couldn't use that as a reference.


--
Reza Arbab


Re: [PATCH v2 3/4] perf annotate: add powerpc support

2016-06-29 Thread Ravi Bangoria

Thanks Naveen,

On Wednesday 29 June 2016 08:15 PM, Naveen N. Rao wrote:

On 2016/06/29 04:45PM, Ravi Bangoria wrote:

From: Naveen N. Rao 

Powerpc has a long list of branch instructions, and hardcoding them in a
table appears to be error-prone. So, add a new function to find the
instruction instead of creating a table. This function dynamically
creates the table (a list of 'struct ins') and, instead of creating an
object every time, first checks whether the list already contains an
object for that mnemonic.

Signed-off-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
Changes in v2:
   - Corrected few memory leaks.
   - Created Dynamic list for powerpc to optimize memory consumption

  tools/perf/util/annotate.c | 121 +
  1 file changed, 121 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 36a5825..812bfad 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -461,6 +461,11 @@ static struct ins instructions_arm[] = {
{ .name = "bne",   .ops  = &jump_ops, },
  };

+struct instructions_powerpc {
+   struct ins *ins;
+   struct list_head list;
+};
+
  static int ins__key_cmp(const void *name, const void *insp)
  {
const struct ins *ins = insp;
@@ -476,6 +481,120 @@ static int ins__cmp(const void *a, const void *b)
return strcmp(ia->name, ib->name);
  }

+static int list_add__ins_powerpc(struct instructions_powerpc *head,
+struct ins *ins)
+{
+   struct instructions_powerpc *ins_powerpc;
+
+   ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+   if (!ins_powerpc)
+   return -1;
+
+   ins_powerpc->ins = ins;
+   list_add_tail(&(ins_powerpc->list), &(head->list));
+
+   return 0;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+   const char *name)
+{
+   struct instructions_powerpc *pos;
+
+   list_for_each_entry(pos, &head->list, list) {
+   if (!strcmp(pos->ins->name, name))
+   return pos->ins;
+   }
+   return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+   int i;
+   struct ins *ins;
+   static struct instructions_powerpc head;
+   static bool list_initialized;
+
+   if (!list_initialized) {
+   INIT_LIST_HEAD(&head.list);
+   list_initialized = true;
+   }
+
+   /*
+* Search if we already created object of 'struct ins'
+* for this instruction
+*/
+   ins = list_search__ins_powerpc(&head, name);
+   if (ins)
+   return ins;
+
+   ins = zalloc(sizeof(struct ins));
+   if (!ins)
+   return NULL;
+
+   ins->name = strdup(name);
+   if (!ins->name)
+   goto err;

You can move the above two inside the below if condition, so that you
only allocate memory if needed.

Or, what would be better would be to pass 'name' and the appropriate ops
pointer to the helper above (list_add__ins_powerpc) and have that
allocate 'struct ins' and insert into the list.


Yes I will think about this.


+
+   if (name[0] == 'b') {
+   /* branch instructions */
+   ins->ops = &jump_ops;
+
+   /*
+* - Few start with 'b', but aren't branch instructions.
+* - Let's also ignore instructions involving 'ctr' and
+*   'tar' since target branch addresses for those can't
+*   be determined statically.
+*/
+   if (!strncmp(name, "bcd", 3)   ||
+   !strncmp(name, "brinc", 5) ||
+   !strncmp(name, "bper", 4)  ||
+   strstr(name, "ctr")||
+   strstr(name, "tar"))
+   goto err;

You are still leaking ins->name here.


Ah!! Sorry. I missed that we are using strdup here. Will correct it.

-Ravi
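
One way to act on both review comments is to decide whether the mnemonic
qualifies before allocating anything, so the rejection path has nothing
to free. The userspace sketch below is only an illustration of that
ordering: the sketch_* names and the hand-rolled singly linked list are
made up here and stand in for 'struct ins' and the list_head based code
in the patch, and only the branch case is handled.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-ins for the perf structures; names are hypothetical. */
struct sketch_ins {
	char *name;
	struct sketch_ins *next;
};

static struct sketch_ins *sketch_cache;

/* Same filter as the patch: a 'b' prefix, minus the known exceptions. */
static bool sketch_is_static_branch(const char *name)
{
	if (name[0] != 'b')
		return false;
	if (!strncmp(name, "bcd", 3) || !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) ||
	    strstr(name, "ctr") || strstr(name, "tar"))
		return false;
	return true;
}

static struct sketch_ins *sketch_find_branch(const char *name)
{
	struct sketch_ins *ins;
	size_t len;

	/* Reuse an existing entry if we already created one. */
	for (ins = sketch_cache; ins; ins = ins->next)
		if (!strcmp(ins->name, name))
			return ins;

	/* Decide first, allocate second: nothing to free on rejection. */
	if (!sketch_is_static_branch(name))
		return NULL;

	ins = calloc(1, sizeof(*ins));
	if (!ins)
		return NULL;

	/* The patch uses strdup(); copied by hand here to stay in ISO C. */
	len = strlen(name) + 1;
	ins->name = malloc(len);
	if (!ins->name) {
		free(ins);
		return NULL;
	}
	memcpy(ins->name, name, len);

	ins->next = sketch_cache;
	sketch_cache = ins;
	return ins;
}
```

Looking up "bne" twice returns the same object, while "bctr" and
"bcdadd" are rejected without any allocation taking place.
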


RE: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0

2016-06-29 Thread Raghav Dogra


> -Original Message-
> From: Leo Li [mailto:pku@gmail.com]
> Sent: Saturday, May 28, 2016 3:34 AM
> To: Brian Norris ; Raghav Dogra
> 
> Cc: Boris Brezillon ; Yang-Leo Li
> ; Prabhakar Kushwaha
> ; Scott Wood ; linux-
> m...@lists.infradead.org; linuxppc-dev ;
> Raghav Dogra ; Jaiprakash Singh
> 
> Subject: Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0
> 
> On Fri, May 27, 2016 at 4:12 PM, Brian Norris
>  wrote:
> > Hi Leo,
> >
> > On Fri, May 27, 2016 at 10:44:01PM +0200, Boris Brezillon wrote:
> >> On Fri, 27 May 2016 15:15:00 -0500
> >> Leo Li  wrote:
> >> > On Wed, May 25, 2016 at 3:34 PM, Boris Brezillon
> >> >  wrote:
> >> > > On Wed, 25 May 2016 14:18:43 -0500 Leo Li 
> >> > > wrote:
> >> > >> It seems that the patch at
> >> > >> https://patchwork.ozlabs.org/patch/557389/
> >> > >> mentioned above was not in tree for 4.7.  Can you review and
> >> > >> apply that patch too?
> >> > >
> >> > > I see it in the PR Brian sent 2 days ago [1], so it should appear
> >> > > in Linus tree soon.
> >> > >
> >> > > Regards,
> >> > >
> >> > > Boris
> >> > >
> >> > > [1]https://lkml.org/lkml/2016/5/24/9
> >> >
> >> >
> >> > The pull request does have patch "mtd/ifc: Add support for IFC
> >> > controller version 2.0", but it doesn't have another patch
> >> > "driver/memory: Update dependency of IFC for
> >> > Layerscape"(https://patchwork.ozlabs.org/patch/557389/) needed to
> >> > make the driver selectable on new hardware.
> >
> > Your patches seem to have broken threading. Or at least, in my
> > mailbox, I have that patch, but I can't easily find [PATCH 1/3] or [PATCH
> 3/3].
> > Please fix your threading next time, to help ensure things get handled
> > together.
> >
> > (It also helps when you reply to the patch you're asking about, and
> > not to a different patch.)
> >
> >> Sorry, I overlooked that part in your different emails (even though
> >> you clearly stated that you needed both patches).
> >>
> >> For my defense, I haven't followed the patch series from the
> >> beginning, and only took the patch because Brian suggested to do so
> >> (and the changes seemed ok).
> >> It would have been clearer if the different patches were part of the
> >> same series.
> >
> > +1 to the last sentence.
> >
> >> Anyway, Brian, can you take it into your tree and make it appear in
> >> -rc1 (or earlier if it's still possible)?
> >
> > Not sure how I could get it any "earlier"? It's not making -rc1 at
> > this point.
> >
> >> BTW, in the patch description you say you're only modifying a Kconfig
> >> dependency, but you're actually doing more than that: you're removing
> >> an asm header inclusion and manually include several other headers
> >> (which I guess were previously included by asm/prom.h).
> >
> > Please resend this patch with a more complete commit description; I'd
> > like it to get actual review (and time in linux-next) before it gets
> > merged, so at best, it'll wait a few -rc's. I also suspect the patch
> > isn't optimal. I believe Scott has suggested [1] that we didn't need
> > the FSL_SOC dependency on the LBC driver. I think IFC looks like a
> > similar case?

Hi Brian,

The patch being talked about does not add a FSL_SOC dependency on the IFC 
driver.
It uses a generic ARCH_LAYERSCAPE macro to enable IFC. This should be Ok? 

Regards,
Raghav
> 
> Thanks Brian.
> 
> Raghav, Can you do that as soon as possible?
> 
> Regards,
> Leo

Re: [PATCH v3] cpuidle: Fix last_residency division

2016-06-29 Thread Nicolas Pitre
On Wed, 29 Jun 2016, Daniel Lezcano wrote:

> On 06/29/2016 09:06 AM, Shreyas B. Prabhu wrote:
> > diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
> > index f87f399..c8ea5ad 100644
> > --- a/drivers/cpuidle/cpuidle.h
> > +++ b/drivers/cpuidle/cpuidle.h
> > @@ -68,4 +68,27 @@ static inline void
> > cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
> >   }
> >   #endif
> >
> > +/*
> > + * Used for calculating last_residency in usec. Optimized for case
> > + * where last_residency in nsecs is < INT_MAX/2 by using faster
> > + * approximation. Approximated value has less than 1% error.
> > + */
> > +static inline int convert_nsec_to_usec(u64 nsec)
> > +{
> > +   if (likely(nsec < INT_MAX / 2)) {
> 
> UINT_MAX ?

Actually this can be better than that.

> > +   int usec = (int)nsec;

First, you'll want an unsigned type. Given the provided argument is u64, 
we can assume there won't be any negative values here.

Then it would be wise to use a type with an explicit width, like U32.

> > +   usec += usec >> 5;
> > +   usec = usec >> 10;
> > +   return usec;

And now you want to maximize the available range. So not to overflow the 
first addition, we must respect:

usec + (usec >> 5) <= 0xffffffff
usec + usec/32 <= 0xffffffff
usec <= (0xffffffff * 32) / 33

Therefore:

nsec <= 0xf83e0f82

This is much better than INT_MAX/2.

> > +   } else {
> > +   u64 usec = div_u64(nsec, 1000);
> > +
> > +   if (usec > INT_MAX)
> > +   usec = INT_MAX;
> > +   return (int)usec;
> > +   }
> > +}
> 
> 
> 
> -- 
>   Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:   Facebook |
>  Twitter |
>  Blog
> 

Re: [PATCH v2 3/4] perf annotate: add powerpc support

2016-06-29 Thread Naveen N. Rao
On 2016/06/29 04:45PM, Ravi Bangoria wrote:
> From: Naveen N. Rao 
> 
> Powerpc has a long list of branch instructions, and hardcoding them in
> a table appears to be error-prone. So, add a new function to find an
> instruction instead of creating a table. This function dynamically
> creates a table (a list of 'struct ins'), and instead of creating an
> object every time, it first checks whether the list already contains an
> object for that mnemonic.
> 
> Signed-off-by: Naveen N. Rao 
> Signed-off-by: Ravi Bangoria 
> ---
> Changes in v2:
>   - Corrected few memory leaks.
>   - Created Dynamic list for powerpc to optimize memory consumption
> 
>  tools/perf/util/annotate.c | 121 
> +
>  1 file changed, 121 insertions(+)
> 
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 36a5825..812bfad 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -461,6 +461,11 @@ static struct ins instructions_arm[] = {
>   { .name = "bne",   .ops  = _ops, },
>  };
> 
> +struct instructions_powerpc {
> + struct ins *ins;
> + struct list_head list;
> +};
> +
>  static int ins__key_cmp(const void *name, const void *insp)
>  {
>   const struct ins *ins = insp;
> @@ -476,6 +481,120 @@ static int ins__cmp(const void *a, const void *b)
>   return strcmp(ia->name, ib->name);
>  }
> 
> +static int list_add__ins_powerpc(struct instructions_powerpc *head,
> +  struct ins *ins)
> +{
> + struct instructions_powerpc *ins_powerpc;
> +
> + ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
> + if (!ins_powerpc)
> + return -1;
> +
> + ins_powerpc->ins = ins;
> + list_add_tail(&(ins_powerpc->list), &(head->list));
> +
> + return 0;
> +}
> +
> +static struct ins *list_search__ins_powerpc(struct instructions_powerpc 
> *head,
> + const char *name)
> +{
> + struct instructions_powerpc *pos;
> +
> + list_for_each_entry(pos, &head->list, list) {
> + if (!strcmp(pos->ins->name, name))
> + return pos->ins;
> + }
> + return NULL;
> +}
> +
> +static struct ins *ins__find_powerpc(const char *name)
> +{
> + int i;
> + struct ins *ins;
> + static struct instructions_powerpc head;
> + static bool list_initialized;
> +
> + if (!list_initialized) {
> + INIT_LIST_HEAD(&head.list);
> + list_initialized = true;
> + }
> +
> + /*
> +  * Search if we already created object of 'struct ins'
> +  * for this instruction
> +  */
> + ins = list_search__ins_powerpc(&head, name);
> + if (ins)
> + return ins;
> +
> + ins = zalloc(sizeof(struct ins));
> + if (!ins)
> + return NULL;
> +
> + ins->name = strdup(name);
> + if (!ins->name)
> + goto err;

You can move the above two inside the below if condition, so that you 
only allocate memory if needed.

Or, what would be better would be to pass 'name' and the appropriate ops 
pointer to the helper above (list_add__ins_powerpc) and have that 
allocate 'struct ins' and insert into the list.
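
A rough sketch of that second option, to show the shape only. It is not
taken from the patch: the struct definitions are stand-ins for perf's
own, a simple singly linked list replaces struct list_head, and the
helper's new signature is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for perf's types; the real ones live in annotate.h. */
struct ins_ops { int unused; };
struct ins {
	char *name;
	struct ins_ops *ops;
};

/* Simplified node; the patch uses struct list_head instead. */
struct ins_node {
	struct ins ins;
	struct ins_node *next;
};

/*
 * Allocate a 'struct ins' for name/ops and append it to the list.
 * Returns the new entry, or NULL with nothing leaked on failure.
 */
static struct ins *list_add__ins_powerpc(struct ins_node **head,
					 const char *name,
					 struct ins_ops *ops)
{
	size_t len = strlen(name) + 1;
	struct ins_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;

	node->ins.name = malloc(len);
	if (!node->ins.name) {
		free(node);
		return NULL;
	}
	memcpy(node->ins.name, name, len);
	node->ins.ops = ops;

	/* Walk to the tail and link the new node there. */
	while (*head)
		head = &(*head)->next;
	*head = node;

	return &node->ins;
}
```

With all allocation in one place, the caller only decides which ops
pointer applies and has no error path of its own that can leak the name.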

> +
> + if (name[0] == 'b') {
> + /* branch instructions */
> + ins->ops = _ops;
> +
> + /*
> +  * - Few start with 'b', but aren't branch instructions.
> +  * - Let's also ignore instructions involving 'ctr' and
> +  *   'tar' since target branch addresses for those can't
> +  *   be determined statically.
> +  */
> + if (!strncmp(name, "bcd", 3)   ||
> + !strncmp(name, "brinc", 5) ||
> + !strncmp(name, "bper", 4)  ||
> + strstr(name, "ctr")||
> + strstr(name, "tar"))
> + goto err;

You are still leaking ins->name here.

- Naveen

> +
> + i = strlen(name) - 1;
> + if (i < 0)
> + goto err;
> +
> + /* ignore optional hints at the end of the instructions */
> + if (name[i] == '+' || name[i] == '-')
> + i--;
> +
> + if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 
> 'l')) {
> + /*
> +  * if the instruction ends up with 'l' or 'la', then
> +  * those are considered 'calls' since they update LR.
> +  * ... except for 'bnl' which is branch if not less than
> +  * and the absolute form of the same.
> +  */
> + if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
> + strcmp(name, "bnl-") && strcmp(name, "bnla") &&
> + strcmp(name, "bnla+") && strcmp(name, "bnla-"))
> + ins->ops = _ops;
> +   

Re: [PATCH 1/4] kvm/ppc/book3s_hv: Change vcore element runnable_threads from linked-list to array

2016-06-29 Thread Paolo Bonzini


On 29/06/2016 06:44, Suraj Jitindar Singh wrote:
> Thanks for catching that, yeah I see.
> 
> I don't think we can trivially move the struct kvmppc_vcore definition into 
> kvm_book3s.h as other code in kvm_host.h (i.e. struct kvm_vcpu_arch) requires
> the definition. I was thinking that I could just put runnable_threads inside 
> an #ifdef.
> 
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS];
> #endif

You can rename MAX_SMT_THREADS to BOOK3S_MAX_SMT_THREADS and move it to
kvm_host.h.  It seems like assembly code does not use it, so it's
unnecessary to have it in book3s_asm.h.

Paolo

[PATCH 2/2] cxl: Fix allocating a minimum of 2 pages for the SPA

2016-06-29 Thread Ian Munsie
From: Ian Munsie 

The Scheduled Process Area is allocated dynamically with enough pages to
fit at least as many processes as the AFU descriptor indicated. Since
the calculation is non-trivial, it does this by calculating how many
processes could fit in an allocation of a given order, and increasing
that order until it can fit enough processes or hits the maximum
supported size.

Currently, it will start this search using a SPA of 2 pages instead of
1. This can waste a page of memory if the AFU's maximum number of
supported processes was small enough to fit in one page.

Fix the algorithm to start the search at 1 page.
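
For illustration, the search can be sketched in userspace as below. The
capacity formula, the 64K page size and the 1MB cap are assumptions
recalled from the driver rather than anything this patch touches (they
do reproduce the 958 processes for 2 pages quoted in patch 1/2), and all
names are made up for the sketch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 65536u     /* assumed 64K pages */
#define SKETCH_SPA_CAP   0x100000u  /* assumed maximum SPA size */

/* Assumed capacity formula: processes that fit in spa_size bytes. */
static unsigned int procs_per_spa(size_t spa_size)
{
	return (unsigned int)(((spa_size / 8) - 96) / 17);
}

/*
 * Start at order 0 (one page) and grow until num_procs fit or the
 * next order would exceed the cap. Returns the chosen order and
 * stores the resulting capacity in *max_procs.
 */
static int pick_spa_order(unsigned int num_procs, unsigned int *max_procs)
{
	int order = 0;

	for (;;) {
		size_t spa_size = ((size_t)1 << order) * SKETCH_PAGE_SIZE;

		*max_procs = procs_per_spa(spa_size);
		if (*max_procs >= num_procs)
			return order;
		if (spa_size * 2 > SKETCH_SPA_CAP)
			return order;
		order++;
	}
}
```

Under these assumptions an AFU with up to 476 processes fits in a single
page, which is the case the old starting point of 2 pages wasted.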

Signed-off-by: Ian Munsie 
---
 drivers/misc/cxl/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index e80d8f7..120c468 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -189,7 +189,7 @@ int cxl_alloc_spa(struct cxl_afu *afu)
unsigned spa_size;
 
/* Work out how many pages to allocate */
-   afu->native->spa_order = 0;
+   afu->native->spa_order = -1;
do {
afu->native->spa_order++;
spa_size = (1 << afu->native->spa_order) * PAGE_SIZE;
-- 
2.8.1


[PATCH 1/2] cxl: Fix allowing bogus AFU descriptors with 0 maximum processes

2016-06-29 Thread Ian Munsie
From: Ian Munsie 

If the AFU descriptor of an AFU directed AFU indicates that it supports
0 maximum processes, we will accept that value and attempt to use it.
The SPA will still be allocated (with 2 pages due to another minor bug
and room for 958 processes), and when a context is allocated we will
pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will
treat that as meaning no maximum and will allocate a context number and
we return a valid context.

Conceivably, this could lead to a buffer overflow of the SPA if more
than 958 contexts were allocated, however this is mitigated by the fact
that there are no known AFUs in the wild with a bogus AFU descriptor
like this, and that only the root user is allowed to flash an AFU image
to a card.

Add a check when validating the AFU descriptor to reject any with 0
maximum processes.

We do still allow a dedicated process only AFU to indicate that it
supports 0 contexts even though that is forbidden in the architecture,
as in that case we ignore the value and use 1 instead. This is just on
the off-chance that such a dedicated process AFU may exist (not that I
am aware of any), since their developers are less likely to have cared
about this value at all.
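
The rule described above boils down to the predicate sketched here. The
mode constants and the function name are placeholders chosen for the
sketch, not the driver's definitions.

```c
#include <stdbool.h>

/* Placeholder mode bits, assumed for illustration only. */
#define SKETCH_MODE_DEDICATED 0x1u
#define SKETCH_MODE_DIRECTED  0x2u

/*
 * Reject a descriptor that advertises any mode other than dedicated
 * while claiming to support 0 processes. A dedicated-only AFU may
 * still report 0, since that value is ignored in that mode.
 */
static bool afu_max_procs_ok(unsigned int modes_supported,
			     unsigned int max_procs)
{
	bool non_dedicated = (modes_supported & ~SKETCH_MODE_DEDICATED) != 0;

	return !(non_dedicated && max_procs == 0);
}
```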

Signed-off-by: Ian Munsie 
---
 drivers/misc/cxl/pci.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 648817a..58d7d821 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -775,6 +775,21 @@ static int cxl_afu_descriptor_looks_ok(struct cxl_afu *afu)
}
}
 
+   if ((afu->modes_supported & ~CXL_MODE_DEDICATED) && 
afu->max_procs_virtualised == 0) {
+   /*
+* We could also check this for the dedicated process model
+* since the architecture indicates it should be set to 1, but
+* in that case we ignore the value and I'd rather not risk
+* breaking any existing dedicated process AFUs that left it as
+* 0 (not that I'm aware of any). It is clearly an error for an
+* AFU directed AFU to set this to 0, and would have previously
+* triggered a bug resulting in the maximum not being enforced
+* at all since idr_alloc treats 0 as no maximum.
+*/
+   dev_err(&afu->dev, "AFU does not support any processes\n");
+   return -EINVAL;
+   }
+
return 0;
 }
 
-- 
2.8.1


[PATCH v2] selftests/powerpc: exec() with suspended transaction

2016-06-29 Thread Michael Ellerman
From: Cyril Bur 

Perform an exec() class syscall with a suspended transaction.

Signed-off-by: Cyril Bur 
[mpe: Fix build errors, use a single binary for the test]
Signed-off-by: Michael Ellerman 

Selftest updates
---
 tools/testing/selftests/powerpc/tm/.gitignore   |  1 +
 tools/testing/selftests/powerpc/tm/Makefile |  7 ++-
 tools/testing/selftests/powerpc/tm/tm-exec.c| 70 +
 tools/testing/selftests/powerpc/tm/tm-syscall.c | 15 --
 tools/testing/selftests/powerpc/tm/tm.h | 23 +++-
 5 files changed, 98 insertions(+), 18 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-exec.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index bb942db845bf..82c0a9ce6e74 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -6,3 +6,4 @@ tm-vmxcopy
 tm-fork
 tm-tar
 tm-tmspr
+tm-exec
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index d0505dbd22d5..9d301d785d9e 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,11 +1,14 @@
-TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack 
tm-vmxcopy tm-fork tm-tar tm-tmspr
+TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
+   tm-vmxcopy tm-fork tm-tar tm-tmspr tm-exec tm-execed
 
 all: $(TEST_PROGS)
 
 $(TEST_PROGS): ../harness.c ../utils.c
 
+CFLAGS += -mhtm
+
 tm-syscall: tm-syscall-asm.S
-tm-syscall: CFLAGS += -mhtm -I../../../../../usr/include
+tm-syscall: CFLAGS += -I../../../../../usr/include
 tm-tmspr: CFLAGS += -pthread
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/tm/tm-exec.c 
b/tools/testing/selftests/powerpc/tm/tm-exec.c
new file mode 100644
index ..3d27fa0ece04
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-exec.c
@@ -0,0 +1,70 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Syscalls can be performed provided the transactions are suspended.
+ * The exec() class of syscall is unique as a new process is loaded.
+ *
+ * It makes little sense for the previously suspended transaction to
+ * still exist after an exec() call.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+static char *path;
+
+static int test_exec(void)
+{
+   SKIP_IF(!have_htm());
+
+   asm __volatile__(
+   "tbegin.;"
+   "blt 1f; "
+   "tsuspend.;"
+   "1: ;"
+   : : : "memory");
+
+   execl(path, "tm-exec", "--child", NULL);
+
+   /* Shouldn't get here */
+   perror("execl() failed");
+   return 1;
+}
+
+static int after_exec(void)
+{
+   asm __volatile__(
+   "tbegin.;"
+   "blt 1f;"
+   "tsuspend.;"
+   "1: ;"
+   : : : "memory");
+
+   FAIL_IF(failure_is_nesting());
+   return 0;
+}
+
+int main(int argc, char *argv[])
+{
+   path = argv[0];
+
+   if (argc > 1 && strcmp(argv[1], "--child") == 0)
+   return after_exec();
+
+   return test_harness(test_exec, "tm_exec");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index 60560cb20e38..454b965a2db3 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -27,21 +27,6 @@ unsigned retries = 0;
 #define TEST_DURATION 10 /* seconds */
 #define TM_RETRIES 100
 
-long failure_code(void)
-{
-   return __builtin_get_texasru() >> 24;
-}
-
-bool failure_is_persistent(void)
-{
-   return (failure_code() & TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT;
-}
-
-bool failure_is_syscall(void)
-{
-   return (failure_code() & TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL;
-}
-
 pid_t getppid_tm(bool suspend)
 {
int i;
diff --git a/tools/testing/selftests/powerpc/tm/tm.h 
b/tools/testing/selftests/powerpc/tm/tm.h
index 24144b25772c..60318bad7d7a 100644
--- a/tools/testing/selftests/powerpc/tm/tm.h
+++ b/tools/testing/selftests/powerpc/tm/tm.h
@@ -6,8 +6,9 @@
 #ifndef _SELFTESTS_POWERPC_TM_TM_H
 #define _SELFTESTS_POWERPC_TM_TM_H
 
-#include 
+#include 
 #include 
+#include 
 
 #include "../utils.h"
 
@@ -31,4 +32,24 @@ static inline bool have_htm_nosc(void)
 #endif
 }
 
+static inline long failure_code(void)
+{
+   return __builtin_get_texasru() >> 24;
+}
+
+static inline bool failure_is_persistent(void)
+{
+

[PATCH] powerpc/kernel: Drop unused extern for current_set

2016-06-29 Thread Michael Ellerman
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/init_32.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index c899fe340bbd..e2d7ba124618 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -80,9 +80,6 @@ EXPORT_SYMBOL(agp_special_page);
 
 void MMU_init(void);
 
-/* XXX should be in current.h  -- paulus */
-extern struct task_struct *current_set[NR_CPUS];
-
 /*
  * this tells the system to map all of ram with the segregs
  * (i.e. page tables) instead of the bats.
-- 
2.5.0


[PATCH v2 4/4] perf annotate: Define macro for arch names

2016-06-29 Thread Ravi Bangoria
Define a macro for each normalized arch name and use it instead of
repeating the arch name as a string literal.

Signed-off-by: Ravi Bangoria 
---
Changes in v2:
  - No changes

 tools/perf/arch/common.c   | 36 ++--
 tools/perf/arch/common.h   | 11 +++
 tools/perf/util/annotate.c | 10 +-
 tools/perf/util/unwind-libunwind.c |  4 ++--
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index ee69668..feb2113 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, 
const char *name)
 const char *normalize_arch(char *arch)
 {
if (!strcmp(arch, "x86_64"))
-   return "x86";
+   return NORM_X86;
if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-   return "x86";
+   return NORM_X86;
if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-   return "sparc";
+   return NORM_SPARC;
if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-   return "arm64";
+   return NORM_ARM64;
if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-   return "arm";
+   return NORM_ARM;
if (!strncmp(arch, "s390", 4))
-   return "s390";
+   return NORM_S390;
if (!strncmp(arch, "parisc", 6))
-   return "parisc";
+   return NORM_PARISC;
if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-   return "powerpc";
+   return NORM_POWERPC;
if (!strncmp(arch, "mips", 4))
-   return "mips";
+   return NORM_MIPS;
if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-   return "sh";
+   return NORM_SH;
 
return arch;
 }
@@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env 
*env,
zfree();
}
 
-   if (!strcmp(arch, "arm"))
+   if (!strcmp(arch, NORM_ARM))
path_list = arm_triplets;
-   else if (!strcmp(arch, "arm64"))
+   else if (!strcmp(arch, NORM_ARM64))
path_list = arm64_triplets;
-   else if (!strcmp(arch, "powerpc"))
+   else if (!strcmp(arch, NORM_POWERPC))
path_list = powerpc_triplets;
-   else if (!strcmp(arch, "sh"))
+   else if (!strcmp(arch, NORM_SH))
path_list = sh_triplets;
-   else if (!strcmp(arch, "s390"))
+   else if (!strcmp(arch, NORM_S390))
path_list = s390_triplets;
-   else if (!strcmp(arch, "sparc"))
+   else if (!strcmp(arch, NORM_SPARC))
path_list = sparc_triplets;
-   else if (!strcmp(arch, "x86"))
+   else if (!strcmp(arch, NORM_X86))
path_list = x86_triplets;
-   else if (!strcmp(arch, "mips"))
+   else if (!strcmp(arch, NORM_MIPS))
path_list = mips_triplets;
else {
ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@
 
 extern const char *objdump_path;
 
+/* Macro for normalized arch names */
+#define NORM_X86   "x86"
+#define NORM_SPARC "sparc"
+#define NORM_ARM64 "arm64"
+#define NORM_ARM   "arm"
+#define NORM_S390  "s390"
+#define NORM_PARISC"parisc"
+#define NORM_POWERPC   "powerpc"
+#define NORM_MIPS  "mips"
+#define NORM_SH"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 812bfad..8c27486 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -68,7 +68,7 @@ static int call__parse(struct ins_operands *ops,
 
name++;
 
-   if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
+   if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+'))
return -1;
 
tok = strchr(name, '>');
@@ -255,7 +255,7 @@ static int mov__parse(struct ins_operands *ops,
 
target = ++s;
 
-   if (!strcmp(norm_arch, "arm"))
+   if (!strcmp(norm_arch, NORM_ARM))
comment = strchr(s, ';');
else
comment = strchr(s, '#');
@@ -624,13 +624,13 @@ static struct ins *ins__find(const char *name, const char 
*norm_arch)
sorted = true;
}
 
-   if (!strcmp(norm_arch, "x86")) {
+   if (!strcmp(norm_arch, NORM_X86)) {
instructions = instructions_x86;
nmemb = ARRAY_SIZE(instructions_x86);
-   } else if (!strcmp(norm_arch, "arm")) {
+   } else if (!strcmp(norm_arch, NORM_ARM)) {
instructions = 

[PATCH v2 3/4] perf annotate: add powerpc support

2016-06-29 Thread Ravi Bangoria
From: Naveen N. Rao 

Powerpc has a long list of branch instructions, and hardcoding them in
a table appears to be error-prone. So, add a new function to find an
instruction instead of creating a table. This function dynamically
creates a table (a list of 'struct ins'), and instead of creating an
object every time, it first checks whether the list already contains an
object for that mnemonic.
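
The classification rules used by the new function can be summarised by
the standalone sketch below. It mirrors the checks in the diff but is
not the patch itself: the enum and the function name are invented for
illustration, and it returns a kind instead of setting ins->ops.

```c
#include <string.h>

enum ppc_kind { PPC_NOT_BRANCH, PPC_JUMP, PPC_CALL, PPC_RET };

static enum ppc_kind classify_powerpc(const char *name)
{
	enum ppc_kind kind = PPC_JUMP;
	int i, n;

	if (name[0] != 'b')
		return PPC_NOT_BRANCH;

	/* 'b' prefix but not a branch, or target not known statically */
	if (!strncmp(name, "bcd", 3) || !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) || strstr(name, "ctr") ||
	    strstr(name, "tar"))
		return PPC_NOT_BRANCH;

	/* skip an optional '+' or '-' hint at the end */
	i = (int)strlen(name) - 1;
	if (i > 0 && (name[i] == '+' || name[i] == '-'))
		i--;
	n = i + 1;	/* mnemonic length without the hint */

	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l')) {
		/* '...l' and '...la' update LR, except bnl and bnla */
		int is_bnl = (n == 3 && !strncmp(name, "bnl", 3)) ||
			     (n == 4 && !strncmp(name, "bnla", 4));

		if (!is_bnl)
			kind = PPC_CALL;
	}
	/* '...lr' returns through the link register */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		kind = PPC_RET;

	return kind;
}
```

For example, this treats "bl" and "beql+" as calls, "blr" as a return,
"bnl-" as a plain jump, and skips "bctr" altogether.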

Signed-off-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
Changes in v2:
  - Corrected few memory leaks.
  - Created Dynamic list for powerpc to optimize memory consumption

 tools/perf/util/annotate.c | 121 +
 1 file changed, 121 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 36a5825..812bfad 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -461,6 +461,11 @@ static struct ins instructions_arm[] = {
{ .name = "bne",   .ops  = _ops, },
 };
 
+struct instructions_powerpc {
+   struct ins *ins;
+   struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
const struct ins *ins = insp;
@@ -476,6 +481,120 @@ static int ins__cmp(const void *a, const void *b)
return strcmp(ia->name, ib->name);
 }
 
+static int list_add__ins_powerpc(struct instructions_powerpc *head,
+struct ins *ins)
+{
+   struct instructions_powerpc *ins_powerpc;
+
+   ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+   if (!ins_powerpc)
+   return -1;
+
+   ins_powerpc->ins = ins;
+   list_add_tail(&(ins_powerpc->list), &(head->list));
+
+   return 0;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+   const char *name)
+{
+   struct instructions_powerpc *pos;
+
+   list_for_each_entry(pos, &head->list, list) {
+   if (!strcmp(pos->ins->name, name))
+   return pos->ins;
+   }
+   return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+   int i;
+   struct ins *ins;
+   static struct instructions_powerpc head;
+   static bool list_initialized;
+
+   if (!list_initialized) {
+   INIT_LIST_HEAD(&head.list);
+   list_initialized = true;
+   }
+
+   /*
+* Search if we already created object of 'struct ins'
+* for this instruction
+*/
+   ins = list_search__ins_powerpc(&head, name);
+   if (ins)
+   return ins;
+
+   ins = zalloc(sizeof(struct ins));
+   if (!ins)
+   return NULL;
+
+   ins->name = strdup(name);
+   if (!ins->name)
+   goto err;
+
+   if (name[0] == 'b') {
+   /* branch instructions */
+   ins->ops = _ops;
+
+   /*
+* - Few start with 'b', but aren't branch instructions.
+* - Let's also ignore instructions involving 'ctr' and
+*   'tar' since target branch addresses for those can't
+*   be determined statically.
+*/
+   if (!strncmp(name, "bcd", 3)   ||
+   !strncmp(name, "brinc", 5) ||
+   !strncmp(name, "bper", 4)  ||
+   strstr(name, "ctr")||
+   strstr(name, "tar"))
+   goto err;
+
+   i = strlen(name) - 1;
+   if (i < 0)
+   goto err;
+
+   /* ignore optional hints at the end of the instructions */
+   if (name[i] == '+' || name[i] == '-')
+   i--;
+
+   if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+   /*
+* if the instruction ends up with 'l' or 'la', then
+* those are considered 'calls' since they update LR.
+* ... except for 'bnl' which is branch if not less than
+* and the absolute form of the same.
+*/
+   if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+   strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+   strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+   ins->ops = _ops;
+   }
+   if (name[i] == 'r' && name[i-1] == 'l')
+   /*
+* instructions ending with 'lr' are considered to be
+* return instructions
+*/
+   ins->ops = _ops;
+
+   /*
+* Add instruction to list so next time no need to
+* allocate memory for it.
+*/
+   if (list_add__ins_powerpc(&head, ins) < 0)
+   

[PATCH v2 2/4] perf annotate: Enable cross arch annotate

2016-06-29 Thread Ravi Bangoria
Change the current data structures and functions to enable cross-arch
annotate.

The current implementation has no logic for recording on one arch and
annotating on another. Such remote annotation is only partially possible
today, for x86 (and maybe arm as well). To make remote annotation work
properly, the instruction tables of all architectures need to be
included in the perf binary, and annotate must look up the instruction
table of the arch on which perf.data was recorded.
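
The idea is roughly the following standalone sketch. The table contents,
the struct and the function name are simplified placeholders rather than
perf's real definitions, and returning NULL for a missing or unknown
arch is an assumption of the sketch, not necessarily what the patch
does.

```c
#include <stdlib.h>
#include <string.h>

struct ins_sketch {
	const char *name;
	const char *kind;
};

/* Each table must stay sorted by name for bsearch(). */
static const struct ins_sketch x86_tab[] = {
	{ "call", "call" }, { "jmp", "jump" }, { "ret", "ret" },
};
static const struct ins_sketch arm_tab[] = {
	{ "b", "jump" }, { "bl", "call" }, { "bne", "jump" },
};

static int key_cmp(const void *name, const void *insp)
{
	const struct ins_sketch *ins = insp;

	return strcmp(name, ins->name);
}

/* Pick the table from the recorded arch, not from the host arch. */
static const struct ins_sketch *find_ins(const char *name,
					 const char *norm_arch)
{
	const struct ins_sketch *tab;
	size_t nmemb;

	if (!norm_arch)
		return NULL;

	if (!strcmp(norm_arch, "x86")) {
		tab = x86_tab;
		nmemb = sizeof(x86_tab) / sizeof(x86_tab[0]);
	} else if (!strcmp(norm_arch, "arm")) {
		tab = arm_tab;
		nmemb = sizeof(arm_tab) / sizeof(arm_tab[0]);
	} else {
		return NULL;
	}

	return bsearch(name, tab, nmemb, sizeof(*tab), key_cmp);
}
```

Because every table is compiled in unconditionally, the same binary can
resolve "bl" for an arm perf.data file while running on x86.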

For arm, a few instructions were defined under #if __arm__, which I've
used as the table for arm. But I'm not sure whether the table defined
outside of that also contains arm instructions. Apart from that,
'call__parse()' and 'mov__parse()' contain #ifdef __arm__ directives.
I've changed those to if (!strcmp(norm_arch, "arm")), but I've not
tested this either.

Signed-off-by: Ravi Bangoria 
---
Changes in v2:
  - No changes

 tools/perf/builtin-top.c  |   2 +-
 tools/perf/ui/browsers/annotate.c |   3 +-
 tools/perf/ui/gtk/annotate.c  |   2 +-
 tools/perf/util/annotate.c| 136 --
 tools/perf/util/annotate.h|   5 +-
 5 files changed, 95 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 07fc792..d4fd947 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, 
struct hist_entry *he)
return err;
}
 
-   err = symbol__annotate(sym, map, 0);
+   err = symbol__annotate(sym, map, 0, NULL);
if (err == 0) {
 out_assign:
top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 29dc6d2..3a652a6f 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map 
*map,
  (nr_pcnt - 1);
}
 
-   if (symbol__annotate(sym, map, sizeof_bdl) < 0) {
+   if (symbol__annotate(sym, map, sizeof_bdl,
+perf_evsel__env_arch(evsel)) < 0) {
ui__error("%s", ui_helpline__last_msg);
goto out_free_offsets;
}
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 9c7ff8d..d7150b3 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct 
map *map,
if (map->dso->annotate_warned)
return -1;
 
-   if (symbol__annotate(sym, map, 0) < 0) {
+   if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) {
ui__error("%s", ui_helpline__current);
return -1;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c385fec..36a5825 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -20,12 +20,14 @@
 #include 
 #include 
 #include 
+#include 
+#include "../arch/common.h"
 
 const char *disassembler_style;
 const char *objdump_path;
 static regex_t  file_lineno;
 
-static struct ins *ins__find(const char *name);
+static struct ins *ins__find(const char *name, const char *norm_arch);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
 
 static void ins__delete(struct ins_operands *ops)
@@ -53,7 +55,8 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
-static int call__parse(struct ins_operands *ops)
+static int call__parse(struct ins_operands *ops,
+  __maybe_unused const char *norm_arch)
 {
char *endptr, *tok, *name;
 
@@ -65,10 +68,8 @@ static int call__parse(struct ins_operands *ops)
 
name++;
 
-#ifdef __arm__
-   if (strchr(name, '+'))
+   if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
return -1;
-#endif
 
tok = strchr(name, '>');
if (tok == NULL)
@@ -117,7 +118,8 @@ bool ins__is_call(const struct ins *ins)
return ins->ops == _ops;
 }
 
-static int jump__parse(struct ins_operands *ops)
+static int jump__parse(struct ins_operands *ops,
+  __maybe_unused const char *norm_arch)
 {
const char *s = strchr(ops->raw, '+');
 
@@ -172,7 +174,7 @@ static int comment__symbol(char *raw, char *comment, u64 
*addrp, char **namep)
return 0;
 }
 
-static int lock__parse(struct ins_operands *ops)
+static int lock__parse(struct ins_operands *ops, const char *norm_arch)
 {
char *name;
 
@@ -183,7 +185,7 @@ static int lock__parse(struct ins_operands *ops)
if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0)
goto out_free_ops;
 
-   ops->locked.ins = ins__find(name);
+   ops->locked.ins = ins__find(name, norm_arch);
free(name);
 
if 

[PATCH v2 1/4] perf: Utility function to fetch arch

2016-06-29 Thread Ravi Bangoria
Add a utility function to fetch the arch using evsel
(evsel->evlist->env->arch).

Signed-off-by: Ravi Bangoria 
---
Changes in v2:
  - No changes

 tools/perf/util/evsel.c | 7 +++
 tools/perf/util/evsel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..0fea724 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2422,3 +2422,10 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, 
struct target *target,
 err, strerror_r(err, sbuf, sizeof(sbuf)),
 perf_evsel__name(evsel));
 }
+
+char *perf_evsel__env_arch(struct perf_evsel *evsel)
+{
+   if (evsel && evsel->evlist && evsel->evlist->env)
+   return evsel->evlist->env->arch;
+   return NULL;
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..86fed7a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -435,4 +435,6 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const 
char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 attr__fprintf_f attr__fprintf, void *priv);
 
+char *perf_evsel__env_arch(struct perf_evsel *evsel);
+
 #endif /* __PERF_EVSEL_H */
-- 
2.5.5


[PATCH v2 0/4] perf annotate: Enable cross arch annotate

2016-06-29 Thread Ravi Bangoria
Perf currently supports code navigation (branches and calls) in
annotate only when it runs on the same architecture on which perf.data
was recorded. Cross-arch annotate is not supported.

This patchset enables cross-arch annotate. For now I've used the x86
and arm instruction tables that are already available, and added support
for powerpc as well. Adding support for other archs will be easy.

I've created this patchset on top of acme/perf/core and tested it with
x86 and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v2:
  - Corrected few memory leaks.
  - Created Dynamic list for powerpc to optimize memory consumption

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (3):
  perf: Utility function to fetch arch
  perf annotate: Enable cross arch annotate
  perf: Define macro for arch names

 tools/perf/arch/common.c   |  36 +++---
 tools/perf/arch/common.h   |  11 ++
 tools/perf/builtin-top.c   |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c   |   2 +-
 tools/perf/util/annotate.c | 255 ++---
 tools/perf/util/annotate.h |   5 +-
 tools/perf/util/evsel.c|   7 +
 tools/perf/util/evsel.h|   2 +
 tools/perf/util/unwind-libunwind.c |   4 +-
 10 files changed, 255 insertions(+), 72 deletions(-)

--
2.5.5


RE: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible parent clocks

2016-06-29 Thread Yuantian Tang
Hi,

This patch has been acked by the clock maintainer. If there are no
comments from anyone else, we will merge it next week.

Thanks,
Yuantian

> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Thursday, June 16, 2016 2:21 PM
> To: Russell King ; Michael Turquette
> ; Stephen Boyd ;
> Viresh Kumar ; Rafael J. Wysocki
> 
> Cc: linux-...@vger.kernel.org; linux...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Yuantian Tang ; Yang-Leo Li
> ; Xiaofeng Ren ; Scott Wood
> 
> Subject: [PATCH v3 1/2] clk: Add consumer APIs for discovering possible
> parent clocks
> 
> From: Scott Wood 
> 
> Commit fc4a05d4b0eb ("clk: Remove unused provider APIs") removed
> __clk_get_num_parents() and clk_hw_get_parent_by_index(), leaving only
> true provider API versions that operate on struct clk_hw.
> 
> qoriq-cpufreq needs these functions in order to determine the options it has
> for calling clk_set_parent() and thus populate the cpufreq table, so revive
> them as legitimate consumer APIs.
> 
> Signed-off-by: Scott Wood 
> ---
> v2: Add missing 'static inline' to stub functions.
> 
> v3: no changes
> 
>  drivers/clk/clk.c   | 19 +++
>  include/linux/clk.h | 31 +++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index d584004..d61a3fe 
> 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -290,6 +290,12 @@ struct clk_hw *__clk_get_hw(struct clk *clk)  }
> EXPORT_SYMBOL_GPL(__clk_get_hw);
> 
> +unsigned int clk_get_num_parents(struct clk *clk) {
> + return !clk ? 0 : clk->core->num_parents; }
> +EXPORT_SYMBOL_GPL(clk_get_num_parents);
> +
>  unsigned int clk_hw_get_num_parents(const struct clk_hw *hw)  {
>   return hw->core->num_parents;
> @@ -358,6 +364,19 @@ static struct clk_core *clk_core_get_parent_by_index(struct clk_core *core,
>   return core->parents[index];
>  }
> 
> +struct clk *clk_get_parent_by_index(struct clk *clk, unsigned int index)
> +{
> +	struct clk_core *parent;
> +
> +	if (!clk)
> +		return NULL;
> +
> +	parent = clk_core_get_parent_by_index(clk->core, index);
> +
> +	return !parent ? NULL : parent->hw->clk;
> +}
> +EXPORT_SYMBOL_GPL(clk_get_parent_by_index);
> +
>  struct clk_hw *
>  clk_hw_get_parent_by_index(const struct clk_hw *hw, unsigned int index)
>  {
> diff --git a/include/linux/clk.h b/include/linux/clk.h
> index 0df4a51..acd115f 100644
> --- a/include/linux/clk.h
> +++ b/include/linux/clk.h
> @@ -392,6 +392,26 @@ int clk_set_parent(struct clk *clk, struct clk *parent);
>  struct clk *clk_get_parent(struct clk *clk);
> 
>  /**
> + * clk_get_parent_by_index - get a possible parent clock by index
> + * @clk: clock source
> + * @index: index into the array of possible parents of this clock
> + *
> + * Returns struct clk corresponding to the requested possible
> + * parent clock source, or NULL.
> + */
> +struct clk *clk_get_parent_by_index(struct clk *clk,
> + unsigned int index);
> +
> +/**
> + * clk_get_num_parents - get number of possible parents
> + * @clk: clock source
> + *
> + * Returns the number of possible parents of this clock,
> + * which can then be enumerated using clk_get_parent_by_index().
> + */
> +unsigned int clk_get_num_parents(struct clk *clk);
> +
> +/**
>   * clk_get_sys - get a clock based upon the device name
>   * @dev_id: device name
>   * @con_id: connection ID
> @@ -461,6 +481,17 @@ static inline struct clk *clk_get_parent(struct clk *clk)
>   return NULL;
>  }
> 
> +static inline struct clk *clk_get_parent_by_index(struct clk *clk,
> +   unsigned int index)
> +{
> + return NULL;
> +}
> +
> +static inline unsigned int clk_get_num_parents(struct clk *clk)
> +{
> + return 0;
> +}
> +
>  #endif
> 
>  /* clk_prepare_enable helps cases using clk_enable in non-atomic context. */
> --
> 2.5.0
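
For readers following the thread, here is a rough userspace sketch of how a
consumer such as a cpufreq driver might walk the possible parents with the
two calls from the patch above. The struct clk layout, the bounds check in
the mock, and fill_freq_table() are assumptions made for illustration only;
in the kernel, struct clk is opaque and the real lookup goes through
clk->core.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's opaque struct clk. */
struct clk {
	const char *name;
	unsigned long rate;
	struct clk **parents;		/* possible parents, not the current one */
	unsigned int num_parents;
};

/* Mock of the consumer API from the patch: a NULL clock has no parents. */
static unsigned int clk_get_num_parents(struct clk *clk)
{
	return !clk ? 0 : clk->num_parents;
}

/* Mock of the consumer API; in this mock an out-of-range index yields NULL. */
static struct clk *clk_get_parent_by_index(struct clk *clk, unsigned int index)
{
	if (!clk || index >= clk->num_parents)
		return NULL;
	return clk->parents[index];
}

/*
 * Hypothetical helper: record the rate of every possible parent, roughly
 * the way a cpufreq driver could populate its frequency table.
 * Returns the number of entries written.
 */
static unsigned int fill_freq_table(struct clk *clk, unsigned long *table,
				    unsigned int max)
{
	unsigned int i, n = 0, count = clk_get_num_parents(clk);

	for (i = 0; i < count && n < max; i++) {
		struct clk *parent = clk_get_parent_by_index(clk, i);

		if (parent)
			table[n++] = parent->rate;
	}
	return n;
}
```

Each table entry then corresponds to one candidate argument for
clk_set_parent().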


Re: [PATCH V5 2/3] powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg

2016-06-29 Thread Stewart Smith
Suraj Jitindar Singh  writes:
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -276,6 +276,14 @@ extern int opal_error_code(int rc);
>  
>  ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count);
>  
> +static inline int opal_get_async_rc(struct opal_msg msg)
> +{
> + if (msg.msg_type != OPAL_MSG_ASYNC_COMP)
> + return OPAL_PARAMETER;

Should this instead be a WARN_ON or BUG_ON? Is there *ever* a situation where
calling opal_get_async_rc on a non-OPAL_MSG_ASYNC_COMP message is not both a
dumb idea and a bug?

otherwise (including if above change is made)
Acked-by: Stewart Smith 


-- 
Stewart Smith
OPAL Architect, IBM.
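
To make the suggestion concrete, here is a small userspace sketch of the
"warn and still return an error" variant. Everything prefixed sketch_ or
SKETCH_ is hypothetical, the numeric values are placeholders rather than
the real OPAL constants, and the choice of params[1] as the completion code
is an assumption, since that part of the patch is not quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real definitions live in the OPAL headers. */
enum sketch_msg_type { SKETCH_MSG_ASYNC_COMP = 0, SKETCH_MSG_OTHER = 1 };
#define SKETCH_OPAL_PARAMETER (-1)

struct sketch_opal_msg {
	enum sketch_msg_type msg_type;
	int64_t params[8];
};

static unsigned int sketch_warn_count;

/* Userspace approximation of WARN_ON(): report, count, return the condition. */
static int sketch_warn_on(int cond, const char *what)
{
	if (cond) {
		sketch_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static int sketch_get_async_rc(const struct sketch_opal_msg *msg)
{
	/* A wrong message type is treated as a caller bug, but not a fatal one. */
	if (sketch_warn_on(msg->msg_type != SKETCH_MSG_ASYNC_COMP,
			   "rc requested from a non-async-completion message"))
		return SKETCH_OPAL_PARAMETER;

	/* Assumption: the completion code travels in params[1]. */
	return (int)msg->params[1];
}
```

With this shape the misuse shows up in the log while callers that check the
return value keep working.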


Re: [PATCH v3] cpuidle: Fix last_residency division

2016-06-29 Thread Shreyas B Prabhu

>>
>> +/*
>> + * Used for calculating last_residency in usec. Optimized for case
>> + * where last_residency in nsecs is < INT_MAX/2 by using faster
>> + * approximation. Approximated value has less than 1% error.
>> + */
>> +static inline int convert_nsec_to_usec(u64 nsec)
>> +{
>> +if (likely(nsec < INT_MAX / 2)) {
> 
> UINT_MAX ?

I don't think I can use UINT_MAX here, since usec += usec >> 5 can
overflow. I'm also using INT_MAX / 2 rather than INT_MAX, because otherwise
usec += usec >> 5 could wrap negative and usec >> 10 would then retain the
sign bit.

> 
>> +int usec = (int)nsec;
>> +
>> +usec += usec >> 5;
>> +usec = usec >> 10;
>> +return usec;
>> +} else {
>> +u64 usec = div_u64(nsec, 1000);
>> +
>> +if (usec > INT_MAX)
>> +usec = INT_MAX;
>> +return (int)usec;
>> +}
>> +}
> 

Thanks,
Shreyas
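
The headroom argument above can be checked with a few lines of userspace C.
This is only a sketch: fast_path_peak() and fast_path_fits_int() are
hypothetical helpers, and the arithmetic is done in 64 bits so that the
check itself cannot overflow.

```c
#include <limits.h>
#include <stdint.h>

/* Largest intermediate value the fast path computes for a given bound. */
static int64_t fast_path_peak(int64_t bound)
{
	int64_t nsec = bound - 1;	/* the fast path requires nsec < bound */

	return nsec + (nsec >> 5);
}

/* Does that intermediate sum still fit in a signed int? */
static int fast_path_fits_int(int64_t bound)
{
	return fast_path_peak(bound) <= INT_MAX;
}
```

With a 32-bit int, a bound of INT_MAX / 2 peaks at 1107296253, which fits,
while a bound of INT_MAX or UINT_MAX does not.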


Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers

2016-06-29 Thread Simon Guo
hi Cyril,

On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> @@ -1108,11 +1084,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
>    */
>   save_sprs(&prev->thread);
>  
> - __switch_to_tm(prev);
> -
>   /* Save FPU, Altivec, VSX and SPE state */
>   giveup_all(prev);
>  
> + __switch_to_tm(prev);
> +

I think there is a bug here.
giveup_all() clears the MSR[FP] bit, but __switch_to_tm() reads that bit
to decide whether the FP registers need to be flushed to thread_struct.
=== tm_reclaim() (invoked by __switch_to_tm)
	andi.	r0, r4, MSR_FP
	beq	dont_backup_fp

	addi	r7, r3, THREAD_CKFPSTATE
	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */

	mffs	fr0
	stfd	fr0,FPSTATE_FPSCR(r7)

dont_backup_fp:
=

But now __switch_to_tm() has been moved after giveup_all(), so
__switch_to_tm() no longer sees MSR[FP] and cannot decide whether or not
to save the checkpointed FPU state.

The same applies to VMX/VSX.

Thanks,
- Simon
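
The ordering concern can be illustrated with a toy model in plain C. This
is not the kernel code: the toy_ names and the bit value are hypothetical,
and the model only captures the one dependency described above, namely that
the reclaim path looks at MSR[FP] after giveup_all() may already have
cleared it.

```c
#include <stdbool.h>

#define TOY_MSR_FP 0x1u		/* hypothetical bit value, toy model only */

struct toy_thread {
	unsigned int msr;	/* stand-in for the thread's saved MSR */
	bool ckfp_saved;	/* did reclaim back up the checkpointed FP state? */
};

/* Like giveup_all(): once live state is saved, the thread no longer owns FP. */
static void toy_giveup_all(struct toy_thread *t)
{
	t->msr &= ~TOY_MSR_FP;
}

/* Like the tm_reclaim() excerpt: skip the FP backup when MSR[FP] is clear. */
static void toy_switch_to_tm(struct toy_thread *t)
{
	if (t->msr & TOY_MSR_FP)
		t->ckfp_saved = true;
}

/* Order before the patch: reclaim first, then give up the units. */
static bool toy_old_order(void)
{
	struct toy_thread t = { TOY_MSR_FP, false };

	toy_switch_to_tm(&t);
	toy_giveup_all(&t);
	return t.ckfp_saved;
}

/* Order in the patch under review: give up first, then reclaim. */
static bool toy_new_order(void)
{
	struct toy_thread t = { TOY_MSR_FP, false };

	toy_giveup_all(&t);
	toy_switch_to_tm(&t);
	return t.ckfp_saved;
}
```

In this model the old order saves the checkpointed FP state and the new
order silently skips it, which is the behaviour being questioned.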


Re: [PATCH v3] cpuidle: Fix last_residency division

2016-06-29 Thread Daniel Lezcano

On 06/29/2016 09:06 AM, Shreyas B. Prabhu wrote:

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up by get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard 
Bisected-by: Shilpasri G Bhat 
Suggested-by: David Laight 
Signed-off-by: Shreyas B. Prabhu 


[Cc'ed Nicolas Pitre]


---
Changes in v3
=
  - Using approximation suggested by David

Changes in v2
=
  - Fixing it in the cpuidle core code instead of driver code.

  drivers/cpuidle/cpuidle.c | 11 +++
  drivers/cpuidle/cpuidle.h | 23 +++
  2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..e9a7f74 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
        struct cpuidle_state *target_state = &drv->states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
u64 time_start, time_end;
-   s64 diff;

/*
 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
local_irq_enable();

/*
-* local_clock() returns the time in nanosecond, let's shift
-* by 10 (divide by 1024) to have microsecond based time.
+* local_clock() returns the time in nanosecond, convert it to
+* microsecond based time.
 */
-   diff = (time_end - time_start) >> 10;
-   if (diff > INT_MAX)
-   diff = INT_MAX;
-
-   dev->last_residency = (int) diff;
+   dev->last_residency = convert_nsec_to_usec(time_end - time_start);

if (entered_state >= 0) {
/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..c8ea5ad 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,27 @@ static inline void cpuidle_coupled_unregister_device(struct 
cpuidle_device *dev)
  }
  #endif

+/*
+ * Used for calculating last_residency in usec. Optimized for case
+ * where last_residency in nsecs is < INT_MAX/2 by using faster
+ * approximation. Approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+   if (likely(nsec < INT_MAX / 2)) {


UINT_MAX ?


+   int usec = (int)nsec;
+
+   usec += usec >> 5;
+   usec = usec >> 10;
+   return usec;
+   } else {
+   u64 usec = div_u64(nsec, 1000);
+
+   if (usec > INT_MAX)
+   usec = INT_MAX;
+   return (int)usec;
+   }
+}




--
  Linaro.org │ Open source software for ARM SoCs



[PATCH v3] cpuidle: Fix last_residency division

2016-06-29 Thread Shreyas B. Prabhu
Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up by get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard 
Bisected-by: Shilpasri G Bhat 
Suggested-by: David Laight 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v3
=
 - Using approximation suggested by David

Changes in v2
=
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 11 +++
 drivers/cpuidle/cpuidle.h | 23 +++
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..e9a7f74 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
        struct cpuidle_state *target_state = &drv->states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
u64 time_start, time_end;
-   s64 diff;
 
/*
 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
local_irq_enable();
 
/*
-* local_clock() returns the time in nanosecond, let's shift
-* by 10 (divide by 1024) to have microsecond based time.
+* local_clock() returns the time in nanosecond, convert it to
+* microsecond based time.
 */
-   diff = (time_end - time_start) >> 10;
-   if (diff > INT_MAX)
-   diff = INT_MAX;
-
-   dev->last_residency = (int) diff;
+   dev->last_residency = convert_nsec_to_usec(time_end - time_start);
 
if (entered_state >= 0) {
/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..c8ea5ad 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,27 @@ static inline void cpuidle_coupled_unregister_device(struct 
cpuidle_device *dev)
 }
 #endif
 
+/*
+ * Used for calculating last_residency in usec. Optimized for case
+ * where last_residency in nsecs is < INT_MAX/2 by using faster
+ * approximation. Approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+   if (likely(nsec < INT_MAX / 2)) {
+   int usec = (int)nsec;
+
+   usec += usec >> 5;
+   usec = usec >> 10;
+   return usec;
+   } else {
+   u64 usec = div_u64(nsec, 1000);
+
+   if (usec > INT_MAX)
+   usec = INT_MAX;
+   return (int)usec;
+   }
+}
+
+
 #endif /* __DRIVER_CPUIDLE_H */
-- 
2.1.4
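
As a worked illustration of the 2.3% shortfall described in the changelog,
the sketch below compares the old conversion with the patched fast path for
a snooze exit after 100000 ns. The 100 usec target residency and the
meets_target() helper are assumptions for the example; the menu governor's
real decision is more involved than a single comparison.

```c
#include <stdint.h>

/* Conversion before the fix: divide by 1024 instead of 1000. */
static int usec_shift_only(uint64_t nsec)
{
	return (int)(nsec >> 10);
}

/* Fast path from the patch; only valid for nsec < INT_MAX / 2. */
static int usec_patched_fast(uint64_t nsec)
{
	int usec = (int)nsec;

	usec += usec >> 5;
	return usec >> 10;
}

/* Hypothetical check: would this residency justify the next state? */
static int meets_target(int last_residency_us, int target_residency_us)
{
	return last_residency_us >= target_residency_us;
}
```

For 100000 ns the shift-only conversion yields 97 usec, just short of the
assumed 100 usec target, while the patched fast path yields 100 usec.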


Re: [PATCH v2] cpuidle: Fix last_residency division

2016-06-29 Thread Shreyas B Prabhu


On 06/27/2016 02:29 PM, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 24 June 2016 20:43
>> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
>>>> If those functions are called less often than cpuidle_enter_state(),
>>>> we could just move the division there. Since the divisor is constant,
>>>> do_div() can convert it into a multiply and shift, or we could use
>>>> the code you suggest above, or use a 32-bit division most of
>>>> the time:
>>>>
>>>> 	if (diff <= UINT_MAX)
>>>> 		diff_32 = (u32)diff / NSECS_PER_USEC;
>>>> 	else
>>>> 		diff_32 = div_u64(diff, NSECS_PER_USEC);
>>>>
>>>> which gcc itself will turn into a multiplication or series of
>>>> shifts on CPUs on which that is faster.
>>>>
>>> I'm not sure which division method of the three suggested here to use.
>>> Does anyone have a strong preference?
>>>
>>
>> It depends on how accurate we want it and how long we expect
>> the times to be. The optimization for the 4.2 second cutoff
>> for doing a 32-bit division only makes sense if the majority
>> of the sleep times are below that.
> 
> It also depends if the code actually cares about the length of 'long' sleeps.
> I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
> result could be treated as 4.2 seconds without causing grief.
> 
> Actually the cost of a 64bit divide after a 4 second sleep will be noise.
> OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
> 
Agreed. I'll use the code you suggested, with one small change: using
diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6, since I want to
err on the side of last_residency being more than the actual value.

And for long sleep cases, I'll use div_u64().

Thanks,
Shreyas
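
The direction of the error for the two shift choices can be seen from the
scale factors: (1 + 1/32) / 1024 is slightly above 1/1000, while
(1 + 1/64) / 1024 is slightly below it. A minimal sketch, with
approx_usec() and approx_error() as hypothetical helper names:

```c
#include <stdint.h>

/* Approximate nsec / 1000 as (nsec + (nsec >> shift)) >> 10. */
static uint32_t approx_usec(uint32_t nsec, unsigned int shift)
{
	nsec += nsec >> shift;
	return nsec >> 10;
}

/* Signed error against exact division, in microseconds. */
static int32_t approx_error(uint32_t nsec, unsigned int shift)
{
	return (int32_t)approx_usec(nsec, shift) - (int32_t)(nsec / 1000);
}
```

For 1000000 ns, a shift of 5 gives 1007 usec (7 too many) and a shift of 6
gives 991 usec (9 too few), so only the shift of 5 errs on the high side.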


Re: [PATCH 3/4] perf annotate: add powerpc support

2016-06-29 Thread Ravi Bangoria

Thanks David.

On Tuesday 28 June 2016 09:37 PM, David Laight wrote:

From: Ravi Bangoria

Sent: 28 June 2016 12:37

Powerpc has long list of branch instructions and hardcoding them in table
appears to be error-prone. So, add new function to find instruction
instead of creating table.

Signed-off-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
  tools/perf/util/annotate.c | 64 ++
  1 file changed, 64 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 36a5825..96c6610 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -476,6 +476,68 @@ static int ins__cmp(const void *a, const void *b)
return strcmp(ia->name, ib->name);
  }

+static struct ins *ins__find_powerpc(const char *name)

It would be better if the function name include 'branch'.


+{
+   int i;
+   struct ins *ins;
+
+   ins = zalloc(sizeof(struct ins));
+   if (!ins)
+   return NULL;
+
+   ins->name = strdup(name);
+   if (!ins->name)
+   return NULL;

You leak 'ins' here.


+
+   if (name[0] == 'b') {
+   /* branch instructions */
+   ins->ops = _ops;
+
+   /*
+* - Few start with 'b', but aren't branch instructions.
+* - Let's also ignore instructions involving 'ctr' and
+*   'tar' since target branch addresses for those can't
+*   be determined statically.
+*/
+   if (!strncmp(name, "bcd", 3)   ||
+   !strncmp(name, "brinc", 5) ||
+   !strncmp(name, "bper", 4)  ||
+   strstr(name, "ctr")||
+   strstr(name, "tar"))
+   return NULL;

More importantly you leak 'ins' and 'ins->name' here.
And on other paths below.


Yes. Fair points.

I could create a linked list that tracks the allocated instructions and
look it up every time before allocating memory. But then I need to free
the memory at the end, and it gets complicated.

Alternatively, I can go back to the usual approach of creating a table for
powerpc. This is the simplest option. The only problem is that powerpc has
around 400 branch instructions (including call and ret as well), and
listing them all is a bit error-prone.

Suggestions?

- Ravi
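
One possible shape for the leak fixes discussed above, as a userspace
sketch rather than a proposal for the actual patch: reject the name before
allocating anything, and route the remaining failure path through a single
cleanup label. struct ins_sketch, dup_string() and the function names are
hypothetical, and unlike the original function this sketch only handles
branch mnemonics and leaves freeing on success to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for perf's struct ins. */
struct ins_sketch {
	char *name;
	int is_branch;
};

/* Portable replacement for strdup() so the sketch needs only ISO C. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

static void ins_sketch_free(struct ins_sketch *ins)
{
	if (!ins)
		return;
	free(ins->name);
	free(ins);
}

/* Returns a heap-allocated entry for branch mnemonics, NULL otherwise. */
static struct ins_sketch *ins_find_branch_sketch(const char *name)
{
	struct ins_sketch *ins;

	/* Reject before allocating, so these paths have nothing to free. */
	if (name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) || !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) ||
	    strstr(name, "ctr") || strstr(name, "tar"))
		return NULL;

	ins = calloc(1, sizeof(*ins));
	if (!ins)
		return NULL;

	ins->name = dup_string(name);
	if (!ins->name)
		goto err_free;

	ins->is_branch = 1;
	return ins;

err_free:
	ins_sketch_free(ins);
	return NULL;
}
```

This does not answer the lifetime question of who frees successful lookups;
it only shows that the early-return paths need not leak.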


...

David




[PATCH v2 13/38] powerpc: Put exception configuration in a common place

2016-06-29 Thread Benjamin Herrenschmidt
The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.

Signed-off-by: Benjamin Herrenschmidt 
---
 
v2: Add/fix prototypes of exported function, remove "static"

 arch/powerpc/include/asm/hvcall.h  |  8 ++--
 arch/powerpc/include/asm/opal.h|  1 +
 arch/powerpc/kernel/setup_64.c | 68 +++---
 arch/powerpc/platforms/powernv/opal.c  | 13 +++
 arch/powerpc/platforms/pseries/setup.c | 31 +---
 5 files changed, 66 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 0bc9c28..b88efbb 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -434,13 +434,15 @@ static inline unsigned long cmo_get_page_size(void)
 
 extern long pSeries_enable_reloc_on_exc(void);
 extern long pSeries_disable_reloc_on_exc(void);
-
 extern long pseries_big_endian_exceptions(void);
+extern long pseries_little_endian_exceptions(void);
 
 #else
 
-#define pSeries_enable_reloc_on_exc()  do {} while (0)
-#define pSeries_disable_reloc_on_exc() do {} while (0)
+#define pSeries_enable_reloc_on_exc()  (0)
+#define pSeries_disable_reloc_on_exc() (0)
+#define pseries_big_endian_exceptions()(0)
+#define pseries_little_endian_exceptions() (0)
 
 #endif /* CONFIG_PPC_PSERIES */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9d86c66..6135816 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -215,6 +215,7 @@ extern int early_init_dt_scan_opal(unsigned long node, 
const char *uname,
   int depth, void *data);
 extern int early_init_dt_scan_recoverable_ranges(unsigned long node,
 const char *uname, int depth, void *data);
+extern void opal_configure_cores(void);
 
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a641753..47a2706 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -205,21 +206,60 @@ static void fixup_boot_paca(void)
get_paca()->data_offset = 0;
 }
 
+static void configure_exceptions(void)
+{
+   /* Setup the trampolines from the lowmem exception vectors
+* to the kdump kernel when not using a relocatable kernel.
+*/
+   setup_kdump_trampoline();
+
+   /* Under a PAPR hypervisor, we need hypercalls */
+   if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
+   long rc;
+
+   /* Enable AIL */
+   rc = pSeries_enable_reloc_on_exc();
+   if (rc == H_P2) {
+   pr_info("Relocation on exceptions not supported\n");
+   } else if (rc != H_SUCCESS) {
+   pr_warn("Unable to enable relocation on exceptions: "
+   "%ld\n", rc);
+   }
+
+   /*
+* Tell the hypervisor that we want our exceptions to
+* be taken in little endian mode. If this fails we don't
+* want to use BUG() because it will trigger an exception.
+*
+* We don't call this for big endian as our calling convention
+* makes us always enter in BE, and the call may fail under
+* some circumstances with kdump.
+*/
+#ifdef __LITTLE_ENDIAN__
+   rc = pseries_little_endian_exceptions();
+   if (rc) {
+   ppc_md.progress("H_SET_MODE LE exception fail", 0);
+   panic("Could not enable little endian exceptions");
+   }
+#endif
+   } else {
+   /* Set endian mode using OPAL */
+   if (firmware_has_feature(FW_FEATURE_OPAL))
+   opal_configure_cores();
+
+   /* Enable AIL if supported, and we are in hypervisor mode */
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   unsigned long lpcr = mfspr(SPRN_LPCR);
+   mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3);
+   }
+   }
+}
+
 static void cpu_ready_for_interrupts(void)
 {
/* Set IR and DR in PACA MSR */
get_paca()->kernel_msr = MSR_KERNEL;
-
-   /*
-* Enable AIL if supported, and we are in hypervisor mode. If we are
-* not in hypervisor mode, we enable relocation-on interrupts later
-* in pSeries_setup_arch() using the H_SET_MODE hcall.
-*/
-   if 

Re: powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()

2016-06-29 Thread Michael Ellerman
On Fri, 2016-24-06 at 04:49:02 UTC, Gavin Shan wrote:
> When calling eeh_rmv_device() in eeh_reset_device() for partial
> hotplug case, @rmv_data instead of its address is the proper
> argument. Otherwise, the stack frame is corrupted when writing
> to @rmv_data (actually its address) in eeh_rmv_device(). It
> results in kernel crash as observed.
> 
> This fixes the issue by passing @rmv_data, not its address to
> eeh_rmv_device() in eeh_reset_device().
> 
> Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF 
> PE")
> Reported-by: Pridhiviraj Paidipeddi 
> Signed-off-by: Gavin Shan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/cca0e542e02e48cce541a49c40

cheers

Re: powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0

2016-06-29 Thread Michael Ellerman
On Tue, 2016-28-06 at 03:01:04 UTC, Michael Neuling wrote:
> Currently we have 2 segments that are bolted for the kernel linear
> mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
> stacks.  Anything accessed outside of these regions may need to be
> faulted in.
...
> 
> Signed-off-by: Michael Neuling 
> Reviewed-by: Cyril Bur 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/190ce8693c23eae09ba5f303a8

cheers

Re: [v2, 2/2] powerpc: tm: Always reclaim in start_thread() for exec() class syscalls

2016-06-29 Thread Michael Ellerman
On Fri, 2016-17-06 at 04:58:34 UTC, Cyril Bur wrote:
> Userspace can quite legitimately perform an exec() syscall with a
> suspended transaction. exec() does not return to the old process,
...
> 
> Fixes: bc2a940 ("powerpc: Hook in new transactional memory code")
> Signed-off-by: Cyril Bur 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/8e96a87c5431c256feb65bcfc5

cheers

[RFC 0/3] Enable MSR_TM lazily

2016-06-29 Thread Cyril Bur
Currently the kernel checks to see if the hardware is transactional
memory capable and always enables the MSR_TM bit. The problem with
this is that the TM related SPRs become available to userspace,
requiring them to be switched between processes. It turns out these
SPRs are expensive to read and write and if a thread doesn't use TM
(or worse yet isn't even TM aware) then context switching incurs this
penalty for nothing.

The solution here is to leave the MSR_TM bit disabled and enable it
more 'on demand'. Leaving MSR_TM disabled causes a thread to take a
facility unavailable fault if and when it does decide to use TM. As
with recent updates to the FPU, VMX and VSX units, the MSR_TM bit will
be enabled upon taking the fault and left on for some time afterwards,
on the assumption that if a thread used TM once it may well use it
again. The kernel will turn the MSR_TM bit off after some number of
context switches of that thread.

Performance numbers haven't been completely gathered as yet but early
runs of tools/testing/selftests/powerpc/benchmarks/context_switch
(which doesn't use TM) yields a jump from ~16 switches per second
to ~18 switches per second with patch 3/3 applied.

These patches will need to be applied on top of my recent rework of
TM: http://patchwork.ozlabs.org/patch/631959/
I have pushed a branch to github to help with reviews:
https://github.com/cyrilbur-ibm/linux/tree/tm_lazy

Cyril Bur (3):
  selftests/powerpc: Add test to check TM ucontext creation
  powerpc: tm: Add TM Unavailable Exception
  powerpc: tm: Enable transactional memory (TM) lazily for userspace

 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/kernel/process.c  |  30 --
 arch/powerpc/kernel/traps.c|  33 +++
 .../selftests/powerpc/tm/tm-signal-context-chk.c   | 102 +
 4 files changed, 158 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c

-- 
2.9.0
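
The lazy-enable scheme described in this cover letter can be modelled in a
few lines of userspace C. This is a toy sketch only: the toy_ names and the
bit value are hypothetical, and resetting the counter at the fault is an
assumption, since the fault handler from patch 3/3 is not shown here. The
one property taken from the series is that the counter is a u8, so it wraps
to zero after 256 increments.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_TM 0x1u		/* hypothetical bit value, toy model only */

struct toy_tm_thread {
	unsigned int msr;
	uint8_t load_tm;	/* wraps to 0 after 256 increments */
	bool tm_active;		/* is a transaction in flight? */
};

/* Facility unavailable fault: the thread touched TM, so switch it on. */
static void toy_tm_fault(struct toy_tm_thread *t)
{
	t->msr |= TOY_MSR_TM;
	t->load_tm = 0;		/* assumption: counting restarts at the fault */
}

/* Called each time the thread is switched out. */
static void toy_tm_switch_out(struct toy_tm_thread *t)
{
	if (!(t->msr & TOY_MSR_TM))
		return;		/* TM is off: no TM SPRs to save */

	t->load_tm++;
	if (!t->tm_active && t->load_tm == 0)
		t->msr &= ~TOY_MSR_TM;
}

/* Is TM still enabled this many context switches after the fault? */
static bool toy_tm_enabled_after(unsigned int switches)
{
	struct toy_tm_thread t = { 0, 0, false };
	unsigned int i;

	toy_tm_fault(&t);
	for (i = 0; i < switches; i++)
		toy_tm_switch_out(&t);
	return (t.msr & TOY_MSR_TM) != 0;
}
```

In this model a thread keeps MSR_TM for 255 switches after the fault and
loses it on the 256th, provided no transaction is active at that point.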


[RFC 2/3] powerpc: tm: Add TM Unavailable Exception

2016-06-29 Thread Cyril Bur
If the kernel disables transactional memory (TM) and userspace still
tries TM related actions (TM instructions or TM SPR accesses) TM aware
hardware will cause the kernel to take a facility unavailable
exception.

Add checks for the exception being caused by illegal TM access in
userspace.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/traps.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3e4c84d..29260ee 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1364,6 +1364,13 @@ void vsx_unavailable_exception(struct pt_regs *regs)
die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
 }
 
+static void tm_unavailable(struct pt_regs *regs)
+{
+   pr_emerg("Unrecoverable TM Unavailable Exception "
+   "%lx at %lx\n", regs->trap, regs->nip);
+   die("Unrecoverable TM Unavailable Exception", regs, SIGABRT);
+}
+
 #ifdef CONFIG_PPC64
 void facility_unavailable_exception(struct pt_regs *regs)
 {
@@ -1434,6 +1441,23 @@ void facility_unavailable_exception(struct pt_regs *regs)
return;
}
 
+   /*
+* TM Unavailable
+*
+* If
+*  - firmware bits say don't do TM or
+*  - CONFIG_PPC_TRANSACTIONAL_MEM was not set and
+*  - hardware is actually TM aware
+* Then userspace can spam the console (even with the use of
+* _ratelimited), just send the SIGILL.
+*/
+   if (status == FSCR_TM_LG) {
+   if (!cpu_has_feature(CPU_FTR_TM))
+   goto out;
+   tm_unavailable(regs);
+   return;
+   }
+
if ((status < ARRAY_SIZE(facility_strings)) &&
facility_strings[status])
facility = facility_strings[status];
@@ -1446,6 +1470,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
"%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
hv ? "Hypervisor " : "", facility, regs->nip, regs->msr);
 
+out:
if (user_mode(regs)) {
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return;
-- 
2.9.0


[RFC 1/3] selftests/powerpc: Add test to check TM ucontext creation

2016-06-29 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 .../selftests/powerpc/tm/tm-signal-context-chk.c   | 102 +
 1 file changed, 102 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c

diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
new file mode 100644
index 000..4c906cf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamically linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN  ".long 0x7C00051D ;"
+#define TSUSPEND".long 0x7C0005DD ;"
+#define TRESUME ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+static double fps[] = { 1, 2, 3, 4, 5, 6, 7, 8,
+   -1, -2, -3, -4, -5, -6, -7, -8 
};
+
+extern long tm_signal_self(pid_t pid, double *fps);
+
+static int signaled;
+static int fail;
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   int i;
+   ucontext_t *ucp = uc;
+   ucontext_t *tm_ucp = ucp->uc_link;
+
+   signaled = 1;
+
+   /* Always be 64bit, don't really care about 32bit */
+   for (i = 0; i < 8 && !fail; i++) {
+   fail = (ucp->uc_mcontext.gp_regs[i + 14] != i);
+   fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != 0xFF - i);
+   }
+   if (fail) {
+   printf("Failed on %d gpr %lu or %lu\n", i - 1, 
ucp->uc_mcontext.gp_regs[i + 13], tm_ucp->uc_mcontext.gp_regs[i + 13]);
+   return;
+   }
+   for (i = 0; i < 8 && !fail; i++) {
+   fail = (ucp->uc_mcontext.fp_regs[i + 14] != fps[i]);
+   fail |= (tm_ucp->uc_mcontext.fp_regs[i + 14] != fps[i + 8]);
+   }
+   if (fail) {
+   printf("Failed on %d FP %g or %g\n", i - 1, 
ucp->uc_mcontext.fp_regs[i + 13], tm_ucp->uc_mcontext.fp_regs[i + 13]);
+   }
+}
+
+static int tm_signal_context_chk()
+{
+   struct sigaction act;
+   int i;
+   long rc;
+   pid_t pid = getpid();
+
+   SKIP_IF(!have_htm());
+
+   act.sa_sigaction = signal_usr1;
+   sigemptyset(&act.sa_mask);
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction sigusr1");
+   exit(1);
+   }
+
+   i = 0;
+   while (!signaled && i < MAX_ATTEMPT) {
+   rc = tm_signal_self(pid, fps);
+   if (!rc) {
+   fprintf(stderr, "Transaction was not doomed...\n");
+   FAIL_IF(!rc);
+   }
+   i++;
+   }
+
+   if (i == MAX_ATTEMPT) {
+   fprintf(stderr, "Tried to signal %d times and didn't work, 
failing!\n", MAX_ATTEMPT);
+   fail = 1;
+   }
+   return fail;
+}
+
+int main(void)
+{
+   return test_harness(tm_signal_context_chk, "tm_signal_context_chk");
+}
-- 
2.9.0


[RFC 3/3] powerpc: tm: Enable transactional memory (TM) lazily for userspace

2016-06-29 Thread Cyril Bur
Currently the MSR TM bit is always set if the hardware is TM capable.
This adds extra overhead as it means the TM SPRs (TFHAR, TEXASR and
TFIAR) must be swapped for each process regardless of whether they use TM.

For processes that don't use TM the TM MSR bit can be turned off
allowing the kernel to avoid the expensive swap of the TM registers.

A TM unavailable exception will occur if a thread does use TM and the
kernel will enable MSR_TM and leave it so for some time afterwards.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/processor.h |  1 +
 arch/powerpc/kernel/process.c| 30 ++
 arch/powerpc/kernel/traps.c  |  8 
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 5ff1e4c..9d4363c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -257,6 +257,7 @@ struct thread_struct {
int used_spe;   /* set if process has used spe */
 #endif /* CONFIG_SPE */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+   u8  load_tm;
u64 tm_tfhar;   /* Transaction fail handler addr */
u64 tm_texasr;  /* Transaction exception & summary */
u64 tm_tfiar;   /* Transaction fail instr address reg */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2e903c6..8abecda 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -870,6 +870,9 @@ void tm_recheckpoint(struct thread_struct *thread,
 {
unsigned long flags;
 
+   if (!(thread->regs->msr & MSR_TM))
+   return;
+
/* We really can't be interrupted here as the TEXASR registers can't
 * change and later in the trecheckpoint code, we have a userspace R1.
 * So let's hard disable over this region.
@@ -905,6 +908,9 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
if (!new->thread.regs)
return;
 
+   if (!(new->thread.regs->msr & MSR_TM))
+   return;
+
if (!MSR_TM_ACTIVE(new->thread.regs->msr)){
tm_restore_sprs(&new->thread);
return;
@@ -925,11 +931,18 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
 new->pid, mfmsr());
 }
 
-static inline void __switch_to_tm(struct task_struct *prev)
+static inline void __switch_to_tm(struct task_struct *prev, struct task_struct *new)
 {
if (cpu_has_feature(CPU_FTR_TM)) {
-   tm_enable();
-   tm_reclaim_task(prev);
+   if (prev->thread.regs && (prev->thread.regs->msr & MSR_TM)) {
+   prev->thread.load_tm++;
+   tm_enable();
+   tm_reclaim_task(prev);
+   if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0)
+   prev->thread.regs->msr &= ~MSR_TM;
+   } else if (new && new->thread.regs && (new->thread.regs->msr & MSR_TM)) {
+   tm_enable();
+   }
}
 }
 
@@ -965,7 +978,7 @@ void restore_tm_state(struct pt_regs *regs)
 
 #else
 #define tm_recheckpoint_new_task(new)
-#define __switch_to_tm(prev)
+#define __switch_to_tm(prev, new)
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 static inline void save_sprs(struct thread_struct *t)
@@ -1095,7 +1108,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
/* Save FPU, Altivec, VSX and SPE state */
giveup_all(prev);
 
-   __switch_to_tm(prev);
+   __switch_to_tm(prev, new);
 
/*
 * We can't take a PMU exception inside _switch() since there is a
@@ -1340,8 +1353,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 * transitions the CPU out of TM mode.  Hence we need to call
 * tm_recheckpoint_new_task() (on the same task) to restore the
 * checkpointed state back and the TM mode.
+*
+* Can't pass dst because it isn't ready. Doesn't matter, passing
+* dst is only important for __switch_to()
 */
-   __switch_to_tm(src);
+   __switch_to_tm(src, NULL);
tm_recheckpoint_new_task(src);
 
*dst = *src;
@@ -1574,8 +1590,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.used_spe = 0;
 #endif /* CONFIG_SPE */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-   if (cpu_has_feature(CPU_FTR_TM))
-   regs->msr |= MSR_TM;
current->thread.tm_tfhar = 0;
current->thread.tm_texasr = 0;
current->thread.tm_tfiar = 0;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 29260ee..141b953 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1366,6 +1366,14 @@ void 

Re: linux-next: manual merge of the powerpc tree with Linus' tree

2016-06-29 Thread Michael Ellerman
On Wed, 2016-06-29 at 10:54 +0530, Naveen N. Rao wrote:
> On 2016/06/29 10:35AM, Stephen Rothwell wrote:
> > 
> > Today's linux-next merge of the powerpc tree got a conflict in:
> > 
> >   arch/powerpc/Kconfig
> > 
> > between commit:
> > 
> >   844e3be47693 ("powerpc/bpf/jit: Disable classic BPF JIT on ppc64le")
> 
> Ah, I see that the above commit is not part of powerpc next tree, which 
> explains the conflict.
 
I'll probably merge the fixes branch into next at some point, so then it will
be sorted.

> > I fixed it up (see below - I am not sure this entirely correct) and can

That resolution is fine.

cheers
