[PATCH] arm64: Fix incorrect per-cpu usage for boot CPU

2016-07-21 Thread Suzuki K Poulose
In smp_prepare_boot_cpu(), we invoke cpuinfo_store_boot_cpu() to store
the cpuinfo in a per-cpu pointer, before initialising the per-cpu offset for
the boot CPU. This patch reorders the sequence to make sure we initialise
the per-cpu offset before accessing the per-cpu area.

Commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu") fixed the
issue where we modified the per-cpu area even before the kernel initialises
the per-cpu areas, but failed to wait until the boot CPU had updated its
offset.
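
For reference, a minimal sketch of why the ordering matters (illustrative
only; the real accessors live in arch/arm64/include/asm/percpu.h):

	/* Per-cpu accesses resolve through a per-CPU offset (TPIDR_EL1). */
	static unsigned long my_cpu_offset;	/* 0 until set_my_cpu_offset() */

	#define this_cpu_ptr_sketch(var) \
		((typeof(var) *)((char *)&(var) + my_cpu_offset))

	/*
	 * If cpuinfo_store_boot_cpu() writes through a per-cpu pointer
	 * before the boot CPU's offset is initialised, the access lands
	 * in the wrong copy of the per-cpu data.
	 */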

Fixes: commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu")
Cc: <sta...@vger.kernel.org>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 62ff3c0..d242e81 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -437,9 +437,9 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 void __init smp_prepare_boot_cpu(void)
 {
+   set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
cpuinfo_store_boot_cpu();
save_boot_cpu_run_el();
-   set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.7.4



Re: [PATCH V2 5/6] coresight: adding sink parameter to function coresight_build_path()

2016-07-21 Thread Suzuki K Poulose

On 20/07/16 21:38, Mathieu Poirier wrote:

Up to now function coresight_build_path() was counting on a sink to
have been selected (from sysFS) prior to being called.  This patch
adds a string argument so that a sink matching the argument can be
selected.




 static int _coresight_build_path(struct coresight_device *csdev,
-struct list_head *path)
+struct list_head *path, const char *sink)
 {
int i;
bool found = false;
struct coresight_node *node;

-   /* An activated sink has been found.  Enqueue the element */
-   if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
-csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && csdev->activated)
-   goto out;
+   /*
+* First see if we are dealing with a sink.  If we have one check if
+* it was selected via sysFS or the perf cmd line.
+*/
+   if (csdev->type == CORESIGHT_DEV_TYPE_SINK ||
+   csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) {
+   /* Activated via perf cmd line */
+   if (sink && !strcmp(dev_name(&csdev->dev), sink))
+   goto out;
+   /* Activated via sysFS */
+   if (csdev->activated)


When a sink is specified, should we skip an activated sink and continue to
find the specified one ? Or at least fail with an error, as we may not be
using the sink specified by the user ? i.e, maybe:

	if (!sink && csdev->activated)
		goto out;
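
In context, the full sink-type branch would then look something like this
(hypothetical, untested):

	if (csdev->type == CORESIGHT_DEV_TYPE_SINK ||
	    csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) {
		/* A sink given on the perf cmd line takes precedence */
		if (sink && !strcmp(dev_name(&csdev->dev), sink))
			goto out;
		/* Fall back to the sysFS selection only if none was given */
		if (!sink && csdev->activated)
			goto out;
	}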

Suzuki


[PATCH] [4.7] arm64: Honor nosmp kernel command line option

2016-07-21 Thread Suzuki K Poulose
Passing "nosmp" should boot the kernel with a single processor, without
provision to enable secondary CPUs even if they are present. "nosmp" is
implemented by setting maxcpus=0. At the moment we still mark the secondary
CPUs present even with nosmp, which allows userspace to bring them
up. This patch corrects smp_prepare_cpus() to honor maxcpus == 0.

Commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N") fixed the
behavior for maxcpus >= 1, but broke maxcpus = 0.
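
For context, "nosmp" maps to maxcpus=0 in the core kernel, roughly as
below (from kernel/smp.c, modulo version drift):

	static int __init nosmp(char *str)
	{
		setup_max_cpus = 0;
		arch_disable_smp_support();
		return 0;
	}
	early_param("nosmp", nosmp);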

Fixes: commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N")
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: James Morse <james.mo...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/smp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d242e81..ec08b7a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -694,6 +694,13 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
smp_store_cpu_info(smp_processor_id());
 
/*
+* If UP is mandated by "nosmp"(implies maxcpus=0), don't bother about
+* secondary CPUs.
+*/
+   if (max_cpus == 0)
+   return;
+
+   /*
 * Initialise the present map (which describes the set of CPUs
 * actually populated at the present time) and release the
 * secondaries from the bootloader.
-- 
2.7.4



Re: [RFCv2 4/4] perf: util: support sysfs supported_cpumask file

2016-07-18 Thread Suzuki K Poulose

On 15/07/16 11:08, Mark Rutland wrote:

For system PMUs, the perf tools have long expected a cpumask file under
sysfs, describing the single CPU which they support events being
opened/handled on. Prior patches in this series have reworked this
support to support multiple CPUs in a mask, as is required to handle
heterogeneous CPU PMUs.

Unfortunately, adding a cpumask file to CPU PMUs would break existing
userspace. Prior to this series, perf record will refuse to open events,
and perf stat may unexpectedly block at exit time. In the absence of a
cpumask, perf stat is functional.

To address this, this patch adds support for a new file,
supported_cpumask, which can be used to describe heterogeneous CPUs,
without the risk of breaking existing userspace binaries.

Signed-off-by: Mark Rutland 
---
 tools/perf/util/pmu.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ddb0261..06c985c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -445,14 +445,23 @@ static struct cpu_map *pmu_cpumask(const char *name)
FILE *file;
struct cpu_map *cpus;
const char *sysfs = sysfs__mountpoint();
+   const char *path_template[] = {
+"%s/bus/event_source/devices/%s/cpumask",
+"%s/bus/event_source/devices/%s/supported_cpumask",
+NULL
+   };
+   unsigned int i;

if (!sysfs)
return NULL;

-   snprintf(path, PATH_MAX,
-"%s/bus/event_source/devices/%s/cpumask", sysfs, name);
+   for (i = 0; i < ARRAY_SIZE(path_template); i++) {


The check could be "path_template[i]" to avoid an iteration with NULL
template.


+   snprintf(path, PATH_MAX, *path_template, sysfs, name);


Btw, did you mean to use path_template[i] here instead of *path_template ?


+   if (stat(path, &st) == 0)
+   break;
+   }

-   if (stat(path, &st) < 0)
+   if (!*path_template)


Same here ?


return NULL;
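
Putting the two together, I would expect something like the following
(sketch, untested):

	for (i = 0; path_template[i]; i++) {
		snprintf(path, PATH_MAX, path_template[i], sysfs, name);
		if (stat(path, &st) == 0)
			break;
	}

	if (!path_template[i])	/* no cpumask file found */
		return NULL;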


Suzuki


Re: [PATCH 01/10] coresight: etm-perf: pass struct perf_event to source::enable/disable()

2016-07-20 Thread Suzuki K Poulose

On 18/07/16 20:51, Mathieu Poirier wrote:

With this commit [1] address range filter information is now found
in the struct hw_perf_event::addr_filters.  As such pass the event
itself to the coresight_source::enable/disable() functions so that
both event attribute and filter can be accessible for configuration.

[1] 'commit 375637bc5249 ("perf/core: Introduce address range filtering")'



diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index 385d62e64abb..2a5982c37dfb 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -232,8 +232,9 @@ struct coresight_ops_source {
int (*cpu_id)(struct coresight_device *csdev);
int (*trace_id)(struct coresight_device *csdev);
int (*enable)(struct coresight_device *csdev,
- struct perf_event_attr *attr,  u32 mode);
-   void (*disable)(struct coresight_device *csdev);
+ struct perf_event *event,  u32 mode);
+   void (*disable)(struct coresight_device *csdev,
+   struct perf_event *event);


nit:

Should we make this a bit more generic API, rather than hard coding
the perf stuff in there ? i.e,

how about :

int (*enable)(struct coresight_device *csdev, void *data, u32 mode)

void (*disable)(struct coresight_device *csdev, void *data, u32 mode)

where data is specific to the mode of operation. That way the API is
cleaner and each mode could pass their own data (even though sysfs
doesn't use any at the moment).

Suzuki


Re: [PATCH 03/10] coresight: etm-perf: configuring filters from perf core

2016-07-20 Thread Suzuki K Poulose

On 18/07/16 20:51, Mathieu Poirier wrote:

This patch implements the required API needed to access
and retrieve range and start/stop filters from the perf core.

Signed-off-by: Mathieu Poirier 
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 146 ---
 drivers/hwtracing/coresight/coresight-etm-perf.h |  32 +
 2 files changed, 162 insertions(+), 16 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c 
b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 78a1bc0013a2..fde7f42149c5 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -29,6 +29,7 @@
 #include 

 #include "coresight-priv.h"
+#include "coresight-etm-perf.h"

 static struct pmu etm_pmu;
 static bool etm_perf_up;
@@ -83,12 +84,44 @@ static const struct attribute_group *etm_pmu_attr_groups[] 
= {

 static void etm_event_read(struct perf_event *event) {}

+static int etm_addr_filters_alloc(struct perf_event *event)
+{


...


+   return 0;
+}
+




+
 static int etm_event_init(struct perf_event *event)
 {
+   int ret;
+
if (event->attr.type != etm_pmu.type)
return -ENOENT;

-   return 0;
+   ret = etm_addr_filters_alloc(event);




 }

 static void free_event_data(struct work_struct *work)
@@ -456,6 +489,85 @@ static void etm_free_drv_configs(struct perf_event *event)
}
 }

+static int etm_addr_filters_validate(struct list_head *filters)
+{



+
+   return 0;
+}
+
+static void etm_addr_filters_sync(struct perf_event *event)
+{
+   struct perf_addr_filters_head *head = perf_event_addr_filters(event);
+   unsigned long start, stop, *offs = event->addr_filters_offs;
+   struct etm_filters *filters = event->hw.addr_filters;
+   struct perf_addr_filter *filter;
+   int i = 0;


Is it possible to delay the etm_addr_filters_alloc() until this point ?
I understand that this function cannot report back failures if we fail
to allocate memory. Or may be do a lazy allocation from addr_filters_validate(),
when we get the first filter added.

Of course this could be done as a follow up patch to improve things once
we get the initial framework in.
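
Something along these lines is what I had in mind (hypothetical sketch;
the event would need to be plumbed through, since the current callback
only receives the filter list):

	static int etm_addr_filters_validate(struct perf_event *event,
					     struct list_head *filters)
	{
		/* Allocate filter state lazily, on the first filter added */
		if (!event->hw.addr_filters) {
			int ret = etm_addr_filters_alloc(event);

			if (ret)
				return ret;
		}
		/* ... existing validation of 'filters' ... */
		return 0;
	}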




+
+   list_for_each_entry(filter, &head->list, entry) {
+   start = filter->offset + offs[i];
+   stop = start + filter->size;
+
+   if (filter->range == 1) {
+   filters->filter[i].start_addr = start;
+   filters->filter[i].stop_addr = stop;
+   filters->filter[i].type = ETM_ADDR_TYPE_RANGE;
+   } else {
+   if (filter->filter == 1) {
+   filters->filter[i].start_addr = start;
+   filters->filter[i].type = ETM_ADDR_TYPE_START;
+   } else {
+   filters->filter[i].stop_addr = stop;
+   filters->filter[i].type = ETM_ADDR_TYPE_STOP;
+   }
+   }
+   i++;
+   }
+
+   filters->nr_filters = i;
+/**
+ * struct etm_filters - set of filters for a session
+ * @etm_filter:All the filters for this session.
+ * @nr_filters:Number of filters
+ * @ssstatus:  Status of the start/stop logic.
+ */
+struct etm_filters {
+   struct etm_filter   filter[ETM_ADDR_CMP_MAX];


nit: renaming the variable to etm_filter would make the code a bit more
readable where we populate/validate the filters.

Otherwise looks good

Suzuki


[PATCH] coresight: Use local coresight_desc instances

2016-07-13 Thread Suzuki K Poulose
Each coresight device prepares a description for coresight_register()
in struct coresight_desc. Once we register the device, the description is
useless and can be freed. The coresight_desc is small enough (48 bytes on
64-bit) to be allocated on the stack. Hence use an automatic variable to
avoid a needless dynamic allocation and wasted memory (which would only be
freed when the device is destroyed).

Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 drivers/hwtracing/coresight/coresight-etb10.c  | 20 ++-
 drivers/hwtracing/coresight/coresight-etm3x.c  | 20 ++-
 drivers/hwtracing/coresight/coresight-etm4x.c  | 20 ++-
 drivers/hwtracing/coresight/coresight-funnel.c | 20 ++-
 .../coresight/coresight-replicator-qcom.c  | 14 +-
 drivers/hwtracing/coresight/coresight-replicator.c | 20 +--
 drivers/hwtracing/coresight/coresight-stm.c| 22 ++--
 drivers/hwtracing/coresight/coresight-tmc.c| 30 ++
 drivers/hwtracing/coresight/coresight-tpiu.c   | 18 +
 9 files changed, 74 insertions(+), 110 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etb10.c 
b/drivers/hwtracing/coresight/coresight-etb10.c
index 3b483e3..8a4927c 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -636,7 +636,7 @@ static int etb_probe(struct amba_device *adev, const struct 
amba_id *id)
struct coresight_platform_data *pdata = NULL;
struct etb_drvdata *drvdata;
struct resource *res = &adev->res;
-   struct coresight_desc *desc;
+   struct coresight_desc desc = { 0 };
struct device_node *np = adev->dev.of_node;
 
if (np) {
@@ -682,17 +682,13 @@ static int etb_probe(struct amba_device *adev, const 
struct amba_id *id)
return -ENOMEM;
}
 
-   desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
-   if (!desc)
-   return -ENOMEM;
-
-   desc->type = CORESIGHT_DEV_TYPE_SINK;
-   desc->subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER;
-   desc->ops = &etb_cs_ops;
-   desc->pdata = pdata;
-   desc->dev = dev;
-   desc->groups = coresight_etb_groups;
-   drvdata->csdev = coresight_register(desc);
+   desc.type = CORESIGHT_DEV_TYPE_SINK;
+   desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER;
+   desc.ops = &etb_cs_ops;
+   desc.pdata = pdata;
+   desc.dev = dev;
+   desc.groups = coresight_etb_groups;
+   drvdata->csdev = coresight_register(&desc);
if (IS_ERR(drvdata->csdev))
return PTR_ERR(drvdata->csdev);
 
diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c 
b/drivers/hwtracing/coresight/coresight-etm3x.c
index d83ab82..beaaa2c 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x.c
@@ -758,13 +758,9 @@ static int etm_probe(struct amba_device *adev, const 
struct amba_id *id)
struct coresight_platform_data *pdata = NULL;
struct etm_drvdata *drvdata;
struct resource *res = &adev->res;
-   struct coresight_desc *desc;
+   struct coresight_desc desc = { 0 };
struct device_node *np = adev->dev.of_node;
 
-   desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
-   if (!desc)
-   return -ENOMEM;
-
drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
if (!drvdata)
return -ENOMEM;
@@ -819,13 +815,13 @@ static int etm_probe(struct amba_device *adev, const 
struct amba_id *id)
etm_init_trace_id(drvdata);
etm_set_default(&drvdata->config);
 
-   desc->type = CORESIGHT_DEV_TYPE_SOURCE;
-   desc->subtype.source_subtype = CORESIGHT_DEV_SUBTYPE_SOURCE_PROC;
-   desc->ops = &etm_cs_ops;
-   desc->pdata = pdata;
-   desc->dev = dev;
-   desc->groups = coresight_etm_groups;
-   drvdata->csdev = coresight_register(desc);
+   desc.type = CORESIGHT_DEV_TYPE_SOURCE;
+   desc.subtype.source_subtype = CORESIGHT_DEV_SUBTYPE_SOURCE_PROC;
+   desc.ops = &etm_cs_ops;
+   desc.pdata = pdata;
+   desc.dev = dev;
+   desc.groups = coresight_etm_groups;
+   drvdata->csdev = coresight_register(&desc);
if (IS_ERR(drvdata->csdev)) {
ret = PTR_ERR(drvdata->csdev);
goto err_arch_supported;
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c 
b/drivers/hwtracing/coresight/coresight-etm4x.c
index c8b44c6..2c21b1d 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -726,13 +726,9 @@ static int etm4_probe(struct amba_device *adev, const 
struct amba_id *id)
struct coresight_platform_data *pdata = NULL;
struct etmv4_drv

Re: [PATCH v6] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-06-28 Thread Suzuki K Poulose

On 28/06/16 16:33, Catalin Marinas wrote:

On Tue, Jun 21, 2016 at 12:12:36PM +0100, Suzuki K. Poulose wrote:

+#define CPUINFO_ATTR_RO(_name)						\
+	static ssize_t show_##_name(struct device *dev,			\
+			struct device_attribute *attr, char *buf)	\
+	{								\
+		struct cpuinfo_arm64 *info = &per_cpu(cpu_data, dev->id);\
+									\
+		if (info->reg_midr)					\
+			return sprintf(buf, "0x%016x\n", info->reg_##_name);\
+		else							\
+			return 0;					\
+	}								\
+	static DEVICE_ATTR(_name, 0444, show_##_name, NULL)
+
+CPUINFO_ATTR_RO(midr);
+CPUINFO_ATTR_RO(revidr);


Since exposing these values is aimed at JIT code (and not human
readable), wouldn't it make more sense to present the binary value
instead of the ascii transformation?


I am fine with either.

Edward,

Do you have any preference ?

Suzuki
 



[PATCH 0/8] arm64: Work around for mismatched cache line size

2016-07-08 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to force enabling
of the capability.

Applies on aarch64: for-next/core. The tree is available at:

git://linux-arm.org/linux-skp.git ctr-emulation


Suzuki K Poulose (8):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64/insn: Add helpers for pc relative address offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h| 56 
 arch/arm64/include/asm/insn.h   |  5 +++
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 13 ++
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 44 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 23 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 275 insertions(+), 55 deletions(-)

-- 
2.7.4




[PATCH 1/8] arm64: Set the safe value for L1 icache policy

2016-07-08 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, which is not
a defined policy encoding at the moment. The safer value for L1Ip should
be the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 49dd1bd..2926c7d 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -62,7 +62,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 916d27a..3566b6d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+	 * If we have differing I-cache policies, report it as the weakest - AIVIVT.
 	 */
-	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0),	/* L1Ip */
+	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT), /* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4



[PATCH 3/8] arm64: Rearrange CPU errata workaround checks

2016-07-08 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c. It
is not an obvious place for that.

This patch rearranges the CPU specific capability(aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e, the CPU
   is activated at the boot), update the system wide detected work arounds.

   Otherwise (i.e a CPU hotplugged in later) verify that this CPU conforms to 
the
   system wide capabilities.

2) Boot CPU updates the work arounds from smp_prepare_boot_cpu() after we have
initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 0aab3ec..1442b45 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -192,11 +192,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7b8bb7f..fb57c99 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -986,23 +986,33 @@ verify_local_cpu_features(const struct 
arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 2de8767..ddd2f36 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -245,8 +245,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 62ff3c0..20d630f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU will
 * fail to come online.
 */
-   

[PATCH 4/8] arm64/insn: Add helpers for pc relative address offsets

2016-07-08 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for
Data processing instructions (i.e, adr and adrp). This will be used
for handling dynamic patching of 'adr/adrp' instructions in alternative
code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/insn.h |  5 +
 arch/arm64/kernel/insn.c  | 23 +++
 2 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 30e50eb..03dc4c2 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -277,6 +277,8 @@ __AARCH64_INSN_FUNCS(hint,	0xFFFFF01F, 0xD503201F)
 __AARCH64_INSN_FUNCS(br,	0xFFFFFC1F, 0xD61F0000)
 __AARCH64_INSN_FUNCS(blr,	0xFFFFFC1F, 0xD63F0000)
 __AARCH64_INSN_FUNCS(ret,	0xFFFFFC1F, 0xD65F0000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
+__AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
 
 #undef __AARCH64_INSN_FUNCS
 
@@ -355,6 +357,9 @@ u32 aarch64_insn_gen_logical_shifted_reg(enum 
aarch64_insn_register dst,
 s32 aarch64_get_branch_offset(u32 insn);
 u32 aarch64_set_branch_offset(u32 insn, s32 offset);
 
+s32 aarch64_get_addr_offset(u32 insn);
+u32 aarch64_set_addr_offset(u32 insn, s32 offset);
+
 bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
 
 int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 368c082..3edd741 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1175,6 +1175,29 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_get_addr_offset(u32 insn)
+{
+	if (aarch64_insn_is_adr(insn))
+		return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn);
+	if (aarch64_insn_is_adrp(insn))
+		return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+
+   /* Unhandled instruction */
+   BUG();
+}
+
+u32 aarch64_set_addr_offset(u32 insn, s32 offset)
+{
+	if (aarch64_insn_is_adr(insn))
+		return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+						     offset);
+	if (aarch64_insn_is_adrp(insn))
+		return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+						     offset >> 12);
+   /* Unhandled instruction */
+   BUG();
+}
+
 bool aarch32_insn_is_wide(u32 insn)
 {
return insn >= 0xe800;
-- 
2.7.4



[PATCH 5/8] arm64: alternative: Add support for patching adrp instructions

2016-07-08 Thread Suzuki K Poulose
adrp encodes a PC-relative offset to the 4K page of a symbol. If it
appears in alternative code that gets patched in, we should adjust
the offset to reflect the address the instruction will actually run
from. This patch adds support for fixing up the offset for adrp
instructions.
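
As a concrete illustration of the fixup arithmetic (standalone userspace
sketch with made-up addresses, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical addresses, for illustration only */
		unsigned long altinsnptr = 0xffff000008a01004UL; /* where adrp was assembled */
		unsigned long insnptr    = 0xffff000008123008UL; /* where it will run */
		long orig_offset         = 0x5000;	/* page offset encoded in adrp */

		/* adrp targets a 4K page: mask off the low 12 bits of each PC */
		unsigned long target = (altinsnptr & ~0xfffUL) + orig_offset;
		long new_offset = (long)(target - (insnptr & ~0xfffUL));

		printf("target page %#lx, re-encoded offset %ld\n", target, new_offset);
		return 0;
	}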

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..cd36950 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+	/*
+	 * If we're replacing an adrp instruction, which uses PC-relative
+	 * immediate addressing, adjust the offset to reflect the new
+	 * PC. adrp operates on 4K aligned addresses.
+	 */
+   orig_offset  = aarch64_get_addr_offset(insn);
+   target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+   new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
+   insn = aarch64_set_addr_offset(insn, new_offset);
}
 
return insn;
-- 
2.7.4



[PATCH 2/8] arm64: Use consistent naming for errata handling

2016-07-08 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds, making their naming more consistent.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 2926c7d..0aab3ec 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -192,10 +192,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index af647d2..f0fd6a2c 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -110,7 +110,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -125,7 +125,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 3566b6d..7b8bb7f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -998,7 +998,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index c173d32..2de8767 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -246,7 +246,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH 7/8] arm64: Refactor sysinstr exception handling

2016-07-08 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0 if we detect mismatched
cache line sizes. Since both these traps share the same EC, refactor
the handler a little to make it more reader-friendly.

Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/esr.h | 48 +
 arch/arm64/kernel/traps.c| 73 
 2 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..2a8f6c3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -109,6 +109,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0x))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRm_SHIFT	1
+#define ESR_ELx_SYS64_ISS_CRm_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRm_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRn_SHIFT	10
+#define ESR_ELx_SYS64_ISS_CRn_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRn_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op1_SHIFT	14
+#define ESR_ELx_SYS64_ISS_Op1_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op1_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op2_SHIFT	17
+#define ESR_ELx_SYS64_ISS_Op2_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op2_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op0_SHIFT	20
+#define ESR_ELx_SYS64_ISS_Op0_MASK	(UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_Op0_MASK | \
+ESR_ELx_SYS64_ISS_Op1_MASK | \
+ESR_ELx_SYS64_ISS_Op2_MASK | \
+ESR_ELx_SYS64_ISS_CRn_MASK | \
+ESR_ELx_SYS64_ISS_CRm_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(Op0, Op1, Op2, CRn, CRm) \
+					(((Op0) << ESR_ELx_SYS64_ISS_Op0_SHIFT) | \
+					 ((Op1) << ESR_ELx_SYS64_ISS_Op1_SHIFT) | \
+					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
+					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
+					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * Op0=1, Op1=3, Op2=1, CRn=7, CRm={ 5, 10, 11, 14 }, WRITE (L=0)
+ */
+#define ESR_ELx_SYS64_ISS_CRm_DC_CIVAC 14
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAU  11
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAC  10
+#define ESR_ELx_SYS64_ISS_CRm_IC_IVAU  5
+
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK  (ESR_ELx_SYS64_ISS_Op0_MASK | \
+ESR_ELx_SYS64_ISS_Op1_MASK | \
+ESR_ELx_SYS64_ISS_Op2_MASK | \
+ESR_ELx_SYS64_ISS_CRn_MASK | \
+ESR_ELx_SYS64_ISS_DIR_MASK)
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
+   (ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
+ESR_ELx_SYS64_ISS_DIR_WRITE)
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e04f838..93c5287 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -447,36 +447,29 @@ void cpu_enable_cache_maint_trap(void *__unused)
: "=r" (res)\
: "r" (address), "i" (-EFAULT) )
 
-asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 {
unsigned long address;
-   int ret;
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+	int crm = (esr & ESR_ELx_SYS64_ISS_CRm_MASK) >> ESR_ELx_SYS64_ISS_CRm_SHIFT;
+	int ret = 0;
 
-   /* if this is a write with: Op0=1, Op2=1, Op1=3, CRn=7 */
-   if ((esr & 0x01fffc01) == 0x0012dc00) {
-   int rt = (esr >> 5) & 0x1f;
- 

[PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size

2016-07-08 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size of the current CPU (rather than the system wide value), at points
where the system wide value may not be accessible. Provide another
helper which fetches the cache line size of the current CPU.

Cc: James Morse <james.mo...@arm.com>
Cc: Geoff Levand <ge...@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr .reqx30 // link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr .reqx30 // link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr .reqx30 // link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S 
b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S 
b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4



[PATCH 8/8] arm64: Work around systems with mismatched cache line sizes

2016-07-08 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with the cache management by software when the execution
is migrated from one to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by:

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e, avoiding to lookup the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to table entry for
CTR_EL0 in the CPU feature infrastructure.
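
For reference, the trap-and-emulate path boils down to something like the
sketch below (simplified from the series; error handling omitted):

	/*
	 * With SCTLR_EL1.UCT clear, a userspace "mrs xN, ctr_el0" traps
	 * to EL1. The handler returns the system wide safe CTR value in
	 * the destination register and skips the faulting instruction.
	 */
	static int ctr_read_handler(unsigned int esr, struct pt_regs *regs)
	{
		int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;

		regs->regs[rt] = sys_ctr_ftr->sys_val;	/* safe CTR_EL0 */
		regs->pc += 4;				/* skip the mrs */
		return 0;
	}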

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/cpufeature.c  |  9 +
 arch/arm64/kernel/traps.c   | 14 ++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index a4bb3f5..1902ee7 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr .reqx30 // link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_safe_ctr - read system wide safe CTR_EL0. If the system has
+ * mismatched cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_safe_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
+	ldr_l	\reg, sys_ctr_ftr		// Read system wide safe CTR value
+	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL]	// from sys_ctr_ftr->sys_val
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr .reqx30 // link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_safe_ctr   \tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +268,10 @@ lr .reqx30 // link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_safe_ctr   \tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 1442b45..b1f963c 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -36,8 +36,9 @@
 #define ARM64_HAS_VIRT_HOST_EXTN   11
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   14
 
-#define ARM64_NCAPS14
+#define ARM64_NCAPS15
 
 #ifndef __ASSEMBLY__
 
@@ -108,6 +109,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 2a8f6c3..51aea89 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -139,6 +139,9 @@
 					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
 					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
 					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+
+#define ESR_ELx



[PATCH v9] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-07-08 Thread Suzuki K Poulose
From: Steve Capper <steve.cap...@linaro.org>

It can be useful for JIT software to be aware of MIDR_EL1 and
REVIDR_EL1 to ascertain the presence of any core errata that could
affect code generation.

This patch exposes these registers through sysfs:

/sys/devices/system/cpu/cpu$ID/regs/identification/midr_el1
/sys/devices/system/cpu/cpu$ID/regs/identification/revidr_el1

where $ID is the cpu number. For big.LITTLE systems, one can have a
mixture of cores (e.g. Cortex A53 and Cortex A57), thus all CPUs need
to be enumerated.

If the kernel does not have valid information to populate these entries
with, an empty string is returned to userspace.
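
For example, a JIT could read the value with plain stdio (illustrative
userspace snippet; the path is the one introduced by this patch):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long midr;
		FILE *f = fopen("/sys/devices/system/cpu/cpu0/regs/identification/midr_el1", "r");

		if (!f)
			return 1;
		if (fscanf(f, "%llx", &midr) == 1)
			printf("cpu0 MIDR_EL1: 0x%016llx\n", midr);
		fclose(f);
		return 0;
	}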

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Steve Capper <steve.cap...@linaro.org>
[ ABI documentation updates, hotplug notifiers, kobject changes ]
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
Changes since V8:
  - Handle sysfs group addition/removal gracefully.
Changes since V7:
  - Remove unnecessary clean up cpuinfo_regs_init
Changes since V6:
  - Introduce regs/identification hierarchy (using a kobject for the added level)
  - Use the register names as in ARM ARM (i.e, midr => midr_el1)
Changes since V5:
  - Add hotplug notifier to {add/remove} the attributes when the CPU is brought
{online/offline}.
  - Replace cpu_hotplug_{disable,enable} => cpu_notifier_register_{begin/done}
  - Remove redundant check for cpu present, as the sysfs infrastructure does
check already returning -ENODEV, if the CPU goes offline between open() and
read().
Changes since V4:
  - Update comment as suggested by Mark Rutland
Changes since V3:
  - Disable cpu hotplug while we initialise
  - Added a comment to explain why expose 64bit value
  - Update Document/ABI/testing/sysfs-devices-system-cpu
Changes since V2:
  - Fix errno for failures (Spotted-by: Russell King)
  - Roll back, if we encounter a missing cpu device
  - Return error for access to registers of CPUs not present.
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  10 ++
 arch/arm64/include/asm/cpu.h   |   2 +
 arch/arm64/kernel/cpuinfo.c| 120 +
 3 files changed, 132 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1650133..4987417 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -340,3 +340,13 @@ Description:   POWERNV CPUFreq driver's frequency 
throttle stats directory and
'policyX/throttle_stats' directory and all the attributes are 
same as
the /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats 
directory and
attributes which give the frequency throttle information of the 
chip.
+
+What:  /sys/devices/system/cpu/cpuX/regs/
+   /sys/devices/system/cpu/cpuX/regs/identification/
+   /sys/devices/system/cpu/cpuX/regs/identification/midr_el1
+   /sys/devices/system/cpu/cpuX/regs/identification/revidr_el1
+Date:  June 2016
+Contact:   Linux ARM Kernel Mailing list 
<linux-arm-ker...@lists.infradead.org>
+Description:   AArch64 CPU registers
+   'identification' directory exposes the CPU ID registers for
+identifying model and revision of the CPU.
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 13a6103..889226b 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -25,10 +25,12 @@
  */
 struct cpuinfo_arm64 {
struct cpu  cpu;
+   struct kobject  kobj;
u32 reg_ctr;
u32 reg_cntfrq;
u32 reg_dczid;
u32 reg_midr;
+   u32 reg_revidr;
 
u64 reg_id_aa64dfr0;
u64 reg_id_aa64dfr1;
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index c173d32..ed1b84f 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -183,6 +183,123 @@ const struct seq_operations cpuinfo_op = {
.show   = c_show
 };
 
+
+static struct kobj_type cpuregs_kobj_type = {
+   .sysfs_ops = &kobj_sysfs_ops,
+};
+
+/*
+ * The ARM ARM uses the phrase "32-bit register" to describe a register
+ * whose upper 32 bits are RES0 (per C5.1.1, ARM DDI 0487A.i), however
+ * no statement is made as to whether the upper 32 bits will or will not
+ * be made use of in future, and between ARM DDI 0487A.c and ARM DDI
+ * 0487A.d CLIDR_EL1 was expanded from 32-bit to 64-bit.
+ *
+ * Thus, while both MIDR_EL1 and REVIDR_EL1 are described as 32-bit
+ * registers, we expose them both as 64 bit values to cater for possible
+ * future expansion without an ABI break.
+ */
+#define kobj_to_cp

Re: [PATCH V4 6/6] coresight: etm-perf: incorporating sink definition from cmd line

2016-08-05 Thread Suzuki K Poulose

On 04/08/16 17:53, Mathieu Poirier wrote:

Now that PMU specific configuration is available as part of the event,
lookup the sink identified by users from the perf command line and build
a path from source to sink.

With this functionality it is no longer required to select a sink in a
separate step (from sysFS) before a perf trace session can be started.

Signed-off-by: Mathieu Poirier 




+static const match_table_t drv_cfg_tokens = {
+   {ETM_TOKEN_SINK_CPU, "sink=cpu%d:%s"},
+   {ETM_TOKEN_SINK, "sink=%s"},
+   {ETM_TOKEN_ERR, NULL},
+};
+


Is the above format documented somewhere for perf users ? If not, could we
please add it ?

Thanks
Suzuki



[PATCH v7] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-06-30 Thread Suzuki K Poulose
From: Steve Capper <steve.cap...@linaro.org>

It can be useful for JIT software to be aware of MIDR_EL1 and
REVIDR_EL1 to ascertain the presence of any core errata that could
affect code generation.

This patch exposes these registers through sysfs:

/sys/devices/system/cpu/cpu$ID/regs/identification/midr_el1
/sys/devices/system/cpu/cpu$ID/regs/identification/revidr_el1

where $ID is the cpu number. For big.LITTLE systems, one can have a
mixture of cores (e.g. Cortex A53 and Cortex A57), thus all CPUs need
to be enumerated.

If the kernel does not have valid information to populate these entries
with, an empty string is returned to userspace.

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Steve Capper <steve.cap...@linaro.org>
[ ABI documentation updates, hotplug notifiers, kobject changes ]
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
Changes since V6:
  - Introduce regs/identification hierarchy (using a kobject for the added level)
  - Use the register names as in ARM ARM (i.e, midr => midr_el1)
Changes since V5:
  - Add hotplug notifier to {add/remove} the attributes when the CPU is brought
{online/offline}.
  - Replace cpu_hotplug_{disable,enable} => cpu_notifier_register_{begin/done}
  - Remove redundant check for cpu present, as the sysfs infrastructure does
check already returning -ENODEV, if the CPU goes offline between open() and
read().
Changes since V4:
  - Update comment as suggested by Mark Rutland
Changes since V3:
  - Disable cpu hotplug while we initialise
  - Added a comment to explain why expose 64bit value
  - Update Document/ABI/testing/sysfs-devices-system-cpu
Changes since V2:
  - Fix errno for failures (Spotted-by: Russell King)
  - Roll back, if we encounter a missing cpu device
  - Return error for access to registers of CPUs not present.
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  14 +++
 arch/arm64/include/asm/cpu.h   |   2 +
 arch/arm64/kernel/cpuinfo.c| 137 +
 3 files changed, 153 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1650133..31dee60 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -340,3 +340,17 @@ Description:	POWERNV CPUFreq driver's frequency throttle stats directory and
		'policyX/throttle_stats' directory and all the attributes are same as
		the /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory and
		attributes which give the frequency throttle information of the chip.
+
+What:		/sys/devices/system/cpu/cpuX/regs/
+		/sys/devices/system/cpu/cpuX/regs/identification/
+		/sys/devices/system/cpu/cpuX/regs/identification/midr_el1
+		/sys/devices/system/cpu/cpuX/regs/identification/revidr_el1
+Date:		June 2016
+Contact:	Linux ARM Kernel Mailing list <linux-arm-ker...@lists.infradead.org>
+		Linux Kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	ARM64 CPU identification registers
+		'identification' directory exposes the CPU ID registers for
+		identifying model and revision of the CPU.
+		- midr_el1 : This file gives contents of Main ID Register (MIDR_EL1).
+		- revidr_el1 : This file gives contents of the Revision ID register (REVIDR_EL1).
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 13a6103..889226b 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -25,10 +25,12 @@
  */
 struct cpuinfo_arm64 {
struct cpu  cpu;
+   struct kobject  kobj;
u32 reg_ctr;
u32 reg_cntfrq;
u32 reg_dczid;
u32 reg_midr;
+   u32 reg_revidr;
 
u64 reg_id_aa64dfr0;
u64 reg_id_aa64dfr1;
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index c173d32..59d3076 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -183,6 +183,140 @@ const struct seq_operations cpuinfo_op = {
.show   = c_show
 };
 
+
+static struct kobj_type cpuregs_kobj_type = {
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+/*
+ * The ARM ARM uses the phrase "32-bit register" to describe a register
+ * whose upper 32 bits are RES0 (per C5.1.1, ARM DDI 0487A.i), however
+ * no statement is made as to whether the upper 32 bits will or will not
+ * be made use of in future, and between ARM DDI 0487A.c and ARM DDI
+ * 0487A.d CLIDR_EL1 was expanded from 32-bit to 64-bit.
+ *
+ * Thus, while both MIDR_EL1 and REVIDR_EL1 are describ

Re: [PATCH 4/7] bus: arm-cci: add missing of_node_put after calling of_parse_phandle

2016-07-01 Thread Suzuki K Poulose

On 01/07/16 10:41, Peter Chen wrote:

of_node_put needs to be called once the device node obtained
from of_parse_phandle is no longer being used.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Suzuki K Poulose <suzuki.poul...@arm.com>
Signed-off-by: Peter Chen <peter.c...@nxp.com>


Thanks for the fix.


---
  drivers/bus/arm-cci.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index a49b283..e7b0b8c 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -1912,9 +1912,12 @@ static int __cci_ace_get_port(struct device_node *dn, int type)
cci_portn = of_parse_phandle(dn, "cci-control-port", 0);
for (i = 0; i < nb_cci_ports; i++) {
ace_match = ports[i].type == type;
-   if (ace_match && cci_portn == ports[i].dn)
+   if (ace_match && cci_portn == ports[i].dn) {
+   of_node_put(cci_portn);
return i;
+   }
}
+   of_node_put(cci_portn);


nit: Could we please do something like this ?
	if (ace_match && cci_portn == ports[i].dn)
		break;
	}

	of_node_put(cci_portn);
	return (i < nb_cci_ports) ? i : -ENODEV;

Either way,

Reviewed-by: Suzuki K Poulose <suzuki.poul...@arm.com>



Re: [PATCH v7] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-07-01 Thread Suzuki K Poulose

On 01/07/16 14:01, Catalin Marinas wrote:

On Thu, Jun 30, 2016 at 06:36:44PM +0100, Suzuki K. Poulose wrote:

From: Steve Capper <steve.cap...@linaro.org>

It can be useful for JIT software to be aware of MIDR_EL1 and
REVIDR_EL1 to ascertain the presence of any core errata that could
affect code generation.

This patch exposes these registers through sysfs:

/sys/devices/system/cpu/cpu$ID/regs/identification/midr_el1
/sys/devices/system/cpu/cpu$ID/regs/identification/revidr_el1




+What:  /sys/devices/system/cpu/cpuX/regs/
+   /sys/devices/system/cpu/cpuX/regs/identification/
+   /sys/devices/system/cpu/cpuX/regs/identification/midr_el1
+   /sys/devices/system/cpu/cpuX/regs/identification/revidr_el1
+Date:  June 2016
+Contact:   Linux ARM Kernel Mailing list 
<linux-arm-ker...@lists.infradead.org>
+   Linux Kernel mailing list <linux-kernel@vger.kernel.org>
+Description:   ARM64 CPU identification registers


s/ARM64/AArch64/.


Ok.




+   'identification' directory exposes the CPU ID registers for
+identifying model and revision of the CPU.
+   - midr_el1 : This file gives contents of Main ID Register 
(MIDR_EL1).
+   - revidr_el1 : This file gives contents of the Revision ID 
register
+(REVIDR_EL1).

[...]


I will remove those superfluous explanation.


--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -183,6 +183,140 @@ const struct seq_operations cpuinfo_op = {
.show   = c_show
  };

+
+static struct kobj_type cpuregs_kobj_type = {
+   .sysfs_ops = _sysfs_ops,
+};
+
+/*
+ * The ARM ARM uses the phrase "32-bit register" to describe a register
+ * whose upper 32 bits are RES0 (per C5.1.1, ARM DDI 0487A.i), however
+ * no statement is made as to whether the upper 32 bits will or will not
+ * be made use of in future, and between ARM DDI 0487A.c and ARM DDI
+ * 0487A.d CLIDR_EL1 was expanded from 32-bit to 64-bit.
+ *
+ * Thus, while both MIDR_EL1 and REVIDR_EL1 are described as 32-bit
+ * registers, we expose them both as 64 bit values to cater for possible
+ * future expansion without an ABI break.
+ */
+#define kobj_to_cpuinfo(kobj)  container_of(kobj, struct cpuinfo_arm64, kobj)
+#define CPUREGS_ATTR_RO(_name, _field)						\
+	static ssize_t _name##_show(struct kobject *kobj,			\
+			struct kobj_attribute *attr, char *buf)			\
+	{									\
+		struct cpuinfo_arm64 *info = kobj_to_cpuinfo(kobj);		\
+										\
+		if (info->reg_midr)						\
+			return sprintf(buf, "0x%016x\n", info->reg_##_field);	\
+		else								\
+			return 0;						\
+	}									\
+	static struct kobj_attribute cpuregs_attr_##_name = __ATTR_RO(_name)
+
+CPUREGS_ATTR_RO(midr_el1, midr);
+CPUREGS_ATTR_RO(revidr_el1, revidr);
+
+static struct attribute *cpuregs_id_attrs[] = {
+	&cpuregs_attr_midr_el1.attr,
+	&cpuregs_attr_revidr_el1.attr,
+   NULL
+};
+
+static struct attribute_group cpuregs_attr_group = {
+   .attrs = cpuregs_id_attrs,
+   .name = "identification"
+};
+
+static int cpuid_add_regs(int cpu)
+{
+   int rc;
+   struct device *dev;
+	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
+
+   dev = get_cpu_device(cpu);
+   if (dev) {
+		rc = kobject_add(&info->kobj, &dev->kobj, "regs");
+		if (!rc)
+			rc = sysfs_create_group(&info->kobj, &cpuregs_attr_group);
+   } else {
+   return -ENODEV;
+   }
+
+   return rc;
+}
+
+static int cpuid_remove_regs(int cpu)
+{
+   int rc = 0;
+   struct device *dev;
+	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
+
+   dev = get_cpu_device(cpu);
+   if (dev) {
+		sysfs_remove_group(&info->kobj, &cpuregs_attr_group);
+		kobject_del(&info->kobj);
+   } else {
+   rc = -ENODEV;
+   }
+
+   return rc;
+}
+
+static int cpuid_callback(struct notifier_block *nb,
+unsigned long action, void *hcpu)
+{
+   int rc = 0;
+   unsigned long cpu = (unsigned long)hcpu;
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   rc = cpuid_add_regs(cpu);
+   break;
+   case CPU_DEAD:
+   rc = cpuid_remove_regs(cpu);
+   break;
+   }
+
+   return notifier_from_errno(rc)
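
(As a usage illustration for the files above; the value shown is hypothetical
-- 0x410fd033 would decode as a Cortex-A53 r0p3 MIDR:)

 $ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
 0x00000000410fd033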

[PATCH v8] arm64: cpuinfo: Expose MIDR_EL1 and REVIDR_EL1 to sysfs

2016-07-04 Thread Suzuki K Poulose
From: Steve Capper <steve.cap...@linaro.org>

It can be useful for JIT software to be aware of MIDR_EL1 and
REVIDR_EL1 to ascertain the presence of any core errata that could
affect code generation.

This patch exposes these registers through sysfs:

/sys/devices/system/cpu/cpu$ID/regs/identification/midr_el1
/sys/devices/system/cpu/cpu$ID/regs/identification/revidr_el1

where $ID is the cpu number. For big.LITTLE systems, one can have a
mixture of cores (e.g. Cortex A53 and Cortex A57), thus all CPUs need
to be enumerated.

If the kernel does not have valid information to populate these entries
with, an empty string is returned to userspace.

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Steve Capper <steve.cap...@linaro.org>
[ ABI documentation updates, hotplug notifiers, kobject changes ]
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
Changes since V7:
  - Remove unnecessary clean up cpuinfo_regs_init
Changes since V6:
  - Introduce regs/identification hierarchy(using kobject for the added level)
  - Use the register names as in ARM ARM (i.e, midr => midr_el1)
Changes since V5:
  - Add hotplug notifier to {add/remove} the attributes when the CPU is brought
{online/offline}.
  - Replace cpu_hotplug_{disable,enable} => cpu_notifier_register_{begin/done}
  - Remove redundant check for cpu present, as the sysfs infrastructure does
check already returning -ENODEV, if the CPU goes offline between open() and
read().
Changes since V4:
  - Update comment as suggested by Mark Rutland
Changes since V3:
  - Disable cpu hotplug while we initialise
  - Added a comment to explain why expose 64bit value
  - Update Document/ABI/testing/sysfs-devices-system-cpu
Changes since V2:
  - Fix errno for failures (Spotted-by: Russell King)
  - Roll back, if we encounter a missing cpu device
  - Return error for access to registers of CPUs not present.
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  11 ++
 arch/arm64/include/asm/cpu.h   |   2 +
 arch/arm64/kernel/cpuinfo.c| 118 +
 3 files changed, 131 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1650133..5062dc1 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -340,3 +340,14 @@ Description:	POWERNV CPUFreq driver's frequency throttle stats directory and
		'policyX/throttle_stats' directory and all the attributes are same as
		the /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory and
		attributes which give the frequency throttle information of the chip.
+
+What:		/sys/devices/system/cpu/cpuX/regs/
+		/sys/devices/system/cpu/cpuX/regs/identification/
+		/sys/devices/system/cpu/cpuX/regs/identification/midr_el1
+		/sys/devices/system/cpu/cpuX/regs/identification/revidr_el1
+Date:		June 2016
+Contact:	Linux ARM Kernel Mailing list <linux-arm-ker...@lists.infradead.org>
+		Linux Kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	AArch64 CPU identification registers
+		'identification' directory exposes the CPU ID registers for
+		identifying model and revision of the CPU.
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 13a6103..889226b 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -25,10 +25,12 @@
  */
 struct cpuinfo_arm64 {
struct cpu  cpu;
+   struct kobject  kobj;
u32 reg_ctr;
u32 reg_cntfrq;
u32 reg_dczid;
u32 reg_midr;
+   u32 reg_revidr;
 
u64 reg_id_aa64dfr0;
u64 reg_id_aa64dfr1;
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index c173d32..135c329 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -183,6 +183,121 @@ const struct seq_operations cpuinfo_op = {
.show   = c_show
 };
 
+
+static struct kobj_type cpuregs_kobj_type = {
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+/*
+ * The ARM ARM uses the phrase "32-bit register" to describe a register
+ * whose upper 32 bits are RES0 (per C5.1.1, ARM DDI 0487A.i), however
+ * no statement is made as to whether the upper 32 bits will or will not
+ * be made use of in future, and between ARM DDI 0487A.c and ARM DDI
+ * 0487A.d CLIDR_EL1 was expanded from 32-bit to 64-bit.
+ *
+ * Thus, while both MIDR_EL1 and REVIDR_EL1 are described as 32-bit
+ * registers, we expose them both as 64 bit values to cater for possible
+ * future expansion without an ABI break

Re: [PATCH] coresight: STM: Balance enable/disable

2017-01-19 Thread Suzuki K Poulose

On 19/01/17 11:40, Greg KH wrote:

Fixes: commit 237483aa5cf43 ("coresight: stm: adding driver for CoreSight STM component")
Cc: Pratik Patel <prat...@codeaurora.org>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: sta...@vger.kernel.org # 4.7+
Acked-by: Mathieu Poirier <mathieu.poir...@linaro.org>
Reviewed-by: Chunyan Zhang <zhang.chun...@linaro.org>
Reported-by: Robert Walker <robert.wal...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---

Greg,

Without this patch, the coresight STM IP can only be used for one tracing
session per boot, seriously limiting its usability.


When you resend a patch, please tell me what is different from the
previous version you sent.  I figured it out here, but please do this
next time.


Hi Greg,

Sure, will keep that in mind. For this patch nothing has changed, except for
the addition of the Review/Ack tags. It was resent (with you on To:) just to
make sure it gets picked up during the -rc series. Thanks for queuing it.

Suzuki



[PATCH] coresight: STM: Balance enable/disable

2017-01-16 Thread Suzuki K Poulose
The stm is automatically enabled when an application sets the policy
via ->link() call back by using coresight_enable(), which keeps the
refcount of the current users of the STM. However, the unlink() callback
issues stm_disable() directly, which leaves the STM turned off, without
the coresight layer knowing about it. This prevents any further uses
of the STM hardware as the coresight layer still thinks the STM is
turned on and doesn't enable the hardware when required. Even manually
enabling the STM via sysfs can't really enable the hw.

e.g,

 $ echo 1 > $CS_DEVS/$ETR/enable_sink
 $ mkdir -p $CONFIG_FS/stp-policy/$source.0/stm_test/
 $ echo 32768 65535 > $CONFIG_FS/stp-policy/$source.0/stm_test/channels
 $ echo 64 > $CS_DEVS/$source/traceid
 $ ./stm_app
 Sending 64000 byte blocks of pattern 0 at 0us intervals
 Success to map channel(32768~32783) to 0xa95fa000
 Sending on channel 32768
 $ dd if=/dev/$ETR of=~/trace.bin.1
 597+1 records in
 597+1 records out
 305920 bytes (306 kB) copied, 0.399952 s, 765 kB/s
 $ ./stm_app
 Sending 64000 byte blocks of pattern 0 at 0us intervals
 Success to map channel(32768~32783) to 0x7e9e2000
 Sending on channel 32768
 $ dd if=/dev/$ETR of=~/trace.bin.2
 0+0 records in
 0+0 records out
 0 bytes (0 B) copied, 0.0232083 s, 0.0 kB/s

 Note that we don't get any data from the ETR for the second session.

 Also dmesg shows :

 [   77.520458] coresight-tmc 2080.etr: TMC-ETR enabled
 [   77.537097] coresight-replicator etr_replicator@2089: REPLICATOR enabled
 [   77.558828] coresight-replicator main_replicator@208a: REPLICATOR enabled
 [   77.581068] coresight-funnel 208c.main_funnel: FUNNEL inport 0 enabled
 [   77.602217] coresight-tmc 2084.etf: TMC-ETF enabled
 [   77.618422] coresight-stm 2086.stm: STM tracing enabled
 [  139.554252] coresight-stm 2086.stm: STM tracing disabled
  # End of first tracing session
 [  146.351135] coresight-tmc 2080.etr: TMC read start
 [  146.514486] coresight-tmc 2080.etr: TMC read end
  # Note that the STM is not turned on via stm_generic_link()->coresight_enable()
  # and hence none of the components are turned on.
 [  152.479080] coresight-tmc 2080.etr: TMC read start
 [  152.542632] coresight-tmc 2080.etr: TMC read end

This patch fixes the problem by balancing the unlink operation by using
the coresight_disable(), keeping the coresight layer in sync with the
hardware state and thus allowing normal usage of the STM component.

Fixes: commit 237483aa5cf43 ("coresight: stm: adding driver for CoreSight STM component")
Cc: Pratik Patel <prat...@codeaurora.org>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: sta...@vger.kernel.org # 4.7+
Acked-by: Mathieu Poirier <mathieu.poir...@linaro.org>
Reviewed-by: Chunyan Zhang <zhang.chun...@linaro.org>
Reported-by: Robert Walker <robert.wal...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---

Greg,

Without this patch, the coresight STM IP can only be used for one tracing
session per boot, seriously limiting its usability.

---
 drivers/hwtracing/coresight/coresight-stm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index e4c55c5..93fc26f 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -356,7 +356,7 @@ static void stm_generic_unlink(struct stm_data *stm_data,
if (!drvdata || !drvdata->csdev)
return;
 
-   stm_disable(drvdata->csdev, NULL);
+   coresight_disable(drvdata->csdev);
 }
 
 static phys_addr_t
-- 
2.7.4
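
(In short: after this change the two stm_data callbacks are symmetric --
stm_generic_link() calls coresight_enable() and stm_generic_unlink() calls
coresight_disable() -- so the coresight core's refcount for the STM stays
balanced across tracing sessions.)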



Re: [PATCH 4/8] arm64: insn: Add helpers for adrp offsets

2016-08-18 Thread Suzuki K Poulose

On 18/08/16 15:47, Marc Zyngier wrote:

Hi Suzuki,

On 18/08/16 14:10, Suzuki K Poulose wrote:

Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/insn.h |  4 
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..dffb0364 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -247,6 +247,7 @@ static __always_inline u32 
aarch64_insn_get_##abbr##_value(void) \
 { return (val); }

 __AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)


I'm a bit bothered by this one. We end-up with both
aarch64_insn_is_adr_adrp() *and* aarch64_insn_is_adrp() (and their
respective getters).


You're right. It doesn't look good.  


How about dropping adr_adrp, and explicitly having adr and adrp? There
is only two users in the tree, so that should be easy to address.


Sounds good, will update it for v2.

Cheers
Suzuki



Re: [PATCH] usb: gadget: configs: plug memory leak

2017-02-28 Thread Suzuki K Poulose

On 28/02/17 10:55, John Keeping wrote:

When binding a gadget to a device, "name" is stored in gi->udc_name, but
this does not happen when unregistering and the string is leaked.

Signed-off-by: John Keeping 
---
 drivers/usb/gadget/configfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
index 78c44979dde3..cbff3b02840d 100644
--- a/drivers/usb/gadget/configfs.c
+++ b/drivers/usb/gadget/configfs.c
@@ -269,6 +269,7 @@ static ssize_t gadget_dev_desc_UDC_store(struct config_item 
*item,
ret = unregister_gadget(gi);
if (ret)
goto err;
+   kfree(name);


Looks correct to me.

Suzuki


Re: [PATCH 2/2] arm64: Use static keys for CPU features

2016-09-02 Thread Suzuki K Poulose

On 02/09/16 16:52, Catalin Marinas wrote:

On Fri, Aug 26, 2016 at 10:22:13AM +0100, Suzuki K. Poulose wrote:

On 25/08/16 18:26, Catalin Marinas wrote:




Just a heads up. I have a patch [1] which moves check_local_cpu_errata()
around to smp_prepare_boot_cpu(). This patch should still work fine with that
case. Only that maybe we could move the jump_label_init() to smp_prepare_boot_cpu(),
before we call update_cpu_errata_workarounds() for the boot CPU.


IIUC, we wouldn't call update_cpu_errata_workarounds() until the CPU
feature infrastructure is initialised via cpuinfo_store_boot_cpu(). So
I don't think moving the jump_label_init() call above is necessary.


Right, as I said, your patch should work fine even with that change. It's just
that jump_label_init() (a generic kernel setup step) could be called from a more
visible place (smp_prepare_boot_cpu()) than from a less obvious place, as with
the patch below.

Cheers
Suzuki





[1] 
https://lkml.kernel.org/r/1471525832-21209-4-git-send-email-suzuki.poul...@arm.com






[PATCH v3 5/9] arm64: insn: Add helpers for adrp offsets

2016-09-05 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/insn.h | 11 ++-
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..bc85366 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -246,7 +246,8 @@ static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
 static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
-__AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
 __AARCH64_INSN_FUNCS(prfm_lit,	0xFF000000, 0xD8000000)
 __AARCH64_INSN_FUNCS(str_reg,  0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,  0x3FE0EC00, 0x38606800)
@@ -318,6 +319,11 @@ __AARCH64_INSN_FUNCS(msr_reg,	0xFFF00000, 0xD5100000)
 bool aarch64_insn_is_nop(u32 insn);
 bool aarch64_insn_is_branch_imm(u32 insn);
 
+static inline bool aarch64_insn_is_adr_adrp(u32 insn)
+{
+   return aarch64_insn_is_adr(insn) || aarch64_insn_is_adrp(insn);
+}
+
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
@@ -398,6 +404,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET  16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 178488f..6f2ac4f 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+   offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4
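
(A usage sketch, not part of the patch: `addr' is a hypothetical pointer to an
instruction; aarch64_insn_read() and SZ_4K already exist in the kernel.

	u32 insn;
	s32 off;

	if (!aarch64_insn_read(addr, &insn) && aarch64_insn_is_adrp(insn)) {
		/* decoded byte offset is the imm field shifted left by 12 */
		off = aarch64_insn_adrp_get_offset(insn);
		/* re-encode, retargeting the adrp one 4K page further away */
		insn = aarch64_insn_adrp_set_offset(insn, off + SZ_4K);
	}

So an adrp whose target page lies 4 pages above its own page carries an
immediate of 4 and decodes to a byte offset of 0x4000.)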



[PATCH v3 3/9] arm64: Rearrange CPU errata workaround checks

2016-09-05 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c. It
is not an obvious place for that.

This patch rearranges the CPU specific capability(aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e, the CPU
   is activated at the boot), update the system wide detected work arounds.

   Otherwise (i.e, a CPU hotplugged in later) verify that this CPU conforms to
   the system wide capabilities.

2) Boot CPU updates the work arounds from smp_prepare_boot_cpu() after we have
initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8289295..0c4f282 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -194,11 +194,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 643c856..90e89d1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1006,23 +1006,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..99d8cc3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU wi

[PATCH v3 2/9] arm64: Use consistent naming for errata handling

2016-09-05 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds to be more consistent with their naming.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index df47969..8289295 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -194,10 +194,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 4a19138d..643c856 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1018,7 +1018,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH v3 1/9] arm64: Set the safe value for L1 icache policy

2016-09-05 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, which is
not defined at the moment. The safer value for the L1Ip should be
the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index c07c5d1..df47969 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c3d7ae4..4a19138d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static const struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+	 * If we have differing I-cache policies, report it as the weakest - AIVIVT.
 	 */
-	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0),	/* L1Ip */
+	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT),	/* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4



[PATCH v3 6/9] arm64: alternative: Add support for patching adrp instructions

2016-09-05 Thread Suzuki K Poulose
adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in alternative code that gets patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 992918d..06d650f 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -58,6 +58,8 @@ static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc)
BUG();
 }
 
+#define align_down(x, a)	((unsigned long)(x) & ~(((unsigned long)(a)) - 1))
+
 static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 {
u32 insn;
@@ -79,6 +81,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+		/*
+		 * If we're replacing an adrp instruction, which uses PC-relative
+		 * immediate addressing, adjust the offset to reflect the new
+		 * PC. adrp operates on 4K aligned addresses.
+		 */
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = align_down(altinsnptr, SZ_4K) + orig_offset;
+   new_offset = target - align_down(insnptr, SZ_4K);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
} else if (aarch64_insn_uses_literal(insn)) {
/*
 * Disallow patching unhandled instructions using PC relative
-- 
2.7.4
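
(A worked example with hypothetical addresses: suppose the alternative adrp
sits at altinsnptr = 0xffff000008c01234 and encodes orig_offset = 0x3000, so it
targets align_down(0xffff000008c01234, SZ_4K) + 0x3000 = 0xffff000008c04000.
If it is patched in at insnptr = 0xffff000008a0fff8, the new offset becomes
0xffff000008c04000 - align_down(0xffff000008a0fff8, SZ_4K) = 0x1f5000, which
aarch64_insn_adrp_set_offset() re-encodes as an immediate of 0x1f5 pages.)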



[PATCH v3 7/9] arm64: Introduce raw_{d,i}cache_line_size

2016-09-05 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from CPU feature infrastructure.
However, some special users (e.g, kexec, hibernate) need the line
size on the current CPU (rather than the system wide value), when either the
system wide value may not be accessible, or when the caller is guaranteed to
execute without being migrated.
Provide another helper which will fetch the cache line size on the current CPU.

Acked-by: James Morse <james.mo...@arm.com>
Reviewed-by: Geoff Levand <ge...@infradead.org>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr	.req	x30		// link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr	.req	x30		// link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr	.req	x30		// link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 7734f3e..e56d848 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4
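
(For reference, a C sketch of the decoding these macros perform -- the helper
names are made up; CTR_EL0.DminLine is bits [19:16] and IminLine is bits [3:0],
each the log2 of the line size in 4-byte words:

	static inline unsigned int ctr_dcache_line_size(unsigned long ctr)
	{
		return 4 << ((ctr >> 16) & 0xf);	/* DminLine -> bytes */
	}

	static inline unsigned int ctr_icache_line_size(unsigned long ctr)
	{
		return 4 << (ctr & 0xf);		/* IminLine -> bytes */
	}

so a field value of 4, say, yields a 64-byte line.)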



[PATCH v3 9/9] arm64: Work around systems with mismatched cache line sizes

2016-09-05 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with the cache management by software when the execution
is migrated from one to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by :

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e, avoiding the need to look up the system wide value of
CTR_EL0 via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++--
 arch/arm64/include/asm/cpufeature.h |  3 ++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/traps.c   | 14 ++
 7 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a4bb3f5..addc1dd 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr	.req	x30		// link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
+	adr_l	\reg, arm64_ftr_reg_ctrel0	// Read system wide safe CTR value
+	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL]	// from arm64_ftr_reg_ctrel0.sys_val
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr	.req	x30		// link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_ctr\tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +268,10 @@ lr	.req	x30		// link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_ctr\tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 0c4f282..8f325bf 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
 #define ARM64_HYP_OFFSET_LOW   14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
 
-#define ARM64_NCAPS				15
+#define ARM64_NCAPS				16
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 9875b32..d14c478 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -149,6 +149,9 @@
 	 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
 	 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
 	 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+
+#define ESR_ELx_SYS64_ISS_SYS_OP_MASK	(ESR_ELx_SYS64_ISS_SYS_MASK | \
+					 ESR_ELx_SYS64_ISS_DIR_MASK)
 /*
  * User space cache operations have the following sysreg encoding
  *
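
(To illustrate the userspace side -- a hypothetical snippet, not part of the
patch: with SCTLR_EL1.UCT cleared, the mrs below traps to the kernel, whose
emulation returns the system wide safe CTR_EL0 value rather than the raw
per-CPU register:

	static inline unsigned long read_ctr_el0(void)
	{
		unsigned long ctr;

		asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
		return ctr;
	}

so an application computing cache line sizes this way always sees the safe
value.)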

[PATCH v3 8/9] arm64: Refactor sysinstr exception handling

2016-09-05 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few Errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0, if we detect mismatched
cache line size. Since both these traps share the EC, refactor
the handler a little to make it more reader friendly.

Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/esr.h | 76 ++--
 arch/arm64/kernel/traps.c| 73 +++---
 2 files changed, 114 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..9875b32 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -78,6 +78,23 @@
 
 #define ESR_ELx_IL (UL(1) << 25)
 #define ESR_ELx_ISS_MASK   (ESR_ELx_IL - 1)
+
+/* ISS field definitions shared by different classes */
+#define ESR_ELx_WNR(UL(1) << 6)
+
+/* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_EA (UL(1) << 9)
+#define ESR_ELx_S1PTW  (UL(1) << 7)
+
+/* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
+#define ESR_ELx_FSC(0x3F)
+#define ESR_ELx_FSC_TYPE   (0x3C)
+#define ESR_ELx_FSC_EXTABT (0x10)
+#define ESR_ELx_FSC_ACCESS (0x08)
+#define ESR_ELx_FSC_FAULT  (0x04)
+#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV(UL(1) << 24)
 #define ESR_ELx_SAS_SHIFT  (22)
 #define ESR_ELx_SAS(UL(3) << ESR_ELx_SAS_SHIFT)
@@ -86,16 +103,9 @@
 #define ESR_ELx_SRT_MASK   (UL(0x1F) << ESR_ELx_SRT_SHIFT)
 #define ESR_ELx_SF (UL(1) << 15)
 #define ESR_ELx_AR (UL(1) << 14)
-#define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_CM (UL(1) << 8)
-#define ESR_ELx_S1PTW  (UL(1) << 7)
-#define ESR_ELx_WNR(UL(1) << 6)
-#define ESR_ELx_FSC(0x3F)
-#define ESR_ELx_FSC_TYPE   (0x3C)
-#define ESR_ELx_FSC_EXTABT (0x10)
-#define ESR_ELx_FSC_ACCESS (0x08)
-#define ESR_ELx_FSC_FAULT  (0x04)
-#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
 #define ESR_ELx_COND_MASK  (UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -109,6 +119,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0x))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRM_SHIFT1
+#define ESR_ELx_SYS64_ISS_CRM_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRM_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRN_SHIFT10
+#define ESR_ELx_SYS64_ISS_CRN_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRN_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP1_SHIFT14
+#define ESR_ELx_SYS64_ISS_OP1_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP1_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP2_SHIFT17
+#define ESR_ELx_SYS64_ISS_OP2_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP2_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP0_SHIFT20
+#define ESR_ELx_SYS64_ISS_OP0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_OP0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_OP0_MASK | \
+ESR_ELx_SYS64_ISS_OP1_MASK | \
+ESR_ELx_SYS64_ISS_OP2_MASK | \
+ESR_ELx_SYS64_ISS_CRN_MASK | \
+ESR_ELx_SYS64_ISS_CRM_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(op0, op1, op2, crn, crm) \
+	(((op0) << ESR_ELx_SYS64_ISS_OP0_SHIFT) | \
+	 ((op1) << ESR_ELx_SYS64_ISS_OP1_SHIFT) | \
+	 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
+	 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
+	 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * op0=1, op1=3, op2=1, crn=7, crm={ 5, 10, 11, 14

[PATCH v3 0/9] arm64: Work around for mismatched cache line size

2016-09-05 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to force enabling
of the capability.

Applies on aarch64 for-next/core.
 The tree is available at:
  git://linux-arm.org/linux-skp.git ctr-v3

Changes since V2:
 - Rebase to for-next/core which contains Ard's series for refactoring
   the arm64_ftr_reg [1]

Changes since V1:

 - Replace adr_adrp insn helper with separate helpers for adr and adrp.
 - Add/use align_down() macro for adjusting the page address for adrp offsets.
 - Add comments for existing ISS field definitions.
 - Added a patch to disallow silent patching of unhandled pc relative
   instructions in alternative code patching.

[1] http://marc.info/?l=linux-arm-kernel&m=147263959504998&w=2


Suzuki K Poulose (9):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: alternative: Disallow patching instructions using literals
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +--
 arch/arm64/include/asm/cpufeature.h | 13 +++---
 arch/arm64/include/asm/esr.h| 84 +++
 arch/arm64/include/asm/insn.h   | 11 -
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 21 +
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 35 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 13 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 287 insertions(+), 65 deletions(-)

-- 
2.7.4



[PATCH v3 4/9] arm64: alternative: Disallow patching instructions using literals

2016-09-05 Thread Suzuki K Poulose
The alternative code patching doesn't check if the replaced instruction
uses a pc relative literal. This could cause silent corruption in the
instruction stream as the instruction will be executed from a different
address than what it was compiled for. Catch all such cases.

Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Suggested-by: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 4434dab..992918d 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -79,6 +79,12 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_uses_literal(insn)) {
+   /*
+* Disallow patching unhandled instructions using PC relative
+* literal addresses
+*/
+   BUG();
}
 
return insn;
-- 
2.7.4
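
(For instance -- a hypothetical case -- an alternative sequence containing a
PC-relative literal load such as an "ldr <reg>, <label>" or a "prfm <label>"
would previously have been copied with its offset still relative to the
original location; with this patch the boot now hits the BUG() instead of
silently mis-patching such an instruction.)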



Re: [PATCH v3 9/9] arm64: Work around systems with mismatched cache line sizes

2016-09-05 Thread Suzuki K Poulose

On 05/09/16 11:10, Ard Biesheuvel wrote:

On 5 September 2016 at 10:58, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:

+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
+	adr_l	\reg, arm64_ftr_reg_ctrel0	// Read system wide safe CTR value
+	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL]	// from arm64_ftr_reg_ctrel0.sys_val


You should be able to use

ldr_l \reg, arm64_ftr_reg_ctrel0 + ARM64_FTR_SYSVAL

here, and save one instruction.


I had given a thought about that and chose the above to account for a
rare chance of arm64_ftr_reg_ctrel0 spanning across a 4K boundary. But,
you are right, ldr_l could treat (arm64_ftr_reg_ctrel0 + ARM64_FTR_SYSVAL)
as the symbol address and still get the right offset.

Suzuki



Re: [PATCH V7 5/5] perf tools: adding sink configuration for cs_etm PMU

2016-09-01 Thread Suzuki K Poulose

On 31/08/16 15:14, Mathieu Poirier wrote:

On 31 August 2016 at 03:37, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:

On 30/08/16 17:19, Mathieu Poirier wrote:


Using the PMU::set_drv_config() callback to enable the CoreSight
sink that will be used for the trace session.




+int cs_etm_set_drv_config(struct perf_evsel_config_term *term)
+{
+   int ret;
+   char enable_sink[ENABLE_SINK_MAX];
+
+   snprintf(enable_sink, ENABLE_SINK_MAX, "%s/%s",
+term->val.drv_cfg, "enable_sink");
+
+   ret = cs_device__print_file(enable_sink, "%d", 1);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}




Don't we have to disable the sink at the end of the session ? How is that
taken care of ? Did I miss that ?



Correct - the sink has to be disabled once it is no longer needed.  It
is a little tricky to do and I haven't decided on the best way to
proceed.  Fortunately that aspect doesn't affect this patchset.


Well, this patchset, when used, could leave a sink enabled. If we choose
a different sink (say an ETF) from perf, which occurs before the
previously enabled sink (say an ETR) in the coresight path, perf wouldn't get
any trace data, without any clue as to why.

Maybe we could register an atexit() handler for clearing the sink ? So that
it is guaranteed to be cleared irrespective of the path taken by perf to exit ?

Cheers
Suzuki
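
(A minimal sketch of that atexit() idea -- hypothetical code reusing
cs_device__print_file() and ENABLE_SINK_MAX from the patch; the buffer and
handler names are made up:

	static char enabled_sink[ENABLE_SINK_MAX];

	static void cs_etm_clear_sink(void)
	{
		/* best effort on exit: ignore errors */
		cs_device__print_file(enabled_sink, "%d", 0);
	}

	/* in cs_etm_set_drv_config(), after enabling the sink: */
	snprintf(enabled_sink, sizeof(enabled_sink), "%s/%s",
		 term->val.drv_cfg, "enable_sink");
	atexit(cs_etm_clear_sink);

This clears the sink on any normal exit path, though not on a crash.)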



Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-09-02 Thread Suzuki K Poulose

On 26/08/16 18:00, Catalin Marinas wrote:

On Fri, Aug 26, 2016 at 05:16:27PM +0100, Will Deacon wrote:

On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:

On 26/08/16 14:04, Suzuki K Poulose wrote:



It might be worth looking to see if we can pass the ctr as an extra
parameter to the assembly routines that need it. Then you can access it
easily from C code, and if you pass it as 0 that could result in the asm
code reading it from the h/w register, removing the need for the _raw
stuff you add.


How often to we need to access a sanitised sysreg from assembly? AFAICT,
CTR_EL0 is the first. If we only need it to infer the minimum cache line
size, we could as well store the latter in a global variable and access
it directly. If we feel brave, we could patch a "mov \reg, #x"
instruction in the ?cache_line_size macros (starting with 32 by default,
though to make it less cumbersome we'd have to improve the run-time
patching code a bit).



With Ard's patches [1] to refactor the feature array, we can refer to the named
CTR_EL0 feature register cleanly. I can rebase this series on top of that
if nobody has any objection.

[1] http://marc.info/?l=linux-arm-kernel&m=147263959504998&w=2


Suzuki



Re: [PATCH V7 5/5] perf tools: adding sink configuration for cs_etm PMU

2016-08-31 Thread Suzuki K Poulose

On 30/08/16 17:19, Mathieu Poirier wrote:

Using the PMU::set_drv_config() callback to enable the CoreSight
sink that will be used for the trace session.



+int cs_etm_set_drv_config(struct perf_evsel_config_term *term)
+{
+   int ret;
+   char enable_sink[ENABLE_SINK_MAX];
+
+   snprintf(enable_sink, ENABLE_SINK_MAX, "%s/%s",
+term->val.drv_cfg, "enable_sink");
+
+   ret = cs_device__print_file(enable_sink, "%d", 1);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}



Don't we have to disable the sink at the end of the session ? How is that
taken care of ? Did I miss that ?

Suzuki


Re: [PATCH v3] Added perf functionality to mmdc driver

2016-08-31 Thread Suzuki K Poulose

On 17/08/16 20:42, Zhengyu Shen wrote:

MMDC is a multi-mode DDR controller that supports DDR3/DDR3L x16/x32/x64
and LPDDR2 two channel x16/x32 memory types. MMDC is configurable, high
performance, and optimized. MMDC is present on i.MX6 Quad and i.MX6
QuadPlus devices, but this driver only supports i.MX6 Quad at the moment.
MMDC provides registers for performance counters which are read via this
driver to help debug memory throughput and similar issues.

$ perf stat -a -e 
mmdc/busy-cycles/,mmdc/read-accesses/,mmdc/read-bytes/,mmdc/total-cycles/,mmdc/write-accesses/,mmdc/write-bytes/
 dd if=/dev/zero of=/dev/null bs=1M count=5000
Performance counter stats for 'dd if=/dev/zero of=/dev/null bs=1M count=5000':

 898021787  mmdc/busy-cycles/
  14819600  mmdc/read-accesses/
471.30 MB   mmdc/read-bytes/
2815419216  mmdc/total-cycles/
  13367354  mmdc/write-accesses/
427.76 MB   mmdc/write-bytes/

   5.334757334 seconds time elapsed

Signed-off-by: Zhengyu Shen 
Signed-off-by: Frank Li 
---
Changes from v2 to v3:
Use WARN_ONCE instead of returning generic error values
Replace CPU Notifiers with newer state machine hotplug
Added additional checks on event_init for grouping and sampling
Remove useless mmdc_enable_profiling function
Added comments
Moved start index of events from 0x01 to 0x00
Added a counter to pmu_mmdc to only stop hrtimer after all events are 
finished
Replace readl_relaxed and writel_relaxed with readl and writel
Removed duplicate update function
Used devm_kasprintf when naming mmdcs probed

Changes from v1 to v2:
Added cpumask and migration handling support to driver
Validated event during event_init
Added code to properly stop counters
Used perf_invalid_context instead of perf_sw_context
Added hrtimer to poll for overflow
Added better description
Added support for multiple mmdcs



Should we move all the PMU specific code under at least CONFIG_PERF_EVENTS ?
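
e.g., something like the following (only a sketch; imx_mmdc_perf_init()
is a made-up name for wherever the PMU registration ends up):

#ifdef CONFIG_PERF_EVENTS
static int imx_mmdc_perf_init(struct platform_device *pdev,
			      void __iomem *mmdc_base)
{
	/* register the MMDC PMU, as done in this patch */
	return 0;
}
#else
static inline int imx_mmdc_perf_init(struct platform_device *pdev,
				     void __iomem *mmdc_base)
{
	return 0;
}
#endif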

Suzuki


Re: [PATCH v3] Added perf functionality to mmdc driver

2016-08-31 Thread Suzuki K Poulose

On 17/08/16 20:42, Zhengyu Shen wrote:

MMDC is a multi-mode DDR controller that supports DDR3/DDR3L x16/x32/x64
and LPDDR2 two channel x16/x32 memory types. MMDC is configurable, high
performance, and optimized. MMDC is present on i.MX6 Quad and i.MX6
QuadPlus devices, but this driver only supports i.MX6 Quad at the moment.
MMDC provides registers for performance counters which are read via this
driver to help debug memory throughput and similar issues.

$ perf stat -a -e 
mmdc/busy-cycles/,mmdc/read-accesses/,mmdc/read-bytes/,mmdc/total-cycles/,mmdc/write-accesses/,mmdc/write-bytes/
 dd if=/dev/zero of=/dev/null bs=1M count=5000
Performance counter stats for 'dd if=/dev/zero of=/dev/null bs=1M count=5000':

 898021787  mmdc/busy-cycles/
  14819600  mmdc/read-accesses/
471.30 MB   mmdc/read-bytes/
2815419216  mmdc/total-cycles/
  13367354  mmdc/write-accesses/
427.76 MB   mmdc/write-bytes/

   5.334757334 seconds time elapsed

Signed-off-by: Zhengyu Shen 
Signed-off-by: Frank Li 




+
+static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
+   void __iomem *mmdc_base, struct device *dev)
+{
+   int mmdc_num;
+
+   *pmu_mmdc = (struct mmdc_pmu) {
+   .pmu = (struct pmu) {
+   .task_ctx_nr= perf_invalid_context,
+   .attr_groups= attr_groups,
+   .event_init = mmdc_event_init,
+   .add= mmdc_event_add,
+   .del= mmdc_event_del,
+   .start  = mmdc_event_start,
+   .stop   = mmdc_event_stop,
+   .read   = mmdc_event_update,
+   },
+   .mmdc_base = mmdc_base,
+   };
+
+   mmdc_num = ida_simple_get(&mmdc_ida, 0, 0, GFP_KERNEL);
+
+   cpumask_set_cpu(smp_processor_id(), &pmu_mmdc->cpu);
+
+   pmu_mmdc->dev = dev;
+   pmu_mmdc->active_events = 0;
+   spin_lock_init(&pmu_mmdc->mmdc_active_events_lock);
+
+   cpuhp_mmdc_pmu = pmu_mmdc;
+   cpuhp_setup_state(CPUHP_ONLINE,
+   "PERF_MMDC_ONLINE", NULL,
+   mmdc_pmu_offline_cpu);


You may want cpuhp_setup_state_nocalls() instead here?
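
i.e., something like the below (just a sketch; the dynamic state and the
name string are only examples, and the return value should be checked):

	/*
	 * _nocalls installs the callbacks without invoking them on the
	 * CPUs that are already online.
	 */
	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
				  "perf/imx/mmdc:online", NULL,
				  mmdc_pmu_offline_cpu);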


Cheers
Suzuki


[PATCH v4 6/9] arm64: alternative: Add support for patching adrp instructions

2016-09-09 Thread Suzuki K Poulose
adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in an alternative code patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 992918d..06d650f 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -58,6 +58,8 @@ static bool branch_insn_requires_update(struct alt_instr 
*alt, unsigned long pc)
BUG();
 }
 
+#define align_down(x, a)   ((unsigned long)(x) & ~(((unsigned long)(a)) - 1))
+
 static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 {
u32 insn;
@@ -79,6 +81,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = align_down(altinsnptr, SZ_4K) + orig_offset;
+   new_offset = target - align_down(insnptr, SZ_4K);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
} else if (aarch64_insn_uses_literal(insn)) {
/*
 * Disallow patching unhandled instructions using PC relative
-- 
2.7.4



[PATCH v4 7/9] arm64: Introduce raw_{d,i}cache_line_size

2016-09-09 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size on the current CPU (rather than the system wide value), either when
the system wide value may not be accessible or when the caller is
guaranteed to execute without migration.
Provide another helper which fetches the cache line size on the current CPU.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Acked-by: James Morse <james.mo...@arm.com>
Reviewed-by: Geoff Levand <ge...@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr .reqx30 // link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr .reqx30 // link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr .reqx30 // link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S 
b/arch/arm64/kernel/hibernate-asm.S
index 7734f3e..e56d848 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S 
b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4



[PATCH v4 0/9] arm64: Work around for mismatched cache line size

2016-09-09 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.
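
To illustrate the second point, with SCTLR_EL1.UCT cleared even a plain
userspace read of CTR_EL0 traps and is handed the sanitised value
(illustrative snippet, not part of the series):

/*
 * Userspace: read the cache type register. With SCTLR_EL1.UCT clear,
 * this mrs traps to the kernel, which returns the system wide safe
 * CTR_EL0 value instead of the raw value of the current CPU.
 */
static inline unsigned long read_ctr_el0(void)
{
	unsigned long ctr;

	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
	return ctr;
}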

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to force enabling
of the capability.

Applies on aarch64 for-next/core.
 The tree is available at:
  git://linux-arm.org/linux-skp.git ctr-v4

Changes since V3:
 - Use ldr_l for arm64_ftr_reg_ctrel0+ARM64_FTR_SYS_VAL, saving one
   instruction.
 - Added Acked-by's from Andre

Changes since V2:
 - Rebase to for-next/core which contains Ard's series for refactoring
   the arm64_ftr_reg [1]

Changes since V1:

 - Replace adr_adrp insn helper with separate helpers for adr and adrp.
 - Add/use align_down() macro for adjusting the page address for adrp offsets.
 - Add comments for existing ISS field definitions.
 - Added a patch to disallow silent patching of unhandled pc relative
   instructions in alternative code patching.

[1] http://marc.info/?l=linux-arm-kernel=147263959504998=2


Suzuki K Poulose (9):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: alternative: Disallow patching instructions using literals
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 44 +--
 arch/arm64/include/asm/cpufeature.h | 13 +++---
 arch/arm64/include/asm/esr.h| 84 +++
 arch/arm64/include/asm/insn.h   | 11 -
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 21 +
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 35 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 13 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 286 insertions(+), 65 deletions(-)

-- 
2.7.4



[PATCH v4 4/9] arm64: alternative: Disallow patching instructions using literals

2016-09-09 Thread Suzuki K Poulose
The alternative code patching doesn't check if the replaced instruction
uses a pc relative literal. This could cause silent corruption in the
instruction stream as the instruction will be executed from a different
address than what it was compiled for. Catch all such cases.

Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Suggested-by: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 4434dab..992918d 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -79,6 +79,12 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_uses_literal(insn)) {
+   /*
+* Disallow patching unhandled instructions using PC relative
+* literal addresses
+*/
+   BUG();
}
 
return insn;
-- 
2.7.4



[PATCH v4 1/9] arm64: Set the safe value for L1 icache policy

2016-09-09 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, which is
not defined at the moment. The safer value for the L1Ip should be
the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index c07c5d1..df47969 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c3d7ae4..4a19138d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static const struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+* If we have differing I-cache policies, report it as the weakest - AIVIVT.
 */
-   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0), /* L1Ip */
+   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT), /* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4



[PATCH v4 5/9] arm64: insn: Add helpers for adrp offsets

2016-09-09 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/insn.h | 11 ++-
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..bc85366 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -246,7 +246,8 @@ static __always_inline bool aarch64_insn_is_##abbr(u32 
code) \
 static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
-__AARCH64_INSN_FUNCS(adr_adrp, 0x1F00, 0x1000)
+__AARCH64_INSN_FUNCS(adr,  0x9F00, 0x1000)
+__AARCH64_INSN_FUNCS(adrp, 0x9F00, 0x9000)
 __AARCH64_INSN_FUNCS(prfm_lit, 0xFF00, 0xD800)
 __AARCH64_INSN_FUNCS(str_reg,  0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,  0x3FE0EC00, 0x38606800)
@@ -318,6 +319,11 @@ __AARCH64_INSN_FUNCS(msr_reg,  0xFFF0, 0xD510)
 bool aarch64_insn_is_nop(u32 insn);
 bool aarch64_insn_is_branch_imm(u32 insn);
 
+static inline bool aarch64_insn_is_adr_adrp(u32 insn)
+{
+   return aarch64_insn_is_adr(insn) || aarch64_insn_is_adrp(insn);
+}
+
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
@@ -398,6 +404,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET  16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 178488f..6f2ac4f 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+   offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4



[PATCH v4 3/9] arm64: Rearrange CPU errata workaround checks

2016-09-09 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place: it is invoked from __cpuinfo_store_cpu() in cpuinfo.c,
which is not an obvious place for it.

This patch rearranges the CPU specific capability(aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e. the CPU
   is activated at boot), update the system wide detected work arounds.

   Otherwise (i.e. a CPU hotplugged in later) verify that this CPU conforms
   to the system wide capabilities.

2) Boot CPU updates the work arounds from smp_prepare_boot_cpu() after we have
initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 8289295..0c4f282 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -194,11 +194,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 643c856..90e89d1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1006,23 +1006,33 @@ verify_local_cpu_features(const struct 
arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..99d8cc3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU wi

[PATCH v4 2/9] arm64: Use consistent naming for errata handling

2016-09-09 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds to be more consistent with their naming.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Acked-by: Andre Przywara <andre.przyw...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index df47969..8289295 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -194,10 +194,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 4a19138d..643c856 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1018,7 +1018,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH v4 8/9] arm64: Refactor sysinstr exception handling

2016-09-09 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few Errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0, if we detect mismatched
cache line size. Since both these traps share the EC, refactor
the handler a little to make it more reader-friendly.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Acked-by: Andre Przywara <andre.przyw...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/esr.h | 76 ++--
 arch/arm64/kernel/traps.c| 73 +++---
 2 files changed, 114 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..9875b32 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -78,6 +78,23 @@
 
 #define ESR_ELx_IL (UL(1) << 25)
 #define ESR_ELx_ISS_MASK   (ESR_ELx_IL - 1)
+
+/* ISS field definitions shared by different classes */
+#define ESR_ELx_WNR(UL(1) << 6)
+
+/* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_EA (UL(1) << 9)
+#define ESR_ELx_S1PTW  (UL(1) << 7)
+
+/* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
+#define ESR_ELx_FSC(0x3F)
+#define ESR_ELx_FSC_TYPE   (0x3C)
+#define ESR_ELx_FSC_EXTABT (0x10)
+#define ESR_ELx_FSC_ACCESS (0x08)
+#define ESR_ELx_FSC_FAULT  (0x04)
+#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV(UL(1) << 24)
 #define ESR_ELx_SAS_SHIFT  (22)
 #define ESR_ELx_SAS(UL(3) << ESR_ELx_SAS_SHIFT)
@@ -86,16 +103,9 @@
 #define ESR_ELx_SRT_MASK   (UL(0x1F) << ESR_ELx_SRT_SHIFT)
 #define ESR_ELx_SF (UL(1) << 15)
 #define ESR_ELx_AR (UL(1) << 14)
-#define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_CM (UL(1) << 8)
-#define ESR_ELx_S1PTW  (UL(1) << 7)
-#define ESR_ELx_WNR(UL(1) << 6)
-#define ESR_ELx_FSC(0x3F)
-#define ESR_ELx_FSC_TYPE   (0x3C)
-#define ESR_ELx_FSC_EXTABT (0x10)
-#define ESR_ELx_FSC_ACCESS (0x08)
-#define ESR_ELx_FSC_FAULT  (0x04)
-#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
 #define ESR_ELx_COND_MASK  (UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -109,6 +119,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0x))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRM_SHIFT1
+#define ESR_ELx_SYS64_ISS_CRM_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRM_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRN_SHIFT10
+#define ESR_ELx_SYS64_ISS_CRN_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRN_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP1_SHIFT14
+#define ESR_ELx_SYS64_ISS_OP1_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP1_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP2_SHIFT17
+#define ESR_ELx_SYS64_ISS_OP2_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP2_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP0_SHIFT20
+#define ESR_ELx_SYS64_ISS_OP0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_OP0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_OP0_MASK | \
+ESR_ELx_SYS64_ISS_OP1_MASK | \
+ESR_ELx_SYS64_ISS_OP2_MASK | \
+ESR_ELx_SYS64_ISS_CRN_MASK | \
+ESR_ELx_SYS64_ISS_CRM_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(op0, op1, op2, crn, crm) \
+	(((op0) << ESR_ELx_SYS64_ISS_OP0_SHIFT) | \
+	 ((op1) << ESR_ELx_SYS64_ISS_OP1_SHIFT) | \
+	 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
+	 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
+	 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * op0=1, op1=3, op2=1, crn=7, crm={ 5, 10, 11, 14

[PATCH v4 9/9] arm64: Work around systems with mismatched cache line sizes

2016-09-09 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with the cache management by software when the execution
is migrated from one to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by :

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e. avoiding a lookup of the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 ++--
 arch/arm64/include/asm/cpufeature.h |  3 ++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/traps.c   | 14 ++
 7 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index a4bb3f5..f09a5ae 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,20 @@ lr .reqx30 // link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value
+ * from arm64_ftr_reg_ctrel0.sys_val
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+alternative_else
+   ldr_l   \reg, arm64_ftr_reg_ctrel0 + ARM64_FTR_SYSVAL
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +246,10 @@ lr .reqx30 // link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_ctr\tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +267,10 @@ lr .reqx30 // link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_ctr\tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 0c4f282..8f325bf 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
 #define ARM64_HYP_OFFSET_LOW   14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
 
-#define ARM64_NCAPS15
+#define ARM64_NCAPS16
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 9875b32..d14c478 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -149,6 +149,9 @@
	 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
	 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
	 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+
+#define ESR_ELx_SYS64_ISS_SYS_OP_MASK  (ESR_ELx_SYS64_ISS_SYS_MASK | \
+ESR_ELx_SYS64_ISS_DIR_MASK)
 /*
  * User space cache operations have the following sysreg encoding
  * in System instructions.
@@ -167,6 +170,11 @@
 #define ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL \

Re: [PATCH] arm-cci: add cci_enable_port_for_self() declaration in arm-cci.h

2016-09-12 Thread Suzuki K Poulose

On 12/09/16 11:33, Baoyou Xie wrote:

We get 1 warning when building the kernel with W=1:
drivers/bus/arm-cci.c:2027:25: warning: no previous prototype for 'cci_enable_port_for_self' [-Wmissing-prototypes]

In fact, this function is used in a few files,
but should be declared in a header file.

So this patch adds the declaration in arm-cci.h.

Signed-off-by: Baoyou Xie 
---
 include/linux/arm-cci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/arm-cci.h b/include/linux/arm-cci.h
index 521ec1f..b88f6fb 100644
--- a/include/linux/arm-cci.h
+++ b/include/linux/arm-cci.h
@@ -30,8 +30,10 @@ struct device_node;

 #ifdef CONFIG_ARM_CCI
 extern bool cci_probed(void);
+extern asmlinkage void __naked cci_enable_port_for_self(void);
 #else
 static inline bool cci_probed(void) { return false; }
+static inline void cci_enable_port_for_self(void) { return; }
 #endif



Don't you think the above definitions should depend on
ARM_CCI400_PORT_CTRL rather than ARM_CCI?
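
i.e., something like (sketch only):

#ifdef CONFIG_ARM_CCI400_PORT_CTRL
extern asmlinkage void __naked cci_enable_port_for_self(void);
#else
static inline void cci_enable_port_for_self(void) { }
#endif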


Suzuki


[PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-14 Thread Suzuki K Poulose
The mode of operation of the TMC tracked in drvdata->mode is defined
as a local_t type. This is always checked and modified under the
drvdata->spinlock and hence we don't need local_t for it and the
unnecessary synchronisation instructions that come with it. This
change makes the code a bit cleaner.

Also fixes the order in which we update drvdata->mode to
CS_MODE_DISABLED: previously, tmc_disable_etX_sink changed the
mode to CS_MODE_DISABLED before invoking tmc_disable_etX_hw(),
which in turn depends on the mode to decide whether to dump the
trace to a buffer.

Applies on Mathieu's coresight/next tree [1].

[1] https://git.linaro.org/kernel/coresight.git next

Reported-by: Venkatesh Vivekanandan <venkatesh.vivekanan...@broadcom.com>
Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 32 +++--
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 30 ++-
 drivers/hwtracing/coresight/coresight-tmc.h |  2 +-
 3 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..c51ce45 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);
 
@@ -108,7 +108,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
int ret = 0;
bool used = false;
char *buf = NULL;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -138,13 +137,12 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In sysFS mode we can have multiple writers per sink.  Since this
 * sink is already enabled no memory is needed and the HW need not be
 * touched.
 */
-   if (val == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
goto out;
 
/*
@@ -163,6 +161,7 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
drvdata->buf = buf;
}
 
+   drvdata->mode = CS_MODE_SYSFS;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -180,7 +179,6 @@ out:
 static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
 {
int ret = 0;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}
 
+   drvdata->mode = mode;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -227,7 +225,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +234,11 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}
 
-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etb_disable_hw(drvdata);
+   drvdata->mode = CS_MODE_DISABLED;
+   }
 
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +258,7 @@ static int tmc_enable_etf_link(struct coresight_device 
*csdev,
}
 
tmc_etf_enable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_SYSFS);
+   drvdata->mode = CS_MODE_SYSFS;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -279,8 +277,8 @@ static void tmc_disable_etf_link(struct coresight_device 
*csdev,
 

Re: [PATCH] coresight: tmc: fix for trace collection bug in sysFS mode

2016-09-14 Thread Suzuki K Poulose

On 14/09/16 12:30, Venkatesh Vivekanandan wrote:



On Wed, Sep 14, 2016 at 3:26 PM, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:

On 13/09/16 16:41, Mathieu Poirier wrote:

On 13 September 2016 at 06:20, Venkatesh Vivekanandan
<venkatesh.vivekanan...@broadcom.com> wrote:

tmc_etb_dump_hw is never called in sysFS mode to collect trace from
hardware, because drvdata->mode is set to CS_MODE_DISABLED at
tmc_disable_etf/etr_sink

static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
{
.
.
if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
.
.
}

static void tmc_disable_etf_sink(struct coresight_device *csdev)
{
   .
   .
val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


You are correct.

   .
   .
}

 

I think we should :

1) First switch the drvdata->mode to a normal type from local_t. Using an
atomic type for mode is completely unnecessary and comes with the overhead
of barriers/synchronisation instructions, while all accesses, including 
read/write
are performed under the drvdata->spinlock. I have a patch already for this, 
which
I plan to send it soon.

and

2) Do something like :

void  tmc_disable_etX_sink()
{
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etX_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
}
}

You will fix this along with above changes?


Yes.

nit: Please fix your mail client. Do not use HTML-formatted emails on the
mailing list.

Cheers
Suzuki


Re: [PATCH] coresight: tmc: fix for trace collection bug in sysFS mode

2016-09-14 Thread Suzuki K Poulose

On 13/09/16 16:41, Mathieu Poirier wrote:

On 13 September 2016 at 06:20, Venkatesh Vivekanandan
 wrote:

tmc_etb_dump_hw is never called in sysFS mode to collect trace from
hardware, because drvdata->mode is set to CS_MODE_DISABLED at
tmc_disable_etf/etr_sink

static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
{
.
.
if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
.
.
}

static void tmc_disable_etf_sink(struct coresight_device *csdev)
{
   .
   .
val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


You are correct.


   .
   .
}

Signed-off-by: Venkatesh Vivekanandan 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 +
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 466af86..c7fb7f7 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -61,6 +61,8 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)

 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 {
+   long val;
+
CS_UNLOCK(drvdata->base);

tmc_flush_and_stop(drvdata);
@@ -68,7 +70,8 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
+   if (val == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);

@@ -225,7 +228,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)

 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);

@@ -235,9 +237,8 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}

-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (local_read(&drvdata->mode) != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata);


This would work but tmc_enable_etf_sink() and tmc_disable_etf_sink()
are no longer balanced.  Another approach would be to add a "mode"
parameter to tmc_etb_disable_hw() and do something like:

if (val != CS_MODE_DISABLED)
tmc_etb_disable_hw(drvdata, val);

In tmc_etb_disable_hw(), if mode == CS_MODE_SYSFS then we can move
ahead with the dump operation.  The same applies for the ETR.


I think we should :

1) First switch the drvdata->mode to a normal type from local_t. Using an
atomic type for mode is completely unnecessary and comes with the overhead
of barriers/synchronisation instructions, while all accesses, including 
read/write
are performed under the drvdata->spinlock. I have a patch already for this, 
which
I plan to send it soon.

and

2) Do something like :

void  tmc_disable_etX_sink()
{
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etX_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
}
}

Leaving the tmc_etX_disable_hw() untouched.


Suzuki





Re: [PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-19 Thread Suzuki K Poulose

On 19/09/16 17:59, Suzuki K Poulose wrote:

On 16/09/16 18:07, Mathieu Poirier wrote:

On 14 September 2016 at 07:53, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:



Cheers
Suzuki
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.



Bah, sorry about that. You are fine. This email is intended for public
list discussion.

Suzuki


Re: [PATCH] coresight: tmc: Cleanup operation mode handling

2016-09-19 Thread Suzuki K Poulose

On 16/09/16 18:07, Mathieu Poirier wrote:

On 14 September 2016 at 07:53, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:

The mode of operation of the TMC tracked in drvdata->mode is defined
as a local_t type. This is always checked and modified under the
drvdata->spinlock and hence we don't need local_t for it and the
unnecessary synchronisation instructions that come with it. This
change makes the code a bit cleaner.

Also fixes the order in which we update drvdata->mode to
CS_MODE_DISABLED: previously, tmc_disable_etX_sink changed the
mode to CS_MODE_DISABLED before invoking tmc_disable_etX_hw(),
which in turn depends on the mode to decide whether to dump the
trace to a buffer.


Thank you for the patch - just a few comments below.





@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}

-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}

+   drvdata->mode = mode;


Given the way tmc_enable_etf_sink_perf() is called in
tmc_enable_etf_sink(), I think it is time to get rid of the 'mode'
parameter - it doesn't do anything nowadays.  Same thing for
tmc_enable_etf_sink_sysfs() and ETR.


Sure, makes sense. I will clean it up.


@@ -279,8 +277,8 @@ static void tmc_disable_etf_link(struct coresight_device 
*csdev,
return;
}

+   drvdata->mode = CS_MODE_DISABLED;
tmc_etf_disable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_DISABLED);


I think setting the mode should come after tmc_etf_disable_hw(), as it
was before.


You're right, I will change it.


Thanks for the review. Will send the updated series soon.

Cheers
Suzuki



[RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size

2016-08-18 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to force enabling
of the capability.

Applies on aarch64 for-next/core. The tree is available at:

git://linux-arm.org/linux-skp.git ctr-emulation

Suzuki K Poulose (8):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h| 56 
 arch/arm64/include/asm/insn.h   |  4 ++
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 13 ++
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 44 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 13 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 264 insertions(+), 55 deletions(-)

-- 
2.7.4



[PATCH 5/8] arm64: alternative: Add support for patching adrp instructions

2016-08-18 Thread Suzuki K Poulose
adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in an alternative code patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..71c6962 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+   new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
}
 
return insn;
-- 
2.7.4



[PATCH 8/8] arm64: Work around systems with mismatched cache line sizes

2016-08-18 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with the cache management by software when the execution
is migrated from one to another. Usually, the application reads
the cache size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by :

For kernel, dynamically patch the kernel to read the cache size
from the system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e. avoiding a lookup of the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/cpufeature.c  |  9 +
 arch/arm64/kernel/traps.c   | 14 ++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index a4bb3f5..66bd268 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr .reqx30 // link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
+   ldr_l   \reg, sys_ctr_ftr   // Read system wide safe CTR value
+   ldr \reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr .reqx30 // link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_ctr\tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +268,10 @@ lr .reqx30 // link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_ctr\tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 692b8d3..e99f2af 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
 #define ARM64_HYP_OFFSET_LOW   14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
 
-#define ARM64_NCAPS15
+#define ARM64_NCAPS16
 
 #ifndef __ASSEMBLY__
 
@@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 2a8f6c3..51aea89 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -139,6 +139,9 @@
	 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
	 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
	 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+
+#define ESR_ELx

[PATCH 7/8] arm64: Refactor sysinstr exception handling

2016-08-18 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few Errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0, if we detect mismatched
cache line size. Since both these traps share the EC, refactor
the handler a little to make it more reader-friendly.

Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/esr.h | 48 +
 arch/arm64/kernel/traps.c| 73 
 2 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..2a8f6c3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -109,6 +109,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0x))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRm_SHIFT1
+#define ESR_ELx_SYS64_ISS_CRm_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRm_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRn_SHIFT10
+#define ESR_ELx_SYS64_ISS_CRn_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRn_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op1_SHIFT14
+#define ESR_ELx_SYS64_ISS_Op1_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_Op1_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op2_SHIFT17
+#define ESR_ELx_SYS64_ISS_Op2_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_Op2_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op0_SHIFT20
+#define ESR_ELx_SYS64_ISS_Op0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_Op0_MASK | \
+ESR_ELx_SYS64_ISS_Op1_MASK | \
+ESR_ELx_SYS64_ISS_Op2_MASK | \
+ESR_ELx_SYS64_ISS_CRn_MASK | \
+ESR_ELx_SYS64_ISS_CRm_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(Op0, Op1, Op2, CRn, CRm) \
+	(((Op0) << ESR_ELx_SYS64_ISS_Op0_SHIFT) | \
+	 ((Op1) << ESR_ELx_SYS64_ISS_Op1_SHIFT) | \
+	 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
+	 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
+	 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * Op0=1, Op1=3, Op2=1, CRn=7, CRm={ 5, 10, 11, 14 }, WRITE (L=0)
+ */
+#define ESR_ELx_SYS64_ISS_CRm_DC_CIVAC 14
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAU  11
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAC  10
+#define ESR_ELx_SYS64_ISS_CRm_IC_IVAU  5
+
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK  (ESR_ELx_SYS64_ISS_Op0_MASK | \
+ESR_ELx_SYS64_ISS_Op1_MASK | \
+ESR_ELx_SYS64_ISS_Op2_MASK | \
+ESR_ELx_SYS64_ISS_CRn_MASK | \
+ESR_ELx_SYS64_ISS_DIR_MASK)
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
+   (ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
+ESR_ELx_SYS64_ISS_DIR_WRITE)
 #ifndef __ASSEMBLY__
 #include 
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e04f838..93c5287 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -447,36 +447,29 @@ void cpu_enable_cache_maint_trap(void *__unused)
: "=r" (res)\
: "r" (address), "i" (-EFAULT) )
 
-asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 {
unsigned long address;
-   int ret;
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+	int crm = (esr & ESR_ELx_SYS64_ISS_CRm_MASK) >> ESR_ELx_SYS64_ISS_CRm_SHIFT;
+   int ret = 0;
 
-   /* if this is a write with: Op0=1, Op2=1, Op1=3, CRn=7 */
-   if ((esr & 0x01fffc01) == 0x0012dc00) {
-   int rt = (esr >> 5) & 0x1f;
- 
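
As a rough standalone illustration of how the mask/value pair above is
meant to be used, the handler can recognise any EL0 cache maintenance
operation with a single compare and then pull out CRm (which operation)
and Rt (which register holds the address). This is a sketch with the
shift/mask values transcribed from the hunk above, not the kernel code
itself:

#include <stdint.h>
#include <stdio.h>

#define ISS_DIR_MASK	0x1u			/* bit 0: 1 = read, 0 = write */
#define ISS_CRM_SHIFT	1
#define ISS_CRM_MASK	(0xfu << ISS_CRM_SHIFT)
#define ISS_RT_SHIFT	5
#define ISS_RT_MASK	(0x1fu << ISS_RT_SHIFT)
#define ISS_CRN_SHIFT	10
#define ISS_OP1_SHIFT	14
#define ISS_OP2_SHIFT	17
#define ISS_OP0_SHIFT	20

/* Compose the Op0/Op1/Op2/CRn part of the ISS, as SYS_VAL does. */
#define SYS_VAL(op0, op1, op2, crn) \
	(((uint32_t)(op0) << ISS_OP0_SHIFT) | \
	 ((uint32_t)(op1) << ISS_OP1_SHIFT) | \
	 ((uint32_t)(op2) << ISS_OP2_SHIFT) | \
	 ((uint32_t)(crn) << ISS_CRN_SHIFT))

/* User space DC/IC ops: Op0=1, Op1=3, Op2=1, CRn=7, any CRm, write (L=0) */
#define U_CACHE_OP_MASK	(SYS_VAL(0x3, 0x7, 0x7, 0xf) | ISS_DIR_MASK)
#define U_CACHE_OP_VAL	SYS_VAL(1, 3, 1, 7)

static int is_user_cache_op(uint32_t iss, unsigned int *crm, unsigned int *rt)
{
	if ((iss & U_CACHE_OP_MASK) != U_CACHE_OP_VAL)
		return 0;
	*crm = (iss & ISS_CRM_MASK) >> ISS_CRM_SHIFT;	/* 5, 10, 11 or 14 */
	*rt = (iss & ISS_RT_MASK) >> ISS_RT_SHIFT;	/* address register */
	return 1;
}

int main(void)
{
	/* A trapped "dc cvau, x3" from EL0: CRm = 11, Rt = 3, write */
	uint32_t iss = SYS_VAL(1, 3, 1, 7) |
		       (11u << ISS_CRM_SHIFT) | (3u << ISS_RT_SHIFT);
	unsigned int crm, rt;

	if (is_user_cache_op(iss, &crm, &rt))
		printf("cache op: CRm=%u, address in x%u\n", crm, rt);
	return 0;
}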

[PATCH 1/8] arm64: Set the safe value for L1 icache policy

2016-08-18 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, an encoding which is
not defined at the moment. The safe value for L1Ip should be
the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7099f26..f6f5e49 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272ea..43f73e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+* If we have differing I-cache policies, report it as the weakest - AIVIVT.
 */
-   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0), /* L1Ip */
+   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT), /* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4
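
For an FTR_EXACT field like L1Ip, the system wide value is only meaningful
when every CPU reports the same thing; on a mismatch the infrastructure
falls back to safe_val, which this patch sets to AIVIVT. A minimal
userspace model of that selection (illustrative only; the real logic lives
in arm64_ftr_safe_value() in cpufeature.c):

#include <stdio.h>

enum ftr_type { FTR_EXACT, FTR_LOWER_SAFE };

struct ftr_field {
	enum ftr_type type;
	long safe_val;		/* only consulted for FTR_EXACT */
};

/* Fold a new CPU's field value into the current system wide value. */
static long ftr_safe_value(const struct ftr_field *f, long new, long cur)
{
	switch (f->type) {
	case FTR_EXACT:
		/* Values must agree exactly, otherwise use the fallback. */
		return new == cur ? new : f->safe_val;
	case FTR_LOWER_SAFE:
		/* The smaller value (e.g. DminLine/IminLine) is safe. */
		return new < cur ? new : cur;
	}
	return cur;
}

int main(void)
{
	/* L1Ip: one CPU reports VIPT (2), another AIVIVT (1). */
	struct ftr_field l1ip = { FTR_EXACT, 1 /* ICACHE_POLICY_AIVIVT */ };

	printf("system wide L1Ip = %ld\n", ftr_safe_value(&l1ip, 2, 1));
	return 0;
}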



[PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size

2016-08-18 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size on the current CPU (rather than the system wide value), when the
system wide value may not be accessible. Provide another helper which
will fetch the cache line size on the current CPU.

Cc: James Morse <james.mo...@arm.com>
Cc: Geoff Levand <ge...@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr .reqx30 // link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr .reqx30 // link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr .reqx30 // link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4
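
Both macros simply decode CTR_EL0: IminLine lives in bits [3:0] and
DminLine in bits [19:16], each holding log2 of the line size in words,
hence the "4 << field" computation. The same decode in C, as a userspace
sketch (the CTR value below is made up for illustration):

#include <stdint.h>
#include <stdio.h>

/* Line size in bytes: 4 << {D,I}minLine, as in the asm macros above. */
static unsigned int ctr_dline(uint64_t ctr) { return 4u << ((ctr >> 16) & 0xf); }
static unsigned int ctr_iline(uint64_t ctr) { return 4u << (ctr & 0xf); }

int main(void)
{
	uint64_t ctr = 0x84448004;	/* hypothetical CTR_EL0 value */

	printf("D-cache min line: %u bytes\n", ctr_dline(ctr));	/* 64 */
	printf("I-cache min line: %u bytes\n", ctr_iline(ctr));	/* 64 */
	return 0;
}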



[PATCH 3/8] arm64: Rearrange CPU errata workaround checks

2016-08-18 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c. It
is not an obvious place for that.

This patch rearranges the CPU specific capability (aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e, the CPU
   is brought up during boot), updates the system wide detected work arounds.

   Otherwise (i.e, a CPU hotplugged in later), verifies that this CPU conforms
   to the system wide capabilities.

2) The boot CPU updates the work arounds from smp_prepare_boot_cpu() after we
have initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index aadd946..692b8d3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,11 +193,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a591c35..fcf87ca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1005,23 +1005,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 76a6d92..289c43b 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU wi

[PATCH 2/8] arm64: Use consistent naming for errata handling

2016-08-18 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds to be more consistent with their naming.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6f5e49..aadd946 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,10 +193,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 43f73e0..a591c35 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1017,7 +1017,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH 4/8] arm64: insn: Add helpers for adrp offsets

2016-08-18 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/insn.h |  4 
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..dffb0364 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -247,6 +247,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
 __AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
 __AARCH64_INSN_FUNCS(prfm_lit,	0xFF000000, 0xD8000000)
 __AARCH64_INSN_FUNCS(str_reg,  0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,  0x3FE0EC00, 0x38606800)
@@ -398,6 +399,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET  16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 63f9432..f022af4 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+   offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4
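
The adrp immediate counts 4K pages, hence the shifts by 12 in these
helpers; the signed 21-bit immediate gives a reach of +/-4GB from the page
of the instruction. A trivial model of the offset round trip (plain C,
assuming an arithmetic right shift for negative offsets, as the kernel
code does; the split immhi:immlo encoding itself is handled by
aarch64_insn_{en,de}code_immediate()):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int32_t offset  = -0x23000;	/* page-aligned byte offset */
	int32_t imm     = offset >> 12;	/* value handed to the encoder */
	int32_t decoded = imm << 12;	/* what _get_offset() reconstructs */

	printf("imm=%d, decoded=%d, round-trip ok: %d\n",
	       imm, decoded, decoded == offset);
	return 0;
}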



Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose

On 26/08/16 14:04, Suzuki K Poulose wrote:

On 26/08/16 12:03, Ard Biesheuvel wrote:

Hello Suzuki,





For faster access (i.e, avoiding a lookup of the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.



IIUC it is the runtime sorting of the arm64_ftr_reg array that
requires you to stash a pointer to CTR_EL0's entry somewhere, so that
you can dereference it without doing the bsearch.


Correct.








IMO, this is a pattern that we should avoid: you are introducing one
instance now, which will make it hard to say no to the next one in the
future. Isn't there a better way to organize the arm64_ftr_reg array
that allows us to reference entries directly? Ideally, a way that gets
rid of the runtime sorting, since I don't think that is a good
replacement for developer discipline anyway (although I should have
spoken up when that was first introduced) Or am I missing something
here?


I had some form of direct access to the feature register in one of
the versions [0], but it was dropped based on Catalin's suggestion at [1].


Forgot to add, [0] wouldn't solve this issue cleanly either. It would simply
speed up the read_system_reg(). So we do need a call to read_system_reg()
from assembly code, which makes it a little bit tricky.

Suzuki




[0] https://lkml.org/lkml/2015/10/5/504
[1] https://lkml.org/lkml/2015/10/7/558


Suzuki






Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose

On 26/08/16 12:03, Ard Biesheuvel wrote:

Hello Suzuki,





For faster access (i.e, avoiding a lookup of the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.



IIUC it is the runtime sorting of the arm64_ftr_reg array that
requires you to stash a pointer to CTR_EL0's entry somewhere, so that
you can dereference it without doing the bsearch.


Correct.



IMO, this is a pattern that we should avoid: you are introducing one
instance now, which will make it hard to say no to the next one in the
future. Isn't there a better way to organize the arm64_ftr_reg array
that allows us to reference entries directly? Ideally, a way that gets
rid of the runtime sorting, since I don't think that is a good
replacement for developer discipline anyway (although I should have
spoken up when that was first introduced) Or am I missing something
here?


I had some form of direct access to the feature register in one of
the versions [0], but it was dropped based on Catalin's suggestion at [1].


[0] https://lkml.org/lkml/2015/10/5/504
[1] https://lkml.org/lkml/2015/10/7/558


Suzuki




Re: [PATCH 2/2] arm64: Use static keys for CPU features

2016-08-26 Thread Suzuki K Poulose

On 25/08/16 18:26, Catalin Marinas wrote:

This patch adds static keys transparently for all the cpu_hwcaps
features by implementing an array of default-false static keys and
enabling them when detected. The cpus_have_cap() check uses the static
keys if the feature being checked is a constant, otherwise the compiler
generates the bitmap test.

Because of the early call to static_branch_enable() via
check_local_cpu_errata() -> update_cpu_capabilities(), the jump labels
are initialised in cpuinfo_store_boot_cpu().

Cc: Will Deacon <will.dea...@arm.com>
Cc: Suzuki K. Poulose <suzuki.poul...@arm.com>
Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>
---




 static inline int __attribute_const__
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272eac1352..919b2d0d68ae 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -46,6 +46,9 @@ unsigned int compat_elf_hwcap2 __read_mostly;

 DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);

+DEFINE_STATIC_KEY_ARRAY_FALSE(cpu_hwcap_keys, ARM64_NCAPS);
+EXPORT_SYMBOL(cpu_hwcap_keys);
+
 #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
{   \
.sign = SIGNED, \
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84fe6925..6a141e399daf 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -377,6 +377,12 @@ void cpuinfo_store_cpu(void)
 void __init cpuinfo_store_boot_cpu(void)
 {
struct cpuinfo_arm64 *info = _cpu(cpu_data, 0);
+
+   /*
+* Initialise the static keys early as they may be enabled by
+* check_local_cpu_errata() -> update_cpu_capabilities().
+*/
+   jump_label_init();
__cpuinfo_store_cpu(info);


Catalin,

Just a heads up. I have a patch [1] which moves the "check_local_cpu_errata()"
around to smp_prepare_boot_cpu(). This patch should still work fine in that
case. Only, maybe we could move the jump_label_init() to smp_prepare_boot_cpu(),
before we call "update_cpu_errata_workarounds()" for the boot CPU.

Either way, this will be useful for some of the other feature checks.

Thanks
Suzuki

[1] 
https://lkml.kernel.org/r/1471525832-21209-4-git-send-email-suzuki.poul...@arm.com



[PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size

2016-08-26 Thread Suzuki K Poulose
On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size on the current CPU (rather than the system wide value), when either
the system wide value may not be accessible or the caller is guaranteed
to execute without being migrated to another CPU.
Provide another helper which will fetch the cache line size on the current CPU.

Acked-by: James Morse <james.mo...@arm.com>
Reviewed-by: Geoff Levand <ge...@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr .reqx30 // link register
.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-   .macro  dcache_line_size, reg, tmp
+   .macro  raw_dcache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
ubfm\tmp, \tmp, #16, #19// cache line size encoding
mov \reg, #4// bytes per word
@@ -228,9 +229,17 @@ lr .reqx30 // link register
.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
+   .macro  dcache_line_size, reg, tmp
+   raw_dcache_line_size\reg, \tmp
+   .endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+   .macro  raw_icache_line_size, reg, tmp
mrs \tmp, ctr_el0   // read CTR
and \tmp, \tmp, #0xf// cache line size encoding
mov \reg, #4// bytes per word
@@ -238,6 +247,13 @@ lr .reqx30 // link register
.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+   .macro  icache_line_size, reg, tmp
+   raw_icache_line_size\reg, \tmp
+   .endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
.macro  tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
add x1, x10, #PAGE_SIZE
/* Clean the copied page to PoU - based on flush_icache_range() */
-   dcache_line_size x2, x3
+   raw_dcache_line_size x2, x3
sub x3, x2, #1
bic x4, x10, x3
 2: dc  cvau, x4/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
/* Setup the list loop variables. */
mov x17, x1 /* x17 = kimage_start */
mov x16, x0 /* x16 = kimage_head */
-   dcache_line_size x15, x0/* x15 = dcache line size */
+   raw_dcache_line_size x15, x0/* x15 = dcache line size */
mov x14, xzr/* x14 = entry ptr */
mov x13, xzr/* x13 = copy dest */
 
-- 
2.7.4



[PATCH v2 8/9] arm64: Refactor sysinstr exception handling

2016-08-26 Thread Suzuki K Poulose
Right now we trap some of the user space data cache operations
based on a few errata (ARM 819472, 826319, 827319 and 824069).
We need to trap userspace access to CTR_EL0 if we detect mismatched
cache line sizes. Since both these traps share the EC, refactor
the handler a little to make it more reader friendly.

Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>

---
Changes since V1:
 - Add comments for ISS field definitions for other exceptions.
---
 arch/arm64/include/asm/esr.h | 76 ++--
 arch/arm64/kernel/traps.c| 73 +++---
 2 files changed, 114 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..9875b32 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -78,6 +78,23 @@
 
 #define ESR_ELx_IL (UL(1) << 25)
 #define ESR_ELx_ISS_MASK   (ESR_ELx_IL - 1)
+
+/* ISS field definitions shared by different classes */
+#define ESR_ELx_WNR(UL(1) << 6)
+
+/* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_EA (UL(1) << 9)
+#define ESR_ELx_S1PTW  (UL(1) << 7)
+
+/* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
+#define ESR_ELx_FSC(0x3F)
+#define ESR_ELx_FSC_TYPE   (0x3C)
+#define ESR_ELx_FSC_EXTABT (0x10)
+#define ESR_ELx_FSC_ACCESS (0x08)
+#define ESR_ELx_FSC_FAULT  (0x04)
+#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV(UL(1) << 24)
 #define ESR_ELx_SAS_SHIFT  (22)
 #define ESR_ELx_SAS(UL(3) << ESR_ELx_SAS_SHIFT)
@@ -86,16 +103,9 @@
 #define ESR_ELx_SRT_MASK   (UL(0x1F) << ESR_ELx_SRT_SHIFT)
 #define ESR_ELx_SF (UL(1) << 15)
 #define ESR_ELx_AR (UL(1) << 14)
-#define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_CM (UL(1) << 8)
-#define ESR_ELx_S1PTW  (UL(1) << 7)
-#define ESR_ELx_WNR(UL(1) << 6)
-#define ESR_ELx_FSC(0x3F)
-#define ESR_ELx_FSC_TYPE   (0x3C)
-#define ESR_ELx_FSC_EXTABT (0x10)
-#define ESR_ELx_FSC_ACCESS (0x08)
-#define ESR_ELx_FSC_FAULT  (0x04)
-#define ESR_ELx_FSC_PERM   (0x0C)
+
+/* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV (UL(1) << 24)
 #define ESR_ELx_COND_SHIFT (20)
 #define ESR_ELx_COND_MASK  (UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -109,6 +119,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |  \
 ((imm) & 0xffff))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT   22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK 0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ 0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT 5
+#define ESR_ELx_SYS64_ISS_RT_MASK  (UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRM_SHIFT1
+#define ESR_ELx_SYS64_ISS_CRM_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRM_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRN_SHIFT10
+#define ESR_ELx_SYS64_ISS_CRN_MASK (UL(0xf) << ESR_ELx_SYS64_ISS_CRN_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP1_SHIFT14
+#define ESR_ELx_SYS64_ISS_OP1_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP1_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP2_SHIFT17
+#define ESR_ELx_SYS64_ISS_OP2_MASK (UL(0x7) << ESR_ELx_SYS64_ISS_OP2_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP0_SHIFT20
+#define ESR_ELx_SYS64_ISS_OP0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_OP0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK (ESR_ELx_SYS64_ISS_OP0_MASK | \
+ESR_ELx_SYS64_ISS_OP1_MASK | \
+ESR_ELx_SYS64_ISS_OP2_MASK | \
+ESR_ELx_SYS64_ISS_CRN_MASK | \
+ESR_ELx_SYS64_ISS_CRM_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(op0, op1, op2, crn, crm) \
+	(((op0) << ESR_ELx_SYS64_ISS_OP0_SHIFT) | \
+	 ((op1) << ESR_ELx_SYS64_ISS_OP1_SHIFT) | \
+	 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
+	 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
+	 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+/*
+ * User space cache operations have the following sysreg e

[PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes

2016-08-26 Thread Suzuki K Poulose
Systems with differing CPU i-cache/d-cache line sizes can cause
problems with software cache management when execution
is migrated from one CPU to another. Usually, an application reads
the cache line size on a CPU and then uses that length to perform cache
operations. However, if it gets migrated to another CPU with a smaller
cache line size, things could go completely wrong. To prevent such
cases, always use the smallest cache line size among the CPUs. The
kernel CPU feature infrastructure already keeps track of the safe
value for all CPUID registers including CTR. This patch works around
the problem by:

For the kernel, dynamically patching it to read the cache size
from the system wide copy of CTR_EL0.

For applications, trapping read accesses to CTR_EL0 (by clearing
SCTLR_EL1.UCT) and emulating the mrs instruction to return the system
wide safe value of CTR_EL0.

For faster access (i.e, avoiding a lookup of the system wide value of CTR_EL0
via read_system_reg), we keep track of the pointer to the table entry for
CTR_EL0 in the CPU feature infrastructure.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h|  8 
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/asm-offsets.c |  2 ++
 arch/arm64/kernel/cpu_errata.c  | 22 ++
 arch/arm64/kernel/cpufeature.c  |  9 +
 arch/arm64/kernel/traps.c   | 14 ++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a4bb3f5..66bd268 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr .reqx30 // link register
.macro  mmid, rd, rn
ldr \rd, [\rn, #MM_CONTEXT_ID]
.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+   .macro  read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+   mrs \reg, ctr_el0   // read CTR
+   nop
+   nop
+alternative_else
+	ldr_l	\reg, sys_ctr_ftr	// Read system wide safe CTR value
+   ldr \reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
+alternative_endif
+   .endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr .reqx30 // link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
.macro  dcache_line_size, reg, tmp
-   raw_dcache_line_size\reg, \tmp
+   read_ctr\tmp
+   ubfm\tmp, \tmp, #16, #19// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
@@ -250,7 +268,10 @@ lr .reqx30 // link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
.macro  icache_line_size, reg, tmp
-   raw_icache_line_size\reg, \tmp
+   read_ctr\tmp
+   and \tmp, \tmp, #0xf// cache line size encoding
+   mov \reg, #4// bytes per word
+   lsl \reg, \reg, \tmp// actual cache line size
.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 692b8d3..e99f2af 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456  12
 #define ARM64_HAS_32BIT_EL013
 #define ARM64_HYP_OFFSET_LOW   14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
 
-#define ARM64_NCAPS	15
+#define ARM64_NCAPS	16
 
 #ifndef __ASSEMBLY__
 
@@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 9875b32..d14c478 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -149,6 +149,9 @@
 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+
+#define ESR_ELx
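
For the userspace half of the work around: once SCTLR_EL1.UCT is cleared,
an EL0 "mrs xN, ctr_el0" traps to EL1 and the ISS carries the destination
register in its Rt field. The emulation then only needs to write the
system wide safe value into that register and step over the instruction.
A hedged sketch of the handler body (kernel-context code with names
borrowed from this series; the final code may differ, and a complete
version must treat Rt == 31 as xzr rather than writing regs->regs[31]):

static int ctr_read_handler(unsigned int esr, struct pt_regs *regs)
{
	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >>
			ESR_ELx_SYS64_ISS_RT_SHIFT;

	/* Return the system wide safe CTR_EL0, not this CPU's copy. */
	regs->regs[rt] = sys_ctr_ftr->sys_val;
	regs->pc += 4;		/* skip the trapped mrs */
	return 0;
}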

[PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets

2016-08-26 Thread Suzuki K Poulose
Adds helpers for decoding/encoding the PC relative addresses for adrp.
This will be used for handling dynamic patching of 'adrp' instructions
in alternative code patching.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
Changes since V1:
 - Replace adr_adrp with separate handlers for adr and adrp (Marc Zyngier)
---
 arch/arm64/include/asm/insn.h | 11 ++-
 arch/arm64/kernel/insn.c  | 13 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..bc85366 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -246,7 +246,8 @@ static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
 static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
-__AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
 __AARCH64_INSN_FUNCS(prfm_lit,	0xFF000000, 0xD8000000)
 __AARCH64_INSN_FUNCS(str_reg,  0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,  0x3FE0EC00, 0x38606800)
@@ -318,6 +319,11 @@ __AARCH64_INSN_FUNCS(msr_reg,	0xFFF00000, 0xD5100000)
 bool aarch64_insn_is_nop(u32 insn);
 bool aarch64_insn_is_branch_imm(u32 insn);
 
+static inline bool aarch64_insn_is_adr_adrp(u32 insn)
+{
+   return aarch64_insn_is_adr(insn) || aarch64_insn_is_adrp(insn);
+}
+
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
@@ -398,6 +404,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET  16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 63f9432..f022af4 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+   BUG_ON(!aarch64_insn_is_adrp(insn));
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+   offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4



[PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions

2016-08-26 Thread Suzuki K Poulose
adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in an alternative code patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>

---
Changes since V1:
 - Add align_down macro. Couldn't find the best place to add it.
   Didn't want to add this to uapi/ headers where the kernel's generic
   ALIGN helpers are really defined. For the time being left it here.
---
 arch/arm64/kernel/alternative.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 6b269a9..d681498 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -59,6 +59,8 @@ static bool branch_insn_requires_update(struct alt_instr 
*alt, unsigned long pc)
BUG();
 }
 
+#define align_down(x, a)	((unsigned long)(x) & ~(((unsigned long)(a)) - 1))
+
 static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 {
u32 insn;
@@ -80,6 +82,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = align_down(altinsnptr, SZ_4K) + orig_offset;
+   new_offset = target - align_down(insnptr, SZ_4K);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
} else if (aarch64_insn_uses_literal(insn)) {
/*
 * Disallow patching unhandled instructions using PC relative
-- 
2.7.4
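
To see the arithmetic with concrete numbers (all addresses hypothetical,
chosen only for illustration), here is a standalone model of the
recomputation done in get_alt_insn():

#include <stdio.h>

#define align_down(x, a)	((unsigned long)(x) & ~(((unsigned long)(a)) - 1))

int main(void)
{
	unsigned long altinsnptr = 0xffff000008a41004; /* where adrp was linked */
	unsigned long insnptr    = 0xffff000008093008; /* where it will run */
	long orig_offset         = 0x7000;	/* encoded page offset to symbol */

	unsigned long target = align_down(altinsnptr, 0x1000) + orig_offset;
	long new_offset      = target - align_down(insnptr, 0x1000);

	/* The same symbol page is reached from either location. */
	printf("target page: %#lx, new offset: %#lx\n",
	       target, (unsigned long)new_offset);
	return 0;
}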



[PATCH v2 2/9] arm64: Use consistent naming for errata handling

2016-08-26 Thread Suzuki K Poulose
This is a cosmetic change to rename the functions dealing with
the errata work arounds to be more consistent with their naming.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds. So rename it to reflect the same.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c  | 4 ++--
 arch/arm64/kernel/cpufeature.c  | 2 +-
 arch/arm64/kernel/cpuinfo.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6f5e49..aadd946 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,10 +193,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 43f73e0..a591c35 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1017,7 +1017,7 @@ void verify_local_cpu_capabilities(void)
if (!sys_caps_initialised)
return;
 
-   verify_local_cpu_errata();
+   verify_local_cpu_errata_workarounds();
verify_local_cpu_features(arm64_features);
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
cpuinfo_detect_icache_policy(info);
 
-   check_local_cpu_errata();
+   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4



[PATCH v2 1/9] arm64: Set the safe value for L1 icache policy

2016-08-26 Thread Suzuki K Poulose
Right now we use 0 as the safe value for CTR_EL0:L1Ip, an encoding which is
not defined at the moment. The safe value for L1Ip should be
the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7099f26..f6f5e49 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
enum ftr_type   type;
u8  shift;
u8  width;
-   s64 safe_val; /* safe value for discrete features */
+   s64 safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272ea..43f73e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),   /* DminLine */
/*
 * Linux can handle differing I-cache policies. Userspace JITs will
-* make use of *minLine
+* make use of *minLine.
+* If we have differing I-cache policies, report it as the weakest - AIVIVT.
 */
-   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0), /* L1Ip */
+   ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT), /* L1Ip */
ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),/* RAZ */
ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),/* IminLine */
ARM64_FTR_END,
-- 
2.7.4



[PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals

2016-08-26 Thread Suzuki K Poulose
The alternative code patching doesn't check if the replaced instruction
uses a pc relative literal. This could cause silent corruption in the
instruction stream as the instruction will be executed from a different
address than what it was compiled for. Catch all such cases.

Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Suggested-by: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..6b269a9 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,12 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_uses_literal(insn)) {
+   /*
+* Disallow patching unhandled instructions using PC relative
+* literal addresses
+*/
+   BUG();
}
 
return insn;
-- 
2.7.4



[PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks

2016-08-26 Thread Suzuki K Poulose
Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
Boot CPU updates its cpuinfo. Now, if a work around depends on the
variance of a CPU ID feature (e.g, check for Cache Line size mismatch),
we have no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c. It
is not an obvious place for that.

This patch rearranges the CPU specific capability (aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e, the CPU
   is brought up during boot), updates the system wide detected work arounds.

   Otherwise (i.e, a CPU hotplugged in later), verifies that this CPU conforms
   to the system wide capabilities.

2) The boot CPU updates the work arounds from smp_prepare_boot_cpu() after we
have initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c  | 30 --
 arch/arm64/kernel/cpuinfo.c |  2 --
 arch/arm64/kernel/smp.c |  8 +++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index aadd946..692b8d3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,11 +193,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a591c35..fcf87ca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1005,23 +1005,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+   verify_local_cpu_errata_workarounds();
+   verify_local_cpu_features(arm64_features);
+   verify_local_elf_hwcaps(arm64_elf_hwcaps);
+   if (system_supports_32bit_el0())
+   verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+   /*
+* All secondary CPUs should conform to the early CPU features
+* in use by the kernel based on boot CPU.
+*/
check_early_cpu_features();
 
/*
-* If we haven't computed the system capabilities, there is nothing
-* to verify.
+* If we haven't finalised the system capabilities, this CPU gets
+* a chance to update the errata work arounds.
+* Otherwise, this CPU should verify that it has all the system
+* advertised capabilities.
 */
if (!sys_caps_initialised)
-   return;
-
-   verify_local_cpu_errata_workarounds();
-   verify_local_cpu_features(arm64_features);
-   verify_local_elf_hwcaps(arm64_elf_hwcaps);
-   if (system_supports_32bit_el0())
-   verify_local_elf_hwcaps(compat_elf_hwcaps);
+   update_cpu_errata_workarounds();
+   else
+   verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
}
 
cpuinfo_detect_icache_policy(info);
-
-   update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..99d8cc3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 * this CPU ticks all of those. If it doesn't, the CPU wi

[PATCH v2 0/9] arm64: Work around for mismatched cache line size

2016-08-26 Thread Suzuki K Poulose
This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution gets migrated to a different CPU,
the cache line size it had cached could be larger than that of the new
CPU. This could cause data corruption issues. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno with a hack to force enabling
of the capability.

Applies on v4.8-rc3. The tree is available at:
  git://linux-arm.org/linux-skp.git ctr-v2

Changes since V1:

 - Replace adr_adrp insn helper with separate helpers for adr and adrp.
 - Add/use align_down() macro for adjusting the page address for adrp offsets.
 - Add comments for existing ISS field definitions.
 - Added a patch to disallow silent patching of unhandled pc relative
   instructions in alternative code patching.

Suzuki K Poulose (9):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: alternative: Disallow patching instructions using literals
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h| 84 +++
 arch/arm64/include/asm/insn.h   | 11 -
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kernel/alternative.c | 21 +
 arch/arm64/kernel/asm-offsets.c |  2 +
 arch/arm64/kernel/cpu_errata.c  | 26 ++-
 arch/arm64/kernel/cpufeature.c  | 44 ++-
 arch/arm64/kernel/cpuinfo.c |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c| 13 ++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c |  8 +++-
 arch/arm64/kernel/traps.c   | 87 ++---
 15 files changed, 297 insertions(+), 65 deletions(-)

-- 
2.7.4



Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions

2016-08-23 Thread Suzuki K Poulose

On 22/08/16 12:45, Ard Biesheuvel wrote:

On 18 August 2016 at 15:10, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:

adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in an alternative code patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..71c6962 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+   new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);


Are orig_offset and new_offset guaranteed to be equal modulo 4 KB?
Otherwise, you will have to track down and patch the associated :lo12:
add/ldr instruction as well.


We are modifying the alternative instruction to accommodate the new PC,
where this instruction will be executed from, while the referenced symbol
remains the same. Hence the associated :lo12: doesn't change. Does that
address your concern? Or did I miss something?

Suzuki
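
To spell out why: a symbol reference is split into "adrp xN, sym", which
computes the 4K page of sym relative to the PC, plus "add xN, xN,
:lo12:sym", which adds the offset of sym within its page. Only the adrp
half depends on the PC; the :lo12: half is a property of the symbol alone.
A small standalone demonstration (hypothetical addresses):

#include <stdio.h>

int main(void)
{
	unsigned long sym = 0xffff000008c12345;	/* hypothetical symbol */
	unsigned long pc1 = 0xffff000008093000;	/* original location */
	unsigned long pc2 = 0xffff000008a41000;	/* patched-in location */

	/* adrp immediate: page distance, different for each PC */
	long adrp1 = (long)((sym & ~0xfffUL) - (pc1 & ~0xfffUL));
	long adrp2 = (long)((sym & ~0xfffUL) - (pc2 & ~0xfffUL));

	/* :lo12: part: offset inside the symbol's page, PC independent */
	unsigned long lo12 = sym & 0xfffUL;

	printf("adrp offsets differ: %ld vs %ld; :lo12: is always %#lx\n",
	       adrp1, adrp2, lo12);
	return 0;
}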




Re: [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size

2016-08-23 Thread Suzuki K Poulose

On 22/08/16 11:00, Will Deacon wrote:

On Thu, Aug 18, 2016 at 02:10:30PM +0100, Suzuki K Poulose wrote:

On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line
size on the current CPU (rather than the system wide value), when the
system wide value may not be accessible. Provide another helper which
will fetch the cache line size on the current CPU.


Why are these users "special"? Using a smaller line size shouldn't affect


With the alternative patched code, we refer to the kernel data structure for
the CTR value. At least for kexec, the new image may overwrite the existing
kernel image/data where our data was stored, and we could end up reading a
corrupted value.

For all special cases where it is ensured that the code runs on a
single CPU and will not be migrated to another CPU, we can rely on
the raw value of CTR, hence the change.


correctness, and I don't see kexec and hibernate as being performance
critical in their cache maintenance.


It's not for performance, but for safety.

Suzuki



Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions

2016-08-23 Thread Suzuki K Poulose

On 22/08/16 12:19, Will Deacon wrote:

On Thu, Aug 18, 2016 at 02:10:29PM +0100, Suzuki K Poulose wrote:

adrp uses a PC-relative offset to reach the 4K page of a symbol. If it
appears in alternative code that is patched in, we should adjust the
offset to reflect the address the instruction will run from. This
patch adds support for fixing up the offset of adrp instructions.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..71c6962 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, 
u32 *altinsnptr)
offset = target - (unsigned long)insnptr;
insn = aarch64_set_branch_offset(insn, offset);
}
+   } else if (aarch64_insn_is_adrp(insn)) {
+   s32 orig_offset, new_offset;
+   unsigned long target;
+
+   /*
+* If we're replacing an adrp instruction, which uses 
PC-relative
+* immediate addressing, adjust the offset to reflect the new
+* PC. adrp operates on 4K aligned addresses.
+*/
+   orig_offset  = aarch64_insn_adrp_get_offset(insn);
+   target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+   new_offset = target - ((unsigned long)insnptr & ~0xfffUL);


The masking with ~0xfffUL might be nicer if you write it as
align_down(ptr, SZ_4K);


Right, that definitely looks better. Will change it.




+   insn = aarch64_insn_adrp_set_offset(insn, new_offset);
}

return insn;


I wonder if we shouldn't have a catch-all for any instructions performing
PC-relative operations here, because silent corruption of the instruction


Correct, which is what happened initially when I didn't have the adrp handling 
;-).


stream is pretty horrible. What other instructions are there? ADR, LDR
(literal), ... ?


From a quick look, all the instructions under "Load register (literal)" :

i.e,
LDR (literal) for GPR/FP_SIMD/32bit/64bit
LDRSW (literal)
PRFM (literal)

and Data processing instructions - immediate group with PC-relative addressing:

ADR, ADRP

I will add a check to catch the unsupported instructions in the alternative 
code.
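
One possible shape for that check, assuming the existing
aarch64_insn_uses_literal() helper covers exactly the groups listed
above:

	} else if (aarch64_insn_uses_literal(insn)) {
		/*
		 * Any other instruction with PC-relative literal access
		 * (LDR/LDRSW/PRFM literal, ADR) cannot be safely moved to
		 * a new location: refuse and make the failure loud.
		 */
		BUG();
	}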

Thanks
Suzuki



Re: [PATCH 7/8] arm64: Refactor sysinstr exception handling

2016-08-23 Thread Suzuki K Poulose

On 22/08/16 13:53, Will Deacon wrote:

On Thu, Aug 18, 2016 at 02:10:31PM +0100, Suzuki K Poulose wrote:

Right now we trap some of the userspace data cache operations based on
a few errata (ARM 819472, 826319, 827319 and 824069). We also need to
trap userspace access to CTR_EL0 if we detect mismatched cache line
sizes. Since both of these traps share the same EC, refactor the
handler a little to make it more reader friendly.

Cc: Andre Przywara <andre.przyw...@arm.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/esr.h | 48 +
 arch/arm64/kernel/traps.c| 73 
 2 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..2a8f6c3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -109,6 +109,54 @@
((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |\
 ((imm) & 0xffff))

+/* ISS field definitions for System instruction traps */


Can you add a similar comment for the ESR_ELx_* encodings that we already
have, please? Unfortunately, we've not namespaced things, so the
data/instruction abort encodings are described as e.g. ESR_ELx_ISV.


Will do.



+#define ESR_ELx_SYS64_ISS_Op0_SHIFT	20
+#define ESR_ELx_SYS64_ISS_Op0_MASK (UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)


Inconsistent capitalisation in your macro naming (e.g. RT vs Op1). Maybe
just stick to uppercase for #defines?


Sure


+
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK  (ESR_ELx_SYS64_ISS_Op0_MASK | \
+ESR_ELx_SYS64_ISS_Op1_MASK | \
+ESR_ELx_SYS64_ISS_Op2_MASK | \
+ESR_ELx_SYS64_ISS_CRn_MASK | \
+ESR_ELx_SYS64_ISS_DIR_MASK)
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
+   (ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
+ESR_ELx_SYS64_ISS_DIR_WRITE)


What is the _U_ for? Unified? User? If it's user, EL0 may be more
appropriate.


Ok.
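
For reference, with these definitions a trapped system instruction
decodes roughly like this (a sketch; the helper name is made up and the
shifts follow the ARMv8 ARM ISS layout quoted above):

	static void decode_sys64_iss(u32 esr)
	{
		unsigned int op0 = (esr >> 20) & 0x3;
		unsigned int op2 = (esr >> 17) & 0x7;
		unsigned int op1 = (esr >> 14) & 0x7;
		unsigned int crn = (esr >> 10) & 0xf;
		unsigned int rt  = (esr >>  5) & 0x1f;
		unsigned int crm = (esr >>  1) & 0xf;
		bool read        = esr & 0x1;	/* 1 = MRS (read), 0 = write */

		pr_debug("sys64: op0=%u op1=%u crn=%u crm=%u op2=%u rt=%u %s\n",
			 op0, op1, crn, crm, op2, rt, read ? "read" : "write");
	}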
 



Thanks
Suzuki


Re: [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes

2016-08-24 Thread Suzuki K Poulose

On 22/08/16 14:02, Will Deacon wrote:

On Thu, Aug 18, 2016 at 02:10:32PM +0100, Suzuki K Poulose wrote:

Systems with differing CPU i-cache/d-cache line sizes can cause
problems for software cache management when execution is migrated
from one CPU to another. Usually, an application reads the cache line
size on one CPU and then uses that length to perform cache operations.
However, if it gets migrated to a CPU with a smaller cache line size,
things can go completely wrong. To prevent such cases, always use the
smallest cache line size among the CPUs. The kernel CPU feature
infrastructure already keeps track of the safe value for all CPUID
registers, including CTR. This patch works around the problem by:

For the kernel, dynamically patching the kernel to read the cache
size from the system-wide copy of CTR_EL0.


Is it only CTR that is mismatched in practice, or do we need to worry
about DCZID_EL0 too?


A mismatched DCZID_EL0 is quite possible. However, there is no way to
trap accesses to DCZID_EL0. We can trap DC ZVA if we clear
SCTLR_EL1.DZE, but clearing SCTLR_EL1.DZE also means that reading
DCZID_EL0.DZP returns 1, indicating DC ZVA is not supported. So if a
well-behaved application checks DZP before issuing a DC ZVA, we may
never get the chance to emulate the operation. In other words, if
there is a mismatch, the workaround is to disable DC ZVA operations
altogether (which could affect existing, incorrect userspace
applications that assume DC ZVA is supported without checking the
DZP bit).
 

 static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 93c5287..db2d6cb 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, 
struct pt_regs *regs)
regs->pc += 4;
 }

+static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+   int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> 
ESR_ELx_SYS64_ISS_RT_SHIFT;
+
+   regs->regs[rt] = sys_ctr_ftr->sys_val;
+   regs->pc += 4;
+}


Whilst this is correct, I wonder if there's any advantage in reporting a
*larger* size to userspace and avoid incurring additional trap overhead?


Combined with the trapping of userspace dc operations for the
clean-cache errata workaround, we could report a larger size and
emulate the operations properly in the kernel. But I think that can be
an enhancement on top of this series.
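
For context, the userspace pattern being trapped and emulated here
looks roughly like this (illustrative only, not from the patch; start
is assumed line-aligned for brevity):

	static void jit_flush_range(char *start, char *end)
	{
		uint64_t ctr;
		size_t dline, iline;
		char *p;

		/* CTR_EL0 read: now trapped and emulated by the kernel */
		asm volatile("mrs %0, ctr_el0" : "=r" (ctr));

		dline = 4 << ((ctr >> 16) & 0xf);	/* DminLine: log2(words) */
		iline = 4 << (ctr & 0xf);		/* IminLine: log2(words) */

		for (p = start; p < end; p += dline)
			asm volatile("dc cvau, %0" : : "r" (p) : "memory");
		asm volatile("dsb ish" : : : "memory");
		for (p = start; p < end; p += iline)
			asm volatile("ic ivau, %0" : : "r" (p) : "memory");
		asm volatile("dsb ish" : : : "memory");
		asm volatile("isb");
	}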



Any idea what sort of size typical JITs are using?


I have no idea. I have Cc-ed Rodolph and Stuart, who may have a better
idea about JIT usage.

Suzuki



[PATCH 1/3] coresight: tmc: Cleanup operation mode handling

2016-09-27 Thread Suzuki K Poulose
The mode of operation of the TMC, tracked in drvdata->mode, is defined
as a local_t type. It is always checked and modified under
drvdata->spinlock, so we don't need local_t for it, nor the unnecessary
synchronisation instructions that come with it. This change makes the
code a bit cleaner.

It also fixes the order in which we update drvdata->mode to
CS_MODE_DISABLED: previously, tmc_disable_etX_sink() changed the mode
to CS_MODE_DISABLED before invoking tmc_disable_etX_hw(), which in
turn depends on the mode to decide whether to dump the trace to a
buffer. Now the mode is updated only after the hardware has been
disabled.

Applies on Mathieu's coresight/next tree [1]

https://git.linaro.org/kernel/coresight.git next

Reported-by: Venkatesh Vivekanandan <venkatesh.vivekanan...@broadcom.com>
Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 32 +++--
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 26 +---
 drivers/hwtracing/coresight/coresight-tmc.h |  2 +-
 3 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..e80a8f4 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 * When operating in sysFS mode the content of the buffer needs to be
 * read before the TMC is disabled.
 */
-   if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
tmc_etb_dump_hw(drvdata);
tmc_disable_hw(drvdata);
 
@@ -108,7 +108,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
int ret = 0;
bool used = false;
char *buf = NULL;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -138,13 +137,12 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In sysFS mode we can have multiple writers per sink.  Since this
 * sink is already enabled no memory is needed and the HW need not be
 * touched.
 */
-   if (val == CS_MODE_SYSFS)
+   if (drvdata->mode == CS_MODE_SYSFS)
goto out;
 
/*
@@ -163,6 +161,7 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
drvdata->buf = buf;
}
 
+   drvdata->mode = CS_MODE_SYSFS;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -180,7 +179,6 @@ out:
 static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
 {
int ret = 0;
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -194,17 +192,17 @@ static int tmc_enable_etf_sink_perf(struct 
coresight_device *csdev, u32 mode)
goto out;
}
 
-   val = local_xchg(&drvdata->mode, mode);
/*
 * In Perf mode there can be only one writer per sink.  There
 * is also no need to continue if the ETB/ETR is already operated
 * from sysFS.
 */
-   if (val != CS_MODE_DISABLED) {
+   if (drvdata->mode != CS_MODE_DISABLED) {
ret = -EINVAL;
goto out;
}
 
+   drvdata->mode = mode;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -227,7 +225,6 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-   long val;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +234,11 @@ static void tmc_disable_etf_sink(struct coresight_device 
*csdev)
return;
}
 
-   val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
/* Disable the TMC only if it needs to */
-   if (val != CS_MODE_DISABLED)
+   if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etb_disable_hw(drvdata);
+   drvdata->mode = CS_MODE_DISABLED;
+   }
 
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +258,7 @@ static int tmc_enable_etf_link(struct coresight_device 
*csdev,
}
 
tmc_etf_enable_hw(drvdata);
-   local_set(&drvdata->mode, CS_MODE_SYSFS);
+   drvdata->mode = CS_MODE_SYSFS;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -280,7 +278,7 @@ static void tmc_disable_etf_link(struct coresight_device 
*csdev,
}
 

[PATCH 3/3] coresight: tmc: Remove duplicate memset

2016-09-27 Thread Suzuki K Poulose
The tmc_etr_enable_hw() fills the buffer with 0's before enabling
the hardware. So, we don't need an explicit memset() in
tmc_enable_etr_sink_sysfs() before calling the tmc_etr_enable_hw().
This patch removes the explicit memset from tmc_enable_etr_sink_sysfs.

Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 3b84d0d..5d31269 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -150,8 +150,6 @@ static int tmc_enable_etr_sink_sysfs(struct 
coresight_device *csdev)
drvdata->buf = drvdata->vaddr;
}
 
-   memset(drvdata->vaddr, 0, drvdata->size);
-
drvdata->mode = CS_MODE_SYSFS;
tmc_etr_enable_hw(drvdata);
 out:
-- 
2.7.4



[PATCH 2/3] coresight: tmc: Get rid of mode parameter for helper routines

2016-09-27 Thread Suzuki K Poulose
Get rid of the superfluous mode parameter and the check for
the mode in tmc_etX_enable_sink_{perf/sysfs}. While at it, also
remove the unnecessary WARN_ON() checks.

Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 15 ---
 2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index e80a8f4..1549436 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -103,7 +103,7 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
int ret = 0;
bool used = false;
@@ -111,10 +111,6 @@ static int tmc_enable_etf_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_SYSFS))
-   return -EINVAL;
-
/*
 * If we don't have a buffer release the lock and allocate memory.
 * Otherwise keep the lock and move along.
@@ -176,16 +172,12 @@ out:
return ret;
 }
 
-static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_perf(struct coresight_device *csdev)
 {
int ret = 0;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_PERF))
-   return -EINVAL;
-
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
ret = -EINVAL;
@@ -202,7 +194,7 @@ static int tmc_enable_etf_sink_perf(struct coresight_device 
*csdev, u32 mode)
goto out;
}
 
-   drvdata->mode = mode;
+   drvdata->mode = CS_MODE_PERF;
tmc_etb_enable_hw(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -214,9 +206,9 @@ static int tmc_enable_etf_sink(struct coresight_device 
*csdev, u32 mode)
 {
switch (mode) {
case CS_MODE_SYSFS:
-   return tmc_enable_etf_sink_sysfs(csdev, mode);
+   return tmc_enable_etf_sink_sysfs(csdev);
case CS_MODE_PERF:
-   return tmc_enable_etf_sink_perf(csdev, mode);
+   return tmc_enable_etf_sink_perf(csdev);
}
 
/* We shouldn't be here */
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index f23ef0c..3b84d0d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -93,7 +93,7 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 {
int ret = 0;
bool used = false;
@@ -102,9 +102,6 @@ static int tmc_enable_etr_sink_sysfs(struct 
coresight_device *csdev, u32 mode)
dma_addr_t paddr;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_SYSFS))
-   return -EINVAL;
 
/*
 * If we don't have a buffer release the lock and allocate memory.
@@ -170,16 +167,12 @@ out:
return ret;
 }
 
-static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
int ret = 0;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-/* This shouldn't be happening */
-   if (WARN_ON(mode != CS_MODE_PERF))
-   return -EINVAL;
-
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
ret = -EINVAL;
@@ -208,9 +201,9 @@ static int tmc_enable_etr_sink(struct coresight_device 
*csdev, u32 mode)
 {
switch (mode) {
case CS_MODE_SYSFS:
-   return tmc_enable_etr_sink_sysfs(csdev, mode);
+   return tmc_enable_etr_sink_sysfs(csdev);
case CS_MODE_PERF:
-   return tmc_enable_etr_sink_perf(csdev, mode);
+   return tmc_enable_etr_sink_perf(csdev);
}
 
/* We shouldn't be here */
-- 
2.7.4



Re: [PATCH] coresight: tmc: implementing TMC-ETR AUX space API

2016-10-03 Thread Suzuki K Poulose

On 19/09/16 22:14, Mathieu Poirier wrote:

This patch implements the AUX area interfaces required to
use the TMC-ETR (configured to work in scatter-gather mode)
from the Perf sub-system.

Some of this work was inspired by the original implementation done
by Pratik Patel at CodeAurora.



Hi Mathieu,

Thanks for nailing the monster. I have a few comments below on the 
implementation.


Signed-off-by: Mathieu Poirier 
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 629 +++-
 drivers/hwtracing/coresight/coresight-tmc.h |   1 +
 2 files changed, 621 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c 
b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 6d7de0309e94..581d6393bb5d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -17,10 +17,60 @@

 #include 
 #include 
+#include 
+
 #include "coresight-priv.h"
 #include "coresight-tmc.h"

-void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
+/**
+ * struct etr_page - DMA'able and virtual address representation for a page
+ * @daddr: DMA'able page address returned by dma_map_page()
+ * @vaddr: Virtual address returned by page_address()
+ */
+struct etr_page {
+   dma_addr_t  daddr;
+   u64 vaddr;
+};
+
+/**
+ * struct cs_etr_buffers - keep track of a recording session's specifics
+ * @dev:   device reference to be used with the DMA API
+ * @tmc:   generic portion of the TMC buffers
+ * @etr_nr_pages:  number of memory pages for the ETR-SG trace storage
+ * @pt_vaddr:  the virtual address of the first page table entry
+ * @page_addr: quick access to all the pages held in the page table
+ */
+struct cs_etr_buffers {
+   struct device   *dev;
+   struct cs_buffers   tmc;
+   unsigned int    etr_nr_pages;
+   void __iomem    *pt_vaddr;
+   struct etr_page page_addr[0];
+};
+
+#define TMC_ETR_ENTRIES_PER_PT (PAGE_SIZE / sizeof(u32))
+
+/*
+ * Helpers for scatter-gather descriptors.  Descriptors are defined as follow:
+ *
 * ---Bit31------------------------------Bit4-------Bit1------Bit0---
 * |            Address[39:12]               |  SBZ  |  Entry Type   |
 * -------------------------------------------------------------------
+ *
+ * Address: Bits [39:12] of a physical page address. Bits [11:0] are
+ * always zero.
+ *
+ * Entry type: b10 - Normal entry
+ * b11 - Last entry in a page table
+ * b01 - Last entry
+ */
+#define TMC_ETR_SG_LST_ENT(phys_pte)   (((phys_pte >> PAGE_SHIFT) << 4) | 0x1)
+#define TMC_ETR_SG_ENT(phys_pte)   (((phys_pte >> PAGE_SHIFT) << 4) | 0x2)
+#define TMC_ETR_SG_NXT_TBL(phys_pte)   (((phys_pte >> PAGE_SHIFT) << 4) | 0x3)
+


Please be aware that on arm64, the PAGE_SIZE can be 16K or 64K. So hard coding
PAGE_SHIFT here might be problematic on those configurations as the ETR page 
size
is always 4K.
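
A minimal way to address that would be a fixed shift for the SG
entries, independent of the kernel's page size (names are suggestions,
following the macros quoted above):

	/* ETR SG entries always describe 4K pages, whatever PAGE_SIZE is */
	#define TMC_ETR_SG_PAGE_SHIFT	12
	#define TMC_ETR_SG_PAGE_SIZE	(1UL << TMC_ETR_SG_PAGE_SHIFT)

	#define TMC_ETR_SG_LST_ENT(paddr) ((((paddr) >> TMC_ETR_SG_PAGE_SHIFT) << 4) | 0x1)
	#define TMC_ETR_SG_ENT(paddr)     ((((paddr) >> TMC_ETR_SG_PAGE_SHIFT) << 4) | 0x2)
	#define TMC_ETR_SG_NXT_TBL(paddr) ((((paddr) >> TMC_ETR_SG_PAGE_SHIFT) << 4) | 0x3)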


+#define TMC_ETR_SG_ENT_TO_PG(entry)((entry >> 4) << PAGE_SHIFT)
+
+void tmc_etr_enable_hw_cnt_mem(struct tmc_drvdata *drvdata)
 {
u32 axictl;

@@ -57,7 +107,47 @@ void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
CS_LOCK(drvdata->base);
 }

-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata)
+void tmc_etr_enable_hw_sg_mem(struct tmc_drvdata *drvdata)




+* DBAHI holds the upper eight bits of the 40-bit address used to
+* locate the trace buffer in system memory.
+*/
+   writel_relaxed((drvdata->paddr >> 32) & 0xFF,
+   drvdata->base + TMC_DBAHI);


I think we should do the same for tmc_etr_enable_hw_cnt_mem().


@@ -199,7 +290,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device 
*csdev, u32 mode)
goto out;
}

-   tmc_etr_enable_hw(drvdata);
+   tmc_etr_enable_hw_sg_mem(drvdata);
 out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);

@@ -241,9 +332,528 @@ static void tmc_disable_etr_sink(struct coresight_device 
*csdev)
dev_info(drvdata->dev, "TMC-ETR disabled\n");
 }

+/*
+ * The default perf ring buffer size is 32 and 1024 pages for user and kernel
+ * space respectively.  The size of the intermediate SG list is allowed
+ * to match the size of the perf ring buffer but cap it to the default
+ * kernel size.
+ */
+#define DEFAULT_NR_KERNEL_PAGES1024
+static int tmc_get_etr_pages(int nr_pages)


The name could be confusing, as it kind of implies it allocates nr_pages.
It might be worth renaming it to tmc_get_etr_pages_nr ?


+{
+   if (nr_pages <= DEFAULT_NR_KERNEL_PAGES)
+   return nr_pages;
+
+   return DEFAULT_NR_KERNEL_PAGES;
+}
+
+/*
+ * Go through all the pages in the SG list and check if @phys_addr
+ * falls within one of those.  If so record the information in
+ * @page and @offset.
+ */
+static int
+tmc_get_sg_page_index(struct cs_etr_buffers *etr_buffer,
+ u64 

Re: [PATCH 3/3] arm64/fpsimd: Use ID_AA64PFR0_EL1_.* macros

2016-11-07 Thread Suzuki K Poulose

On 03/09/15 19:12, Alexander Kuleshov wrote:

Commit 26d75e67c ("arm64/cpufeature.h: Add macros for a cpu features
testing") provides a set of macros for testing the processor's FP and
Advanced SIMD features.

Let's use these macros instead of direct calculation.

Signed-off-by: Alexander Kuleshov 
---
 arch/arm64/kernel/fpsimd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 44d6f75..12943a5 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -27,6 +27,7 @@

 #include <asm/fpsimd.h>
 #include <asm/cputype.h>
+#include <asm/cpufeature.h>

 #define FPEXC_IOF  (1 << 0)
 #define FPEXC_DZF  (1 << 1)
@@ -333,13 +334,13 @@ static int __init fpsimd_init(void)
 {
u64 pfr = read_cpuid(ID_AA64PFR0_EL1);

-   if (pfr & (0xf << 16)) {
+   if (ID_AA64PFR0_EL1_FP(pfr)) {
pr_notice("Floating-point is not implemented\n");
return 0;
}
elf_hwcap |= HWCAP_FP;

-   if (pfr & (0xf << 20))
+   if (ID_AA64PFR0_EL1_ADV_SIMD(pfr))
pr_notice("Advanced SIMD is not implemented\n");
else
elf_hwcap |= HWCAP_ASIMD;



Similar to the previous one, this won't apply anymore.

Suzuki


Re: [PATCH 2/3] arm64/setup: Use ID_AA64ISAR0_EL1_.* macros

2016-11-07 Thread Suzuki K Poulose

On 03/09/15 19:12, Alexander Kuleshov wrote:

Commit 26d75e67c ("arm64/cpufeature.h: Add macros for a cpu features
testing") provides a set of macros for testing the processor's crypto
features. Let's use these macros instead of direct calculation.






Signed-off-by: Alexander Kuleshov 
---
 arch/arm64/kernel/setup.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 926ae8d..a3faf4f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c


This patch doesn't apply on the current mainline tree. Where does this
patch apply ?
The elf_hwcap calculation has been moved to a separate function,
setup_elf_hwcaps(), in arch/arm64/kernel/cpufeature.c, which makes use
of a table of arm64_cpu_capabilities.
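
For reference, the table-driven form looks roughly like this (the
entries shown are an illustrative excerpt, not the full table):

	static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
		HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT,
			  FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_AES),
		HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_AES_SHIFT,
			  FTR_UNSIGNED, 2, CAP_HWCAP, HWCAP_PMULL),
		HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_CRC32_SHIFT,
			  FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_CRC32),
		{},
	};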

Suzuki


@@ -250,33 +250,22 @@ static void __init setup_processor(void)

/*
 * ID_AA64ISAR0_EL1 contains 4-bit wide signed feature blocks.
-* The blocks we test below represent incremental functionality
-* for non-negative values. Negative values are reserved.
 */
features = read_cpuid(ID_AA64ISAR0_EL1);
-   block = (features >> 4) & 0xf;
-   if (!(block & 0x8)) {
-   switch (block) {
-   default:
-   case 2:
-   elf_hwcap |= HWCAP_PMULL;
-   case 1:
-   elf_hwcap |= HWCAP_AES;
-   case 0:
-   break;
-   }
-   }

-   block = (features >> 8) & 0xf;
-   if (block && !(block & 0x8))
+   if (ID_AA64ISAR0_EL1_AES(features))
+   elf_hwcap |= HWCAP_AES;
+
+   if (ID_AA64ISAR0_EL1_PMULL(features))
+   elf_hwcap |= HWCAP_PMULL;
+
+   if (ID_AA64ISAR0_EL1_SHA1(features))
elf_hwcap |= HWCAP_SHA1;

-   block = (features >> 12) & 0xf;
-   if (block && !(block & 0x8))
+   if (ID_AA64ISAR0_EL1_SHA2(features))
elf_hwcap |= HWCAP_SHA2;

-   block = (features >> 16) & 0xf;
-   if (block && !(block & 0x8))
+   if (ID_AA64ISAR0_EL1_CRC32(features))
elf_hwcap |= HWCAP_CRC32;






[PATCH v3 2/2] arm64: Support systems without FP/ASIMD

2016-11-08 Thread Suzuki K Poulose
The arm64 kernel assumes that FP/ASIMD units are always present and
accesses the FP/ASIMD specific registers unconditionally. This could
cause problems when they are absent. This patch adds support for
handling systems without FP/ASIMD by skipping the register accesses
within the kernel. For KVM, we trap accesses to FP/ASIMD and inject
an undefined instruction exception into the VM.

The callers of the exported kernel_neon_begin_partial() should make
sure that FP/ASIMD is supported.
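
For example, a caller would be expected to do something along these
lines (a sketch, not from the patch):

	static int do_simd_work(void)
	{
		if (!cpu_has_neon())
			return -ENODEV;	/* fall back to a scalar implementation */

		kernel_neon_begin();
		/* ... FP/ASIMD work ... */
		kernel_neon_end();
		return 0;
	}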

Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Christoffer Dall <christoffer.d...@linaro.org>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpucaps.h|  3 ++-
 arch/arm64/include/asm/cpufeature.h |  5 +
 arch/arm64/include/asm/neon.h   |  3 ++-
 arch/arm64/kernel/cpufeature.c  | 15 +++
 arch/arm64/kernel/fpsimd.c  | 14 ++
 arch/arm64/kvm/handle_exit.c| 11 +++
 arch/arm64/kvm/hyp/hyp-entry.S  |  9 -
 arch/arm64/kvm/hyp/switch.c |  5 -
 8 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 87b4465..4174f09 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -34,7 +34,8 @@
 #define ARM64_HAS_32BIT_EL0			13
 #define ARM64_HYP_OFFSET_LOW			14
 #define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
+#define ARM64_HAS_NO_FPSIMD			16
 
-#define ARM64_NCAPS				16
+#define ARM64_NCAPS				17
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 9890d20..ce45770 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -213,6 +213,11 @@ static inline bool system_supports_mixed_endian_el0(void)
return 
id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1));
 }
 
+static inline bool system_supports_fpsimd(void)
+{
+   return !cpus_have_const_cap(ARM64_HAS_NO_FPSIMD);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index 13ce4cc..ad4cdc9 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -9,8 +9,9 @@
  */
 
 #include <linux/types.h>
+#include <asm/cpufeature.h>
 
-#define cpu_has_neon() (1)
+#define cpu_has_neon() system_supports_fpsimd()
 
 #define kernel_neon_begin()kernel_neon_begin_partial(32)
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index fc2bd19..f89385d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -746,6 +746,14 @@ static bool hyp_offset_low(const struct 
arm64_cpu_capabilities *entry,
return idmap_addr > GENMASK(VA_BITS - 2, 0) && !is_kernel_in_hyp_mode();
 }
 
+static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int 
__unused)
+{
+   u64 pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
+
+   return cpuid_feature_extract_signed_field(pfr0,
+   ID_AA64PFR0_FP_SHIFT) < 0;
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
{
.desc = "GIC system register CPU interface",
@@ -829,6 +837,13 @@ static const struct arm64_cpu_capabilities 
arm64_features[] = {
.def_scope = SCOPE_SYSTEM,
.matches = hyp_offset_low,
},
+   {
+   /* FP/SIMD is not implemented */
+   .capability = ARM64_HAS_NO_FPSIMD,
+   .def_scope = SCOPE_SYSTEM,
+   .min_field_value = 0,
+   .matches = has_no_fpsimd,
+   },
{},
 };
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61d..b883f1f 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -127,6 +127,8 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
+   if (!system_supports_fpsimd())
+   return;
/*
 * Save the current FPSIMD state to memory, but only if whatever is in
 * the registers is in fact the most recent userland FPSIMD state of
@@ -157,6 +159,8 @@ void fpsimd_thread_switch(struct task_struct *next)
 
 void fpsimd_flush_thread(void)
 {
+   if (!system_supports_fpsimd())
+   return;
memset(>thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
fpsimd_flush_task_state(current);
set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -168,6 +172,8 @@ void fpsimd_flush_thread(void)
  */
 void fpsimd_preserve_current_state(void)
 {
+   if (!system_sup

[PATCH v3 1/2] arm64: Add hypervisor safe helper for checking constant capabilities

2016-11-08 Thread Suzuki K Poulose
The hypervisor may not have full access to the kernel data structures
and hence cannot safely use cpus_have_cap() helper for checking the
system capability. Add a safe helper for hypervisors to check a constant
system capability, which *doesn't* fall back to checking the bitmap
maintained by the kernel. With this, make the cpus_have_cap() only
check the bitmask and force constant cap checks to use the new API
for quicker checks.
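
As a sketch of the mechanics (the array names are the real ones, the
flow is simplified): for cpus_have_const_cap() to return true, the
setup code must both set the bitmap bit and flip the matching static
key once a capability is detected:

	/* simplified: what "setting" a capability amounts to */
	__set_bit(num, cpu_hwcaps);			/* seen by cpus_have_cap()       */
	static_branch_enable(&cpu_hwcap_keys[num]);	/* seen by cpus_have_const_cap() */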

Cc: Robert Ritcher <rritc...@cavium.com>
Cc: Tirumalesh Chalamarla <tchalama...@cavium.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 19 ---
 arch/arm64/kernel/cpufeature.c  |  2 +-
 arch/arm64/kernel/process.c |  2 +-
 drivers/irqchip/irq-gic-v3.c| 13 +
 4 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 0bc0b1d..9890d20 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -9,8 +9,6 @@
 #ifndef __ASM_CPUFEATURE_H
 #define __ASM_CPUFEATURE_H
 
-#include <linux/jump_label.h>
-
 #include <asm/cpucaps.h>
 #include <asm/hwcap.h>
 #include <asm/sysreg.h>
@@ -27,6 +25,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bug.h>
+#include <linux/jump_label.h>
 #include <linux/kernel.h>
 
 /* CPU feature register tracking */
@@ -104,14 +104,19 @@ static inline bool cpu_have_feature(unsigned int num)
return elf_hwcap & (1UL << num);
 }
 
+/* System capability check for constant caps */
+static inline bool cpus_have_const_cap(int num)
+{
+   if (num >= ARM64_NCAPS)
+   return false;
+   return static_branch_unlikely(&cpu_hwcap_keys[num]);
+}
+
 static inline bool cpus_have_cap(unsigned int num)
 {
if (num >= ARM64_NCAPS)
return false;
-   if (__builtin_constant_p(num))
-   return static_branch_unlikely(&cpu_hwcap_keys[num]);
-   else
-   return test_bit(num, cpu_hwcaps);
+   return test_bit(num, cpu_hwcaps);
 }
 
 static inline void cpus_set_cap(unsigned int num)
@@ -200,7 +205,7 @@ static inline bool cpu_supports_mixed_endian_el0(void)
 
 static inline bool system_supports_32bit_el0(void)
 {
-   return cpus_have_cap(ARM64_HAS_32BIT_EL0);
+   return cpus_have_const_cap(ARM64_HAS_32BIT_EL0);
 }
 
 static inline bool system_supports_mixed_endian_el0(void)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c02504e..fc2bd19 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1102,5 +1102,5 @@ void __init setup_cpu_features(void)
 static bool __maybe_unused
 cpufeature_pan_not_uao(const struct arm64_cpu_capabilities *entry, int 
__unused)
 {
-   return (cpus_have_cap(ARM64_HAS_PAN) && !cpus_have_cap(ARM64_HAS_UAO));
+   return (cpus_have_const_cap(ARM64_HAS_PAN) && 
!cpus_have_const_cap(ARM64_HAS_UAO));
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 01753cd..18354f3 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -282,7 +282,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
stack_start,
memset(childregs, 0, sizeof(struct pt_regs));
childregs->pstate = PSR_MODE_EL1h;
if (IS_ENABLED(CONFIG_ARM64_UAO) &&
-   cpus_have_cap(ARM64_HAS_UAO))
+   cpus_have_const_cap(ARM64_HAS_UAO))
childregs->pstate |= PSR_UAO_BIT;
p->thread.cpu_context.x19 = stack_start;
p->thread.cpu_context.x20 = stk_sz;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 19d642e..26e1d7f 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -120,11 +120,10 @@ static void gic_redist_wait_for_rwp(void)
 }
 
 #ifdef CONFIG_ARM64
-static DEFINE_STATIC_KEY_FALSE(is_cavium_thunderx);
 
 static u64 __maybe_unused gic_read_iar(void)
 {
-   if (static_branch_unlikely(&is_cavium_thunderx))
+   if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
return gic_read_iar_cavium_thunderx();
else
return gic_read_iar_common();
@@ -905,14 +904,6 @@ static const struct irq_domain_ops partition_domain_ops = {
.select = gic_irq_domain_select,
 };
 
-static void gicv3_enable_quirks(void)
-{
-#ifdef CONFIG_ARM64
-   if (cpus_have_cap(ARM64_WORKAROUND_CAVIUM_23154))
-   static_branch_enable(&is_cavium_thunderx);
-#endif
-}
-
 static int __init gic_init_bases(void __iomem *dist_base,
 struct redist_region *rdist_regs,
 u32 nr_redist_regions,
@@ -935,8 +926,6 @@ static int __init gic_init_bases(void __iomem *dist_base,
gic_data.nr_redist_regions = nr_redis

[PATCH v3 0/2] arm64: Support systems without FP/ASIMD

2016-11-08 Thread Suzuki K Poulose
This series adds support to the kernel and the KVM hyp code for
handling systems without FP/ASIMD properly. At the moment the kernel
doesn't check whether the FP unit is available before accessing the
registers (e.g. during context switch). For KVM, we trap the FP/ASIMD
accesses and handle them by injecting an undefined instruction
exception into the VM on systems without FP.

Tested on a FVP_Base-AEM-v8A model by disabling VFP on at
least one CPU ( -C clusterX.cpuY.vfp-present=0 ).

Changes since V2:
 - Dropped cleanup patch for arm64/crypto/aes-ce-ccm-glue.c
 - Removed static_key check from cpus_have_cap. All users with
   constant caps should use the new API to make use of static_keys.
 - Removed a dedicated static_key used in irqchip-gic-v3.c for
   Cavium errata with the new API.

Applies on v4.9-rc4 + [1] (which is pushed for rc5)

[1] http://marc.info/?l=linux-arm-kernel&m=147819889813214&w=2


Suzuki K Poulose (2):
  arm64: Add hypervisor safe helper for checking constant capabilities
  arm64: Support systems without FP/ASIMD

 arch/arm64/include/asm/cpucaps.h|  3 ++-
 arch/arm64/include/asm/cpufeature.h | 24 +---
 arch/arm64/include/asm/neon.h   |  3 ++-
 arch/arm64/kernel/cpufeature.c  | 17 -
 arch/arm64/kernel/fpsimd.c  | 14 ++
 arch/arm64/kernel/process.c |  2 +-
 arch/arm64/kvm/handle_exit.c| 11 +++
 arch/arm64/kvm/hyp/hyp-entry.S  |  9 -
 arch/arm64/kvm/hyp/switch.c |  5 -
 drivers/irqchip/irq-gic-v3.c| 13 +
 10 files changed, 76 insertions(+), 25 deletions(-)

-- 
2.7.4



Re: [PATCH V4 01/10] acpi: apei: read ack upon ghes record consumption

2016-10-24 Thread Suzuki K Poulose

On 21/10/16 18:30, Tyler Baicar wrote:

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Signed-off-by: Jonathan (Zhixiong) Zhang 
Signed-off-by: Richard Ruigrok 
Signed-off-by: Tyler Baicar 
Signed-off-by: Naveen Kaje 
---
 drivers/acpi/apei/ghes.c | 42 ++
 drivers/acpi/apei/hest.c |  7 +--
 include/acpi/ghes.h  |  5 -
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 60746ef..7d020b0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,6 +45,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -79,6 +80,10 @@
((struct acpi_hest_generic_status *)\
 ((struct ghes_estatus_node *)(estatus_node) + 1))

+#define HEST_TYPE_GENERIC_V2(ghes) \
+   ((struct acpi_hest_header *)ghes->generic)->type ==   \
+ACPI_HEST_TYPE_GENERIC_ERROR_V2
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -248,7 +253,15 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)
ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
if (!ghes)
return ERR_PTR(-ENOMEM);
+
ghes->generic = generic;
+   if (HEST_TYPE_GENERIC_V2(ghes)) {
+   rc = apei_map_generic_address(
+   &ghes->generic_v2->read_ack_register);
+   if (rc)
+   goto err_unmap;


I think this should be goto err_free; see more below.


+   }
+
rc = apei_map_generic_address(&generic->error_status_address);
if (rc)
goto err_free;
@@ -270,6 +283,9 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)

 err_unmap:
apei_unmap_generic_address(&generic->error_status_address);
+   if (HEST_TYPE_GENERIC_V2(ghes))
+   apei_unmap_generic_address(
+   &ghes->generic_v2->read_ack_register);


We might end up trying to unmap error_status_address, which is not
mapped, if we hit an error while mapping read_ack_register. The
read_ack_register unmap hunk should be moved down to err_free.



 err_free:
kfree(ghes);
return ERR_PTR(rc);
@@ -279,6 +295,9 @@ static void ghes_fini(struct ghes *ghes)
 {
kfree(ghes->estatus);
apei_unmap_generic_address(&ghes->generic->error_status_address);
+   if (HEST_TYPE_GENERIC_V2(ghes))
+   apei_unmap_generic_address(
+   &ghes->generic_v2->read_ack_register);
 }

 static inline int ghes_severity(int severity)
@@ -648,6 +667,23 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }




+static int ghes_do_read_ack(struct acpi_hest_generic_v2 *generic_v2)


nit: We are actually writing something to the read_ack_register. The
name read_ack_register (which may be as per the standard) and, more
importantly, the function name ghes_do_read_ack sound a bit misleading.
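
For what it's worth, the operation amounts to a masked write-back of
the register, roughly (a sketch following the GHESv2 definition; the
helper name is illustrative):

	static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
	{
		int rc;
		u64 val;

		rc = apei_read(&val, &gv2->read_ack_register);
		if (rc)
			return rc;

		/* keep the preserve bits, set the write bits, write back */
		val &= gv2->read_ack_preserve;
		val |= gv2->read_ack_write;

		return apei_write(val, &gv2->read_ack_register);
	}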

Rest looks fine to me.

Suzuki



Re: [PATCH] coresight: reset "enable_sink" flag when need be

2016-10-18 Thread Suzuki K Poulose

On 07/10/16 22:10, Mathieu Poirier wrote:

When using coresight from the perf interface, sinks are specified as
part of the perf command line. As such, the sink needs to be disabled
once it has been acknowledged by the coresight framework. Otherwise
the sink stays enabled, which may interfere with other sessions.


I personally think the descriptions needs to be a bit more clearer
from here on.


This patch removes the sink selection check from the build path
process and makes it a function of its own. The function is then used
when operating from sysFS or perf to determine which sink has been
selected.


I think you should mention that the helper function provides an option
to "de-activate" the enabled sink once it has been "found" by a lookup.
Perf uses this option to de-activate the sink, while sysFS leaves it
to the user to do the same.

We don't have a mechanism to ensure that the "enabled" sink is the one
perf really enabled for us. But there is not much we can do there, and
we should rely on the user to get it right.
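
Something along these lines is what I had in mind (a sketch only;
names and placement are suggestions):

	static int coresight_enabled_sink(struct device *dev, void *data)
	{
		bool *reset = data;
		struct coresight_device *csdev = to_coresight_device(dev);

		if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
		     csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) &&
		     csdev->activated) {
			/* perf asks for the flag to be cleared (one-shot use) */
			if (*reset)
				csdev->activated = false;
			return 1;
		}

		return 0;
	}

	static struct coresight_device *coresight_get_enabled_sink(bool deactivate)
	{
		struct device *dev;

		dev = bus_find_device(&coresight_bustype, NULL, &deactivate,
				      coresight_enabled_sink);

		return dev ? to_coresight_device(dev) : NULL;
	}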

So, with the changes to description :

Reviewed-by: Suzuki K Poulose <suzuki.poul...@arm.com>


