Re: [PATCH RESEND 0/4] Add support for HiSilicon PCIe Tune and Trace device

2021-04-17 Thread Alexander Shishkin
Yicong Yang  writes:

> The reason for not using perf is because there is no current support
> for uncore tracing in the perf facilities.

Not unless you count

$ perf list|grep -ic uncore
77

> We have our own format
> of data and don't need perf doing the parsing.

Perf has AUX buffers, which are used for all kinds of own formats.

> A similar approach for implementing this function is ETM, which uses
> sysfs for configuring and a character device for dumping data.

And also perf. One reason ETM has a sysfs interface is because the
driver predates perf's AUX buffers. Can't say if it's the only
reason. I'm assuming you're talking about Coresight ETM.

> Greg has some comments on our implementation and doesn't advocate
> building the driver on debugfs [1]. So I resend this series to
> collect more feedback on the implementation of this driver.
>
> Hi perf and ETM related experts, is it suggested to adapt this driver
> to perf? Or is the debugfs approach acceptable? Otherwise use

Aside from the above, I don't think the use of debugfs for kernel ABIs
is ever encouraged.

Regards,
--
Alex


[tip: perf/core] perf: Cap allocation order at aux_watermark

2021-04-16 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d68e6799a5c87f415d3bfa0dea49caee28ab00d1
Gitweb:
https://git.kernel.org/tip/d68e6799a5c87f415d3bfa0dea49caee28ab00d1
Author: Alexander Shishkin 
AuthorDate: Wed, 14 Apr 2021 18:49:54 +03:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 16 Apr 2021 16:32:39 +02:00

perf: Cap allocation order at aux_watermark

Currently, we start allocating AUX pages at half the size of the total
requested AUX buffer, ignoring the attr.aux_watermark setting. This,
in turn, makes the intel_pt driver disregard the watermark as well, as it
uses the page order for its SG (ToPA) configuration.

Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
only one currently making use of high order allocations, there is no
reason not to fix the allocator instead. This way, any other driver
wishing to add this support would not have to worry about this.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20210414154955.49603-2-alexander.shish...@linux.intel.com
---
 kernel/events/ring_buffer.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index bd55ccc..5286871 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -674,21 +674,26 @@ int rb_alloc_aux(struct perf_buffer *rb, struct 
perf_event *event,
if (!has_aux(event))
return -EOPNOTSUPP;
 
-   /*
-* We need to start with the max_order that fits in nr_pages,
-* not the other way around, hence ilog2() and not get_order.
-*/
-   max_order = ilog2(nr_pages);
-
-   /*
-* PMU requests more than one contiguous chunks of memory
-* for SW double buffering
-*/
if (!overwrite) {
-   if (!max_order)
-   return -EINVAL;
+   /*
+* Watermark defaults to half the buffer, and so does the
+* max_order, to aid PMU drivers in double buffering.
+*/
+   if (!watermark)
+   watermark = nr_pages << (PAGE_SHIFT - 1);
 
-   max_order--;
+   /*
+* Use aux_watermark as the basis for chunking to
+* help PMU drivers honor the watermark.
+*/
+   max_order = get_order(watermark);
+   } else {
+   /*
+* We need to start with the max_order that fits in nr_pages,
+* not the other way around, hence ilog2() and not get_order.
+*/
+   max_order = ilog2(nr_pages);
+   watermark = 0;
}
 
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
@@ -743,9 +748,6 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event 
*event,
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
 
-   if (!rb->aux_watermark && !rb->aux_overwrite)
-   rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
-
 out:
if (!ret)
rb->aux_pgoff = pgoff;


[tip: perf/core] perf intel-pt: Use aux_watermark

2021-04-16 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 874fc35cdd55e2d46161901de43ec58ca2efc5fe
Gitweb:
https://git.kernel.org/tip/874fc35cdd55e2d46161901de43ec58ca2efc5fe
Author: Alexander Shishkin 
AuthorDate: Wed, 14 Apr 2021 18:49:55 +03:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 16 Apr 2021 16:32:39 +02:00

perf intel-pt: Use aux_watermark

Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace hasn't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with the TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shish...@linux.intel.com
---
 tools/perf/arch/x86/util/intel-pt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c 
b/tools/perf/arch/x86/util/intel-pt.c
index a6420c6..6df0dc0 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -776,6 +776,12 @@ static int intel_pt_recording_options(struct 
auxtrace_record *itr,
}
}
 
+   if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
+   u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;
+
+   intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
+   }
+
intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
 "tsc", &tsc_bit);
 


Re: [PATCH v1 2/2] dt-bindings: arm: add property for coresight component name

2021-04-16 Thread Alexander Shishkin
Tao Zhang  writes:

> Add property "coresight-name" for coresight component name. This
> allows coresight driver to read device name from device entries.
>
> Signed-off-by: Tao Zhang 
> ---
>  Documentation/devicetree/bindings/arm/coresight.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt 
> b/Documentation/devicetree/bindings/arm/coresight.txt
> index d711676..0e980ce 100644
> --- a/Documentation/devicetree/bindings/arm/coresight.txt
> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
> @@ -103,6 +103,8 @@ its hardware characteristcs.
> powers down the coresight component also powers down and loses its
> context. This property is currently only used for the ETM 4.x driver.
>  
> + * coresight-name: the name of the coresight devices.

Which devices? Also, is it a common practice to extend device tree
definitions based on arbitrary driver needs, or should there be some
sort of a discussion first?

Regards,
--
Alex


Re: [PATCH v1 1/2] coresight: Add support for device names

2021-04-16 Thread Alexander Shishkin
Tao Zhang  writes:

> diff --git a/drivers/hwtracing/coresight/coresight-core.c 
> b/drivers/hwtracing/coresight/coresight-core.c
> index 4ba801d..b79c726 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -1640,6 +1640,12 @@ char *coresight_alloc_device_name(struct 
> coresight_dev_list *dict,
>   int idx;
>   char *name = NULL;
>   struct fwnode_handle **list;
> + struct device_node *node = dev->of_node;
> +
> + if (!node) {
> + if (!of_property_read_string(node, "coresight-name", &name))

Ok, I'm not a device tree expert, but I'm pretty sure the above is a
nop.

Regards,
--
Alex


Re: [PATCH 2/7] stm class: Replace uuid_t with plain u8 uuid[16]

2021-04-15 Thread Alexander Shishkin
Greg Kroah-Hartman  writes:

> On Wed, Apr 14, 2021 at 10:14:34PM +0300, Alexander Shishkin wrote:
>> Greg Kroah-Hartman  writes:
>> 
>> >> Using raw buffer APIs against uuid_t / guid_t.
>> >
>> > So you want to do that, or you do not want to do that?  Totally
>> > confused,
>> 
>> My understanding is that:
>> 1) generate_random_uuid() use is allegedly bad even though it's in their
>> header,
>> 2) poking directly at the byte array inside uuid_t is bad, even though,
>> again, header.
>> 
>> It is, indeed, not ideal.
>> 
>> If agreeable, I'll update this patch to the below and respin the whole
>> series.
>
> You are showing that Andy wrote this, when you are the one that did :(

That's intentional; it's Andy's patch. In fact, it was probably me who
insisted on the open-coded-byte-array version, in an offline
conversation some time ago. I'd like to keep his name on it if that's
ok. I've re-sent it [1] as a standalone patch.

> Anyway, I've dropped this single patch from the series and applied the
> rest.  Feel free to send this patch as a stand-alone one once you have
> the authorship issues sorted out.

Thank you!

[1] 
https://lore.kernel.org/lkml/20210415091555.88085-1-alexander.shish...@linux.intel.com/

Regards,
--
Alex


[PATCH] stm class: Use correct UUID APIs

2021-04-15 Thread Alexander Shishkin
From: Andy Shevchenko 

It appears that the STM code didn't manage to accurately decipher the
delicate inner workings of an alternative thought process behind the
UUID API and directly called generate_random_uuid() that clearly needs
to be a static function in lib/uuid.c.

At the same time, said STM code is poking directly at the byte array
inside the uuid_t when it uses the UUID for its internal purposes.

Fix these two transgressions by using intended APIs instead.

Signed-off-by: Andy Shevchenko 
[ash: changed back to uuid_t and updated the commit message]
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/p_sys-t.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/stm/p_sys-t.c b/drivers/hwtracing/stm/p_sys-t.c
index 360b5c03df95..8254971c02e7 100644
--- a/drivers/hwtracing/stm/p_sys-t.c
+++ b/drivers/hwtracing/stm/p_sys-t.c
@@ -92,7 +92,7 @@ static void sys_t_policy_node_init(void *priv)
 {
struct sys_t_policy_node *pn = priv;
 
-   generate_random_uuid(pn->uuid.b);
+   uuid_gen(&pn->uuid);
 }
 
 static int sys_t_output_open(void *priv, struct stm_output *output)
@@ -292,6 +292,7 @@ static ssize_t sys_t_write(struct stm_data *data, struct 
stm_output *output,
unsigned int m = output->master;
const unsigned char nil = 0;
u32 header = DATA_HEADER;
+   u8 uuid[UUID_SIZE];
ssize_t sz;
 
/* We require an existing policy node to proceed */
@@ -322,7 +323,8 @@ static ssize_t sys_t_write(struct stm_data *data, struct 
stm_output *output,
return sz;
 
/* GUID */
-   sz = stm_data_write(data, m, c, false, op->node.uuid.b, UUID_SIZE);
+   export_uuid(uuid, &op->node.uuid);
+   sz = stm_data_write(data, m, c, false, uuid, sizeof(op->node.uuid));
if (sz <= 0)
return sz;
 
-- 
2.30.2



Re: [PATCH 2/7] stm class: Replace uuid_t with plain u8 uuid[16]

2021-04-14 Thread Alexander Shishkin
Greg Kroah-Hartman  writes:

>> Using raw buffer APIs against uuid_t / guid_t.
>
> So you want to do that, or you do not want to do that?  Totally
> confused,

My understanding is that:
1) generate_random_uuid() use is allegedly bad even though it's in their
header,
2) poking directly at the byte array inside uuid_t is bad, even though,
again, header.

It is, indeed, not ideal.

If agreeable, I'll update this patch to the below and respin the whole
series.

>From 02340f8c7c17ace028040a35553c33cce8f3bce4 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko 
Date: Wed, 22 Apr 2020 16:02:20 +0300
Subject: [PATCH] stm class: Use correct UUID APIs

It appears that the STM code didn't manage to accurately decipher the
delicate inner workings of an alternative thought process behind the
UUID API and directly called generate_random_uuid() that clearly needs
to be a static function in lib/uuid.c.

At the same time, said STM code is poking directly at the byte array
inside the uuid_t when it uses the UUID for its internal purposes.

Fix these two transgressions by using intended APIs instead.

Signed-off-by: Andy Shevchenko 
[ash: changed back to uuid_t and updated the commit message]
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/p_sys-t.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/stm/p_sys-t.c b/drivers/hwtracing/stm/p_sys-t.c
index 360b5c03df95..8254971c02e7 100644
--- a/drivers/hwtracing/stm/p_sys-t.c
+++ b/drivers/hwtracing/stm/p_sys-t.c
@@ -92,7 +92,7 @@ static void sys_t_policy_node_init(void *priv)
 {
struct sys_t_policy_node *pn = priv;
 
-   generate_random_uuid(pn->uuid.b);
+   uuid_gen(&pn->uuid);
 }
 
 static int sys_t_output_open(void *priv, struct stm_output *output)
@@ -292,6 +292,7 @@ static ssize_t sys_t_write(struct stm_data *data, struct 
stm_output *output,
unsigned int m = output->master;
const unsigned char nil = 0;
u32 header = DATA_HEADER;
+   u8 uuid[UUID_SIZE];
ssize_t sz;
 
/* We require an existing policy node to proceed */
@@ -322,7 +323,8 @@ static ssize_t sys_t_write(struct stm_data *data, struct 
stm_output *output,
return sz;
 
/* GUID */
-   sz = stm_data_write(data, m, c, false, op->node.uuid.b, UUID_SIZE);
+   export_uuid(uuid, &op->node.uuid);
+   sz = stm_data_write(data, m, c, false, uuid, sizeof(op->node.uuid));
if (sz <= 0)
return sz;
 
-- 
2.30.2



[PATCH 7/7] intel_th: pci: Add Alder Lake-M support

2021-04-14 Thread Alexander Shishkin
This adds support for the Trace Hub in Alder Lake-M PCH.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index a756c995fc7a..7da4f298ed01 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -273,6 +273,11 @@ static const struct pci_device_id intel_th_pci_id_table[] 
= {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x51a6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Alder Lake-M */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x54a6),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{
/* Alder Lake CPU */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f),
-- 
2.30.2



[PATCH 3/7] intel_th: Constify all drvdata references

2021-04-14 Thread Alexander Shishkin
Anything that deals with drvdata structures should leave them intact.
Reflect this in function signatures.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
---
 drivers/hwtracing/intel_th/core.c | 2 +-
 drivers/hwtracing/intel_th/intel_th.h | 6 +++---
 drivers/hwtracing/intel_th/pci.c  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/hwtracing/intel_th/core.c 
b/drivers/hwtracing/intel_th/core.c
index c9ac3dc65113..24d0c974bfd5 100644
--- a/drivers/hwtracing/intel_th/core.c
+++ b/drivers/hwtracing/intel_th/core.c
@@ -844,7 +844,7 @@ static irqreturn_t intel_th_irq(int irq, void *data)
  * @irq:   irq number
  */
 struct intel_th *
-intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
+intel_th_alloc(struct device *dev, const struct intel_th_drvdata *drvdata,
   struct resource *devres, unsigned int ndevres)
 {
int err, r, nr_mmios = 0;
diff --git a/drivers/hwtracing/intel_th/intel_th.h 
b/drivers/hwtracing/intel_th/intel_th.h
index 5fe694708b7a..05fa2dab37d1 100644
--- a/drivers/hwtracing/intel_th/intel_th.h
+++ b/drivers/hwtracing/intel_th/intel_th.h
@@ -74,7 +74,7 @@ struct intel_th_drvdata {
  */
 struct intel_th_device {
struct device   dev;
-   struct intel_th_drvdata *drvdata;
+   const struct intel_th_drvdata *drvdata;
struct resource *resource;
unsigned intnum_resources;
unsigned inttype;
@@ -224,7 +224,7 @@ static inline struct intel_th *to_intel_th(struct 
intel_th_device *thdev)
 }
 
 struct intel_th *
-intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
+intel_th_alloc(struct device *dev, const struct intel_th_drvdata *drvdata,
   struct resource *devres, unsigned int ndevres);
 void intel_th_free(struct intel_th *th);
 
@@ -272,7 +272,7 @@ struct intel_th {
 
struct intel_th_device  *thdev[TH_SUBDEVICE_MAX];
struct intel_th_device  *hub;
-   struct intel_th_drvdata *drvdata;
+   const struct intel_th_drvdata   *drvdata;
 
struct resource resource[TH_MMIO_END];
int (*activate)(struct intel_th *);
diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 251e75c9ba9d..759994055cb4 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -71,7 +71,7 @@ static void intel_th_pci_deactivate(struct intel_th *th)
 static int intel_th_pci_probe(struct pci_dev *pdev,
  const struct pci_device_id *id)
 {
-   struct intel_th_drvdata *drvdata = (void *)id->driver_data;
+   const struct intel_th_drvdata *drvdata = (void *)id->driver_data;
struct resource resource[TH_MMIO_END + TH_NVEC_MAX] = {
[TH_MMIO_CONFIG]= pdev->resource[TH_PCI_CONFIG_BAR],
[TH_MMIO_SW]= pdev->resource[TH_PCI_STH_SW_BAR],
-- 
2.30.2



[PATCH 6/7] intel_th: pci: Add Rocket Lake CPU support

2021-04-14 Thread Alexander Shishkin
This adds support for the Trace Hub in Rocket Lake CPUs.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
Cc: stable  # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 759994055cb4..a756c995fc7a 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -278,6 +278,11 @@ static const struct pci_device_id intel_th_pci_id_table[] 
= {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Rocket Lake CPU */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c19),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.30.2



[PATCH 5/7] intel_th: Consistency and off-by-one fix

2021-04-14 Thread Alexander Shishkin
From: Pavel Machek 

Consistently use "< ... +1" in for loops.

Fix off-by-one in for_each_set_bit().

Signed-off-by: Pavel Machek 
Signed-off-by: Alexander Shishkin 
Link: https://lore.kernel.org/lkml/20190724095841.GA6952@amd/
Reviewed-by: Andy Shevchenko 
---
 drivers/hwtracing/intel_th/gth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/intel_th/gth.c b/drivers/hwtracing/intel_th/gth.c
index f72803a02391..28509b02a0b5 100644
--- a/drivers/hwtracing/intel_th/gth.c
+++ b/drivers/hwtracing/intel_th/gth.c
@@ -543,7 +543,7 @@ static void intel_th_gth_disable(struct intel_th_device 
*thdev,
output->active = false;
 
for_each_set_bit(master, gth->output[output->port].master,
-TH_CONFIGURABLE_MASTERS) {
+TH_CONFIGURABLE_MASTERS + 1) {
gth_master_set(gth, master, -1);
}
spin_unlock(&gth->gth_lock);
@@ -697,7 +697,7 @@ static void intel_th_gth_unassign(struct intel_th_device 
*thdev,
othdev->output.port = -1;
othdev->output.active = false;
gth->output[port].output = NULL;
-   for (master = 0; master <= TH_CONFIGURABLE_MASTERS; master++)
+   for (master = 0; master < TH_CONFIGURABLE_MASTERS + 1; master++)
if (gth->master[master] == port)
gth->master[master] = -1;
spin_unlock(&gth->gth_lock);
-- 
2.30.2



[PATCH 4/7] intel_th: Constify attribute_group structs

2021-04-14 Thread Alexander Shishkin
From: Rikard Falkeborn 

The only usage of them is to pass their address to sysfs_create_group()
and sysfs_remove_group(), both of which take pointers to const
attribute_group structs as input. Make them const to allow the compiler
to put them in read-only memory.

Signed-off-by: Rikard Falkeborn 
Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
---
 drivers/hwtracing/intel_th/intel_th.h | 2 +-
 drivers/hwtracing/intel_th/msu.c  | 2 +-
 drivers/hwtracing/intel_th/pti.c  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hwtracing/intel_th/intel_th.h 
b/drivers/hwtracing/intel_th/intel_th.h
index 05fa2dab37d1..89c67e0e1d34 100644
--- a/drivers/hwtracing/intel_th/intel_th.h
+++ b/drivers/hwtracing/intel_th/intel_th.h
@@ -178,7 +178,7 @@ struct intel_th_driver {
/* file_operations for those who want a device node */
const struct file_operations *fops;
/* optional attributes */
-   struct attribute_group  *attr_group;
+   const struct attribute_group *attr_group;
 
/* source ops */
int (*set_output)(struct intel_th_device *thdev,
diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index 7d95242db900..2edc433d 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -2095,7 +2095,7 @@ static struct attribute *msc_output_attrs[] = {
NULL,
 };
 
-static struct attribute_group msc_output_group = {
+static const struct attribute_group msc_output_group = {
.attrs  = msc_output_attrs,
 };
 
diff --git a/drivers/hwtracing/intel_th/pti.c b/drivers/hwtracing/intel_th/pti.c
index 0da6b787f553..09132ab8bc23 100644
--- a/drivers/hwtracing/intel_th/pti.c
+++ b/drivers/hwtracing/intel_th/pti.c
@@ -142,7 +142,7 @@ static struct attribute *pti_output_attrs[] = {
NULL,
 };
 
-static struct attribute_group pti_output_group = {
+static const struct attribute_group pti_output_group = {
.attrs  = pti_output_attrs,
 };
 
@@ -295,7 +295,7 @@ static struct attribute *lpp_output_attrs[] = {
NULL,
 };
 
-static struct attribute_group lpp_output_group = {
+static const struct attribute_group lpp_output_group = {
.attrs  = lpp_output_attrs,
 };
 
-- 
2.30.2



[PATCH 2/7] stm class: Replace uuid_t with plain u8 uuid[16]

2021-04-14 Thread Alexander Shishkin
From: Andy Shevchenko 

It appears that the uuid_t use in the STM code abuses the UUID API.
Moreover, this type is only useful when parsing user input. Therefore,
replace uuid_t with u8 uuid[16] and use uuid_t only when parsing
user input.

Signed-off-by: Andy Shevchenko 
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/p_sys-t.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/stm/p_sys-t.c b/drivers/hwtracing/stm/p_sys-t.c
index 360b5c03df95..04d13b3785d3 100644
--- a/drivers/hwtracing/stm/p_sys-t.c
+++ b/drivers/hwtracing/stm/p_sys-t.c
@@ -76,7 +76,7 @@ enum sys_t_message_string_subtype {
 MIPI_SYST_SEVERITY(MAX))
 
 struct sys_t_policy_node {
-   uuid_t  uuid;
+   u8  uuid[UUID_SIZE];
booldo_len;
unsigned long   ts_interval;
unsigned long   clocksync_interval;
@@ -92,7 +92,7 @@ static void sys_t_policy_node_init(void *priv)
 {
struct sys_t_policy_node *pn = priv;
 
-   generate_random_uuid(pn->uuid.b);
+   generate_random_uuid(pn->uuid);
 }
 
 static int sys_t_output_open(void *priv, struct stm_output *output)
@@ -120,7 +120,7 @@ static ssize_t sys_t_policy_uuid_show(struct config_item 
*item,
 {
struct sys_t_policy_node *pn = to_pdrv_policy_node(item);
 
-   return sprintf(page, "%pU\n", &pn->uuid);
+   return sprintf(page, "%pU\n", pn->uuid);
 }
 
 static ssize_t
@@ -129,13 +129,17 @@ sys_t_policy_uuid_store(struct config_item *item, const 
char *page,
 {
struct mutex *mutexp = &item->ci_group->cg_subsys->su_mutex;
struct sys_t_policy_node *pn = to_pdrv_policy_node(item);
+   uuid_t uuid;
int ret;
 
mutex_lock(mutexp);
-   ret = uuid_parse(page, &pn->uuid);
+   ret = uuid_parse(page, &uuid);
mutex_unlock(mutexp);
+   if (ret)
+   return ret;
 
-   return ret < 0 ? ret : count;
+   export_uuid(pn->uuid, &uuid);
+   return count;
 }
 
 CONFIGFS_ATTR(sys_t_policy_, uuid);
@@ -322,7 +326,7 @@ static ssize_t sys_t_write(struct stm_data *data, struct 
stm_output *output,
return sz;
 
/* GUID */
-   sz = stm_data_write(data, m, c, false, op->node.uuid.b, UUID_SIZE);
+   sz = stm_data_write(data, m, c, false, op->node.uuid, 
sizeof(op->node.uuid));
if (sz <= 0)
return sz;
 
-- 
2.30.2



[PATCH 1/7] stm class: Remove an unused function

2021-04-14 Thread Alexander Shishkin
From: Jiapeng Chong 

Fix the following clang warning:

drivers/hwtracing/stm/policy.c:60:21: warning: unused function
'stp_policy_node_name' [-Wunused-function].

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
---
 drivers/hwtracing/stm/policy.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/hwtracing/stm/policy.c b/drivers/hwtracing/stm/policy.c
index 603b4a9969d3..42103c3a177f 100644
--- a/drivers/hwtracing/stm/policy.c
+++ b/drivers/hwtracing/stm/policy.c
@@ -57,11 +57,6 @@ void stp_policy_node_get_ranges(struct stp_policy_node 
*policy_node,
*cend   = policy_node->last_channel;
 }
 
-static inline char *stp_policy_node_name(struct stp_policy_node *policy_node)
-{
-   return policy_node->group.cg_item.ci_name ? : "<none>";
-}
-
 static inline struct stp_policy *to_stp_policy(struct config_item *item)
 {
return item ?
-- 
2.30.2



[PATCH 0/7] stm class/intel_th: Updates for v5.13

2021-04-14 Thread Alexander Shishkin
Hi Greg,

Here are the stm class and intel_th updates that I have for v5.13. These
are all trivial, including 2 new PCI IDs. Andy provided his reviewed-bys.
Please consider applying. Thank you!

Alexander Shishkin (3):
  intel_th: Constify all drvdata references
  intel_th: pci: Add Rocket Lake CPU support
  intel_th: pci: Add Alder Lake-M support

Andy Shevchenko (1):
  stm class: Replace uuid_t with plain u8 uuid[16]

Jiapeng Chong (1):
  stm class: Remove an unused function

Pavel Machek (1):
  intel_th: Consistency and off-by-one fix

Rikard Falkeborn (1):
  intel_th: Constify attribute_group structs

 drivers/hwtracing/intel_th/core.c |  2 +-
 drivers/hwtracing/intel_th/gth.c  |  4 ++--
 drivers/hwtracing/intel_th/intel_th.h |  8 
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/hwtracing/intel_th/pci.c  | 12 +++-
 drivers/hwtracing/intel_th/pti.c  |  4 ++--
 drivers/hwtracing/stm/p_sys-t.c   | 16 ++--
 drivers/hwtracing/stm/policy.c|  5 -
 8 files changed, 31 insertions(+), 22 deletions(-)

-- 
2.30.2



[PATCH 0/2] perf, pt: Improve data loss

2021-04-14 Thread Alexander Shishkin
Hi guys,

v1: fixed whitespace damage in 1/2 and applied Adrian's comment to 2/2.

There is a problem between the PT driver and the AUX allocator that results
in smaller buffers consisting of 2 high-order regions, which also means
only 2 possibilities of where PMI gets generated and where tracing stops.

This is not good enough for double buffering: when we get a PMI mid-buffer,
we update the ->aux_head etc and immediately start a new transaction while
observing ->aux_tail to still be zero, which makes the PT driver put a stop
bit at the end of the buffer. However quick userspace is to update the
->aux_tail, that second transaction/PERF_RECORD_AUX ends up truncated.

The proposed solution here is to set up attr.aux_watermark to a quarter of
the buffer. Unfortunately, at the moment, the PT driver is not equipped to
deal with aux_watermark that's smaller than the AUX allocation order. I
could fix this in the driver itself, but, seeing as it's the only PMU that
actually uses the 'order' from AUX allocations, I'd rather fix the
allocator instead, which is done in patch 1/2.

Patch 2/2 could be replaced by instead changing the in-kernel aux_watermark
default, but that may interfere with PMU drivers that don't ignore said
watermark / handle->wakeup (afaict, that's only arm_spe).

Alexander Shishkin (2):
  perf: Cap allocation order at aux_watermark
  perf intel-pt: Use aux_watermark

 kernel/events/ring_buffer.c | 34 +++--
 tools/perf/arch/x86/util/intel-pt.c |  6 +
 2 files changed, 24 insertions(+), 16 deletions(-)

-- 
2.30.2



[PATCH v1 2/2] perf intel-pt: Use aux_watermark

2021-04-14 Thread Alexander Shishkin
Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace hasn't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with the TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin 
---
 tools/perf/arch/x86/util/intel-pt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c 
b/tools/perf/arch/x86/util/intel-pt.c
index a6420c647959..6df0dc00d73a 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -776,6 +776,12 @@ static int intel_pt_recording_options(struct 
auxtrace_record *itr,
}
}
 
+   if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
+   u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;
+
+   intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
+   }
+
intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
 "tsc", &tsc_bit);
 
-- 
2.30.2



[PATCH v1 1/2] perf: Cap allocation order at aux_watermark

2021-04-14 Thread Alexander Shishkin
Currently, we start allocating AUX pages at half the size of the total
requested AUX buffer, ignoring the attr.aux_watermark setting. This,
in turn, makes the intel_pt driver disregard the watermark as well, as it
uses the page order for its SG (ToPA) configuration.

Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
only one currently making use of high order allocations, there is no
reason not to fix the allocator instead. This way, any other driver
wishing to add this support would not have to worry about this.

Signed-off-by: Alexander Shishkin 
---
 kernel/events/ring_buffer.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index bd55ccc91373..52868716ec35 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -674,21 +674,26 @@ int rb_alloc_aux(struct perf_buffer *rb, struct 
perf_event *event,
if (!has_aux(event))
return -EOPNOTSUPP;
 
-   /*
-* We need to start with the max_order that fits in nr_pages,
-* not the other way around, hence ilog2() and not get_order.
-*/
-   max_order = ilog2(nr_pages);
-
-   /*
-* PMU requests more than one contiguous chunks of memory
-* for SW double buffering
-*/
if (!overwrite) {
-   if (!max_order)
-   return -EINVAL;
+   /*
+* Watermark defaults to half the buffer, and so does the
+* max_order, to aid PMU drivers in double buffering.
+*/
+   if (!watermark)
+   watermark = nr_pages << (PAGE_SHIFT - 1);
 
-   max_order--;
+   /*
+* Use aux_watermark as the basis for chunking to
+* help PMU drivers honor the watermark.
+*/
+   max_order = get_order(watermark);
+   } else {
+   /*
+* We need to start with the max_order that fits in nr_pages,
+* not the other way around, hence ilog2() and not get_order.
+*/
+   max_order = ilog2(nr_pages);
+   watermark = 0;
}
 
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
@@ -743,9 +748,6 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
 
-   if (!rb->aux_watermark && !rb->aux_overwrite)
-   rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
-
 out:
if (!ret)
rb->aux_pgoff = pgoff;
-- 
2.30.2



Re: [PATCH 2/2] perf intel-pt: Use aux_watermark

2021-04-14 Thread Alexander Shishkin
Adrian Hunter  writes:

> On 8/04/21 6:31 pm, Alexander Shishkin wrote:
>> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
>> index a6420c647959..d00707faf547 100644
>> --- a/tools/perf/arch/x86/util/intel-pt.c
>> +++ b/tools/perf/arch/x86/util/intel-pt.c
>> @@ -776,6 +776,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
>>  }
>>  }
>>  
>> +if (opts->full_auxtrace)
>> +intel_pt_evsel->core.attr.aux_watermark =
>> +   opts->auxtrace_mmap_pages / 4 * page_size;
>> +
>
> I would be explicit about the mode and put "/ 4" at the end
> for the case auxtrace_mmap_pages is not a multiple of 4 (e.g. 2).
> i.e.
>
>   if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
>   u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;
>
>   intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
>   }

Thank you! I'll do exactly that.

Regards,
--
Alex


[PATCH 2/2] perf intel-pt: Use aux_watermark

2021-04-08 Thread Alexander Shishkin
Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace hasn't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with the TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space
for the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin 
---
 tools/perf/arch/x86/util/intel-pt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index a6420c647959..d00707faf547 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -776,6 +776,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
}
}
 
+   if (opts->full_auxtrace)
+   intel_pt_evsel->core.attr.aux_watermark =
+  opts->auxtrace_mmap_pages / 4 * page_size;
+
intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
 "tsc", &tsc_bit);
 
-- 
2.30.2



[PATCH 1/2] perf: Cap allocation order at aux_watermark

2021-04-08 Thread Alexander Shishkin
Currently, we start allocating AUX pages half the size of the total
requested AUX buffer size, ignoring the attr.aux_watermark setting. This,
in turn, makes intel_pt driver disregard the watermark also, as it uses
page order for its SG (ToPA) configuration.

Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
only one currently making use of high order allocations, there is no
reason not to fix the allocator instead. This way, any other driver
wishing to add this support would not have to worry about this.

Signed-off-by: Alexander Shishkin 
---
 kernel/events/ring_buffer.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index bd55ccc91373..bd94b91bd4be 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -674,21 +674,26 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
if (!has_aux(event))
return -EOPNOTSUPP;
 
-   /*
-* We need to start with the max_order that fits in nr_pages,
-* not the other way around, hence ilog2() and not get_order.
-*/
-   max_order = ilog2(nr_pages);
-
-   /*
-* PMU requests more than one contiguous chunks of memory
-* for SW double buffering
-*/
if (!overwrite) {
-   if (!max_order)
-   return -EINVAL;
+   /*
+* Watermark defaults to half the buffer, and so does the
+* max_order, to aid PMU drivers in double buffering.
+*/
+   if (!watermark)
+   watermark = nr_pages << (PAGE_SHIFT - 1);
 
-   max_order--;
+   /*
+* Use aux_watermark as the basis for chunking to
+* help PMU drivers honor the watermark.
+*/
+   max_order = get_order(watermark);
+   } else {
+   /*
+   * We need to start with the max_order that fits in nr_pages,
+   * not the other way around, hence ilog2() and not get_order.
+   */
+   max_order = ilog2(nr_pages);
+   watermark = 0;
}
 
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
@@ -743,9 +748,6 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
 
-   if (!rb->aux_watermark && !rb->aux_overwrite)
-   rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
-
 out:
if (!ret)
rb->aux_pgoff = pgoff;
-- 
2.30.2



[PATCH 0/2] perf, pt: Improve data loss

2021-04-08 Thread Alexander Shishkin
Hi guys,

There is a problem between the PT driver and the AUX allocator that results
in smaller buffers consisting of 2 high-order regions, which also means
only 2 possibilities of where PMI gets generated and where tracing stops.

This is not good enough for double buffering: when we get a PMI mid-buffer,
we update ->aux_head etc. and immediately start a new transaction while
observing that ->aux_tail is still zero, which makes the PT driver put a stop
bit at the end of the buffer. However quick userspace is to update the
->aux_tail, that second transaction/PERF_RECORD_AUX ends up truncated.

The proposed solution here is to set up attr.aux_watermark to a quarter of
the buffer. Unfortunately, at the moment, the PT driver is not equipped to
deal with aux_watermark that's smaller than the AUX allocation order. I
could fix this in the driver itself, but, seeing as it's the only PMU that
actually uses the 'order' from AUX allocations, I'd rather fix the
allocator instead, which is done in patch 1/2.

Patch 2/2 could be replaced by instead changing the in-kernel aux_watermark
default, but that may interfere with PMU drivers that don't ignore said
watermark / handle->wakeup (afaict, that's only arm_spe).

Alexander Shishkin (2):
  perf: Cap allocation order at aux_watermark
  perf intel-pt: Use aux_watermark

 kernel/events/ring_buffer.c | 34 +++--
 tools/perf/arch/x86/util/intel-pt.c |  4 
 2 files changed, 22 insertions(+), 16 deletions(-)

-- 
2.30.2



Re: [PATCH v1] misc: pti: Remove driver for deprecated platform

2021-01-22 Thread Alexander Shishkin
Andy Shevchenko  writes:

> Intel Moorestown and Medfield are quite old Intel Atom based
> 32-bit platforms, which were in limited use in some Android phones,
> tablets and consumer electronics more than eight years ago.
>
> No bugs or problems have ever been reported from outside Intel
> about breaking any of these platforms in years. It seems no real
> users exist who run a more or less fresh kernel on them. The commit
> 05f4434bc130 ("ASoC: Intel: remove mfld_machine") is also in line
> with this theory.
>
> Given the above, and to reduce the burden of supporting outdated
> drivers, remove support for these platforms completely.
>
> Cc: Alexander Shishkin 
> Signed-off-by: Andy Shevchenko 
> Acked-by: Linus Walleij 

FWIW,

Acked-by: Alexander Shishkin 

Regards,
--
Alex


[PATCH 2/2] intel_th: pci: Add Alder Lake-P support

2021-01-15 Thread Alexander Shishkin
This adds support for the Trace Hub in Alder Lake-P.

Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 52acd77438ed..251e75c9ba9d 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -268,6 +268,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7aa6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Alder Lake-P */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x51a6),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{
/* Alder Lake CPU */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f),
-- 
2.29.2



[PATCH 1/2] stm class: Fix module init return on allocation failure

2021-01-15 Thread Alexander Shishkin
From: Wang Hui 

In stm_heartbeat_init(), the return value gets reset by
stm_source_register_device() after the first iteration, so allocation
failures after that point will, after cleanup, return success. Fix that.

Reported-by: Hulk Robot 
Fixes: 119291853038 ("stm class: Add heartbeat stm source device")
Signed-off-by: Wang Hui 
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/heartbeat.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/stm/heartbeat.c b/drivers/hwtracing/stm/heartbeat.c
index 3e7df1c0477f..81d7b21d31ec 100644
--- a/drivers/hwtracing/stm/heartbeat.c
+++ b/drivers/hwtracing/stm/heartbeat.c
@@ -64,7 +64,7 @@ static void stm_heartbeat_unlink(struct stm_source_data *data)
 
 static int stm_heartbeat_init(void)
 {
-   int i, ret = -ENOMEM;
+   int i, ret;
 
if (nr_devs < 0 || nr_devs > STM_HEARTBEAT_MAX)
return -EINVAL;
@@ -72,8 +72,10 @@ static int stm_heartbeat_init(void)
for (i = 0; i < nr_devs; i++) {
stm_heartbeat[i].data.name =
kasprintf(GFP_KERNEL, "heartbeat.%d", i);
-   if (!stm_heartbeat[i].data.name)
+   if (!stm_heartbeat[i].data.name) {
+   ret = -ENOMEM;
goto fail_unregister;
+   }
 
stm_heartbeat[i].data.nr_chans  = 1;
stm_heartbeat[i].data.link  = stm_heartbeat_link;
-- 
2.29.2



[PATCH 0/2] stm class/intel_th: Fixes for v5.11

2021-01-15 Thread Alexander Shishkin
Hi Greg,

Here are updates that I have for v5.11. These are: one minor bugfix and
a new PCI ID.

Alexander Shishkin (1):
  intel_th: pci: Add Alder Lake-P support

Wang Hui (1):
  stm class: Fix module init return on allocation failure

 drivers/hwtracing/intel_th/pci.c  | 5 +
 drivers/hwtracing/stm/heartbeat.c | 6 --
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.29.2



Re: [RFC] perf evlist: Warn if event group has mixed sw/hw events

2020-10-26 Thread Alexander Shishkin
Andi Kleen  writes:

> On Mon, Oct 26, 2020 at 11:19:37PM +0900, Namhyung Kim wrote:
>> This patch just added a warning before running it.  I'd really want to
>> fix the kernel if possible but don't have a good idea.  Thoughts?
>
> The easiest fix would be some multithreading in perf stat opening, then the
> extra latencies could be mostly hidden. One thread per group would probably
> be overkill, but just a few threads would lower the penalty significantly.
>
> I think that would be better than this patch and it's likely not that much
> more complicated, as this is already a lot of code.
>
>> +{
>> +const char *known_sw_pmu[] = {
>> +"software", "tracepoint", "breakpoint", "kprobe", "uprobe", "msr"
>
> That's a non-scalable approach. New PMUs get added regularly. It would be
> better to indicate this in a generic way from the kernel.

That, and also, intel_pt is a software PMU and a few of its features
depend on intel_pt/.../ being a group leader.

Regards,
--
Alex


[PATCH 3/8] tracing: Add trace_export support for event trace

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

Currently, only function traces can be exported to other destinations.
This patch exports the event trace as well. Move the trace-export-related
functions to the beginning of the file so that other traces can call
trace_process_export() to export.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Reviewed-by: Alexander Shishkin 
Signed-off-by: Alexander Shishkin 
---
 include/linux/trace.h |   1 +
 kernel/trace/trace.c  | 259 ++
 2 files changed, 135 insertions(+), 125 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index c115a5d2269f..86033d214972 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -5,6 +5,7 @@
 #ifdef CONFIG_TRACING
 
 #define TRACE_EXPORT_FUNCTION  BIT(0)
+#define TRACE_EXPORT_EVENT BIT(1)
 
 /*
  * The trace export - an export of Ftrace output. The trace_export
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 3ca121ad8728..a40ee413123c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -251,6 +251,138 @@ unsigned long long ns2usecs(u64 nsec)
return nsec;
 }
 
+static void
+trace_process_export(struct trace_export *export,
+  struct ring_buffer_event *event, int flag)
+{
+   struct trace_entry *entry;
+   unsigned int size = 0;
+
+   if (export->flags & flag) {
+   entry = ring_buffer_event_data(event);
+   size = ring_buffer_event_length(event);
+   export->write(export, entry, size);
+   }
+}
+
+static DEFINE_MUTEX(ftrace_export_lock);
+
+static struct trace_export __rcu *ftrace_exports_list __read_mostly;
+
+static DEFINE_STATIC_KEY_FALSE(trace_function_exports_enabled);
+static DEFINE_STATIC_KEY_FALSE(trace_event_exports_enabled);
+
+static inline void ftrace_exports_enable(struct trace_export *export)
+{
+   if (export->flags & TRACE_EXPORT_FUNCTION)
+   static_branch_inc(&trace_function_exports_enabled);
+
+   if (export->flags & TRACE_EXPORT_EVENT)
+   static_branch_inc(&trace_event_exports_enabled);
+}
+
+static inline void ftrace_exports_disable(struct trace_export *export)
+{
+   if (export->flags & TRACE_EXPORT_FUNCTION)
+   static_branch_dec(&trace_function_exports_enabled);
+
+   if (export->flags & TRACE_EXPORT_EVENT)
+   static_branch_dec(&trace_event_exports_enabled);
+}
+
+static void ftrace_exports(struct ring_buffer_event *event, int flag)
+{
+   struct trace_export *export;
+
+   preempt_disable_notrace();
+
+   export = rcu_dereference_raw_check(ftrace_exports_list);
+   while (export) {
+   trace_process_export(export, event, flag);
+   export = rcu_dereference_raw_check(export->next);
+   }
+
+   preempt_enable_notrace();
+}
+
+static inline void
+add_trace_export(struct trace_export **list, struct trace_export *export)
+{
+   rcu_assign_pointer(export->next, *list);
+   /*
+* We are entering export into the list but another
+* CPU might be walking that list. We need to make sure
+* the export->next pointer is valid before another CPU sees
+* the export pointer included into the list.
+*/
+   rcu_assign_pointer(*list, export);
+}
+
+static inline int
+rm_trace_export(struct trace_export **list, struct trace_export *export)
+{
+   struct trace_export **p;
+
+   for (p = list; *p != NULL; p = &(*p)->next)
+   if (*p == export)
+   break;
+
+   if (*p != export)
+   return -1;
+
+   rcu_assign_pointer(*p, (*p)->next);
+
+   return 0;
+}
+
+static inline void
+add_ftrace_export(struct trace_export **list, struct trace_export *export)
+{
+   ftrace_exports_enable(export);
+
+   add_trace_export(list, export);
+}
+
+static inline int
+rm_ftrace_export(struct trace_export **list, struct trace_export *export)
+{
+   int ret;
+
+   ret = rm_trace_export(list, export);
+   ftrace_exports_disable(export);
+
+   return ret;
+}
+
+int register_ftrace_export(struct trace_export *export)
+{
+   if (WARN_ON_ONCE(!export->write))
+   return -1;
+
+   mutex_lock(&ftrace_export_lock);
+
+   add_ftrace_export(&ftrace_exports_list, export);
+
+   mutex_unlock(&ftrace_export_lock);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(register_ftrace_export);
+
+int unregister_ftrace_export(struct trace_export *export)
+{
+   int ret;
+
+   mutex_lock(&ftrace_export_lock);
+
+   ret = rm_ftrace_export(&ftrace_exports_list, export);
+
+   mutex_unlock(&ftrace_export_lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_ftrace_export);
+
 /* trace_flags holds trace_options default values */
 #define TRACE_DEFAULT_FLAGS\
(FUNCTION_DEFAULT_FLAGS |   \
@@ -2699,6 +2831,8 @@ void trace_event_buffer_commit(struc

[PATCH 8/8] intel_th: pci: Add Alder Lake CPU support

2020-10-05 Thread Alexander Shishkin
This adds support for the Trace Hub in Alder Lake CPU.

Signed-off-by: Alexander Shishkin 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index dda4476b8553..52acd77438ed 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -268,6 +268,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7aa6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Alder Lake CPU */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.28.0



[PATCH 4/8] tracing: Add trace_export support for trace_marker

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

Add support for routing the trace_marker buffer to other destinations
via trace_export.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Reviewed-by: Alexander Shishkin 
Signed-off-by: Alexander Shishkin 
---
 include/linux/trace.h | 1 +
 kernel/trace/trace.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 86033d214972..886a4ffd9d45 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -6,6 +6,7 @@
 
 #define TRACE_EXPORT_FUNCTION  BIT(0)
 #define TRACE_EXPORT_EVENT BIT(1)
+#define TRACE_EXPORT_MARKERBIT(2)
 
 /*
  * The trace export - an export of Ftrace output. The trace_export
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a40ee413123c..6048fba2f590 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -271,6 +271,7 @@ static struct trace_export __rcu *ftrace_exports_list __read_mostly;
 
 static DEFINE_STATIC_KEY_FALSE(trace_function_exports_enabled);
 static DEFINE_STATIC_KEY_FALSE(trace_event_exports_enabled);
+static DEFINE_STATIC_KEY_FALSE(trace_marker_exports_enabled);
 
 static inline void ftrace_exports_enable(struct trace_export *export)
 {
@@ -279,6 +280,9 @@ static inline void ftrace_exports_enable(struct trace_export *export)
 
if (export->flags & TRACE_EXPORT_EVENT)
static_branch_inc(&trace_event_exports_enabled);
+
+   if (export->flags & TRACE_EXPORT_MARKER)
+   static_branch_inc(&trace_marker_exports_enabled);
 }
 
 static inline void ftrace_exports_disable(struct trace_export *export)
@@ -288,6 +292,9 @@ static inline void ftrace_exports_disable(struct trace_export *export)
 
if (export->flags & TRACE_EXPORT_EVENT)
static_branch_dec(&trace_event_exports_enabled);
+
+   if (export->flags & TRACE_EXPORT_MARKER)
+   static_branch_dec(&trace_marker_exports_enabled);
 }
 
 static void ftrace_exports(struct ring_buffer_event *event, int flag)
@@ -6687,6 +6694,8 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
} else
entry->buf[cnt] = '\0';
 
+   if (static_branch_unlikely(&trace_marker_exports_enabled))
+   ftrace_exports(event, TRACE_EXPORT_MARKER);
__buffer_unlock_commit(buffer, event);
 
if (tt)
-- 
2.28.0



[PATCH 1/8] stm class: ftrace: Change dependency to TRACING

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

We will support copying event trace to STM. Change
STM_SOURCE_FTRACE to depend on TRACING since we will
support multiple tracers.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Reviewed-by: Alexander Shishkin 
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/stm/Kconfig b/drivers/hwtracing/stm/Kconfig
index d0e92a8a045c..aad594fe79cc 100644
--- a/drivers/hwtracing/stm/Kconfig
+++ b/drivers/hwtracing/stm/Kconfig
@@ -71,7 +71,7 @@ config STM_SOURCE_HEARTBEAT
 
 config STM_SOURCE_FTRACE
tristate "Copy the output from kernel Ftrace to STM engine"
-   depends on FUNCTION_TRACER
+   depends on TRACING
help
  This option can be used to copy the output from kernel Ftrace
  to STM engine. Enabling this option will introduce a slight
-- 
2.28.0



[PATCH 0/8] stm class/intel_th: Updates for v5.10

2020-10-05 Thread Alexander Shishkin
Hi Greg,

Apologies for the late posting. These are the updates that I have for
the next merge window if that's still doable.

Six patches add support for exporting more of ftrace via STM, 3 of them
get into ftrace territory, but Steven reviewed all of them and there
should be no conflicts.

Two patches add new PCI IDs to the Intel TH driver, which would normally
go in a "fixes" patchset, but this late in the cycle that doesn't make
much sense, so they are included in the same series.

Alexander Shishkin (2):
  intel_th: pci: Add Alder Lake-S support
  intel_th: pci: Add Alder Lake CPU support

Tingwei Zhang (6):
  stm class: ftrace: Change dependency to TRACING
  tracing: Add flag to control different traces
  tracing: Add trace_export support for event trace
  tracing: Add trace_export support for trace_marker
  stm class: ftrace: Enable supported trace export flag
  stm class: ftrace: Use different channel according to CPU

 drivers/hwtracing/intel_th/pci.c |  10 ++
 drivers/hwtracing/stm/Kconfig|   2 +-
 drivers/hwtracing/stm/ftrace.c   |   7 +-
 include/linux/trace.h|   7 +
 kernel/trace/trace.c | 270 +--
 5 files changed, 169 insertions(+), 127 deletions(-)

-- 
2.28.0



[PATCH 6/8] stm class: ftrace: Use different channel according to CPU

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

To avoid mixing up packets from different CPUs arriving simultaneously,
use a different channel for packets from each CPU.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/ftrace.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/stm/ftrace.c b/drivers/hwtracing/stm/ftrace.c
index c694a6e692d1..3bb606dfa634 100644
--- a/drivers/hwtracing/stm/ftrace.c
+++ b/drivers/hwtracing/stm/ftrace.c
@@ -37,8 +37,10 @@ static void notrace
 stm_ftrace_write(struct trace_export *export, const void *buf, unsigned int len)
 {
struct stm_ftrace *stm = container_of(export, struct stm_ftrace, ftrace);
+   /* This is called from trace system with preemption disabled */
+   unsigned int cpu = smp_processor_id();
 
-   stm_source_write(&stm->data, STM_FTRACE_CHAN, buf, len);
+   stm_source_write(&stm->data, STM_FTRACE_CHAN + cpu, buf, len);
 }
 
 static int stm_ftrace_link(struct stm_source_data *data)
@@ -63,6 +65,7 @@ static int __init stm_ftrace_init(void)
 {
int ret;
 
+   stm_ftrace.data.nr_chans = roundup_pow_of_two(num_possible_cpus());
ret = stm_source_register_device(NULL, &stm_ftrace.data);
if (ret)
pr_err("Failed to register stm_source - ftrace.\n");
-- 
2.28.0



[PATCH 5/8] stm class: ftrace: Enable supported trace export flag

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

Set flags for trace_export. Export function trace, event trace
and trace marker to stm.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Reviewed-by: Alexander Shishkin 
Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/stm/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hwtracing/stm/ftrace.c b/drivers/hwtracing/stm/ftrace.c
index ce868e095410..c694a6e692d1 100644
--- a/drivers/hwtracing/stm/ftrace.c
+++ b/drivers/hwtracing/stm/ftrace.c
@@ -46,6 +46,8 @@ static int stm_ftrace_link(struct stm_source_data *data)
struct stm_ftrace *sf = container_of(data, struct stm_ftrace, data);
 
sf->ftrace.write = stm_ftrace_write;
+   sf->ftrace.flags = TRACE_EXPORT_FUNCTION | TRACE_EXPORT_EVENT
+   | TRACE_EXPORT_MARKER;
 
return register_ftrace_export(&sf->ftrace);
 }
-- 
2.28.0



[PATCH 7/8] intel_th: pci: Add Alder Lake-S support

2020-10-05 Thread Alexander Shishkin
This adds support for the Trace Hub in Alder Lake-S.

Signed-off-by: Alexander Shishkin 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 21fdf0b93516..dda4476b8553 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -263,6 +263,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1bcc),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Alder Lake */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7aa6),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.28.0



[PATCH 2/8] tracing: Add flag to control different traces

2020-10-05 Thread Alexander Shishkin
From: Tingwei Zhang 

More traces, like the event trace or trace marker, will be supported.
Add a flag for the different traces so that they can be controlled
separately. Move the current function trace to its own flag
instead of the global ftrace enable flag.

Signed-off-by: Tingwei Zhang 
Reviewed-by: Steven Rostedt (VMware) 
Reviewed-by: Alexander Shishkin 
Signed-off-by: Alexander Shishkin 
---
 include/linux/trace.h |  5 +
 kernel/trace/trace.c  | 36 +++-
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 36d255d66f88..c115a5d2269f 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -3,6 +3,9 @@
 #define _LINUX_TRACE_H
 
 #ifdef CONFIG_TRACING
+
+#define TRACE_EXPORT_FUNCTION  BIT(0)
+
 /*
  * The trace export - an export of Ftrace output. The trace_export
  * can process traces and export them to a registered destination as
@@ -15,10 +18,12 @@
  * next- pointer to the next trace_export
  * write   - copy traces which have been delt with ->commit() to
  *   the destination
+ * flags   - which ftrace to be exported
  */
 struct trace_export {
struct trace_export __rcu   *next;
void (*write)(struct trace_export *, const void *, unsigned int);
+   int flags;
 };
 
 int register_ftrace_export(struct trace_export *export);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f40d850ebabc..3ca121ad8728 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2744,33 +2744,37 @@ trace_buffer_unlock_commit_nostack(struct trace_buffer *buffer,
 
 static void
 trace_process_export(struct trace_export *export,
-  struct ring_buffer_event *event)
+  struct ring_buffer_event *event, int flag)
 {
struct trace_entry *entry;
unsigned int size = 0;
 
-   entry = ring_buffer_event_data(event);
-   size = ring_buffer_event_length(event);
-   export->write(export, entry, size);
+   if (export->flags & flag) {
+   entry = ring_buffer_event_data(event);
+   size = ring_buffer_event_length(event);
+   export->write(export, entry, size);
+   }
 }
 
 static DEFINE_MUTEX(ftrace_export_lock);
 
 static struct trace_export __rcu *ftrace_exports_list __read_mostly;
 
-static DEFINE_STATIC_KEY_FALSE(ftrace_exports_enabled);
+static DEFINE_STATIC_KEY_FALSE(trace_function_exports_enabled);
 
-static inline void ftrace_exports_enable(void)
+static inline void ftrace_exports_enable(struct trace_export *export)
 {
-   static_branch_enable(&ftrace_exports_enabled);
+   if (export->flags & TRACE_EXPORT_FUNCTION)
+   static_branch_inc(&trace_function_exports_enabled);
 }
 
-static inline void ftrace_exports_disable(void)
+static inline void ftrace_exports_disable(struct trace_export *export)
 {
-   static_branch_disable(&ftrace_exports_enabled);
+   if (export->flags & TRACE_EXPORT_FUNCTION)
+   static_branch_dec(&trace_function_exports_enabled);
 }
 
-static void ftrace_exports(struct ring_buffer_event *event)
+static void ftrace_exports(struct ring_buffer_event *event, int flag)
 {
struct trace_export *export;
 
@@ -2778,7 +2782,7 @@ static void ftrace_exports(struct ring_buffer_event *event)
 
export = rcu_dereference_raw_check(ftrace_exports_list);
while (export) {
-   trace_process_export(export, event);
+   trace_process_export(export, event, flag);
export = rcu_dereference_raw_check(export->next);
}
 
@@ -2818,8 +2822,7 @@ rm_trace_export(struct trace_export **list, struct trace_export *export)
 static inline void
 add_ftrace_export(struct trace_export **list, struct trace_export *export)
 {
-   if (*list == NULL)
-   ftrace_exports_enable();
+   ftrace_exports_enable(export);
 
add_trace_export(list, export);
 }
@@ -2830,8 +2833,7 @@ rm_ftrace_export(struct trace_export **list, struct trace_export *export)
int ret;
 
ret = rm_trace_export(list, export);
-   if (*list == NULL)
-   ftrace_exports_disable();
+   ftrace_exports_disable(export);
 
return ret;
 }
@@ -2884,8 +2886,8 @@ trace_function(struct trace_array *tr,
entry->parent_ip= parent_ip;
 
if (!call_filter_check_discard(call, entry, buffer, event)) {
-   if (static_branch_unlikely(&ftrace_exports_enabled))
-   ftrace_exports(event);
+   if (static_branch_unlikely(&trace_function_exports_enabled))
+   ftrace_exports(event, TRACE_EXPORT_FUNCTION);
__buffer_unlock_commit(buffer, event);
}
 }
-- 
2.28.0



Re: [PATCH v3 0/6] tracing: export event trace and trace_marker

2020-09-18 Thread Alexander Shishkin
Tingwei Zhang  writes:

> Hi Alexander, Maxime, Alexandre,
>
> May I know your comments for this patch set?

Everything except the last patch is

Reviewed-by: Alexander Shishkin 

After that one is resolved either I can pick it up or Stephen. Either
way is fine with me.

Regards,
--
Alex


Re: [PATCH v3 6/6] stm class: ftrace: use different channel according to CPU

2020-09-18 Thread Alexander Shishkin
Tingwei Zhang  writes:

> @@ -63,6 +65,7 @@ static int __init stm_ftrace_init(void)
>  {
>   int ret;
>  
> + stm_ftrace.data.nr_chans = num_possible_cpus();

Not a problem with this patch necessarily, but this made me realize that
.nr_chans may be larger than:

 (1) what the policy permits,
 (2) what the stm device can handle.

While the user can fix (1) in the policy, they won't be able to fix (2),
in which case they won't be able to use stm_ftrace at all. I'm wondering
whether a link-time callback would be good enough.

Another thing is that .nr_chans needs to be a power of 2 at the moment.

Regards,
--
Alex


[PATCH] perf: Turn kernel address filters into linear address filters

2020-09-09 Thread Alexander Shishkin
One thing that the address filters can't do at the moment is cover
anonymous mappings. Apparently, there is code out there that generates
executable code in anonymous mappings and executes it in place. And at
the moment we only allow file-based address filters for userspace.

The easiest way to do this is to repurpose the kernel filters and allow
those to be used on userspace addresses. The Intel PT driver at the moment
does enforce that non-file-based filters are only used with kernel
addresses, so that needs to be corrected. The Coresight ETM driver doesn't
enforce that, so no changes needed there.

No tooling changes required, either.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 13 -
 kernel/events/core.c   | 22 ++
 2 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index e94af4a54d0d..2d981dbaac1b 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1347,11 +1347,6 @@ static void pt_addr_filters_fini(struct perf_event *event)
event->hw.addr_filters = NULL;
 }
 
-static inline bool valid_kernel_ip(unsigned long ip)
-{
-   return virt_addr_valid(ip) && kernel_ip(ip);
-}
-
 static int pt_event_addr_filters_validate(struct list_head *filters)
 {
struct perf_addr_filter *filter;
@@ -1366,14 +1361,6 @@ static int pt_event_addr_filters_validate(struct list_head *filters)
filter->action == PERF_ADDR_FILTER_ACTION_START)
return -EOPNOTSUPP;
 
-   if (!filter->path.dentry) {
-   if (!valid_kernel_ip(filter->offset))
-   return -EINVAL;
-
-   if (!valid_kernel_ip(filter->offset + filter->size))
-   return -EINVAL;
-   }
-
if (++range > 
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges))
return -EOPNOTSUPP;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57efe3b21e29..8579654d3fc7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9989,9 +9989,9 @@ enum {
IF_ACT_START,
IF_ACT_STOP,
IF_SRC_FILE,
-   IF_SRC_KERNEL,
+   IF_SRC_LINEAR,
IF_SRC_FILEADDR,
-   IF_SRC_KERNELADDR,
+   IF_SRC_LINEARADDR,
 };
 
 enum {
@@ -10005,9 +10005,9 @@ static const match_table_t if_tokens = {
{ IF_ACT_START, "start" },
{ IF_ACT_STOP,  "stop" },
{ IF_SRC_FILE,  "%u/%u@%s" },
-   { IF_SRC_KERNEL,"%u/%u" },
+   { IF_SRC_LINEAR,"%u/%u" },
{ IF_SRC_FILEADDR,  "%u@%s" },
-   { IF_SRC_KERNELADDR,"%u" },
+   { IF_SRC_LINEARADDR,"%u" },
{ IF_ACT_NONE,  NULL },
 };
 
@@ -10022,7 +10022,7 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char *fstr,
char *start, *orig, *filename = NULL;
substring_t args[MAX_OPT_ARGS];
int state = IF_STATE_ACTION, token;
-   unsigned int kernel = 0;
+   unsigned int linear = 0;
int ret = -EINVAL;
 
orig = fstr = kstrdup(fstr, GFP_KERNEL);
@@ -10059,9 +10059,9 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char *fstr,
state = IF_STATE_SOURCE;
break;
 
-   case IF_SRC_KERNELADDR:
-   case IF_SRC_KERNEL:
-   kernel = 1;
+   case IF_SRC_LINEARADDR:
+   case IF_SRC_LINEAR:
+   linear = 1;
/* fall through */
 
case IF_SRC_FILEADDR:
@@ -10074,7 +10074,7 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char *fstr,
if (ret)
goto fail;
 
-   if (token == IF_SRC_KERNEL || token == IF_SRC_FILE) {
+   if (token == IF_SRC_LINEAR || token == IF_SRC_FILE) {
*args[1].to = 0;
ret = kstrtoul(args[1].from, 0, &filter->size);
if (ret)
@@ -10105,8 +10105,6 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char *fstr,
 */
if (state == IF_STATE_END) {
ret = -EINVAL;
-   if (kernel && event->attr.exclude_kernel)
-   goto fail;
 
/*
 * ACTION "filter" must have a non-zero length region
@@ -10116,7 +10114,7 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char *fstr,
!filter->size)
goto fail;
 
-   if (!kernel) {
+   if (!linear) {
if (!filename)
goto fail;
 
-- 
2.28.0



Re: [PATCH] kernel: events: Use scnprintf() in show_pmu_*() instead of snprintf()

2020-09-09 Thread Alexander Shishkin
Shuah Khan  writes:

> Since snprintf() returns would-be-output size instead of the actual
> output size, replace it with scnprintf(), so the nr_addr_filters_show(),
> type_show(), and perf_event_mux_interval_ms_show() routines return the
> actual size.

Well, firstly they should just be sprintf()s, and secondly, I wouldn't
worry about it, because [0].

[0] https://marc.info/?l=linux-kernel&m=159874491103969&w=2

Regards,
--
Alex


Re: [PATCH 1/2] perf: Add closing sibling events' file descriptors

2020-08-11 Thread Alexander Shishkin
Andi Kleen  writes:

> On Tue, Aug 11, 2020 at 12:47:24PM +0300, Alexander Shishkin wrote:
>> 
>> Right, but which bytes? One byte per event? That's
>> arbitrary. sizeof(struct perf_event)? Then, probably also sizeof(struct
>> perf_event_context).
>
> Yes the sum of all the sizeofs needed for a perf_event.

Well, *all* of them will be tedious to collect, seeing as there are
ctx->task_ctx_data, the ring_buffer, scheduling trees, and things that
PMUs allocate under the hood, like AUX SG tables.

>> The above two structs add up to 2288 bytes on my local build. Given the
>> default RLIMIT_MEMLOCK of 64k, that's 28 events. As opposed to ~1k
>> events if we keep using the RLIMIT_NOFILE. Unless I'm missing your
>> point.
>
> Yes that's true. We would probably need to increase the limit to a few
> MB at least.

Ok, but if we have to increase a limit anyway, we might as well increase
the NOFILE.

> Or maybe use some combination with the old rlimit for compatibility.
> The old rlimit would give an implicit extra RLIMIT_NFILE * 2288 limit
> for RLIMIT_MEMLOCK. This would only give full compatibility for a single
> perf process, but I suspect that's good enough for most users.

We'd need to settle on charging a fixed set of structures per event,
then. And, without increasing the file limit, this would still total at
1052 events.

We could also involve perf_event_mlock_kb *and* increase it too, but I
suspect distros don't just leave it at kernel's default either.

Regards,
--
Alex


Re: [PATCH 1/2] perf: Add closing sibling events' file descriptors

2020-08-11 Thread Alexander Shishkin
Andi Kleen  writes:

>> It didn't. I can't figure out what to charge on the locked memory, as
>> all that memory is in kernel-side objects. It also needs to make sense
>
> I don't see how that makes a difference for the count. It just account
> bytes. Can you elaborate?

Right, but which bytes? One byte per event? That's
arbitrary. sizeof(struct perf_event)? Then, probably also sizeof(struct
perf_event_context).

>> as iirc the default MLOCK_LIMIT is quite low, you'd hit it sooner than
>> the file descriptor limit.
>
> For a single process? 

The above two structs add up to 2288 bytes on my local build. Given the
default RLIMIT_MEMLOCK of 64k, that's 28 events. As opposed to ~1k
events if we keep using the RLIMIT_NOFILE. Unless I'm missing your
point.

Regards,
--
Alex


Re: [PATCH 1/2] perf: Add closing sibling events' file descriptors

2020-08-10 Thread Alexander Shishkin
Andi Kleen  writes:

>> > This adds an opt-in flag to the perf_event_open() syscall to retain
>> > sibling events after their file descriptors are closed. In this case, the
>> > actual events will be closed with the group leader.
>> 
>> So having the 1:1 relation with filedesc imposes a resource limit on
>> userspace.
>> 
>> This patch breaks that and enables a user to basically DoS the system by
>> creating unbound events.
>
> The idea was to account the events in the locked memory allocation too.
> Not sure that made it into the patch though.

It didn't. I can't figure out what to charge on the locked memory, as
all that memory is in kernel-side objects. It also needs to make sense
as iirc the default MLOCK_LIMIT is quite low, you'd hit it sooner than
the file descriptor limit.

> It has a minor issue that it might break some existing setups that rely
> on the mmap fitting exactly into the mmap allocation, but that could
> be solved by allowing a little slack, since the existing setups
> likely don't have that many events.

I don't see how to make this work in a sane way. Besides, if we have to
have a limit anyway, sticking with the existing one is just easier and
1:1 is kind of more logical.

Regards,
--
Alex


[PATCH 1/2] perf: Add closing sibling events' file descriptors

2020-07-08 Thread Alexander Shishkin
Currently, perf requires one file descriptor per event. In large groups,
this may mean running into the limit on open file descriptors. However,
the sibling events in a group only need file descriptors for the initial
configuration stage, after which they may not be needed any more.

This adds an opt-in flag to the perf_event_open() syscall to retain
sibling events after their file descriptors are closed. In this case, the
actual events will be closed with the group leader.

Signed-off-by: Alexander Shishkin 
Suggested-by: Andi Kleen 
---
 include/linux/perf_event.h  |   7 ++
 include/uapi/linux/perf_event.h |   1 +
 kernel/events/core.c| 149 +++-
 3 files changed, 115 insertions(+), 42 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3b22db08b6fb..4ce2c303 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -623,6 +623,11 @@ struct perf_event {
 * either sufficies for read.
 */
struct list_head		sibling_list;
+   /*
+* ALLOW_CLOSED siblings that were actually closed; when the group
+* leader goes, so should they.
+*/
+   struct list_head		closed_list;
struct list_head		active_list;
/*
 * Node on the pinned or flexible tree located at the event context;
@@ -644,6 +649,8 @@ struct perf_event {
int event_caps;
/* The cumulative AND of all event_caps for events in this group. */
int group_caps;
+   unsigned			allow_close : 1,
+   closed  : 1;
 
struct perf_event   *group_leader;
struct pmu  *pmu;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 52ca2093831c..69823c0e3cbd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1093,6 +1093,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT(1UL << 1)
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
+#define PERF_FLAG_ALLOW_CLOSE  (1UL << 4) /* retain the event past fd 
close */
 
 #if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c436d705fbd..e61be9cfce98 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -352,7 +352,8 @@ static void event_function_local(struct perf_event *event, 
event_f func, void *d
 #define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
   PERF_FLAG_FD_OUTPUT  |\
   PERF_FLAG_PID_CGROUP |\
-  PERF_FLAG_FD_CLOEXEC)
+  PERF_FLAG_FD_CLOEXEC |\
+  PERF_FLAG_ALLOW_CLOSE)
 
 /*
  * branch priv levels that need permission checks
@@ -2165,6 +2166,15 @@ static void perf_group_detach(struct perf_event *event)
 * to whatever list we are on.
 */
list_for_each_entry_safe(sibling, tmp, &event->sibling_list, sibling_list) {
+   if (sibling->closed) {
+   list_move(&sibling->sibling_list, &event->closed_list);
+   event->nr_siblings--;
+   continue;
+   }
+
+   /* Proceed as if it was an ordinary sibling */
+   if (sibling->allow_close)
+   sibling->allow_close = 0;
 
sibling->group_leader = sibling;
list_del_init(>sibling_list);
@@ -2313,6 +2323,7 @@ __perf_remove_from_context(struct perf_event *event,
   void *info)
 {
unsigned long flags = (unsigned long)info;
+   struct perf_event *sibling;
 
if (ctx->is_active & EVENT_TIME) {
update_context_time(ctx);
@@ -2332,6 +2343,10 @@ __perf_remove_from_context(struct perf_event *event,
cpuctx->task_ctx = NULL;
}
}
+
+   flags &= ~DETACH_GROUP;
+   list_for_each_entry(sibling, &event->closed_list, sibling_list)
+   __perf_remove_from_context(sibling, cpuctx, ctx, (void *)flags);
 }
 
 /*
@@ -4906,51 +4921,12 @@ static void put_event(struct perf_event *event)
_free_event(event);
 }
 
-/*
- * Kill an event dead; while event:refcount will preserve the event
- * object, it will not preserve its functionality. Once the last 'user'
- * gives up the object, we'll destroy the thing.
- */
-int perf_event_release_kernel(struct perf_event *event)
+static void perf_event_free_children(struct perf_event *event)
 {
-   struct perf_event_context *ctx = event->ctx;
struct perf_event *child, *tmp;
+   struct perf_event_context *ctx;
 

[PATCH 2/2] perf record: Support closing siblings' file descriptors

2020-07-08 Thread Alexander Shishkin
This changes perf record to close siblings' file descriptors after they
are created, to reduce the number of open files. The trick is setting up
mappings and applying ioctls to these guys before they get closed, which
means a slight rework to allow these things to happen in the same loop as
evsel creation.

Before:

> $ perf record -e '{dummy,syscalls:*}' -o before.data uname
> Error:
> Too many events are opened.
> Probably the maximum number of open file descriptors has been reached.
> Hint: Try again after reducing the number of events.
> Hint: Try increasing the limit with 'ulimit -n '

After:

> $ perf record -e '{dummy,syscalls:*}' -o after.data uname
> Couldn't synthesize cgroup events.
> Linux
> [ perf record: Woken up 5 times to write data ]
> [ perf record: Captured and wrote 0.118 MB after.data (62 samples) ]

Signed-off-by: Alexander Shishkin 
---
 tools/include/uapi/linux/perf_event.h   |  1 +
 tools/lib/perf/evlist.c | 30 +++-
 tools/lib/perf/evsel.c  | 21 +++
 tools/lib/perf/include/internal/evsel.h |  4 +++
 tools/perf/builtin-record.c | 48 +++--
 tools/perf/util/cpumap.c|  4 +++
 tools/perf/util/evlist.c|  4 ++-
 tools/perf/util/evsel.c | 17 -
 tools/perf/util/evsel.h |  3 ++
 9 files changed, 119 insertions(+), 13 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..90238b8ee2ae 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1069,6 +1069,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT(1UL << 1)
 #define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu 
mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
+#define PERF_FLAG_ALLOW_CLOSE  (1UL << 4) /* XXX */
 
 #if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 6a875a0f01bb..8aeb5634bc61 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -403,6 +403,7 @@ perf_evlist__mmap_cb_get(struct perf_evlist *evlist, bool 
overwrite, int idx)
 }
 
 #define FD(e, x, y) (*(int *) xyarray__entry(e->fd, x, y))
+#define OUTPUT(e, x, y) (*(int *)xyarray__entry(e->output, x, y))
 
 static int
 perf_evlist__mmap_cb_mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
@@ -433,6 +434,11 @@ mmap_per_evsel(struct perf_evlist *evlist, struct 
perf_evlist_mmap_ops *ops,
bool overwrite = evsel->attr.write_backward;
struct perf_mmap *map;
int *output, fd, cpu;
+   bool do_close = false;
+
+   /* Skip over not yet opened evsels */
+   if (!evsel->fd)
+   continue;
 
if (evsel->system_wide && thread)
continue;
@@ -454,6 +460,10 @@ mmap_per_evsel(struct perf_evlist *evlist, struct 
perf_evlist_mmap_ops *ops,
}
 
fd = FD(evsel, cpu, thread);
+   if (OUTPUT(evsel, cpu, thread) >= 0) {
+   *output = OUTPUT(evsel, cpu, thread);
+   continue;
+   }
 
if (*output == -1) {
*output = fd;
@@ -479,6 +489,10 @@ mmap_per_evsel(struct perf_evlist *evlist, struct 
perf_evlist_mmap_ops *ops,
if (!idx)
perf_evlist__set_mmap_first(evlist, map, 
overwrite);
} else {
+   if (fd < 0) {
+   fd = -fd;
+   do_close = true;
+   }
if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
return -1;
 
@@ -487,7 +501,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct 
perf_evlist_mmap_ops *ops,
 
revent = !overwrite ? POLLIN : 0;
 
-   if (!evsel->system_wide &&
+   if (!evsel->system_wide && !do_close &&
perf_evlist__add_pollfd(evlist, fd, map, revent) < 0) {
perf_mmap__put(map);
return -1;
@@ -500,6 +514,13 @@ mmap_per_evsel(struct perf_evlist *evlist, struct 
perf_evlist_mmap_ops *ops,
perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
 thread);
}
+
+   if (do_close) {
+   close(fd);
+   perf_mmap__put(map);
+   FD(evsel, cpu, thread) = -1; /* *output */
+   }
+   OUTPUT(evsel, cpu, thread) = *output;
}
 

[PATCH 0/2] perf: Allow closing siblings' file descriptors

2020-07-08 Thread Alexander Shishkin
Hi guys,

I've been looking at reducing the number of open file descriptors per perf
session. If we retain one descriptor per event, in a large group they add
up. At the same time, we're not actually using them for anything after the
SET_OUTPUT and maybe SET_FILTER ioctls. So, this series is a stab at that.

So, I added a new flag to the perf_event_open() that tells perf to keep
the event around after its file descriptor gets closed, for as long as its
group leader is alive. Since this is a new behavior, the flag is an opt-in.

I also hacked this into the perf tool (mostly perf record, but I'll hack
stat as well if this general approach is agreeable).

Alexander Shishkin (2):
  perf: Add closing sibling events' file descriptors
  perf record: Support closing siblings' file descriptors

 include/linux/perf_event.h  |   7 ++
 include/uapi/linux/perf_event.h |   1 +
 kernel/events/core.c| 149 +---
 tools/include/uapi/linux/perf_event.h   |   1 +
 tools/lib/perf/evlist.c |  30 -
 tools/lib/perf/evsel.c  |  21 
 tools/lib/perf/include/internal/evsel.h |   4 +
 tools/perf/builtin-record.c |  48 ++--
 tools/perf/util/cpumap.c|   4 +
 tools/perf/util/evlist.c|   4 +-
 tools/perf/util/evsel.c |  17 ++-
 tools/perf/util/evsel.h |   3 +
 12 files changed, 234 insertions(+), 55 deletions(-)

-- 
2.27.0



[PATCH 0/4] intel_th: Fixes for v5.8

2020-07-06 Thread Alexander Shishkin
Hi Greg,

Here are the fixes I have for v5.8 cycle so far. There is, in fact, just
one bugfix and 3 new PCI IDs. Nothing dramatic. Andy's r-bs are included.
Please consider applying. Thanks!

Alexander Shishkin (4):
  intel_th: pci: Add Jasper Lake CPU support
  intel_th: pci: Add Tiger Lake PCH-H support
  intel_th: pci: Add Emmitsburg PCH support
  intel_th: Fix a NULL dereference when hub driver is not loaded

 drivers/hwtracing/intel_th/core.c | 21 ++---
 drivers/hwtracing/intel_th/pci.c  | 15 +++
 drivers/hwtracing/intel_th/sth.c  |  4 +---
 3 files changed, 34 insertions(+), 6 deletions(-)

-- 
2.27.0



[PATCH 2/4] intel_th: pci: Add Tiger Lake PCH-H support

2020-07-06 Thread Alexander Shishkin
This adds support for the Trace Hub in Tiger Lake PCH-H.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index f1dc1eef9ba2..f321e5ffe2a7 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -233,6 +233,11 @@ static const struct pci_device_id intel_th_pci_id_table[] 
= {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa0a6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Tiger Lake PCH-H */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x43a6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{
/* Jasper Lake PCH */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4da6),
-- 
2.27.0



[PATCH 4/4] intel_th: Fix a NULL dereference when hub driver is not loaded

2020-07-06 Thread Alexander Shishkin
Connecting master to an output port when GTH driver module is not loaded
triggers a NULL dereference:

> RIP: 0010:intel_th_set_output+0x35/0x70 [intel_th]
> Call Trace:
>  ? sth_stm_link+0x12/0x20 [intel_th_sth]
>  stm_source_link_store+0x164/0x270 [stm_core]
>  dev_attr_store+0x17/0x30
>  sysfs_kf_write+0x3e/0x50
>  kernfs_fop_write+0xda/0x1b0
>  __vfs_write+0x1b/0x40
>  vfs_write+0xb9/0x1a0
>  ksys_write+0x67/0xe0
>  __x64_sys_write+0x1a/0x20
>  do_syscall_64+0x57/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Make sure the module in question is loaded and return an error if not.

Signed-off-by: Alexander Shishkin 
Fixes: 39f4034693b7c ("intel_th: Add driver infrastructure for Intel(R) Trace 
Hub devices")
Reviewed-by: Andy Shevchenko 
Reported-by: Ammy Yi 
Tested-by: Ammy Yi 
Cc: sta...@vger.kernel.org # v4.4
---
 drivers/hwtracing/intel_th/core.c | 21 ++---
 drivers/hwtracing/intel_th/sth.c  |  4 +---
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/intel_th/core.c 
b/drivers/hwtracing/intel_th/core.c
index ca232ec565e8..c9ac3dc65113 100644
--- a/drivers/hwtracing/intel_th/core.c
+++ b/drivers/hwtracing/intel_th/core.c
@@ -1021,15 +1021,30 @@ int intel_th_set_output(struct intel_th_device *thdev,
 {
struct intel_th_device *hub = to_intel_th_hub(thdev);
struct intel_th_driver *hubdrv = to_intel_th_driver(hub->dev.driver);
+   int ret;
 
/* In host mode, this is up to the external debugger, do nothing. */
if (hub->host_mode)
return 0;
 
-   if (!hubdrv->set_output)
-   return -ENOTSUPP;
+   /*
+* hub is instantiated together with the source device that
+* calls here, so guaranteed to be present.
+*/
+   hubdrv = to_intel_th_driver(hub->dev.driver);
+   if (!hubdrv || !try_module_get(hubdrv->driver.owner))
+   return -EINVAL;
+
+   if (!hubdrv->set_output) {
+   ret = -ENOTSUPP;
+   goto out;
+   }
+
+   ret = hubdrv->set_output(hub, master);
 
-   return hubdrv->set_output(hub, master);
+out:
+   module_put(hubdrv->driver.owner);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(intel_th_set_output);
 
diff --git a/drivers/hwtracing/intel_th/sth.c b/drivers/hwtracing/intel_th/sth.c
index 3a1f4e650378..a1529f571491 100644
--- a/drivers/hwtracing/intel_th/sth.c
+++ b/drivers/hwtracing/intel_th/sth.c
@@ -161,9 +161,7 @@ static int sth_stm_link(struct stm_data *stm_data, unsigned 
int master,
 {
struct sth_device *sth = container_of(stm_data, struct sth_device, stm);
 
-   intel_th_set_output(to_intel_th_device(sth->dev), master);
-
-   return 0;
+   return intel_th_set_output(to_intel_th_device(sth->dev), master);
 }
 
 static int intel_th_sw_init(struct sth_device *sth)
-- 
2.27.0



[PATCH 3/4] intel_th: pci: Add Emmitsburg PCH support

2020-07-06 Thread Alexander Shishkin
This adds support for the Trace Hub in Emmitsburg PCH.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index f321e5ffe2a7..21fdf0b93516 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -258,6 +258,11 @@ static const struct pci_device_id intel_th_pci_id_table[] 
= {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4b26),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Emmitsburg PCH */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1bcc),
.driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.27.0



[PATCH 1/4] intel_th: pci: Add Jasper Lake CPU support

2020-07-06 Thread Alexander Shishkin
This adds support for the Trace Hub in Jasper Lake CPU.

Signed-off-by: Alexander Shishkin 
Reviewed-by: Andy Shevchenko 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 7ccac74553a6..f1dc1eef9ba2 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -238,6 +238,11 @@ static const struct pci_device_id intel_th_pci_id_table[] 
= {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4da6),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Jasper Lake CPU */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4e29),
.driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{
/* Elkhart Lake CPU */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4529),
-- 
2.27.0



Re: [PATCH v2 4/4] perf/x86/intel/pt: Opportunistically use single range output mode

2019-10-23 Thread Alexander Shishkin
Alexander Shishkin  writes:

> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 2f20d5a333c1..6edd7b785861 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -491,7 +491,9 @@ static void pt_config(struct perf_event *event)
>   }
>  
>   reg = pt_config_filters(event);
> - reg |= RTIT_CTL_TOPA | RTIT_CTL_TRACEEN;
> + reg |= RTIT_CTL_TRACEEN;
> + if (!buf->single)
> + reg |= RTIT_CTL_TOPA;

This one is broken. The below is better.

From a18feffdc15957f6db0a686e51cddf69eef205c3 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin 
Date: Thu, 17 Oct 2019 15:42:15 +0300
Subject: [PATCH] perf/x86/intel/pt: Opportunistically use single range output
 mode

Most of PT implementations support Single Range Output mode, which is
an alternative to ToPA that can be used for a single contiguous buffer
and if we don't require an interrupt, that is, in AUX snapshot mode.

Now that perf core will use high order allocations for the AUX buffer,
in many cases the first condition will also be satisfied.

The two most obvious benefits of the Single Range Output mode over the
ToPA are:
 * not having to allocate the ToPA table(s),
 * not using the ToPA walk hardware.

Make use of this functionality where available and appropriate.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 118 -
 arch/x86/events/intel/pt.h |   2 +
 2 files changed, 92 insertions(+), 28 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 2f20d5a333c1..e6a10a5dba85 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -482,6 +482,8 @@ static u64 pt_config_filters(struct perf_event *event)
 
 static void pt_config(struct perf_event *event)
 {
+   struct pt *pt = this_cpu_ptr(&pt_ctx);
+   struct pt_buffer *buf = perf_get_aux(&pt->handle);
u64 reg;
 
/* First round: clear STATUS, in particular the PSB byte counter. */
@@ -491,7 +493,9 @@ static void pt_config(struct perf_event *event)
}
 
reg = pt_config_filters(event);
-   reg |= RTIT_CTL_TOPA | RTIT_CTL_TRACEEN;
+   reg |= RTIT_CTL_TRACEEN;
+   if (!buf->single)
+   reg |= RTIT_CTL_TOPA;
 
/*
 * Previously, we had BRANCH_EN on by default, but now that PT has
@@ -543,18 +547,6 @@ static void pt_config_stop(struct perf_event *event)
wmb();
 }
 
-static void pt_config_buffer(void *buf, unsigned int topa_idx,
-unsigned int output_off)
-{
-   u64 reg;
-
-   wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, virt_to_phys(buf));
-
-   reg = 0x7f | ((u64)topa_idx << 7) | ((u64)output_off << 32);
-
-   wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
-}
-
 /**
  * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
@@ -612,6 +604,26 @@ static inline phys_addr_t topa_pfn(struct topa *topa)
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 #define TOPA_ENTRY_PAGES(t, i) (1 << TOPA_ENTRY((t), (i))->size)
 
+static void pt_config_buffer(struct pt_buffer *buf)
+{
+   u64 reg, mask;
+   void *base;
+
+   if (buf->single) {
+   base = buf->data_pages[0];
+   mask = (buf->nr_pages * PAGE_SIZE - 1) >> 7;
+   } else {
+   base = topa_to_page(buf->cur)->table;
+   mask = (u64)buf->cur_idx;
+   }
+
+   wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, virt_to_phys(base));
+
+   reg = 0x7f | (mask << 7) | ((u64)buf->output_off << 32);
+
+   wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
+}
+
 /**
  * topa_alloc() - allocate page-sized ToPA table
  * @cpu:   CPU on which to allocate.
@@ -812,6 +824,11 @@ static void pt_update_head(struct pt *pt)
struct pt_buffer *buf = perf_get_aux(&pt->handle);
u64 topa_idx, base, old;
 
+   if (buf->single) {
+   local_set(&buf->data_size, buf->output_off);
+   return;
+   }
+
/* offset of the first region in this table from the beginning of buf */
base = buf->cur->offset + buf->output_off;
 
@@ -913,18 +930,21 @@ static void pt_handle_status(struct pt *pt)
  */
 static void pt_read_offset(struct pt_buffer *buf)
 {
-   u64 offset, base_topa;
+   u64 offset, base;
struct topa_page *tp;
 
-   rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, base_topa);
-   tp = phys_to_virt(base_topa);
-   buf->cur = &tp->topa;
+   rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, base);
+   if (!buf->single) {
+   tp = phys_to_virt(base);
+   buf->cur = &tp->topa;
+   }
 
rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, offset);
/* offset within current output region */
buf->output_off = offset >> 32;
/* index of current output region within this t

Re: [PATCH 1/3] perf: Optimize perf_install_in_event()

2019-10-23 Thread Alexander Shishkin
Peter Zijlstra  writes:

> + /*
> +  * perf_event_attr::disabled events will not run and can be initialized
> +  * without IPI. Except when this is the first event for the context, in
> +  * that case we need the magic of the IPI to set ctx->is_active.
> +  *
> +  * The IOC_ENABLE that is sure to follow the creation of a disabled
> +  * event will issue the IPI and reprogram the hardware.
> +  */
> + if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
> + raw_spin_lock_irq(&ctx->lock);
> + if (task && ctx->task == TASK_TOMBSTONE) {

Confused: isn't that redundant? If ctx->task reads TASK_TOMBSTONE, task
is always !NULL, afaict. And in any case, if a task context is going
away, we shouldn't probably be adding events there. Or am I missing
something?

Other than that, this makes sense to me, fwiw.

Regards,
--
Alex


[tip: perf/urgent] perf/aux: Fix AUX output stopping

2019-10-22 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: f3a519e4add93b7b31a6616f0b09635ff2e6a159
Gitweb:
https://git.kernel.org/tip/f3a519e4add93b7b31a6616f0b09635ff2e6a159
Author:Alexander Shishkin 
AuthorDate:Tue, 22 Oct 2019 10:39:40 +03:00
Committer: Ingo Molnar 
CommitterDate: Tue, 22 Oct 2019 14:39:37 +02:00

perf/aux: Fix AUX output stopping

Commit:

  8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f5d7950..bb3748d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6949,7 +6949,7 @@ static void __perf_event_output_stop(struct perf_event 
*event, void *data)
 static int __perf_pmu_output_stop(void *info)
 {
struct perf_event *event = info;
-   struct pmu *pmu = event->pmu;
+   struct pmu *pmu = event->ctx->pmu;
struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
struct remote_output ro = {
.rb = event->rb,


[PATCH v2 3/4] perf/x86/intel/pt: Add sampling support

2019-10-22 Thread Alexander Shishkin
Add AUX sampling support to the PT PMU: implement an NMI-safe callback
that takes a snapshot of the buffer without touching the event states.
This is done for PT events that don't use PMIs, that is, snapshot mode
(RO mapping of the AUX area).

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 170f3b402274..2f20d5a333c1 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1208,6 +1208,13 @@ pt_buffer_setup_aux(struct perf_event *event, void 
**pages,
if (!nr_pages)
return NULL;
 
+   /*
+* Only support AUX sampling in snapshot mode, where we don't
+* generate NMIs.
+*/
+   if (event->attr.aux_sample_size && !snapshot)
+   return NULL;
+
if (cpu == -1)
cpu = raw_smp_processor_id();
node = cpu_to_node(cpu);
@@ -1506,6 +1513,52 @@ static void pt_event_stop(struct perf_event *event, int 
mode)
}
 }
 
+static long pt_event_snapshot_aux(struct perf_event *event,
+ struct perf_output_handle *handle,
+ unsigned long size)
+{
+   struct pt *pt = this_cpu_ptr(&pt_ctx);
+   struct pt_buffer *buf = perf_get_aux(&pt->handle);
+   unsigned long from = 0, to;
+   long ret;
+
+   if (WARN_ON_ONCE(!buf))
+   return 0;
+
+   /*
+* Sampling is only allowed on snapshot events;
+* see pt_buffer_setup_aux().
+*/
+   if (WARN_ON_ONCE(!buf->snapshot))
+   return 0;
+
+   /*
+* Here, handle_nmi tells us if the tracing is on
+*/
+   if (READ_ONCE(pt->handle_nmi))
+   pt_config_stop(event);
+
+   pt_read_offset(buf);
+   pt_update_head(pt);
+
+   to = local_read(&buf->data_size);
+   if (to < size)
+   from = buf->nr_pages << PAGE_SHIFT;
+   from += to - size;
+
+   ret = perf_output_copy_aux(&pt->handle, handle, from, to);
+
+   /*
+* If the tracing was on when we turned up, restart it.
+* Compiler barrier not needed as we couldn't have been
+* preempted by anything that touches pt->handle_nmi.
+*/
+   if (pt->handle_nmi)
+   pt_config_start(event);
+
+   return ret;
+}
+
 static void pt_event_del(struct perf_event *event, int mode)
 {
pt_event_stop(event, PERF_EF_UPDATE);
@@ -1625,6 +1678,7 @@ static __init int pt_init(void)
pt_pmu.pmu.del   = pt_event_del;
pt_pmu.pmu.start = pt_event_start;
pt_pmu.pmu.stop  = pt_event_stop;
+   pt_pmu.pmu.snapshot_aux  = pt_event_snapshot_aux;
pt_pmu.pmu.read  = pt_event_read;
pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
pt_pmu.pmu.free_aux  = pt_buffer_free_aux;
-- 
2.23.0



[PATCH v2 4/4] perf/x86/intel/pt: Opportunistically use single range output mode

2019-10-22 Thread Alexander Shishkin
Most PT implementations support Single Range Output mode, which is
an alternative to ToPA that can be used for a single contiguous buffer
and if we don't require an interrupt, that is, in AUX snapshot mode.

Now that perf core will use high order allocations for the AUX buffer,
in many cases the first condition will also be satisfied.

The two most obvious benefits of the Single Range Output mode over the
ToPA are:
 * not having to allocate the ToPA table(s),
 * not using the ToPA walk hardware.

Make use of this functionality where available and appropriate.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 116 -
 arch/x86/events/intel/pt.h |   2 +
 2 files changed, 90 insertions(+), 28 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 2f20d5a333c1..6edd7b785861 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -491,7 +491,9 @@ static void pt_config(struct perf_event *event)
}
 
reg = pt_config_filters(event);
-   reg |= RTIT_CTL_TOPA | RTIT_CTL_TRACEEN;
+   reg |= RTIT_CTL_TRACEEN;
+   if (!buf->single)
+   reg |= RTIT_CTL_TOPA;
 
/*
 * Previously, we had BRANCH_EN on by default, but now that PT has
@@ -543,18 +545,6 @@ static void pt_config_stop(struct perf_event *event)
wmb();
 }
 
-static void pt_config_buffer(void *buf, unsigned int topa_idx,
-unsigned int output_off)
-{
-   u64 reg;
-
-   wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, virt_to_phys(buf));
-
-   reg = 0x7f | ((u64)topa_idx << 7) | ((u64)output_off << 32);
-
-   wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
-}
-
 /**
  * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
@@ -612,6 +602,26 @@ static inline phys_addr_t topa_pfn(struct topa *topa)
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 #define TOPA_ENTRY_PAGES(t, i) (1 << TOPA_ENTRY((t), (i))->size)
 
+static void pt_config_buffer(struct pt_buffer *buf)
+{
+   u64 reg, mask;
+   void *base;
+
+   if (buf->single) {
+   base = buf->data_pages[0];
+   mask = (buf->nr_pages * PAGE_SIZE - 1) >> 7;
+   } else {
+   base = topa_to_page(buf->cur)->table;
+   mask = (u64)buf->cur_idx;
+   }
+
+   wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, virt_to_phys(base));
+
+   reg = 0x7f | (mask << 7) | ((u64)buf->output_off << 32);
+
+   wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
+}
+
 /**
  * topa_alloc() - allocate page-sized ToPA table
  * @cpu:   CPU on which to allocate.
@@ -812,6 +822,11 @@ static void pt_update_head(struct pt *pt)
	struct pt_buffer *buf = perf_get_aux(&pt->handle);
u64 topa_idx, base, old;
 
+   if (buf->single) {
+   local_set(&buf->data_size, buf->output_off);
+   return;
+   }
+
/* offset of the first region in this table from the beginning of buf */
base = buf->cur->offset + buf->output_off;
 
@@ -913,18 +928,21 @@ static void pt_handle_status(struct pt *pt)
  */
 static void pt_read_offset(struct pt_buffer *buf)
 {
-   u64 offset, base_topa;
+   u64 offset, base;
struct topa_page *tp;
 
-   rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, base_topa);
-   tp = phys_to_virt(base_topa);
-   buf->cur = &tp->topa;
+   rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, base);
+   if (!buf->single) {
+   tp = phys_to_virt(base);
+   buf->cur = &tp->topa;
+   }
 
rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, offset);
/* offset within current output region */
buf->output_off = offset >> 32;
/* index of current output region within this table */
-   buf->cur_idx = (offset & 0xffffff80) >> 7;
+   if (!buf->single)
+   buf->cur_idx = (offset & 0xffffff80) >> 7;
 }
 
 static struct topa_entry *
@@ -1040,6 +1058,9 @@ static int pt_buffer_reset_markers(struct pt_buffer *buf,
	unsigned long head = local64_read(&buf->head);
unsigned long idx, npages, wakeup;
 
+   if (buf->single)
+   return 0;
+
/* can't stop in the middle of an output region */
if (buf->output_off + handle->size + 1 < pt_buffer_region_size(buf)) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
@@ -1121,13 +1142,17 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
if (buf->snapshot)
head &= (buf->nr_pages << PAGE_SHIFT) - 1;
 
-   pg = (head >> PAGE_SHIFT) & (buf->nr_pages - 1);
-   te = pt_topa_entry_for_page(buf, pg);
+   if (!buf->single) {
+   pg = (head >> PAGE_SHIFT) & (buf->nr_pages - 1);
+ 

[PATCH v2 1/4] perf: Allow using AUX data in perf samples

2019-10-22 Thread Alexander Shishkin
AUX data can be used to annotate perf events such as performance counters
or tracepoints/breakpoints by including it in sample records when
PERF_SAMPLE_AUX flag is set. Such samples would be instrumental in debugging
and profiling by providing, for example, a history of instruction flow
leading up to the event's overflow.

The implementation makes use of grouping an AUX event with all the events
that wish to take samples of the AUX data, such that the former is the
group leader. The samplees should also specify the desired size of the AUX
sample via attr.aux_sample_size.

AUX capable PMUs need to explicitly add support for sampling, because it
relies on a new callback to take a snapshot of the buffer without touching
the event states.

Signed-off-by: Alexander Shishkin 
---
 include/linux/perf_event.h  |  20 
 include/uapi/linux/perf_event.h |   7 +-
 kernel/events/core.c| 159 +++-
 kernel/events/internal.h|   1 +
 kernel/events/ring_buffer.c |  36 
 5 files changed, 220 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 587ae4d002f5..24c5093f1229 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -249,6 +249,8 @@ struct perf_event;
 #define PERF_PMU_CAP_NO_EXCLUDE0x80
 #define PERF_PMU_CAP_AUX_OUTPUT0x100
 
+struct perf_output_handle;
+
 /**
  * struct pmu - generic performance monitoring unit
  */
@@ -423,6 +425,19 @@ struct pmu {
 */
void (*free_aux)(void *aux); /* optional */
 
+   /*
+* Take a snapshot of the AUX buffer without touching the event
+* state, so that preempting ->start()/->stop() callbacks does
+* not interfere with their logic. Called in PMI context.
+*
+* Returns the size of AUX data copied to the output handle.
+*
+* Optional.
+*/
+   long (*snapshot_aux)(struct perf_event *event,
+struct perf_output_handle *handle,
+unsigned long size);
+
/*
 * Validate address range filters: make sure the HW supports the
 * requested configuration and number of filters; return 0 if the
@@ -964,6 +979,7 @@ struct perf_sample_data {
u32 reserved;
}   cpu_entry;
struct perf_callchain_entry *callchain;
+   u64 aux_size;
 
/*
 * regs_user may point to task_pt_regs or to regs_user_copy, depending
@@ -996,6 +1012,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
+   data->aux_size = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
@@ -1353,6 +1370,9 @@ extern unsigned int perf_output_copy(struct perf_output_handle *handle,
 const void *buf, unsigned int len);
 extern unsigned int perf_output_skip(struct perf_output_handle *handle,
 unsigned int len);
+extern long perf_output_copy_aux(struct perf_output_handle *aux_handle,
+struct perf_output_handle *handle,
+unsigned long from, unsigned long to);
 extern int perf_swevent_get_recursion_context(void);
 extern void perf_swevent_put_recursion_context(int rctx);
 extern u64 perf_swevent_set_period(struct perf_event *event);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index bb7b271397a6..84dbd1499a24 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR   = 1U << 18,
PERF_SAMPLE_PHYS_ADDR   = 1U << 19,
+   PERF_SAMPLE_AUX = 1U << 20,
 
-   PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */
 
	__PERF_SAMPLE_CALLCHAIN_EARLY   = 1ULL << 63, /* non-ABI; internal use */
 };
@@ -424,7 +425,7 @@ struct perf_event_attr {
 */
__u32   aux_watermark;
__u16   sample_max_stack;
-   __u16   __reserved_2;   /* align to __u64 */
+   __u16   aux_sample_size;
 };
 
 /*
@@ -864,6 +865,8 @@ enum perf_event_type {
 *  { u64   abi; # enum perf_sample_regs_abi
 *u64   regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
 *  { u64   phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+*  { u64   si

[PATCH v2 2/4] perf/x86/intel/pt: Factor out starting the trace

2019-10-22 Thread Alexander Shishkin
PT trace is now enabled at the bottom of the event configuration
function that takes care of all configuration bits related to a given
event, including the address filter update. This is only needed where
the event configuration changes, that is, in ->add()/->start().

In the interrupt path we can use a lighter version that keeps the
configuration intact, since it hasn't changed, and only flips the
enable bit.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 05e43d0f430b..170f3b402274 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -397,6 +397,20 @@ static bool pt_event_valid(struct perf_event *event)
  * These all are cpu affine and operate on a local PT
  */
 
+static void pt_config_start(struct perf_event *event)
+{
+   struct pt *pt = this_cpu_ptr(&pt_ctx);
+   u64 ctl = event->hw.config;
+
+   ctl |= RTIT_CTL_TRACEEN;
+   if (READ_ONCE(pt->vmx_on))
+   perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
+   else
+   wrmsrl(MSR_IA32_RTIT_CTL, ctl);
+
+   WRITE_ONCE(event->hw.config, ctl);
+}
+
 /* Address ranges and their corresponding msr configuration registers */
 static const struct pt_address_range {
unsigned long   msr_a;
@@ -468,7 +482,6 @@ static u64 pt_config_filters(struct perf_event *event)
 
 static void pt_config(struct perf_event *event)
 {
-   struct pt *pt = this_cpu_ptr(&pt_ctx);
u64 reg;
 
/* First round: clear STATUS, in particular the PSB byte counter. */
@@ -501,10 +514,7 @@ static void pt_config(struct perf_event *event)
reg |= (event->attr.config & PT_CONFIG_MASK);
 
event->hw.config = reg;
-   if (READ_ONCE(pt->vmx_on))
-   perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
-   else
-   wrmsrl(MSR_IA32_RTIT_CTL, reg);
+   pt_config_start(event);
 }
 
 static void pt_config_stop(struct perf_event *event)
@@ -1381,7 +1391,7 @@ void intel_pt_interrupt(void)
 
pt_config_buffer(topa_to_page(buf->cur)->table, buf->cur_idx,
 buf->output_off);
-   pt_config(event);
+   pt_config_start(event);
}
 }
 
-- 
2.23.0



[PATCH v2 0/4] perf: Add AUX data sampling

2019-10-22 Thread Alexander Shishkin
Hi Peter,

Here's a new version of the AUX sampling. Since the previous one [3],
it addresses the issues of NMI-safety and sampling hardware events.
The former is addressed by adding a new PMU callback, the latter by
making use of grouping. It also depends on the AUX output stop fix
[4] to work correctly. I decided to post them separately, because [4]
is also a candidate for perf/urgent.

This series introduces AUX data sampling for perf events, which in
case of our instruction/branch tracing PMUs like Intel PT, BTS, CS
ETM means execution flow history leading up to a perf event's
overflow.

In case of Intel PT, this can be used as an alternative to LBR, with
virtually as many as you like branches per sample. It doesn't support
some of the LBR features (branch prediction indication, basic block
level timing, etc [1]) and it can't be exposed as BRANCH_STACK, because
that would require decoding PT stream in kernel space, which is not
practical. Instead, we deliver the PT data to userspace as is, for
offline processing. The PT decoder already supports presenting PT as
virtual LBR.

AUX sampling is different from the snapshot mode in that it doesn't
require instrumentation (for when to take a snapshot) and is better
for generic data collection, when you don't yet know what you are
looking for. It's also useful for automated data collection, for
example, for feedback-driven compiler optimizations.

It's also different from the "full trace mode" in that it produces
much less data and, consequently, takes up less I/O bandwidth and
storage space, and takes less time to decode.

The bulk of code is in 1/4, which adds the user interface bits and
the code to measure and copy out AUX data. 3/4 adds PT side support
for sampling. 4/4 is not strictly related, but makes an improvement
to the PT's snapshot mode by implementing a simpler buffer management
that would also benefit the sampling.

The tooling support is ready, although I'm not including it here to
save the bandwidth. Adrian or I will post it separately. Meanwhile,
it can be found here [2].

[1] https://marc.info/?l=linux-kernel=147467007714928=2
[2] 
https://git.kernel.org/cgit/linux/kernel/git/ash/linux.git/log/?h=perf-aux-sampling
[3] https://marc.info/?l=linux-kernel=15287828771
[4] https://marc.info/?l=linux-kernel=157172999231707

Alexander Shishkin (4):
  perf: Allow using AUX data in perf samples
  perf/x86/intel/pt: Factor out starting the trace
  perf/x86/intel/pt: Add sampling support
  perf/x86/intel/pt: Opportunistically use single range output mode

 arch/x86/events/intel/pt.c  | 192 ++--
 arch/x86/events/intel/pt.h  |   2 +
 include/linux/perf_event.h  |  20 
 include/uapi/linux/perf_event.h |   7 +-
 kernel/events/core.c| 159 +-
 kernel/events/internal.h|   1 +
 kernel/events/ring_buffer.c |  36 ++
 7 files changed, 380 insertions(+), 37 deletions(-)

-- 
2.23.0



[PATCH] perf: Fix AUX output stopping

2019-10-22 Thread Alexander Shishkin
Commit

  8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

> WARNING: CPU: 2 PID: 318 at kernel/events/core.c:5615 perf_mmap_close+0x839/0x850
> Modules linked in:
> CPU: 2 PID: 318 Comm: exclusive-group Tainted: G W 5.4.0-rc3-00070-g39b656ee9f2c #846
> RIP: 0010:perf_mmap_close+0x839/0x850

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin 
Cc: sta...@vger.kernel.org
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bbaabdad068..77793ef0d8bc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6965,7 +6965,7 @@ static void __perf_event_output_stop(struct perf_event *event, void *data)
 static int __perf_pmu_output_stop(void *info)
 {
struct perf_event *event = info;
-   struct pmu *pmu = event->pmu;
+   struct pmu *pmu = event->ctx->pmu;
struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
struct remote_output ro = {
.rb = event->rb,
-- 
2.23.0



Re: [PATCH] perf/x86/intel/pt: Fix base for single entry topa

2019-10-21 Thread Alexander Shishkin
Jiri Olsa  writes:

> Jan reported failing ltp test for pt. It looks like the reason
> is commit 38bb8d77d0b9, that did not keep the TOPA_SHIFT for
> entry base, adding it back.

Thanks for taking care of that so quickly!

Regards,
--
Alex


Re: [PATCH] perf: Fix inheritance of aux_output groups

2019-10-07 Thread Alexander Shishkin
Ingo Molnar  writes:

> * Alexander Shishkin  wrote:
>
>> Commit
>> 
>>   b43762ef010 ("perf: Allow normal events to output AUX data")
>
> Missing 'a', the proper SHA1 is:
>
> ab43762ef010 ("perf: Allow normal events to output AUX data")
>
> :-)

Ouch, sorry about that.

> Could this explain weird 'perf top' failures I'm seeing on my desktop, 
> which I was just about to debug and report?

Unlikely; to trigger the above you'd have to manually enable the
aux_output thingy and that would require HW support to begin with.

Thanks,
--
Alex


[tip: perf/urgent] perf/core: Fix inheritance of aux_output groups

2019-10-07 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: f733c6b508bcaa3441ba1eacf16efb9abd47489f
Gitweb:
https://git.kernel.org/tip/f733c6b508bcaa3441ba1eacf16efb9abd47489f
Author:Alexander Shishkin 
AuthorDate:Fri, 04 Oct 2019 15:57:29 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 07 Oct 2019 16:50:42 +02:00

perf/core: Fix inheritance of aux_output groups

Commit:

  ab43762ef010 ("perf: Allow normal events to output AUX data")

forgets to configure aux_output relation in the inherited groups, which
results in child PEBS events forever failing to schedule.

Fix this by setting up the AUX output link in the inheritance path.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20191004125729.32397-1-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3f0cb82..f953dd1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11862,6 +11862,10 @@ static int inherit_group(struct perf_event *parent_event,
child, leader, child_ctx);
if (IS_ERR(child_ctr))
return PTR_ERR(child_ctr);
+
+   if (sub->aux_event == parent_event &&
+   !perf_get_aux_event(child_ctr, leader))
+   return -EINVAL;
}
return 0;
 }


[PATCH] perf: Fix inheritance of aux_output groups

2019-10-04 Thread Alexander Shishkin
Commit

  b43762ef010 ("perf: Allow normal events to output AUX data")

forgets to configure aux_output relation in the inherited groups, which
results in child PEBS events forever failing to schedule.

Fix this by setting up the AUX output link in the inheritance path.

Signed-off-by: Alexander Shishkin 
---
 kernel/events/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f5bb2557d5f6..761995f21b30 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12024,6 +12024,10 @@ static int inherit_group(struct perf_event *parent_event,
child, leader, child_ctx);
if (IS_ERR(child_ctr))
return PTR_ERR(child_ctr);
+
+   if (sub->aux_event == parent_event &&
+   !perf_get_aux_event(child_ctr, leader))
+   return -EINVAL;
}
return 0;
 }
-- 
2.23.0



Re: [PATCH v1 4/6] perf: Allow using AUX data in perf samples

2019-09-30 Thread Alexander Shishkin
Peter Zijlstra  writes:

> On Fri, Aug 09, 2019 at 03:32:39PM +0300, Alexander Shishkin wrote:
>> The other problem is sampling SW events, that would require a ctx->lock
>> to prevent racing with event_function_call()s from other cpus, resulting
>> in somewhat cringy "if (!in_nmi()) raw_spin_lock(...)", but I don't have
>> better idea as to how to handle that.
>
>> +int perf_pmu_aux_sample_output(struct perf_event *event,
>> +   struct perf_output_handle *handle,
>> +   unsigned long size)
>> +{
>> +unsigned long flags;
>> +int ret;
>> +
>> +/*
>> + * NMI vs IRQ
>> + *
>> + * Normal ->start()/->stop() callbacks run in IRQ mode in scheduler
>> + * paths. If we start calling them in NMI context, they may race with
>> + * the IRQ ones, that is, for example, re-starting an event that's just
>> + * been stopped.
>> + */
>> +if (!in_nmi())
>> +raw_spin_lock_irqsave(&event->ctx->lock, flags);
>> +
>> +ret = event->pmu->snapshot_aux(event, handle, size);
>> +
>> +if (!in_nmi())
>> +raw_spin_unlock_irqrestore(&event->ctx->lock, flags);
>> +
>> +return ret;
>> +}
>
> I'm confused... would not something like:
>
>   unsigned long flags;
>
>   local_irq_save(flags);
>   ret = event->pmu->snapshot_aux(...);
>   local_irq_restore(flags);
>
>   return ret;
>
> Be sufficient? By disabling IRQs we already hold off remote
> event_function_call()s.
>
> Or am I misunderstanding the race here?

No, you're right, disabling IRQs covers our bases.

Thanks,
--
Alex


Re: [PATCH v1 4/6] perf: Allow using AUX data in perf samples

2019-09-26 Thread Alexander Shishkin
Alexander Shishkin  writes:

> Peter Zijlstra  writes:
>
>> On Tue, Jun 19, 2018 at 01:47:25PM +0300, Alexander Shishkin wrote:
>>> Right, the SW stuff may then race with event_function_call() stuff. Hmm.
>>> For the HW stuff, I'm hoping that some kind of a sleight of hand may
>>> suffice. Let me think some more.
>>
>> I currently don't see how the SW driven snapshot can ever work, see my
>> comment on the last patch.
>
> Yes, this should be solved by grouping, similarly PEBS->PT.

Peter, do you have any thoughts/suggestions as to this RFC? Thanks!

Regards,
--
Alex


Re: [GIT PULL v1 0/4] stm class/intel_th: Fixes for v5.3

2019-08-28 Thread Alexander Shishkin
Alexander Shishkin  writes:

> Hi Greg,
>
> These are the fixes that I have for v5.3. One is an actual bugfix that's
> copied to stable, one SPDX header fix and two new PCI IDs, copied to
> stable as well. Signed tag below, individual patches follow. Please
> consider applying or pulling. Thanks!

I forgot to attach the pull request to this one, but I figured you'd
take the patches anyway. Sorry about that.

The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b:

  Linus 5.3-rc1 (2019-07-21 14:05:38 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm.git 
tags/stm-intel_th-fixes-for-greg-20190821

for you to fetch changes up to 9b2b12d5f2ea658c0441c9d066c7faeb4f949ad0:

  intel_th: pci: Add Tiger Lake support (2019-08-21 10:40:48 +0300)


stm class/intel_th: Fixes for v5.3

These are:
  * Double-free fix, going back to v4.4.
  * SPDX license header fix.
  * 2 new PCI IDs.

--------
Alexander Shishkin (2):
  intel_th: pci: Add support for another Lewisburg PCH
  intel_th: pci: Add Tiger Lake support

Ding Xiang (1):
  stm class: Fix a double free of stm_source_device

Nishad Kamdar (1):
  intel_th: Use the correct style for SPDX License Identifier

 drivers/hwtracing/intel_th/msu.h |  2 +-
 drivers/hwtracing/intel_th/pci.c | 10 ++
 drivers/hwtracing/intel_th/pti.h |  2 +-
 drivers/hwtracing/stm/core.c |  1 -
 4 files changed, 12 insertions(+), 3 deletions(-)


[tip: perf/core] perf: Allow normal events to output AUX data

2019-08-28 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: ab43762ef010967e4ccd53627f70a2eecbeafefb
Gitweb:
https://git.kernel.org/tip/ab43762ef010967e4ccd53627f70a2eecbeafefb
Author:Alexander Shishkin 
AuthorDate:Tue, 06 Aug 2019 11:46:00 +03:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 28 Aug 2019 11:29:38 +02:00

perf: Allow normal events to output AUX data

In some cases, ordinary (non-AUX) events can generate data for AUX events.
For example, PEBS events can come out as records in the Intel PT stream
instead of their usual DS records, if configured to do so.

One requirement for such events is to consistently schedule together, to
ensure that the data from the "AUX output" events isn't lost while their
corresponding AUX event is not scheduled. We use grouping to provide this
guarantee: an "AUX output" event can be added to a group where an AUX event
is a group leader, and provided that the former supports writing to the
latter.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: kan.li...@linux.intel.com
Link: 
https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shish...@linux.intel.com
---
 include/linux/perf_event.h  | 14 +-
 include/uapi/linux/perf_event.h |  3 +-
 kernel/events/core.c| 93 -
 3 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e8ad3c5..61448c1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -246,6 +246,7 @@ struct perf_event;
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
 #define PERF_PMU_CAP_NO_EXCLUDE0x80
+#define PERF_PMU_CAP_AUX_OUTPUT0x100
 
 /**
  * struct pmu - generic performance monitoring unit
@@ -447,6 +448,16 @@ struct pmu {
/* optional */
 
/*
+* Check if event can be used for aux_output purposes for
+* events of this PMU.
+*
+* Runs from perf_event_open(). Should return 0 for "no match"
+* or non-zero for "match".
+*/
+   int (*aux_output_match) (struct perf_event *event);
+   /* optional */
+
+   /*
 * Filter events for PMU-specific reasons.
 */
	int (*filter_match) (struct perf_event *event); /* optional */
@@ -681,6 +692,9 @@ struct perf_event {
struct perf_addr_filter_range   *addr_filter_ranges;
unsigned long   addr_filters_gen;
 
+   /* for aux_output events */
+   struct perf_event   *aux_event;
+
void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd..bb7b271 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
				namespaces :  1, /* include namespaces data */
				ksymbol:  1, /* include ksymbol events */
				bpf_event  :  1, /* include bpf events */
-   __reserved_1   : 33;
+   aux_output :  1, /* generate AUX records instead of events */
+   __reserved_1   : 32;
 
union {
__u32   wakeup_events;/* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0463c11..2aad959 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1887,6 +1887,89 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->generation++;
 }
 
+static int
+perf_aux_output_match(struct perf_event *event, struct perf_event *aux_event)
+{
+   if (!has_aux(aux_event))
+   return 0;
+
+   if (!event->pmu->aux_output_match)
+   return 0;
+
+   return event->pmu->aux_output_match(aux_event);
+}
+
+static void put_event(struct perf_event *event);
+static void event_sched_out(struct perf_event *event,
+   struct perf_cpu_context *cpuctx,
+   struct perf_event_context *ctx);
+
+static void perf_put_aux_event(struct perf_event *event)
+{
+   struct perf_event_context *ctx = event->ctx;
+   struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+   struct perf_event *iter;
+
+   /*
+* If event uses aux_event tear down the link
+*/
+   if (event->aux_event) {
+   iter = event->aux_event;
+   event->aux_event = NULL;
+   put_event(iter);
+  

[tip: perf/core] perf/x86/intel: Support PEBS output to PT

2019-08-28 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 42880f726c66f13ae1d9ac9ce4c43abe64ecac84
Gitweb:
https://git.kernel.org/tip/42880f726c66f13ae1d9ac9ce4c43abe64ecac84
Author:Alexander Shishkin 
AuthorDate:Tue, 06 Aug 2019 11:46:01 +03:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 28 Aug 2019 11:29:39 +02:00

perf/x86/intel: Support PEBS output to PT

If PEBS declares ability to output its data to Intel PT stream, use the
aux_output attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.

The output setting is per-CPU, so all PEBS events must be either writing
to PT or to the DS area, therefore, in case of conflict, the conflicting
event will fail to schedule, allowing the rotation logic to alternate
between the PEBS->PT and PEBS->DS events.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: kan.li...@linux.intel.com
Link: 
https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shish...@linux.intel.com
---
 arch/x86/events/core.c   | 34 +-
 arch/x86/events/intel/core.c | 18 +++-
 arch/x86/events/intel/ds.c   | 51 ++-
 arch/x86/events/intel/pt.c   |  5 +++-
 arch/x86/events/perf_event.h | 17 ++-
 arch/x86/include/asm/intel_pt.h  |  2 +-
 arch/x86/include/asm/msr-index.h |  4 ++-
 7 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 325959d..15b90b1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1005,6 +1005,27 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 
/* current number of events already accepted */
n = cpuc->n_events;
+   if (!cpuc->n_events)
+   cpuc->pebs_output = 0;
+
+   if (!cpuc->is_fake && leader->attr.precise_ip) {
+   /*
+* For PEBS->PT, if !aux_event, the group leader (PT) went
+* away, the group was broken down and this singleton event
+* can't schedule any more.
+*/
+   if (is_pebs_pt(leader) && !leader->aux_event)
+   return -EINVAL;
+
+   /*
+* pebs_output: 0: no PEBS so far, 1: PT, 2: DS
+*/
+   if (cpuc->pebs_output &&
+   cpuc->pebs_output != is_pebs_pt(leader) + 1)
+   return -EINVAL;
+
+   cpuc->pebs_output = is_pebs_pt(leader) + 1;
+   }
 
if (is_x86_event(leader)) {
if (n >= max_count)
@@ -2241,6 +2262,17 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
return 0;
 }
 
+static int x86_pmu_aux_output_match(struct perf_event *event)
+{
+   if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
+   return 0;
+
+   if (x86_pmu.aux_output_match)
+   return x86_pmu.aux_output_match(event);
+
+   return 0;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2266,6 +2298,8 @@ static struct pmu pmu = {
.sched_task = x86_pmu_sched_task,
.task_ctx_size  = sizeof(struct x86_perf_task_context),
.check_period   = x86_pmu_check_period,
+
+   .aux_output_match   = x86_pmu_aux_output_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b..28459f4 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -3298,6 +3299,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
}
}
 
+   if (event->attr.aux_output) {
+   if (!event->attr.precise_ip)
+   return -EINVAL;
+
+   event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
+   }
+
if (event->attr.type != PERF_TYPE_RAW)
return 0;
 
@@ -3811,6 +3819,14 @@ static int intel_pmu_check_period(struct perf_event *event, u64 value)
return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
 }
 
+static int intel_pmu_aux_output_match(struct perf_event *event)
+{
+   if (!x86_pmu.intel_cap.pebs_output_pt_available)
+   return 0;
+
+   return is_intel_pt_event(event);
+}
+
 PMU_FORMAT_ATTR(offcore_rsp,

[tip: perf/core] perf/x86/intel/pt: Clean up ToPA allocation path

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 90583af61d0c0d2826f42a297a03645b35c49085
Gitweb:
https://git.kernel.org/tip/90583af61d0c0d2826f42a297a03645b35c49085
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:22 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:12 +02:00

perf/x86/intel/pt: Clean up ToPA allocation path

Some of the allocation parameters are passed as function arguments,
while the CPU number for per-cpu allocation is passed via the buffer
object. There's no reason for this.

Pass the CPU as a function argument instead.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-2-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 15 +++
 arch/x86/events/intel/pt.h |  2 --
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index d3dc227..9d9258f 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -670,7 +670,7 @@ static bool topa_table_full(struct topa *topa)
  *
  * Return: 0 on success or error code.
  */
-static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
+static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
 {
struct topa *topa = buf->last;
int order = 0;
@@ -681,7 +681,7 @@ static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
order = page_private(p);
 
if (topa_table_full(topa)) {
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
@@ -1061,20 +1061,20 @@ static void pt_buffer_fini_topa(struct pt_buffer *buf)
  * @size:  Total size of all regions within this ToPA.
  * @gfp:   Allocation flags.
  */
-static int pt_buffer_init_topa(struct pt_buffer *buf, unsigned long nr_pages,
-  gfp_t gfp)
+static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
+  unsigned long nr_pages, gfp_t gfp)
 {
struct topa *topa;
int err;
 
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
topa_insert_table(buf, topa);
 
while (buf->nr_pages < nr_pages) {
-   err = topa_insert_pages(buf, gfp);
+   err = topa_insert_pages(buf, cpu, gfp);
if (err) {
pt_buffer_fini_topa(buf);
return -ENOMEM;
@@ -1124,13 +1124,12 @@ pt_buffer_setup_aux(struct perf_event *event, void **pages,
if (!buf)
return NULL;
 
-   buf->cpu = cpu;
buf->snapshot = snapshot;
buf->data_pages = pages;
 
	INIT_LIST_HEAD(&buf->tables);
 
-   ret = pt_buffer_init_topa(buf, nr_pages, GFP_KERNEL);
+   ret = pt_buffer_init_topa(buf, cpu, nr_pages, GFP_KERNEL);
if (ret) {
kfree(buf);
return NULL;
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 63fe406..8de8ed0 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -53,7 +53,6 @@ struct pt_pmu {
 /**
  * struct pt_buffer - buffer configuration; one buffer per task_struct or
  * cpu, depending on perf event configuration
- * @cpu:   cpu for per-cpu allocation
  * @tables:list of ToPA tables in this buffer
  * @first: shorthand for first topa table
  * @last:  shorthand for last topa table
@@ -71,7 +70,6 @@ struct pt_pmu {
  * @topa_index:table of topa entries indexed by page offset
  */
 struct pt_buffer {
-   int cpu;
struct list_headtables;
struct topa *first, *last, *cur;
unsigned intcur_idx;


[tip: perf/core] perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 539f7c26b41d4ed7d88dd9756de3966ae7ca07b4
Gitweb:
https://git.kernel.org/tip/539f7c26b41d4ed7d88dd9756de3966ae7ca07b4
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:24 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:13 +02:00

perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation

Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by
casting pointers to addresses and performing ungainly subtractions and
divisions instead of simpler pointer arithmetic, which would be perfectly
applicable in that case. Fix that.
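[Editor's note: the equivalence the commit relies on can be shown with a small userspace sketch. The `topa_entry_toy` type and both helper names are made up for illustration; only the arithmetic mirrors the patch.]

```c
#include <stdint.h>
#include <stddef.h>

struct topa_entry_toy { uint64_t val; };	/* stand-in for struct topa_entry */

/* Old form: cast both pointers to integers and divide the byte distance. */
static ptrdiff_t idx_by_casts(struct topa_entry_toy *base,
			      struct topa_entry_toy *e)
{
	return ((uintptr_t)e - (uintptr_t)base) / sizeof(struct topa_entry_toy);
}

/* New form: pointer subtraction performs the same scaling implicitly. */
static ptrdiff_t idx_by_ptr_arith(struct topa_entry_toy *base,
				  struct topa_entry_toy *e)
{
	return e - base;
}
```

Both helpers return the same entry index, which is exactly why the patch can drop the casts.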

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-4-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index f269875..188d45f 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1030,8 +1030,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
	pg = pt_topa_next_entry(buf, pg);
 
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
-   buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
-   (unsigned long)buf->cur) / sizeof(struct topa_entry);
+   buf->cur_idx = buf->topa_index[pg] - TOPA_ENTRY(buf->cur, 0);
buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
local64_set(>head, head);


[tip: perf/core] perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 39152ee51b77851689f9b23fde6f610d13566c39
Gitweb:
https://git.kernel.org/tip/39152ee51b77851689f9b23fde6f610d13566c39
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:27 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:16 +02:00

perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

In order to quickly find a ToPA entry by its page offset in the buffer,
we're using a reverse lookup table. The problem with it is that it's a
large array of mostly similar pointers, especially so now that we're
using high order allocations from the page allocator. Because its size
is limited to whatever is the maximum for kmalloc(), it places a limit
on the number of ToPA entries per buffer, and therefore, on the total
buffer size, which otherwise doesn't have to be there.

Replace the reverse lookup table with a simple runtime lookup. With the
high order AUX allocations in place, the runtime penalty of such a lookup
is much smaller and in cases where all entries in a ToPA table are of
the same size, the complexity is O(1).
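[Editor's note: a userspace toy model of the replacement lookup, under the assumption that the first z_count+1 entries share one size. All names (`toy_topa`, `toy_entry_for_page`) are invented; only the z_count fast path and the linear tail scan follow the patch.]

```c
#define MAX_TOY_ENTRIES 8

/*
 * Toy model of one ToPA table: pages[i] is how many buffer pages entry i
 * covers; z_count says how many times the first entry's size repeats, so
 * entries 0 .. z_count are all the same size.
 */
struct toy_topa {
	unsigned int pages[MAX_TOY_ENTRIES];
	int last;		/* number of valid entries */
	unsigned int z_count;
};

/*
 * Map buffer page @pg to the index of the entry covering it; @cur_pg is
 * this table's page offset in the buffer. The leading run of equal-sized
 * entries is resolved with one division (the O(1) path); only the
 * irregular tail is scanned.
 */
static int toy_entry_for_page(const struct toy_topa *t, unsigned int pg,
			      unsigned int cur_pg)
{
	unsigned int z_pg = t->pages[0] * (t->z_count + 1);
	int idx;

	if (pg >= cur_pg && pg < cur_pg + z_pg)
		return (pg - cur_pg) / t->pages[0];

	cur_pg += z_pg;
	for (idx = (int)t->z_count + 1; idx < t->last; idx++) {
		if (pg < cur_pg + t->pages[idx])
			return idx;
		cur_pg += t->pages[idx];
	}
	return -1;	/* page not covered by this table */
}
```

When every entry in a table has the same size (the common case with high-order allocations), the tail loop never runs.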

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-7-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 194 +++-
 arch/x86/events/intel/pt.h |  10 +-
 2 files changed, 131 insertions(+), 73 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 0f38ed3..fa43d90 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -551,12 +551,14 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
+ * @z_count:   how many times the first entry repeats
  */
 struct topa {
struct list_headlist;
u64 offset;
size_t  size;
int last;
+   unsigned intz_count;
 };
 
 /*
@@ -598,6 +600,7 @@ static inline phys_addr_t topa_pfn(struct topa *topa)
		? &topa_to_page(t)->table[(t)->last]	\
		: &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
+#define TOPA_ENTRY_PAGES(t, i) (1 << TOPA_ENTRY((t), (i))->size)
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -713,6 +716,11 @@ static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
topa_insert_table(buf, topa);
}
 
+   if (topa->z_count == topa->last - 1) {
+   if (order == TOPA_ENTRY(topa, topa->last - 1)->size)
+   topa->z_count++;
+   }
+
TOPA_ENTRY(topa, -1)->base = page_to_phys(p) >> TOPA_SHIFT;
TOPA_ENTRY(topa, -1)->size = order;
if (!buf->snapshot &&
@@ -756,6 +764,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
 tp->table[i].stop) ||
tp->table[i].end)
break;
+   if (!i && topa->z_count)
+   i += topa->z_count;
}
}
 }
@@ -907,29 +917,97 @@ static void pt_read_offset(struct pt_buffer *buf)
	buf->cur_idx = (offset & 0xffffff80) >> 7;
 }
 
-/**
- * pt_topa_next_entry() - obtain index of the first page in the next ToPA entry
- * @buf:   PT buffer.
- * @pg:Page offset in the buffer.
- *
- * When advancing to the next output region (ToPA entry), given a page offset
- * into the buffer, we need to find the offset of the first page in the next
- * region.
- */
-static unsigned int pt_topa_next_entry(struct pt_buffer *buf, unsigned int pg)
+static struct topa_entry *
+pt_topa_entry_for_page(struct pt_buffer *buf, unsigned int pg)
+{
+   struct topa_page *tp;
+   struct topa *topa;
+   unsigned int idx, cur_pg = 0, z_pg = 0, start_idx = 0;
+
+   /*
+* Indicates a bug in the caller.
+*/
+   if (WARN_ON_ONCE(pg >= buf->nr_pages))
+   return NULL;
+
+   /*
+* First, find the ToPA table where @pg fits. With high
+* order allocations, there shouldn't be many of these.
+*/
+	list_for_each_entry(topa, &buf->tables, list) {
+   if (topa->offset + topa->size > pg << PAGE_SHIFT)
+   goto found;
+   }
+
+   /*
+* Hitting this means we have a problem in the ToPA
+* allocation code.
+*/
+ 

[tip: perf/core] perf/x86/intel/pt: Split ToPA metadata and page layout

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 38bb8d77d0b932a0773b5de2ef42479409314f96
Gitweb:
https://git.kernel.org/tip/38bb8d77d0b932a0773b5de2ef42479409314f96
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:25 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:14 +02:00

perf/x86/intel/pt: Split ToPA metadata and page layout

PT uses page sized ToPA tables, where the ToPA table resides at the bottom
and its driver-specific metadata taking up a few words at the top of the
page. The split is currently calculated manually and needs to be redone
every time a field is added to or removed from the metadata structure.
Also, the 32-bit version can be made smaller.

By splitting the table and metadata into separate structures, we are making
the compiler figure out the division of the page.
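[Editor's note: the idea can be sketched outside the kernel. The `toy_*` names and the 4K page size are assumptions for illustration; the point is that the entry count is derived from sizeof() rather than hand-maintained.]

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096	/* assumption: 4K pages, as on x86 */

struct toy_entry { uint64_t val; };	/* 8-byte hardware entry, simplified */
struct toy_meta {			/* driver-private metadata */
	void		*list;
	uint64_t	offset;
	size_t		size;
	int		last;
};

/* The compiler, not a hand-counted constant, decides how many entries
 * fit beside the metadata. Adding a field to toy_meta automatically
 * shrinks the table. */
#define TOY_TENTS_PER_PAGE \
	((TOY_PAGE_SIZE - sizeof(struct toy_meta)) / sizeof(struct toy_entry))

struct toy_page {
	struct toy_entry	table[TOY_TENTS_PER_PAGE];
	struct toy_meta		meta;
};
```

The whole structure is guaranteed to fit in one page by construction, which is what the manual `PAGE_SIZE - 40` arithmetic could not promise.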

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-5-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 93 +++--
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 188d45f..2e3f068 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -545,16 +545,8 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
 }
 
-/*
- * Keep ToPA table-related metadata on the same page as the actual table,
- * taking up a few words from the top
- */
-
-#define TENTS_PER_PAGE (((PAGE_SIZE - 40) / sizeof(struct topa_entry)) - 1)
-
 /**
- * struct topa - page-sized ToPA table with metadata at the top
- * @table: actual ToPA table entries, as understood by PT hardware
+ * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
  * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
@@ -562,7 +554,6 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
-   struct topa_entry   table[TENTS_PER_PAGE];
struct list_headlist;
u64 phys;
u64 offset;
@@ -570,8 +561,39 @@ struct topa {
int last;
 };
 
+/*
+ * Keep ToPA table-related metadata on the same page as the actual table,
+ * taking up a few words from the top
+ */
+
+#define TENTS_PER_PAGE \
+   ((PAGE_SIZE - sizeof(struct topa)) / sizeof(struct topa_entry))
+
+/**
+ * struct topa_page - page-sized ToPA table with metadata at the top
+ * @table: actual ToPA table entries, as understood by PT hardware
+ * @topa:  metadata
+ */
+struct topa_page {
+   struct topa_entry   table[TENTS_PER_PAGE];
+   struct topa topa;
+};
+
+static inline struct topa_page *topa_to_page(struct topa *topa)
+{
+   return container_of(topa, struct topa_page, topa);
+}
+
+static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
+{
+   return (struct topa_page *)((unsigned long)te & PAGE_MASK);
+}
+
 /* make -1 stand for the last table entry */
-#define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY(t, i)   \
+   ((i) == -1  \
+		? &topa_to_page(t)->table[(t)->last]	\
+		: &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
@@ -584,27 +606,27 @@ struct topa {
 static struct topa *topa_alloc(int cpu, gfp_t gfp)
 {
int node = cpu_to_node(cpu);
-   struct topa *topa;
+   struct topa_page *tp;
struct page *p;
 
p = alloc_pages_node(node, gfp | __GFP_ZERO, 0);
if (!p)
return NULL;
 
-   topa = page_address(p);
-   topa->last = 0;
-   topa->phys = page_to_phys(p);
+   tp = page_address(p);
+   tp->topa.last = 0;
+   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of singe-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(topa, 1)->base = topa->phys >> TOPA_SHIFT;
-   TOPA_ENTRY(topa, 1)->end = 1;
+		TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys >> TOPA_SHIFT;
+		TOPA_ENTRY(&tp->topa, 1)->end = 1;
}
 
-   return topa;
+	return &tp->topa;
 }
 
 /**
@@ -714,22 +736,23 @@ static void pt_topa_dump(struct pt_buffer *buf)

[tip: perf/core] perf/x86/intel/pt: Use helpers to obtain ToPA entry size

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: fffec50f541ace292383c0cbe9a2a97d16d201c6
Gitweb:
https://git.kernel.org/tip/fffec50f541ace292383c0cbe9a2a97d16d201c6
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:23 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:13 +02:00

perf/x86/intel/pt: Use helpers to obtain ToPA entry size

There are a few places in the PT driver that need to obtain the size of
a ToPA entry, some of them for the current ToPA entry in the buffer.
Use helpers for those, to make the lines shorter and more readable.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-3-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 9d9258f..f269875 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -572,6 +572,7 @@ struct topa {
 
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -771,7 +772,7 @@ static void pt_update_head(struct pt *pt)
 
/* offset of the current output region within this table */
for (topa_idx = 0; topa_idx < buf->cur_idx; topa_idx++)
-   base += sizes(buf->cur->table[topa_idx].size);
+   base += TOPA_ENTRY_SIZE(buf->cur, topa_idx);
 
if (buf->snapshot) {
		local_set(&buf->data_size, base);
@@ -800,7 +801,7 @@ static void *pt_buffer_region(struct pt_buffer *buf)
  */
 static size_t pt_buffer_region_size(struct pt_buffer *buf)
 {
-   return sizes(buf->cur->table[buf->cur_idx].size);
+   return TOPA_ENTRY_SIZE(buf->cur, buf->cur_idx);
 }
 
 /**
@@ -830,7 +831,7 @@ static void pt_handle_status(struct pt *pt)
 * know.
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries) ||
-	    buf->output_off == sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   buf->output_off == pt_buffer_region_size(buf)) {
 		perf_aux_output_flag(&pt->handle,
 PERF_AUX_FLAG_TRUNCATED);
advance++;
@@ -925,8 +926,7 @@ static int pt_buffer_reset_markers(struct pt_buffer *buf,
unsigned long idx, npages, wakeup;
 
/* can't stop in the middle of an output region */
-	if (buf->output_off + handle->size + 1 <
-	    sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   if (buf->output_off + handle->size + 1 < pt_buffer_region_size(buf)) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
return -EINVAL;
}
@@ -1032,7 +1032,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
	buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
			(unsigned long)buf->cur) / sizeof(struct topa_entry);
-	buf->output_off = head & (sizes(buf->cur->table[buf->cur_idx].size) - 1);
+   buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
	local64_set(&buf->head, head);
	local_set(&buf->data_size, 0);


[tip: perf/core] perf/x86/intel/pt: Free up space in a ToPA descriptor

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 91feca5e2ecc9752894d57c9a72c2645471929c3
Gitweb:
https://git.kernel.org/tip/91feca5e2ecc9752894d57c9a72c2645471929c3
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:26 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:15 +02:00

perf/x86/intel/pt: Free up space in a ToPA descriptor

Currently, we're storing physical address of a ToPA table in its
descriptor, which is completely unnecessary. Since the descriptor
and the table itself share the same page, reducing the descriptor
size leaves more space for the table.
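[Editor's note: why no stored physical address is needed can be shown with a page-aligned userspace sketch. `toy_entry_to_page` and friends are invented names; the masking trick mirrors the driver's topa_entry_to_page().]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PG 4096UL		/* assumption: 4K page size */

struct toy_meta { int last; };
struct toy_page {
	uint64_t table[(TOY_PG - sizeof(struct toy_meta)) / sizeof(uint64_t)];
	struct toy_meta meta;
};

/*
 * Because the entries and the metadata live in the same page, any entry
 * pointer can be masked back to the page boundary to reach the metadata;
 * nothing (such as a physical address) has to be stored in the descriptor.
 */
static struct toy_page *toy_entry_to_page(uint64_t *te)
{
	return (struct toy_page *)((uintptr_t)te & ~(TOY_PG - 1));
}
```

In the kernel the address is instead recovered on demand with virt_to_phys(), so dropping the field costs nothing at runtime.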

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-6-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 2e3f068..0f38ed3 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -548,14 +548,12 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
 /**
  * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
- * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
struct list_headlist;
-   u64 phys;
u64 offset;
size_t  size;
int last;
@@ -589,6 +587,11 @@ static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
return (struct topa_page *)((unsigned long)te & PAGE_MASK);
 }
 
+static inline phys_addr_t topa_pfn(struct topa *topa)
+{
+   return PFN_DOWN(virt_to_phys(topa_to_page(topa)));
+}
+
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i)   \
((i) == -1  \
@@ -615,14 +618,13 @@ static struct topa *topa_alloc(int cpu, gfp_t gfp)
 
tp = page_address(p);
tp->topa.last = 0;
-   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of singe-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-		TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys >> TOPA_SHIFT;
+		TOPA_ENTRY(&tp->topa, 1)->base = page_to_phys(p) >> TOPA_SHIFT;
		TOPA_ENTRY(&tp->topa, 1)->end = 1;
}
 
@@ -666,7 +668,7 @@ static void topa_insert_table(struct pt_buffer *buf, struct topa *topa)
 
BUG_ON(last->last != TENTS_PER_PAGE - 1);
 
-   TOPA_ENTRY(last, -1)->base = topa->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(last, -1)->base = topa_pfn(topa);
TOPA_ENTRY(last, -1)->end = 1;
 }
 
@@ -739,8 +741,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
struct topa_page *tp = topa_to_page(topa);
int i;
 
-	pr_debug("# table @%p (%016Lx), off %llx size %zx\n", tp->table,
-		 topa->phys, topa->offset, topa->size);
+	pr_debug("# table @%p, off %llx size %zx\n", tp->table,
+		 topa->offset, topa->size);
for (i = 0; i < TENTS_PER_PAGE; i++) {
pr_debug("# entry @%p (%lx sz %u %c%c%c) raw=%16llx\n",
			 &tp->table[i],
@@ -,7 +1113,7 @@ static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
 
/* link last table to the first one, unless we're double buffering */
if (intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-		TOPA_ENTRY(buf->last, -1)->base = buf->first->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(buf->last, -1)->base = topa_pfn(buf->first);
TOPA_ENTRY(buf->last, -1)->end = 1;
}
 


[PATCH v0 3/6] perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation

2019-08-21 Thread Alexander Shishkin
Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by
casting pointers to addresses and performing ungainly subtractions and
divisions instead of simpler pointer arithmetic, which would be perfectly
applicable in that case. Fix that.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 831163a1b41a..15e7c11e3460 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1030,8 +1030,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
	pg = pt_topa_next_entry(buf, pg);
 
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
-   buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
-   (unsigned long)buf->cur) / sizeof(struct topa_entry);
+   buf->cur_idx = buf->topa_index[pg] - TOPA_ENTRY(buf->cur, 0);
buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
	local64_set(&buf->head, head);
-- 
2.23.0.rc1



[PATCH v0 1/6] perf/x86/intel/pt: Clean up ToPA allocation path

2019-08-21 Thread Alexander Shishkin
Some of the allocation parameters are passed as function arguments,
while the CPU number for per-cpu allocation is passed via the buffer
object. There's no reason for this.

Pass the CPU as a function argument instead.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 15 +++
 arch/x86/events/intel/pt.h |  2 --
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index a7318c29242e..b029f32edae2 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -670,7 +670,7 @@ static bool topa_table_full(struct topa *topa)
  *
  * Return: 0 on success or error code.
  */
-static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
+static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
 {
struct topa *topa = buf->last;
int order = 0;
@@ -681,7 +681,7 @@ static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
order = page_private(p);
 
if (topa_table_full(topa)) {
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
@@ -1061,20 +1061,20 @@ static void pt_buffer_fini_topa(struct pt_buffer *buf)
  * @size:  Total size of all regions within this ToPA.
  * @gfp:   Allocation flags.
  */
-static int pt_buffer_init_topa(struct pt_buffer *buf, unsigned long nr_pages,
-  gfp_t gfp)
+static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
+  unsigned long nr_pages, gfp_t gfp)
 {
struct topa *topa;
int err;
 
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
topa_insert_table(buf, topa);
 
while (buf->nr_pages < nr_pages) {
-   err = topa_insert_pages(buf, gfp);
+   err = topa_insert_pages(buf, cpu, gfp);
if (err) {
pt_buffer_fini_topa(buf);
return -ENOMEM;
@@ -1124,13 +1124,12 @@ pt_buffer_setup_aux(struct perf_event *event, void **pages,
if (!buf)
return NULL;
 
-   buf->cpu = cpu;
buf->snapshot = snapshot;
buf->data_pages = pages;
 
	INIT_LIST_HEAD(&buf->tables);
 
-   ret = pt_buffer_init_topa(buf, nr_pages, GFP_KERNEL);
+   ret = pt_buffer_init_topa(buf, cpu, nr_pages, GFP_KERNEL);
if (ret) {
kfree(buf);
return NULL;
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 63fe4063fbd6..8de8ed089697 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -53,7 +53,6 @@ struct pt_pmu {
 /**
  * struct pt_buffer - buffer configuration; one buffer per task_struct or
  * cpu, depending on perf event configuration
- * @cpu:   cpu for per-cpu allocation
  * @tables:list of ToPA tables in this buffer
  * @first: shorthand for first topa table
  * @last:  shorthand for last topa table
@@ -71,7 +70,6 @@ struct pt_pmu {
  * @topa_index:table of topa entries indexed by page offset
  */
 struct pt_buffer {
-   int cpu;
struct list_headtables;
struct topa *first, *last, *cur;
unsigned intcur_idx;
-- 
2.23.0.rc1



[PATCH v0 2/6] perf/x86/intel/pt: Use helpers to obtain ToPA entry size

2019-08-21 Thread Alexander Shishkin
There are a few places in the PT driver that need to obtain the size of
a ToPA entry, some of them for the current ToPA entry in the buffer.
Use helpers for those, to make the lines shorter and more readable.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index b029f32edae2..831163a1b41a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -572,6 +572,7 @@ struct topa {
 
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -771,7 +772,7 @@ static void pt_update_head(struct pt *pt)
 
/* offset of the current output region within this table */
for (topa_idx = 0; topa_idx < buf->cur_idx; topa_idx++)
-   base += sizes(buf->cur->table[topa_idx].size);
+   base += TOPA_ENTRY_SIZE(buf->cur, topa_idx);
 
if (buf->snapshot) {
		local_set(&buf->data_size, base);
@@ -800,7 +801,7 @@ static void *pt_buffer_region(struct pt_buffer *buf)
  */
 static size_t pt_buffer_region_size(struct pt_buffer *buf)
 {
-   return sizes(buf->cur->table[buf->cur_idx].size);
+   return TOPA_ENTRY_SIZE(buf->cur, buf->cur_idx);
 }
 
 /**
@@ -830,7 +831,7 @@ static void pt_handle_status(struct pt *pt)
 * know.
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries) ||
-	    buf->output_off == sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   buf->output_off == pt_buffer_region_size(buf)) {
 		perf_aux_output_flag(&pt->handle,
 PERF_AUX_FLAG_TRUNCATED);
advance++;
@@ -925,8 +926,7 @@ static int pt_buffer_reset_markers(struct pt_buffer *buf,
unsigned long idx, npages, wakeup;
 
/* can't stop in the middle of an output region */
-	if (buf->output_off + handle->size + 1 <
-	    sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   if (buf->output_off + handle->size + 1 < pt_buffer_region_size(buf)) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
return -EINVAL;
}
@@ -1032,7 +1032,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
	buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
			(unsigned long)buf->cur) / sizeof(struct topa_entry);
-	buf->output_off = head & (sizes(buf->cur->table[buf->cur_idx].size) - 1);
+   buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
	local64_set(&buf->head, head);
	local_set(&buf->data_size, 0);
-- 
2.23.0.rc1



[PATCH v0 0/6] perf/x86/intel/pt: Misc updates

2019-08-21 Thread Alexander Shishkin
Hi Peter,

Here are updates that I have for the PT driver. The biggest change is
6/6, it gets rid of the reverse lookup table that the driver uses to
find ToPA (SG) entries by the page offset. With the high order page
allocations in the AUX buffer, the cost of runtime lookup should be
minimal, and we get to free up some memory that the table occupies.
Plus, we get to allocate 2G (and up) PT buffers should we be so
inclined.

Others are small reworks and cleanups striving to make the code easier
on the eyes.

Alexander Shishkin (6):
  perf/x86/intel/pt: Clean up ToPA allocation path
  perf/x86/intel/pt: Use helpers to obtain ToPA entry size
  perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry
calculation
  perf/x86/intel/pt: Split ToPA metadata and page layout
  perf/x86/intel/pt: Free up space in a ToPA descriptor
  perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

 arch/x86/events/intel/pt.c | 325 +++--
 arch/x86/events/intel/pt.h |  12 +-
 2 files changed, 210 insertions(+), 127 deletions(-)

-- 
2.23.0.rc1



[PATCH v0 6/6] perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

2019-08-21 Thread Alexander Shishkin
In order to quickly find a ToPA entry by its page offset in the buffer,
we're using a reverse lookup table. The problem with it is that it's a
large array of mostly similar pointers, especially so now that we're
using high order allocations from the page allocator. Because its size
is limited to whatever is the maximum for kmalloc(), it places a limit
on the number of ToPA entries per buffer, and therefore, on the total
buffer size, which otherwise doesn't have to be there.

Replace the reverse lookup table with a simple runtime lookup. With the
high order AUX allocations in place, the runtime penalty of such a lookup
is much smaller and in cases where all entries in a ToPA table are of
the same size, the complexity is O(1).

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 194 -
 arch/x86/events/intel/pt.h |  10 +-
 2 files changed, 131 insertions(+), 73 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 5965e9493c8b..4972c727d5c5 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -551,12 +551,14 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
+ * @z_count:   how many times the first entry repeats
  */
 struct topa {
struct list_headlist;
u64 offset;
size_t  size;
int last;
+   unsigned intz_count;
 };
 
 /*
@@ -598,6 +600,7 @@ static inline phys_addr_t topa_pfn(struct topa *topa)
		? &topa_to_page(t)->table[(t)->last]	\
		: &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
+#define TOPA_ENTRY_PAGES(t, i) (1 << TOPA_ENTRY((t), (i))->size)
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -713,6 +716,11 @@ static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
topa_insert_table(buf, topa);
}
 
+   if (topa->z_count == topa->last - 1) {
+   if (order == TOPA_ENTRY(topa, topa->last - 1)->size)
+   topa->z_count++;
+   }
+
TOPA_ENTRY(topa, -1)->base = page_to_phys(p) >> TOPA_SHIFT;
TOPA_ENTRY(topa, -1)->size = order;
if (!buf->snapshot &&
@@ -756,6 +764,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
 tp->table[i].stop) ||
tp->table[i].end)
break;
+   if (!i && topa->z_count)
+   i += topa->z_count;
}
}
 }
@@ -907,29 +917,97 @@ static void pt_read_offset(struct pt_buffer *buf)
	buf->cur_idx = (offset & 0xffffff80) >> 7;
 }
 
-/**
- * pt_topa_next_entry() - obtain index of the first page in the next ToPA entry
- * @buf:   PT buffer.
- * @pg:Page offset in the buffer.
- *
- * When advancing to the next output region (ToPA entry), given a page offset
- * into the buffer, we need to find the offset of the first page in the next
- * region.
- */
-static unsigned int pt_topa_next_entry(struct pt_buffer *buf, unsigned int pg)
+static struct topa_entry *
+pt_topa_entry_for_page(struct pt_buffer *buf, unsigned int pg)
+{
+   struct topa_page *tp;
+   struct topa *topa;
+   unsigned int idx, cur_pg = 0, z_pg = 0, start_idx = 0;
+
+   /*
+* Indicates a bug in the caller.
+*/
+   if (WARN_ON_ONCE(pg >= buf->nr_pages))
+   return NULL;
+
+   /*
+* First, find the ToPA table where @pg fits. With high
+* order allocations, there shouldn't be many of these.
+*/
+   list_for_each_entry(topa, &buf->tables, list) {
+   if (topa->offset + topa->size > pg << PAGE_SHIFT)
+   goto found;
+   }
+
+   /*
+* Hitting this means we have a problem in the ToPA
+* allocation code.
+*/
+   WARN_ON_ONCE(1);
+
+   return NULL;
+
+found:
+   /*
+* Indicates a problem in the ToPA allocation code.
+*/
+   if (WARN_ON_ONCE(topa->last == -1))
+   return NULL;
+
+   tp = topa_to_page(topa);
+   cur_pg = PFN_DOWN(topa->offset);
+   if (topa->z_count) {
+   z_pg = TOPA_ENTRY_PAGES(topa, 0) * (topa->z_count + 1);
+   start_idx = topa->z_count + 1;
+   }
+
+   /*
+* Multiple entries at the beginning of the table have the same size,
+* ideally all of them; if @pg falls there, the search is done.
+*/
+   if (pg >= cur_pg && pg < cur_pg + z_pg) {
+ 

[PATCH v0 5/6] perf/x86/intel/pt: Free up space in a ToPA descriptor

2019-08-21 Thread Alexander Shishkin
Currently, we're storing the physical address of a ToPA table in its
descriptor, which is completely unnecessary. Since the descriptor
and the table itself share the same page, reducing the descriptor
size leaves more space for the table.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index e7b0f8f4f1b0..5965e9493c8b 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -548,14 +548,12 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
 /**
  * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
- * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
	struct list_head	list;
-   u64 phys;
u64 offset;
size_t  size;
int last;
@@ -589,6 +587,11 @@ static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
return (struct topa_page *)((unsigned long)te & PAGE_MASK);
 }
 
+static inline phys_addr_t topa_pfn(struct topa *topa)
+{
+   return PFN_DOWN(virt_to_phys(topa_to_page(topa)));
+}
+
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i)   \
((i) == -1  \
@@ -615,14 +618,13 @@ static struct topa *topa_alloc(int cpu, gfp_t gfp)
 
tp = page_address(p);
tp->topa.last = 0;
-   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of singe-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys;
+   TOPA_ENTRY(&tp->topa, 1)->base = page_to_phys(p);
TOPA_ENTRY(>topa, 1)->end = 1;
}
 
@@ -666,7 +668,7 @@ static void topa_insert_table(struct pt_buffer *buf, struct 
topa *topa)
 
BUG_ON(last->last != TENTS_PER_PAGE - 1);
 
-   TOPA_ENTRY(last, -1)->base = topa->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(last, -1)->base = topa_pfn(topa);
TOPA_ENTRY(last, -1)->end = 1;
 }
 
@@ -739,8 +741,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
struct topa_page *tp = topa_to_page(topa);
int i;
 
-   pr_debug("# table @%p (%016Lx), off %llx size %zx\n", tp->table,
-topa->phys, topa->offset, topa->size);
+   pr_debug("# table @%p, off %llx size %zx\n", tp->table,
+topa->offset, topa->size);
for (i = 0; i < TENTS_PER_PAGE; i++) {
pr_debug("# entry @%p (%lx sz %u %c%c%c) raw=%16llx\n",
	 &tp->table[i],
@@ -1111,7 +1113,7 @@ static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
 
/* link last table to the first one, unless we're double buffering */
if (intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(buf->last, -1)->base = buf->first->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(buf->last, -1)->base = topa_pfn(buf->first);
TOPA_ENTRY(buf->last, -1)->end = 1;
}
 
-- 
2.23.0.rc1



[PATCH v0 4/6] perf/x86/intel/pt: Split ToPA metadata and page layout

2019-08-21 Thread Alexander Shishkin
PT uses page-sized ToPA tables: the table itself resides at the bottom of
the page, with driver-specific metadata taking up a few words at the top.
The split is currently calculated manually and needs to be redone
every time a field is added to or removed from the metadata structure.
Also, the 32-bit version can be made smaller.

By splitting the table and metadata into separate structures, we let the
compiler figure out the division of the page.

Signed-off-by: Alexander Shishkin 
---
 arch/x86/events/intel/pt.c | 93 --
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 15e7c11e3460..e7b0f8f4f1b0 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -545,16 +545,8 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
 }
 
-/*
- * Keep ToPA table-related metadata on the same page as the actual table,
- * taking up a few words from the top
- */
-
-#define TENTS_PER_PAGE (((PAGE_SIZE - 40) / sizeof(struct topa_entry)) - 1)
-
 /**
- * struct topa - page-sized ToPA table with metadata at the top
- * @table: actual ToPA table entries, as understood by PT hardware
+ * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
  * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
@@ -562,7 +554,6 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
-   struct topa_entry   table[TENTS_PER_PAGE];
	struct list_head	list;
u64 phys;
u64 offset;
@@ -570,8 +561,39 @@ struct topa {
int last;
 };
 
+/*
+ * Keep ToPA table-related metadata on the same page as the actual table,
+ * taking up a few words from the top
+ */
+
+#define TENTS_PER_PAGE \
+   ((PAGE_SIZE - sizeof(struct topa)) / sizeof(struct topa_entry))
+
+/**
+ * struct topa_page - page-sized ToPA table with metadata at the top
+ * @table: actual ToPA table entries, as understood by PT hardware
+ * @topa:  metadata
+ */
+struct topa_page {
+   struct topa_entry   table[TENTS_PER_PAGE];
+   struct topa topa;
+};
+
+static inline struct topa_page *topa_to_page(struct topa *topa)
+{
+   return container_of(topa, struct topa_page, topa);
+}
+
+static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
+{
+   return (struct topa_page *)((unsigned long)te & PAGE_MASK);
+}
+
 /* make -1 stand for the last table entry */
-#define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY(t, i)   \
+   ((i) == -1  \
+   ? &topa_to_page(t)->table[(t)->last]\
+   : &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
@@ -584,27 +606,27 @@ struct topa {
 static struct topa *topa_alloc(int cpu, gfp_t gfp)
 {
int node = cpu_to_node(cpu);
-   struct topa *topa;
+   struct topa_page *tp;
struct page *p;
 
p = alloc_pages_node(node, gfp | __GFP_ZERO, 0);
if (!p)
return NULL;
 
-   topa = page_address(p);
-   topa->last = 0;
-   topa->phys = page_to_phys(p);
+   tp = page_address(p);
+   tp->topa.last = 0;
+   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of singe-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(topa, 1)->base = topa->phys >> TOPA_SHIFT;
-   TOPA_ENTRY(topa, 1)->end = 1;
+   TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys;
+   TOPA_ENTRY(&tp->topa, 1)->end = 1;
}
 
-   return topa;
+   return &tp->topa;
 }
 
 /**
@@ -714,22 +736,23 @@ static void pt_topa_dump(struct pt_buffer *buf)
struct topa *topa;
 
list_for_each_entry(topa, >tables, list) {
+   struct topa_page *tp = topa_to_page(topa);
int i;
 
-   pr_debug("# table @%p (%016Lx), off %llx size %zx\n", 
topa->table,
+   pr_debug("# table @%p (%016Lx), off %llx size %zx\n", tp->table,
 topa->phys, topa->offset, topa->size);
for (i = 0; i < TENTS_PER_PAGE; i++) {
pr_debug("# entry @%p (%lx sz %u %c%c%c) raw=%16llx\n",
-	 &topa->table[i],
-(unsigned long)topa->table[i].base

[GIT PULL v1 0/4] stm class/intel_th: Fixes for v5.3

2019-08-21 Thread Alexander Shishkin
Hi Greg,

These are the fixes that I have for v5.3. One is an actual bugfix that's
copied to stable, one SPDX header fix and two new PCI IDs, copied to
stable as well. Signed tag below, individual patches follow. Please
consider applying or pulling. Thanks!

Alexander Shishkin (2):
  intel_th: pci: Add support for another Lewisburg PCH
  intel_th: pci: Add Tiger Lake support

Ding Xiang (1):
  stm class: Fix a double free of stm_source_device

Nishad Kamdar (1):
  intel_th: Use the correct style for SPDX License Identifier

 drivers/hwtracing/intel_th/msu.h |  2 +-
 drivers/hwtracing/intel_th/pci.c | 10 ++
 drivers/hwtracing/intel_th/pti.h |  2 +-
 drivers/hwtracing/stm/core.c |  1 -
 4 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.23.0.rc1



[GIT PULL v1 3/4] intel_th: pci: Add support for another Lewisburg PCH

2019-08-21 Thread Alexander Shishkin
Add support for the Trace Hub in another Lewisburg PCH.

Signed-off-by: Alexander Shishkin 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index c0378c3de9a4..5c4e4fbec936 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -164,6 +164,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa1a6),
.driver_data = (kernel_ulong_t)0,
},
+   {
+   /* Lewisburg PCH */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa226),
+   .driver_data = (kernel_ulong_t)0,
+   },
{
/* Gemini Lake */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x318e),
-- 
2.23.0.rc1



[GIT PULL v1 4/4] intel_th: pci: Add Tiger Lake support

2019-08-21 Thread Alexander Shishkin
This adds support for the Trace Hub in Tiger Lake PCH.

Signed-off-by: Alexander Shishkin 
Cc: sta...@vger.kernel.org # v4.14+
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 5c4e4fbec936..91dfeba62485 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -204,6 +204,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x45c5),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Tiger Lake PCH */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa0a6),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.23.0.rc1



[GIT PULL v1 1/4] stm class: Fix a double free of stm_source_device

2019-08-21 Thread Alexander Shishkin
From: Ding Xiang 

In the error path of stm_source_register_device(), the kfree is
unnecessary, as the put_device() before it ends up calling
stm_source_device_release() to free stm_source_device, leading to
a double free at the outer kfree() call. Remove it.

Signed-off-by: Ding Xiang 
Signed-off-by: Alexander Shishkin 
Fixes: 7bd1d4093c2fa ("stm class: Introduce an abstraction for System Trace Module devices")
Link: https://lore.kernel.org/linux-arm-kernel/1563354988-23826-1-git-send-email-dingxi...@cmss.chinamobile.com/
Cc: sta...@vger.kernel.org # v4.4+
---
 drivers/hwtracing/stm/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index e55b902560de..181e7ff1ec4f 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -1276,7 +1276,6 @@ int stm_source_register_device(struct device *parent,
 
 err:
	put_device(&src->dev);
-   kfree(src);
 
return err;
 }
-- 
2.23.0.rc1



[GIT PULL v1 2/4] intel_th: Use the correct style for SPDX License Identifier

2019-08-21 Thread Alexander Shishkin
From: Nishad Kamdar 

This patch corrects the SPDX License Identifier style
in header files related to Drivers for Intel(R) Trace Hub
controller.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46

Suggested-by: Joe Perches 
Signed-off-by: Nishad Kamdar 
Signed-off-by: Alexander Shishkin 
Fixes: 50352fa730328 ("intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate")
Link: https://lore.kernel.org/lkml/20190726142840.GA4301@nishad/
---
 drivers/hwtracing/intel_th/msu.h | 2 +-
 drivers/hwtracing/intel_th/pti.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/intel_th/msu.h b/drivers/hwtracing/intel_th/msu.h
index 574c16004cb2..13d9b141daaa 100644
--- a/drivers/hwtracing/intel_th/msu.h
+++ b/drivers/hwtracing/intel_th/msu.h
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Intel(R) Trace Hub Memory Storage Unit (MSU) data structures
  *
diff --git a/drivers/hwtracing/intel_th/pti.h b/drivers/hwtracing/intel_th/pti.h
index e9381babc84c..7dfc0431333b 100644
--- a/drivers/hwtracing/intel_th/pti.h
+++ b/drivers/hwtracing/intel_th/pti.h
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Intel(R) Trace Hub PTI output data structures
  *
-- 
2.23.0.rc1



Re: [GIT PULL 3/4] intel_th: pci: Add support for another Lewisburg PCH

2019-08-20 Thread Alexander Shishkin
Greg Kroah-Hartman  writes:

> On Tue, Aug 20, 2019 at 01:16:52PM +0300, Alexander Shishkin wrote:
>> Add support for the Trace Hub in another Lewisburg PCH.
>> 
>> Signed-off-by: Alexander Shishkin 
>> ---
>>  drivers/hwtracing/intel_th/pci.c | 5 +
>>  1 file changed, 5 insertions(+)
>
> same here, ok for stable?

Yes for both. Should I resend?

Regards,
--
Alex


[GIT PULL 2/4] intel_th: Use the correct style for SPDX License Identifier

2019-08-20 Thread Alexander Shishkin
From: Nishad Kamdar 

This patch corrects the SPDX License Identifier style
in header files related to Drivers for Intel(R) Trace Hub
controller.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used)

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46

Suggested-by: Joe Perches 
Signed-off-by: Nishad Kamdar 
Signed-off-by: Alexander Shishkin 
Fixes: 50352fa730328 ("intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate")
Link: https://lore.kernel.org/lkml/20190726142840.GA4301@nishad/
---
 drivers/hwtracing/intel_th/msu.h | 2 +-
 drivers/hwtracing/intel_th/pti.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/intel_th/msu.h b/drivers/hwtracing/intel_th/msu.h
index 574c16004cb2..13d9b141daaa 100644
--- a/drivers/hwtracing/intel_th/msu.h
+++ b/drivers/hwtracing/intel_th/msu.h
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Intel(R) Trace Hub Memory Storage Unit (MSU) data structures
  *
diff --git a/drivers/hwtracing/intel_th/pti.h b/drivers/hwtracing/intel_th/pti.h
index e9381babc84c..7dfc0431333b 100644
--- a/drivers/hwtracing/intel_th/pti.h
+++ b/drivers/hwtracing/intel_th/pti.h
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Intel(R) Trace Hub PTI output data structures
  *
-- 
2.23.0.rc1



[GIT PULL 0/4] stm class/intel_th: Fixes for v5.3

2019-08-20 Thread Alexander Shishkin
Hi Greg,

These are the fixes that I have for v5.3. One is an actual bugfix that's
copied to stable, one SPDX header fix and two new PCI IDs. Signed tag
below, individual patches follow. Please consider applying or pulling.
Thanks!

The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b:

  Linux 5.3-rc1 (2019-07-21 14:05:38 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm.git tags/stm-intel_th-fixes-for-greg-20190820

for you to fetch changes up to 1114218bb49423c2ed6767f0f385f994da71c068:

  intel_th: pci: Add Tiger Lake support (2019-08-20 13:02:59 +0300)


stm class/intel_th: Fixes for v5.3

These are:
  * Double-free fix, going back to v4.4.
  * SPDX license header fix.
  * 2 new PCI IDs.


Alexander Shishkin (2):
  intel_th: pci: Add support for another Lewisburg PCH
  intel_th: pci: Add Tiger Lake support

Ding Xiang (1):
  stm class: Fix a double free of stm_source_device

Nishad Kamdar (1):
  intel_th: Use the correct style for SPDX License Identifier

 drivers/hwtracing/intel_th/msu.h |  2 +-
 drivers/hwtracing/intel_th/pci.c | 10 ++
 drivers/hwtracing/intel_th/pti.h |  2 +-
 drivers/hwtracing/stm/core.c |  1 -
 4 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.23.0.rc1



[GIT PULL 4/4] intel_th: pci: Add Tiger Lake support

2019-08-20 Thread Alexander Shishkin
This adds support for the Trace Hub in Tiger Lake PCH.

Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index 5c4e4fbec936..91dfeba62485 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -204,6 +204,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x45c5),
.driver_data = (kernel_ulong_t)&intel_th_2x,
},
+   {
+   /* Tiger Lake PCH */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa0a6),
+   .driver_data = (kernel_ulong_t)&intel_th_2x,
+   },
{ 0 },
 };
 
-- 
2.23.0.rc1



[GIT PULL 3/4] intel_th: pci: Add support for another Lewisburg PCH

2019-08-20 Thread Alexander Shishkin
Add support for the Trace Hub in another Lewisburg PCH.

Signed-off-by: Alexander Shishkin 
---
 drivers/hwtracing/intel_th/pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c
index c0378c3de9a4..5c4e4fbec936 100644
--- a/drivers/hwtracing/intel_th/pci.c
+++ b/drivers/hwtracing/intel_th/pci.c
@@ -164,6 +164,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = {
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa1a6),
.driver_data = (kernel_ulong_t)0,
},
+   {
+   /* Lewisburg PCH */
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa226),
+   .driver_data = (kernel_ulong_t)0,
+   },
{
/* Gemini Lake */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x318e),
-- 
2.23.0.rc1



[GIT PULL 1/4] stm class: Fix a double free of stm_source_device

2019-08-20 Thread Alexander Shishkin
From: Ding Xiang 

In the error path of stm_source_register_device(), the kfree is
unnecessary, as the put_device() before it ends up calling
stm_source_device_release() to free stm_source_device, leading to
a double free at the outer kfree() call. Remove it.

Signed-off-by: Ding Xiang 
Signed-off-by: Alexander Shishkin 
Fixes: 7bd1d4093c2fa ("stm class: Introduce an abstraction for System Trace 
Module devices")
Link: https://lore.kernel.org/linux-arm-kernel/1563354988-23826-1-git-send-email-dingxi...@cmss.chinamobile.com/
Cc: sta...@vger.kernel.org # v4.4+
---
 drivers/hwtracing/stm/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index e55b902560de..181e7ff1ec4f 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -1276,7 +1276,6 @@ int stm_source_register_device(struct device *parent,
 
 err:
	put_device(&src->dev);
-   kfree(src);
 
return err;
 }
-- 
2.23.0.rc1



[tip:perf/core] perf record: Add an option to take an AUX snapshot on exit

2019-08-15 Thread tip-bot for Alexander Shishkin
Commit-ID:  ce7b0e426ef359ee1d4a6126314ee3547a8eed87
Gitweb: https://git.kernel.org/tip/ce7b0e426ef359ee1d4a6126314ee3547a8eed87
Author: Alexander Shishkin 
AuthorDate: Tue, 6 Aug 2019 17:41:01 +0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 14 Aug 2019 10:59:59 -0300

perf record: Add an option to take an AUX snapshot on exit

It is sometimes useful to generate a snapshot when perf record exits;
I've been using a wrapper script around the workload that would do a
killall -USR2 perf when the workload exits.

This patch makes it easier and also works when perf record is attached
to a pre-existing task. A new snapshot option 'e' can be specified in
-S to enable this behavior:

root@elsewhere:~# perf record -e intel_pt// -Se sleep 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.085 MB perf.data ]

Co-developed-by: Adrian Hunter 
Signed-off-by: Alexander Shishkin 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20190806144101.62892-1-alexander.shish...@linux.intel.com
[ Fixed up !HAVE_AUXTRACE_SUPPORT build in builtin-record.c, adding 2 missing __maybe_unused ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-record.txt | 11 +++---
 tools/perf/builtin-record.c  | 35 
 tools/perf/perf.h|  1 +
 tools/perf/util/auxtrace.c   | 14 +++--
 tools/perf/util/auxtrace.h   |  2 +-
 5 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 15e0fa87241b..d5e58e0a2bca 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -422,9 +422,14 @@ CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.
 -S::
 --snapshot::
 Select AUX area tracing Snapshot Mode. This option is valid only with an
-AUX area tracing event. Optionally the number of bytes to capture per
-snapshot can be specified. In Snapshot Mode, trace data is captured only when
-signal SIGUSR2 is received.
+AUX area tracing event. Optionally, certain snapshot capturing parameters
+can be specified in a string that follows this option:
+  'e': take one last snapshot on exit; guarantees that there is at least one
+   snapshot in the output file;
+  : if the PMU supports this, specify the desired snapshot size.
+
+In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
+and on exit if the above 'e' option is given.
 
 --proc-map-timeout::
 When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d31d7a5a1be3..f71631f2bcb5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -613,19 +613,35 @@ out:
return rc;
 }
 
-static void record__read_auxtrace_snapshot(struct record *rec)
+static void record__read_auxtrace_snapshot(struct record *rec, bool on_exit)
 {
pr_debug("Recording AUX area tracing snapshot\n");
if (record__auxtrace_read_snapshot_all(rec) < 0) {
		trigger_error(&auxtrace_snapshot_trigger);
} else {
-   if (auxtrace_record__snapshot_finish(rec->itr))
+   if (auxtrace_record__snapshot_finish(rec->itr, on_exit))
			trigger_error(&auxtrace_snapshot_trigger);
else
			trigger_ready(&auxtrace_snapshot_trigger);
}
 }
 
+static int record__auxtrace_snapshot_exit(struct record *rec)
+{
+   if (trigger_is_error(&auxtrace_snapshot_trigger))
+   return 0;
+
+   if (!auxtrace_record__snapshot_started &&
+   auxtrace_record__snapshot_start(rec->itr))
+   return -1;
+
+   record__read_auxtrace_snapshot(rec, true);
+   if (trigger_is_error(&auxtrace_snapshot_trigger))
+   return -1;
+
+   return 0;
+}
+
 static int record__auxtrace_init(struct record *rec)
 {
int err;
@@ -654,7 +670,8 @@ int record__auxtrace_mmap_read(struct record *rec __maybe_unused,
 }
 
 static inline
-void record__read_auxtrace_snapshot(struct record *rec __maybe_unused)
+void record__read_auxtrace_snapshot(struct record *rec __maybe_unused,
+   bool on_exit __maybe_unused)
 {
 }
 
@@ -664,6 +681,12 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
return 0;
 }
 
+static inline
+int record__auxtrace_snapshot_exit(struct record *rec __maybe_unused)
+{
+   return 0;
+}
+
 static int record__auxtrace_init(struct record *rec __maybe_unused)
 {
return 0;
@@ -1536,7 +1559,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (auxtrace_record__snapshot_started) {
auxtrace_record__snapshot_started = 0;
		if (!trigger_is_error(&auxtrace_snapshot_trigger))
-   

Re: [PATCH v6 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT

2019-08-13 Thread Alexander Shishkin
Arnaldo Carvalho de Melo  writes:

> I've just blindly followed the provided documentation :)

Yes, I should have checked it also before I sent it out. :)

> So you say I should have tried this instead:
>
>   # perf record -c 1 -e '{intel_pt/branch=0/,cycles/aux-output/ppp}' uname

Right. For the purposes of illustrating the error condition, you can
probably drop the '-c ...' and 'branch=0' also, but either way is fine.

>   Error:
>   The 'aux_output' feature is not supported, update the kernel.

Or it's not supported by the hardware. I don't think we make a
distinction at the moment. You can tell if it's available from dmesg,
but not otherwise.

>   #
>
> Or with leader sampling?
>
>   # perf record -c 1 -e '{intel_pt/branch=0/,cycles/aux-output/ppp}:S' 
> uname

Not sure if we should even allow this. Maybe Adrian can chime in.

Thanks,
--
Alex


Re: [PATCH v6 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT

2019-08-13 Thread Alexander Shishkin
Arnaldo Carvalho de Melo  writes:

> Em Tue, Aug 06, 2019 at 11:46:06AM +0300, Alexander Shishkin escreveu:
>> From: Adrian Hunter 
>> 
>> Document how to select PEBS via Intel PT and how to display synthesized
>> PEBS samples.
>> 
>> Signed-off-by: Adrian Hunter 
>> Signed-off-by: Alexander Shishkin 
>> ---
>>  tools/perf/Documentation/intel-pt.txt | 15 +++
>>  1 file changed, 15 insertions(+)
>> 
>> diff --git a/tools/perf/Documentation/intel-pt.txt 
>> b/tools/perf/Documentation/intel-pt.txt
>> index 50c5b60101bd..8dc513b6607b 100644
>> --- a/tools/perf/Documentation/intel-pt.txt
>> +++ b/tools/perf/Documentation/intel-pt.txt
>> @@ -919,3 +919,18 @@ amended to take the number of elements as a parameter.
>>  
>>  Note there is currently no advantage to using Intel PT instead of LBR, but
>>  that may change in the future if greater use is made of the data.
>> +
>> +
>> +PEBS via Intel PT
>> +=
>> +
>> +Some hardware has the feature to redirect PEBS records to the Intel PT 
>> trace.
>> +Recording is selected by using the aux-output config term e.g.
>> +
>> +perf record  -c 1 -e cycles/aux-output/ppp -e intel_pt/branch=0/ 
>> uname
>> +
>> +Note that currently, software only supports redirecting at most one PEBS 
>> event.
>
> So, with these patches, but not the kernel ones I end up getting:
>
> [root@quaco ~]# perf record  -c 1 -e cycles/aux-output/ppp -e 
> intel_pt/branch=0/ uname

FWIW, the correct command line for that would have the two events
grouped and intel_pt be the group leader.

Regards,
--
Alex


Re: [PATCH v1 4/6] perf: Allow using AUX data in perf samples

2019-08-09 Thread Alexander Shishkin
Peter Zijlstra  writes:

> On Tue, Jun 19, 2018 at 01:47:25PM +0300, Alexander Shishkin wrote:
>> Right, the SW stuff may then race with event_function_call() stuff. Hmm.
>> For the HW stuff, I'm hoping that some kind of a sleight of hand may
>> suffice. Let me think some more.
>
> I currently don't see how the SW driven snapshot can ever work, see my
> comment on the last patch.

Yes, this should be solved by grouping, similarly to PEBS->PT.

>> > Why can't you just snapshot the current location and let the thing
>> > 'run' ?
>> 
>> Because the buffer will overwrite itself and the location will be useless.
>
> Not if it's large enough ;-)
>
>> We don't write the AUX data out in this 'mode' at all, only the samples,
>> which allows for much less data in the resulting perf.data, less work for
>> the consumer, less IO bandwidth etc, and as a bonus, no AUX-related
>> interrupts.
>> 
>> But actually, even to snapshot the location we need to stop the event.
>
> Maybe new methods that should only be called from NMI context? 

Right, I went with that and added a ->snapshot_aux() method that can't
change any states or do anything that would affect the operation of
potentially preempted in-IRQ callbacks. So, if NMI hits in the middle of
->stop() or whatnot, we should be fine. Some context: in AUX overwrite
mode (aka "snapshot mode") we don't get PT related NMIs, the only NMIs
are from the HW events for which we're writing samples. We use the cpu
local ->handle_nmi (a kind of misnomer) to tell us if the NMI hit
between ->start() and ->stop() and we can safely take the sample.

The other problem is sampling SW events, that would require a ctx->lock
to prevent racing with event_function_call()s from other cpus, resulting
in somewhat cringy "if (!in_nmi()) raw_spin_lock(...)", but I don't have
better idea as to how to handle that.

The whole thing is squashed into one patch below for convenience.

>From df0b5efd9c24a9e4477a3501331888c4b4682588 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin 
Date: Tue, 12 Jun 2018 10:51:15 +0300
Subject: [RFC v0] perf, pt: Allow using AUX data in perf samples

---
 arch/x86/events/intel/pt.c  |  40 +
 include/linux/perf_event.h  |  15 
 include/uapi/linux/perf_event.h |   7 +-
 kernel/events/core.c| 139 +++-
 kernel/events/internal.h|   1 +
 5 files changed, 200 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index d58124d93e5f..a7318c29242e 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1413,6 +1413,45 @@ static void pt_event_stop(struct perf_event *event, int mode)
}
 }
 
+static long pt_event_snapshot_aux(struct perf_event *event,
+ struct perf_output_handle *handle,
+ unsigned long size)
+{
+   struct pt *pt = this_cpu_ptr(&pt_ctx);
+   struct pt_buffer *buf = perf_get_aux(&pt->handle);
+   unsigned long from = 0, to;
+   long ret;
+
+   if (!buf->snapshot)
+   return 0;
+
+   /*
+* Here, handle_nmi tells us if the tracing is on
+*/
+   if (!READ_ONCE(pt->handle_nmi))
+   return 0;
+
+   pt_config_stop(event);
+
+   pt_read_offset(buf);
+   pt_update_head(pt);
+
+   to = local_read(&buf->data_size);
+   if (to < size)
+   from = buf->nr_pages << PAGE_SHIFT;
+   from += to - size;
+
+   ret = perf_output_copy_aux(&pt->handle, handle, from, to);
+
+   /*
+* If the tracing was on, restart it.
+*/
+   if (READ_ONCE(pt->handle_nmi))
+   pt_config(event);
+
+   return ret;
+}
+
 static void pt_event_del(struct perf_event *event, int mode)
 {
pt_event_stop(event, PERF_EF_UPDATE);
@@ -1532,6 +1571,7 @@ static __init int pt_init(void)
pt_pmu.pmu.del   = pt_event_del;
pt_pmu.pmu.start = pt_event_start;
pt_pmu.pmu.stop  = pt_event_stop;
+   pt_pmu.pmu.snapshot_aux  = pt_event_snapshot_aux;
pt_pmu.pmu.read  = pt_event_read;
pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
pt_pmu.pmu.free_aux  = pt_buffer_free_aux;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9c8b70e8dc99..48ad35da73cd 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -103,6 +103,10 @@ struct perf_branch_stack {
	struct perf_branch_entry	entries[0];
 };
 
+struct perf_aux_record {
+   u64 size;
+};
+
 struct task_struct;
 
 /*
@@ -248,6 +252,8 @@ struct perf_event;
 #define PERF_PMU_CAP_NO_EXCLUDE        0x80
 #define PERF_PMU_CAP_AUX_OUTPUT 

[PATCH v1] perf record: Add an option to take an AUX snapshot on exit

2019-08-06 Thread Alexander Shishkin
It is sometimes useful to generate a snapshot when perf record exits;
I've been using a wrapper script around the workload that would do a
killall -USR2 perf when the workload exits.

This patch makes it easier and also works when perf record is attached
to a pre-existing task. A new snapshot option 'e' can be specified in
-S to enable this behavior:

root@elsewhere:~# perf record -e intel_pt// -Se sleep 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.085 MB perf.data ]

Signed-off-by: Alexander Shishkin 
Co-developed-by: Adrian Hunter 
---
 tools/perf/Documentation/perf-record.txt | 11 +---
 tools/perf/builtin-record.c  | 34 +---
 tools/perf/perf.h|  1 +
 tools/perf/util/auxtrace.c   | 14 --
 tools/perf/util/auxtrace.h   |  2 +-
 5 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 15e0fa87241b..d5e58e0a2bca 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -422,9 +422,14 @@ CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.
 -S::
 --snapshot::
 Select AUX area tracing Snapshot Mode. This option is valid only with an
-AUX area tracing event. Optionally the number of bytes to capture per
-snapshot can be specified. In Snapshot Mode, trace data is captured only when
-signal SIGUSR2 is received.
+AUX area tracing event. Optionally, certain snapshot capturing parameters
+can be specified in a string that follows this option:
+  'e': take one last snapshot on exit; guarantees that there is at least one
+   snapshot in the output file;
+  <size>: if the PMU supports this, specify the desired snapshot size.
+
+In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
+and on exit if the above 'e' option is given.
 
 --proc-map-timeout::
 When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d31d7a5a1be3..e9a2525ecfcc 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -613,19 +613,35 @@ static int record__auxtrace_read_snapshot_all(struct record *rec)
return rc;
 }
 
-static void record__read_auxtrace_snapshot(struct record *rec)
+static void record__read_auxtrace_snapshot(struct record *rec, bool on_exit)
 {
pr_debug("Recording AUX area tracing snapshot\n");
if (record__auxtrace_read_snapshot_all(rec) < 0) {
trigger_error(&auxtrace_snapshot_trigger);
} else {
-   if (auxtrace_record__snapshot_finish(rec->itr))
+   if (auxtrace_record__snapshot_finish(rec->itr, on_exit))
trigger_error(&auxtrace_snapshot_trigger);
else
trigger_ready(&auxtrace_snapshot_trigger);
}
 }
 
+static int record__auxtrace_snapshot_exit(struct record *rec)
+{
+   if (trigger_is_error(&auxtrace_snapshot_trigger))
+   return 0;
+
+   if (!auxtrace_record__snapshot_started &&
+   auxtrace_record__snapshot_start(rec->itr))
+   return -1;
+
+   record__read_auxtrace_snapshot(rec, true);
+   if (trigger_is_error(&auxtrace_snapshot_trigger))
+   return -1;
+
+   return 0;
+}
+
 static int record__auxtrace_init(struct record *rec)
 {
int err;
@@ -654,7 +670,7 @@ int record__auxtrace_mmap_read(struct record *rec __maybe_unused,
 }
 
 static inline
-void record__read_auxtrace_snapshot(struct record *rec __maybe_unused)
+void record__read_auxtrace_snapshot(struct record *rec __maybe_unused, bool on_exit)
 {
 }
 
@@ -664,6 +680,12 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
return 0;
 }
 
+static inline
+int record__auxtrace_snapshot_exit(struct record *rec)
+{
+   return 0;
+}
+
 static int record__auxtrace_init(struct record *rec __maybe_unused)
 {
return 0;
@@ -1536,7 +1558,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (auxtrace_record__snapshot_started) {
auxtrace_record__snapshot_started = 0;
if (!trigger_is_error(&auxtrace_snapshot_trigger))
-   record__read_auxtrace_snapshot(rec);
+   record__read_auxtrace_snapshot(rec, false);
if (trigger_is_error(&auxtrace_snapshot_trigger)) {
pr_err("AUX area tracing snapshot failed\n");
err = -1;
@@ -1609,9 +1631,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
disabled = true;
}
}
+
trigger_off(&auxtrace_snapshot_trigger);
trigger_off(&switch_output_trigger);
 
+   if (opts->auxtrace_snapshot_on_exit)
+  

Re: [PATCH] perf record: Add an option to take an AUX snapshot on exit

2019-08-06 Thread Alexander Shishkin
Alexander Shishkin  writes:

> It is sometimes useful to generate a snapshot when perf record exits;
> I've been using a wrapper script around the workload that would do a
> killall -USR2 perf when the workload exits.
>
> This patch makes it easier and also works when perf record is attached
> to a pre-existing task. A new snapshot option 'e' can be specified in
> -S to enable this behavior:
>
> root@elsewhere:~# perf record -e intel_pt// -Se sleep 1
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.085 MB perf.data ]
>
> Signed-off-by: Alexander Shishkin 
> ---
>  tools/perf/Documentation/perf-record.txt | 11 ---
>  tools/perf/builtin-record.c  |  4 
>  tools/perf/perf.h|  1 +
>  tools/perf/util/auxtrace.c   | 10 ++
>  4 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index ea3789d05e..164ffce680 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -387,9 +387,14 @@ CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.
>  -S::
>  --snapshot::
>  Select AUX area tracing Snapshot Mode. This option is valid only with an
> -AUX area tracing event. Optionally the number of bytes to capture per
> -snapshot can be specified. In Snapshot Mode, trace data is captured only when
> -signal SIGUSR2 is received.
> +AUX area tracing event. Optionally, certain snapshot capturing parameters
> +can be specified in a string that follows this option:
> +  'e': take one last snapshot on exit; guarantees that there is at least one
> +   snapshot in the output file;
> +  <size>: if the PMU supports this, specify the desired snapshot size.
> +
> +In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
> +and on exit if the above 'e' option is given.
>  
>  --proc-map-timeout::
>  When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 70340ff200..42d1b7aeee 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1136,6 +1136,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>   disabled = true;
>   }
>   }
> +
> + if (opts->auxtrace_snapshot_on_exit)
> + record__read_auxtrace_snapshot(rec);
> +
>   trigger_off(&auxtrace_snapshot_trigger);
>   trigger_off(&switch_output_trigger);
>  
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index 806c216a10..b79c57485b 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -50,6 +50,7 @@ struct record_opts {
>   bool running_time;
>   bool full_auxtrace;
>   bool auxtrace_snapshot_mode;
> + bool auxtrace_snapshot_on_exit;
>   bool record_namespaces;
>   bool record_switch_events;
>   bool all_kernel;
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 0daf63b9ee..25ad4445b0 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -564,6 +564,16 @@ int auxtrace_parse_snapshot_options(struct auxtrace_record *itr,
>   if (!str)
>   return 0;
>  
> + /* PMU-agnostic options */
> + switch (*str) {
> + case 'e':
> + opts->auxtrace_snapshot_on_exit = true;
> + str++;
> + break;
> + default:
> + break;
> + }
> +
>   if (itr)
>   return itr->parse_snapshot_options(itr, opts, str);
>  
> -- 
> 2.11.0

