[PATCHv2 0/4] perf stat: Enable group read of counters
hi,
sending changes to enable group read of perf counters
for the perf stat command. It allows us to read a whole
group of counters within a single read syscall.

v2 changes:
  - fixed release segfault reported by Arnaldo
  - rebased to latest Arnaldo's perf/core
  - patch 1 already merged in

Also available in here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/stat_group

Not sure why we haven't supported this yet, but anyway it was
unavailable for some time due to a bug which was fixed just
recently via:
  ba5213ae6b88 ("perf/core: Correct event creation with PERF_FORMAT_GROUP")

thanks,
jirka

---
Jiri Olsa (3):
  perf tools: Add perf_evsel__read_size function
  perf tools: Add perf_evsel__read_counter function
  perf stat: Use group read for event groups

 tools/perf/builtin-stat.c | 30 ---
 tools/perf/util/counts.h  | 1 +
 tools/perf/util/evsel.c   | 139 -
 tools/perf/util/evsel.h   | 2 ++
 tools/perf/util/stat.c    | 4 +++
 tools/perf/util/stat.h    | 5 ++--
 6 files changed, 175 insertions(+), 6 deletions(-)
[PATCH 1/3] perf tools: Add perf_evsel__read_size function
Currently we use the size of struct perf_counts_values
to read the event, which prevents us from adding any new
member to the struct.

Adding perf_evsel__read_size to return the size of the
buffer needed for the event read.

Link: http://lkml.kernel.org/n/tip-cfc3dmil3tlzezzxtyi9f...@git.kernel.org
Signed-off-by: Jiri Olsa
---
 tools/perf/util/evsel.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 450b5fadf8cb..4dd0fcc06db9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1261,15 +1261,42 @@ void perf_counts_values__scale(struct perf_counts_values *count,
 		*pscaled = scaled;
 }
 
+static int perf_evsel__read_size(struct perf_evsel *evsel)
+{
+	u64 read_format = evsel->attr.read_format;
+	int entry = sizeof(u64); /* value */
+	int size = 0;
+	int nr = 1;
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+		size += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+		size += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_ID)
+		entry += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_GROUP) {
+		nr = evsel->nr_members;
+		size += sizeof(u64);
+	}
+
+	size += entry * nr;
+	return size;
+}
+
 int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 		     struct perf_counts_values *count)
 {
+	size_t size = perf_evsel__read_size(evsel);
+
 	memset(count, 0, sizeof(*count));
 
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
-	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) <= 0)
+	if (readn(FD(evsel, cpu, thread), count->values, size) <= 0)
 		return -errno;
 
 	return 0;
-- 
2.9.4
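For readers following the size computation in this patch, here is a minimal
stand-alone sketch of the same arithmetic. It is not the perf tool code: the
FMT_* names and read_buffer_size() are hypothetical, and the bit values are
copied from enum perf_event_read_format so the sketch builds without kernel
headers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit values mirror enum perf_event_read_format from <linux/perf_event.h>.
 * They are redefined under hypothetical FMT_* names so the sketch builds
 * without kernel headers.
 */
enum {
	FMT_TOTAL_TIME_ENABLED = 1U << 0,
	FMT_TOTAL_TIME_RUNNING = 1U << 1,
	FMT_ID                 = 1U << 2,
	FMT_GROUP              = 1U << 3,
};

/*
 * Hypothetical helper: number of bytes a read() on the event fd returns
 * for the given read_format and group size.
 */
static size_t read_buffer_size(uint64_t read_format, unsigned int nr_members)
{
	size_t entry = sizeof(uint64_t);	/* per-event value */
	size_t size = 0;
	unsigned int nr = 1;

	if (read_format & FMT_TOTAL_TIME_ENABLED)
		size += sizeof(uint64_t);
	if (read_format & FMT_TOTAL_TIME_RUNNING)
		size += sizeof(uint64_t);
	if (read_format & FMT_ID)
		entry += sizeof(uint64_t);	/* per-event id */
	if (read_format & FMT_GROUP) {
		nr = nr_members;
		size += sizeof(uint64_t);	/* leading nr field */
	}

	return size + entry * nr;
}
```

With only the two time fields set, this gives 24 bytes, which matches the
three u64 slots in struct perf_counts_values. A two-member group with IDs
needs 56 bytes, which is why the fixed struct size no longer fits.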
[PATCH 3/3] perf stat: Use group read for event groups
Make perf stat use group read if there are groups
defined. The group read will get the values for all
members of a group within a single syscall instead of
calling the read syscall for every event.

We can see considerably fewer kernel cycles spent
on a single group read than on reading each event
separately, like for the following perf stat command:

  # perf stat -e {cycles,instructions} -I 10 -a sleep 1

Monitored with "perf stat -r 5 -e '{cycles:u,cycles:k}'"

Before:

        24,325,676      cycles:u
       297,040,775      cycles:k

       1.038554134 seconds time elapsed

After:

        25,034,418      cycles:u
       158,256,395      cycles:k

       1.036864497 seconds time elapsed

The perf_evsel__open fallback changes contributed by Andi Kleen.

Link: http://lkml.kernel.org/n/tip-b6g8qarwvptr81cqdtfst...@git.kernel.org
Signed-off-by: Jiri Olsa
---
 tools/perf/builtin-stat.c | 30 +++---
 tools/perf/util/counts.h  | 1 +
 tools/perf/util/evsel.c   | 10 ++
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 48ac53b199fc..866da7aa54bf 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -213,10 +213,20 @@ static void perf_stat__reset_stats(void)
 static int create_perf_stat_counter(struct perf_evsel *evsel)
 {
 	struct perf_event_attr *attr = &evsel->attr;
+	struct perf_evsel *leader = evsel->leader;
 
-	if (stat_config.scale)
+	if (stat_config.scale) {
 		attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
 				    PERF_FORMAT_TOTAL_TIME_RUNNING;
+	}
+
+	/*
+	 * The event is part of non trivial group, let's enable
+	 * the group read (for leader) and ID retrieval for all
+	 * members.
+	 */
+	if (leader->nr_members > 1)
+		attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
 
 	attr->inherit = !no_inherit;
 
@@ -333,13 +343,21 @@ static int read_counter(struct perf_evsel *counter)
 			struct perf_counts_values *count;
 
 			count = perf_counts(counter->counts, cpu, thread);
-			if (perf_evsel__read(counter, cpu, thread, count)) {
+
+			/*
+			 * The leader's group read loads data into its group members
+			 * (via perf_evsel__read_counter) and sets their count->loaded.
+			 */
+			if (!count->loaded &&
+			    perf_evsel__read_counter(counter, cpu, thread)) {
 				counter->counts->scaled = -1;
 				perf_counts(counter->counts, cpu, thread)->ena = 0;
 				perf_counts(counter->counts, cpu, thread)->run = 0;
 				return -1;
 			}
 
+			count->loaded = false;
+
 			if (STAT_RECORD) {
 				if (perf_evsel__write_stat_event(counter, cpu, thread, count)) {
 					pr_err("failed to write stat event\n");
@@ -559,6 +577,11 @@ static int store_counter_ids(struct perf_evsel *counter)
 	return __store_counter_ids(counter, cpus, threads);
 }
 
+static bool perf_evsel__should_store_id(struct perf_evsel *counter)
+{
+	return STAT_RECORD || counter->attr.read_format & PERF_FORMAT_ID;
+}
+
 static int __run_perf_stat(int argc, const char **argv)
 {
 	int interval = stat_config.interval;
@@ -631,7 +654,8 @@ static int __run_perf_stat(int argc, const char **argv)
 		if (l > unit_width)
 			unit_width = l;
 
-		if (STAT_RECORD && store_counter_ids(counter))
+		if (perf_evsel__should_store_id(counter) &&
+		    store_counter_ids(counter))
 			return -1;
 	}
 
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 34d8baaf558a..cb45a6aecf9d 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -12,6 +12,7 @@ struct perf_counts_values {
 		};
 		u64 values[3];
 	};
+	bool	loaded;
 };
 
 struct perf_counts {
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 89aecf3a35c7..3735c9e0080d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -49,6 +49,7 @@ static struct {
 	bool clockid_wrong;
 	bool lbr_flags;
 	bool write_backward;
+	bool group_read;
 } perf_missing_features;
 
 static clockid_t clockid;
@@ -1321,6 +1322,7 @@ perf_evsel__set_count(struct perf_evsel *counter, int cpu, int thread,
 	count->val    = val;
 	count->ena    = ena;
 	count->run    = run;
+	count->loaded = true;
 }
 
 static int
@@ -1677,6 +1679,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 	if
[PATCH 2/3] perf tools: Add perf_evsel__read_counter function
Adding perf_evsel__read_counter function to read a single
counter or a group of counters. After calling this function
the counter's evsel::counts struct is filled with values for
the counter and the members of its group, if there are any.

Link: http://lkml.kernel.org/n/tip-itsuxdyt7rp4mvij1t6k7...@git.kernel.org
Signed-off-by: Jiri Olsa
---
 tools/perf/util/evsel.c | 100
 tools/perf/util/evsel.h | 2 +
 tools/perf/util/stat.c  | 4 ++
 tools/perf/util/stat.h  | 5 ++-
 4 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 4dd0fcc06db9..89aecf3a35c7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1302,6 +1302,106 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 	return 0;
 }
 
+static int
+perf_evsel__read_one(struct perf_evsel *evsel, int cpu, int thread)
+{
+	struct perf_counts_values *count = perf_counts(evsel->counts, cpu, thread);
+
+	return perf_evsel__read(evsel, cpu, thread, count);
+}
+
+static void
+perf_evsel__set_count(struct perf_evsel *counter, int cpu, int thread,
+		      u64 val, u64 ena, u64 run)
+{
+	struct perf_counts_values *count;
+
+	count = perf_counts(counter->counts, cpu, thread);
+
+	count->val    = val;
+	count->ena    = ena;
+	count->run    = run;
+}
+
+static int
+perf_evsel__process_group_data(struct perf_evsel *leader,
+			       int cpu, int thread, u64 *data)
+{
+	u64 read_format = leader->attr.read_format;
+	struct sample_read_value *v;
+	u64 nr, ena = 0, run = 0, i;
+
+	nr = *data++;
+
+	if (nr != (u64) leader->nr_members)
+		return -EINVAL;
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+		ena = *data++;
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+		run = *data++;
+
+	v = (struct sample_read_value *) data;
+
+	perf_evsel__set_count(leader, cpu, thread,
+			      v[0].value, ena, run);
+
+	for (i = 1; i < nr; i++) {
+		struct perf_evsel *counter;
+
+		counter = perf_evlist__id2evsel(leader->evlist, v[i].id);
+		if (!counter)
+			return -EINVAL;
+
+		perf_evsel__set_count(counter, cpu, thread,
+				      v[i].value, ena, run);
+	}
+
+	return 0;
+}
+
+static int
+perf_evsel__read_group(struct perf_evsel *leader, int cpu, int thread)
+{
+	struct perf_stat_evsel *ps = leader->priv;
+	u64 read_format = leader->attr.read_format;
+	int size = perf_evsel__read_size(leader);
+	u64 *data = ps->group_data;
+
+	if (!(read_format & PERF_FORMAT_ID))
+		return -EINVAL;
+
+	if (!perf_evsel__is_group_leader(leader))
+		return -EINVAL;
+
+	if (!data) {
+		data = zalloc(size);
+		if (!data)
+			return -ENOMEM;
+
+		ps->group_data = data;
+	}
+
+	if (FD(leader, cpu, thread) < 0)
+		return -EINVAL;
+
+	if (readn(FD(leader, cpu, thread), data, size) <= 0)
+		return -errno;
+
+	return perf_evsel__process_group_data(leader, cpu, thread, data);
+}
+
+int perf_evsel__read_counter(struct perf_evsel *evsel, int cpu, int thread)
+{
+	u64 read_format = evsel->attr.read_format;
+
+	if (read_format & PERF_FORMAT_GROUP)
+		return perf_evsel__read_group(evsel, cpu, thread);
+	else
+		return perf_evsel__read_one(evsel, cpu, thread);
+}
+
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 			      int cpu, int thread, bool scale)
 {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index fb40ca3c6519..de03c18daaf0 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -299,6 +299,8 @@ static inline bool perf_evsel__match2(struct perf_evsel *e1,
 int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 		     struct perf_counts_values *count);
 
+int perf_evsel__read_counter(struct perf_evsel *evsel, int cpu, int thread);
+
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 			      int cpu, int thread, bool scale);
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 53b9a994a3dc..35e9848734d6 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -128,6 +128,10 @@ static int perf_evsel__alloc_stat_priv(struct perf_evsel *evsel)
 
 static void perf_evsel__free_stat_priv(struct perf_evsel *evsel)
 {
+	struct perf_stat_evsel *ps = evsel->priv;
+
+	if (ps)
+		free(ps->group_data);
 	zfree(&evsel->priv);
 }
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 7522bf10b03e..eacaf958e19d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@
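The buffer layout that perf_evsel__process_group_data walks through can be
illustrated with a small stand-alone sketch. This is only an illustration
under stated assumptions, not the perf tool code: struct group_counts,
parse_group_read() and the FMT_* names are hypothetical, the bit values are
copied from enum perf_event_read_format, and the buffer is assumed to hold a
complete group read with both GROUP and ID set.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror enum perf_event_read_format; the names are hypothetical. */
enum {
	FMT_TOTAL_TIME_ENABLED = 1U << 0,
	FMT_TOTAL_TIME_RUNNING = 1U << 1,
	FMT_ID                 = 1U << 2,
	FMT_GROUP              = 1U << 3,
};

/* Hypothetical view of one parsed group read. */
struct group_counts {
	uint64_t nr;		/* number of events in the group */
	uint64_t ena;		/* time enabled, 0 if not requested */
	uint64_t run;		/* time running, 0 if not requested */
	const uint64_t *pairs;	/* nr pairs of { value, id } */
};

/*
 * Parse a buffer laid out as:
 *   nr, [time_enabled], [time_running], { value, id } * nr
 * Returns 0 on success, -1 if the format or member count does not match.
 */
static int parse_group_read(const uint64_t *data, uint64_t read_format,
			    uint64_t expected_nr, struct group_counts *out)
{
	if (!(read_format & FMT_GROUP) || !(read_format & FMT_ID))
		return -1;

	out->nr = *data++;
	if (out->nr != expected_nr)
		return -1;

	out->ena = 0;
	out->run = 0;
	if (read_format & FMT_TOTAL_TIME_ENABLED)
		out->ena = *data++;
	if (read_format & FMT_TOTAL_TIME_RUNNING)
		out->run = *data++;

	out->pairs = data;
	return 0;
}

/* Value of the i-th group member (i < nr). */
static uint64_t group_value(const struct group_counts *g, uint64_t i)
{
	return g->pairs[2 * i];
}

/* Event ID of the i-th group member (i < nr). */
static uint64_t group_id(const struct group_counts *g, uint64_t i)
{
	return g->pairs[2 * i + 1];
}
```

The ID in each pair is what lets the caller map a value back to the right
group member, which is why the patch refuses a group read without
PERF_FORMAT_ID.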
Re: [PATCH 1/3] arm/syscalls: Move address limit check in loop
On Tue, Jul 25, 2017 at 01:01:17PM -0700, Thomas Garnier wrote:
> On Tue, Jul 25, 2017 at 3:38 AM, Russell King - ARM Linux wrote:
> > On Tue, Jul 25, 2017 at 01:28:01PM +0300, Leonard Crestez wrote:
> >> On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote:
> >> > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier wrote:
> >> > >
> >> > > The work pending loop can call set_fs after addr_limit_user_check
> >> > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move
> >> > > the addr_limit_user_check call at the beginning of the loop.
> >> > >
> >> > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user-mode return")
> >> > > Reported-by: Leonard Crestez
> >> > > Signed-off-by: Thomas Garnier
> >> >
> >> > Any comments on this patch set?
> >>
> >> Tested-by: Leonard Crestez
> >>
> >> This appears to fix the original issue of failing to boot from NFS when
> >> there are lots of alignment faults. But this is a very basic test
> >> relative to the reach of this change.
> >>
> >> However the original patch has been in linux-next for a while and
> >> apparently nobody else noticed system calls randomly hanging on arm.
> >>
> >> I assume maintainers need to give their opinion.
> >
> > I've already stated my opinion, which is different from what Linus has
> > requested of Thomas. IMHO, the current approach is going to keep on
> > causing problems along the lines that I've already pointed out.
>
> I understand. Do you think this problem apply to arm64 as well?

It's probably less of an issue for arm64 because we don't take alignment
faults from the kernel and I think the perf case would resolve itself by
throttling the event. However, I also don't see the advantage of doing this
in the work loop as opposed to leaving it until we're actually doing the
return to userspace.

I looked to see what you've done for x86, but it looks like you check/clear
the flag before the work pending loop (exit_to_usermode_loop), which
subsequently re-enables interrupts and exits when EXIT_TO_USERMODE_LOOP_FLAGS
are all clear. Since TIF_FSCHECK isn't included in those flags, what stops
it being set again by an irq and remaining set for the return to userspace?

Will
Re: [PATCH] arm64: Convert to using %pOF instead of full_name
Hi Rob,

On Tue, Jul 25, 2017 at 07:27:29PM -0500, Rob Herring wrote:
> On Tue, Jul 25, 2017 at 7:04 AM, Will Deacon wrote:
> > On Tue, Jul 18, 2017 at 04:42:42PM -0500, Rob Herring wrote:
> >> Now that we have a custom printf format specifier, convert users of
> >> full_name to use %pOF instead. This is preparation to remove storing
> >> of the full path string for each node.
> >>
> >> Signed-off-by: Rob Herring
> >> Cc: Catalin Marinas
> >> Cc: Will Deacon
> >> Cc: linux-arm-ker...@lists.infradead.org
> >> ---
> >>  arch/arm64/kernel/cpu_ops.c  | 4 ++--
> >>  arch/arm64/kernel/smp.c      | 12 ++--
> >>  arch/arm64/kernel/topology.c | 22 +++---
> >>  3 files changed, 19 insertions(+), 19 deletions(-)
> >
> > I've queued this and the perf patch too, but it would be good if somebody
> > could update sparse to recognise this format specifier. Currently it
> > just complains about it.
>
> I'm happy to fix it, but I ran sparse and don't see any errors. Got a pointer?

I went back and checked again and it's not sparse that's warning, it's
actually smatch (sorry for getting that mixed up):

arch/arm64/kernel/cpu_ops.c:85 cpu_read_enable_method() error: unrecognized %p extension 'O', treated as normal %p [smatch]

Will
Re: [PATCH v4.4.y] sched/cgroup: Move sched_online_group() back into css_online() to fix crash
On Tue, 25 Jul, at 11:04:39AM, Greg KH wrote:
> On Thu, Jul 20, 2017 at 02:53:09PM +0100, Matt Fleming wrote:
> > From: Konstantin Khlebnikov
> >
> > commit 96b777452d8881480fd5be50112f791c17db4b6b upstream.
> >
> > Commit:
> >
> >   2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
> >
> > .. moved sched_online_group() from css_online() to css_alloc().
> > It exposes half-baked task group into global lists before initializing
> > generic cgroup stuff.
> >
> > LTP testcase (third in cgroup_regression_test) written for testing
> > similar race in kernels 2.6.26-2.6.28 easily triggers this oops:
> >
> >   BUG: unable to handle kernel NULL pointer dereference at 0008
> >   IP: kernfs_path_from_node_locked+0x260/0x320
> >   CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4
> >   Call Trace:
> >   ? kernfs_path_from_node+0x4f/0x60
> >   kernfs_path_from_node+0x3e/0x60
> >   print_rt_rq+0x44/0x2b0
> >   print_rt_stats+0x7a/0xd0
> >   print_cpu+0x2fc/0xe80
> >   ? __might_sleep+0x4a/0x80
> >   sched_debug_show+0x17/0x30
> >   seq_read+0xf2/0x3b0
> >   proc_reg_read+0x42/0x70
> >   __vfs_read+0x28/0x130
> >   ? security_file_permission+0x9b/0xc0
> >   ? rw_verify_area+0x4e/0xb0
> >   vfs_read+0xa5/0x170
> >   SyS_read+0x46/0xa0
> >   entry_SYSCALL_64_fastpath+0x1e/0xad
> >
> > Here the task group is already linked into the global RCU-protected
> > 'task_groups' list, but the css->cgroup pointer is still NULL.
> >
> > This patch reverts this chunk and moves online back to css_online().
> >
> > Signed-off-by: Konstantin Khlebnikov
> > Signed-off-by: Peter Zijlstra (Intel)
> > Cc: Linus Torvalds
> > Cc: Peter Zijlstra
> > Cc: Tejun Heo
> > Cc: Thomas Gleixner
> > Fixes: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
> > Link: http://lkml.kernel.org/r/148655324740.424917.5302984537258726349.stgit@buzz
> > Signed-off-by: Ingo Molnar
> > Signed-off-by: Matt Fleming
> > ---
> >  kernel/sched/core.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
>
> What about 4.9-stable, this should go there too, right?

Yes, good catch. Would you like me to send a separate patch?
[PATCH v5 0/4] ARM: dts: imx: add CX9020 Embedded PC device tree
From: Patrick Bruenn

The CX9020 differs from i.MX53 Quick Start Board by:
- use uart2 instead of uart1
- DVI-D connector instead of VGA
- no audio
- no SATA connector
- CCAT FPGA connected to emi
- enable rtc

v5:
- rebased on v4.13-rc2
- don't take maintainership for imx53-cx9020.dtsi, keep it to
  ARM/FREESCALE IMX maintainers
- add explicit pinmux settings for pwr leds (EIM_D22 - D24)
- remove display0->status="okay"
- use "regulator-vbus" as name for usb_vbus regulator node
- use correct reset values for explicit pinmux settings of:
    MX53_PAD_GPIO_0__CCM_CLKO
    MX53_PAD_GPIO_16__I2C3_SDA
    MX53_PAD_GPIO_1__ESDHC1_CD
    MX53_PAD_GPIO_3__GPIO1_3
    MX53_PAD_GPIO_8__GPIO1_8

v4:
- move alternative UART2 pinmux settings to imx53-pinfunc.h
- fix copyright notice and model name to clarify cx9020 is a Beckhoff
  board and not from Freescale/NXP/Qualcomm
- add "bhf,cx9020" compatible
- remove ccat node and pin configuration as long as the ccat driver is
  not mainlined
- use dvi-connector + ti,tfp410 instead of panel-simple
- add newlines between property list and child nodes
- replace underscores in node names with hyphens
- replace magic number 0 with polarity defines from
  include/dt-bindings/gpio/gpio.h
- move rtc node into imx53.dtsi, change its name to 'srtc', to avoid
  a conflict with 'rtc' node in imx53-m53.dtsi
- rename regulator-3p2v
- drop imx53-qsb container node
- make iomux configuration explicit
- remove unused audmux
- remove unused led_pin_gpio3_23 configuration
- use blue gpio-leds as disk-activity indicators for mmc0 and mmc1
- add mmc indicator leds to sdhc pingroups
- keep node names in alphabetical order
- remove unused sata and ssi2
- remove unused pin configs from hoggrp
- add entry for Beckhoff related files to MAINTAINERS

v3: add missing changelog

v2:
- keep alphabetic order of dts/Makefile
- configure uart2 with 'fsl,dte-mode'
- use display-0 and panel-0 as node names
- remove unnecessary "simple-bus" for fixed regulators

Patrick Bruenn (4):
  dt-bindings: arm: Add entry for Beckhoff CX9020
  ARM: dts: imx53: add srtc node
  ARM: dts: imx53: add alternative UART2 configuration
  ARM: dts: imx: add CX9020 Embedded PC device tree

 Documentation/devicetree/bindings/arm/bhf.txt       | 6 +
 .../devicetree/bindings/vendor-prefixes.txt         | 1 +
 MAINTAINERS                                         | 5 +
 arch/arm/boot/dts/Makefile                          | 1 +
 arch/arm/boot/dts/imx53-cx9020.dts                  | 297 +
 arch/arm/boot/dts/imx53-pinfunc.h                   | 4 +
 arch/arm/boot/dts/imx53.dtsi                        | 9 +
 7 files changed, 323 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/bhf.txt
 create mode 100644 arch/arm/boot/dts/imx53-cx9020.dts

-- 
2.11.0
[PATCH v5 4/4] ARM: dts: imx: add CX9020 Embedded PC device tree
From: Patrick Bruenn The CX9020 differs from i.MX53 Quick Start Board by: - use uart2 instead of uart1 - DVI-D connector instead of VGA - no audio - no SATA connector - CCAT FPGA connected to emi - enable rtc Signed-off-by: Patrick Bruenn --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/imx53-cx9020.dts | 297 + 2 files changed, 298 insertions(+) create mode 100644 arch/arm/boot/dts/imx53-cx9020.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 4b17f35dc9a7..f0ba9be523e0 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -340,6 +340,7 @@ dtb-$(CONFIG_SOC_IMX51) += \ imx51-ts4800.dtb dtb-$(CONFIG_SOC_IMX53) += \ imx53-ard.dtb \ + imx53-cx9020.dtb \ imx53-m53evk.dtb \ imx53-mba53.dtb \ imx53-qsb.dtb \ diff --git a/arch/arm/boot/dts/imx53-cx9020.dts b/arch/arm/boot/dts/imx53-cx9020.dts new file mode 100644 index ..4f54fd4418a3 --- /dev/null +++ b/arch/arm/boot/dts/imx53-cx9020.dts @@ -0,0 +1,297 @@ +/* + * Copyright 2017 Beckhoff Automation GmbH & Co. KG + * based on imx53-qsb.dts + * + * The code contained herein is licensed under the GNU General Public + * License. 
You may obtain a copy of the GNU General Public License + * Version 2 or later at the following locations: + * + * http://www.opensource.org/licenses/gpl-license.html + * http://www.gnu.org/copyleft/gpl.html + */ + +/dts-v1/; +#include "imx53.dtsi" + +/ { + model = "Beckhoff CX9020 Embedded PC"; + compatible = "bhf,cx9020", "fsl,imx53"; + + chosen { + stdout-path = + }; + + memory { + reg = <0x7000 0x2000>, + <0xb000 0x2000>; + }; + + display-0 { + #address-cells =<1>; + #size-cells = <0>; + compatible = "fsl,imx-parallel-display"; + interface-pix-fmt = "rgb24"; + pinctrl-names = "default"; + pinctrl-0 = <_ipu_disp0>; + + port@0 { + reg = <0>; + + display0_in: endpoint { + remote-endpoint = <_di0_disp0>; + }; + }; + + port@1 { + reg = <1>; + + display0_out: endpoint { + remote-endpoint = <_in>; + }; + }; + }; + + dvi-connector { + compatible = "dvi-connector"; + ddc-i2c-bus = <>; + digital; + + port { + dvi_connector_in: endpoint { + remote-endpoint = <_out>; + }; + }; + }; + + dvi-converter { + #address-cells = <1>; + #size-cells = <0>; + compatible = "ti,tfp410"; + + port@0 { + reg = <0>; + + tfp410_in: endpoint { + remote-endpoint = <_out>; + }; + }; + + port@1 { + reg = <1>; + + tfp410_out: endpoint { + remote-endpoint = <_connector_in>; + }; + }; + }; + + leds { + compatible = "gpio-leds"; + + pwr-r { + gpios = < 22 GPIO_ACTIVE_HIGH>; + default-state = "off"; + }; + + pwr-g { + gpios = < 24 GPIO_ACTIVE_HIGH>; + default-state = "on"; + }; + + pwr-b { + gpios = < 23 GPIO_ACTIVE_HIGH>; + default-state = "off"; + }; + + sd1-b { + linux,default-trigger = "mmc0"; + gpios = < 20 GPIO_ACTIVE_HIGH>; + }; + + sd2-b { + linux,default-trigger = "mmc1"; + gpios = < 17 GPIO_ACTIVE_HIGH>; + }; + }; + + regulator-3p2v { + compatible = "regulator-fixed"; + regulator-name = "3P2V"; + regulator-min-microvolt = <320>; + regulator-max-microvolt = <320>; + regulator-always-on; + }; + + reg_usb_vbus: regulator-vbus { + compatible = "regulator-fixed"; + regulator-name = "usb_vbus"; + 
regulator-min-microvolt = <500>; + regulator-max-microvolt = <500>; + gpio = < 8 GPIO_ACTIVE_HIGH>; + enable-active-high; + }; +}; + + { + pinctrl-names = "default"; + pinctrl-0 = <_esdhc1>; + cd-gpios = < 1 GPIO_ACTIVE_LOW>; +
[PATCH v5 1/4] dt-bindings: arm: Add entry for Beckhoff CX9020
From: Patrick Bruenn

- add vendor prefix bhf for Beckhoff
- add new board binding bhf,cx9020

Signed-off-by: Patrick Bruenn
---
 Documentation/devicetree/bindings/arm/bhf.txt         | 6 ++
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 MAINTAINERS                                           | 5 +
 3 files changed, 12 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/bhf.txt

diff --git a/Documentation/devicetree/bindings/arm/bhf.txt b/Documentation/devicetree/bindings/arm/bhf.txt
new file mode 100644
index ..886b503caf9c
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/bhf.txt
@@ -0,0 +1,6 @@
+Beckhoff Automation Platforms Device Tree Bindings
+--
+
+CX9020 Embedded PC
+Required root node properties:
+- compatible = "bhf,cx9020", "fsl,imx53";
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index daf465bef758..20c2cf57ebc9 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -47,6 +47,7 @@ avic	Shanghai AVIC Optoelectronics Co., Ltd.
 axentia	Axentia Technologies AB
 axis	Axis Communications AB
 bananapi BIPAI KEJI LIMITED
+bhf	Beckhoff Automation GmbH & Co. KG
 boe	BOE Technology Group Co., Ltd.
 bosch	Bosch Sensortec GmbH
 boundary	Boundary Devices Inc.
diff --git a/MAINTAINERS b/MAINTAINERS
index f66488dfdbc9..e1d3111aea97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1196,6 +1196,11 @@ F:	arch/arm/boot/dts/sama*.dtsi
 F:	arch/arm/include/debug/at91.S
 F:	drivers/memory/atmel*
 
+ARM/BECKHOFF SUPPORT
+M:	Patrick Bruenn
+S:	Maintained
+F:	Documentation/devicetree/bindings/arm/bhf.txt
+
 ARM/CALXEDA HIGHBANK ARCHITECTURE
 M:	Rob Herring
 L:	linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-- 
2.11.0
Re: [PATCH] mm, memcg: reset low limit during memcg offlining
Hello, Vladimir.

On Wed, Jul 26, 2017 at 11:30:17AM +0300, Vladimir Davydov wrote:
> > As I understand, css_reset() callback is intended to _completely_ disable all
> > limits, as if there were no cgroup at all.
>
> But that's exactly what cgroup offline is: deletion of a cgroup as if it
> never existed. The fact that we leave the zombie dangling until all
> pages charged to the cgroup are gone is an implementation detail. IIRC
> we would "reparent" those charges and delete the mem_cgroup right away
> if it were not inherently racy.

That may be true for memcg but not in general. Think about writeback
IOs servicing dirty pages of a removed cgroup. Removing a cgroup
shouldn't grant it more resources than when it was alive and changing
the membership to the parent will break that. For memcg, they seem
the same just because no new major consumption can be generated after
removal.

> The user can't tweak limits of an offline cgroup, because the cgroup
> directory no longer exist. So IMHO resetting all limits is reasonable.
> If you want to keep the cgroup limits effective, you shouldn't have
> deleted it in the first place, I suppose.

I don't think that's the direction we wanna go. Granting more
resources on removal is surprising.

Thanks.

-- 
tejun
Re: [PATCH v2 01/11] ASoC: samsung: s3c2412: Handle return value of clk_prepare_enable.
Hi,

On Wednesday 26 July 2017 04:58 PM, Mark Brown wrote:
On Wed, Jul 26, 2017 at 11:15:25AM +0530, Arvind Yadav wrote:

--- a/sound/soc/samsung/s3c2412-i2s.c
+++ b/sound/soc/samsung/s3c2412-i2s.c
@@ -65,13 +65,16 @@ static int s3c2412_i2s_probe(struct snd_soc_dai *dai)
 	s3c2412_i2s.iis_cclk = devm_clk_get(dai->dev, "i2sclk");
 	if (IS_ERR(s3c2412_i2s.iis_cclk)) {
 		pr_err("failed to get i2sclk clock\n");
-		return PTR_ERR(s3c2412_i2s.iis_cclk);
+		ret = PTR_ERR(s3c2412_i2s.iis_cclk);
+		goto err;
 	}

Why are we making this unrelated change? None of the error handling we
jump to is relevant if this fails...

s3c_i2sv2_probe is enabling the "iis" clock. If devm_clk_get(, "i2sclk")
fails, we need to disable and free the clock "iis".

 	/* Set MPLL as the source for IIS CLK */
 
 	clk_set_parent(s3c2412_i2s.iis_cclk, clk_get(NULL, "mpll"));
-	clk_prepare_enable(s3c2412_i2s.iis_cclk);
+	ret = clk_prepare_enable(s3c2412_i2s.iis_cclk);
+	if (ret)
+		goto err;
 
 	s3c2412_i2s.iis_cclk = s3c2412_i2s.iis_pclk;
 
@@ -80,6 +83,11 @@ static int s3c2412_i2s_probe(struct snd_soc_dai *dai)
 			      S3C_GPIO_PULL_NONE);
 
 	return 0;
+
+err:
+	clk_disable(s3c2412_i2s.iis_pclk);

This will disable the clock if we failed to enable it which is clearly
not correct. It's also matching a clk_prepare_enable() with a
clk_disable() which is going to leave an unbalanced prepare.

s3c_i2sv2_probe is enabling the "iis" clock, and s3c2412_i2s_probe is
enabling the "i2sclk" and "mpll" clocks. If clk_prepare_enable for "mpll"
fails, we need to disable and free the clock "iis", and devm will handle
the other clock, "i2sclk".

In this code we have used "s3c2412_i2s.iis_cclk" for all the clocks, which
is confusing to me. Please correct me if I am wrong.

~arvind
Re: [PATCH] lib/strscpy: avoid KASAN false positive
On Wed, Jul 19, 2017 at 6:05 PM, Dave Jones wrote:
> On Wed, Jul 19, 2017 at 11:39:32AM -0400, Chris Metcalf wrote:
>
>  > > We could just remove all that word-at-a-time logic. Do we have any
>  > > evidence that this would harm anything?
>  >
>  > The word-at-a-time logic was part of the initial commit since I wanted
>  > to ensure that strscpy could be used to replace strlcpy or strncpy without
>  > serious concerns about performance.
>
> I'm curious what the typical length of the strings we're concerned about
> in this case are if this makes a difference.

My vote is for proceeding with Andrey's original patch. It's not
perfect, but it's simple, short, minimally intrusive and fixes the
problem at hand. We can do something more fundamental when/if we have
more such cases.
[PATCH v5 2/4] ARM: dts: imx53: add srtc node
From: Patrick Bruenn

The i.MX53 has an integrated secure real time clock. Add it to the dtsi.

Signed-off-by: Patrick Bruenn
---
 arch/arm/boot/dts/imx53.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi
index 2e516f4985e4..8bf0d89cdd35 100644
--- a/arch/arm/boot/dts/imx53.dtsi
+++ b/arch/arm/boot/dts/imx53.dtsi
@@ -433,6 +433,15 @@
 				clock-names = "ipg", "per";
 			};
 
+			srtc: srtc@53fa4000 {
+				compatible = "fsl,imx53-rtc", "fsl,imx25-rtc";
+				reg = <0x53fa4000 0x4000>;
+				interrupts = <24>;
+				interrupt-parent = <>;
+				clocks = < IMX5_CLK_SRTC_GATE>;
+				clock-names = "ipg";
+			};
+
 			iomuxc: iomuxc@53fa8000 {
 				compatible = "fsl,imx53-iomuxc";
 				reg = <0x53fa8000 0x4000>;
-- 
2.11.0
[PATCH v5 3/4] ARM: dts: imx53: add alternative UART2 configuration
From: Patrick Bruenn

UART2 on EIM_D26 - EIM_D29 pins supports interchanging RXD/TXD pins
and RTS/CTS pins.
One board using these alternate settings is Beckhoff CX9020. Add the
alternative configuration here, to make it available to others, too.

Signed-off-by: Patrick Bruenn
---
 arch/arm/boot/dts/imx53-pinfunc.h | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/imx53-pinfunc.h b/arch/arm/boot/dts/imx53-pinfunc.h
index aec406bc65eb..59f9c29e3fe2 100644
--- a/arch/arm/boot/dts/imx53-pinfunc.h
+++ b/arch/arm/boot/dts/imx53-pinfunc.h
@@ -524,6 +524,7 @@
 #define MX53_PAD_EIM_D25__UART1_DSR	0x140 0x488 0x000 0x7 0x0
 #define MX53_PAD_EIM_D26__EMI_WEIM_D_26	0x144 0x48c 0x000 0x0 0x0
 #define MX53_PAD_EIM_D26__GPIO3_26	0x144 0x48c 0x000 0x1 0x0
+#define MX53_PAD_EIM_D26__UART2_RXD_MUX	0x144 0x48c 0x880 0x2 0x0
 #define MX53_PAD_EIM_D26__UART2_TXD_MUX	0x144 0x48c 0x000 0x2 0x0
 #define MX53_PAD_EIM_D26__FIRI_RXD	0x144 0x48c 0x80c 0x3 0x0
 #define MX53_PAD_EIM_D26__IPU_CSI0_D_1	0x144 0x48c 0x000 0x4 0x0
@@ -533,6 +534,7 @@
 #define MX53_PAD_EIM_D27__EMI_WEIM_D_27	0x148 0x490 0x000 0x0 0x0
 #define MX53_PAD_EIM_D27__GPIO3_27	0x148 0x490 0x000 0x1 0x0
 #define MX53_PAD_EIM_D27__UART2_RXD_MUX	0x148 0x490 0x880 0x2 0x1
+#define MX53_PAD_EIM_D27__UART2_TXD_MUX	0x148 0x490 0x000 0x2 0x0
 #define MX53_PAD_EIM_D27__FIRI_TXD	0x148 0x490 0x000 0x3 0x0
 #define MX53_PAD_EIM_D27__IPU_CSI0_D_0	0x148 0x490 0x000 0x4 0x0
 #define MX53_PAD_EIM_D27__IPU_DI1_PIN13	0x148 0x490 0x000 0x5 0x0
@@ -541,6 +543,7 @@
 #define MX53_PAD_EIM_D28__EMI_WEIM_D_28	0x14c 0x494 0x000 0x0 0x0
 #define MX53_PAD_EIM_D28__GPIO3_28	0x14c 0x494 0x000 0x1 0x0
 #define MX53_PAD_EIM_D28__UART2_CTS	0x14c 0x494 0x000 0x2 0x0
+#define MX53_PAD_EIM_D28__UART2_RTS	0x14c 0x494 0x87c 0x2 0x0
 #define MX53_PAD_EIM_D28__IPU_DISPB0_SER_DIO	0x14c 0x494 0x82c 0x3 0x1
 #define MX53_PAD_EIM_D28__CSPI_MOSI	0x14c 0x494 0x788 0x4 0x1
 #define MX53_PAD_EIM_D28__I2C1_SDA	0x14c 0x494 0x818 0x5 0x1
@@ -548,6 +551,7 @@
 #define MX53_PAD_EIM_D28__IPU_DI0_PIN13	0x14c 0x494 0x000 0x7 0x0
 #define MX53_PAD_EIM_D29__EMI_WEIM_D_29	0x150 0x498 0x000 0x0 0x0
 #define MX53_PAD_EIM_D29__GPIO3_29	0x150 0x498 0x000 0x1 0x0
+#define MX53_PAD_EIM_D29__UART2_CTS	0x150 0x498 0x000 0x2 0x0
 #define MX53_PAD_EIM_D29__UART2_RTS	0x150 0x498 0x87c 0x2 0x1
 #define MX53_PAD_EIM_D29__IPU_DISPB0_SER_RS	0x150 0x498 0x000 0x3 0x0
 #define MX53_PAD_EIM_D29__CSPI_SS0	0x150 0x498 0x78c 0x4 0x2
-- 
2.11.0
Re: [PATCH v3 3/3] power: wm831x_power: Support USB charger current limit management
Hi, On Wed, Jul 26, 2017 at 11:05:25AM +0800, Baolin Wang wrote: > On 25 July 2017 at 17:59, Sebastian Reichel > wrote: > > On Tue, Jul 25, 2017 at 04:00:01PM +0800, Baolin Wang wrote: > >> Integrate with the newly added USB charger interface to limit the current > >> we draw from the USB input based on the input device configuration > >> identified by the USB stack, allowing us to charge more quickly from high > >> current inputs without drawing more current than specified from others. > >> > >> Signed-off-by: Mark Brown > >> Signed-off-by: Baolin Wang > >> --- > >> Documentation/devicetree/bindings/mfd/wm831x.txt |1 + > >> drivers/power/supply/wm831x_power.c | 58 > >> ++ > >> 2 files changed, 59 insertions(+) > >> > >> diff --git a/Documentation/devicetree/bindings/mfd/wm831x.txt > >> b/Documentation/devicetree/bindings/mfd/wm831x.txt > >> index 9f8b743..4e3bc07 100644 > >> --- a/Documentation/devicetree/bindings/mfd/wm831x.txt > >> +++ b/Documentation/devicetree/bindings/mfd/wm831x.txt > >> @@ -31,6 +31,7 @@ Required properties: > >> ../interrupt-controller/interrupts.txt > >> > >> Optional sub-nodes: > >> + - usb-phy : Contains a phandle to the USB PHY. > >>- regulators : Contains sub-nodes for each of the regulators supplied by > >> the device. 
The regulators are bound using their names listed below: > >> > >> diff --git a/drivers/power/supply/wm831x_power.c > >> b/drivers/power/supply/wm831x_power.c > >> index 7082301..d3948ab 100644 > >> --- a/drivers/power/supply/wm831x_power.c > >> +++ b/drivers/power/supply/wm831x_power.c > >> @@ -13,6 +13,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> > >> #include > >> #include > >> @@ -31,6 +32,8 @@ struct wm831x_power { > >> char usb_name[20]; > >> char battery_name[20]; > >> bool have_battery; > >> + struct usb_phy *usb_phy; > >> + struct notifier_block usb_notify; > >> }; > >> > >> static int wm831x_power_check_online(struct wm831x *wm831x, int supply, > >> @@ -125,6 +128,43 @@ static int wm831x_usb_get_prop(struct power_supply > >> *psy, > >> POWER_SUPPLY_PROP_VOLTAGE_NOW, > >> }; > >> > >> +/* In milliamps */ > >> +static const unsigned int wm831x_usb_limits[] = { > >> + 0, > >> + 2, > >> + 100, > >> + 500, > >> + 900, > >> + 1500, > >> + 1800, > >> + 550, > >> +}; > >> + > >> +static int wm831x_usb_limit_change(struct notifier_block *nb, > >> +unsigned long limit, void *data) > >> +{ > >> + struct wm831x_power *wm831x_power = container_of(nb, > >> + struct wm831x_power, > >> + usb_notify); > >> + unsigned int i, best; > >> + > >> + /* Find the highest supported limit */ > >> + best = 0; > >> + for (i = 0; i < ARRAY_SIZE(wm831x_usb_limits); i++) { > >> + if (limit >= wm831x_usb_limits[i] && > >> + wm831x_usb_limits[best] < wm831x_usb_limits[i]) > >> + best = i; > >> + } > >> + > >> + dev_dbg(wm831x_power->wm831x->dev, > >> + "Limiting USB current to %umA", wm831x_usb_limits[best]); > >> + > >> + wm831x_set_bits(wm831x_power->wm831x, WM831X_POWER_STATE, > >> + WM831X_USB_ILIM_MASK, best); > >> + > >> + return 0; > >> +} > >> + > >> /* > >> * Battery properties > >> */ > >> @@ -607,6 +647,19 @@ static int wm831x_power_probe(struct platform_device > >> *pdev) > >> } > >> } > >> > >> + power->usb_phy = devm_usb_get_phy_by_phandle(>dev, > >> + 
"usb-phy", 0); > >> + if (!IS_ERR(power->usb_phy)) { > >> + power->usb_notify.notifier_call = wm831x_usb_limit_change; > >> + ret = usb_register_notifier(power->usb_phy, > >> + >usb_notify); > >> + if (ret) { > >> + dev_err(>dev, "Failed to register notifier: > >> %d\n", > >> + ret); > >> + goto err_bat_irq; > >> + } > >> + } > > > > No error handling for power->usb_phy? I think you should bail out > > for all errors except for "not defined in DT". Especially I would > > expect probe defer handling in case the power supply driver is > > loaded before the phy driver. > > Make sense. So I think I need to change like below: > > power->usb_phy = devm_usb_get_phy_by_phandle(>dev, "usb-phy", 0); > if (!IS_ERR(power->usb_phy)) { > power->usb_notify.notifier_call = wm831x_usb_limit_change; > ret = usb_register_notifier(power->usb_phy, >usb_notify); > if (ret) { > dev_err(>dev, "Failed to
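As an aside on the quoted patch: the limit table is not sorted (550 mA is the last entry), which is why the loop compares against the current best entry instead of stopping at the first match. The following is a minimal userspace sketch of that selection loop only. The table is copied from the quoted patch; the function name pick_usb_limit is hypothetical and nothing here touches real hardware registers.

```c
#include <stddef.h>

/* Current limits in milliamps, indexed by register field value
 * (copied from the quoted patch). */
static const unsigned int usb_limits_ma[] = {
	0, 2, 100, 500, 900, 1500, 1800, 550,
};

#define N_LIMITS (sizeof(usb_limits_ma) / sizeof(usb_limits_ma[0]))

/* Return the index of the largest table entry that does not exceed
 * limit_ma. Index 0 (0 mA) is the fallback. */
static unsigned int pick_usb_limit(unsigned long limit_ma)
{
	unsigned int i, best = 0;

	for (i = 0; i < N_LIMITS; i++) {
		if (limit_ma >= usb_limits_ma[i] &&
		    usb_limits_ma[best] < usb_limits_ma[i])
			best = i;
	}

	return best;
}
```

For example, a request of 600 mA selects index 7 (550 mA) and not index 3 (500 mA), even though 500 appears earlier in the table.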
Re: [PATCH 1/1] mm/hugetlb: Make huge_pte_offset() consistent and document behaviour
Hi Michal,

Michal Hocko writes:

> On Wed 26-07-17 10:50:38, Michal Hocko wrote:
>> On Tue 25-07-17 16:41:14, Punit Agrawal wrote:
>> > When walking the page tables to resolve an address that points to
>> > !p*d_present() entry, huge_pte_offset() returns inconsistent values
>> > depending on the level of page table (PUD or PMD).
>> >
>> > It returns NULL in the case of a PUD entry while in the case of a PMD
>> > entry, it returns a pointer to the page table entry.
>> >
>> > A similar inconsistency exists when handling swap entries - returns NULL
>> > for a PUD entry while a pointer to the pte_t is returned for the PMD
>> > entry.
>> >
>> > Update huge_pte_offset() to make the behaviour consistent - return NULL
>> > in the case of p*d_none() and a pointer to the pte_t for hugepage or
>> > swap entries.
>> >
>> > Document the behaviour to clarify the expected behaviour of this
>> > function. This is to set clear semantics for architecture specific
>> > implementations of huge_pte_offset().
>>
>> hugetlb pte semantic is a disaster and I agree it could see some
>> cleanup/clarifications but I am quite nervous to see a patch like this.
>> How do we check that nothing will get silently broken by this change?

Glad I'm not the only one who finds the hugetlb semantics somewhat
confusing. :)

I've been running tests from mce-test suite and libhugetlbfs for similar
changes we did on arm64. There could be assumptions that were not
exercised but I'm not sure how to check for all the possible usages.

Do you have any other suggestions that can help improve confidence in
the patch?

>
> Forgot to add. Hugetlb has been special because of the pte sharing. I
> haven't looked into that code for quite some time but there might be a
> good reason why pud behaves differently.

I checked the code and don't see anything that would explain (or
require) the difference in behaviour.
[PATCH] init:main.c: Fixed issues for Block comments and
Signed-off-by: Janani S --- init/main.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/init/main.c b/init/main.c index 052481f..f8eb4966 100644 --- a/init/main.c +++ b/init/main.c @@ -181,7 +181,8 @@ static bool __init obsolete_checksetup(char *line) /* Already done in parse_early_param? * (Needs exact match on param part). * Keep iterating, as we can have early -* params and __setups of same names 8( */ +* params and __setups of same names +*/ if (line[n] == '\0' || line[n] == '=') had_early_param = true; } else if (!p->setup_func) { @@ -693,9 +694,9 @@ asmlinkage __visible void __init start_kernel(void) arch_post_acpi_subsys_init(); sfi_init_late(); - if (efi_enabled(EFI_RUNTIME_SERVICES)) { + if (efi_enabled(EFI_RUNTIME_SERVICES)) efi_free_boot_services(); - } + /* Do the rest non-__init'ed, we're now alive */ rest_init(); -- 1.9.1
[tip:locking/core] kasan: Allow kasan_check_read/write() to accept pointers to volatiles
Commit-ID: f06e8c584fa0d05312c11ea66194f3d2efb93c21 Gitweb: http://git.kernel.org/tip/f06e8c584fa0d05312c11ea66194f3d2efb93c21 Author: Dmitry Vyukov AuthorDate: Thu, 22 Jun 2017 16:14:17 +0200 Committer: Ingo Molnar CommitDate: Wed, 26 Jul 2017 13:08:54 +0200 kasan: Allow kasan_check_read/write() to accept pointers to volatiles Currently kasan_check_read/write() accept 'const void*', make them accept 'const volatile void*'. This is required for instrumentation of atomic operations and there is just no reason to not allow that. Signed-off-by: Dmitry Vyukov Reviewed-by: Andrey Ryabinin Acked-by: Mark Rutland Cc: Andrew Morton Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: kasan-...@googlegroups.com Cc: linux...@kvack.org Cc: will.dea...@arm.com Link: http://lkml.kernel.org/r/33e5ec275c1ee89299245b2ebbccd63709c6021f.1498140838.git.dvyu...@google.com Signed-off-by: Ingo Molnar --- include/linux/kasan-checks.h | 10 ++ mm/kasan/kasan.c | 4 ++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/include/linux/kasan-checks.h b/include/linux/kasan-checks.h index b7f8ace..41960fe 100644 --- a/include/linux/kasan-checks.h +++ b/include/linux/kasan-checks.h @@ -2,11 +2,13 @@ #define _LINUX_KASAN_CHECKS_H #ifdef CONFIG_KASAN -void kasan_check_read(const void *p, unsigned int size); -void kasan_check_write(const void *p, unsigned int size); +void kasan_check_read(const volatile void *p, unsigned int size); +void kasan_check_write(const volatile void *p, unsigned int size); #else -static inline void kasan_check_read(const void *p, unsigned int size) { } -static inline void kasan_check_write(const void *p, unsigned int size) { } +static inline void kasan_check_read(const volatile void *p, unsigned int size) +{ } +static inline void kasan_check_write(const volatile void *p, unsigned int size) +{ } #endif #endif diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index ca11bc4..6f319fb 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -267,13 +267,13 @@ 
static void check_memory_region(unsigned long addr, check_memory_region_inline(addr, size, write, ret_ip); } -void kasan_check_read(const void *p, unsigned int size) +void kasan_check_read(const volatile void *p, unsigned int size) { check_memory_region((unsigned long)p, size, false, _RET_IP_); } EXPORT_SYMBOL(kasan_check_read); -void kasan_check_write(const void *p, unsigned int size) +void kasan_check_write(const volatile void *p, unsigned int size) { check_memory_region((unsigned long)p, size, true, _RET_IP_); }
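To illustrate why the wider parameter type matters, here is a minimal userspace sketch. It is not kernel code: check_read is a hypothetical stand-in for kasan_check_read() that only counts bytes, and the counters are made up for the example. A parameter of type 'const volatile void *' accepts pointers to plain and to volatile objects alike, whereas a 'const void *' parameter would discard the volatile qualifier at the call site and draw a compiler diagnostic.

```c
#include <stddef.h>

static size_t bytes_checked;

/* Hypothetical stand-in for kasan_check_read(): accepts any object
 * pointer regardless of const/volatile qualification. */
static void check_read(const volatile void *p, unsigned int size)
{
	(void)p;
	bytes_checked += size;
}

static int plain_counter;
static volatile int shared_counter;

/* Check one plain and one volatile object; return total bytes checked. */
static size_t check_both(void)
{
	bytes_checked = 0;

	check_read(&plain_counter, sizeof(plain_counter));
	/* No cast needed: 'volatile int *' converts implicitly to
	 * 'const volatile void *'. */
	check_read(&shared_counter, sizeof(shared_counter));

	return bytes_checked;
}
```

Atomic accessors typically take pointers to volatile-qualified data, which is the case the changelog refers to.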
Re: [PATCH] arm64: Convert to using %pOF instead of full_name
Sorry about the false positive. I will push a fix for that later today or tomorrow at the latest. regards, dan carpenter
[tip:x86/asm] x86/kconfig: Make it easier to switch to the new ORC unwinder
Commit-ID: a34a766ff96d9e88572e35a45066279e40a85d84 Gitweb: http://git.kernel.org/tip/a34a766ff96d9e88572e35a45066279e40a85d84 Author: Josh Poimboeuf AuthorDate: Mon, 24 Jul 2017 18:36:58 -0500 Committer: Ingo Molnar CommitDate: Wed, 26 Jul 2017 13:18:20 +0200 x86/kconfig: Make it easier to switch to the new ORC unwinder A couple of Kconfig changes which make it much easier to switch to the new CONFIG_ORC_UNWINDER: 1) Remove x86 dependencies on CONFIG_FRAME_POINTER for lockdep, latencytop, and fault injection. x86 has a 'guess' unwinder which just scans the stack for kernel text addresses. It's not 100% accurate but in many cases it's good enough. This allows those users who don't want the text overhead of the frame pointer or ORC unwinders to still use these features. More importantly, this also makes it much more straightforward to disable frame pointers. 2) Make CONFIG_ORC_UNWINDER depend on !CONFIG_FRAME_POINTER. While it would be possible to have both enabled, it doesn't really make sense to do so. So enforce a sane configuration to prevent the user from making a dumb mistake. With these changes, when you disable CONFIG_FRAME_POINTER, "make oldconfig" will ask if you want to enable CONFIG_ORC_UNWINDER. Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. 
Peter Anvin Cc: Jiri Slaby Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: live-patch...@vger.kernel.org Link: http://lkml.kernel.org/r/9985fb91ce5005fe33ea5cc2a20f14bd33c61d03.1500938583.git.jpoim...@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/Kconfig.debug | 7 +++ lib/Kconfig.debug | 6 +++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index dc10ec6..268a318 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -357,7 +357,7 @@ config PUNIT_ATOM_DEBUG config ORC_UNWINDER bool "ORC unwinder" - depends on X86_64 + depends on X86_64 && !FRAME_POINTER select STACK_VALIDATION ---help--- This option enables the ORC (Oops Rewind Capability) unwinder for @@ -365,9 +365,8 @@ config ORC_UNWINDER a simplified version of the DWARF Call Frame Information standard. This unwinder is more accurate across interrupt entry frames than the - frame pointer unwinder. It can also enable a 5-10% performance - improvement across the entire kernel if CONFIG_FRAME_POINTER is - disabled. + frame pointer unwinder. It also enables a 5-10% performance + improvement across the entire kernel compared to frame pointers. Enabling this option will increase the kernel's runtime memory usage by roughly 2-4MB, depending on your kernel config. 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 0f0d019..32a48e7 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1124,7 +1124,7 @@ config LOCKDEP bool depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT select STACKTRACE - select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390 && !MICROBLAZE && !ARC && !SCORE + select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390 && !MICROBLAZE && !ARC && !SCORE && !X86 select KALLSYMS select KALLSYMS_ALL @@ -1543,7 +1543,7 @@ config FAULT_INJECTION_STACKTRACE_FILTER depends on FAULT_INJECTION_DEBUG_FS && STACKTRACE_SUPPORT depends on !X86_64 select STACKTRACE - select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !SCORE + select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !SCORE && !X86 help Provide stacktrace filter for fault-injection capabilities @@ -1552,7 +1552,7 @@ config LATENCYTOP depends on DEBUG_KERNEL depends on STACKTRACE_SUPPORT depends on PROC_FS - select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC + select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !X86 select KALLSYMS select KALLSYMS_ALL select STACKTRACE
[tip:x86/asm] x86/unwind: Add the ORC unwinder
Commit-ID: ee9f8fce99640811b2b8e79d0d1dbe8bab69ba67
Gitweb: http://git.kernel.org/tip/ee9f8fce99640811b2b8e79d0d1dbe8bab69ba67
Author: Josh Poimboeuf
AuthorDate: Mon, 24 Jul 2017 18:36:57 -0500
Committer: Ingo Molnar
CommitDate: Wed, 26 Jul 2017 13:18:20 +0200

x86/unwind: Add the ORC unwinder

Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.

It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.

For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debuginfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.

Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.

Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Jiri Slaby
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: live-patch...@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoim...@redhat.com
[ Extended the changelog.
] Signed-off-by: Ingo Molnar --- Documentation/x86/orc-unwinder.txt | 179 arch/um/include/asm/unwind.h | 8 + arch/x86/Kconfig | 1 + arch/x86/Kconfig.debug | 25 ++ arch/x86/include/asm/module.h | 9 + arch/x86/include/asm/orc_lookup.h | 46 +++ arch/x86/include/asm/orc_types.h | 2 +- arch/x86/include/asm/unwind.h | 76 +++-- arch/x86/kernel/Makefile | 8 +- arch/x86/kernel/module.c | 11 +- arch/x86/kernel/setup.c| 3 + arch/x86/kernel/unwind_frame.c | 39 +-- arch/x86/kernel/unwind_guess.c | 5 + arch/x86/kernel/unwind_orc.c | 582 + arch/x86/kernel/vmlinux.lds.S | 3 + include/asm-generic/vmlinux.lds.h | 27 +- lib/Kconfig.debug | 3 + scripts/Makefile.build | 14 +- 18 files changed, 977 insertions(+), 64 deletions(-) diff --git a/Documentation/x86/orc-unwinder.txt b/Documentation/x86/orc-unwinder.txt new file mode 100644 index 000..af0c9a4 --- /dev/null +++ b/Documentation/x86/orc-unwinder.txt @@ -0,0 +1,179 @@ +ORC unwinder + + +Overview + + +The kernel CONFIG_ORC_UNWINDER option enables the ORC unwinder, which is +similar in concept to a DWARF unwinder. The difference is that the +format of the ORC data is much simpler than DWARF, which in turn allows +the ORC unwinder to be much simpler and faster. + +The ORC data consists of unwind tables which are generated by objtool. +They contain out-of-band data which is used by the in-kernel ORC +unwinder. Objtool generates the ORC data by first doing compile-time +stack metadata validation (CONFIG_STACK_VALIDATION). After analyzing +all the code paths of a .o file, it determines information about the +stack state at each instruction address in the file and outputs that +information to the .orc_unwind and .orc_unwind_ip sections. + +The per-object ORC sections are combined at link time and are sorted and +post-processed at boot time. The unwinder uses the resulting data to +correlate instruction addresses with their stack states at run time. 
+ + +ORC vs frame pointers +- + +With frame pointers enabled, GCC adds instrumentation code to every +function in the kernel. The kernel's .text size increases by about +3.2%, resulting in a broad kernel-wide slowdown. Measurements by Mel +Gorman [1] have shown a slowdown of 5-10% for some workloads. + +In contrast, the ORC unwinder has no effect on text size or runtime +performance, because the debuginfo is out of band. So if you disable +frame pointers and enable the ORC unwinder, you get a nice performance +improvement across the board, and still have reliable stack traces. + +Ingo Molnar says: + + "Note that it's not just a performance improvement, but also an + instruction cache locality improvement: 3.2% .text savings almost + directly transform into a similarly sized reduction in cache + footprint. That can transform to even higher speedups for workloads + whose cache locality is borderline." + +Another benefit of ORC compared to frame pointers is that it can +reliably unwind across interrupts and exceptions. Frame pointer based +unwinds can sometimes skip the caller of the interrupted function, if it +was a leaf function or if the interrupt hit before the frame pointer was +saved. + +The main disadvantage of the ORC unwinder compared to frame pointers is +that it needs more memory to store the ORC unwind tables: roughly 2-4MB +depending
[tip:x86/asm] x86/kconfig: Consolidate unwinders into multiple choice selection
Commit-ID: 81d387190039c14edac8de2b3ec789beb899afd9 Gitweb: http://git.kernel.org/tip/81d387190039c14edac8de2b3ec789beb899afd9 Author: Josh Poimboeuf AuthorDate: Tue, 25 Jul 2017 08:54:24 -0500 Committer: Ingo Molnar CommitDate: Wed, 26 Jul 2017 14:05:36 +0200 x86/kconfig: Consolidate unwinders into multiple choice selection There are three mutually exclusive unwinders. Make that more obvious by combining them into a multiple-choice selection: CONFIG_FRAME_POINTER_UNWINDER CONFIG_ORC_UNWINDER CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y) Frame pointers are still the default (for now). The old CONFIG_FRAME_POINTER option is still used in some arch-independent places, so keep it around, but make it invisible to the user on x86 - it's now selected by CONFIG_FRAME_POINTER_UNWINDER=y. Suggested-by: Ingo Molnar Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Jiri Slaby Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: live-patch...@vger.kernel.org Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@treble Signed-off-by: Ingo Molnar --- arch/x86/Kconfig | 3 +-- arch/x86/Kconfig.debug| 47 --- arch/x86/configs/tiny.config | 2 ++ arch/x86/include/asm/unwind.h | 4 ++-- 4 files changed, 45 insertions(+), 11 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7ccf26a..9b30212 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -73,7 +73,6 @@ config X86 select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH - select ARCH_WANT_FRAME_POINTERS select ARCH_WANTS_DYNAMIC_TASK_STRUCT select ARCH_WANTS_THP_SWAP if X86_64 select BUILDTIME_EXTABLE_SORT @@ -168,7 +167,7 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API - select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION + select HAVE_RELIABLE_STACKTRACE if X86_64 && 
FRAME_POINTER_UNWINDER && STACK_VALIDATION select HAVE_STACK_VALIDATIONif X86_64 select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 268a318..93bbb31 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -355,9 +355,32 @@ config PUNIT_ATOM_DEBUG The current power state can be read from /sys/kernel/debug/punit_atom/dev_power_state +choice + prompt "Choose kernel unwinder" + default FRAME_POINTER_UNWINDER + ---help--- + This determines which method will be used for unwinding kernel stack + traces for panics, oopses, bugs, warnings, perf, /proc//stack, + livepatch, lockdep, and more. + +config FRAME_POINTER_UNWINDER + bool "Frame pointer unwinder" + select FRAME_POINTER + ---help--- + This option enables the frame pointer unwinder for unwinding kernel + stack traces. + + The unwinder itself is fast and it uses less RAM than the ORC + unwinder, but the kernel text size will grow by ~3% and the kernel's + overall performance will degrade by roughly 5-10%. + + This option is recommended if you want to use the livepatch + consistency model, as this is currently the only way to get a + reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). + config ORC_UNWINDER bool "ORC unwinder" - depends on X86_64 && !FRAME_POINTER + depends on X86_64 select STACK_VALIDATION ---help--- This option enables the ORC (Oops Rewind Capability) unwinder for @@ -371,12 +394,22 @@ config ORC_UNWINDER Enabling this option will increase the kernel's runtime memory usage by roughly 2-4MB, depending on your kernel config. -config FRAME_POINTER_UNWINDER - def_bool y - depends on !ORC_UNWINDER && FRAME_POINTER - config GUESS_UNWINDER - def_bool y - depends on !ORC_UNWINDER && !FRAME_POINTER + bool "Guess unwinder" + depends on EXPERT + ---help--- + This option enables the "guess" unwinder for unwinding kernel stack + traces. It scans the stack and reports every kernel text address it + finds. 
Some of the addresses it reports may be incorrect. + + While this option often produces false positives, it can still be + useful in many cases. Unlike the other unwinders, it has no runtime + overhead. + +endchoice + +config FRAME_POINTER + depends on !ORC_UNWINDER && !GUESS_UNWINDER + bool endmenu diff --git a/arch/x86/configs/tiny.config b/arch/x86/configs/tiny.config index 4b429df..550cd50 100644 ---
Re: [PATCH net-next v2 01/10] net: dsa: lan9303: Fixed MDIO interface
On 25 July 2017 21:15, Vivien Didelot wrote:
> Hi Egil,
>
> Egil Hjelmeland writes:
>
>> Fixes after testing on actual HW:
>>
>> - lan9303_mdio_write()/_read() must multiply register number
>>   by 4 to get offset
>>
>> - Indirect access (PMI) to phy register only works in I2C mode. In
>>   MDIO mode phy registers must be accessed directly. Introduced
>>   struct lan9303_phy_ops to handle the two modes. Renamed functions
>>   to clarify.
>>
>> - lan9303_detect_phy_setup() : Failed MDIO read returns 0x.
>>   Handle that.
>
> Small patch series when possible are better. Bullet points in commit
> messages are likely to describe how a patch or series may be split up ;-)
>
> This patch seems to be the unique patch of the series resolving what is
> described in the cover letter as "Make the MDIO interface work".
>
> I'd suggest you to split up this one commit in several *atomic* and easy
> to review patches and send them separately as one thread named "net: dsa:
> lan9303: fix MDIO interface" (also note that imperative is preferred for
> subject lines, see: https://chris.beams.io/posts/git-commit/#imperative)
>
> <...>
>
>> -static int lan9303_port_phy_reg_wait_for_completion(struct lan9303 *chip)
>> +static int lan9303_indirect_phy_wait_for_completion(struct lan9303 *chip)
>
> For instance you can have a first commit only renaming the functions.
> The reason for it is to separate the functional changes from cosmetic
> changes, which makes it easier for review.
>
> <...>

Thank you for reviewing. I can split the first patch. I can also split
the patch series to more digestible series. But since most of the
patches touch the same file, I assume that each series must be
completed and applied before starting on a new one. So I really want to
group the patches into only a few series in order to not spend months on
the process.

>> +	if ((reg != 0) && (reg != 0x))
>
> if (reg && reg != 0x) should be enough.

Of course.

>> +struct lan9303_phy_ops {
>> +	/* PHY 1 &2 access*/
>
> The spacing is weird in the comment. "/* PHY 1 & 2 access */" maybe?

Yes.

>> +int lan9303_mdio_phy_write(struct lan9303 *chip, int phy, int regnum, u16 val)
>> +{
>> +	struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
>> +	struct mdio_device *mdio = sw_dev->device;
>> +
>> +	mutex_lock(>bus->mdio_lock);
>> +	mdio->bus->write(mdio->bus, phy, regnum, val);
>> +	mutex_unlock(>bus->mdio_lock);
>
> This is exactly what mdiobus_write(mdio->bus, phy, regnum, val) is
> doing. There are very few valid reasons to go play in the mii_bus
> structure, using generic APIs is strongly preferred. Plus you have
> checks and traces for free!

Lack of oversight was the only reason. I just adapted stuff from
lan9303_mdio_phy_write above. Will switch to mdiobus_write of course.

> Same here, mdiobus_read().

Ditto.

> Thanks,
>
> Vivien

Appreciated,
Egil
Re: [PATCH v7 12/13] ACPI / init: Invoke early ACPI initialization earlier
Hi Baoquan, At 07/18/2017 04:45 PM, b...@redhat.com wrote: On 07/18/17 at 02:08pm, Dou Liyang wrote: Hi, Zheng At 07/18/2017 01:18 PM, Zheng, Lv wrote: Hi, Can the problem be fixed by invoking acpi_put_table() for mapped DMAR table? Invoking acpi_put_table() is my first choice. But it made the kernel *panic* when we try to get the table again in intel_iommu_init() in late stage. I am also confused that: There are two places where we used DMAR table in Linux: 1) In detect_intel_iommu() in ACPI early stage: ... status = acpi_get_table(ACPI_SIG_DMAR, 0, _tbl); if (dmar_tbl) { acpi_put_table(dmar_tbl); dmar_tbl = NULL; } 2) In dmar_table_init() in ACPI late stage: ... status = acpi_get_table(ACPI_SIG_DMAR, 0, _tbl); ... As we know, dmar_table_init() is called by intel_iommu_init() and intel_prepare_irq_remapping(). When I invoked acpi_put_table() in the intel_prepare_irq_remapping() in early stage like 1) shows, kernel will panic. That's because acpi_put_table() will make the table pointer be NULL, while dmar_table_init() will skip parse_dmar_table() calling if dmar_table_initialized is set to 1 in intel_prepare_irq_remapping(). Dmar hardware support interrupt remapping and io remapping separately. But intel_iommu_init() is called later than intel_prepare_irq_remapping(). So what if make dmar_table_init() a reentrant function? You can just have a try, but maybe not a good idea, the dmar table will be parsed twice. The true reason why the kernel panic is that acpi_put_table() only released DMAR table structure, but not released the remapping structures in DMAR table, such as DRHD, RMRR. So the address of RMRR parsed in early ACPI stage will be used in late ACPI stage in intel_iommu_init(), which make the kernel panic. The solution is invoking the intel_iommu_free_dmars() before dmar_table_init() in intel_iommu_init() to release the RMRR. Demo code will show at the bottom. I prefer to invoke acpi_early_init() earlier. But it needs a regression test[1]. 
I am looking for Thinkpad x121e (AMD E-450 APU) to test. I have tested it in Thinkpad s430, It's OK. BTY, I am confused how does the ACPI subsystem affect PIT which will be used to fast calibrate CPU frequency[2]. Do you have any idea? [1] https://lkml.org/lkml/2014/3/10/123 [2] https://lkml.org/lkml/2014/3/12/3 drivers/iommu/dmar.c| 27 +++ drivers/iommu/intel-iommu.c | 2 ++ drivers/iommu/intel_irq_remapping.c | 17 - include/linux/dmar.h| 2 ++ init/main.c | 2 +- 5 files changed, 32 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index c8b0329..e6261b7 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -68,6 +68,8 @@ DECLARE_RWSEM(dmar_global_lock); LIST_HEAD(dmar_drhd_units); struct acpi_table_header * __initdata dmar_tbl; +struct acpi_table_header * __initdata dmar_tbl_original; + static int dmar_dev_scope_status = 1; static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)]; @@ -627,6 +629,7 @@ parse_dmar_table(void) * fixed map. */ dmar_table_detect(); + dmar_tbl_original = dmar_tbl; /* * ACPI tables may not be DMA protected by tboot, so use DMAR copy @@ -811,26 +814,18 @@ int __init dmar_dev_scope_init(void) int __init dmar_table_init(void) { - static int dmar_table_initialized; int ret; - if (dmar_table_initialized == 0) { - ret = parse_dmar_table(); - if (ret < 0) { - if (ret != -ENODEV) - pr_info("Parse DMAR table failure.\n"); - } else if (list_empty(_drhd_units)) { - pr_info("No DMAR devices found\n"); - ret = -ENODEV; - } - - if (ret < 0) - dmar_table_initialized = ret; - else - dmar_table_initialized = 1; + ret = parse_dmar_table(); + if (ret < 0) { + if (ret != -ENODEV) + pr_info("Parse DMAR table failure.\n"); + } else if (list_empty(_drhd_units)) { + pr_info("No DMAR devices found\n"); + ret = -ENODEV; } - return dmar_table_initialized < 0 ? 
dmar_table_initialized : 0; + return ret; } static void warn_invalid_dmar(u64 addr, const char *message) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 687f18f..90f74f4 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4832,6 +4832,8 @@ int __init intel_iommu_init(void) } down_write(_global_lock); + + intel_iommu_free_dmars(); if (dmar_table_init()) { if (force_on) panic("tboot: Failed to initialize DMAR table\n"); diff --git
Re: [PATCH] mm: take memory hotplug lock within numa_zonelist_order_handler()
On Wed 26-07-17 13:48:12, Heiko Carstens wrote:
> On Wed, Jul 26, 2017 at 01:31:12PM +0200, Michal Hocko wrote:
> > On Wed 26-07-17 13:17:38, Heiko Carstens wrote:
> > [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 6d30e914afb6..fc32aa81f359 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4891,9 +4891,11 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
> > >  NUMA_ZONELIST_ORDER_LEN);
> > >  user_zonelist_order = oldval;
> > >  } else if (oldval != user_zonelist_order) {
> > > + mem_hotplug_begin();
> > >  mutex_lock(_mutex);
> > >  build_all_zonelists(NULL, NULL);
> > >  mutex_unlock(_mutex);
> > > + mem_hotplug_done();
> > >  }
> > >  }
> > >  out:
> >
> > Please note that this code has been removed by
> > http://lkml.kernel.org/r/20170721143915.14161-2-mho...@kernel.org. It
> > will get to linux-next as soon as Andrew releases a new version of the
> > mmotm tree.
>
> We still would need something for 4.13, no?

If this presents a real problem then yes. Has this happened in a real
workload or during some artificial test? I mean the code has been like
that for ages and nobody noticed/reported any problems.

That being said, I do not have anything against your patch. It is
trivial to rebase mine on top of yours. I am just not sure it is worth
the code churn. E.g. do you think this patch is stable backport
material?

--
Michal Hocko
SUSE Labs
Re: linux-next: unsigned commits in the drm-misc tree
On Wed, Jul 26, 2017 at 9:09 AM, Daniel Vetter wrote:
> Oops, that shouldn't have happened. Actually, our maintainer tooling
> ensures this doesn't happen, by auto-adding the committer sob line.
> But these patches (and a bunch of others pushed by Benjamin) haven't
> been pushed by our tooling it seems (the Link: tag is missing at
> least).
>
> Benjamin, what happened there?

Ok, figured it out, added another safety check to the scripting, and
hard-reset the tree. Unfortunately some of the patches already landed
in drm-next, so that needed a hard-reset too, plus in drm-intel-next,
where I still need to do the hard-reset. Ugh.

Benjamin: As part of the hard-reset I've thrown out all the patches
you've committed. That was simpler than digging out the right patches
from the rebase push. Please re-apply and push the right ones again.

My apologies for the hiccup, we maintainers (Dave, Sean & me) should
have caught this earlier.

Thanks, Daniel

>
> Thanks, Daniel
>
>
> On Wed, Jul 26, 2017 at 8:11 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> I noticed a set of commits that have no Signed-off-by from their
>> committer:
>>
>> d9864a1d2dfc ("drm/stm: drv: Rename platform driver name")
>>
>> to
>>
>> ed34d261a12a ("drm/stm: dsi: Constify phy ops structure")
>>
>> --
>> Cheers,
>> Stephen Rothwell
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
[PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
Hi Artem, Thomas,

On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> On Tue, 25 Jul 2017, Artem Savkov wrote:
>
> > Hi,
> >
> > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > checks early" seems to have uncovered an issue with amd-iommu/x2apic.
> >
> > Starting with that commit the following warning started to show up on AMD
> > systems during boot:
>
> > [0.16] BUG: sleeping function called from invalid context at
> > kernel/locking/mutex.c:747
>
> > [0.16] mutex_lock_nested+0x1b/0x20
> > [0.16] register_syscore_ops+0x1d/0x70
> > [0.16] state_next+0x119/0x910
> > [0.16] iommu_go_to_state+0x29/0x30
> > [0.16] amd_iommu_enable+0x13/0x23
> > [0.16] irq_remapping_enable+0x1b/0x39
> > [0.16] enable_IR_x2apic+0x91/0x196
> > [0.16] default_setup_apic_routing+0x16/0x6e
> > [0.16] native_smp_prepare_cpus+0x257/0x2d5

Thanks for the report!

> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2440,7 +2440,6 @@ static int __init state_next(void)
>          break;
>      case IOMMU_ACPI_FINISHED:
>          early_enable_iommus();
> -        register_syscore_ops(_iommu_syscore_ops);
>          x86_platform.iommu_shutdown = disable_iommus;
>          init_state = IOMMU_ENABLED;
>          break;
> @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
>              for_each_iommu(iommu)
>                  iommu_flush_all_caches(iommu);
>          }
> +    } else {
> +        register_syscore_ops(_iommu_syscore_ops);
>      }
>
>      return ret;

Yes, that should fix it, but I think it's better to just move the
register_syscore_ops() call to a later initialization step, like in the
patch below. I tested it and will queue it to my iommu/fixes branch.

From 461242d7211c901b6ccdf349cc89235bd5da Mon Sep 17 00:00:00 2001
From: Joerg Roedel
Date: Wed, 26 Jul 2017 14:17:55 +0200
Subject: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code

The register_syscore_ops() function takes a mutex and might sleep. In
the IOMMU initialization code it is already invoked during irq-remapping
setup, where irqs are disabled. This causes a schedule-while-atomic bug:

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
 in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
 no locks held by swapper/0/1.
 irq event stamp: 304
 hardirqs last enabled at (303): [] _raw_spin_unlock_irqrestore+0x36/0x60
 hardirqs last disabled at (304): [] enable_IR_x2apic+0x79/0x196
 softirqs last enabled at (36): [] __do_softirq+0x35f/0x4ec
 softirqs last disabled at (31): [] irq_exit+0x105/0x120
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
 Hardware name: PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
 Call Trace:
  dump_stack+0x85/0xca
  ___might_sleep+0x22a/0x260
  __might_sleep+0x4a/0x80
  __mutex_lock+0x58/0x960
  ? iommu_completion_wait.part.17+0xb5/0x160
  ? register_syscore_ops+0x1d/0x70
  ? iommu_flush_all_caches+0x120/0x150
  mutex_lock_nested+0x1b/0x20
  register_syscore_ops+0x1d/0x70
  state_next+0x119/0x910
  iommu_go_to_state+0x29/0x30
  amd_iommu_enable+0x13/0x23

Fix it by moving the register_syscore_ops() call to the next
initialization step, which runs with irqs enabled.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/amd_iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5cc597b383c7..372303700566 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2440,11 +2440,11 @@ static int __init state_next(void)
         break;
     case IOMMU_ACPI_FINISHED:
         early_enable_iommus();
-        register_syscore_ops(_iommu_syscore_ops);
         x86_platform.iommu_shutdown = disable_iommus;
         init_state = IOMMU_ENABLED;
         break;
     case IOMMU_ENABLED:
+        register_syscore_ops(_iommu_syscore_ops);
         ret = amd_iommu_init_pci();
         init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
         enable_iommus_v2();
-- 
2.13.1
Re: [PATCH V3 1/4] ARM64: dts: rockchip: rk3328 add iommu nodes
Hey Heiko,

On Wed, Jul 26, 2017 at 01:44:02PM +0200, Heiko Stübner wrote:
> I really would prefer iommu dt-nodes going through my tree :-)
>
> Especially as parts of these conflict with already pending patches for
> graphics support and with the iommu nodes sitting in your tree these
> would need to wait another kernel release.

Sure, no problem. I have nothing pushed yet, so it's easy to remove
again. Do you want to take all three patch-sets from Simon through your
tree or just this one?

Regards,

Joerg
Re: [RFC PATCH 3/5] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
On Wed 26-07-17 13:45:39, Heiko Carstens wrote:
[...]
> In general I do like your idea, however if I understand your patches
> correctly we might have an ordering problem on s390: it is not possible to
> access hot-added memory on s390 before it is online (MEM_GOING_ONLINE
> succeeded).

Could you point me to the code please? I cannot seem to find the
notifier which implements that.

> On MEM_GOING_ONLINE we ask the hypervisor to back the potential available
> hot-added memory region with physical pages. Accessing those ranges before
> that will result in an exception.

Can we make the range which backs the memmap range available? E.g. from
the s390 specific __vmemmap_populate path?

> However with your approach the memory is still allocated when add_memory()
> is being called, correct? That wouldn't be a change to the current
> behaviour; except for the ordering problem outlined above.

Could you be more specific please? I do not change when the memmap is
allocated.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH 1/1] mm/hugetlb: Make huge_pte_offset() consistent and document behaviour
On Wed 26-07-17 13:11:46, Punit Agrawal wrote:
> Hi Michal,
>
> Michal Hocko writes:
>
> > On Wed 26-07-17 10:50:38, Michal Hocko wrote:
> >> On Tue 25-07-17 16:41:14, Punit Agrawal wrote:
> >> > When walking the page tables to resolve an address that points to
> >> > !p*d_present() entry, huge_pte_offset() returns inconsistent values
> >> > depending on the level of page table (PUD or PMD).
> >> >
> >> > It returns NULL in the case of a PUD entry while in the case of a PMD
> >> > entry, it returns a pointer to the page table entry.
> >> >
> >> > A similar inconsistency exists when handling swap entries - returns NULL
> >> > for a PUD entry while a pointer to the pte_t is returned for the PMD
> >> > entry.
> >> >
> >> > Update huge_pte_offset() to make the behaviour consistent - return NULL
> >> > in the case of p*d_none() and a pointer to the pte_t for hugepage or
> >> > swap entries.
> >> >
> >> > Document the behaviour to clarify the expected behaviour of this
> >> > function. This is to set clear semantics for architecture specific
> >> > implementations of huge_pte_offset().
> >>
> >> hugetlb pte semantic is a disaster and I agree it could see some
> >> cleanup/clarifications but I am quite nervous to see a patch like this.
> >> How do we check that nothing will get silently broken by this change?
>
> Glad I'm not the only one who finds the hugetlb semantics somewhat
> confusing. :)

This is a huge understatement. It is a source of nightmares.

> I've been running tests from mce-test suite and libhugetlbfs for similar
> changes we did on arm64. There could be assumptions that were not
> exercised but I'm not sure how to check for all the possible usages.
>
> Do you have any other suggestions that can help improve confidence in
> the patch?

Unfortunately I don't. I just know there were many subtle assumptions
all over the place so I am rather careful to not touch the code unless
really necessary.

That being said, I am not opposing your patch.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH] memory: mtk-smi: Use of_device_get_match_data helper
On Wed, 2017-07-26 at 11:36 +0100, Robin Murphy wrote:
> On 26/07/17 10:59, honghui.zh...@mediatek.com wrote:
> > From: Honghui Zhang
> >
> >  * for mtk smi gen 1, we need to get the ao(always on) base to config
> >  * m4u port, and we need to enable the aync clock for transform the smi
> >  * clock into emi clock domain, but for mtk smi gen2, there's no smi ao
> >  * base.
> >  */
> > - smi_gen = (enum mtk_smi_gen)of_id->data;
> > - if (smi_gen == MTK_SMI_GEN1) {
> > + smi_gen = of_device_get_match_data(dev);
>
> The data you're retrieving is the exact same thing as of_id->data was,
> i.e. an enum mtk_smi_gen cast to void*, so dereferencing it is not a
> good idea. The first patch was almost right; you just need to keep the
> cast in the assignment to smi_gen.
>
> Robin.
>

Hi, Robin, thanks very much. I will send a new version.

> > + if (*smi_gen == MTK_SMI_GEN1) {
> >   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> >   common->smi_ao_base = devm_ioremap_resource(dev, res);
> >   if (IS_ERR(common->smi_ao_base))
> >
>
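Robin's point above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: the match table, the `get_match_data()` helper and the `smi_gen` enum are hypothetical stand-ins for the OF match table, of_device_get_match_data() and enum mtk_smi_gen. It shows that a value stored in a `void *` slot has to be converted back by a cast, since the pointer never refers to real memory and must not be dereferenced.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's generation enum. */
enum smi_gen { SMI_GEN1, SMI_GEN2 };

/* Hypothetical stand-in for a match-table entry. */
struct match_entry {
    const char *compatible;
    const void *data; /* carries an enum value, not an address */
};

static const struct match_entry match_table[] = {
    { "vendor,smi-gen1", (const void *)(uintptr_t)SMI_GEN1 },
    { "vendor,smi-gen2", (const void *)(uintptr_t)SMI_GEN2 },
};

/* Models a helper that hands back the raw data pointer of one entry. */
static const void *get_match_data(unsigned int idx)
{
    return match_table[idx].data;
}

/*
 * Correct use: convert the pointer value back to the enum.
 * Writing *(const enum smi_gen *)data instead would read from
 * address 0 or 1, which is the mistake discussed in the review.
 */
static enum smi_gen gen_from_match_data(const void *data)
{
    return (enum smi_gen)(uintptr_t)data;
}
```

Calling `gen_from_match_data(get_match_data(0))` yields `SMI_GEN1` without touching the memory the pointer value would nominally address.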
Re: [PATCH v3 4/9] pwm: Add STM32 LPTimer PWM driver
On 07/07/2017 06:31 PM, Fabrice Gasnier wrote: > Add support for single PWM channel on Low-Power Timer, that can be > found on some STM32 platforms. > > Signed-off-by: Fabrice Gasnier > --- > Changes in v3: > - remove prescalers[] array, use power-of-2 presc directly > - Update following Thierry's comments: > - fix issue using FIELD_GET() macro > - Add get_state() callback > - remove some checks in probe > - slight rework 'reenable' flag > - use more common method to disable pwm in remove() Hi Thierry, Gentle ping for PWM driver review since I did changes in v3. Please advise. Best Regards, Fabrice > > Changes in v2: > - s/Low Power/Low-Power > - update few comment lines > --- > drivers/pwm/Kconfig| 10 ++ > drivers/pwm/Makefile | 1 + > drivers/pwm/pwm-stm32-lp.c | 246 > + > 3 files changed, 257 insertions(+) > create mode 100644 drivers/pwm/pwm-stm32-lp.c > > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig > index 313c107..7cb982b 100644 > --- a/drivers/pwm/Kconfig > +++ b/drivers/pwm/Kconfig > @@ -417,6 +417,16 @@ config PWM_STM32 > To compile this driver as a module, choose M here: the module > will be called pwm-stm32. > > +config PWM_STM32_LP > + tristate "STMicroelectronics STM32 PWM LP" > + depends on MFD_STM32_LPTIMER || COMPILE_TEST > + help > + Generic PWM framework driver for STMicroelectronics STM32 SoCs > + with Low-Power Timer (LPTIM). > + > + To compile this driver as a module, choose M here: the module > + will be called pwm-stm32-lp. 
> + > config PWM_STMPE > bool "STMPE expander PWM export" > depends on MFD_STMPE > diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile > index 93da1f7..a3a4bee 100644 > --- a/drivers/pwm/Makefile > +++ b/drivers/pwm/Makefile > @@ -40,6 +40,7 @@ obj-$(CONFIG_PWM_SAMSUNG) += pwm-samsung.o > obj-$(CONFIG_PWM_SPEAR) += pwm-spear.o > obj-$(CONFIG_PWM_STI)+= pwm-sti.o > obj-$(CONFIG_PWM_STM32) += pwm-stm32.o > +obj-$(CONFIG_PWM_STM32_LP) += pwm-stm32-lp.o > obj-$(CONFIG_PWM_STMPE) += pwm-stmpe.o > obj-$(CONFIG_PWM_SUN4I) += pwm-sun4i.o > obj-$(CONFIG_PWM_TEGRA) += pwm-tegra.o > diff --git a/drivers/pwm/pwm-stm32-lp.c b/drivers/pwm/pwm-stm32-lp.c > new file mode 100644 > index 000..9793b29 > --- /dev/null > +++ b/drivers/pwm/pwm-stm32-lp.c > @@ -0,0 +1,246 @@ > +/* > + * STM32 Low-Power Timer PWM driver > + * > + * Copyright (C) STMicroelectronics 2017 > + * > + * Author: Gerald Baeza > + * > + * License terms: GNU General Public License (GPL), version 2 > + * > + * Inspired by Gerald Baeza's pwm-stm32 driver > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct stm32_pwm_lp { > + struct pwm_chip chip; > + struct clk *clk; > + struct regmap *regmap; > +}; > + > +static inline struct stm32_pwm_lp *to_stm32_pwm_lp(struct pwm_chip *chip) > +{ > + return container_of(chip, struct stm32_pwm_lp, chip); > +} > + > +/* STM32 Low-Power Timer is preceded by a configurable power-of-2 prescaler > */ > +#define STM32_LPTIM_MAX_PRESCALER128 > + > +static int stm32_pwm_lp_apply(struct pwm_chip *chip, struct pwm_device *pwm, > + struct pwm_state *state) > +{ > + struct stm32_pwm_lp *priv = to_stm32_pwm_lp(chip); > + unsigned long long prd, div, dty; > + struct pwm_state cstate; > + u32 val, mask, cfgr, presc = 0; > + bool reenable; > + int ret; > + > + pwm_get_state(pwm, ); > + reenable = !cstate.enabled; > + > + if (!state->enabled) { > + if (cstate.enabled) { > + /* Disable LP timer */ > + ret = regmap_write(priv->regmap, STM32_LPTIM_CR, 
0); > + if (ret) > + return ret; > + /* disable clock to PWM counter */ > + clk_disable(priv->clk); > + } > + return 0; > + } > + > + /* Calculate the period and prescaler value */ > + div = (unsigned long long)clk_get_rate(priv->clk) * state->period; > + do_div(div, NSEC_PER_SEC); > + prd = div; > + while (div > STM32_LPTIM_MAX_ARR) { > + presc++; > + if ((1 << presc) > STM32_LPTIM_MAX_PRESCALER) { > + dev_err(priv->chip.dev, "max prescaler exceeded\n"); > + return -EINVAL; > + } > + div = prd >> presc; > + } > + prd = div; > + > + /* Calculate the duty cycle */ > + dty = prd * state->duty_cycle; > + do_div(dty, state->period); > + > + if (!cstate.enabled) { > + /* enable clock to drive PWM counter */ > + ret = clk_enable(priv->clk); > + if (ret) > + return ret; > + } > + > + ret = regmap_read(priv->regmap,
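The period, prescaler and duty computation in the quoted stm32_pwm_lp_apply() can be followed more easily as a standalone userspace sketch. The names `lptim_calc` and `LPTIM_MAX_ARR` are made up for illustration, and the value 0xFFFF for the auto-reload limit is an assumption (STM32_LPTIM_MAX_ARR is not defined in the quoted text); only the power-of-2 prescaler limit of 128 comes from the patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC    1000000000ULL
#define LPTIM_MAX_ARR   0xFFFFULL /* assumption: 16-bit auto-reload register */
#define LPTIM_MAX_PRESC 128U      /* from the patch: power-of-2 prescaler limit */

/*
 * Hypothetical helper mirroring the arithmetic of the quoted apply():
 * convert the period to clock ticks, shift right until the count fits
 * the auto-reload register, then scale the duty cycle to the same units.
 * Returns 0 on success, -1 if no prescaler up to 128 is sufficient.
 */
int lptim_calc(uint64_t clk_hz, uint64_t period_ns, uint64_t duty_ns,
               unsigned int *presc, uint64_t *prd, uint64_t *dty)
{
    uint64_t ticks, div;
    unsigned int p = 0;

    if (period_ns == 0)
        return -1;

    ticks = clk_hz * period_ns / NSEC_PER_SEC;
    div = ticks;
    while (div > LPTIM_MAX_ARR) {
        p++;
        if ((1U << p) > LPTIM_MAX_PRESC)
            return -1; /* "max prescaler exceeded" in the patch */
        div = ticks >> p;
    }

    *presc = p;
    *prd = div;
    *dty = div * duty_ns / period_ns;
    return 0;
}
```

With a 32768 Hz clock, a 4 s period needs 131072 ticks, so under these assumptions the loop settles on a prescaler exponent of 2 and a period count of 32768.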
[PATCH v2] memory: mtk-smi: Use of_device_get_match_data helper
From: Honghui Zhang Replace custom code with generic helper to retrieve driver data. Signed-off-by: Honghui Zhang --- drivers/memory/mtk-smi.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index 4afbc41..2b798bb4 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -240,20 +240,15 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) struct device *dev = >dev; struct device_node *smi_node; struct platform_device *smi_pdev; - const struct of_device_id *of_id; if (!dev->pm_domain) return -EPROBE_DEFER; - of_id = of_match_node(mtk_smi_larb_of_ids, pdev->dev.of_node); - if (!of_id) - return -EINVAL; - larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL); if (!larb) return -ENOMEM; - larb->larb_gen = of_id->data; + larb->larb_gen = of_device_get_match_data(dev); res = platform_get_resource(pdev, IORESOURCE_MEM, 0); larb->base = devm_ioremap_resource(dev, res); if (IS_ERR(larb->base)) @@ -319,7 +314,6 @@ static int mtk_smi_common_probe(struct platform_device *pdev) struct device *dev = >dev; struct mtk_smi *common; struct resource *res; - const struct of_device_id *of_id; enum mtk_smi_gen smi_gen; if (!dev->pm_domain) @@ -338,17 +332,13 @@ static int mtk_smi_common_probe(struct platform_device *pdev) if (IS_ERR(common->clk_smi)) return PTR_ERR(common->clk_smi); - of_id = of_match_node(mtk_smi_common_of_ids, pdev->dev.of_node); - if (!of_id) - return -EINVAL; - /* * for mtk smi gen 1, we need to get the ao(always on) base to config * m4u port, and we need to enable the aync clock for transform the smi * clock into emi clock domain, but for mtk smi gen2, there's no smi ao * base. */ - smi_gen = (enum mtk_smi_gen)of_id->data; + smi_gen = (enum mtk_smi_gen)of_device_get_match_data(dev); if (smi_gen == MTK_SMI_GEN1) { res = platform_get_resource(pdev, IORESOURCE_MEM, 0); common->smi_ao_base = devm_ioremap_resource(dev, res); -- 2.6.4
[PATCH v3 1/2] video/hdmi: Introduce helpers for the HDMI audio infoframe payload
The DP is using the same audio infoframe payload as hdmi, per DP 1.3 spec, but it has a different header. Provide a new interface here, it just packs the payload. Signed-off-by: Chris Zhong --- Changes in v3: - add size < HDMI_AUDIO_INFOFRAME_SIZE check according to Doug's advice Changes in v2: None drivers/video/hdmi.c | 66 ++-- include/linux/hdmi.h | 2 ++ 2 files changed, 50 insertions(+), 18 deletions(-) diff --git a/drivers/video/hdmi.c b/drivers/video/hdmi.c index 1cf907e..9868050 100644 --- a/drivers/video/hdmi.c +++ b/drivers/video/hdmi.c @@ -240,6 +240,49 @@ int hdmi_audio_infoframe_init(struct hdmi_audio_infoframe *frame) EXPORT_SYMBOL(hdmi_audio_infoframe_init); /** + * hdmi_audio_infoframe_pack_payload() - write HDMI audio infoframe payload to + * binary buffer + * @frame: HDMI audio infoframe + * @buffer: destination buffer + * @size: size of buffer + * + * Packs the information contained in the @frame structure into a binary + * representation that can be written into the corresponding controller + * registers. + * + * Returns 0 on success or a negative error code on failure. 
+ */ +ssize_t hdmi_audio_infoframe_pack_payload(struct hdmi_audio_infoframe *frame, + void *buffer, size_t size) +{ + unsigned char channels; + u8 *ptr = buffer; + + if (size < frame->length || size < HDMI_AUDIO_INFOFRAME_SIZE) + return -ENOSPC; + + memset(buffer, 0, size); + + if (frame->channels >= 2) + channels = frame->channels - 1; + else + channels = 0; + + ptr[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7); + ptr[1] = ((frame->sample_frequency & 0x7) << 2) | +(frame->sample_size & 0x3); + ptr[2] = frame->coding_type_ext & 0x1f; + ptr[3] = frame->channel_allocation; + ptr[4] = (frame->level_shift_value & 0xf) << 3; + + if (frame->downmix_inhibit) + ptr[4] |= BIT(7); + + return 0; +} +EXPORT_SYMBOL(hdmi_audio_infoframe_pack_payload); + +/** * hdmi_audio_infoframe_pack() - write HDMI audio infoframe to binary buffer * @frame: HDMI audio infoframe * @buffer: destination buffer @@ -256,22 +299,15 @@ EXPORT_SYMBOL(hdmi_audio_infoframe_init); ssize_t hdmi_audio_infoframe_pack(struct hdmi_audio_infoframe *frame, void *buffer, size_t size) { - unsigned char channels; u8 *ptr = buffer; size_t length; + int ret; length = HDMI_INFOFRAME_HEADER_SIZE + frame->length; if (size < length) return -ENOSPC; - memset(buffer, 0, size); - - if (frame->channels >= 2) - channels = frame->channels - 1; - else - channels = 0; - ptr[0] = frame->type; ptr[1] = frame->version; ptr[2] = frame->length; @@ -279,16 +315,10 @@ ssize_t hdmi_audio_infoframe_pack(struct hdmi_audio_infoframe *frame, /* start infoframe payload */ ptr += HDMI_INFOFRAME_HEADER_SIZE; - - ptr[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7); - ptr[1] = ((frame->sample_frequency & 0x7) << 2) | -(frame->sample_size & 0x3); - ptr[2] = frame->coding_type_ext & 0x1f; - ptr[3] = frame->channel_allocation; - ptr[4] = (frame->level_shift_value & 0xf) << 3; - - if (frame->downmix_inhibit) - ptr[4] |= BIT(7); + ret = hdmi_audio_infoframe_pack_payload(frame, ptr, + size - HDMI_INFOFRAME_HEADER_SIZE); + if 
(ret) + return ret; hdmi_infoframe_set_checksum(buffer, length); diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h index d271ff2..a4be132 100644 --- a/include/linux/hdmi.h +++ b/include/linux/hdmi.h @@ -272,6 +272,8 @@ struct hdmi_audio_infoframe { int hdmi_audio_infoframe_init(struct hdmi_audio_infoframe *frame); ssize_t hdmi_audio_infoframe_pack(struct hdmi_audio_infoframe *frame, void *buffer, size_t size); +ssize_t hdmi_audio_infoframe_pack_payload(struct hdmi_audio_infoframe *frame, + void *buffer, size_t size); enum hdmi_3d_structure { HDMI_3D_STRUCTURE_INVALID = -1, -- 2.7.4
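For readers who want to check the bit layout produced by the new payload helper, here is a userspace sketch of the same packing. The struct and function names are illustrative only, the struct holds just the fields the patch touches, and the payload size of 10 bytes is an assumption about HDMI_AUDIO_INFOFRAME_SIZE rather than something stated in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUDIO_INFOFRAME_SIZE 10 /* assumption: audio infoframe payload length */

/* Illustrative subset of the fields used by the packing code. */
struct audio_infoframe {
    uint8_t length;
    uint8_t channels;
    uint8_t coding_type;
    uint8_t sample_size;
    uint8_t sample_frequency;
    uint8_t coding_type_ext;
    uint8_t channel_allocation;
    uint8_t level_shift_value;
    int downmix_inhibit;
};

/* Packs only the payload bytes; header and checksum are left to the caller. */
int audio_infoframe_pack_payload(const struct audio_infoframe *frame,
                                 void *buffer, size_t size)
{
    uint8_t *ptr = buffer;
    uint8_t channels;

    if (size < frame->length || size < AUDIO_INFOFRAME_SIZE)
        return -ENOSPC;

    memset(buffer, 0, size);

    /* The channel count field encodes "channels - 1"; 0 means "see stream". */
    channels = frame->channels >= 2 ? frame->channels - 1 : 0;

    ptr[0] = ((frame->coding_type & 0xf) << 4) | (channels & 0x7);
    ptr[1] = ((frame->sample_frequency & 0x7) << 2) |
             (frame->sample_size & 0x3);
    ptr[2] = frame->coding_type_ext & 0x1f;
    ptr[3] = frame->channel_allocation;
    ptr[4] = (frame->level_shift_value & 0xf) << 3;

    if (frame->downmix_inhibit)
        ptr[4] |= 1u << 7;

    return 0;
}
```

A six-channel frame with coding type 1 packs its first byte as 0x15: the coding type in the high nibble and the channel count minus one in the low three bits.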
[PATCH v3 2/2] drm/rockchip: cdn-dp: send audio infoframe to sink
Some DP/HDMI sinks need to receive the audio infoframe to play sound. Some multi-channel AV receivers in particular need the channel_allocation from the infoframe to configure the speakers. Sending the audio infoframe via SDP makes them work properly. Signed-off-by: Chris Zhong --- Changes in v3: None Changes in v2: - According to the advice of Sean Paul and Doug use hdmi_audio_infoframe_pack_payload to pack the buffer define a SDP_HEADER_SIZE drivers/gpu/drm/rockchip/cdn-dp-core.c | 20 drivers/gpu/drm/rockchip/cdn-dp-reg.c | 27 +++ drivers/gpu/drm/rockchip/cdn-dp-reg.h | 6 ++ include/drm/drm_dp_helper.h| 1 + 4 files changed, 54 insertions(+) diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c b/drivers/gpu/drm/rockchip/cdn-dp-core.c index 9b0b058..6a4fc66 100644 --- a/drivers/gpu/drm/rockchip/cdn-dp-core.c +++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c @@ -802,6 +802,7 @@ static int cdn_dp_audio_hw_params(struct device *dev, void *data, .sample_rate = params->sample_rate, .channels = params->channels, }; + u8 buffer[HDMI_AUDIO_INFOFRAME_SIZE + EDP_SDP_HEADER_SIZE] = { 0 }; int ret; mutex_lock(>lock); @@ -823,6 +824,25 @@ static int cdn_dp_audio_hw_params(struct device *dev, void *data, goto out; } + /* +* Prepare the infoframe header to SDP header per DP 1.3 spec, Table +* 2-98. 
+*/ + buffer[0] = 0; + buffer[1] = HDMI_INFOFRAME_TYPE_AUDIO; + buffer[2] = 0x1b; + buffer[3] = 0x48; + + ret = hdmi_audio_infoframe_pack_payload(>cea, + [EDP_SDP_HEADER_SIZE], + HDMI_AUDIO_INFOFRAME_SIZE); + if (ret < 0) { + DRM_DEV_ERROR(dev, "Failed to pack audio infoframe: %d\n", ret); + goto out; + } + + cdn_dp_sdp_write(dp, 0, buffer, sizeof(buffer)); + ret = cdn_dp_audio_config(dp, ); if (!ret) dp->audio_info = audio; diff --git a/drivers/gpu/drm/rockchip/cdn-dp-reg.c b/drivers/gpu/drm/rockchip/cdn-dp-reg.c index b14d211..4a818e4 100644 --- a/drivers/gpu/drm/rockchip/cdn-dp-reg.c +++ b/drivers/gpu/drm/rockchip/cdn-dp-reg.c @@ -286,6 +286,33 @@ int cdn_dp_dpcd_write(struct cdn_dp_device *dp, u32 addr, u8 value) return ret; } +void cdn_dp_sdp_write(struct cdn_dp_device *dp, int entry_id, u8 *buf, + u32 buf_len) +{ + int idx; + u32 *packet = (u32 *)buf; + u32 num_packets = buf_len / 4; + u8 type; + + if (buf_len < EDP_SDP_HEADER_SIZE) { + DRM_DEV_ERROR(dp->dev, "sdp buffer length: %d\n", buf_len); + return; + } + + type = buf[1]; + + for (idx = 0; idx < num_packets; idx++) + writel(cpu_to_le32(*packet++), dp->regs + SOURCE_PIF_DATA_WR); + + writel(entry_id, dp->regs + SOURCE_PIF_WR_ADDR); + + writel(F_HOST_WR, dp->regs + SOURCE_PIF_WR_REQ); + + writel(PIF_PKT_TYPE_VALID | F_PACKET_TYPE(type) | entry_id, + dp->regs + SOURCE_PIF_PKT_ALLOC_REG); + writel(PIF_PKT_ALLOC_WR_EN, dp->regs + SOURCE_PIF_PKT_ALLOC_WR_EN); +} + int cdn_dp_load_firmware(struct cdn_dp_device *dp, const u32 *i_mem, u32 i_size, const u32 *d_mem, u32 d_size) { diff --git a/drivers/gpu/drm/rockchip/cdn-dp-reg.h b/drivers/gpu/drm/rockchip/cdn-dp-reg.h index c4bbb4a83..6ec0e81 100644 --- a/drivers/gpu/drm/rockchip/cdn-dp-reg.h +++ b/drivers/gpu/drm/rockchip/cdn-dp-reg.h @@ -424,6 +424,11 @@ /* Reference cycles when using lane clock as reference */ #define LANE_REF_CYC 0x8000 +#define F_HOST_WR BIT(0) +#define PIF_PKT_ALLOC_WR_ENBIT(0) +#define PIF_PKT_TYPE_VALID (3 << 16) +#define F_PACKET_TYPE(x) 
(((x) & 0xff) << 8) + enum voltage_swing_level { VOLTAGE_LEVEL_0, VOLTAGE_LEVEL_1, @@ -478,5 +483,6 @@ int cdn_dp_set_video_status(struct cdn_dp_device *dp, int active); int cdn_dp_config_video(struct cdn_dp_device *dp); int cdn_dp_audio_stop(struct cdn_dp_device *dp, struct audio_info *audio); int cdn_dp_audio_mute(struct cdn_dp_device *dp, bool enable); +void cdn_dp_sdp_write(struct cdn_dp_device *dp, int entry_id, u8 *buf, u32 len); int cdn_dp_audio_config(struct cdn_dp_device *dp, struct audio_info *audio); #endif /* _CDN_DP_REG_H */ diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h index b17476a..5d5dd07 100644 --- a/include/drm/drm_dp_helper.h +++ b/include/drm/drm_dp_helper.h @@ -878,6 +878,7 @@ struct edp_sdp_header { u8 HB3; /* 7:5 reserved, 4:0 number of valid data bytes */ } __packed; +#define EDP_SDP_HEADER_SIZE4 #define EDP_SDP_HEADER_REVISION_MASK 0x1F #define EDP_SDP_HEADER_VALID_PAYLOAD_BYTES 0x1F -- 2.7.4
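The word-wise write loop in cdn_dp_sdp_write() can be pictured with a small userspace sketch. This is only an illustration under assumptions: `sdp_to_le_words` is a made-up name, the words are assembled byte by byte instead of through a u32 pointer, and no register access is modelled. Like the integer division in the patch, it covers only whole 32-bit words, so trailing bytes beyond a multiple of four are not included.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: view a byte buffer (SDP header followed by payload)
 * as little-endian 32-bit words. Returns the number of words produced.
 */
size_t sdp_to_le_words(const uint8_t *buf, size_t len,
                       uint32_t *words, size_t max_words)
{
    size_t n = len / 4; /* whole words only, as in the patch */
    size_t i;

    if (n > max_words)
        n = max_words;

    for (i = 0; i < n; i++) {
        words[i] = (uint32_t)buf[4 * i] |
                   ((uint32_t)buf[4 * i + 1] << 8) |
                   ((uint32_t)buf[4 * i + 2] << 16) |
                   ((uint32_t)buf[4 * i + 3] << 24);
    }

    return n;
}
```

For a 14-byte buffer this yields three words, and the four header bytes end up together in the first one.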
[PATCH] f2fs: provide f2fs_balance_fs to __write_node_page
Signed-off-by: Yunlong Song --- fs/f2fs/checkpoint.c | 2 +- fs/f2fs/f2fs.h | 2 +- fs/f2fs/node.c | 16 ++-- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 5b876f6..3c84a25 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1017,7 +1017,7 @@ static int block_operations(struct f2fs_sb_info *sbi) if (get_pages(sbi, F2FS_DIRTY_NODES)) { up_write(>node_write); - err = sync_node_pages(sbi, ); + err = sync_node_pages(sbi, , false); if (err) { up_write(>node_change); f2fs_unlock_all(sbi); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 94a88b2..f69051b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2293,7 +2293,7 @@ struct page *new_node_page(struct dnode_of_data *dn, void move_node_page(struct page *node_page, int gc_type); int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic); -int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc); +int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, bool need); void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d53fe62..b5c0ce3 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1326,7 +1326,7 @@ static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino) } static int __write_node_page(struct page *page, bool atomic, bool *submitted, - struct writeback_control *wbc) + struct writeback_control *wbc, bool need) { struct f2fs_sb_info *sbi = F2FS_P_SB(page); nid_t nid; @@ -1387,6 +1387,10 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, } unlock_page(page); + if (need) + f2fs_balance_fs(sbi, false); + else + f2fs_balance_fs_bg(sbi); if (unlikely(f2fs_cp_error(sbi))) { f2fs_submit_merged_write(sbi, NODE); @@ -1405,7 +1409,7 
@@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, static int f2fs_write_node_page(struct page *page, struct writeback_control *wbc) { - return __write_node_page(page, false, NULL, wbc); + return __write_node_page(page, false, NULL, wbc, true); } int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, @@ -1493,7 +1497,7 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, ret = __write_node_page(page, atomic && page == last_page, - , wbc); + , wbc, true); if (ret) { unlock_page(page); f2fs_put_page(last_page, 0); @@ -1530,7 +1534,7 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, return ret ? -EIO: 0; } -int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc) +int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, bool need) { pgoff_t index, end; struct pagevec pvec; @@ -1608,7 +1612,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc) set_fsync_mark(page, 0); set_dentry_mark(page, 0); - ret = __write_node_page(page, false, , wbc); + ret = __write_node_page(page, false, , wbc, need); if (ret) unlock_page(page); else if (submitted) @@ -1697,7 +1701,7 @@ static int f2fs_write_node_pages(struct address_space *mapping, diff = nr_pages_to_write(sbi, NODE, wbc); wbc->sync_mode = WB_SYNC_NONE; blk_start_plug(); - sync_node_pages(sbi, wbc); + sync_node_pages(sbi, wbc, true); blk_finish_plug(); wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff); return 0; -- 1.8.5.2
Re: [linux-sunxi] [PATCH 10/10] ARM: dts: sun8i: Add SY8106A regulator to Orange Pi PC
On 2017-07-26 19:44, Maxime Ripard wrote: Hi, On Wed, Jul 26, 2017 at 12:23:48PM +0200, Ondřej Jirman wrote: Hi, icen...@aosc.io wrote on Wed, 26. 07. 2017 at 15:36 +0800: > > > > > > > > > Otherwise > > > > > > > > > + regulator-max-microvolt = <140>; > > > > > + regulator-ramp-delay = <200>; > > > > > > > > Is this an actual constraint of the SoC? Or is it a characteristic > > > > of the regulator? If it is the latter, it belongs in the driver. > > > > AFAIK the regulator supports varying the ramp delay (slew rate). > > I don't know... > > Maybe I should ask Ondrej? It is probably neither. It is used to calculate a delay inserted by the kernel between setting a new target voltage over I2C and changing the frequency of the CPU. The actual delay is calculated from the difference between the previous and the new voltage. I don't remember seeing anything in the datasheet of the regulator. This is just some low value that works. It would probably be dependent on the capacitance on the output of the regulator, actual load (which varies), etc. So it is a board specific value. One could measure it with an oscilloscope if there's a need to optimize this. If this is a reasonable default, then this should be in the driver. You can't expect anyone to properly calculate a ramp delay and have access to both a scope and the CPU power lines. It seems that a default ramp delay can be set in the regulator_desc structure, and a value specified in the DT can override it. So just add .ramp_delay = 200 in the driver's regulator_desc part? Should a comment be added explaining that it is only an empirical value for VDD-CPUX on Allwinner H3/H5 boards? Maxime
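The delay described above can be put into numbers with a small sketch. It assumes that the wait time is the voltage difference divided by the ramp rate, rounded up, with the ramp rate given in microvolts per microsecond. The function name is made up, and the exact formula used by the regulator core is not quoted in this thread.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: time in microseconds to wait after requesting a
 * voltage change, for a ramp rate given in uV/us (assumed unit).
 * A non-positive ramp rate is treated as "no delay known".
 */
long ramp_time_us(long old_uv, long new_uv, long ramp_uv_per_us)
{
    long diff = labs(new_uv - old_uv);

    if (ramp_uv_per_us <= 0)
        return 0;

    /* Round up so the wait is never shorter than the ramp itself. */
    return (diff + ramp_uv_per_us - 1) / ramp_uv_per_us;
}
```

Under these assumptions, a step from 1.1 V to 1.2 V at the 200 uV/us discussed here would mean waiting 500 us before raising the CPU frequency.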
[GIT PULL] intel_th: Fixes for char-misc-linus
Hi Greg,

Here are my fixes for 4.13, please consider pulling. These are really
just two new PCI IDs. Thanks!

The following changes since commit 520eccdfe187591a51ea9ab4c1a024ae4d0f68d9:

  Linux 4.13-rc2 (2017-07-23 16:15:17 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm.git tags/stm-fixes-for-greg-20170726

for you to fetch changes up to a45ae3526897ebcd128e9044040bc7b4f57de4f0:

  intel_th: pci: Add Cannon Lake PCH-LP support (2017-07-26 15:33:15 +0300)

intel_th: Fixes for v4.13

These are two new PCI IDs (Cannon Lake PCH-H and PCH-LP).

Alexander Shishkin (2):
      intel_th: pci: Add Cannon Lake PCH-H support
      intel_th: pci: Add Cannon Lake PCH-LP support

 drivers/hwtracing/intel_th/pci.c | 10 ++
 1 file changed, 10 insertions(+)
Re: [PATCH 1/1] mm/hugetlb: Make huge_pte_offset() consistent and document behaviour
On Wed 26-07-17 14:33:57, Michal Hocko wrote:
> On Wed 26-07-17 13:11:46, Punit Agrawal wrote:
[...]
> > I've been running tests from mce-test suite and libhugetlbfs for similar
> > changes we did on arm64. There could be assumptions that were not
> > exercised but I'm not sure how to check for all the possible usages.
> >
> > Do you have any other suggestions that can help improve confidence in
> > the patch?
>
> Unfortunately I don't. I just know there were many subtle assumptions
> all over the place so I am rather careful to not touch the code unless
> really necessary.
>
> That being said, I am not opposing your patch.

Let me be more specific. I am not opposing your patch but we definitely
need more reviewers to have a look. I am not seeing any immediate
problems with it but I do not see a large improvement either (a slightly
smaller nightmare doesn't make me sleep all that well ;)). So I will
leave the decision to others.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v8 1/3] perf: cavium: Support memory controller PMU counters
On 26/07/17 12:19, Jan Glauber wrote: On Tue, Jul 25, 2017 at 04:39:18PM +0100, Suzuki K Poulose wrote: On 25/07/17 16:04, Jan Glauber wrote: Add support for the PMU counters on Cavium SOC memory controllers. This patch also adds generic functions to allow supporting more devices with PMU counters. Properties of the LMC PMU counters: - not stoppable - fixed purpose - read-only - one PCI device per memory controller Signed-off-by: Jan Glauber --- drivers/perf/Kconfig | 8 + drivers/perf/Makefile | 1 + drivers/perf/cavium_pmu.c | 424 + include/linux/cpuhotplug.h | 1 + 4 files changed, 434 insertions(+) create mode 100644 drivers/perf/cavium_pmu.c diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig index e5197ff..a46c3f0 100644 --- a/drivers/perf/Kconfig +++ b/drivers/perf/Kconfig @@ -43,4 +43,12 @@ config XGENE_PMU help Say y if you want to use APM X-Gene SoC performance monitors. +config CAVIUM_PMU + bool "Cavium SOC PMU" Is there any specific reason why this can't be built as a module? Yes. I don't know how to load the module automatically. I can't make it a pci driver as the EDAC driver "owns" the device (and having two drivers for one device won't work as far as I know). I tried to hook into the EDAC driver but the EDAC maintainer was not overly welcoming of that approach. And while it would be possible to have it as a module I think it is of no use if it requires manual loading. But maybe there is a simple solution I'm missing here? If you are talking about a Cavium specific EDAC driver, maybe we could make that depend on this driver "at runtime" via symbols (maybe even trigger the probe of the PMU), which will be referenced only when CONFIG_CAVIUM_PMU is defined. It is not the perfect solution, but that should do the trick. + /* +* Forbid groups containing mixed PMUs, software events are acceptable. 
+*/ + if (event->group_leader->pmu != event->pmu && + !is_software_event(event->group_leader)) + return -EINVAL; + + list_for_each_entry(sibling, >group_leader->sibling_list, + group_entry) + if (sibling->pmu != event->pmu && + !is_software_event(sibling)) + return -EINVAL; Do we also need to check if the events in the same group can be scheduled at once ? i.e, there is enough resources to schedule the requested events from the group. Not sure what you mean, do I need to check for programmable counters that no more counters are programmed than available? Yes. What if there are two events, both trying to use the same counter (either due to lack of programmable counters or duplicate events). + + hwc->config = event->attr.config; + hwc->idx = -1; + return 0; +} + ... +static int cvm_pmu_add(struct perf_event *event, int flags, u64 config_base, + u64 event_base) +{ + struct cvm_pmu_dev *pmu_dev = to_pmu_dev(event->pmu); + struct hw_perf_event *hwc = >hw; + + if (!cmpxchg(_dev->events[hwc->config], NULL, event)) + hwc->idx = hwc->config; + + if (hwc->idx == -1) + return -EBUSY; + + hwc->config_base = config_base; + hwc->event_base = event_base; + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (flags & PERF_EF_START) + pmu_dev->pmu.start(event, PERF_EF_RELOAD); + + return 0; +} + +static void cvm_pmu_del(struct perf_event *event, int flags) +{ + struct cvm_pmu_dev *pmu_dev = to_pmu_dev(event->pmu); + struct hw_perf_event *hwc = >hw; + int i; + + event->pmu->stop(event, PERF_EF_UPDATE); + + /* +* For programmable counters we need to check where we installed it. +* To keep this function generic always test the more complicated +* case (free running counters won't need the loop). +*/ + for (i = 0; i < pmu_dev->num_counters; i++) + if (cmpxchg(_dev->events[i], event, NULL) == event) + break; I couldn't see why hwc->config wouldn't give us the index where we installed the event in pmu_dev->events. What am I missing ? Did you see the comment above? 
It is not yet needed but will be when I add support for programmable counters. Is it supported in this series ? If it is still confusing I can also remove that for now and add it back later when it is needed. What is the hwc->idx for programmable counters ? is it going to be different than hwc->config ? If so, can we use hwc->idx to keep the idx where we installed the event ? Suzuki
Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
On Wed, Jul 26, 2017 at 01:53:28PM +0200, Peter Zijlstra wrote:
>
> New version..
>
>
> ---
> Subject: documentation,atomic: Add new documents
> From: Peter Zijlstra
> Date: Mon Jun 12 14:50:27 CEST 2017
>
> Since we've vastly expanded the atomic_t interface in recent years the
> existing documentation is woefully out of date and people seem to get
> confused a bit.
>
> Start a new document to hopefully better explain the current state of
> affairs.
>
> The old atomic_ops.txt also covers bitmaps and a few more details so
> this is not a full replacement and we'll therefore keep that document
> around until such a time that we've managed to write more text to cover
> its entire.
>

You seem to have an unfinished paragraph here..

> Also please, ReST people, go away.
>
> Signed-off-by: Peter Zijlstra (Intel)
> ---
[...]
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec();
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE. Similarly for something like:
> +

.. and here. Maybe you planned to put a stronger ACQUIRE pattern?

> +
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
>       This means that ACQUIRE acts as a minimal "acquire" operation and
>       RELEASE acts as a minimal "release" operation.
>
[...]
> -
> -[!] Note that special memory barrier primitives are available for these
> -situations because on some CPUs the atomic instructions used imply full memory
> -barriers, and so barrier instructions are superfluous in conjunction with them,
> -and in such cases the special barrier primitives will be no-ops.
> -
> -See Documentation/core-api/atomic_ops.rst for more information.
> +See Documentation/atomic_t.txt for more information.
>

s/atomic_t.txt/atomic_{t,bitops}.txt/ ?

Other than those two tiny things,

Reviewed-by: Boqun Feng

Regards,
Boqun

>
> ACCESSING DEVICES
RE: [PATCH v12 6/8] mm: support reporting free page blocks
On Wednesday, July 26, 2017 7:55 PM, Michal Hocko wrote:
> On Wed 26-07-17 19:44:23, Wei Wang wrote:
> [...]
> > I thought about it more. Probably we can use the callback function
> > with a little change like this:
> >
> > void walk_free_mem(void *opaque1, void (*visit)(void *opaque2,
> >                    unsigned long pfn, unsigned long nr_pages))
> > {
> >     ...
> >     for_each_populated_zone(zone) {
> >         for_each_migratetype_order(order, type) {
> >             report_unused_page_block(zone, order, type,
> >                                      &page); // from patch 6
> >             pfn = page_to_pfn(page);
> >             visit(opaque1, pfn, 1 << order);
> >         }
> >     }
> > }
> >
> > The above function scans all the free list and directly sends each
> > free page block to the hypervisor via the virtio_balloon callback
> > below. No need to implement a bitmap.
> >
> > In virtio-balloon, we have the callback:
> > void *virtio_balloon_report_unused_pages(void *opaque, unsigned long pfn,
> >                                          unsigned long nr_pages)
> > {
> >     struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> >     ...put the free page block to the ring of vb;
> > }
> >
> >
> > What do you think?
>
> I do not mind conveying a context to the callback. I would still prefer
> to keep the original min_order to check semantic though. Why? Well,
> it doesn't make much sense to scan low order free blocks all the time
> because they are simply too volatile. Larger blocks tend to survive for
> longer. So I assume you would only care about larger free blocks. This
> will also make the call cheaper.
> --

OK, I will keep min order there in the next version.

Best,
Wei
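A minimal userspace sketch of the callback shape discussed above, with the min_order filter kept. This is not the patch itself: the array of `free_block` entries only stands in for the kernel's free lists, and `free_block`, `walk_free_blocks` and `count_pages` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry on a zone's free list. */
struct free_block {
	unsigned long pfn;
	unsigned int order;
};

/* Callback type: opaque context, first pfn, number of pages in the block. */
typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

/*
 * Report every block of at least min_order to the caller's callback.
 * Lower-order blocks are skipped, since they are too volatile to be
 * worth reporting and skipping them makes the walk cheaper.
 */
static void walk_free_blocks(const struct free_block *blocks, size_t n,
			     unsigned int min_order, void *opaque,
			     visit_fn visit)
{
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].order < min_order)
			continue;
		visit(opaque, blocks[i].pfn, 1UL << blocks[i].order);
	}
}

/* Example callback: accumulate the number of reported pages. */
static void count_pages(void *opaque, unsigned long pfn,
			unsigned long nr_pages)
{
	unsigned long *total = opaque;

	(void)pfn; /* unused in this example */
	*total += nr_pages;
}
```

The opaque pointer plays the role of the `struct virtio_balloon *` in the proposal: the walker never looks inside it, so the same walker could serve any consumer.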
Re: [PATCH V2 net-next 01/21] net-next/hinic: Initialize hw interface
OK, we will use module_pci_driver although it is not very common in the same segment.

On 7/25/2017 11:02 PM, Francois Romieu wrote:
> Aviad Krawczyk :
> [...]
>> module_pci_driver - is not used in other drivers in the same segments, it
>> is necessary ?
>
> /me checks... Ok, there seems to be some overenthusiastic copy'paste.
>
> See drivers/net/ethernet/intel/ixgb/ixgb_main.c:
> [...]
> /**
>  * ixgb_init_module - Driver Registration Routine
>  *
>  * ixgb_init_module is the first routine called when the driver is
>  * loaded. All it does is register with the PCI subsystem.
>  **/
>
> static int __init
> ixgb_init_module(void)
> {
> 	pr_info("%s - version %s\n", ixgb_driver_string, ixgb_driver_version);
> 	pr_info("%s\n", ixgb_copyright);
>
> 	return pci_register_driver(&ixgb_driver);
> }
>
> module_init(ixgb_init_module);
>
> /**
>  * ixgb_exit_module - Driver Exit Cleanup Routine
>  *
>  * ixgb_exit_module is called just before the driver is removed
>  * from memory.
>  **/
>
> static void __exit
> ixgb_exit_module(void)
> {
> 	pci_unregister_driver(&ixgb_driver);
> }
>
> module_exit(ixgb_exit_module);
>
> Driver version ought to be fed through ethtool, if ever. Copyright message
> mildly contributes to a better world. So the whole stuff above could be:
>
> module_pci_driver(ixgb_driver);
>
Re: [PATCH] virtio-net: fix module unloading
On Wed, Jul 26, 2017 at 11:52:07AM +0800, Jason Wang wrote:
>
>
> On 2017-07-24 21:38, Andrew Jones wrote:
> > Unregister the driver before removing multi-instance hotplug
> > callbacks. This order avoids the warning issued from
> > __cpuhp_remove_state_cpuslocked when the number of remaining
> > instances isn't yet zero.
> >
> > Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine")
> > Cc: Sebastian Andrzej Siewior
> > Signed-off-by: Andrew Jones
> > ---
> >   drivers/net/virtio_net.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 99a26a9efec1..f41ab0ea942a 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2743,9 +2743,9 @@ module_init(virtio_net_driver_init);
> >   static __exit void virtio_net_driver_exit(void)
> >   {
> > +	unregister_virtio_driver(&virtio_net_driver);
> >   	cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
> >   	cpuhp_remove_multi_state(virtionet_online);
> > -	unregister_virtio_driver(&virtio_net_driver);
> >   }
> >   module_exit(virtio_net_driver_exit);
>
> Acked-by: Jason Wang

Thanks for the review! I merged it before the tag and don't want to rebase. Sorry about that.

--
MST
Re: [PATCH] fortify: Use WARN instead of BUG for now
It should just be renamed from fortify_panic -> fortify_error, including in arch/x86/boot/compressed/misc.c. It can use WARN instead of BUG, with a 'default n', !COMPILE_TEST option to use BUG again. Otherwise it needs to be patched downstream when that's wanted.

I don't think splitting it is the right approach to improving the runtime error handling. That only makes sense for the compile-time errors due to the limitations of __attribute__((error)). Can we think about that before changing it? Just make it use WARN for now.

The best debugging experience would be passing along the sizes and having the fortify_error function convert that into nice error messages. For memcpy(p, q, n), n can be larger than both the detected sizes of p and q, not just either one. The error should just say the function name and print the copy size and the maximum sizes of p and q. That's going to increase the code size too, but I think splitting it will be worse and it goes in the wrong direction in terms of complexity. It's going to make future extensions / optimization harder if it's split.
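To make the single-function idea concrete, here is a rough userspace sketch of what such a report could look like. It is only an illustration under stated assumptions: `fortify_error_msg` and `fortify_overflows` are hypothetical names, the message format is invented, and the real kernel helper would go through WARN or BUG instead of formatting into a buffer.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format one message naming the function, the
 * requested copy size, and the detected maximum sizes of both buffers.
 * Returns the number of characters snprintf() would have written.
 */
static int fortify_error_msg(char *buf, size_t len, const char *func,
			     size_t n, size_t p_size, size_t q_size)
{
	return snprintf(buf, len,
			"%s: copy of %zu bytes (dest max %zu, src max %zu)",
			func, n, p_size, q_size);
}

/*
 * A copy is bad if it exceeds either detected size. Note that n may
 * exceed both at once, which is why one message reports both limits.
 */
static int fortify_overflows(size_t n, size_t p_size, size_t q_size)
{
	return n > p_size || n > q_size;
}
```

With one reporting function taking all three sizes, there is no need for separate read-overflow and write-overflow entry points, which is the complexity argument made above.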
[REGRESSION 4.13-rc] NFS returns -EACCESS at the first read
Hi,

I seem to be hitting a regression of the NFS client on today's Linus git tree. The symptom is that a file read over NFS occasionally returns -EACCES at the first read. When I try to read the same file again (or do some other thing), I can read it successfully.

The git bisection led to the commit
  bd8b2441742b49c76bec707757bd9c028ea9838e
  NFS: Store the raw NFS access mask in the inode's access cache

Any further hints for debugging?

thanks,

Takashi
Re: [linux-sunxi] [PATCH 10/10] ARM: dts: sun8i: Add SY8106A regulator to Orange Pi PC
Maxime Ripard píše v St 26. 07. 2017 v 13:44 +0200: > Hi, > > On Wed, Jul 26, 2017 at 12:23:48PM +0200, Ondřej Jirman wrote: > > Hi, > > > > icen...@aosc.io píše v St 26. 07. 2017 v 15:36 +0800: > > > > > > > > > > > > > > > Otherwse > > > > > > > > > > > > > + regulator-max-microvolt = <140>; > > > > > > > + regulator-ramp-delay = <200>; > > > > > > > > > > > > Is this an actual constraint of the SoC? Or is it a characteristic > > > > > > of the regulator? If it is the latter, it belongs in the driver. > > > > > > AFAIK the regulator supports varying the ramp delay (slew rate). > > > > > > I don't know... > > > > > > Maybe I should ask Ondrej? > > > > It is probably neither. > > > > It is used to calculate a delay inserted by the kernel between setting > > a new target voltage over I2C and changing the frequency of the CPU. > > The actual delay is calculated by the difference between previous and > > the new voltage. > > > > I don't remember seeing anything in the datasheet of the regulator. > > This is just some low value that works. > > > > It would probably be dependent on the capacitance on the output of the > > regulator, actual load (which varies), etc. So it is a board specific > > value. One could measure it with an oscilloscope if there's a need to > > optimize this. > > If this is a reasonable default, then this should be in the > driver. You can't expect anyone to properly calculate a ramp delay and > have access to both a scope and the CPU power lines. It translates to 1ms per 0.2V which is highly conservative. The real times will be in 1-10us range. So I guess this could be a default in the driver. regards, o. > Maxime > > -- > Maxime Ripard, Free Electrons > Embedded Linux and Kernel engineering > http://free-electrons.com > signature.asc Description: This is a digitally signed message part
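The "1ms per 0.2V" figure follows directly from a ramp delay of 200 uV/us. A small sketch of that arithmetic, assuming the delay is simply the voltage difference divided by the ramp rate and rounded up; `ramp_time_us` is a hypothetical helper name, not a function from the regulator core.

```c
/* Round-up integer division, like the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Time to wait, in microseconds, after requesting a voltage change.
 * ramp_delay_uv_per_us is the regulator-ramp-delay value (uV/us).
 */
static unsigned int ramp_time_us(int old_uv, int new_uv,
				 unsigned int ramp_delay_uv_per_us)
{
	unsigned int diff = old_uv > new_uv ? old_uv - new_uv
					    : new_uv - old_uv;

	return DIV_ROUND_UP(diff, ramp_delay_uv_per_us);
}
```

For a 0.2 V step (200000 uV) at 200 uV/us this gives 1000 us, that is 1 ms, which matches the estimate in the reply above.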
Re: [RFC][PATCH] thunderbolt: icm: Ignore mailbox errors in icm_suspend()
On Wednesday, July 26, 2017 11:32:44 AM Mika Westerberg wrote:
> On Tue, Jul 25, 2017 at 06:10:57PM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, July 25, 2017 01:00:12 PM Mika Westerberg wrote:
> > > On Tue, Jul 25, 2017 at 01:31:00AM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki
> > > >
> > > > On one of my test machines nhi_mailbox_cmd() called from icm_suspend()
> > > > times out and returns an error which then is propagated to the
> > > > caller and causes the entire system suspend to be aborted which isn't
> > > > very useful.
> > > >
> > > > Instead of aborting system suspend, print the error into the log
> > > > and continue.
> > >
> > > I agree, it should not prevent suspend but I wonder why it fails in the
> > > first place? Can you check what is the return value?
> >
> > As per the above, the error is a timeout, ie. -ETIMEDOUT.
>
> Ah, right I somehow missed that.
>
> Does it have Falcon Ridge controller or Alpine Ridge?

I'll check later today, but I guess you'll know (see below).

> Just to make sure, can you increase the timeout in nhi_mailbox_cmd()
> to 1000ms or so. It should not take that long though but better to check.

Well, I can do that, but I don't think it will help. It just looks like the chip is not responding at all at that point.

> Which system this is BTW?

It's the Dell 9360. :-)

Sometimes after a reboot or a power cycle it starts in a state in which the TBT controller and a USB one (which seem to be somehow connected) appear to be dead or at least really flaky. Basically, the box needs to be power-cycled again to get rid of this condition and then everything works.

Thanks,
Rafael
Re: [REGRESSION 4.13-rc] NFS returns -EACCESS at the first read
Hi Takashi,

On 07/26/2017 08:54 AM, Takashi Iwai wrote:
> Hi,
>
> I seem to be hitting a regression of the NFS client on today's Linus git
> tree. The symptom is that a file read over NFS occasionally returns
> -EACCES at the first read. When I try to read the same file again
> (or do some other thing), I can read it successfully.
>
> The git bisection led to the commit
>   bd8b2441742b49c76bec707757bd9c028ea9838e
>   NFS: Store the raw NFS access mask in the inode's access cache
>
>
> Any further hints for debugging?

Does the patch in this email thread help?
http://www.spinics.net/lists/linux-nfs/msg64930.html

Thanks,
Anna

>
>
> thanks,
>
> Takashi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Re: [PATCH net] Revert "vhost: cache used event for better performance"
On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote: > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it > was reported to break vhost_net. We want to cache used event and use > it to check for notification. We try to valid cached used event by > checking whether or not it was ahead of new, but this is not correct > all the time, it could be stale and there's no way to know about this. > > Signed-off-by: Jason Wang Could you supply a bit more data here please? How does it get stale? What does guest need to do to make it stale? This will be helpful if anyone wants to bring it back, or if we want to extend the protocol. > --- > drivers/vhost/vhost.c | 28 ++-- > drivers/vhost/vhost.h | 3 --- > 2 files changed, 6 insertions(+), 25 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index e4613a3..9cb3f72 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -308,7 +308,6 @@ static void vhost_vq_reset(struct vhost_dev *dev, > vq->avail = NULL; > vq->used = NULL; > vq->last_avail_idx = 0; > - vq->last_used_event = 0; > vq->avail_idx = 0; > vq->last_used_idx = 0; > vq->signalled_used = 0; > @@ -1402,7 +1401,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, > void __user *argp) > r = -EINVAL; > break; > } > - vq->last_avail_idx = vq->last_used_event = s.num; > + vq->last_avail_idx = s.num; > /* Forget the cached index value. */ > vq->avail_idx = vq->last_avail_idx; > break; > @@ -2241,6 +2240,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct > vhost_virtqueue *vq) > __u16 old, new; > __virtio16 event; > bool v; > + /* Flush out used index updates. This is paired > + * with the barrier that the Guest executes when enabling > + * interrupts. 
*/ > + smp_mb(); > > if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) && > unlikely(vq->avail_idx == vq->last_avail_idx)) > @@ -2248,10 +2251,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct > vhost_virtqueue *vq) > > if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) { > __virtio16 flags; > - /* Flush out used index updates. This is paired > - * with the barrier that the Guest executes when enabling > - * interrupts. */ > - smp_mb(); > if (vhost_get_avail(vq, flags, >avail->flags)) { > vq_err(vq, "Failed to get flags"); > return true; > @@ -2266,26 +2265,11 @@ static bool vhost_notify(struct vhost_dev *dev, > struct vhost_virtqueue *vq) > if (unlikely(!v)) > return true; > > - /* We're sure if the following conditions are met, there's no > - * need to notify guest: > - * 1) cached used event is ahead of new > - * 2) old to new updating does not cross cached used event. */ > - if (vring_need_event(vq->last_used_event, new + vq->num, new) && > - !vring_need_event(vq->last_used_event, new, old)) > - return false; > - > - /* Flush out used index updates. This is paired > - * with the barrier that the Guest executes when enabling > - * interrupts. */ > - smp_mb(); > - > if (vhost_get_avail(vq, event, vhost_used_event(vq))) { > vq_err(vq, "Failed to get used event idx"); > return true; > } > - vq->last_used_event = vhost16_to_cpu(vq, event); > - > - return vring_need_event(vq->last_used_event, new, old); > + return vring_need_event(vhost16_to_cpu(vq, event), new, old); > } > > /* This actually signals the guest, using eventfd. */ > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h > index f720958..bb7c29b 100644 > --- a/drivers/vhost/vhost.h > +++ b/drivers/vhost/vhost.h > @@ -115,9 +115,6 @@ struct vhost_virtqueue { > /* Last index we used. */ > u16 last_used_idx; > > - /* Last used evet we've seen */ > - u16 last_used_event; > - > /* Used flags */ > u16 used_flags; > > -- > 2.7.4
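For readers following the revert, a small userspace sketch of the two checks involved. `need_event` mirrors the arithmetic of `vring_need_event()` from include/uapi/linux/virtio_ring.h, and `cached_says_skip` reproduces the condition removed by the diff above; both function names and the sample values in the note below are hypothetical, and the sketch does not answer how the cached value goes stale in practice, which is the open question in the reply.

```c
#include <stdint.h>

/*
 * Same arithmetic as vring_need_event(): a notification is needed when
 * the used index moved past event_idx, that is when
 * old <= event_idx < new_idx, with all values taken modulo 2^16.
 */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * The reverted shortcut: skip reading the real used_event (and skip the
 * notification) when the cached value looks "ahead of new" and the
 * old -> new update did not cross it.
 */
static int cached_says_skip(uint16_t cached, uint16_t new_idx,
			    uint16_t old, uint16_t num)
{
	return need_event(cached, (uint16_t)(new_idx + num), new_idx) &&
	       !need_event(cached, new_idx, old);
}
```

With a ring of 256 entries, old = 10, new = 12 and a cached value of 20, `cached_says_skip()` returns true. If the value in guest memory were actually 11 at that moment, `need_event(11, 12, 10)` is also true, so the shortcut would suppress a notification that the uncached check would send.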
Re: [PATCH RFC] sched: Allow migrating kthreads into online but inactive CPUs
On Tue, Jul 25, 2017 at 06:58:21PM +0200, Peter Zijlstra wrote: > Hi, > > On Sat, Jun 17, 2017 at 08:10:08AM -0400, Tejun Heo wrote: > > Per-cpu workqueues have been tripping CPU affinity sanity checks while > > a CPU is being offlined. A per-cpu kworker ends up running on a CPU > > which isn't its target CPU while the CPU is online but inactive. > > > > While the scheduler allows kthreads to wake up on an online but > > inactive CPU, it doesn't allow a running kthread to be migrated to > > such a CPU, which leads to an odd situation where setting affinity on > > a sleeping and running kthread leads to different results. > > > > Each mem-reclaim workqueue has one rescuer which guarantees forward > > progress and the rescuer needs to bind itself to the CPU which needs > > help in making forward progress; however, due to the above issue, > > while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on > > the correct CPU if the CPU is in the process of going offline, > > tripping the sanity check and executing the work item on the wrong > > CPU. > > > > This patch updates __migrate_task() so that kthreads can be migrated > > into an inactive but online CPU. > > > > Signed-off-by: Tejun Heo > > Reported-by: "Paul E. McKenney" > > Reported-by: Steven Rostedt > > Hmm.. so the rules for running on !active && online are slightly > stricter than just being a kthread, how about the below, does that work > too? Of 24 one-hour runs of the TREE07 rcutorture scenario, two had stalled tasks with this patch. One of them had more than 200 instances, the other two instances. In contrast, a 30-hour run a week ago with Tejun's patch completed cleanly. Here "stalled task" means that one of rcutorture's update-side kthreads fails to make any progress for more than 15 seconds. Grace periods are progressing, but a kthread waiting for a grace period isn't making progress, and is stuck with its ->state field at 0x402, that is TASK_NOLOAD|TASK_UNINTERRUPTIBLE. 
Which is as if it never got the wakeup, given that it is sleeping on schedule_timeout_idle(). Now, two of 24 might just be bad luck, but I haven't seen anything like this out of TREE07 since I queued Tejun's patch, so I am inclined to view your patch below with considerable suspicion. I -am- seeing this out of TREE01, even with Tejun's patch, but that scenario sets maxcpu=8 and nr_cpus=43, which seems to be tickling an issue that several other people are seeing. Others' testing seems to indicate that setting CONFIG_SOFTLOCKUP_DETECTOR=y suppresses this issue, but I need to do an overnight run to check my test cases, and that is tonight. So there might be something else going on as well. Thanx, Paul > kernel/sched/core.c | 36 ++-- > 1 file changed, 30 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index d3d39a283beb..59b667c16826 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -894,6 +894,22 @@ void check_preempt_curr(struct rq *rq, struct > task_struct *p, int flags) > } > > #ifdef CONFIG_SMP > + > +/* > + * Per-CPU kthreads are allowed to run on !actie && online CPUs, see > + * __set_cpus_allowed_ptr() and select_fallback_rq(). > + */ > +static inline bool is_per_cpu_kthread(struct task_struct *p) > +{ > + if (!(p->flags & PF_KTHREAD)) > + return false; > + > + if (p->nr_cpus_allowed != 1) > + return false; > + > + return true; > +} > + > /* > * This is how migration works: > * > @@ -951,8 +967,13 @@ struct migration_arg { > static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf, >struct task_struct *p, int dest_cpu) > { > - if (unlikely(!cpu_active(dest_cpu))) > - return rq; > + if (is_per_cpu_kthread(p)) { > + if (unlikely(!cpu_online(dest_cpu))) > + return rq; > + } else { > + if (unlikely(!cpu_active(dest_cpu))) > + return rq; > + } > > /* Affinity changed (again). 
*/ > if (!cpumask_test_cpu(dest_cpu, >cpus_allowed)) > @@ -1482,10 +1503,13 @@ static int select_fallback_rq(int cpu, struct > task_struct *p) > for (;;) { > /* Any allowed, online CPU? */ > for_each_cpu(dest_cpu, >cpus_allowed) { > - if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) > - continue; > - if (!cpu_online(dest_cpu)) > - continue; > + if (is_per_cpu_kthread(p)) { > + if (!cpu_online(dest_cpu)) > + continue; > + } else { > + if (!cpu_active(dest_cpu)) > + continue; > + } > goto out; >
Re: [PATCH v1] xen: get rid of paravirt op adjust_exception_frame
On 7/24/2017 10:28 AM, Juergen Gross wrote:
> When running as Xen pv-guest the exception frame on the stack contains
> %r11 and %rcx in addition to the other data pushed by the processor.
>
> Instead of having a paravirt op being called for each exception type
> prepend the Xen specific code to each exception entry. When running as
> Xen pv-guest just use the exception entry with prepended instructions,
> otherwise use the entry without the Xen specific code.
>
> Signed-off-by: Juergen Gross

Reviewed-by: Boris Ostrovsky

(I'd s/xen/x86/ in subject to get x86 maintainers' attention ;-))
Re: [PATCH 1/2] printk/console: Always disable boot consoles that use init memory before it is freed
On (07/14/17 14:51), Petr Mladek wrote: > From: Matt Redfearn > > Commit 4c30c6f566c0 ("kernel/printk: do not turn off bootconsole in > printk_late_init() if keep_bootcon") added a check on keep_bootcon to > ensure that boot consoles were kept around until the real console is > registered. > > This can lead to problems if the boot console data and code are in the > init section, since it can be freed before the boot console is > unregistered. > > Commit 81cc26f2bd11 ("printk: only unregister boot consoles when > necessary") fixed this a better way. It allowed to keep boot consoles > that did not use init data. Unfortunately it did not remove the check > of keep_bootcon. > > This can lead to crashes and weird panics when the bootconsole is > accessed after free, especially if page poisoning is in use and the > code / data have been overwritten with a poison value. > > To prevent this, always free the boot console if it is within the init > section. In addition, print a warning about that the console is removed > prematurely. > > Finally there is a new comment how to avoid the warning. It replaced > an explanation that duplicated a more comprehensive function > description few lines above. > > Fixes: 4c30c6f566c0 ("kernel/printk: do not turn off bootconsole in > printk_late_init() if keep_bootcon") > Signed-off-by: Matt Redfearn > [pmla...@suse.com: print the warning, code and comments clean up] > Signed-off-by: Petr Mladek Reviewed-by: Sergey Senozhatsky -ss
Re: [PATCH 2/2] printk/console: Enhance the check for consoles using init memory
On (07/14/17 14:51), Petr Mladek wrote: > printk_late_init() is responsible for disabling boot consoles that > use init memory. It checks the address of struct console for this. > > But this is not enough. For example, there are several early > consoles that have write() method in the init section and > struct console in the normal section. They are not disabled > and could cause fancy and hard to debug system states. > > It is even more complicated by the macros EARLYCON_DECLARE() and > OF_EARLYCON_DECLARE() where various struct members are set at > runtime by the provided setup() function. > > I have tried to reproduce this problem and forced the classic uart > early console to stay using keep_bootcon parameter. In particular > I used earlycon=uart,io,0x3f8 keep_bootcon console=ttyS0,115200. > The system did not boot: > > [1.570496] PM: Image not found (code -22) > [1.570496] PM: Image not found (code -22) > [1.571886] PM: Hibernation image not present or could not be loaded. > [1.571886] PM: Hibernation image not present or could not be loaded. > [1.576407] Freeing unused kernel memory: 2528K > [1.577244] kernel tried to execute NX-protected page - exploit attempt? > (uid: 0) > > The double lines are caused by having both early uart console and > ttyS0 console enabled at the same time. The early console stopped > working when the init memory was freed. Fortunately, the invalid > call was caught by the NX-protexted page check and did not cause > any silent fancy problems. > > This patch adds a check for many other addresses stored in > struct console. It omits setup() and match() that are used > only when the console is registered. Therefore they have > already been used at this point and there is no reason > to use them again. > > Signed-off-by: Petr Mladek Reviewed-by: Sergey Senozhatsky -ss
[RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
From: Wenwei Tao By removing the child cgroup while the parent cgroup is under reclaim, we could trigger the following kernel panic on kernel 3.10: kernel BUG at kernel/cgroup.c:893! invalid opcode: [#1] SMP CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1 Workqueue: cgroup_destroy css_dput_fn task: 8817959a5780 ti: 8817e8886000 task.ti: 8817e8886000 RIP: 0010:[] [] cgroup_diput+0xc0/0xf0 RSP: :8817e8887da0 EFLAGS: 00010246 RAX: RBX: 8817a5dd5d40 RCX: dead0200 RDX: RSI: 8817973a6910 RDI: 8817f54c2a00 RBP: 8817e8887dc8 R08: 8817a5dd5dd0 R09: df9fb35794b01820 R10: df9fb35794b01820 R11: 7fa95b1efcda R12: 8817a5dd5d9c R13: 8817f38b3a40 R14: 8817973a6910 R15: 8817973a6910 FS: () GS:88181f22() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fa6e6234000 CR3: 00179f19d000 CR4: 000407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Stack: 8817a5dd5d40 8817a5dd5d9c 8817f38b3a40 8817973a6910 0040 8817e8887df8 811b37c2 8817fa23c000 8817f57dbb80 88181f232ac0 88181f237500 8817e8887e10 Call Trace: [] dput+0x1a2/0x2f0 [] cgroup_dput.isra.21+0x1c/0x30 [] css_dput_fn+0x1d/0x20 [] process_one_work+0x17c/0x460 [] worker_thread+0x116/0x3b0 [] ? manage_workers.isra.25+0x290/0x290 [] kthread+0xc0/0xd0 [] ? insert_kthread_work+0x40/0x40 [] ret_from_fork+0x58/0x90 [] ? insert_kthread_work+0x40/0x40 Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15 48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b 49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7 RIP [] cgroup_diput+0xc0/0xf0 RSP ---[ end trace 85eeea5212c44f51 ]--- I think there is a css double put in mem_cgroup_iter. 
Under reclaim, we call mem_cgroup_iter the first time with prev == NULL and get the last_visited memcg from the per-zone reclaim_iter. We then call __mem_cgroup_iter_next to try to get the next alive memcg. __mem_cgroup_iter_next can return NULL if last_visited is already the last one, so we put the css of the last_visited memcg and continue with the next iteration of the while loop. This time we might not do css_tryget(&last_visited->css) if the dead_count has changed, but we still do css_put(&last_visited->css). We have then put it twice, which can trigger the BUG_ON at kernel/cgroup.c:893.

Reported-by: Wang Yu
Tested-by: Wang Yu
Signed-off-by: Wenwei Tao
---
 mm/memcontrol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2c..3d7a046 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1230,8 +1230,10 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		memcg = __mem_cgroup_iter_next(root, last_visited);
 
 		if (reclaim) {
-			if (last_visited && last_visited != root)
+			if (last_visited && last_visited != root) {
 				css_put(&last_visited->css);
+				last_visited = NULL;
+			}
 
 			iter->last_visited = memcg;
 			smp_wmb();
-- 
1.8.3.1
Re: [PATCH 2/2] ceph: pagecache writeback fault injection switch
On Tue, Jul 25, 2017 at 10:50 PM, Jeff Layton wrote: > From: Jeff Layton > > Testing ceph for proper writeback error handling turns out to be quite > difficult. I tried using iptables to block traffic but that didn't > give reliable results. > > I hacked in this wb_fault switch that makes the filesystem pretend that > writeback failed, even when it succeeds. With this, I could verify that > cephfs fsync error reporting does work properly. > > Signed-off-by: Jeff Layton > --- > fs/ceph/addr.c| 7 +++ > fs/ceph/debugfs.c | 8 +++- > fs/ceph/super.h | 2 ++ > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c > index 50836280a6f8..a3831d100e16 100644 > --- a/fs/ceph/addr.c > +++ b/fs/ceph/addr.c > @@ -584,6 +584,10 @@ static int writepage_nounlock(struct page *page, struct > writeback_control *wbc) >page_off, len, >truncate_seq, truncate_size, >>i_mtime, , 1); > + > + if (fsc->wb_fault && err >= 0) > + err = -EIO; > + > if (err < 0) { > struct writeback_control tmp_wbc; > if (!wbc) > @@ -666,6 +670,9 @@ static void writepages_finish(struct ceph_osd_request > *req) > struct ceph_fs_client *fsc = ceph_inode_to_client(inode); > bool remove_page; > > + if (fsc->wb_fault && rc >= 0) > + rc = -EIO; > + > dout("writepages_finish %p rc %d\n", inode, rc); > if (rc < 0) { > mapping_set_error(mapping, rc); > diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c > index 4e2d112c982f..e1e6eaa12031 100644 > --- a/fs/ceph/debugfs.c > +++ b/fs/ceph/debugfs.c > @@ -197,7 +197,6 @@ CEPH_DEFINE_SHOW_FUNC(caps_show) > CEPH_DEFINE_SHOW_FUNC(dentry_lru_show) > CEPH_DEFINE_SHOW_FUNC(mds_sessions_show) > > - > /* > * debugfs > */ > @@ -231,6 +230,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc) > debugfs_remove(fsc->debugfs_caps); > debugfs_remove(fsc->debugfs_mdsc); > debugfs_remove(fsc->debugfs_dentry_lru); > + debugfs_remove(fsc->debugfs_wb_fault); > } > > int ceph_fs_debugfs_init(struct ceph_fs_client *fsc) > @@ -298,6 +298,12 @@ int 
ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
>         if (!fsc->debugfs_dentry_lru)
>                 goto out;
>
> +       fsc->debugfs_wb_fault = debugfs_create_bool("wb_fault",
> +                                       0600, fsc->client->debugfs_dir,
> +                                       &fsc->wb_fault);
> +       if (!fsc->debugfs_wb_fault)
> +               goto out;
> +
>         return 0;
>
>  out:
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index f02a2225fe42..a38fd6203b77 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -84,6 +84,7 @@ struct ceph_fs_client {
>
>         unsigned long mount_state;
>         int min_caps;                  /* min caps i added */
> +       bool wb_fault;
>
>         struct ceph_mds_client *mdsc;
>
> @@ -100,6 +101,7 @@ struct ceph_fs_client {
>         struct dentry *debugfs_bdi;
>         struct dentry *debugfs_mdsc, *debugfs_mdsmap;
>         struct dentry *debugfs_mds_sessions;
> +       struct dentry *debugfs_wb_fault;
>  #endif
>

I think it's better not to enable this feature by default. How about
enabling it through a compile-time option or a mount option?

Regards
Yan, Zheng

>  #ifdef CONFIG_CEPH_FSCACHE
> --
> 2.13.3
>
[PATCH] Drivers : edac : checkpatch.pl clean up
Fixed 'no assignment in if condition' coding style issue and removed
unnecessary spaces at the start of a line.

Signed-off-by: Himanshu Jha
---
 drivers/edac/i82860_edac.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index 236c813..c8c1c4d 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -282,7 +282,9 @@ static void i82860_remove_one(struct pci_dev *pdev)
 	if (i82860_pci)
 		edac_pci_release_generic_ctl(i82860_pci);

-	if ((mci = edac_mc_del_mc(&pdev->dev)) == NULL)
+	mci = edac_mc_del_mc(&pdev->dev);
+
+	if (mci == NULL)
 		return;

 	edac_mc_free(mci);
@@ -312,10 +314,11 @@ static int __init i82860_init(void)

 	edac_dbg(3, "\n");

-       /* Ensure that the OPSTATE is set correctly for POLL or NMI */
-       opstate_init();
+	/* Ensure that the OPSTATE is set correctly for POLL or NMI */
+	opstate_init();

-	if ((pci_rc = pci_register_driver(&i82860_driver)) < 0)
+	pci_rc = pci_register_driver(&i82860_driver);
+	if (pci_rc < 0)
 		goto fail0;

 	if (!mci_pdev) {
--
2.7.4
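For readers unfamiliar with the checkpatch.pl complaint addressed by the patch above, here is a minimal stand-alone sketch of the same transformation. The helpers lookup_value() and get_or_default() are made up for illustration and have nothing to do with the EDAC driver; only the shape of the change is the point.

```c
#include <stddef.h>

/* Hypothetical helper: returns NULL when the key is out of range. */
static const int *lookup_value(int key)
{
	static const int table[] = { 10, 20, 30 };

	if (key < 0 || key >= (int)(sizeof(table) / sizeof(table[0])))
		return NULL;
	return &table[key];
}

/*
 * The form checkpatch.pl flags combines assignment and test:
 *
 *	if ((val = lookup_value(key)) == NULL)
 *		return fallback;
 *
 * The form below assigns first and tests afterwards, which is what
 * the patch does for mci and pci_rc.
 */
static int get_or_default(int key, int fallback)
{
	const int *val;

	val = lookup_value(key);
	if (val == NULL)
		return fallback;

	return *val;
}
```

The behaviour is unchanged by the split; the only difference is that the assignment no longer hides inside the condition.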
Re: [PATCH 1/2] ceph: use errseq_t for writeback error reporting
On Tue, Jul 25, 2017 at 10:50 PM, Jeff Layton wrote: > From: Jeff Layton > > Ensure that when writeback errors are marked that we report those to all > file descriptions that were open at the time of the error. > > Signed-off-by: Jeff Layton > --- > fs/ceph/caps.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > index 7007ae2a5ad2..13f6edf24acd 100644 > --- a/fs/ceph/caps.c > +++ b/fs/ceph/caps.c > @@ -2110,7 +2110,7 @@ int ceph_fsync(struct file *file, loff_t start, loff_t > end, int datasync) > > dout("fsync %p%s\n", inode, datasync ? " datasync" : ""); > > - ret = filemap_write_and_wait_range(inode->i_mapping, start, end); > + ret = file_write_and_wait_range(file, start, end); > if (ret < 0) > goto out; > > -- > 2.13.3 > Reviewed-by: "Yan, Zheng"
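The commit message above says that a writeback error should be reported to every file description that was open when the error happened. A rough user-space model of that idea is sketched below. It is not the kernel's errseq_t implementation: the toy_* names are invented here, and the model leaves out details such as the "seen" flag and the packing of error and counter into one word. It only shows why a per-file cursor lets each open file observe the error exactly once.

```c
#include <errno.h>

/* Toy model of the shared, per-mapping error state. */
struct toy_errseq {
	unsigned int seq;	/* bumped every time an error is recorded */
	int err;		/* most recent error, as a negative errno */
};

/* Toy model of the per-open-file cursor. */
struct toy_file {
	unsigned int seen;	/* sequence number this file last observed */
};

/* Record a writeback error against the mapping. */
static void toy_errseq_set(struct toy_errseq *e, int err)
{
	e->err = err;
	e->seq++;
}

/* Sample the current sequence number when the file is opened. */
static void toy_file_open(struct toy_file *f, const struct toy_errseq *e)
{
	f->seen = e->seq;
}

/*
 * Called from the fsync path: report the error if one was recorded
 * since this file last looked, then advance the cursor so the same
 * error is not reported to this file again.
 */
static int toy_file_check_and_advance(struct toy_file *f,
				      const struct toy_errseq *e)
{
	if (f->seen == e->seq)
		return 0;

	f->seen = e->seq;
	return e->err;
}
```

With two files open on the same mapping, a single recorded -EIO is returned once to each of them, whereas a single shared "error pending" flag would be cleared by whichever file called fsync first.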
RE: linux-next: Tree for Jul 26
Hi Sergey, Paolo Abeni had sent a patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg179192.html Regards, Rami Rosen -Original Message- From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Sergey Senozhatsky Sent: Wednesday, July 26, 2017 13:49 To: Paolo Abeni ; Stephen Rothwell Cc: Linux-Next Mailing List ; Linux Kernel Mailing List ; Paul Moore ; David S. Miller ; net...@vger.kernel.org Subject: Re: linux-next: Tree for Jul 26 Hello, On (07/26/17 16:12), Stephen Rothwell wrote: > Hi all, > > Changes since 20170725: > > Non-merge commits (relative to Linus' tree): 2358 > 2466 files changed, 86994 insertions(+), 44655 deletions(-) dce4551cb2adb1ac ("udp: preserve head state for IP_CMSG_PASSSEC") causes a build error net/ipv4/udp.c: In function ‘__udp_queue_rcv_skb’: net/ipv4/udp.c:1789:49: error: ‘struct sk_buff’ has no member named ‘sp’; did you mean ‘sk’? if (likely(IPCB(skb)->opt.optlen == 0 && !skb->sp)) ^ -ss
Re: [PATCH v8 1/3] perf: cavium: Support memory controller PMU counters
On Wed, Jul 26, 2017 at 01:47:35PM +0100, Suzuki K Poulose wrote: > On 26/07/17 12:19, Jan Glauber wrote: > >On Tue, Jul 25, 2017 at 04:39:18PM +0100, Suzuki K Poulose wrote: > >>On 25/07/17 16:04, Jan Glauber wrote: > >>>Add support for the PMU counters on Cavium SOC memory controllers. > >>> > >>>This patch also adds generic functions to allow supporting more > >>>devices with PMU counters. > >>> > >>>Properties of the LMC PMU counters: > >>>- not stoppable > >>>- fixed purpose > >>>- read-only > >>>- one PCI device per memory controller > >>> > >>>Signed-off-by: Jan Glauber > >>>--- > >>>drivers/perf/Kconfig | 8 + > >>>drivers/perf/Makefile | 1 + > >>>drivers/perf/cavium_pmu.c | 424 > >>>+ > >>>include/linux/cpuhotplug.h | 1 + > >>>4 files changed, 434 insertions(+) > >>>create mode 100644 drivers/perf/cavium_pmu.c > >>> > >>>diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig > >>>index e5197ff..a46c3f0 100644 > >>>--- a/drivers/perf/Kconfig > >>>+++ b/drivers/perf/Kconfig > >>>@@ -43,4 +43,12 @@ config XGENE_PMU > >>>help > >>> Say y if you want to use APM X-Gene SoC performance monitors. > >>> > >>>+config CAVIUM_PMU > >>>+ bool "Cavium SOC PMU" > >> > >>Is there any specific reason why this can't be built as a module ? > > > >Yes. I don't know how to load the module automatically. I can't make it > >a pci driver as the EDAC driver "owns" the device (and having two > >drivers for one device wont work as far as I know). I tried to hook > >into the EDAC driver but the EDAC maintainer was not overly welcoming > >that approach. > > > > >And while it would be possible to have it a s a module I think it is of > >no use if it requires manualy loading. But maybe there is a simple > >solution I'm missing here? > > > If you are talking about a Cavium specific EDAC driver, may be we could > make that depend on this driver "at runtime" via symbols (may be even, > trigger the probe of PMU), which will be referenced only when > CONFIG_CAVIUM_PMU > is defined. 
> It is not the perfect solution, but that should do the trick.

I think that is roughly what I proposed in v6. Can you have a look at:

https://lkml.org/lkml/2017/6/23/333
https://patchwork.kernel.org/patch/9806427/

Probably there is a better way to do it. Or maybe we just keep it as
built-in for the time being.

--Jan
RE: [PATCH v3 00/16] Switchtec NTB Support
From: Logan Gunthorpe > Changes since v2: > > - Reordered the ntb_test link patch per Allen > - Removed an extra call to switchtec_ntb_init_mw > - Fixed a typo in the switchtec.txt documentation. Patches 5..16 (also 5 [was 6], and 14, objections notwithstanding): Acked-by: Allen Hubbe > -- > > Changes since v1: > > - Rebased onto latest ntb-next branch (with v4.13-rc1) > - Reworked ntb_mw_count() function so that it can be called all the > time (per discussion with Allen) > - Various spelling and formatting cleanups from Bjorn > - Added request_module() call such that the NTB module is automatically > loaded when appropriate hardware exists. > > -- > > Changes since the rfc: > > - Rebased on ntb-next > - Switched ntb_part_op to use sleep instead of delay > - Dropped a number of useless dbg __func__ prints > - Went back to the dynamic instead of the static class > - Swapped the notifier block for a simple callback > - Modified the new ntb api so that a couple functions with pidx > now must be called after link up. Per our discussion on the list. > > -- > > This patchset implements Non-Transparent Bridge (NTB) support for the > Microsemi Switchtec series of switches. We're looking for some > review from the community at this point but hope to get it upstreamed > for v4.14. > > Switchtec NTB support is configured over the same function and bar > as the management endpoint. Thus, the new driver hooks into the > management driver which we had merged in v4.12. We use the class > interface API to register an NTB device for every switchtec device > which supports NTB (not all do). > > The Switchtec hardware supports doorbells, memory windows and messages. > Seeing there is no native scratchpad support, 128 spads are emulated > through the use of a pre-setup memory window. The switch has 64 > doorbells which are shared between the two partitions and a > configurable set of memory windows. 
While the hardware supports more > than 2 partitions, this driver only supports the first two seeing > the current NTB API only supports two hosts. > > The driver has been tested with ntb_netdev and fully passes the > ntb_test script. > > This patchset is based off of ntb-next and can be found in this > git repo: > > https://github.com/sbates130272/linux-p2pmem.git switchtec_ntb_v3 > > *** BLURB HERE *** > > Logan Gunthorpe (16): > switchtec: move structure definitions into a common header > switchtec: export class symbol for use in upper layer driver > switchtec: add NTB hardware register definitions > switchtec: add link event notifier callback > ntb: ntb_test: ensure the link is up before trying to configure the > mws > ntb: ensure ntb_mw_get_align() is only called when the link is up > ntb: add check and comment for link up to mw_count() and > mw_get_align() > switchtec_ntb: introduce initial NTB driver > switchtec_ntb: initialize hardware for memory windows > switchtec_ntb: initialize hardware for doorbells and messages > switchtec_ntb: add skeleton NTB driver > switchtec_ntb: add link management > switchtec_ntb: implement doorbell registers > switchtec_ntb: implement scratchpad registers > switchtec_ntb: add memory window support > switchtec_ntb: update switchtec documentation with notes for NTB > > Documentation/switchtec.txt | 12 + > MAINTAINERS |2 + > drivers/ntb/hw/Kconfig |1 + > drivers/ntb/hw/Makefile |1 + > drivers/ntb/hw/mscc/Kconfig |9 + > drivers/ntb/hw/mscc/Makefile|1 + > drivers/ntb/hw/mscc/switchtec_ntb.c | 1211 > +++ > drivers/ntb/ntb_transport.c | 20 +- > drivers/ntb/test/ntb_perf.c | 18 +- > drivers/ntb/test/ntb_tool.c |6 +- > drivers/pci/switch/switchtec.c | 316 ++-- > include/linux/ntb.h | 11 +- > include/linux/switchtec.h | 373 ++ > tools/testing/selftests/ntb/ntb_test.sh |4 + > 14 files changed, 1702 insertions(+), 283 deletions(-) > create mode 100644 drivers/ntb/hw/mscc/Kconfig > create mode 100644 drivers/ntb/hw/mscc/Makefile > create 
mode 100644 drivers/ntb/hw/mscc/switchtec_ntb.c > create mode 100644 include/linux/switchtec.h > > -- > 2.11.0
Re: [RFC][PATCH] thunderbolt: icm: Ignore mailbox errors in icm_suspend()
On Wed, Jul 26, 2017 at 02:48:54PM +0200, Rafael J. Wysocki wrote: > On Wednesday, July 26, 2017 11:32:44 AM Mika Westerberg wrote: > > On Tue, Jul 25, 2017 at 06:10:57PM +0200, Rafael J. Wysocki wrote: > > > On Tuesday, July 25, 2017 01:00:12 PM Mika Westerberg wrote: > > > > On Tue, Jul 25, 2017 at 01:31:00AM +0200, Rafael J. Wysocki wrote: > > > > > From: Rafael J. Wysocki > > > > > > > > > > On one of my test machines nhi_mailbox_cmd() called from icm_suspend() > > > > > times out and returnes an error which then is propagated to the > > > > > caller and causes the entire system suspend to be aborted which isn't > > > > > very useful. > > > > > > > > > > Instead of aborting system suspend, print the error into the log > > > > > and continue. > > > > > > > > I agree, it should not prevent suspend but I wonder why it fails in the > > > > first place? Can you check what is the return value? > > > > > > As per the above, the error is a timeout, ie. -ETIMEDOUT. > > > > Ah, right I somehow missed that. > > > > Does it have Falcon Ridge controller or Alpine Ridge? > > I'll check later today, but i guess you'll know (see below). No need to check, it is Alpine Ridge (since it is Dell 9360). > > Just to make sure, can you increase the timeout in nhi_mailbox_cmd() > > to 1000ms or so. It should not take that long though but better to check. > > Well, I can do that, but I don't think it will help. > > It just looks like the chip is not responding at all at that point. I see. Then I think we should apply your patch now and we can investigate this further offline and hopefully find the root cause for the problem. For this patch: Acked-by: Mika Westerberg > > Which system this is BTW? > > It's the Dell 9360. :-) > > Sometimes after a reboot or a power cycle it starts in a state in which the > TBT controller and a USB one (which seem to be somehow connected) > appear to be dead or at least really flaky. 
Basically, the box needs to be
> power-cycled again to get rid of this condition and then everything works.

The xHCI controller is part of the Thunderbolt controller, so whenever you
have a normal USB-C device connected there, you should also see the Alpine
Ridge hierarchy in the lspci output, but the Thunderbolt host controller is
not there.
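The behaviour change discussed in this thread (log the mailbox error and carry on, instead of aborting system suspend) can be illustrated with a small stand-alone sketch. Everything below is hypothetical: toy_mailbox_cmd() merely stands in for a firmware command that may time out, and none of these functions are the thunderbolt driver's code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a firmware mailbox command that can time out. */
static int toy_mailbox_cmd(int responsive)
{
	return responsive ? 0 : -ETIMEDOUT;
}

/* Before: the error is propagated, so the caller aborts the suspend. */
static int toy_suspend_strict(int responsive)
{
	return toy_mailbox_cmd(responsive);
}

/* After: the error is logged and suspend is allowed to continue. */
static int toy_suspend_lenient(int responsive)
{
	int ret = toy_mailbox_cmd(responsive);

	if (ret)
		fprintf(stderr, "ignoring mailbox command error (%d) in suspend\n",
			ret);

	return 0;
}
```

The trade-off is the one raised in the review: the timeout no longer blocks suspend, but it is still visible in the log so the root cause can be investigated separately.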
Re: netlink: NULL timer crash
On Wed, Jul 26, 2017 at 3:09 PM, wrote: > Hi Dmitry, > > By trying to apply your reproducer to normal kernels, this scenery can not > be reproduced (on fedora). Does this C source only for KASAN kernels? No, NULL derefs are detected without KASAN. > On Thursday, March 23, 2017 at 8:55:52 PM UTC+8, Dmitry Vyukov wrote: >> >> Hello, >> >> The following program triggers call of NULL timer func: >> >> >> https://gist.githubusercontent.com/dvyukov/c210d01c74b911273469a93862ea7788/raw/2a3182772a6a6e20af3e71c02c2a1c2895d803fb/gistfile1.txt >> >> >> BUG: unable to handle kernel NULL pointer dereference at (null) >> IP: (null) >> PGD 0 >> Oops: 0010 [#1] SMP KASAN >> Modules linked in: >> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc3+ #365 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> task: 88006c634300 task.stack: 88006c64 >> RIP: 0010: (null) >> RSP: 0018:88006d1077c8 EFLAGS: 00010246 >> RAX: dc00 RBX: 880062bddb00 RCX: 8154e161 >> RDX: 1090c1f1 RSI: RDI: 880062bddb00 >> RBP: 88006d1077e8 R08: fbfff0a936a8 R09: 0001 >> R10: 0001 R11: fbfff0a936a7 R12: 84860f80 >> R13: R14: 880062bddb60 R15: 11000da20f05 >> FS: () GS:88006d10() >> knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: CR3: 04e21000 CR4: 001406e0 >> Call Trace: >> >> neigh_timer_handler+0x365/0xd40 net/core/neighbour.c:944 >> call_timer_fn+0x232/0x8c0 kernel/time/timer.c:1268 >> expire_timers kernel/time/timer.c:1307 [inline] >> __run_timers+0x6f7/0xbd0 kernel/time/timer.c:1601 >> run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 >> __do_softirq+0x2d6/0xb54 kernel/softirq.c:284 >> invoke_softirq kernel/softirq.c:364 [inline] >> irq_exit+0x1b1/0x1e0 kernel/softirq.c:405 >> exiting_irq arch/x86/include/asm/apic.h:657 [inline] >> smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 >> apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:487 >> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53 >> RSP: 0018:88006c647dc0 EFLAGS: 
0286 ORIG_RAX: ff10
>> RAX: dc00 RBX: 11000d8c8fbb RCX:
>> RDX: 109d8ed4 RSI: 0001 RDI: 84ec76a0
>> RBP: 88006c647dc0 R08: ed000d8c6861 R09:
>> R10: R11: R12: fbfff09d8ed2
>> R13: 88006c647e78 R14: 84ec7690 R15: 0002
>>
>> arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
>> default_idle+0xba/0x450 arch/x86/kernel/process.c:275
>> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:266
>> default_idle_call+0x37/0x80 kernel/sched/idle.c:97
>> cpuidle_idle_call kernel/sched/idle.c:155 [inline]
>> do_idle+0x230/0x380 kernel/sched/idle.c:244
>> cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:346
>> start_secondary+0x2a7/0x340 arch/x86/kernel/smpboot.c:275
>> start_cpu+0x14/0x14 arch/x86/kernel/head_64.S:306
>> Code: Bad RIP value.
>> RIP: (null) RSP: 88006d1077c8
>> CR2:
>> ---[ end trace 845120b8a0d21411 ]---
>>
>> On commit 093b995e3b55a0ae0670226ddfcb05bfbf0099ae
Re: [PATCH] irqchip: create a Kconfig menu for irqchip drivers
2017-07-26 19:37 GMT+09:00 Marc Zyngier : > On 26/07/17 11:18, Masahiro Yamada wrote: >> Hi Marc, >> >> >> 2017-07-26 17:04 GMT+09:00 Marc Zyngier : >>> On 26/07/17 05:03, Masahiro Yamada wrote: Some irqchip drivers have a Kconfig prompt. When we run menuconfig or friends, those drivers are directly listed in the "Device Drivers" menu level. This does not look nice. Create a sub-system level menu. Signed-off-by: Masahiro Yamada --- drivers/irqchip/Kconfig | 4 1 file changed, 4 insertions(+) diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index f1fd5f44d1d4..7b66313a2952 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -1,3 +1,5 @@ +menu "IRQ chip support" + config IRQCHIP def_bool y depends on OF_IRQ @@ -306,3 +308,5 @@ config QCOM_IRQ_COMBINER help Say yes here to add support for the IRQ combiner devices embedded in Qualcomm Technologies chips. + +endmenu >>> >>> I'm very reluctant to introduce this. IMHO, interrupt controllers are >>> way too low level a thing to let them be selected by the user. They >>> really should be selected by the platform that needs them >> >> This is true for the root irqchip. >> Not necessarily true for child irqchips. > > I dispute that argument. We've been able to make this work so far > *without* exposing yet another menu maze to the user. What has changed? The irqchip maintainers applied drivers with user-configurable Kconfig entries. >> >> >>> Do you have any example in mind where having a user-selectable interrupt >>> controller actually makes sense on its own? >> >> Yes. >> >> I see some user-selectable drivers in drivers/irqchip/Kconfig >> and I'd like to add one more for my SoCs. >> >> >> This patch: >> https://github.com/uniphier/linux/commit/f39efdf0ce34f77ae9e324d9ec6c7f486f43a0ed >> >> This is really optional, so >> I intentionally implemented it as a platform driver >> instead of IRQCHIP_DECLARE(). > > I really cannot see how this could be optional. 
It means that you could
> end up in a situation where the drivers for the devices behind this
> irqchip could have been compiled in, but not their interrupt controller.
> How useful is that?

In my case, the assumed irq consumer is GPIO.

If the irq consumer is probed before the irqchip, it will be retried
later via -EPROBE_DEFER.

If the irqchip is not compiled at all, right, the irq consumer will not work.

One possible (and general) solution is to specify "depends on" correctly
between the provider and the consumer.

>> Looks like irq-ts4800.c, irq-keystone.c are modules as well.
>
> They are directly selected by their respective defconfig.

Are you sure?

As far as I can see, they are not selected by anyone.

$ git grep 'TS4800_IRQ\|KEYSTONE_IRQ'
arch/arm/configs/keystone_defconfig:CONFIG_KEYSTONE_IRQ=y
arch/arm/configs/multi_v7_defconfig:CONFIG_KEYSTONE_IRQ=y
drivers/irqchip/Kconfig:config TS4800_IRQ
drivers/irqchip/Kconfig:config KEYSTONE_IRQ
drivers/irqchip/Makefile:obj-$(CONFIG_TS4800_IRQ) += irq-ts4800.o
drivers/irqchip/Makefile:obj-$(CONFIG_KEYSTONE_IRQ) += irq-keystone.o

A defconfig just provides a default value.
Users are allowed to disable the option from menuconfig.

> On arm64,
> which is what I expect your driver targets, you should simply select it
> in your platform entry.

OK, assuming your claim is correct,
we have 5 suspicious entries in drivers/irqchip/Kconfig.

config JCORE_AIC
        bool "J-Core integrated AIC" if COMPILE_TEST

config TS4800_IRQ
        tristate "TS-4800 IRQ controller"

config KEYSTONE_IRQ
        tristate "Keystone 2 IRQ controller IP"

config EZNPS_GIC
        bool "NPS400 Global Interrupt Manager (GIM)"

config QCOM_IRQ_COMBINER
        bool "QCOM IRQ combiner support"

The prompt strings make the entries visible in menuconfig.
So, they should be removed. The prompts are pointless
if the options are supposed to be selected by others.

Also, tristate is pointless.
If they are supposed to be selected by platforms,
they have no chance to be a module.
They should be turned into bool (without prompt) Is this what you mean? -- Best Regards Masahiro Yamada
[RFC]Add new mdev interface for QoS
vfio-mdev lets different guests share the same physical device through
mediated sharing. As a result, it brings a requirement to control how the
device is shared: we need a QoS-related interface for mdev to manage
virtual device resources.

For example, in practical use, vGPUs assigned to different guests almost
always have different performance requirements. Some guests may need higher
priority for real-time usage, while others may need a larger portion of the
GPU resources to get higher 3D performance. Correspondingly, we can define
interfaces such as weight/cap for overall budget control and priority for
single-submission control.

So I suggest adding some common, vendor-agnostic attributes to the mdev
core sysfs for QoS purposes.

-Ping
Re: [PATCH net] Revert "vhost: cache used event for better performance"
On 2017-07-26 20:57, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
>> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92, since it
>> was reported to break vhost_net. We want to cache the used event and use
>> it to check for notification. We try to validate the cached used event by
>> checking whether or not it is ahead of new, but this is not correct
>> all the time: it could be stale and there's no way to know about this.
>>
>> Signed-off-by: Jason Wang
>
> Could you supply a bit more data here please? How does it get stale?
> What does guest need to do to make it stale? This will be helpful if
> anyone wants to bring it back, or if we want to extend the protocol.

The problem is that we don't know whether or not the guest has published a
new used event. The check
vring_need_event(vq->last_used_event, new + vq->num, new) is not sufficient
to check for this.

Thanks
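To make the answer above a little more concrete, the following stand-alone sketch restates the arithmetic of vring_need_event() with stdint types and wraps the check quoted in the reply. The wrapper name cached_event_looks_valid() and the numbers used with it are invented for illustration. The point is only that the check tests whether the cached value falls in the window [new, new + num), which says nothing about whether the guest has written a different used event since the value was cached.

```c
#include <stdint.h>

/*
 * Same arithmetic as vring_need_event() from the virtio ring header,
 * restated here so the sketch builds on its own.
 */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * The validity check quoted in the reply:
 *	vring_need_event(last_used_event, new + num, new)
 * It is true exactly when the cached value lies in [new, new + num),
 * using 16-bit wrap-around arithmetic.
 */
static int cached_event_looks_valid(uint16_t cached, uint16_t new_idx,
				    uint16_t num)
{
	return need_event(cached, (uint16_t)(new_idx + num), new_idx);
}
```

For a ring of 256 entries with new at 100, a cached value of 105 passes the check whether or not the guest has meanwhile published some other value, so passing the check cannot prove that the cache is current.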
Re: [PATCH net-next 2/2] bnxt_en: define sriov_lock unconditionally
On Wed, Jul 26, 2017 at 12:54 PM, Sathya Perla wrote: > On Wed, Jul 26, 2017 at 2:35 PM, Arnd Bergmann wrote: > [...] >>> Sathya already sent 3 patches to fix some of these issues. But I need >>> to rework one of his patch and resend. >> >> Ok, thanks. I just ran into one more issue, and don't know if that's included >> as well. If not, please also add the patch below (or fold it into the one >> that adds the switchdev dependency to the ethernet driver): >> >> 8<-- >> Subject: [PATCH] RDMA/bnxt_re: add NET_SWITCHDEV dependency >> >> The rdma side of BNXT enables the ethernet driver and has a list >> of its dependencies. However, the ethernet driver now also depends >> on NET_SWITCHDEV, so we have to add that dependency for both: > > Arnd, after the patch "bnxt_en: use SWITCHDEV_SET_OPS() for setting > vf_rep_switchdev_ops" the bnxt_en driver doesn't need an explicit > NET_SWITCHDEV dependency. So, the bnxt_re driver shouldn't need one > either. Are you still seeing the bnxt_re issue even after pulling the > above patch?? I think that's fine then. I missed that patch when it went in, so I only needed the add-on since I still had my own earlier patch. I'll drop both from my test tree now, and will let you know in case something else remains. Arnd
Re: linux-next: Tree for Jul 26
Hello, On (07/26/17 13:09), Rosen, Rami wrote: > Hi Sergey, > Paolo Abeni had sent a patch: > https://www.mail-archive.com/netdev@vger.kernel.org/msg179192.html yep, this should do the trick. thanks. -ss
Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
On Wed, 26 Jul 2017, Joerg Roedel wrote: > Yes, that should fix it, but I think its better to just move the > register_syscore_ops() call to a later initialization step, like in the > patch below. I tested it an will queue it to my iommu/fixes branch. Fair enough. Acked-by-me.
Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> Hi Artem, Thomas,
>
> On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> > On Tue, 25 Jul 2017, Artem Savkov wrote:
> >
> > > Hi,
> > >
> > > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > > checks early" seem to have uncovered an issue with amd-iommu/x2apic.
> > >
> > > Starting with that commit the following warning started to show up on AMD
> > > systems during boot:
> >
> > > [0.16] BUG: sleeping function called from invalid context at
> > > kernel/locking/mutex.c:747
> >
> > > [0.16] mutex_lock_nested+0x1b/0x20
> > > [0.16] register_syscore_ops+0x1d/0x70
> > > [0.16] state_next+0x119/0x910
> > > [0.16] iommu_go_to_state+0x29/0x30
> > > [0.16] amd_iommu_enable+0x13/0x23
> > > [0.16] irq_remapping_enable+0x1b/0x39
> > > [0.16] enable_IR_x2apic+0x91/0x196
> > > [0.16] default_setup_apic_routing+0x16/0x6e
> > > [0.16] native_smp_prepare_cpus+0x257/0x2d5
> Thanks for the report!
>
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2440,7 +2440,6 @@ static int __init state_next(void)
> >                 break;
> >         case IOMMU_ACPI_FINISHED:
> >                 early_enable_iommus();
> > -               register_syscore_ops(&amd_iommu_syscore_ops);
> >                 x86_platform.iommu_shutdown = disable_iommus;
> >                 init_state = IOMMU_ENABLED;
> >                 break;
> > @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
> >                         for_each_iommu(iommu)
> >                                 iommu_flush_all_caches(iommu);
> >                 }
> > +       } else {
> > +               register_syscore_ops(&amd_iommu_syscore_ops);
> >         }
> >
> >         return ret;
>
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Checked it as well just in case, didn't see any issues. Thank you.

Reported-and-tested-by: Artem Savkov

--
Regards,
Artem
[v4 2/4] mm, oom: cgroup-aware OOM killer
Traditionally, the OOM killer operates at the process level.
Under OOM conditions, it finds the process with the highest oom score
and kills it.

This behavior doesn't suit systems with many running containers well:

1) There is no fairness between containers. A small container with
a few large processes will be chosen over a large one with a huge
number of small processes.

2) Containers often do not expect that some random process inside
will be killed. In many cases a much safer behavior is to kill
all tasks in the container. Traditionally, this was implemented
in userspace, but doing it in the kernel has some advantages,
especially in the case of a system-wide OOM.

3) Per-process oom_score_adj affects global OOM, so it's a breach
in the isolation.

To address these issues, a cgroup-aware OOM killer is introduced.

Under OOM conditions, it tries to find the biggest memory consumer
and free memory by killing the corresponding task(s). The difference
from the "traditional" OOM killer is that it can treat memory cgroups
as memory consumers as well as single processes.

By default, it will look for the biggest leaf cgroup, and kill
the largest task inside.

But a user can change this behavior by enabling the per-cgroup
oom_kill_all_tasks option. If set, it causes the OOM killer to treat
the whole cgroup as an indivisible memory consumer. If it is
selected as an OOM victim, all tasks belonging to it will be killed.

Tasks in the root cgroup are treated as independent memory consumers,
and are compared with other memory consumers (e.g. leaf cgroups).
The root cgroup doesn't support the oom_kill_all_tasks feature.

Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tetsuo Handa Cc: David Rientjes Cc: Tejun Heo Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- include/linux/memcontrol.h | 23 + include/linux/oom.h| 3 + mm/memcontrol.c| 208 + mm/oom_kill.c | 172 - 4 files changed, 349 insertions(+), 57 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 3914e3dd6168..b21bbb0edc72 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -35,6 +35,7 @@ struct mem_cgroup; struct page; struct mm_struct; struct kmem_cache; +struct oom_control; /* Cgroup-specific page state, on top of universal node page state */ enum memcg_stat_item { @@ -199,6 +200,12 @@ struct mem_cgroup { /* OOM-Killer disable */ int oom_kill_disable; + /* kill all tasks in the subtree in case of OOM */ + bool oom_kill_all_tasks; + + /* cached OOM score */ + long oom_score; + /* handle for "memory.events" */ struct cgroup_file events_file; @@ -342,6 +349,11 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? 
container_of(css, struct mem_cgroup, css) : NULL;
 }

+static inline void mem_cgroup_put(struct mem_cgroup *memcg)
+{
+	css_put(&memcg->css);
+}
+
 #define mem_cgroup_from_counter(counter, member)	\
 	container_of(counter, struct mem_cgroup, member)

@@ -480,6 +492,8 @@ static inline bool task_in_memcg_oom(struct task_struct *p)

 bool mem_cgroup_oom_synchronize(bool wait);

+bool mem_cgroup_select_oom_victim(struct oom_control *oc);
+
 #ifdef CONFIG_MEMCG_SWAP
 extern int do_swap_account;
 #endif
@@ -739,6 +753,10 @@ static inline bool task_in_mem_cgroup(struct task_struct *task,
 	return true;
 }

+static inline void mem_cgroup_put(struct mem_cgroup *memcg)
+{
+}
+
 static inline struct mem_cgroup *
 mem_cgroup_iter(struct mem_cgroup *root,
 		struct mem_cgroup *prev,
@@ -926,6 +944,11 @@ static inline void count_memcg_event_mm(struct mm_struct *mm,
 					enum vm_event_item idx)
 {
 }
+
+static inline bool mem_cgroup_select_oom_victim(struct oom_control *oc)
+{
+	return false;
+}
 #endif /* CONFIG_MEMCG */

 static inline void __inc_memcg_state(struct mem_cgroup *memcg,
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 8a266e2be5a6..b7ec3bd441be 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -39,6 +39,7 @@ struct oom_control {
 	unsigned long totalpages;
 	struct task_struct *chosen;
 	unsigned long chosen_points;
+	struct mem_cgroup *chosen_memcg;
 };

 extern struct mutex oom_lock;
@@ -79,6 +80,8 @@ extern void oom_killer_enable(void);

 extern struct task_struct *find_lock_task_mm(struct task_struct *p);

+extern int oom_evaluate_task(struct task_struct *task, void *arg);
+
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
 extern int sysctl_oom_kill_allocating_task;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9085e55eb69f..ba72d1cf73d0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2625,6 +2625,181 @@ static
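As a reading aid for the commit message of this patch, the selection policy it describes can be modelled in a few lines of plain C. This is only a sketch under simplifying assumptions: the hierarchy is flattened to a list of leaf cgroups, sizes are plain page counts, and all toy_* names are invented here. It is not the code added to mm/memcontrol.c.

```c
#include <stddef.h>

struct toy_task {
	const char *name;
	long pages;		/* memory footprint of the task */
};

struct toy_cgroup {
	const char *name;
	const struct toy_task *tasks;
	size_t nr_tasks;
	int kill_all_tasks;	/* models the oom_kill_all_tasks knob */
};

/* Size of a cgroup: the sum of the footprints of its tasks. */
static long toy_cgroup_size(const struct toy_cgroup *cg)
{
	long total = 0;
	size_t i;

	for (i = 0; i < cg->nr_tasks; i++)
		total += cg->tasks[i].pages;
	return total;
}

/* Pick the biggest leaf cgroup, or NULL if there is none. */
static const struct toy_cgroup *toy_select_cgroup(const struct toy_cgroup *cgs,
						  size_t n)
{
	const struct toy_cgroup *victim = NULL;
	long best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		long size = toy_cgroup_size(&cgs[i]);

		if (size > best) {
			best = size;
			victim = &cgs[i];
		}
	}
	return victim;
}

/* Largest task inside the chosen cgroup (the default victim). */
static const struct toy_task *toy_largest_task(const struct toy_cgroup *cg)
{
	const struct toy_task *big = NULL;
	size_t i;

	for (i = 0; i < cg->nr_tasks; i++)
		if (!big || cg->tasks[i].pages > big->pages)
			big = &cg->tasks[i];
	return big;
}

/* Number of tasks that would be killed for the chosen cgroup. */
static size_t toy_count_victims(const struct toy_cgroup *cg)
{
	if (!cg || cg->nr_tasks == 0)
		return 0;
	return cg->kill_all_tasks ? cg->nr_tasks : 1;
}
```

In this model a container with one 600-page process loses out to a container with three processes totalling 1050 pages, even though no single process in the latter is the largest on the system. That is the fairness point made in item 1 of the commit message.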
[v4 4/4] mm, oom, docs: describe the cgroup-aware OOM killer
Update cgroups v2 docs.

Signed-off-by: Roman Gushchin
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: Tetsuo Handa
Cc: David Rientjes
Cc: Tejun Heo
Cc: kernel-t...@fb.com
Cc: cgro...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 Documentation/cgroup-v2.txt | 62 +
 1 file changed, 62 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index cb9ea281ab72..bf106b6b6b52 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -48,6 +48,7 @@ v1 is available under Documentation/cgroup-v1/.
        5-2-1. Memory Interface Files
        5-2-2. Usage Guidelines
        5-2-3. Memory Ownership
+       5-2-4. Cgroup-aware OOM Killer
      5-3. IO
        5-3-1. IO Interface Files
        5-3-2. Writeback
@@ -1001,6 +1002,37 @@ PAGE_SIZE multiple when read back.
 	high limit is used and monitored properly, this limit's
 	utility is limited to providing the final safety net.

+  memory.oom_kill_all_tasks
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	Defines whether the OOM killer should treat the cgroup
+	as a single entity during the victim selection.
+
+	If set, the OOM killer will kill all tasks belonging to the
+	cgroup if the cgroup is selected as an OOM victim.
+
+	By default, the OOM killer respects the /proc/pid/oom_score_adj
+	value of -1000 and will never kill such a task, unless
+	oom_kill_all_tasks is set.
+
+  memory.oom_priority
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	An integer number within the [-1, 1] range,
+	which defines the order in which the OOM killer selects victim
+	memory cgroups.
+
+	The OOM killer prefers memory cgroups with larger priority if
+	they are populated with eligible tasks.
+
+	The oom_priority value is compared within sibling cgroups.
+
+	The root cgroup has the oom_priority 0, which cannot be changed.
+
   memory.events

 	A read-only flat-keyed file which exists on non-root cgroups.
The following entries are defined. Unless specified @@ -1205,6 +1237,36 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas belonging to the affected files to ensure correct memory ownership. +Cgroup-aware OOM Killer +~~~~~~~~~~~~~~~~~~~~~~~ + +Cgroup v2 memory controller implements a cgroup-aware OOM killer. +It means that it treats memory cgroups as first class OOM entities. + +Under OOM conditions the memory controller tries to make the best +choice of a victim, hierarchically looking for the largest memory +consumer. By default, it will look for the biggest task in the +biggest leaf cgroup. + +By default, all cgroups have oom_priority 0, and the OOM killer will +choose the largest cgroup recursively on each level. For non-root +cgroups it's possible to change the oom_priority, and it will cause +the OOM killer to look at the priority value first, and compare +sizes only of cgroups with equal priority. + +But a user can change this behavior by enabling the per-cgroup +oom_kill_all_tasks option. If set, it causes the OOM killer to treat +the whole cgroup as an indivisible memory consumer. If it is +selected as an OOM victim, all tasks belonging to it will be killed. + +Tasks in the root cgroup are treated as independent memory consumers, +and are compared with other memory consumers (e.g. leaf cgroups). +The root cgroup doesn't support the oom_kill_all_tasks feature. + +This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM +the memory controller considers only cgroups belonging to the sub-tree +of the OOM'ing cgroup. + IO -- -- 2.13.3
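The selection order described in the documentation above (priority first, size second, repeated on each level of the hierarchy) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct memcg_node, pick_child() and select_victim() are hypothetical names, not the kernel's data structures, and the oom_kill_all_tasks and root-cgroup task handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct mem_cgroup. */
struct memcg_node {
	const char *name;
	int oom_priority;             /* larger value is preferred as a victim */
	long oom_score;               /* memory footprint; 0 means no eligible tasks */
	struct memcg_node **children; /* NULL-terminated array, or NULL for a leaf */
};

/* Among siblings: highest priority wins, ties are broken by the larger score. */
static struct memcg_node *pick_child(const struct memcg_node *parent)
{
	struct memcg_node *best = NULL;

	if (!parent->children)
		return NULL;

	for (struct memcg_node **it = parent->children; *it; it++) {
		struct memcg_node *c = *it;

		if (c->oom_score == 0)
			continue; /* nothing eligible in this subtree */
		if (!best ||
		    c->oom_priority > best->oom_priority ||
		    (c->oom_priority == best->oom_priority &&
		     c->oom_score > best->oom_score))
			best = c;
	}
	return best;
}

/* Descend one level at a time until no further candidate child exists. */
static struct memcg_node *select_victim(struct memcg_node *root)
{
	struct memcg_node *cur = root;

	for (;;) {
		struct memcg_node *next = pick_child(cur);

		if (!next)
			return cur == root ? NULL : cur;
		cur = next;
	}
}
```

In this sketch a small cgroup with a higher oom_priority is chosen over a much larger sibling, and sizes only matter between siblings of equal priority, which is the behaviour the text describes.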
[v4 3/4] mm, oom: introduce oom_priority for memory cgroups
Introduce a per-memory-cgroup oom_priority setting: an integer number within the [-1, 1] range, which defines the order in which the OOM killer selects victim memory cgroups. OOM killer prefers memory cgroups with larger priority if they are populated with eligible tasks. The oom_priority value is compared within sibling cgroups. The root cgroup has the oom_priority 0, which cannot be changed. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: David Rientjes Cc: Tejun Heo Cc: Tetsuo Handa Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- include/linux/memcontrol.h | 3 +++ mm/memcontrol.c| 55 -- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b21bbb0edc72..d31ac58e08ad 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -206,6 +206,9 @@ struct mem_cgroup { /* cached OOM score */ long oom_score; + /* OOM killer priority */ + short oom_priority; + /* handle for "memory.events" */ struct cgroup_file events_file; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ba72d1cf73d0..2c1566995077 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2710,12 +2710,21 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc) for (;;) { struct cgroup_subsys_state *css; struct mem_cgroup *memcg = NULL; + short prio = SHRT_MIN; long score = LONG_MIN; css_for_each_child(css, &root->css) { struct mem_cgroup *iter = mem_cgroup_from_css(css); - if (iter->oom_score > score) { + if (iter->oom_score == 0) + continue; + + if (iter->oom_priority > prio) { + memcg = iter; + prio = iter->oom_priority; + score = iter->oom_score; + } else if (iter->oom_priority == prio && + iter->oom_score > score) { memcg = iter; score = iter->oom_score; } @@ -2782,7 +2791,15 @@ bool mem_cgroup_select_oom_victim(struct oom_control *oc) * For system-wide OOMs 
we should consider tasks in the root cgroup * with oom_score larger than oc->chosen_points. */ - if (!oc->memcg) { + if (!oc->memcg && !(oc->chosen_memcg && + oc->chosen_memcg->oom_priority > 0)) { + /* +* Root memcg has priority 0, so if chosen memcg has lower +* priority, any task in root cgroup is preferable. +*/ + if (oc->chosen_memcg && oc->chosen_memcg->oom_priority < 0) + oc->chosen_points = 0; + select_victim_root_cgroup_task(oc); if (oc->chosen && oc->chosen_memcg) { @@ -5373,6 +5390,34 @@ static ssize_t memory_oom_kill_all_tasks_write(struct kernfs_open_file *of, return nbytes; } +static int memory_oom_priority_show(struct seq_file *m, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); + + seq_printf(m, "%d\n", memcg->oom_priority); + + return 0; +} + +static ssize_t memory_oom_priority_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + int oom_priority; + int err; + + err = kstrtoint(strstrip(buf), 0, &oom_priority); + if (err) + return err; + + if (oom_priority < -1 || oom_priority > 1) + return -EINVAL; + + memcg->oom_priority = (short)oom_priority; + + return nbytes; +} + static int memory_events_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); @@ -5499,6 +5544,12 @@ static struct cftype memory_files[] = { .write = memory_oom_kill_all_tasks_write, }, { + .name = "oom_priority", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = memory_oom_priority_show, + .write = memory_oom_priority_write, + }, + { .name = "events", .flags = CFTYPE_NOT_ON_ROOT, .file_offset = offsetof(struct mem_cgroup, events_file), -- 2.13.3
Re: [REGRESSION 4.13-rc] NFS returns -EACCESS at the first read
On Wed, 26 Jul 2017 14:57:07 +0200, Anna Schumaker wrote:
>
> Hi Takashi,
>
> On 07/26/2017 08:54 AM, Takashi Iwai wrote:
> > Hi,
> >
> > I seem hitting a regression of NFS client on the today's Linus git
> > tree. The symptom is that the file read over NFS returns occasionally
> > -EACCESS at the first read. When I try to read the same file again
> > (or do some other thing), I can read it successfully.
> >
> > The git bisection leaded to the commit
> > bd8b2441742b49c76bec707757bd9c028ea9838e
> > NFS: Store the raw NFS access mask in the inode's access cache
> >
> >
> > Any further hint for debugging?
>
> Does the patch in this email thread help?
> http://www.spinics.net/lists/linux-nfs/msg64930.html

Thanks, I gave it a shot and the result looks good. Feel free to add my
Tested-by tag:

Tested-by: Takashi Iwai

Though, when I look around the code, I feel somewhat uneasy that MAY_XXX
is still used for nfs_access_entry.mask, e.g. in nfs3_proc_access() or
nfs4_proc_access(). Are these functions OK without a similar conversion?

thanks,

Takashi
[v4 1/4] mm, oom: refactor the TIF_MEMDIE usage
First, separate tsk_is_oom_victim() and TIF_MEMDIE flag checks: let the first one indicate that a task is killed by the OOM killer, and the second one indicate that a task has an access to the memory reserves (with a hope to eliminate it later). Second, set TIF_MEMDIE to all threads of an OOM victim process. Third, to limit the number of processes which have an access to memory reserves, let's keep an atomic pointer to a task, which grabbed it. Signed-off-by: Roman Gushchin Cc: Michal Hocko Cc: Vladimir Davydov Cc: Johannes Weiner Cc: Tetsuo Handa Cc: David Rientjes Cc: Tejun Heo Cc: kernel-t...@fb.com Cc: cgro...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux...@kvack.org --- kernel/exit.c | 2 +- mm/memcontrol.c | 2 +- mm/oom_kill.c | 30 +- 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 8f40bee5ba9d..d5f372a2a363 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -542,7 +542,7 @@ static void exit_mm(void) task_unlock(current); mm_update_next_owner(mm); mmput(mm); - if (test_thread_flag(TIF_MEMDIE)) + if (tsk_is_oom_victim(current)) exit_oom_victim(); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d61133e6af99..9085e55eb69f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1896,7 +1896,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. 
*/ - if (unlikely(test_thread_flag(TIF_MEMDIE) || + if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 9e8b4f030c1c..72de01be4d33 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -435,6 +435,8 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait); static bool oom_killer_disabled __read_mostly; +static struct task_struct *tif_memdie_owner; + #define K(x) ((x) << (PAGE_SHIFT-10)) /* @@ -656,13 +658,24 @@ static void mark_oom_victim(struct task_struct *tsk) struct mm_struct *mm = tsk->mm; WARN_ON(oom_killer_disabled); - /* OOM killer might race with memcg OOM */ - if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE)) + + if (!cmpxchg(&tif_memdie_owner, NULL, current)) { + struct task_struct *t; + + rcu_read_lock(); + for_each_thread(current, t) + set_tsk_thread_flag(t, TIF_MEMDIE); + rcu_read_unlock(); + } + + /* +* OOM killer might race with memcg OOM. +* oom_mm is bound to the signal struct life time. +*/ + if (cmpxchg(&tsk->signal->oom_mm, NULL, mm)) return; - /* oom_mm is bound to the signal struct life time. */ - if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) - mmgrab(tsk->signal->oom_mm); + mmgrab(tsk->signal->oom_mm); /* * Make sure that the task is woken up from uninterruptible sleep @@ -682,6 +695,13 @@ void exit_oom_victim(void) { clear_thread_flag(TIF_MEMDIE); + /* +* If the current task is a thread which initially +* received TIF_MEMDIE, clear tif_memdie_owner to +* give the next process a chance to capture it. +*/ + cmpxchg(&tif_memdie_owner, current, NULL); + if (!atomic_dec_return(&oom_victims)) wake_up_all(&oom_victims_wait); } -- 2.13.3
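The single-owner pattern used for the memory reserves in the patch above (claim with a compare-and-swap from NULL, release with a compare-and-swap back to NULL) can be sketched in userspace with C11 atomics. This is a rough analogue rather than kernel code: struct task, reserve_owner, claim_reserves() and release_reserves() are hypothetical names, and atomic_compare_exchange_strong() stands in for the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int pid; }; /* hypothetical stand-in for struct task_struct */

/* At most one task may hold the reserves at any time; NULL means free. */
static _Atomic(struct task *) reserve_owner = NULL;

/* Succeeds only if nobody owns the slot yet. */
static bool claim_reserves(struct task *self)
{
	struct task *expected = NULL;

	return atomic_compare_exchange_strong(&reserve_owner, &expected, self);
}

/* Clears the slot only if the caller is the current owner. */
static bool release_reserves(struct task *self)
{
	struct task *expected = self;

	return atomic_compare_exchange_strong(&reserve_owner, &expected,
					      (struct task *)NULL);
}
```

Because both transitions are conditional, a task that lost the race can call the release helper without disturbing the real owner, which is why exit_oom_victim() in the patch can use cmpxchg() unconditionally.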
Re: Sparse warnings on GENMASK + arm32
> From: "Stephen Boyd"
> To: linux-spa...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Sent: Tuesday, 25 July, 2017 9:30:20 PM
> Subject: Sparse warnings on GENMASK + arm32
>
> I see sparse warning when I check a clk driver file in the kernel
> on a 32-bit ARM build.
>
> drivers/clk/sunxi/clk-sun6i-ar100.c:65:20: warning: cast truncates bits from
> constant value (3 becomes )
>
> The code in question looks like:
>
> static const struct factors_data sun6i_ar100_data = {
> 	.mux = 16,
> 	.muxmask = GENMASK(1, 0),
> 	.table = &sun6i_ar100_config,
> 	.getter = sun6i_get_ar100_factors,
> };
>
> where factors_data is
>
> struct factors_data {
> 	int enable;
> 	int mux;
> 	int muxmask;
> 	const struct clk_factors_config *table;
> 	void (*getter)(struct factors_request *req);
> 	void (*recalc)(struct factors_request *req);
> 	const char *name;
> };
>
>
> and sparse seems to be complaining about the muxmask assignment
> here. Oddly, this doesn't happen on arm64 builds. Both times, I'm
> checking this on an x86-64 machine.
>
> $ sparse --version
> v0.5.1-rc4-1-gfa71b7ac0594
>
> Is there something confusing to sparse in the GENMASK macro?
>

Hmm, it seems sparse is incorrectly taking ~0UL to be a 64-bit value
while BITS_PER_LONG is (correctly) evaluated to be 32.

#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
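To make the diagnosis above concrete, here is a small userspace model of the GENMASK arithmetic. It is only a sketch: genmask_consistent() and genmask_mismatched() are hypothetical helpers, and fixed-width types stand in for "unsigned long" so that both the consistent case (32-bit ~0UL, BITS_PER_LONG of 32) and the suspected mismatch (64-bit ~0UL, BITS_PER_LONG still 32) can be computed on any host. The values elided in the quoted warning are not reconstructed here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BITS_PER_LONG 32

/* Consistent model: ~0UL is 32 bits wide and BITS_PER_LONG is 32. */
static uint32_t genmask_consistent(unsigned int h, unsigned int l)
{
	const uint32_t ones = UINT32_MAX;

	return (uint32_t)((ones << l) & (ones >> (MODEL_BITS_PER_LONG - 1 - h)));
}

/* Mismatched model: ~0UL is treated as 64 bits wide, BITS_PER_LONG stays 32. */
static uint64_t genmask_mismatched(unsigned int h, unsigned int l)
{
	const uint64_t ones = UINT64_MAX;

	return (ones << l) & (ones >> (MODEL_BITS_PER_LONG - 1 - h));
}
```

In the mismatched model the right shift is 32 bits too short, so GENMASK(1, 0) leaves set bits above bit 31, and storing the result in a 32-bit int would discard them. That is the kind of constant a "cast truncates bits" warning complains about.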
Re: [PATCH 1/1] mm/hugetlb: Make huge_pte_offset() consistent and document behaviour
Michal Hocko writes:

> On Wed 26-07-17 14:33:57, Michal Hocko wrote:
>> On Wed 26-07-17 13:11:46, Punit Agrawal wrote:
> [...]
>> > I've been running tests from mce-test suite and libhugetlbfs for similar
>> > changes we did on arm64. There could be assumptions that were not
>> > exercised but I'm not sure how to check for all the possible usages.
>> >
>> > Do you have any other suggestions that can help improve confidence in
>> > the patch?
>>
>> Unfortunatelly I don't. I just know there were many subtle assumptions
>> all over the place so I am rather careful to not touch the code unless
>> really necessary.
>>
>> That being said, I am not opposing your patch.
>
> Let me be more specific. I am not opposing your patch but we should
> definitely need more reviewers to have a look. I am not seeing any
> immediate problems with it but I do not see a large improvements either
> (slightly less nightmare doesn't make me sleep all that well ;)). So I
> will leave the decisions to others.

I hear you - I'd definitely appreciate more eyes on the code change and
description.

Thanks for taking a look.
Re: [PATCH] mm: take memory hotplug lock within numa_zonelist_order_handler()
On Wed, 26 Jul 2017, Heiko Carstens wrote: > Andre Wild reported the folling warning: > > WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 > lockdep_assert_cpus_held+0x4c/0x60 > Modules linked in: > CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57ec20 #10 > Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) > task: 701d8100 task.stack: 73594000 > Krnl PSW : 0704f0018000 00145e24 > (lockdep_assert_cpus_held+0x4c/0x60) > ... > Call Trace: > lockdep_assert_cpus_held+0x42/0x60) > stop_machine_cpuslocked+0x62/0xf0 > build_all_zonelists+0x92/0x150 > numa_zonelist_order_handler+0x102/0x150 > proc_sys_call_handler.isra.12+0xda/0x118 > proc_sys_write+0x34/0x48 > __vfs_write+0x3c/0x178 > vfs_write+0xbc/0x1a0 > SyS_write+0x66/0xc0 > system_call+0xc4/0x2b0 > locks held by bash/1205: > #0: (sb_writers#4){.+.+.+}, at: [<0037b29e>] vfs_write+0xa6/0x1a0 > #1: (zl_order_mutex){+.+...}, at: [<002c8e4c>] > numa_zonelist_order_handler+0x44/0x150 > #2: (zonelists_mutex){+.+...}, at: [<002c8efc>] > numa_zonelist_order_handler+0xf4/0x150 > Last Breaking-Event-Address: > [<00145e20>] lockdep_assert_cpus_held+0x48/0x60 > > This can be easily triggered with e.g. > > >echo n > /proc/sys/vm/numa_zonelist_order > > With commit 3f906ba23689a ("mm/memory-hotplug: switch locking to a > percpu rwsem") memory hotplug locking was changed to fix a potential > deadlock. This also switched the stop_machine() invocation within > build_all_zonelists() to stop_machine_cpuslocked() which now expects > that online cpus are locked when being called. > > This assumption is not true if build_all_zonelists() is being called > from numa_zonelist_order_handler(). In order to fix this simply add a > mem_hotplug_begin()/mem_hotplug_done() pair to numa_zonelist_order_handler(). Sorry, I missed that call path when I did the conversion. So yes, that needs some protection Thanks, tglx
[PATCH 05/11] powerpc/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in POWERPC platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org --- arch/powerpc/include/asm/topology.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index dc4e159..2d84bca 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -16,8 +16,6 @@ struct device_node; #include -#define parent_node(node) (node) - #define cpumask_of_node(node) ((node) == -1 ? \ cpu_all_mask : \ node_to_cpumask_map[node]) -- 2.5.5
[PATCH 10/11] x86/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in X86 platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: x...@kernel.org --- arch/x86/include/asm/topology.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index 6358a85..c1d2a98 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -75,12 +75,6 @@ static inline const struct cpumask *cpumask_of_node(int node) extern void setup_node_to_cpumask_map(void); -/* - * Returns the number of the node containing Node 'node'. This - * architecture is flat, so it is a pretty simple function! - */ -#define parent_node(node) (node) - #define pcibus_to_node(bus) __pcibus_to_node(bus) extern int __node_distance(int, int); -- 2.5.5
[PATCH 08/11] sparc64/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in SPARC64 platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: "David S. Miller" Cc: sparcli...@vger.kernel.org --- arch/sparc/include/asm/topology_64.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/sparc/include/asm/topology_64.h b/arch/sparc/include/asm/topology_64.h index ad5293f..0fcc9a0 100644 --- a/arch/sparc/include/asm/topology_64.h +++ b/arch/sparc/include/asm/topology_64.h @@ -10,8 +10,6 @@ static inline int cpu_to_node(int cpu) return numa_cpu_lookup_table[cpu]; } -#define parent_node(node) (node) - #define cpumask_of_node(node) ((node) == -1 ? \ cpu_all_mask : \ &numa_cpumask_lookup_table[node]) -- 2.5.5
Re: [PATCH v2 02/13] xen/pvcalls: connect to the backend
On 7/25/2017 5:21 PM, Stefano Stabellini wrote:

Implement the probe function for the pvcalls frontend. Read the
supported versions, max-page-order and function-calls nodes from
xenstore.

Introduce a data structure named pvcalls_bedata. It contains pointers to
the command ring, the event channel, a list of active sockets and a list
of passive sockets. Lists accesses are protected by a spin_lock.

Introduce a waitqueue to allow waiting for a response on commands sent
to the backend.

Introduce an array of struct xen_pvcalls_response to store commands
responses.

Only one frontend<->backend connection is supported at any given time
for a guest. Store the active frontend device to a static pointer.

Introduce a stub functions for the event handler.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 153
 1 file changed, 153 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index a8d38c2..5e0b265 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -20,6 +20,29 @@
 #include
 #include

+#define PVCALLS_INVALID_ID (UINT_MAX)

Unnecessary parentheses

+#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER

PVCALLS_RING_ORDER?

+#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
+
+struct pvcalls_bedata {
+	struct xen_pvcalls_front_ring ring;
+	grant_ref_t ref;
+	int irq;
+
+	struct list_head socket_mappings;
+	struct list_head socketpass_mappings;
+	spinlock_t pvcallss_lock;
+
+	wait_queue_head_t inflight_req;
+	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
+};
+struct xenbus_device *pvcalls_front_dev;

static

+
+static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }
@@ -33,12 +56,142 @@ static int pvcalls_front_remove(struct xenbus_device *dev)
 static int pvcalls_front_probe(struct xenbus_device *dev,
 	const struct xenbus_device_id *id)
 {
+	int ret = -EFAULT, evtchn, ref = -1, i;
+	unsigned int max_page_order, function_calls, len;
+	char *versions;
+	grant_ref_t gref_head = 0;
+	struct xenbus_transaction xbt;
+	struct pvcalls_bedata *bedata = NULL;
+	struct xen_pvcalls_sring *sring;
+
+	if (pvcalls_front_dev != NULL) {
+		dev_err(&dev->dev, "only one PV Calls connection supported\n");
+		return -EINVAL;
+	}
+
+	versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
+	if (!len)
+		return -EINVAL;
+	if (strcmp(versions, "1")) {
+		kfree(versions);
+		return -EINVAL;
+	}
+	kfree(versions);
+	ret = xenbus_scanf(XBT_NIL, dev->otherend,
+		"max-page-order", "%u", &max_page_order);
+	if (ret <= 0)
+		return -ENODEV;
+	if (max_page_order < RING_ORDER)
+		return -ENODEV;
+	ret = xenbus_scanf(XBT_NIL, dev->otherend,
+		"function-calls", "%u", &function_calls);
+	if (ret <= 0 || function_calls != 1)
+		return -ENODEV;
+	pr_info("%s max-page-order is %u\n", __func__, max_page_order);
+
+	bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL);
+	if (!bedata)
+		return -ENOMEM;
+
+	init_waitqueue_head(&bedata->inflight_req);
+	for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
+		bedata->rsp[i].req_id = PVCALLS_INVALID_ID;
+
+	sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL |
+		__GFP_ZERO);
+	if (!sring)
+		goto error;
+	SHARED_RING_INIT(sring);
+	FRONT_RING_INIT(&bedata->ring, sring, XEN_PAGE_SIZE);
+
+	ret = xenbus_alloc_evtchn(dev, &evtchn);
+	if (ret)
+		goto error;
+
+	bedata->irq = bind_evtchn_to_irqhandler(evtchn,
+		pvcalls_front_event_handler,
+		0, "pvcalls-frontend", dev);
+	if (bedata->irq < 0) {
+		ret = bedata->irq;
+		goto error;
+	}
+
+	ret = gnttab_alloc_grant_references(1, &gref_head);
+	if (ret < 0)
+		goto error;
+	bedata->ref = ref = gnttab_claim_grant_reference(&gref_head);

Is ref really needed?

+	if (ref < 0)
+		goto error;
+	gnttab_grant_foreign_access_ref(ref, dev->otherend_id,
+		virt_to_gfn((void *)sring), 0);
+
+ again:
+	ret = xenbus_transaction_start(&xbt);
+	if (ret) {
+		xenbus_dev_fatal(dev, ret, "starting transaction");
+		goto error;
+	}
+	ret = xenbus_printf(xbt,
[PATCH 03/11] metag/numa: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in METAG architecture is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: James Hogan Cc: a...@linux-foundation.org Cc: linux-me...@vger.kernel.org --- arch/metag/include/asm/topology.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h index e95f874..707c7f7 100644 --- a/arch/metag/include/asm/topology.h +++ b/arch/metag/include/asm/topology.h @@ -4,7 +4,6 @@ #ifdef CONFIG_NUMA #define cpu_to_node(cpu) ((void)(cpu), 0) -#define parent_node(node) ((void)(node), 0) #define cpumask_of_node(node) ((void)node, cpu_online_mask) -- 2.5.5
[PATCH v2] smp_call_function: use inline helpers instead of macros
A new caller of smp_call_function() passes a local variable as the 'wait' argument, and that variable is otherwise unused, so we get a warning in non-SMP configurations: virt/kvm/kvm_main.c: In function 'kvm_make_all_cpus_request': virt/kvm/kvm_main.c:195:7: error: unused variable 'wait' [-Werror=unused-variable] bool wait = req & KVM_REQUEST_WAIT; This addresses the warning by changing the two macros into inline functions. As reported by the 0day build bot, a small change is required in the MIPS r4k code for this, which then gets a warning about a missing variable. Fixes: 7a97cec26b94 ("KVM: mark requests that need synchronization") Cc: Paolo Bonzini Link: https://patchwork.kernel.org/patch/9722063/ Signed-off-by: Arnd Bergmann --- v2: - fix MIPS build error reported by kbuild test robot - remove up_smp_call_function() --- arch/mips/mm/c-r4k.c | 2 ++ include/linux/smp.h | 12 +++- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c index 81d6a15c93d0..f353bf5f24f1 100644 --- a/arch/mips/mm/c-r4k.c +++ b/arch/mips/mm/c-r4k.c @@ -97,9 +97,11 @@ static inline void r4k_on_each_cpu(unsigned int type, void (*func)(void *info), void *info) { preempt_disable(); +#ifdef CONFIG_SMP if (r4k_op_needs_ipi(type)) smp_call_function_many(&cpu_foreign_map[smp_processor_id()], func, info, 1); +#endif func(info); preempt_enable(); } diff --git a/include/linux/smp.h b/include/linux/smp.h index 68123c1fe549..ea24e2d3504c 100644 --- a/include/linux/smp.h +++ b/include/linux/smp.h @@ -135,17 +135,19 @@ static inline void smp_send_stop(void) { } * These macros fold the SMP functionality into a single CPU system */ #define raw_smp_processor_id() 0 -static inline int up_smp_call_function(smp_call_func_t func, void *info) +static inline int smp_call_function(smp_call_func_t func, void *info, int wait) { return 0; } -#define smp_call_function(func, info, wait) \ - (up_smp_call_function(func, info)) static inline void smp_send_reschedule(int 
cpu) { } #define smp_prepare_boot_cpu() do {} while (0) -#define smp_call_function_many(mask, func, info, wait) \ - (up_smp_call_function(func, info)) + +static inline void smp_call_function_many(const struct cpumask *mask, + smp_call_func_t func, void *info, bool wait) +{ +} + static inline void call_function_init(void) { } static inline int -- 2.9.0
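The difference between the two stub styles in the patch above can be reproduced outside the kernel. The following sketch uses hypothetical names (call_func_t, up_call_stub(), call_function_macro(), call_function_inline(), caller()) and is only meant to show why a macro stub that drops an argument leaves the caller's variable unused, while an inline function stub does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*call_func_t)(void *info);

/* Macro-style stub: the 'wait' argument disappears during preprocessing. */
static inline int up_call_stub(call_func_t func, void *info)
{
	(void)func;
	(void)info;
	return 0;
}
#define call_function_macro(func, info, wait) (up_call_stub(func, info))

/* Inline-style stub: every argument is a real function argument. */
static inline int call_function_inline(call_func_t func, void *info, int wait)
{
	(void)func;
	(void)info;
	(void)wait;
	return 0;
}

static void noop(void *info)
{
	(void)info;
}

static int caller(unsigned int req)
{
	bool wait = req & 1u; /* only consumed by the call below */

	/*
	 * With call_function_macro() here instead, 'wait' would never be
	 * referenced after preprocessing and -Wunused-variable would fire.
	 */
	return call_function_inline(noop, NULL, wait);
}
```

Both stubs compile to nothing useful on a uniprocessor build; the inline form simply keeps the argument visible to the compiler, so type checking still happens and the warning goes away.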
[PATCH 11/11] asm-generic: numa: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in generic situation is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Arnd Bergmann Cc: linux-a...@vger.kernel.org --- include/asm-generic/topology.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h index fc824e2..a91d842 100644 --- a/include/asm-generic/topology.h +++ b/include/asm-generic/topology.h @@ -44,9 +44,6 @@ #define cpu_to_mem(cpu)((void)(cpu),0) #endif -#ifndef parent_node -#define parent_node(node) ((void)(node),0) -#endif #ifndef cpumask_of_node #define cpumask_of_node(node) ((void)node, cpu_online_mask) #endif -- 2.5.5
[PATCH 01/11] arm64: numa: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in ARM64 platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Michael Ellerman Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org --- arch/arm64/include/asm/numa.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h index bf466d1..ef7b238 100644 --- a/arch/arm64/include/asm/numa.h +++ b/arch/arm64/include/asm/numa.h @@ -7,9 +7,6 @@ #define NR_NODE_MEMBLKS(MAX_NUMNODES * 2) -/* currently, arm64 implements flat NUMA topology */ -#define parent_node(node) (node) - int __node_distance(int from, int to); #define node_distance(a, b) __node_distance(a, b) -- 2.5.5
[PATCH 09/11] tile/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in tile platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Chris Metcalf --- arch/tile/include/asm/topology.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h index b11d5fc..635a0a4 100644 --- a/arch/tile/include/asm/topology.h +++ b/arch/tile/include/asm/topology.h @@ -29,12 +29,6 @@ static inline int cpu_to_node(int cpu) return cpu_2_node[cpu]; } -/* - * Returns the number of the node containing Node 'node'. - * This architecture is flat, so it is a pretty simple function! - */ -#define parent_node(node) (node) - /* Returns a bitmask of CPUs on Node 'node'. */ static inline const struct cpumask *cpumask_of_node(int node) { -- 2.5.5
[PATCH v2] Kbuild: use -fshort-wchar globally
A previous patch added the --no-wchar-size-warning to the Makefile to avoid this harmless warning: arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail Changing kbuild to use thin archives instead of recursive linking unfortunately brings the same warning back during the final link. The kernel does not use wchar_t string literals at this point, and xen does not use wchar_t at all (only efi_char16_t), so the flag has no effect, but as pointed out by Jan Beulich, adding a wchar_t string literal would be bad here. Since wchar_t is always defined as u16, independent of the toolchain default, always passing -fshort-wchar is correct and lets us remove the Xen specific hack along with fixing the warning. Signed-off-by: Arnd Bergmann Fixes: 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi") Acked-by: David Vrabel Link: https://patchwork.kernel.org/patch/9275217/ --- I submitted an earlier patch in August 2016, simply removing the flag in xen, but there seems to be no harm in enabling it globally --- Makefile | 2 +- drivers/xen/Makefile | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/Makefile b/Makefile index f1533423094f..0fe63a47fd52 100644 --- a/Makefile +++ b/Makefile @@ -396,7 +396,7 @@ LINUXINCLUDE:= \ KBUILD_CPPFLAGS := -D__KERNEL__ KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ - -fno-strict-aliasing -fno-common \ + -fno-strict-aliasing -fno-common -fshort-wchar \ -Werror-implicit-function-declaration \ -Wno-format-security \ -std=gnu89 $(call cc-option,-fno-PIE) diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 8feab810aed9..7f188b8d0c67 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -7,9 +7,6 @@ obj-y += xenbus/ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_features.o := $(nostackp) -CFLAGS_efi.o += -fshort-wchar -LDFLAGS+= $(call ld-option, --no-wchar-size-warning) - 
dom0-$(CONFIG_ARM64) += arm-device.o dom0-$(CONFIG_PCI) += pci.o dom0-$(CONFIG_USB_SUPPORT) += dbgp.o -- 2.9.0
Re: [REGRESSION 4.13-rc] NFS returns -EACCESS at the first read
On 07/26/2017 09:30 AM, Takashi Iwai wrote:
> On Wed, 26 Jul 2017 14:57:07 +0200,
> Anna Schumaker wrote:
>>
>> Hi Takashi,
>>
>> On 07/26/2017 08:54 AM, Takashi Iwai wrote:
>>> Hi,
>>>
>>> I seem hitting a regression of NFS client on the today's Linus git
>>> tree. The symptom is that the file read over NFS returns occasionally
>>> -EACCESS at the first read. When I try to read the same file again
>>> (or do some other thing), I can read it successfully.
>>>
>>> The git bisection leaded to the commit
>>> bd8b2441742b49c76bec707757bd9c028ea9838e
>>> NFS: Store the raw NFS access mask in the inode's access cache
>>>
>>>
>>> Any further hint for debugging?
>>
>> Does the patch in this email thread help?
>> http://www.spinics.net/lists/linux-nfs/msg64930.html
>
> Thanks, I gave it a shot and the result looks good. Feel free to my
> tested-by tag:
> Tested-by: Takashi Iwai
>
>
> Though, when I look around the code, I feel somehow uneasy by that
> still MAY_XXX is used for nfs_access_entry.mask, e.g. in
> nfs3_proc_access() or nfs4_proc_access(). Are these function OK
> without the similar conversion?

I just started looking at that at the end of the day yesterday. I think
they work by accident, since all the bits in the mask are set by
nfs_do_access(). They should probably be converted, but I don't think
it's urgent.

Anna

>
>
> thanks,
>
> Takashi
>
[PATCH 02/11] ia64: topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in IA64(Itanium) platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman Signed-off-by: Dou Liyang Cc: Tony Luck Cc: Fenghua Yu Cc: linux-i...@vger.kernel.org --- arch/ia64/include/asm/topology.h | 7 --- 1 file changed, 7 deletions(-) diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h index 3ad8f69..82f9bf7 100644 --- a/arch/ia64/include/asm/topology.h +++ b/arch/ia64/include/asm/topology.h @@ -34,13 +34,6 @@ &node_to_cpu_mask[node]) /* - * Returns the number of the node containing Node 'nid'. - * Not implemented here. Multi-level hierarchies detected with - * the help of node_distance(). - */ -#define parent_node(nid) (nid) - -/* * Determines the node for a given pci bus */ #define pcibus_to_node(bus) PCI_CONTROLLER(bus)->node -- 2.5.5
Re: [PATCH net] Revert "vhost: cache used event for better performance"
On 2017-07-26 21:18, Jason Wang wrote:
> On 2017-07-26 20:57, Michael S. Tsirkin wrote:
>> On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
>>> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
>>> was reported to break vhost_net. We want to cache used event and use
>>> it to check for notification. We try to validate the cached used event
>>> by checking whether or not it was ahead of new, but this is not correct
>>> all the time, it could be stale and there's no way to know about this.
>>>
>>> Signed-off-by: Jason Wang
>>
>> Could you supply a bit more data here please? How does it get stale?
>> What does guest need to do to make it stale? This will be helpful if
>> anyone wants to bring it back, or if we want to extend the protocol.
>
> The problem is that we don't know whether or not guest has published a
> new used event. The check
> vring_need_event(vq->last_used_event, new + vq->num, new)
> is not sufficient to check for this.
>
> Thanks

More notes, the previous assumption is that we don't move used event
back, but this could happen in fact if idx wraps around. Will repost
and add this into commit log.

Thanks
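To make the stale-cache problem above concrete, here is a minimal
user-space sketch in C. need_event() copies the arithmetic of
vring_need_event() from include/uapi/linux/virtio_ring.h;
cached_looks_valid() and missed_notification() are hypothetical helpers
written only for this illustration, and the index values in the note
below are made up. It is a model of the check quoted in the mail, not
the vhost code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as vring_need_event(): true when event_idx lies in
 * the window [old_idx, new_idx), computed modulo 2^16.
 */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/*
 * Hypothetical helper modelling the quoted check: the cached value is
 * trusted when it lies in [new_idx, new_idx + num).
 */
static int cached_looks_valid(uint16_t cached, uint16_t new_idx, uint16_t num)
{
	assert(num != 0);
	return need_event(cached, (uint16_t)(new_idx + num), new_idx);
}

/*
 * Returns 1 when trusting the cache suppresses a notification that the
 * guest's current used event would have requested.
 */
static int missed_notification(uint16_t cached, uint16_t current,
			       uint16_t old_idx, uint16_t new_idx,
			       uint16_t num)
{
	if (!cached_looks_valid(cached, new_idx, num))
		return 0;	/* cache rejected, the host would re-read */

	return need_event(current, new_idx, old_idx) &&
	       !need_event(cached, new_idx, old_idx);
}
```

For instance, with a ring of 256 entries, a cached used event of 100, a
guest that has since published 95, and the used index moving from 95 to
96, missed_notification(100, 95, 95, 96, 256) returns 1: the cached
value still passes the window check, yet relying on it skips a
notification the guest asked for.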
[PATCH 07/11] sh/numa: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro on the SUPERH platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman
Signed-off-by: Dou Liyang
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: linux...@vger.kernel.org
---
 arch/sh/include/asm/topology.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/sh/include/asm/topology.h b/arch/sh/include/asm/topology.h
index 358e3f5..6931f50 100644
--- a/arch/sh/include/asm/topology.h
+++ b/arch/sh/include/asm/topology.h
@@ -4,7 +4,6 @@
 #ifdef CONFIG_NUMA
 
 #define cpu_to_node(cpu) ((void)(cpu),0)
-#define parent_node(node) ((void)(node),0)
 
 #define cpumask_of_node(node) ((void)node, cpu_online_mask)
 
-- 
2.5.5
[PATCH 04/11] MIPS: numa: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macros in both IP27 and Loongson64 are unnecessary.

Remove them for cleanup.

Reported-by: Michael Ellerman
Signed-off-by: Dou Liyang
Cc: Ralf Baechle
Cc: James Hogan
Cc: linux-m...@linux-mips.org
---
 arch/mips/include/asm/mach-ip27/topology.h       | 1 -
 arch/mips/include/asm/mach-loongson64/topology.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/mips/include/asm/mach-ip27/topology.h b/arch/mips/include/asm/mach-ip27/topology.h
index defd135..3fb7a0e 100644
--- a/arch/mips/include/asm/mach-ip27/topology.h
+++ b/arch/mips/include/asm/mach-ip27/topology.h
@@ -23,7 +23,6 @@ struct cpuinfo_ip27 {
 extern struct cpuinfo_ip27 sn_cpu_info[NR_CPUS];
 
 #define cpu_to_node(cpu) (sn_cpu_info[(cpu)].p_nodeid)
-#define parent_node(node) (node)
 #define cpumask_of_node(node) ((node) == -1 ? \
 				 cpu_all_mask : \
 				 &hub_data(node)->h_cpus)
diff --git a/arch/mips/include/asm/mach-loongson64/topology.h b/arch/mips/include/asm/mach-loongson64/topology.h
index 0d8f3b5..bcb8856 100644
--- a/arch/mips/include/asm/mach-loongson64/topology.h
+++ b/arch/mips/include/asm/mach-loongson64/topology.h
@@ -4,7 +4,6 @@
 #ifdef CONFIG_NUMA
 
 #define cpu_to_node(cpu) (cpu_logical_map(cpu) >> 2)
-#define parent_node(node) (node)
 #define cpumask_of_node(node) (&__node_data[(node)]->cpumask)
 
 struct pci_bus;
-- 
2.5.5
[PATCH] [v2] iopoll: avoid -Wint-in-bool-context warning
When we pass the result of a multiplication as the timeout or the
delay, we can get a warning:

drivers/mmc/host/bcm2835.c:596:149: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/mfd/arizona-core.c:247:195: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c:49:27: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]

The warning is a bit questionable inside of a macro, but this is
intentional on the side of the gcc developers. It is also an indication
of another problem: we evaluate the timeout and sleep arguments multiple
times, which can have undesired side-effects when those are complex
expressions.

This changes the three iopoll variants to use local variables for
storing copies of the timeouts. This adds some more type safety, and
avoids both the double-evaluation and the gcc warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
Signed-off-by: Arnd Bergmann
---
v2: - use temporary variables instead of zero-comparison, to avoid
      double evaluation
    - also address the delay, not just timeout handling
---
 include/linux/iopoll.h | 24 +++-
 include/linux/regmap.h | 12 +++-
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
index d29e1e21bf3f..b1d861caca16 100644
--- a/include/linux/iopoll.h
+++ b/include/linux/iopoll.h
@@ -42,18 +42,21 @@
  */
 #define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \
 ({ \
-	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
-	might_sleep_if(sleep_us); \
+	u64 __timeout_us = (timeout_us); \
+	unsigned long __sleep_us = (sleep_us); \
+	ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
+	might_sleep_if((__sleep_us) != 0); \
 	for (;;) { \
 		(val) = op(addr); \
 		if (cond) \
 			break; \
-		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
+		if (__timeout_us && \
+		    ktime_compare(ktime_get(), __timeout) > 0) { \
 			(val) = op(addr); \
 			break; \
 		} \
-		if (sleep_us) \
-			usleep_range((sleep_us >> 2) + 1, sleep_us); \
+		if (__sleep_us) \
+			usleep_range((__sleep_us >> 2) + 1, __sleep_us); \
 	} \
 	(cond) ? 0 : -ETIMEDOUT; \
 })
@@ -77,17 +80,20 @@
  */
 #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
 ({ \
-	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+	u64 __timeout_us = (timeout_us); \
+	unsigned long __delay_us = (delay_us); \
+	ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
 	for (;;) { \
 		(val) = op(addr); \
 		if (cond) \
 			break; \
-		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
+		if (__timeout_us && \
+		    ktime_compare(ktime_get(), __timeout) > 0) { \
 			(val) = op(addr); \
 			break; \
 		} \
-		if (delay_us) \
-			udelay(delay_us); \
+		if (__delay_us) \
+			udelay(__delay_us); \
 	} \
 	(cond) ? 0 : -ETIMEDOUT; \
 })
diff --git a/include/linux/regmap.h b/include/linux/regmap.h
index 1474ab0a3922..a4d30c877f6b 100644
--- a/include/linux/regmap.h
+++ b/include/linux/regmap.h
@@ -120,22 +120,24 @@ struct reg_sequence {
  */
 #define regmap_read_poll_timeout(map, addr, val, cond, sleep_us, timeout_us) \
 ({ \
-	ktime_t __timeout = ktime_add_us(ktime_get(), timeout_us); \
+	u64 __timeout_us = (timeout_us); \
+	unsigned long __sleep_us = (sleep_us); \
+	ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
 	int __ret; \
-	might_sleep_if(sleep_us); \
+	might_sleep_if(__sleep_us); \
 	for (;;) { \
 		__ret = regmap_read((map), (addr), &(val)); \
 		if (__ret) \
 			break; \
 		if (cond) \
 			break; \
-		if ((timeout_us) && \
+		if (__timeout_us && \
 		    ktime_compare(ktime_get(), __timeout) > 0) { \
 			__ret = regmap_read((map), (addr), &(val)); \
 			break; \
 		} \
-		if (sleep_us) \
-			usleep_range(((sleep_us) >> 2) + 1, sleep_us); \
+		if (__sleep_us) \
+			usleep_range((__sleep_us >> 2) + 1, __sleep_us); \
 	} \
 	__ret ?: ((cond) ? 0 : -ETIMEDOUT); \
 })
-- 
2.9.0
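The double-evaluation problem described in the commit message can be
reproduced outside the kernel. The sketch below is a toy, not the iopoll
code: COUNT_OLD, COUNT_NEW, next_timeout() and the evals_*() helpers are
hypothetical names, and the loop only counts how often the timeout
expression is evaluated. It assumes a C99 compiler.

```c
#include <assert.h>

static int calls;	/* how many times the timeout expression ran */

/* Hypothetical argument with a side effect. */
static unsigned long next_timeout(void)
{
	calls++;
	return 10;
}

/* Old pattern: timeout_us is expanded, and evaluated, on every pass. */
#define COUNT_OLD(n, timeout_us, hits)			\
	do {						\
		for (int i = 0; i < (n); i++)		\
			if (timeout_us)			\
				(hits)++;		\
	} while (0)

/* Patched pattern: evaluate once into a local copy, then test the copy. */
#define COUNT_NEW(n, timeout_us, hits)				\
	do {							\
		unsigned long timeout_copy = (timeout_us);	\
		for (int i = 0; i < (n); i++)			\
			if (timeout_copy)			\
				(hits)++;			\
	} while (0)

/* Number of evaluations of the argument with the old pattern. */
static int evals_old(int n)
{
	int hits = 0;

	assert(n >= 0);
	calls = 0;
	COUNT_OLD(n, next_timeout(), hits);
	assert(hits == n);	/* the timeout was nonzero on every pass */
	return calls;
}

/* Number of evaluations of the argument with the patched pattern. */
static int evals_new(int n)
{
	int hits = 0;

	assert(n >= 0);
	calls = 0;
	COUNT_NEW(n, next_timeout(), hits);
	assert(hits == n);
	return calls;
}
```

With three loop passes, evals_old(3) reports three evaluations of the
argument and evals_new(3) reports one. Testing a plain local copy also
means the condition no longer contains the caller's multiplication,
which is what the gcc warning objected to.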
[PATCH 06/11] s390/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro on the S390 platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman
Signed-off-by: Dou Liyang
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Michael Holzheu
Cc: linux-s...@vger.kernel.org
---
 arch/s390/include/asm/topology.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index fa1bfce..5222da1 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -77,12 +77,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 	return &node_to_cpumask_map[node];
 }
 
-/*
- * Returns the number of the node containing node 'node'. This
- * architecture is flat, so it is a pretty simple function!
- */
-#define parent_node(node) (node)
-
 #define pcibus_to_node(bus) __pcibus_to_node(bus)
 
 #define node_distance(a, b) __node_distance(a, b)
-- 
2.5.5
[PATCH 00/11] Remove the parent_node() for each arch
Michael reports that parent_node() will never be invoked, since commit
a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removed its last user.

So we start removing it from the topology.h headers for each arch.

Dou Liyang (11):
  arm64: numa: Remove the unused parent_node() macro
  ia64: topology: Remove the unused parent_node() macro
  metag/numa: Remove the unused parent_node() macro
  MIPS: numa: Remove the unused parent_node() macro
  powerpc/numa: Remove the unused parent_node() macro
  s390/topology: Remove the unused parent_node() macro
  sh/numa: Remove the unused parent_node() macro
  sparc64/topology: Remove the unused parent_node() macro
  tile/topology: Remove the unused parent_node() macro
  x86/topology: Remove the unused parent_node() macro
  asm-generic: numa: Remove the unused parent_node() macro

 arch/arm64/include/asm/numa.h                    | 3 ---
 arch/ia64/include/asm/topology.h                 | 7 ---
 arch/metag/include/asm/topology.h                | 1 -
 arch/mips/include/asm/mach-ip27/topology.h       | 1 -
 arch/mips/include/asm/mach-loongson64/topology.h | 1 -
 arch/powerpc/include/asm/topology.h              | 2 --
 arch/s390/include/asm/topology.h                 | 6 --
 arch/sh/include/asm/topology.h                   | 1 -
 arch/sparc/include/asm/topology_64.h             | 2 --
 arch/tile/include/asm/topology.h                 | 6 --
 arch/x86/include/asm/topology.h                  | 6 --
 include/asm-generic/topology.h                   | 3 ---
 12 files changed, 39 deletions(-)

-- 
2.5.5
Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code
On Wed, Jul 26, 2017 at 03:25:05PM +0200, Artem Savkov wrote:
> On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> > Yes, that should fix it, but I think it's better to just move the
> > register_syscore_ops() call to a later initialization step, like in the
> > patch below. I tested it and will queue it to my iommu/fixes branch.
>
> Checked it as well just in case, didn't see any issues. Thank you.
>
> Reported-and-tested-by: Artem Savkov

Thanks for testing it! I added your and Thomas' tags and applied the
patch to my tree. It should go upstream this week.

	Joerg
Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
> From: Wenwei Tao
>
> By removing the child cgroup while the parent cgroup is
> under reclaim, we could trigger the following kernel panic
> on kernel 3.10:
>
> kernel BUG at kernel/cgroup.c:893!
> invalid opcode: [#1] SMP
> CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
> Workqueue: cgroup_destroy css_dput_fn
> task: 8817959a5780 ti: 8817e8886000 task.ti: 8817e8886000
> RIP: 0010:[] []
> cgroup_diput+0xc0/0xf0
> RSP: :8817e8887da0 EFLAGS: 00010246
> RAX: RBX: 8817a5dd5d40 RCX: dead0200
> RDX: RSI: 8817973a6910 RDI: 8817f54c2a00
> RBP: 8817e8887dc8 R08: 8817a5dd5dd0 R09: df9fb35794b01820
> R10: df9fb35794b01820 R11: 7fa95b1efcda R12: 8817a5dd5d9c
> R13: 8817f38b3a40 R14: 8817973a6910 R15: 8817973a6910
> FS: () GS:88181f22()
> knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 7fa6e6234000 CR3: 00179f19d000 CR4: 000407e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Stack:
> 8817a5dd5d40 8817a5dd5d9c 8817f38b3a40 8817973a6910
> 0040 8817e8887df8 811b37c2 8817fa23c000
> 8817f57dbb80 88181f232ac0 88181f237500 8817e8887e10
> Call Trace:
> [] dput+0x1a2/0x2f0
> [] cgroup_dput.isra.21+0x1c/0x30
> [] css_dput_fn+0x1d/0x20
> [] process_one_work+0x17c/0x460
> [] worker_thread+0x116/0x3b0
> [] ? manage_workers.isra.25+0x290/0x290
> [] kthread+0xc0/0xd0
> [] ? insert_kthread_work+0x40/0x40
> [] ret_from_fork+0x58/0x90
> [] ? insert_kthread_work+0x40/0x40
> Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
> 48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
> 49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
> RIP [] cgroup_diput+0xc0/0xf0
> RSP
> ---[ end trace 85eeea5212c44f51 ]---
>
>
> I think there is a css double put in mem_cgroup_iter. Under reclaim,
> we call mem_cgroup_iter the first time with prev == NULL, and we get
> last_visited memcg from per zone's reclaim_iter then call
> __mem_cgroup_iter_next
> try to get next alive memcg, __mem_cgroup_iter_next could return NULL
> if last_visited is already the last one so we put the last_visited's
> memcg css and continue to the next while loop, this time we might not
> do css_tryget(&last_visited->css) if the dead_count is changed, but
> we still do css_put(&last_visited->css), we put it twice, this could
> trigger the BUG_ON at kernel/cgroup.c:893.

Yes, I guess you are right and I suspect that this has been silently
fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
loading and updating"). I think a more appropriate fix would be the one
below. Are you able to reproduce and re-test it?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2cbe102..0848ec05c12a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 				if (last_visited && last_visited != root &&
 				    !css_tryget(&last_visited->css))
 					last_visited = NULL;
+			} else {
+				last_visited = NULL;
 			}
 		}
 
-- 
Michal Hocko
SUSE Labs
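The double put described in this thread can be modelled with a toy
reference count. The sketch below is only an illustration under
simplifying assumptions: struct obj, tryget(), put() and iterate() are
hypothetical stand-ins, not the memcg or css API, and the two loop
passes stand for one pass where the cached position is still valid and a
second pass where it has been invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	int refs;	/* toy stand-in for a css reference count */
};

/* Take a reference unless the object is already dead. */
static bool tryget(struct obj *o)
{
	if (o->refs <= 0)
		return false;
	o->refs++;
	return true;
}

static void put(struct obj *o)
{
	o->refs--;
}

/*
 * Two passes over the same cached position. Pass 0 sees a valid cache
 * and takes a reference; pass 1 sees it invalidated and takes none.
 * The put at the end of each pass runs whenever 'last' is non-NULL, so
 * a pointer left over from pass 0 is put a second time unless it is
 * cleared. Returns the reference count left on the object.
 */
static int iterate(struct obj *cached, bool clear_on_invalid)
{
	struct obj *last = NULL;

	assert(cached != NULL && cached->refs > 0);

	for (int pass = 0; pass < 2; pass++) {
		bool cache_valid = (pass == 0);

		if (cache_valid) {
			last = cached;
			if (!tryget(last))
				last = NULL;
		} else if (clear_on_invalid) {
			last = NULL;	/* no reference taken, nothing to put */
		}

		if (last)
			put(last);
	}

	return cached->refs;
}
```

Starting from a count of 1 held by someone else, iterate() without the
clearing branch ends at 0, meaning a reference the iterator never owned
was dropped; with the clearing branch the count stays at 1.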
Re: Sparse warnings on GENMASK + arm32
On Wed, Jul 26, 2017 at 09:33:01AM -0400, Lance Richardson wrote:
> > From: "Stephen Boyd"
> > I see sparse warning when I check a clk driver file in the kernel
> > on a 32-bit ARM build.
> >
> > drivers/clk/sunxi/clk-sun6i-ar100.c:65:20: warning: cast truncates bits from
> > constant value (3 becomes )
>
> Hmm, it seems sparse is incorrectly taking ~0UL to be a 64-bit value
> while BITS_PER_LONG is (correctly) evaluated to be 32.
>
> #define GENMASK(h, l) \
> (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h

It's the kernel CHECKFLAGS that should be using -m32/-m64 if built on
a machine with a different wordsize than the arch. I sent a patch for
ARM earlier, I just forgot to CC the mailing list here.

-- Luc
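The mismatch Lance describes can be illustrated with fixed-width types.
The following is a sketch only: genmask32() and genmask_mixed() are
hypothetical helpers, not kernel or sparse code. Both hard-code a
BITS_PER_LONG of 32; the second one additionally assumes the all-ones
constant is 64 bits wide, which is the combination a checker would see
when its word size does not match the target.

```c
#include <assert.h>
#include <stdint.h>

/* GENMASK when the all-ones constant and BITS_PER_LONG are both 32-bit. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 32);
	return (~(uint32_t)0 << l) & (~(uint32_t)0 >> (32 - 1 - h));
}

/*
 * Same expression, but with a 64-bit all-ones constant while the shift
 * count is still derived from a BITS_PER_LONG of 32.
 */
static uint64_t genmask_mixed(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 32);
	return (~(uint64_t)0 << l) & (~(uint64_t)0 >> (32 - 1 - h));
}
```

Here genmask32(1, 0) yields 3, while genmask_mixed(1, 0) yields
0x3ffffffff, which no longer fits in 32 bits; a cast of that constant
down to a 32-bit type loses bits, which is the kind of situation a
truncation warning reports.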