Re: [PATCH-v3 2/3] mfd: 88pm800: Allow configuration of interrupt clear method
On Thursday 25 June 2015 11:20 AM, Krzysztof Kozlowski wrote:
> On 25.06.2015 14:44, Vaibhav Hiremath wrote:
>> On Thursday 25 June 2015 11:02 AM, Krzysztof Kozlowski wrote:
>>> On 25.06.2015 14:26, Vaibhav Hiremath wrote:
>>>> On Thursday 25 June 2015 05:33 AM, Krzysztof Kozlowski wrote:
>>>>> 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath:
>>>>>> As per the spec, bit 1 (INT_CLEAR_MODE) of reg addr 0xe
>>>>>> (page 0) controls the method of clearing interrupt
>>>>>> status of 88pm800 family of devices;
>>>>>>
>>>>>>   0: clear on read
>>>>>>   1: clear on write
>>>>>>
>>>>>> This patch allows to configure this field, through DT.
>>>>>>
>>>>>> Also, as suggested by "Lee Jones" renaming DT property and variable
>>>>>> field to appropriate name.
>>>>>>
>>>>>> Signed-off-by: Zhao Ye
>>>>>> Signed-off-by: Vaibhav Hiremath
>>
>> Yes,
>> Fair enough...
>>
>> I see very little value in runtime configuration, why not just do it
>> only way (either read or write)?
>> I would prefer to just set it by default (during init), to clear irq on
>> write.
>
> Hard-coding a default value, if board files are not present, looks OK to me.

This is how it will look, I will also update the binding information
with this.

hvaibhav@hvaibhav-ThinkPad-T440p:~/projects/mainline/linux$ git diff --cached
diff --git a/drivers/mfd/88pm800.c b/drivers/mfd/88pm800.c
index 0a417ac..e415a06 100644
--- a/drivers/mfd/88pm800.c
+++ b/drivers/mfd/88pm800.c
@@ -645,9 +645,8 @@ static int pm800_probe(struct i2c_client *client,
 			dev_err(&client->dev, "failed to allocaate memory\n");
 			return -ENOMEM;
 		}
-
-		pdata->irq_clr_on_wr = of_property_read_bool(np,
-					"marvell,irq-clr-on-write");
+		/* Setting irq clear method on write */
+		pdata->irq_clr_on_wr = true;
 	}
 
 	ret = pm80x_init(client);

Thanks,
Vaibhav
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
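For illustration only, the effect of hard-coding "clear on write" can be sketched in userspace C. This is not the driver's code: the register file is simulated in memory instead of going over I2C, and the names PM800_INT_CTRL_REG, PM800_INT_CLEAR_MODE, struct fake_pm800 and pm800_set_irq_clear_mode() are hypothetical. Only the register address (0xe, page 0) and the bit position (bit 1) are taken from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; address and bit position come from the thread. */
#define PM800_INT_CTRL_REG   0x0e
#define PM800_INT_CLEAR_MODE (1u << 1)	/* 0: clear on read, 1: clear on write */

/* Simulated page-0 register file (assumption: stands in for I2C access). */
struct fake_pm800 {
	uint8_t page0[256];
};

/* Read-modify-write of the clear-mode bit, leaving the other bits alone. */
static void pm800_set_irq_clear_mode(struct fake_pm800 *chip, bool clear_on_write)
{
	uint8_t val;

	assert(chip != NULL);
	val = chip->page0[PM800_INT_CTRL_REG];
	if (clear_on_write)
		val |= PM800_INT_CLEAR_MODE;
	else
		val &= (uint8_t)~PM800_INT_CLEAR_MODE;
	chip->page0[PM800_INT_CTRL_REG] = val;
}
```

With a fixed default, probe code would call something like this once with clear_on_write set to true, and no DT property would be consulted.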
Re: [PATCH-v3 2/3] mfd: 88pm800: Allow configuration of interrupt clear method
On 25.06.2015 14:44, Vaibhav Hiremath wrote: > > > On Thursday 25 June 2015 11:02 AM, Krzysztof Kozlowski wrote: >> On 25.06.2015 14:26, Vaibhav Hiremath wrote: >>> >>> >>> On Thursday 25 June 2015 05:33 AM, Krzysztof Kozlowski wrote: 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath : > As per the spec, bit 1 (INT_CLEAR_MODE) of reg addr 0xe > (page 0) controls the method of clearing interrupt > status of 88pm800 family of devices; > > 0: clear on read > 1: clear on write > > This patch allows to configure this field, through DT. > > Also, as suggested by "Lee Jones" renaming DT property and variable > field to appropriate name. > > Signed-off-by: Zhao Ye > Signed-off-by: Vaibhav Hiremath It does not look like a property of the board. Instead it looks like a runtime configuration so it should not be part of DT bindings. >>> >>> Why do you say that? >>> >>> It is very well feature of 88PM860 device, where you can control irq >>> clear operation (either read/write). >>> >>> >>> Thanks, >>> Vaibhav >>> I understand that previously this was configured by platform data and now you want to move everything to DT. But this does not belong to DT... >>> >>> Thats not completely true. >>> I think DT is the right place for this configuration. >> >> DT and its bindings describe the specific board or device. Let me quote: >> <> structure and language for describing hardware. More specifically, it >> is a description of hardware that is readable by an operating system...>> >> >> Whether you clear interrupts by writing or reading is configured during >> runtime and it is completely independent to wiring. Each board with >> 88pm800 would allow both methods. So this is not a property of hardware >> in the terms of open firmware. This is a runtime configuration. >> > > Yes, > Fair enough... > > I see very little value in runtime configuration, why not just do it > only way (either read or write)? > I would prefer to just set it by default (during init), to clear irq on > write. 
Hard-coding a default value, if board files are not present, looks OK to me.

Best regards,
Krzysztof
Re: [GIT PULL] f2fs updates for v4.2
On Thu, Jun 25, 2015 at 05:33:34AM +0100, Al Viro wrote:
> On Wed, Jun 24, 2015 at 08:42:02PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 24, 2015 at 1:25 PM, Jaegeuk Kim wrote:
> > >
> > > New features are:
> > >  o per-file encryption (e.g., ext4)
> >
> > The new encrypted symlinks needed fixups for the changes that happened
> > meanwhile to the symlink handling. I did all that in my merge, and I
> > *think* I got it all right, but I would like you to check. In
> > particular, I hope you have a test-case and can actually give it a
> > whirl on that.
> >
> > Al added to cc, just in case he could also check my merge resolution
> > of fs/f2fs/namei.c (the merge is commit cfcc0ad47f4c, I'll push it out
> > after I've finished the filesystem pulls)
>
> FWIW, linux-next contains fixups for a bunch of such stuff,
> including f2fs one.  The only difference between your resolution and
> Stephen's fixup is
> static const char *f2fs_encrypted_follow_link(struct dentry *dentry,
>                                               void **cookie)
> vs.
> static const char *f2fs_encrypted_follow_link(struct dentry *dentry, void **cookie)
>
> Said that, f2fs_symlink() looks odd - we create a directory entry *before*
> doing page_symlink().  And if it (or encryption) fails, I don't see anything
> that would remove that new directory entry.  What are we ending up with
> in such case?

Thanks Al,

Right, I missed merging the fix-up patch in linux-next into my pull-request.
At a glance, I think there is no problem; except 80 column width, though.

Also, agreed that I need to take a look at deleting the dentry to deal with
that failure case.

Thanks,
Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
On Tue, 2015-06-23 at 00:25 +0200, Marcus Gelderie wrote:
> A while back, the message queue implementation in the kernel was
> improved to use btrees to speed up retrieval of messages (commit
> d6629859b36). The patch introducing the improved kernel handling of
> message queues (using btrees) has, as a by-product, changed the
> meaning of the QSIZE field in the pseudo-file created for the queue.
> Before, this field reflected the size of the user-data in the queue.
> Since, it also takes kernel data structures into account. For
> example, if 13 bytes of user data are in the queue, on my machine the
> file reports a size of 61 bytes.

Good catch, and a nice opportunity to make the mq manpage more specific
wrt to queue sizes.

[...]

> Reporting the size of the message queue in kernel has its merits, but
> doing so in the QSIZE field of the pseudo file corresponding to the
> queue is a breaking change, as mentioned above. This patch therefore
> returns the QSIZE field to its original meaning. At the same time,
> it introduces a new field QKERSIZE that reflects the size of the queue
> in kernel (user data + kernel data).

Hmmm I'm not sure about this. What are the specific benefits of having
QKERSIZE? We don't export in-kernel data like this in any other ipc
(posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
structures like this, as we would break userspace if later we change
posix_msg_tree_node.

So NAK to this. I would just remove the extra

+       info->qsize += sizeof(struct posix_msg_tree_node);

bits from d6629859b36 (along with -stable v3.5), plus a patch updating the
manpage that this field only reflects user data.

Thanks,
Davidlohr
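The two accounting schemes under discussion can be modelled with a small userspace sketch. Everything here is an assumption for illustration: struct tree_node_model only approximates the shape of posix_msg_tree_node (a red-black tree node, a list head and a priority), and struct queue_model with its two helper functions is hypothetical, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (layout is an assumption). */
struct rb_node_model {
	unsigned long parent_color;
	void *right;
	void *left;
};

struct list_head_model {
	void *next;
	void *prev;
};

struct tree_node_model {
	struct rb_node_model rb;
	struct list_head_model msg_list;
	int priority;
};

/* Hypothetical queue summary: payload bytes plus number of tree nodes. */
struct queue_model {
	size_t user_bytes;
	size_t tree_nodes;
};

/* QSIZE as user space expected it: payload bytes only. */
static size_t qsize_user_only(const struct queue_model *q)
{
	assert(q != NULL);
	return q->user_bytes;
}

/* QSIZE when per-node bookkeeping is also added to the total. */
static size_t qsize_with_kernel_data(const struct queue_model *q)
{
	assert(q != NULL);
	return q->user_bytes + q->tree_nodes * sizeof(struct tree_node_model);
}
```

On a typical 64-bit build the model struct is 48 bytes, which would be consistent with the reported jump from 13 to 61 bytes for a single message, though the exact figure depends on the architecture. That dependence is the reason the value makes a poor user-visible interface.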
Re: Use of pinctrl-single for external device over I2C
On Thursday 25 June 2015 10:08 AM, Tony Lindgren wrote:
> * Vaibhav Hiremath [150624 10:12]:
>>
>> I do not like this, as this is not HW feature, so DT may not be right
>> approach.
>>
>> So I will dig more from either runtime or Compile time option to use
>> regmap_ Vs raw read/writes.
>
> Can't you just check if the pinctrl node has compatible = "syscon"
> property?
>
> A compile time option won't work for sure. I don't know what you would
> check at runtime as you do not know what the bus is behind syscon.

I haven't gone through syscon yet, so I am not sure whether it would
be useful.

As you rightly stated, we need to know the bus behind regmap.

Thanks,
Vaibhav
Re: [PATCH-v3 2/3] mfd: 88pm800: Allow configuration of interrupt clear method
On Thursday 25 June 2015 11:02 AM, Krzysztof Kozlowski wrote: On 25.06.2015 14:26, Vaibhav Hiremath wrote: On Thursday 25 June 2015 05:33 AM, Krzysztof Kozlowski wrote: 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath : As per the spec, bit 1 (INT_CLEAR_MODE) of reg addr 0xe (page 0) controls the method of clearing interrupt status of 88pm800 family of devices; 0: clear on read 1: clear on write This patch allows to configure this field, through DT. Also, as suggested by "Lee Jones" renaming DT property and variable field to appropriate name. Signed-off-by: Zhao Ye Signed-off-by: Vaibhav Hiremath It does not look like a property of the board. Instead it looks like a runtime configuration so it should not be part of DT bindings. Why do you say that? It is very well feature of 88PM860 device, where you can control irq clear operation (either read/write). Thanks, Vaibhav I understand that previously this was configured by platform data and now you want to move everything to DT. But this does not belong to DT... Thats not completely true. I think DT is the right place for this configuration. DT and its bindings describe the specific board or device. Let me quote: <> Whether you clear interrupts by writing or reading is configured during runtime and it is completely independent to wiring. Each board with 88pm800 would allow both methods. So this is not a property of hardware in the terms of open firmware. This is a runtime configuration. Yes, Fair enough... I see very little value in runtime configuration, why not just do it only way (either read or write)? I would prefer to just set it by default (during init), to clear irq on write. Thanks, Vaibhav -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH-v3 1/3] mfd: 88pm800: Add device tree support
On 25.06.2015 14:27, Vaibhav Hiremath wrote: > > > On Thursday 25 June 2015 05:27 AM, Krzysztof Kozlowski wrote: >> 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath >> : >>> Add DT support to the 88pm800 driver, along with compatible >>> field for it's sub-devices (rtc, onkey and regulator) >>> >>> Signed-off-by: Chao Xie >>> Signed-off-by: Vaibhav Hiremath >>> --- >>> drivers/mfd/88pm800.c | 25 + >>> 1 file changed, 25 insertions(+) >>> >>> diff --git a/drivers/mfd/88pm800.c b/drivers/mfd/88pm800.c >>> index 841717a..059f01a 100644 >>> --- a/drivers/mfd/88pm800.c >>> +++ b/drivers/mfd/88pm800.c >>> @@ -27,6 +27,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> /* Interrupt Registers */ >>> #define PM800_INT_STATUS1 (0x05) >>> @@ -121,6 +122,11 @@ static const struct i2c_device_id >>> pm80x_id_table[] = { >>> }; >>> MODULE_DEVICE_TABLE(i2c, pm80x_id_table); >>> >>> +static const struct of_device_id pm80x_of_match_table[] = { >>> + { .compatible = "marvell,88pm800", }, >>> + {}, >>> +}; >>> + >>> static struct resource rtc_resources[] = { >>> { >>> .name = "88pm80x-rtc", >>> @@ -133,6 +139,7 @@ static struct resource rtc_resources[] = { >>> static struct mfd_cell rtc_devs[] = { >>> { >>> .name = "88pm80x-rtc", >>> +.of_compatible = "marvell,88pm80x-rtc", >>> .num_resources = ARRAY_SIZE(rtc_resources), >>> .resources = _resources[0], >>> .id = -1, >>> @@ -151,6 +158,7 @@ static struct resource onkey_resources[] = { >>> static const struct mfd_cell onkey_devs[] = { >>> { >>> .name = "88pm80x-onkey", >>> +.of_compatible = "marvell,88pm80x-onkey", >>> .num_resources = 1, >>> .resources = _resources[0], >>> .id = -1, >>> @@ -160,6 +168,7 @@ static const struct mfd_cell onkey_devs[] = { >>> static const struct mfd_cell regulator_devs[] = { >>> { >>> .name = "88pm80x-regulator", >>> +.of_compatible = "marvell,88pm80x-regulator", >>> .id = -1, >>> }, >>> }; >>> @@ -544,8 +553,23 @@ static int pm800_probe(struct i2c_client *client, >>> int ret = 0; >>> struct 
pm80x_chip *chip; >>> struct pm80x_platform_data *pdata = >>> dev_get_platdata(>dev); >>> + struct device_node *np = client->dev.of_node; >>> struct pm80x_subchip *subchip; >>> >>> + if (!pdata && !np) { >>> + dev_err(>dev, >>> + "pm80x requires platform data or of_node\n"); >>> + return -EINVAL; >>> + } >>> + >>> + if (!pdata) { >>> + pdata = devm_kzalloc(>dev, sizeof(*pdata), >>> GFP_KERNEL); >>> + if (!pdata) { >>> + dev_err(>dev, "failed to allocaate >>> memory\n"); >> >> Generic error message for ENOMEM is not needed. Just return ENOMEM and >> the core code will print the error. >> >> Rest looks fine, > > > Ok, will remove it. > > Should I add your reviewed-by in V4 for this patch? No, not yet. :) I would put such tag in my reply if I had that intention. Best regards, Krzysztof -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH-v3 2/3] mfd: 88pm800: Allow configuration of interrupt clear method
On 25.06.2015 14:26, Vaibhav Hiremath wrote: > > > On Thursday 25 June 2015 05:33 AM, Krzysztof Kozlowski wrote: >> 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath >> : >>> As per the spec, bit 1 (INT_CLEAR_MODE) of reg addr 0xe >>> (page 0) controls the method of clearing interrupt >>> status of 88pm800 family of devices; >>> >>>0: clear on read >>>1: clear on write >>> >>> This patch allows to configure this field, through DT. >>> >>> Also, as suggested by "Lee Jones" renaming DT property and variable >>> field to appropriate name. >>> >>> Signed-off-by: Zhao Ye >>> Signed-off-by: Vaibhav Hiremath >> >> It does not look like a property of the board. Instead it looks like a >> runtime configuration so it should not be part of DT bindings. >> > > Why do you say that? > > It is very well feature of 88PM860 device, where you can control irq > clear operation (either read/write). > > > Thanks, > Vaibhav > >> I understand that previously this was configured by platform data and >> now you want to move everything to DT. But this does not belong to >> DT... >> > > Thats not completely true. > I think DT is the right place for this configuration. DT and its bindings describe the specific board or device. Let me quote: <> Whether you clear interrupts by writing or reading is configured during runtime and it is completely independent to wiring. Each board with 88pm800 would allow both methods. So this is not a property of hardware in the terms of open firmware. This is a runtime configuration. Description of hardware would be a property which specifies whether a 88pm800-like device or a board using 88pm800 device ALLOWS choosing different interrupt clearing. Best regards, Krzysztof -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lzo: check for length overrun in variable length encoding backport
Hi Florian,

On Wed, Jun 24, 2015 at 09:48:46PM -0700, Florian Fainelli wrote:
> Hi,
>
> Could you backport this commit:
> 72cf90124e87d975d0b2114d930808c58b4c05e4 ("lzo: check for length overrun
> in variable length encoding.") into stable kernels older than 3.18?
>
> It should apply cleanly to anything that contains
> 8b975bd3f9089f8ee5d7bbfd798537b992bbc7e7 ("lib/lzo: Update LZO
> compression to current upstream version") which goes as far as 3.9.

Well, it was merged into 3.10.59 as commit 9689415259, 3.12.32 as 4277fc42,
3.14.23 as 7f5f71a92. Are you sure you didn't mean something else and confused
it with another id?

Thanks,
Willy
Re: [PATCH-v3 1/3] mfd: 88pm800: Add device tree support
On Thursday 25 June 2015 05:27 AM, Krzysztof Kozlowski wrote: 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath : Add DT support to the 88pm800 driver, along with compatible field for it's sub-devices (rtc, onkey and regulator) Signed-off-by: Chao Xie Signed-off-by: Vaibhav Hiremath --- drivers/mfd/88pm800.c | 25 + 1 file changed, 25 insertions(+) diff --git a/drivers/mfd/88pm800.c b/drivers/mfd/88pm800.c index 841717a..059f01a 100644 --- a/drivers/mfd/88pm800.c +++ b/drivers/mfd/88pm800.c @@ -27,6 +27,7 @@ #include #include #include +#include /* Interrupt Registers */ #define PM800_INT_STATUS1 (0x05) @@ -121,6 +122,11 @@ static const struct i2c_device_id pm80x_id_table[] = { }; MODULE_DEVICE_TABLE(i2c, pm80x_id_table); +static const struct of_device_id pm80x_of_match_table[] = { + { .compatible = "marvell,88pm800", }, + {}, +}; + static struct resource rtc_resources[] = { { .name = "88pm80x-rtc", @@ -133,6 +139,7 @@ static struct resource rtc_resources[] = { static struct mfd_cell rtc_devs[] = { { .name = "88pm80x-rtc", +.of_compatible = "marvell,88pm80x-rtc", .num_resources = ARRAY_SIZE(rtc_resources), .resources = _resources[0], .id = -1, @@ -151,6 +158,7 @@ static struct resource onkey_resources[] = { static const struct mfd_cell onkey_devs[] = { { .name = "88pm80x-onkey", +.of_compatible = "marvell,88pm80x-onkey", .num_resources = 1, .resources = _resources[0], .id = -1, @@ -160,6 +168,7 @@ static const struct mfd_cell onkey_devs[] = { static const struct mfd_cell regulator_devs[] = { { .name = "88pm80x-regulator", +.of_compatible = "marvell,88pm80x-regulator", .id = -1, }, }; @@ -544,8 +553,23 @@ static int pm800_probe(struct i2c_client *client, int ret = 0; struct pm80x_chip *chip; struct pm80x_platform_data *pdata = dev_get_platdata(>dev); + struct device_node *np = client->dev.of_node; struct pm80x_subchip *subchip; + if (!pdata && !np) { + dev_err(>dev, + "pm80x requires platform data or of_node\n"); + return -EINVAL; + } + + if (!pdata) { + pdata = 
devm_kzalloc(>dev, sizeof(*pdata), GFP_KERNEL); + if (!pdata) { + dev_err(>dev, "failed to allocaate memory\n"); Generic error message for ENOMEM is not needed. Just return ENOMEM and the core code will print the error. Rest looks fine, Ok, will remove it. Should I add your reviewed-by in V4 for this patch? Thanks, Vaibhav -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH-v3 2/3] mfd: 88pm800: Allow configuration of interrupt clear method
On Thursday 25 June 2015 05:33 AM, Krzysztof Kozlowski wrote:
> 2015-06-24 18:21 GMT+09:00 Vaibhav Hiremath:
>> As per the spec, bit 1 (INT_CLEAR_MODE) of reg addr 0xe
>> (page 0) controls the method of clearing interrupt
>> status of 88pm800 family of devices;
>>
>>   0: clear on read
>>   1: clear on write
>>
>> This patch allows to configure this field, through DT.
>>
>> Also, as suggested by "Lee Jones" renaming DT property and variable
>> field to appropriate name.
>>
>> Signed-off-by: Zhao Ye
>> Signed-off-by: Vaibhav Hiremath
>
> It does not look like a property of the board. Instead it looks like a
> runtime configuration so it should not be part of DT bindings.

Why do you say that?

It is very much a feature of the 88PM860 device, where you can control
the irq clear operation (either read/write).

Thanks,
Vaibhav

> I understand that previously this was configured by platform data and
> now you want to move everything to DT. But this does not belong to
> DT...

That's not completely true.
I think DT is the right place for this configuration.

Thanks,
Vaibhav
Re: [PATCH V2 1/1] usb:serial:f81534 Add F81532/534 Driver
Hello Johan,

On 2015/6/15 09:54 AM, Peter Hung wrote:
> This driver is for Fintek F81532/F81534 USB to Serial Ports IC.
>
> Features:
> 1. F81534 is 1-to-4 & F81532 is 1-to-2 serial ports IC
> 2. Support Baudrate from B50 to B150 (excluding B100).
> 3. The RTS signal can be transformed their behavior with configuration
>    for transceiver (for RS232/RS485/RS422) (/sys/class/ttyUSBx/uart_mode)
> 4. There are 4x3 output-only GPIOs to control transceiver mode. It's
>    can be controlled via sysfs (/sys/class/ttyUSBx/gpio)
>

Did you receive my patch? Is there anything I should do to improve it?

--
With Best Regards,
Peter Hung
Re: [PATCH 2/2] arm: boot: store ATAG structure into DT atags field
* Arnd Bergmann [150515 13:23]:
> On Friday 15 May 2015 22:16:24 Pali Rohár wrote:
> > On Friday 15 May 2015 22:12:41 Arnd Bergmann wrote:
> > > On Friday 15 May 2015 21:50:07 Pali Rohár wrote:
> > > >                 }
> > > >         }
> > > >
> > > > +       /* include the terminating ATAG_NONE */
> > > > +       atag_size = (char *)atag - (char *)atag_list +
> > > > +                   sizeof(struct tag_header);
> > > > +       setprop(fdt, "/", "atags", atag_list, atag_size);
> > > > +
> > > >         if (memcount) {
> > > >                 setprop(fdt, "/memory", "reg", mem_reg_property,
> > > >                         4 * memcount * memsize);
> > >
> > > The property should probably have a DT binding, and be named
> > > "linux,atags".
> > >
> > > It may also help to check if the "linux,atags" property already
> > > exists and not create it otherwise. That way we can put it into the
> > > n900 dts file and have it updated by the compat code, but not expose
> > > the atags on other platforms unless they opt in.

Using "linux,atags" sounds good to me. And yes checking it with
getprop before doing setprop makes sense.

> > Maybe what would help: Is there a way to tell decompressor/kernel to not
> > touch atag memory and then after kernel/board-code starts it save copy
> > of atags? I think it is not possible right now, but correct me if I'm
> > wrong...
> >
>
> I don't think that is possible without an incompatible change to the
> boot protocol.

Agreed, let's keep the changes to minimum. Looks like with the comments
posted all the pending four patches from Pali become quite a minimal
set of three patches if we keep the rev string as hex.

Regards,

Tony
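The size computation in the quoted hunk (bytes from the start of the list up to and including the terminating ATAG_NONE header) can be sketched in standalone C. This is a rough model, not the decompressor code: atag_list_size() and its max_words bound are hypothetical, and the list is treated as a plain array of 32-bit words where each tag begins with a size (in words) and a tag id, with the terminator carrying size 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each tag starts with this two-word header; size counts 32-bit words. */
struct tag_header {
	uint32_t size;
	uint32_t tag;
};

/*
 * Hypothetical helper: walk an ATAG list of at most max_words words and
 * return its size in bytes, including the terminating header.
 * Returns 0 if the list is malformed or not terminated within the bound.
 */
static size_t atag_list_size(const uint32_t *list, size_t max_words)
{
	size_t off = 0;

	assert(list != NULL);
	while (max_words - off >= 2) {
		uint32_t size = list[off];

		if (size == 0)		/* terminating ATAG_NONE entry */
			return off * sizeof(uint32_t) + sizeof(struct tag_header);
		if (size < 2 || size > max_words - off)
			return 0;	/* cannot be a valid tag */
		off += size;
	}
	return 0;
}
```

The max_words bound is an addition of this sketch; the quoted patch relies on the list being well formed.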
Re: [PATCH 1/2] arm: devtree: Save atags if are in DT atags field
* Arnd Bergmann [150515 13:11]:
> On Friday 15 May 2015 21:50:06 Pali Rohár wrote:
> > @@ -256,5 +257,10 @@ const struct machine_desc * __init setup_machine_fdt(unsigned int dt_phys)
> >                 system_rev = 0;
> >         }
> >
> > +       /* Save atags */
> > +       prop = of_get_flat_dt_prop(dt_root, "atags", NULL);
> > +       if (prop)
> > +               save_atags((void *)prop);
> > +
> >         return mdesc;
> >
>
> How about checking whether this is actually running on the one board
> that needs it first?
>
> I'd rather not introduce something that may end up being considered
> an ABI on other machines.

It seems having this within CONFIG_ARM_ATAG_DTB_COMPAT should
be enough here.

Regards,

Tony
Re: [RESEND] [PATCH v2 1/2] arm: devtree: Set system_rev from DT revision
* Pali Rohár [150506 04:45]:
> On Wednesday 06 May 2015 13:04:01 Arnd Bergmann wrote:
> > >
> > > It needs to be done in this code, so "system_rev" variable is set
> > > properly...
> >
> > What I mean is which code accesses this variable that early?
> >
>
> ATAG code is doing it at same early stage, so I added it to same early
> stage...

Yes we should do this early like the other atags.

> > > > Also, it seems strange to have a string property and then use kstrtouint
> > > > to convert it into a number. I think it should either be specified in a DT
> > > > binding to be a string and then have the kernel not assume that it is a number,
> > > > or we should define it to be binary.
> > > >
> > > > 	Arnd
> > >
> > > Variable "system_rev" is number and it always was. So changing type will
> > > break more parts.
> > >
> > > And it is string DT property to be human readable. Some other developers
> > > suggested for v2 to change it to string (from number).
> >
> > Both of them would be human readable, you just use something else to
> > read them ;-)
> >
> > If we have a string here, we should just change all uses of system_rev
> > in the kernel accordingly, there are only a few of them:

Let's just keep it as a hex as it was. After all it's an existing
interface in /proc that user space programs may expect to be in hex
format already.

Pali, care to repost the whole set again right after -rc1 with rev
property naming and documentation added? Just keep it as hex and let's
forget any string conversion.

Regards,

Tony
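The two encodings debated above (a string property converted to a number, versus a binary cell) can be compared with a small userspace sketch. Both helpers are hypothetical: rev_from_string() uses strtoul() as a stand-in for the kernel's kstrtouint(), and rev_from_cell() assumes the usual big-endian layout of a 32-bit device tree cell.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a revision given as a hex string; returns 0 on success, -1 on error. */
static int rev_from_string(const char *s, uint32_t *rev)
{
	char *end;
	unsigned long v;

	assert(s != NULL && rev != NULL);
	errno = 0;
	v = strtoul(s, &end, 16);
	if (errno != 0 || end == s || *end != '\0' || v > UINT32_MAX)
		return -1;
	*rev = (uint32_t)v;
	return 0;
}

/* Decode a revision stored as one big-endian 32-bit cell. */
static uint32_t rev_from_cell(const uint8_t cell[4])
{
	assert(cell != NULL);
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}
```

Either way the kernel ends up with the same number in system_rev; the string form needs the extra validation step, while the cell form only needs a byte-order conversion.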
Re: [GIT PULL] ext4 changes for 4.2-rc1
Hi Linus, Here's my suggested merge resolution to deal with Al Viro's symlink changes. - Ted diff --cc fs/ext4/symlink.c index ba5bd18,68e915a..000 --- a/fs/ext4/symlink.c +++ b/fs/ext4/symlink.c @@@ -35,19 -34,20 +34,17 @@@ static const char *ext4_follow_link(str int res; u32 plen, max_size = inode->i_sb->s_blocksize; - ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize); - if (IS_ERR(ctx)) - return ERR_CAST(ctx); - if (!ext4_encrypted_inode(inode)) - return page_follow_link_light(dentry, nd); - + res = ext4_get_encryption_info(inode); + if (res) + return ERR_PTR(res); if (ext4_inode_is_fast_symlink(inode)) { caddr = (char *) EXT4_I(inode)->i_data; max_size = sizeof(EXT4_I(inode)->i_data); } else { cpage = read_mapping_page(inode->i_mapping, 0, NULL); - if (IS_ERR(cpage)) { - ext4_put_fname_crypto_ctx(); + if (IS_ERR(cpage)) - return cpage; + return ERR_CAST(cpage); - } caddr = kmap(cpage); caddr[size] = 0; } @@@ -77,14 -78,13 +75,12 @@@ /* Null-terminate the name */ if (res <= plen) paddr[res] = '\0'; - ext4_put_fname_crypto_ctx(); - nd_set_link(nd, paddr); if (cpage) { kunmap(cpage); page_cache_release(cpage); } - return NULL; + return *cookie = paddr; errout: - ext4_put_fname_crypto_ctx(); if (cpage) { kunmap(cpage); page_cache_release(cpage); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RESUBMIT PATCH 1/1] arm/hw_breakpoint.c: remove unnecessary header
Header is not needed for arm/hw_breakpoint.c, Removing the same.

Signed-off-by: Maninder Singh
Reviewed-by: Vaneet Narang
---
 arch/arm/kernel/hw_breakpoint.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc7d0a9..6284779 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -35,7 +35,6 @@
 #include
 #include
 #include
-#include
 #include
 
 /* Breakpoint currently in use for each BRP. */
--
1.7.1
lzo: check for length overrun in variable length encoding backport
Hi,

Could you backport this commit:
72cf90124e87d975d0b2114d930808c58b4c05e4 ("lzo: check for length overrun
in variable length encoding.") into stable kernels older than 3.18?

It should apply cleanly to anything that contains
8b975bd3f9089f8ee5d7bbfd798537b992bbc7e7 ("lib/lzo: Update LZO
compression to current upstream version") which goes as far as 3.9.

Thanks!
--
Florian
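As background for why a length-overrun check matters in a variable-length encoding, here is a standalone sketch of the general idea. It is not the code from commit 72cf9012: decode_run_length() is a hypothetical helper, and the scheme modelled (each zero byte adds 255, then a final non-zero byte plus a fixed base ends the run) is only an approximation of this style of encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoder: length = zeros * 255 + base + final byte.
 * Returns 0 on success, -1 if the input is truncated or the length
 * would overflow size_t.
 */
static int decode_run_length(const uint8_t *in, size_t in_len, size_t base,
			     size_t *out_len, size_t *consumed)
{
	size_t zeros = 0;

	assert(in != NULL && out_len != NULL && consumed != NULL);
	assert(base <= SIZE_MAX - 255);

	/* Count the zero bytes without reading past the input. */
	while (zeros < in_len && in[zeros] == 0)
		zeros++;
	if (zeros == in_len)
		return -1;	/* no terminating non-zero byte */

	/* Reject counts where zeros * 255 + base + 255 could wrap. */
	if (zeros > (SIZE_MAX - base - 255) / 255)
		return -1;

	*out_len = zeros * 255 + base + in[zeros];
	*consumed = zeros + 1;
	return 0;
}
```

Without the bound on the zero count, a long enough run of zero bytes could wrap the accumulated length and defeat later bounds checks that trust it.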
Re: [PATCH 2/2] staging : Comedi : comedi_fops : Fixed the return error code
On Wed, Jun 24, 2015 at 11:22:24PM +0530, Santosh wrote:
> try_module_get fails when the reference count of the module is not
> allowed to be incremented ,and hence -ENXIO is returned indicating
> no device or address.

1) This patch is 2/2, but where is your 1/2 patch?
2) You have used "santosh" in your email From: header. Use "santosh pai"
there. It should be the same as what you use in your Signed-off-by:

regards
sudip
Re: Use of pinctrl-single for external device over I2C
* Vaibhav Hiremath [150624 10:12]:
>
> I do not like this, as this is not HW feature, so DT may not be right
> approach.
>
> So I will dig more from either runtime or Compile time option to use
> regmap_ Vs raw read/writes.

Can't you just check if the pinctrl node has compatible = "syscon"
property?

A compile time option won't work for sure. I don't know what you would
check at runtime as you do not know what the bus is behind syscon.

Regards,

Tony
Re: [PATCH v2] staging: rtl8192u: bool tests don't need comparisons
On Wed, Jun 24, 2015 at 12:12:01PM +0200, Luis de Bethencourt wrote:
> On Wed, Jun 24, 2015 at 11:05:16AM +0530, Sudip Mukherjee wrote:
> > On Tue, Jun 23, 2015 at 03:10:56PM +0200, Luis de Bethencourt wrote:
>
> I based the patch on staging's master and not on the staging-next branch.

Use the staging-testing branch.

regards
sudip
Re: [GIT PULL] tracing: Have filter check for balanced ops
On Thu, 25 Jun 2015 00:03:02 -0400 Sasha Levin wrote: > On 06/17/2015 08:36 AM, Steven Rostedt wrote: > > Linus, > > > > Vince Weaver reported a warning when he added perf event filters > > into his fuzzer tests. There's a missing check of balanced > > operations when parenthesis are used, and this triggers a WARN_ON() > > and when reading the failure, the filter reports no failure occurred. > > Hey Steven, > > My fuzzings are hitting the warning added by this patch: Yes, Vince said he was able to hit it as well. But the warning itself is useless if you don't supply what filter was used to trigger it. -- Steve > > [2175114.187536] WARNING: CPU: 16 PID: 10388 at > kernel/trace/trace_events_filter.c:1388 replace_preds+0x814/0x2140() > [2175114.190213] Modules linked in: > [2175114.19] CPU: 16 PID: 10388 Comm: trinity-c48 Not tainted > 4.1.0-next-20150623-sasha-00039-ga1eb83a-dirty #2280 > [2175114.194463] 880a2335 6a8e22d4 880a2335f878 > abc8cfa3 > [2175114.196547] 880a2335f8c8 > a21ebd36 > [2175114.198604] 880e60fe09e0 a24608f4 880e61b14830 > 880e60fe09d8 > [2175114.200666] Call Trace: > [2175114.201377] [] dump_stack+0x4f/0x7b > [2175114.202793] [] warn_slowpath_common+0xc6/0x120 > [2175114.206235] [] warn_slowpath_null+0x1a/0x20 > [2175114.207819] [] replace_preds+0x814/0x2140 > [2175114.216433] [] create_filter+0x15a/0x210 > [2175114.231529] [] apply_event_filter+0x28b/0x780 > [2175114.241196] [] event_filter_write+0x106/0x1c0 > [2175114.242823] [] do_loop_readv_writev+0x128/0x1e0 > [2175114.248901] [] do_readv_writev+0x5ae/0x6c0 > [2175114.256760] [] vfs_writev+0x72/0xb0 > [2175114.258134] [] SyS_pwritev+0x1b4/0x220 > [2175114.261291] [] tracesys_phase2+0x88/0x8d > > > Thanks, > Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] f2fs updates for v4.2
On Wed, Jun 24, 2015 at 08:42:02PM -0700, Linus Torvalds wrote: > On Wed, Jun 24, 2015 at 1:25 PM, Jaegeuk Kim wrote: > > > > New features are: > > o per-file encryption (e.g., ext4) > > The new encrypted symlinks needed fixups for the changes that happened > meanwhile to the symlink handling. I did all that in my merge, and I > *think* I got it all right, but I would like you to check. In > particular, I hope you have a test-case and can actually give it a > whirl on that. > > Al added to cc, just in case he could also check my merge resolution > of fs/f2fs/namei.c (the merge is commit cfcc0ad47f4c, I'll push it out > after I've finished the filesystem pulls) FWIW, linux-next contains fixups for a bunch of such stuff, including f2fs one. The only difference between your resolution and Stephen's fixup is static const char *f2fs_encrypted_follow_link(struct dentry *dentry, void **cookie) vs. static const char *f2fs_encrypted_follow_link(struct dentry *dentry, void **cookie) Said that, f2fs_symlink() looks odd - we create a directory entry *before* doing page_symlink(). And if it (or encryption) fails, I don't see anything that would remove that new directory entry. What are we ending up with in such case? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][v2] asus-rbtn: new driver for asus radio button for Windows 8
On Wed, Jun 24, 2015 at 10:57:51AM +0800, Alex Hung wrote: > ASUS introduced a new approach to handle wireless hotkey > since Windows 8. When the hotkey is pressed, BIOS generates > a notification 0x88 to a new ACPI device, ATK4001. This > new driver not only translates the notification to KEY_RFKILL > but also toggles its LED accordingly. > > Signed-off-by: Alex Hung > --- > MAINTAINERS | 6 + > drivers/platform/x86/Kconfig | 11 ++ > drivers/platform/x86/Makefile| 1 + > drivers/platform/x86/asus-rbtn.c | 240 > +++ > 4 files changed, 258 insertions(+) > create mode 100644 drivers/platform/x86/asus-rbtn.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index d8afd29..03711ce 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1673,6 +1673,12 @@ S: Maintained > F: drivers/platform/x86/asus*.c > F: drivers/platform/x86/eeepc*.c > > +ASUS RADIO BUTTON DRIVER > +M: Alex Hung > +L: platform-driver-...@vger.kernel.org > +S: Maintained > +F: drivers/platform/x86/asus-rbtn.c > + > ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API > R: Dan Williams > W: http://sourceforge.net/projects/xscaleiop > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > index f9f205c..a8ac885 100644 > --- a/drivers/platform/x86/Kconfig > +++ b/drivers/platform/x86/Kconfig > @@ -516,6 +516,17 @@ config EEEPC_LAPTOP > If you have an Eee PC laptop, say Y or M here. If this driver > doesn't work on your Eee PC, try eeepc-wmi instead. > > +config ASUS_RBTN > + tristate "ASUS radio button" > + depends on ACPI > + depends on INPUT > + help > + This driver provides supports for new ASUS radio button for Windows 8. s/supports/support/ Also, avoid using "new" in the Kconfig as this lives forever, in 10 years, it won't be so new :-) Consider: "This driver supports the ASUS radio button for Windows 8." (And maybe fix the entry for HP_WIRELESS while you're at it in a separate patch) ... 
> +static int asus_rbtn_input_setup(void)
> +{
> +	int err;
> +
> +	asusrb_input_dev = input_allocate_device();
> +	if (!asusrb_input_dev)
> +		return -ENOMEM;
> +
> +	asusrb_input_dev->name = "ASUS radio hotkeys";
> +	asusrb_input_dev->phys = "atk4001/input0";
> +	asusrb_input_dev->id.bustype = BUS_HOST;
> +	asusrb_input_dev->evbit[0] = BIT(EV_KEY);
> +	set_bit(KEY_RFKILL, asusrb_input_dev->keybit);
> +
> +	err = input_register_device(asusrb_input_dev);
> +	if (err)
> +		goto err_free_dev;
> +
> +	return 0;
> +
> +err_free_dev:
> +	input_free_device(asusrb_input_dev);
> +	return err;

I missed this on the first round. There is no need for a goto here at all:

	int ret;
	...
	ret = input_register_device(asusrb_input_dev);
	if (ret)
		input_free_device(asusrb_input_dev);

	return ret;

Much nicer IMHO.

Do you have a strong preference for err over ret? In most cases in this
driver, ret would be the more typical choice in my experience.

I suppose this is modeled after hp-wireless which has the same error path
in hp_wireless_input_setup I mentioned above and uses err throughout -
consistency is a good thing. I won't argue over the ret/err thing as there
is precedent in this subsystem for similar drivers.

--
Darren Hart
Intel Open Source Technology Center
Re: [GIT PULL] tracing: Have filter check for balanced ops
On 06/17/2015 08:36 AM, Steven Rostedt wrote: > Linus, > > Vince Weaver reported a warning when he added perf event filters > into his fuzzer tests. There's a missing check of balanced > operations when parenthesis are used, and this triggers a WARN_ON() > and when reading the failure, the filter reports no failure occurred. Hey Steven, My fuzzings are hitting the warning added by this patch: [2175114.187536] WARNING: CPU: 16 PID: 10388 at kernel/trace/trace_events_filter.c:1388 replace_preds+0x814/0x2140() [2175114.190213] Modules linked in: [2175114.19] CPU: 16 PID: 10388 Comm: trinity-c48 Not tainted 4.1.0-next-20150623-sasha-00039-ga1eb83a-dirty #2280 [2175114.194463] 880a2335 6a8e22d4 880a2335f878 abc8cfa3 [2175114.196547] 880a2335f8c8 a21ebd36 [2175114.198604] 880e60fe09e0 a24608f4 880e61b14830 880e60fe09d8 [2175114.200666] Call Trace: [2175114.201377] [] dump_stack+0x4f/0x7b [2175114.202793] [] warn_slowpath_common+0xc6/0x120 [2175114.206235] [] warn_slowpath_null+0x1a/0x20 [2175114.207819] [] replace_preds+0x814/0x2140 [2175114.216433] [] create_filter+0x15a/0x210 [2175114.231529] [] apply_event_filter+0x28b/0x780 [2175114.241196] [] event_filter_write+0x106/0x1c0 [2175114.242823] [] do_loop_readv_writev+0x128/0x1e0 [2175114.248901] [] do_readv_writev+0x5ae/0x6c0 [2175114.256760] [] vfs_writev+0x72/0xb0 [2175114.258134] [] SyS_pwritev+0x1b4/0x220 [2175114.261291] [] tracesys_phase2+0x88/0x8d Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] net/fsl: remove dependency FSL_SOC for Gianfar
CONFIG_GIANFAR is not depended on FSL_SOC, it can be built on non-PPC platforms. Signed-off-by: Alison Wang --- drivers/net/ethernet/freescale/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig index b8de87b..ff76d4e 100644 --- a/drivers/net/ethernet/freescale/Kconfig +++ b/drivers/net/ethernet/freescale/Kconfig @@ -83,12 +83,12 @@ config UGETH_TX_ON_DEMAND config GIANFAR tristate "Gianfar Ethernet" - depends on FSL_SOC select FSL_PQ_MDIO select PHYLIB select CRC32 ---help--- This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx, - and MPC86xx family of chips, and the FEC on the 8540. + and MPC86xx family of chips, the eTSEC on LS1021A and the FEC + on the 8540. endif # NET_VENDOR_FREESCALE -- 2.1.0.27.g96db324 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] ext4 changes for 4.2-rc1
The following changes since commit e26081808edadfd257c6c9d81014e3b25e9a6118: Linux 4.1-rc4 (2015-05-18 10:13:47 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus for you to fetch changes up to a2fd66d069d86d793e9d39d4079b96f46d13f237: ext4: set lazytime on remount if MS_LAZYTIME is set by mount (2015-06-23 11:03:54 -0400) A very large number of cleanups and bug fixes --- in particular for the ext4 encryption patches, which is a new feature added in the last merge window. Also fix a number of long-standing xfstest failures. (Quota writes failing due to ENOSPC, a race between truncate and writepage in data=journalled mode that was causing generic/068 to fail, and other corner cases.) Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2 performance eliminating locking when a buffer is modified more than once during a transaction (which is very common for allocation bitmaps, for example), in which case the state of the journalled buffer head doesn't need to change. Andreas Dilger (1): ext4: improve warning directory handling messages Chao Yu (1): ext4 crypto: release crypto resource on module exit Darrick J. 
Wong (1): ext4: don't retry file block mapping on bigalloc fs with non-extent file David Moore (1): ext4: BUG_ON assertion repeated for inode1, not done for inode2 Dmitry Monakhov (1): jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() Eric Whitney (2): ext4: minor cleanup of ext4_da_reserve_space() ext4: make online defrag error reporting consistent Fabian Frederick (3): ext4 crypto: fix sparse warnings in fs/ext4/ioctl.c ext4: use swap() in memswap() ext4: use swap() in mext_page_double_lock() Jan Kara (5): jbd2: simplify code flow in do_get_write_access() jbd2: simplify error path on allocation failure in do_get_write_access() jbd2: more simplifications in do_get_write_access() jbd2: speedup jbd2_journal_get_[write|undo]_access() jbd2: speedup jbd2_journal_dirty_metadata() Josef Bacik (1): ext4: only call ext4_truncate when size <= isize Joseph Qi (1): jbd2: fix ocfs2 corrupt when updating journal superblock fails Lukas Czerner (5): ext4: verify block bitmap even after fresh initialization ext4: try to initialize all groups we can in case of failure on ppc64 ext4: return error code from ext4_mb_good_group() ext4: recalculate journal credits as inode depth changes ext4: wait for existing dio workers in ext4_alloc_file_blocks() Michal Hocko (2): jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL jbd2: get rid of open coded allocation retry loop Namjae Jeon (1): ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate Rasmus Villemoes (1): ext4: mballoc: avoid 20-argument function call Theodore Ts'o (26): ext4 crypto: optimize filename encryption ext4 crypto: don't allocate a page when encrypting/decrypting file names ext4 crypto: separate kernel and userspace structure for the key ext4 crypto: reorganize how we store keys in the inode ext4: clean up superblock encryption mode fields ext4 crypto: use slab caches ext4 crypto: get rid of ci_mode from struct ext4_crypt_info ext4 crypto: shrink size of the ext4_crypto_ctx structure ext4 crypto: require 
CONFIG_CRYPTO_CTR if ext4 encryption is enabled ext4 crypto: use per-inode tfm structure ext4 crypto: fix memory leaks in ext4_encrypted_zeroout ext4 crypto: set up encryption info for new inodes in ext4_inherit_context() ext4 crypto: make sure the encryption info is initialized on opendir(2) ext4 crypto: encrypt tmpfile located in encryption protected directory ext4 crypto: enforce crypto policy restrictions on cross-renames ext4 crypto: policies may only be set on directories ext4 crypto: clean up error handling in ext4_fname_setup_filename ext4 crypto: allocate the right amount of memory for the on-disk symlink ext4 crypto: handle unexpected lack of encryption keys ext4 crypto: allocate bounce pages using GFP_NOWAIT ext4 crypto: fix ext4_get_crypto_ctx()'s calling convention in ext4_decrypt_one ext4 crypto: fail the mount if blocksize != pagesize ext4: fix race between truncate and __ext4_journalled_writepage() ext4: call sync_blockdev() before invalidate_bdev() in put_super() ext4: prevent ext4_quota_write() from failing due to ENOSPC ext4: set lazytime on remount if MS_LAZYTIME is set by mount fs/ext4/Kconfig | 1 + fs/ext4/balloc.c| 4 +- fs/ext4/crypto.c| 211 +-- fs/ext4/crypto_fname.c | 490
Re: [GIT PULL] f2fs updates for v4.2
On Wed, Jun 24, 2015 at 1:25 PM, Jaegeuk Kim wrote: > > New features are: > o per-file encryption (e.g., ext4) The new encrypted symlinks needed fixups for the changes that happened meanwhile to the symlink handling. I did all that in my merge, and I *think* I got it all right, but I would like you to check. In particular, I hope you have a test-case and can actually give it a whirl on that. Al added to cc, just in case he could also check my merge resolution of fs/f2fs/namei.c (the merge is commit cfcc0ad47f4c, I'll push it out after I've finished the filesystem pulls) Thanks, Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] trace-cmd: add option to group like comms for profile
On Thu, 21 May 2015 13:30:08 -0400
Josef Bacik wrote:

> +static void merge_tasks(struct handle_data *h)
> +{
> +	struct trace_hash_item **bucket;
> +	struct trace_hash_item *item;
> +
> +	if (!merge_like_comms)
> +		return;
> +
> +	trace_hash_for_each_bucket(bucket, &h->task_hash) {
> +		trace_hash_for_each_item(item, bucket)
> +			add_group(h, task_from_item(item));
> +	}
> +}
> +
>  int trace_profile(void)
>  {
>  	struct handle_data *h;
>
>  	for (h = handles; h; h = h->next) {
> +		if (merge_like_comms)
> +			merge_tasks(h);

I don't think we need the double check. Here you only call merge_tasks()
if merge_like_comms is set, but then the first thing you do in
merge_tasks() is to return if merge_like_comms is not set.

One check is enough.

-- Steve

>  		output_handle(h);
>  		trace_hash_free(&h->task_hash);
>  	}
Re: [RFC][PATCH] sched: might_sleep(): do rate-limiting before sanity checks
On 06/24/2015 05:03 PM, Dave Hansen wrote:
> In any case, we ratelimit might_sleep() checks anyway. But, we
> do the ratelimiting *after* we check the other conditions for
> might_sleep() including the (costly) irqs_disabled() call.

Thinking about this a bit more, this patch is wrong. This only does a
_check_ once per jiffy instead of just one warning per jiffy, which is
totally bogus.

I would be interested, though, if anybody has any ideas about speeding
up the irqs_disabled() checking.
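The difference between "one check per jiffy" and "one warning per jiffy"
can be illustrated with a small user-space sketch. Everything in it is a
hypothetical stand-in (fake_jiffies, ratelimited(), the two function
names, a one-tick limit); it is not the kernel's might_sleep() code, only
a model of the two orderings discussed above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for jiffies and the limiter state. */
static unsigned long fake_jiffies;
static unsigned long prev_tick;
static bool have_prev;
static int warnings;

/* Returns true when something already passed the limiter in this tick. */
static bool ratelimited(void)
{
	if (have_prev && fake_jiffies == prev_tick)
		return true;
	prev_tick = fake_jiffies;
	have_prev = true;
	return false;
}

/* Check first: every call is examined, only the warning is limited. */
static void check_then_limit(bool bad_context)
{
	if (!bad_context)
		return;
	if (ratelimited())
		return;
	warnings++;
}

/*
 * Limit first: a harmless call uses up the tick's budget, so a later
 * call from a bad context in the same tick is never examined.
 */
static void limit_then_check(bool bad_context)
{
	if (ratelimited())
		return;
	if (!bad_context)
		return;
	warnings++;
}

/* Helper for experiments: forget all limiter state. */
static void reset_state(void)
{
	fake_jiffies = 0;
	prev_tick = 0;
	have_prev = false;
	warnings = 0;
}
```

With a harmless call followed by a bad one in the same tick, the first
ordering still warns, while the second stays silent. That is the sense
in which limiting first loses coverage rather than just log noise.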
Re: [PATCH] tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode
On 06/25/2015 05:36 AM, Rafael J. Wysocki wrote: > On Thu, Jun 25, 2015 at 12:06 AM, Benjamin Herrenschmidt > wrote: >> On Wed, 2015-06-24 at 15:50 +0200, Rafael J. Wysocki wrote: >>> 4.2 material I suppose? >> >> And stable. Without this, if you configure without TICK_ONESHOT, the >> machine will hang. > > OK, which -stable? All of them or any specific series? This needs to go into stable/linux-3.19.y, stable/linux-4.0.y, stable/linux-4.1.y. Thanks Regards Preeti U Murthy > > Rafael > ___ > Linuxppc-dev mailing list > linuxppc-...@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] trace-cmd: add option to group like comms for profile
On Thu, 21 May 2015 13:30:08 -0400
Josef Bacik wrote:

> +static int compare_groups(const void *a, const void *b)
> +{
> +	const char *A = a;
> +	const char *B = b;
> +
> +	return strcmp(A, B);

a and b are not strings here. They are group_data pointers. I think what
you want is this:

static int compare_groups(const void *a, const void *b)
{
	struct group_data * const *A = a;
	struct group_data * const *B = b;

	return strcmp((*A)->comm, (*B)->comm);
}

-- Steve

> +}
> +
>
> +static void output_groups(struct handle_data *h)
> +{
> +	struct trace_hash_item **bucket;
> +	struct trace_hash_item *item;
> +	struct group_data **groups;
> +	int nr_groups = 0;
> +	int i;
> +
> +	trace_hash_for_each_bucket(bucket, &h->group_hash) {
> +		trace_hash_for_each_item(item, bucket) {
> +			nr_groups++;
> +		}
> +	}
> +
> +	if (nr_groups == 0)
> +		return;
> +
> +	groups = malloc_or_die(sizeof(*groups) * nr_groups);
> +
> +	nr_groups = 0;
> +
> +	trace_hash_for_each_bucket(bucket, &h->group_hash) {
> +		trace_hash_while_item(item, bucket) {
> +			groups[nr_groups++] = group_from_item(item);
> +			trace_hash_del(item);
> +		}
> +	}
> +
> +	qsort(groups, nr_groups, sizeof(*groups), compare_groups);
> +
> +	for (i = 0; i < nr_groups; i++) {
> +		output_group(h, groups[i]);
> +		free_group(groups[i]);
> +	}
> +
> +	free(groups);
> +}
> +
>
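For reference, a minimal stand-alone sketch of why the comparator needs
the extra level of indirection. qsort() passes the comparator pointers
to the array elements, and here the elements are themselves pointers.
The struct below is a hypothetical cut-down version of group_data with
only a comm field, and sort_groups() is an illustrative helper, not a
trace-cmd function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down group_data: only the field the sort needs. */
struct group_data {
	const char *comm;
};

/*
 * The array being sorted holds struct group_data pointers, so each
 * argument is a pointer to one of those pointers.
 */
static int compare_groups(const void *a, const void *b)
{
	struct group_data * const *A = a;
	struct group_data * const *B = b;

	return strcmp((*A)->comm, (*B)->comm);
}

/* Illustrative helper: sort an array of group pointers by comm. */
static void sort_groups(struct group_data **groups, size_t nr_groups)
{
	qsort(groups, nr_groups, sizeof(*groups), compare_groups);
}
```

Casting the arguments straight to const char * would make strcmp() read
the bytes of the pointer values in the array instead of the comm
strings.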
[PATCH] kexec: Make a pair of reserved pages when kdump fails to start
From: Minfei Huang For some arch, kexec shall map the reserved pages, then use them, when we try to start the kdump service. Now kexec will never unmap the reserved pages, once it fails to continue starting the kdump service. Make a pair of reserved pages in kdump starting path, whatever kexec fails or not. Signed-off-by: Minfei Huang --- kernel/kexec.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 7a36fdc..ab32d59 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1308,19 +1308,23 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, image->preserve_context = 1; result = machine_kexec_prepare(image); if (result) - goto out; + goto failure; for (i = 0; i < nr_segments; i++) { result = kimage_load_segment(image, >segment[i]); if (result) - goto out; + goto failure; } kimage_terminate(image); + +failure: if (flags & KEXEC_ON_CRASH) crash_unmap_reserved_pages(); } - /* Install the new kernel, and Uninstall the old */ - image = xchg(dest_image, image); + + if (result == 0) + /* Install the new kernel, and Uninstall the old */ + image = xchg(dest_image, image); out: mutex_unlock(_mutex); -- 2.2.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] dell-laptop: Fix allocating & freeing SMI buffer page
On Tue, Jun 23, 2015 at 10:11:19AM +0200, Pali Rohár wrote:
> This commit fix kernel crash when probing for rfkill devices in dell-laptop
> driver failed. Function free_page() was incorrectly used on struct page *
> instead of virtual address of SMI buffer.
>
> This commit also simplify allocating page for SMI buffer by using
> __get_free_page() function instead of sequential call of functions
> alloc_page() and page_address().
>
> Signed-off-by: Pali Rohár
> Acked-by: Michal Hocko
> Cc: sta...@vger.kernel.org

Queued, thanks Pali.

--
Darren Hart
Intel Open Source Technology Center
Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
On Wed, Jun 24, 2015 at 07:58:30PM +0200, Peter Zijlstra wrote: > On Wed, Jun 24, 2015 at 10:10:17AM -0700, Paul E. McKenney wrote: > > > The thing is, once you start bailing on this condition your 'queue' > > > drains very fast and this is around the same time sync_rcu() would've > > > released the waiters too. > > > > In my experience, this sort of thing simply melts down on large systems. > > I am reworking this with multiple locks so as to keep the large-system > > contention down to a dull roar. > > So with the MCS queue we're got less global trashing than you had with > the start/done tickets. Only the queue head on enqueue. Here is what I had in mind, where you don't have any global trashing except when the ->expedited_sequence gets updated. Passes mild rcutorture testing. Still needs asynchronous CPU stoppage and stall warnings and trace documentation updates. Plus fixes for whatever bugs show up. Thanx, Paul diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 78d0a87ff354..887370b7e52a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -70,6 +70,7 @@ MODULE_ALIAS("rcutree"); static struct lock_class_key rcu_node_class[RCU_NUM_LVLS]; static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS]; +static struct lock_class_key rcu_exp_class[RCU_NUM_LVLS]; /* * In order to export the rcu_state name to the tracing tools, it @@ -3323,6 +3324,22 @@ static int synchronize_sched_expedited_cpu_stop(void *data) return 0; } +/* Common code for synchronize_sched_expedited() work-done checking. */ +static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp, + atomic_long_t *stat, unsigned long s) +{ + if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) { + if (rnp) + mutex_unlock(>exp_funnel_mutex); + /* Ensure test happens before caller kfree(). 
*/ + smp_mb__before_atomic(); /* ^^^ */ + atomic_long_inc(stat); + put_online_cpus(); + return true; + } + return false; +} + /** * synchronize_sched_expedited - Brute-force RCU-sched grace period * @@ -3334,58 +3351,24 @@ static int synchronize_sched_expedited_cpu_stop(void *data) * restructure your code to batch your updates, and then use a single * synchronize_sched() instead. * - * This implementation can be thought of as an application of ticket - * locking to RCU, with sync_sched_expedited_started and - * sync_sched_expedited_done taking on the roles of the halves - * of the ticket-lock word. Each task atomically increments - * sync_sched_expedited_started upon entry, snapshotting the old value, - * then attempts to stop all the CPUs. If this succeeds, then each - * CPU will have executed a context switch, resulting in an RCU-sched - * grace period. We are then done, so we use atomic_cmpxchg() to - * update sync_sched_expedited_done to match our snapshot -- but - * only if someone else has not already advanced past our snapshot. - * - * On the other hand, if try_stop_cpus() fails, we check the value - * of sync_sched_expedited_done. If it has advanced past our - * initial snapshot, then someone else must have forced a grace period - * some time after we took our snapshot. In this case, our work is - * done for us, and we can simply return. Otherwise, we try again, - * but keep our initial snapshot for purposes of checking for someone - * doing our work for us. - * - * If we fail too many times in a row, we fall back to synchronize_sched(). + * This implementation can be thought of as an application of sequence + * locking to expedited grace periods, but using the sequence counter to + * determine when someone else has already done the work instead of for + * retrying readers. 
*/ void synchronize_sched_expedited(void) { - cpumask_var_t cm; - bool cma = false; int cpu; - long firstsnap, s, snap; - int trycount = 0; + long s; struct rcu_state *rsp = _sched_state; + struct rcu_node *rnp0; + struct rcu_node *rnp1 = NULL; - /* -* If we are in danger of counter wrap, just do synchronize_sched(). -* By allowing sync_sched_expedited_started to advance no more than -* ULONG_MAX/8 ahead of sync_sched_expedited_done, we are ensuring -* that more than 3.5 billion CPUs would be required to force a -* counter wrap on a 32-bit system. Quite a few more CPUs would of -* course be required on a 64-bit system. -*/ - if (ULONG_CMP_GE((ulong)atomic_long_read(>expedited_start), -(ulong)atomic_long_read(>expedited_done) + -ULONG_MAX / 8)) { - wait_rcu_gp(call_rcu_sched); - atomic_long_inc(>expedited_wrap); - return; -
Re: [PATCH V2] trace-cmd: add option to group like comms for profile
On Thu, 21 May 2015 13:30:08 -0400
Josef Bacik wrote:

> V1->V2:
> -renamed the option to --by-comm, added it to trace-cmd report --profile as
> well
> -fixed up the string hash

Or break it ;-)

> -changed it to merge all events after the fact so it's less error prone

> diff --git a/trace-hash-local.h b/trace-hash-local.h
> index 997b11c..eaeeaaf 100644
> --- a/trace-hash-local.h
> +++ b/trace-hash-local.h
> @@ -48,4 +48,13 @@ static inline unsigned int trace_hash(int val)
>  	return hash;
>  }
>
> +static inline unsigned int trace_hash_str(char *str)
> +{
> +	int val = 0;
> +	int i;
> +
> +	for (i = 0; str[i]; i++)
> +		val += ((int)str[i]) << (i & 0xff);
> +	return trace_hash(val);
> +}
>

I need to clean out my medicine cabinet and remove all the expired meds.
Because I was definitely taking something nasty when I recommended
(i & 0xff)!

When i is greater than 32 (which is less than 0xff) it will overflow the
addition. What I wanted was that we don't shift more than 24 bits. Where
2 ** 24 - 1 == 0xffffff. That should be:

	val += ((int)str[i]) << (i % 25);

my bad :-/

To avoid the slow '%', I'll just use '& 0xf', as shifting it 16 times is
enough for this algorithm.

No need to send a new patch, I'll fix it, as it was my brain fart.

I'll review the rest of this patch too and apply it if nothing sticks out.

Thanks!

-- Steve
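A minimal sketch of the '& 0xf' variant mentioned in the reply above.
The names shift_for(), mix_hash() and hash_str() are hypothetical;
mix_hash() is only a placeholder integer mixer, not trace-cmd's
trace_hash(). The sketch also accumulates in unsigned int, which is an
assumption made here to keep the arithmetic well defined, not something
stated in the thread.

```c
#include <stddef.h>

/* Shift count for character i: masked so it stays in 0..15. */
static inline unsigned int shift_for(size_t i)
{
	return (unsigned int)(i & 0xf);
}

/* Placeholder mixer standing in for trace_hash(); not the real one. */
static inline unsigned int mix_hash(unsigned int val)
{
	val ^= val >> 16;
	val *= 0x45d9f3bU;
	val ^= val >> 16;
	return val;
}

/* Fold a string into an integer, then mix it. */
static inline unsigned int hash_str(const char *str)
{
	unsigned int val = 0;
	size_t i;

	for (i = 0; str[i]; i++)
		val += ((unsigned int)(unsigned char)str[i]) << shift_for(i);
	return mix_hash(val);
}
```

Because the shift count never exceeds 15, a long comm string cannot push
it past the width of the integer, which was the problem with the wider
mask.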
Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests
2015-06-25 1:24 GMT+09:00 Ming Lei : > On Wed, Jun 24, 2015 at 10:34 PM, Akinobu Mita wrote: >> Hi Ming, >> >> 2015-06-24 18:46 GMT+09:00 Ming Lei : >>> On Sun, Jun 21, 2015 at 9:52 PM, Akinobu Mita >>> wrote: ctx->index_hw is zero for the CPUs which have never been onlined since the block queue was initialized. If one of those CPUs is hotadded and starts handling request before new mappings are established, pending >>> >>> Could you explain a bit what the handling request is? The fact is that >>> blk_mq_queue_reinit() is run after all queues are put into freezing. >> >> Notifier callbacks for CPU_ONLINE action can be run on the other CPU >> than the CPU which was just onlined. So it is possible for the >> process running on the just onlined CPU to insert request and run >> hw queue before blk_mq_queue_reinit_notify() is actually called with >> action=CPU_ONLINE. > > You are right because blk_mq_queue_reinit_notify() is alwasy run after > the CPU becomes UP, so there is a tiny window in which the CPU is up > but the mapping is updated. Per current design, the CPU just onlined > is still mapped to hw queue 0 until the mapping is updated by > blk_mq_queue_reinit_notify(). > > But I am wondering why it is a problem and why you think flush_busy_ctxs > can't find the requests on the software queue in this situation? The problem happens when the CPU has just been onlined first time since the request queue was initialized. At this time ctx->index_hw for the CPU is still zero before blk_mq_queue_reinit_notify is called. The request can be inserted to ctx->rq_list, but blk_mq_hctx_mark_pending() marks busy for wrong bit position as ctx->index_hw is zero. flush_busy_ctxs() only retrieves the requests from software queues which are marked busy. So the request just inserted is ignored as the corresponding bit position is not busy. 
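The failure described above can be modelled in a few lines of user-space
C. This is a heavily simplified sketch: struct sw_ctx, struct hw_ctx and
all four functions are hypothetical stand-ins for the blk-mq structures
and helpers named in the thread, kept only detailed enough to show a
stale index_hw of zero marking the wrong busy bit.

```c
#include <stddef.h>

#define NR_CTX 4

/* Hypothetical stand-in for a per-CPU software queue. */
struct sw_ctx {
	unsigned int index_hw;	/* slot in hw_ctx.ctxs[], 0 until mapped */
	int queued;		/* requests waiting on this queue */
};

/* Hypothetical stand-in for a hardware queue context. */
struct hw_ctx {
	unsigned long pending;	/* one busy bit per mapped slot */
	struct sw_ctx *ctxs[NR_CTX];
	unsigned int nr_ctx;
};

/* Establish the mapping: give the software queue its own slot. */
static void map_ctx(struct hw_ctx *h, struct sw_ctx *c)
{
	c->index_hw = h->nr_ctx;
	h->ctxs[h->nr_ctx++] = c;
}

/* Queue a request and mark the slot named by index_hw as busy. */
static void insert_request(struct hw_ctx *h, struct sw_ctx *c)
{
	c->queued++;
	h->pending |= 1UL << c->index_hw;
}

/* Collect requests only from slots whose busy bit is set. */
static int flush_busy(struct hw_ctx *h)
{
	int flushed = 0;
	unsigned int i;

	for (i = 0; i < h->nr_ctx; i++) {
		if (!(h->pending & (1UL << i)))
			continue;
		flushed += h->ctxs[i]->queued;
		h->ctxs[i]->queued = 0;
		h->pending &= ~(1UL << i);
	}
	return flushed;
}
```

If a request is inserted on a queue that has not been through map_ctx(),
bit 0 is set on behalf of whichever queue owns slot 0, and the flush
never visits the queue that actually holds the request. Once the mapping
exists, the same insert is picked up normally.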
Re: [RFC PATCH 0/3] restartable sequences: fast user-space percpu critical sections
On Wed, Jun 24, 2015 at 5:07 PM, Andy Lutomirski wrote:
> On Wed, Jun 24, 2015 at 3:26 PM, Paul Turner wrote:
>> This is a fairly small series demonstrating a feature we've found to be quite
>> powerful in practice, "restartable sequences".
>>
>
> On an extremely short glance, I'm starting to think that the right
> approach, at least for x86, is to implement per-cpu gsbase. Then you
> could do cmpxchg with a gs prefix to atomically take a percpu lock and
> atomically release a percpu lock and check whether someone else stole
> the lock from you. (Note: cmpxchg, unlike lock cmpxchg, is very
> fast.)
>
> This is totally useless for other architectures, but I think it would
> be reasonably clean on x86. Thoughts?

So this gives semantics that are obviously similar to this_cpu(). This allows reasonable per-cpu counters (which alone is almost sufficient for a strong user-space RCU implementation, giving this some legs).

However, unless there's a nice implementation trick I'm missing, the thing that stands out to me for locks (or other primitives) is that this forces a two-phase commit. There's no way (short of, say, cmpxchg16b) to perform a write conditional on the lock not having been stolen from us (and subsequently release the lock).

e.g.

1) We take the operation in some sort of speculative mode, in which another thread on the same cpu is still allowed to steal from us
2) We prepare what we want to commit
3) At this point we have to promote the lock taken in (1) to perform our actual commit, or see that someone else has stolen (1)
4) Release the promoted lock in (3)

However, this means that if we're preempted at (3) then no other thread on that cpu can make progress until we've been rescheduled and released the lock; a nice property of the model we have today is that threads sharing a cpu cannot impede each other beyond what the scheduler allows.

A lesser concern, but worth mentioning, is that there are also potential pitfalls in the interaction with signal handlers, particularly if a 2-phase commit is used.

- Paul
Re: [PATCH] trace-cmd: add a kernel memory leak detector
On Tue, 23 Jun 2015 16:06:39 -0700 Josef Bacik wrote:
> I needed to track down a very slow memory leak so I adapted the same approach
> trace-cmd profile uses to track kernel memory allocations. You run this with
>
> trace-cmd kmemleak

Note, I'm still playing with this.

>
> and then you can kill -SIGUSR2 to get current status updates, or
> just stop the process when you are ready. It will tell you how much was lost
> and the size of the objects that were allocated, along with the tracebacks and
> the counts of the allocators. Thanks,
>
> Signed-off-by: Josef Bacik

> diff --git a/trace-kmemleak.c b/trace-kmemleak.c
> new file mode 100644
> index 000..2e288fe
> --- /dev/null
> +++ b/trace-kmemleak.c

Please add a copyright notice here. You can add your name as author but more importantly, please state what license this is under. See trace-record.c for details.

> @@ -0,0 +1,552 @@
> +#define _LARGEFILE64_SOURCE
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "trace-local.h"
> +#include "trace-hash.h"
> +#include "list.h"
> +
> +#define memory_from_item(item) container_of(item, struct memory, hash)
> +#define memory_from_phash(item) container_of(item, struct memory, phash)
> +#define leak_from_item(item) container_of(item, struct memory_leak, hash)
> +#define edata_from_item(item) container_of(item, struct event_data, hash)
> +#define stack_from_item(item) container_of(item, struct stack_trace, hash)
> +

> --- a/trace-record.c
> +++ b/trace-record.c
> @@ -3703,6 +3721,7 @@ static void add_hook(struct buffer_instance *instance, const char *arg)
>  }
>
>  enum {
> +	OPT_kmemleak	= 249,
>  	OPT_bycomm	= 250,

Ug, I realized I never applied your bycomm patch. I'll need to look at that now too.

Egad, I've been putting off trace-cmd for too long. I need to start getting back to it!

-- Steve

>  	OPT_stderr	= 251,
>  	OPT_profile	= 252,
> @@ -3738,7 +3757,7 @@ void trace_record (int argc, char **argv)
Re: [PATCH 6/6] mtd: docg3: Don't do ERR_PTR(0)
Hi Robert,

On Tue, Jun 23, 2015 at 10:41:33PM +0200, Robert Jarzmik wrote:
> Richard Weinberger writes:
>
> > On 17.06.2015 at 20:41, Brian Norris wrote:
> >> Have you tested this patch?
> >
> > nah, I don't own such a device.
> But I do. If you resend a patch, please Cc me. You can even ask for a test
> from time to time if you want a confirmation, I have a 2 floors docg3 device.

Do you want to be on a MAINTAINERS entry for this driver?

Brian
Re: kdbus: to merge or not to merge?
On Wed, Jun 24, 2015 at 7:14 PM, Steven Rostedt wrote:
>
> I don't think it will complicate things even if the API changes. The distros
> will have to deal with that fall out. Mainline only cares about its own
> regressions. But any API changes would only be done for good reasons, and give
> the distros an excuse to fix whatever was done wrong in the first place.

I don't think that's true.

Realistically, every single kernel developer tends to work on a machine with some random distro. If that developer cannot compile his own kernel because his distro stops working, or has to use some "kdbus=0" switch to turn off the kernel kdbus and (hopefully) the distro just switches to the legacy user mode bus, then for that developer, merging and enabling an incompatible kdbus implementation is basically a regression.

We've seen this before. We end up stuck with the ABI of whatever user land applications. It doesn't matter where that ABI came from.

I do agree that distros that want to enable kdbus before any agreed version has been merged would get to also act as guinea pigs and do their own QA, and handle fallout from whatever problems they encounter etc. That part might be good. But I don't think we really end up having the option to make up some incompatible kdbus ABI after-the-fact.

Linus
Re: kdbus: to merge or not to merge?
On Tue, Jun 23, 2015 at 08:07:41AM -0700, Andy Lutomirski wrote:
>
> FWIW, once there are real distros with kdbus userspace enabled,
> reviewing kdbus gets more complicated -- we'll be in the position
> where merging kdbus in a different form from that which was proposed
> will break existing users.

Actually, I think distros having it in their kernel before it's in mainline is actually a good thing. Let them straighten out the issues that may come up (not to mention possible CVEs). If the distros have it in their kernels and out in the public for 6 months or more, that may give enough information as to whether or not it should be merged.

I don't think it will complicate things even if the API changes. The distros will have to deal with that fall out. Mainline only cares about its own regressions. But any API changes would only be done for good reasons, and give the distros an excuse to fix whatever was done wrong in the first place.

-- Steve
RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> -----Original Message-----
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, June 25, 2015 3:49 AM
> To: Eric Auger
> Cc: Joerg Roedel; Avi Kivity; Wu, Feng; k...@vger.kernel.org;
> linux-kernel@vger.kernel.org; pbonz...@redhat.com; mtosa...@redhat.com
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Wed, 2015-06-24 at 18:25 +0200, Eric Auger wrote:
> > Hi Joerg,
> >
> > On 06/24/2015 05:50 PM, Joerg Roedel wrote:
> > > On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
> > >> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > >> IRQ forward control" series? Or is that "central registry maintained by
> > >> a posted interrupts manager" something more specific to x86?
> > >
> > > From what I understood so far, the feature you implemented for ARM is a
> > > bit different from the ones that get introduced to x86.
> > >
> > > Can you please share some details on how the ARM version works? I am
> > > interested in how the GICv2 is configured for IRQ forwarding. The
> > > question is whether the forwarding information needs to be updated from
> > > KVM and what information about the IRQ KVM needs for this.
> >
> > The principle is that when you inject a virtual IRQ to a guest, you
> > program a register in the GIC, known as a list register. There you put
> > both the virtual IRQ you want to inject but also the physical IRQ it is
> > linked with (HWbit mode set = forwarding set). When the guest completes
> > the virtual IRQ the GIC HW automatically deactivates the physical IRQ
> > found in the list register. In that mode the physical IRQ deactivation
> > is under the ownership of the guest (actually automatically done by the HW).
> >
> > If HWbit mode is *not* set (forwarding not set), you do not specify the
> > HW IRQ in the list register. The host deactivates the physical IRQ &
> > masks it before triggering the virtual IRQ. Only the virtual IRQ ID is
> > programmed in the list register. When the guest completes the virtual
> > IRQ, a physical maintenance IRQ is triggered. The hyp mode is entered
> > and eventually the host unmasks the IRQ.
> >
> > Some illustrations can be found in
> > http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
>
>
> I think an important aspect for our design is that in the case of Posted
> Interrupts, they're only used for edge triggered interrupts so VFIO is
> only an information provider for KVM to configure it.

Exactly! For PI, KVM only needs some information from VFIO when the guests set the irq affinity.

Thanks,
Feng

> VFIO will
> hopefully just see fewer interrupts as they magically appear directly in
> the guest. IRQ Forwarding however affects the de-assertion of level
> triggered interrupts. VFIO needs to switch to something more like an
> edge handler when IRQ Forwarding is enabled. So in that model, VFIO
> needs to provide information as well as consume it to change behavior.
> Thanks,
>
> Alex
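For readers who want the two list-register modes Eric describes in concrete form, here is a minimal user-space sketch in C. It is not the KVM vgic code. The bit positions (HW at bit 31, pending state at bit 28, physical ID in bits 19:10, virtual ID in bits 9:0, and the EOI maintenance request at bit 19 when HW is clear) are my reading of the GICv2 list register layout and should be checked against the architecture specification; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GICv2 list register fields; verify against the GIC spec. */
#define LR_HW             (1u << 31)  /* forwarding: physical IRQ linked */
#define LR_STATE_PENDING  (1u << 28)  /* State field = pending */
#define LR_EOI            (1u << 19)  /* HW clear: maintenance IRQ on EOI */
#define LR_PHYSID_SHIFT   10          /* HW set: PhysicalID in bits 19:10 */
#define LR_ID_MASK        0x3ffu      /* IDs are 10 bits wide */

/* Forwarded case: the guest's completion deactivates the physical IRQ in HW. */
static uint32_t lr_forwarded(uint32_t virq, uint32_t pirq)
{
	assert(virq <= LR_ID_MASK && pirq <= LR_ID_MASK);
	return LR_HW | LR_STATE_PENDING |
	       (pirq << LR_PHYSID_SHIFT) | virq;
}

/*
 * Non-forwarded case: only the virtual ID is programmed, and the guest's
 * completion raises a maintenance IRQ so the host can unmask the line.
 */
static uint32_t lr_not_forwarded(uint32_t virq)
{
	assert(virq <= LR_ID_MASK);
	return LR_STATE_PENDING | LR_EOI | virq;
}
```

The point of the contrast is that in the forwarded encoding the host never sees the completion, while in the non-forwarded encoding it explicitly asks to be told.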
RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> -----Original Message-----
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Wednesday, June 24, 2015 11:46 PM
> To: Alex Williamson
> Cc: Wu, Feng; Eric Auger; Avi Kivity; k...@vger.kernel.org;
> linux-kernel@vger.kernel.org; pbonz...@redhat.com; mtosa...@redhat.com
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > There are plenty of details to be filled in,
>
> I also need to fill plenty of details in my head first, so here are some
> suggestions based on my current understanding. Please don't hesitate to
> correct me where I got something wrong.
>
> So first I totally agree that the handling of PI/non-PI configurations
> should be transparent to user-space.
>
> I read a bit through the VT-d spec, and my understanding of posted
> interrupts so far is that:
>
> 	1) Each VCPU gets a PI-Descriptor with its pending Posted
> 	   Interrupts. This descriptor needs to be updated when a VCPU
> 	   is migrated to another PCPU and should thus be under control
> 	   of KVM.
>
> 	   This is similar to the vAPIC backing page in the AMD version
> 	   of this, except that the PCPU routing information is stored
> 	   somewhere else on AMD.
>
> 	2) As long as the VCPU runs the IRTEs are configured for
> 	   posting, when the VCPU goes to sleep the old remapped entry is
> 	   established again. So when the VCPU sleeps the interrupt
> 	   would get routed to VFIO and forwarded through the eventfd.

When the vCPU sleeps, say, blocked while the guest is executing HLT, the interrupt is still in posted mode. The solution is that when the vCPU is blocked, we use another notification vector (named the wakeup notification vector) to wake up the blocked vCPU when interrupts happen. And in the wakeup event handler, we unblock the vCPU.

Thanks,
Feng

>
> 	   This would be different to the AMD version, where we have a
> 	   running bit. When this is clear the IOMMU will trigger an event
> 	   in its event-log. This might need special handling in VFIO
> 	   ('might' because VFIO does not need to forward the interrupt,
> 	   it just needs to make sure the VCPU wakes up).
>
> 	   Please correct me if my understanding of the Intel version is
> 	   wrong.
>
> So most of the data structures the IOMMU reads for this need to be
> updated from KVM code (either x86-generic or AMD/Intel specific code),
> as KVM has the information about VCPU load/unload and the IRQ routing.

Yes, this part has nothing to do with VFIO, KVM itself can handle it well.

>
> What KVM needs from VFIO is the information about the physical
> interrupts, and it makes total sense to attach it as metadata to the
> eventfd.

When the guest sets the irq affinity, QEMU first gets the MSI/MSI-X configuration, then it passes this information to kernel space via the VFIO infrastructure. We need this MSI/MSI-X configuration to update the associated posted-format IRTE accordingly. This is the key point for PI in terms of VFIO.

Thanks,
Feng

>
> But the problems start at how this metadata should look like. It would
> be good to have some generic description, but not sure if this is
> possible. Otherwise this metadata would need to be requested by VFIO
> from the IOMMU driver and passed on to KVM, which it then passes back to
> the IOMMU driver. Or something like that.
>
>
>
> 	Joerg
Re: [PATCH v2 2/4] net: dsa: add support for switchdev VLAN objects
On Wed, Jun 24, 2015 at 11:50 AM, Vivien Didelot wrote:
> This patch adds the glue between DSA and switchdev operations to add,
> delete and dump SWITCHDEV_OBJ_PORT_VLAN objects.
>
> This is a first step to link the "bridge vlan" command with hardware
> entries for DSA compatible switch chips.
>
> Signed-off-by: Vivien Didelot

Acked-by: Scott Feldman
[PATCH 1/1] perf/x86: fix SLM MSR_OFFCORE_RSP1 valid_mask
From: Kan Liang

AVG_LATENCY(bit 38) is only available on MSR_OFFCORE_RSP0. So the bit should be removed from RSP1 valid_mask.
Since RSP0 and RSP1 may have different valid_mask, intel_alt_er should validate the config on the alternate offcore reg before replacing it.

Signed-off-by: Kan Liang
---
 arch/x86/kernel/cpu/perf_event_intel.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b9826a9..71815cf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1114,7 +1114,7 @@ static struct extra_reg intel_slm_extra_regs[] __read_mostly =
 {
 	/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
 	INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ull, RSP_0),
-	INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ull, RSP_1),
+	INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x368005ull, RSP_1),
 	EVENT_EXTRA_END
 };
 
@@ -1699,18 +1699,22 @@ intel_bts_constraints(struct perf_event *event)
 	return NULL;
 }
 
-static int intel_alt_er(int idx)
+static int intel_alt_er(int idx, u64 config)
 {
+	int alt_idx;
 	if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
 		return idx;
 
 	if (idx == EXTRA_REG_RSP_0)
-		return EXTRA_REG_RSP_1;
+		alt_idx = EXTRA_REG_RSP_1;
 
 	if (idx == EXTRA_REG_RSP_1)
-		return EXTRA_REG_RSP_0;
+		alt_idx = EXTRA_REG_RSP_0;
 
-	return idx;
+	if (config & ~x86_pmu.extra_regs[alt_idx].valid_mask)
+		return idx;
+
+	return alt_idx;
 }
 
 static void intel_fixup_er(struct perf_event *event, int idx)
@@ -1799,7 +1803,7 @@ again:
 		 */
 		c = NULL;
 	} else {
-		idx = intel_alt_er(idx);
+		idx = intel_alt_er(idx, reg->config);
 		if (idx != reg->idx) {
 			raw_spin_unlock_irqrestore(&era->lock, flags);
 			goto again;
-- 
1.8.3.1
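To make the valid_mask check in intel_alt_er() easier to follow outside the kernel tree, here is a small stand-alone sketch in C. It is not the kernel implementation: the mask table, the COMMON_BITS value and the alt_er() name are hypothetical, chosen only so that bit 38 is valid on RSP_0 and invalid on RSP_1 as the changelog describes. Unlike the posted hunk, the sketch returns early for any index that is neither RSP register, so alt_idx is never read uninitialized.

```c
#include <assert.h>
#include <stdint.h>

enum { EXTRA_REG_RSP_0, EXTRA_REG_RSP_1, EXTRA_REG_MAX };

#define AVG_LATENCY_BIT  (1ULL << 38)  /* only valid on RSP_0 */
#define COMMON_BITS      0xffffULL     /* illustrative, not the real mask */

/* Hypothetical per-register masks standing in for extra_regs[].valid_mask. */
static const uint64_t valid_mask[EXTRA_REG_MAX] = {
	[EXTRA_REG_RSP_0] = COMMON_BITS | AVG_LATENCY_BIT,
	[EXTRA_REG_RSP_1] = COMMON_BITS,
};

/*
 * Return the alternate offcore response register for idx, but only if
 * config uses no bit that the alternate register lacks; otherwise stay
 * on idx.
 */
static int alt_er(int idx, uint64_t config)
{
	int alt_idx;

	if (idx == EXTRA_REG_RSP_0)
		alt_idx = EXTRA_REG_RSP_1;
	else if (idx == EXTRA_REG_RSP_1)
		alt_idx = EXTRA_REG_RSP_0;
	else
		return idx;	/* not an RSP register: nothing to swap */

	if (config & ~valid_mask[alt_idx])
		return idx;

	return alt_idx;
}
```

With these masks, a config that sets AVG_LATENCY stays pinned to RSP_0, while any config made only of common bits may move in either direction.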
RE: [GIT] Networking
Linus,

>
> On the *other* side of the same conflict, I find an even more offensive
> commit, namely commit 4cd7c9479aff ("IB/mad: Add support for additional
> MAD info to/from drivers") which adds a BUG_ON() for a sanity check,
> rather than just returning -EINVAL or something sane like that.
>
> I'm getting *real* tired of that BUG_ON() shit. I realize that infiniband
> is a niche market, and those "commercial grade" niche markets are
> more-than-used-to crap code and horrible hacks, but this is still the
> kernel. We don't add random machine-killing debug checks when it is *so*
> simple to just do
>
>         if (WARN_ON_ONCE(..))
>                 return -EINVAL;
>
> instead.

Please accept my apologies. The original patch used WARN_ON but I was advised to use BUG_ON in a review and I should have thought about it more rather than blindly make the change.

>
> Killing the machine for idiotic things like that is truly offensive, and
> truly horrible horrible code. Why do I keep on having to tell people off
> for doing these things? Why do people keep thinking that
> debugging-by-killing-the-machine is a good idea?
>
> Either that BUG_ON() cannot possibly happen, in which case it should damn
> well not exist in the first place. Or it's a valuable debug aid, in which
> case it should damn well not be a BUG_ON. You can't have it both ways.

It was intended as a debug aid.

>
> The next pointless BUG_ON() I see, I will start getting _really_
> unpleasant about.
>
> Doug, get rid of those things asap.

Fix submitted to Doug.

https://patchwork.kernel.org/patch/6671931/

Ira
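For anyone reading along who has not used the pattern Linus quotes, here is a rough user-space sketch in C of "warn once and return an error" as opposed to halting. The warn_on_once() helper is a hypothetical stand-in for the kernel's WARN_ON_ONCE() macro, and check_mad_size() with its arguments is invented for illustration; neither is taken from the commit under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): report the first time the
 * condition fires, then stay quiet, and always hand the condition back
 * so the caller can bail out.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Invented example check: reject an oversized request instead of dying. */
static int check_mad_size(size_t len, size_t max)
{
	static bool warned;

	if (warn_on_once(len > max, &warned, "MAD size exceeds buffer"))
		return -EINVAL;

	return 0;
}
```

The caller sees -EINVAL every time the check fails, the log gets one line rather than a flood, and the machine keeps running.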
Re: linux-next: build failure after merge of the modules tree
Hi Dan,

On Thu, 25 Jun 2015 08:57:06 +1000 Stephen Rothwell wrote:
>
> On Wed, 24 Jun 2015 14:18:44 -0400 Dan Streetman wrote:
> >
> > On Tue, Jun 23, 2015 at 9:37 PM, Stephen Rothwell wrote:
> > >
> > > After merging the modules tree, today's linux-next build (x86_64
> > > allmodconfig) failed like this:
> >
> > that's weird. Are you sure it failed during allmodconfig? I can see
> > why it would fail like that if CONFIG_MODULES isn't defined, which
> > I'll send a patch for...
>
> Pretty sure - and, in any case, I don't do any CONFIG_MODULES=n builds
> between tree merges (only later in the day). That is why I couldn't
> figure out what went wrong.
>
> I will apply your patch today and see if that helps.

I built without your patch and it failed again, but applying your patch fixes it.

Rusty, you can consider this

Tested-by: Stephen Rothwell

for "[PATCH] modules: only use mod->param_lock if CONFIG_MODULES"

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
[PATCH v8 9/9] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
From: "Luis R. Rodriguez"

This driver uses the same area for MTRR as for the ioremap(). Convert the driver from using the x86 specific MTRR code to the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add() will avoid MTRR if write-combining is available; in order to take advantage of that, also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle SmPL patch. It additionally required manual intervention to address all the #ifdefery and to remove redundant things which arch_phys_wc_add() already handles, such as the verbose message when MTRR setup fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Rob Clark
Cc: Laurent Pinchart
Cc: Jingoo Han
Cc: "Lad, Prabhakar"
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Tomi Valkeinen
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/vt8623fb.c | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index ea7f056..60f24828 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -26,13 +26,9 @@
 #include /* Why should fb driver call console functions? because console_lock() */
 #include
 
-#ifdef CONFIG_MTRR
-#include
-#endif
-
 struct vt8623fb_info {
 	char __iomem *mmio_base;
-	int mtrr_reg;
+	int wc_cookie;
 	struct vgastate state;
 	struct mutex open_lock;
 	unsigned int ref_count;
@@ -99,10 +95,7 @@ static struct svga_timing_regs vt8623_timing_regs = {
 
 /* Module parameters */
 static char *mode_option = "640x480-8@60";
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
 
 MODULE_AUTHOR("(c) 2006 Ondrej Zajicek ");
 MODULE_LICENSE("GPL");
@@ -112,11 +105,8 @@ module_param(mode_option, charp, 0644);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0);
 MODULE_PARM_DESC(mode, "Default video mode e.g. '648x480-8@60' (deprecated)");
-
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
 
 
 /* - */
@@ -710,7 +700,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.mmio_len = pci_resource_len(dev, 1);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -781,12 +771,9 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+
[PATCH v5 3/3] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
From: "Luis R. Rodriguez"

This driver uses strong UC for the MMIO region, and ioremap_wc() for the framebuffer to whitelist for the WC MTRR what can be changed to WC. On PAT systems we don't need the MTRR call so just use arch_phys_wc_add() there, this lets us remove all those ifdefs. Let's also be consistent and use ioremap_wc() for ATARI as well.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle SmPL patch. It additionally required manual intervention to address all the #ifdefery and to remove redundant things which arch_phys_wc_add() already handles, such as the verbose message when MTRR setup fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Toshi Kani
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: Ville Syrjälä
Cc: Rob Clark
Cc: Mathias Krause
Cc: Andrzej Hajda
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Borislav Petkov
Cc: Davidlohr Bueso
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/aty/atyfb.h      |  4 +---
 drivers/video/fbdev/aty/atyfb_base.c | 36 +++-
 2 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 89ec439..63c4842 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -182,9 +182,7 @@ struct atyfb_par {
 	unsigned long irq_flags;
 	unsigned int irq;
 	spinlock_t int_lock;
-#ifdef CONFIG_MTRR
-	int mtrr_aper;
-#endif
+	int wc_cookie;
 	u32 mem_cntl;
 	struct crtc saved_crtc;
 	union aty_pll saved_pll;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 546f5af..96c605c 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -98,9 +98,6 @@
 #ifdef CONFIG_PMAC_BACKLIGHT
 #include
 #endif
-#ifdef CONFIG_MTRR
-#include
-#endif
 
 /*
  * Debug flags.
@@ -303,9 +300,7 @@ static struct fb_ops atyfb_ops = {
 };
 
 static bool noaccel;
-#ifdef CONFIG_MTRR
 static bool nomtrr;
-#endif
 static int vram;
 static int pll;
 static int mclk;
@@ -2628,17 +2623,13 @@ static int aty_init(struct fb_info *info)
 	aty_st_le32(BUS_CNTL, aty_ld_le32(BUS_CNTL, par) |
 		    BUS_APER_REG_DIS, par);
 
-#ifdef CONFIG_MTRR
-	par->mtrr_aper = -1;
-	if (!nomtrr) {
+	if (!nomtrr)
 		/*
 		 * Only the ioremap_wc()'d area will get WC here
 		 * since ioremap_uc() was used on the entire PCI BAR.
 		 */
-		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
-					  MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+		par->wc_cookie = arch_phys_wc_add(par->res_start,
+						  par->res_size);
 
 	info->fbops = &atyfb_ops;
 	info->pseudo_palette = par->pseudo_palette;
@@ -2766,13 +2757,8 @@ aty_init_exit:
 	/* restore video mode */
 	aty_set_crtc(par, &par->saved_crtc);
 	par->pll_ops->set_pll(info, &par->saved_pll);
+	arch_phys_wc_del(par->wc_cookie);
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_aper >= 0) {
-		mtrr_del(par->mtrr_aper, 0, 0);
-		par->mtrr_aper = -1;
-	}
-#endif
 	return ret;
 }
 
@@
[PATCH v8 8/9] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
From: "Luis R. Rodriguez"

This driver uses the same area for MTRR as for the ioremap(). Convert the driver from using the x86 specific MTRR code to the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add() will avoid MTRR if write-combining is available; in order to take advantage of that, also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle SmPL patch. It additionally required manual intervention to address all the #ifdefery and to remove redundant things which arch_phys_wc_add() already handles, such as the verbose message when MTRR setup fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: Jingoo Han
Cc: Geert Uytterhoeven
Cc: Daniel Vetter
Cc: "Lad, Prabhakar"
Cc: Rickard Strandqvist
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Tomi Valkeinen
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/s3fb.c | 35 ++-
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c
index f0ae61a..13b1090 100644
--- a/drivers/video/fbdev/s3fb.c
+++ b/drivers/video/fbdev/s3fb.c
@@ -28,13 +28,9 @@
 #include
 #include
 
-#ifdef CONFIG_MTRR
-#include
-#endif
-
 struct s3fb_info {
 	int chip, rev, mclk_freq;
-	int mtrr_reg;
+	int wc_cookie;
 	struct vgastate state;
 	struct mutex open_lock;
 	unsigned int ref_count;
@@ -154,11 +150,7 @@ static const struct svga_timing_regs s3_timing_regs = {
 
 
 static char *mode_option;
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
-
 static int fasttext = 1;
 
 
@@ -170,11 +162,8 @@ module_param(mode_option, charp, 0444);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0444);
 MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
-
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
 
 module_param(fasttext, int, 0644);
 MODULE_PARM_DESC(fasttext, "Enable S3 fast text mode (1=enable, 0=disable, default=1)");
@@ -1168,7 +1157,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.smem_len = pci_resource_len(dev, 0);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1365,12 +1354,9 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 
 	return 0;
 
@@ -1405,14 +1391,7 @@ static void s3_pci_remove(struct pci_dev *dev)
Re: Stop SSD from waiting for "Spinning up disk..."
On Thu, Jun 25, 2015 at 07:55:45AM +0800, Jeff Chua wrote:
> On Thu, Jun 25, 2015 at 12:28 AM, Greg Kroah-Hartman wrote:
> > On Thu, Jun 25, 2015 at 12:22:47AM +0800, Jeff Chua wrote:
> > >> Both sda and sdb have the same SSD model.
> >
> > That's a bug in your USB bridge chip, odds are it is not reporting the
> > value properly. There's nothing the scsi core or USB stack can do about
> > this, sorry. Please complain to the hardware manufacturer.
>
> There are workaround boot cmdline parameters for other things ... any
> chance to consider one to fix broken rotational option? I'm not sure
> how many out there are broken, but I really would like a faster way to
> access my USB SSD without waiting for the "disk spinup".

Just like module parameters, boot command lines are not for device
specific attributes, sorry. Again, please contact the manufacturer to
get this fixed. We can't add a quirk for this bridge because it would
not work if you really put a rotational disk behind it.

Given the cheap cost of these types of bridges, I recommend just getting
one that works.

Best of luck,

greg k-h
[PATCH v5 2/3] video: fbdev: atyfb: replace MTRR UC hole with strong UC
From: "Luis R. Rodriguez"

Replace a WC MTRR call followed by a UC MTRR "hole" call with a single
WC MTRR call and use strong UC to protect the MMIO region and account
for the device's architecture and MTRR size requirements.

The atyfb driver relies on two overlapping MTRRs. It does this because
on some devices the MMIO region is bundled together with the
framebuffer on the same PCI BAR, and because the hardware requires both
the base and the size of an MTRR to be powers of two. In the atyfb
driver's worst case the PCI BAR is 16 MiB while the MMIO region is on
the last 4 KiB of the same PCI BAR. If we use just one MTRR for WC we
can only end up with an 8 MiB or 16 MiB framebuffer. Using a 16 MiB WC
framebuffer area is unacceptable since we need the MMIO region to not
be write-combined. An 8 MiB WC framebuffer option does not let us use
quite a bit of framebuffer space; it would reduce the resolution
capability of the device considerably. An alternative is to use many
MTRRs, but on some systems that could mean not having enough MTRRs to
cover the framebuffer. The current driver solution is to issue a 16 MiB
WC MTRR followed by a 4 KiB UC MTRR on the last 4 KiB.

It's worth mentioning and documenting the current ioremap*() strategy
as well: the first ioremap() is used only for the MMIO region, a second
ioremap() call is used for the framebuffer *and* the MMIO region, so the
MMIO region ends up mapped twice. Two ioremap() calls are used since in
some situations the framebuffer actually ends up on a separate
auxiliary PCI BAR, but this is not always true: in the worst case the
PCI BAR is shared for both MMIO and the framebuffer. By allowing
overlapping ioremap() calls the driver enables two types of devices
with one simple ioremap() strategy.
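To put a number on the "many MTRRs" alternative mentioned above, here is a minimal userspace sketch (not driver or kernel code). The helper name `mtrr_chunks_needed()` and the base address used below are hypothetical; it simply splits a range greedily into power-of-two chunks whose base is aligned to their size, which is the constraint described in the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: count how many ranges are needed
 * to cover [base, base + size) when every range must have a power-of-two
 * size and a base aligned to that size.
 */
static unsigned int mtrr_chunks_needed(uint64_t base, uint64_t size)
{
	unsigned int count = 0;

	/* This sketch works at 4 KiB granularity. */
	assert((base & 0xfff) == 0 && (size & 0xfff) == 0);

	while (size) {
		/* Largest power of two that does not exceed what is left. */
		uint64_t chunk = 1;

		while (chunk * 2 <= size)
			chunk *= 2;

		/* Shrink it until the current base is aligned to it. */
		while (base & (chunk - 1))
			chunk /= 2;

		base += chunk;
		size -= chunk;
		count++;
	}

	return count;
}
```

With an assumed 16 MiB aligned BAR at 0xfd000000, `mtrr_chunks_needed(0xfd000000, 16 << 20)` gives 1, while covering everything except the last 4 KiB, `mtrr_chunks_needed(0xfd000000, (16 << 20) - 4096)`, gives 12 (8 MiB + 4 MiB + ... + 4 KiB). That is the kind of count the message warns may exceed the MTRRs available on some systems.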
For non-PAT systems:

As per Intel SDM "11.5.2.1 Selecting Memory Types for Pentium Pro and
Pentium II Processors" [0] the effect of a WC MTRR for a region with
page attribute settings set to PCD=1, PWT=1 (Linux
_PAGE_CACHE_MODE_UC) will render the effective memory type to UC. A WC
MTRR for a region with page attribute settings set to PCD=1, PWT=0
(Linux _PAGE_CACHE_MODE_UC_MINUS) will render the effective memory type
to WC *but* yet this is considered implementation defined -- that is,
"system designers are encouraged to avoid these implementation-defined
combinations". A WC MTRR for a region with page attribute settings set
to PCD=0, PWT=1 (Linux _PAGE_CACHE_MODE_WC) will render the effective
memory type to WC *but* this is also implementation defined. Such is
the case for non-PAT systems.

For PAT systems:

As per Intel SDM "11.5.2.2 Selecting Memory Types for Pentium III and
More Recent Processor Families" the effect of a WC MTRR for a region
with a PAT entry value of UC will be UC. The effect of a WC MTRR on a
region with a PAT entry UC- will be WC. The effect of a WC MTRR on a
region with PAT entry WC is WC.

This can all be summarized in the following table:

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------

(*) denotes implementation defined

By default Linux today defaults both and ioremap_nocache() to use
_PAGE_CACHE_MODE_UC_MINUS. On x86 ioremap() aliases ioremap_nocache().
The preferred value for Linux may soon change, however: the goal is to
use _PAGE_CACHE_MODE_UC by default in the future. We can use
ioremap_uc() to set PCD=1, PWT=1 on non-PAT systems and use a PAT value
of UC for PAT systems.
This will ensure the same settings are in place regardless of what Linux decides to use by default later and to not regress our MTRR strategy since the effective memory type will differ depending on the value used. Using a WC MTRR on such an area will be nullified. This technique can be used to protect the MMIO region in this driver's case and address the restrictions of the device's architecture as well as restrictions set upon us by powers of 2 when using MTRRs. This allows us to replace the two MTRR calls with a single 16 MiB WC MTRR and use page-attribute settings for non-PAT and PAT entry values for PAT systems to ensure the appropriate effective memory type won't have a write-combined effect on the MMIO region on both non-PAT and PAT systems. The framebuffer area will be sure to get the write-combined effective memory type by white-listing it with ioremap_wc(). We ensure the desired effective memory types are set by: 0) Using one
[PATCH v8 7/9] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
From: "Luis R. Rodriguez"

Convert the driver from using the x86 specific MTRR code to the
architecture agnostic arch_phys_wc_add(). arch_phys_wc_add() will avoid
MTRR if write-combining is available; in order to take advantage of
that, also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on x86
   it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
   de33c442e titled "x86 PAT: fix performance drop for glx, use UC
   minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address all the
#ifdery and removal of redundant things which arch_phys_wc_add()
already addresses, such as the verbose message about when MTRR fails
and doing nothing when we didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Laurent Pinchart
Cc: Geert Uytterhoeven
Cc: "Lad, Prabhakar"
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Tomi Valkeinen
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/arkfb.c | 36 +---
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index b305a1e..6a317de 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -26,13 +26,9 @@
 #include /* Why should fb driver call console functions? because console_lock() */
 #include
 
-#ifdef CONFIG_MTRR
-#include
-#endif
-
 struct arkfb_info {
 	int mclk_freq;
-	int mtrr_reg;
+	int wc_cookie;
 
 	struct dac_info *dac;
 	struct vgastate state;
@@ -102,10 +98,6 @@ static const struct svga_timing_regs ark_timing_regs = {
 
 static char *mode_option = "640x480-8@60";
 
-#ifdef CONFIG_MTRR
-static int mtrr = 1;
-#endif
-
 MODULE_AUTHOR("(c) 2007 Ondrej Zajicek ");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("fbdev driver for ARK 2000PV");
@@ -115,11 +107,6 @@ MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0444);
 MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
 
-#ifdef CONFIG_MTRR
-module_param(mtrr, int, 0444);
-MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-
 static int threshold = 4;
 
 module_param(threshold, int, 0644);
@@ -1002,7 +989,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.smem_len = pci_resource_len(dev, 0);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1057,14 +1044,8 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
-
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
-
+	par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+					  info->fix.smem_len);
 	return 0;
 
 	/* Error handling */
@@ -1092,14 +1073,7 @@ static void ark_pci_remove(struct pci_dev *dev)
 	if (info) {
 		struct
[PATCH v5 1/3] video: fbdev: atyfb: clarify ioremap() base and length used
From: "Luis R. Rodriguez"

This has no functional changes, it just adjusts the ioremap() call for
the framebuffer to use the same values we later use for the
framebuffer, this will make it easier to review the next change. The
size of the framebuffer varies but since this is for PCI we *know* this
defaults to 0x80. atyfb_setup_generic() is *only* used on PCI probe.

Cc: Toshi Kani
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Linus Torvalds
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: Ville Syrjälä
Cc: Rob Clark
Cc: Mathias Krause
Cc: Andrzej Hajda
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Borislav Petkov
Cc: Davidlohr Bueso
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/aty/atyfb_base.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 16936bb..8025624 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3489,7 +3489,9 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 
 	/* Map in frame buffer */
 	info->fix.smem_start = addr;
-	info->screen_base = ioremap(addr, 0x80);
+	info->fix.smem_len = 0x80;
+
+	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
-- 
2.3.2.209.gd67f9d5.dirty
[PATCH v8 6/9] lib: devres: add pcim_iomap_wc() variants
From: "Luis R. Rodriguez"

Now that we have pci_iomap_wc() add the respective devres helpers.
These go unexported for now but note that should they later be exported
this must go with EXPORT_SYMBOL_GPL().

Cc: Toshi Kani
Cc: Andy Lutomirski
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Dave Airlie
Cc: Bjorn Helgaas
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: Dave Hansen
Cc: Arnd Bergmann
Cc: Michael S. Tsirkin
Cc: venkatesh.pallip...@intel.com
Cc: Stefan Bader
Cc: Ville Syrjälä
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Borislav Petkov
Cc: Davidlohr Bueso
Cc: konrad.w...@oracle.com
Cc: ville.syrj...@linux.intel.com
Cc: david.vra...@citrix.com
Cc: jbeul...@suse.com
Cc: Roger Pau Monné
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xensource.com
Acked-by: Arnd Bergmann
Signed-off-by: Luis R. Rodriguez
---
 include/linux/pci.h |  2 ++
 lib/devres.c        | 76 +
 2 files changed, 78 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1193975..5ff15c1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1609,9 +1609,11 @@ static inline void pci_dev_specific_enable_acs(struct pci_dev *dev) { }
 #endif
 
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
+void __iomem *pcim_iomap_wc(struct pci_dev *pdev, int bar, unsigned long maxlen);
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev);
 int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name);
+int pcim_iomap_wc_regions(struct pci_dev *pdev, int mask, const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
 				   const char *name);
 void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
diff --git a/lib/devres.c b/lib/devres.c
index fbe2aac..38acc53 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -304,6 +304,29 @@ void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen)
 EXPORT_SYMBOL(pcim_iomap);
 
 /**
+ * pcim_iomap_wc - Managed pcim_iomap_wc()
+ * @pdev: PCI device to iomap for
+ * @bar: BAR to iomap
+ * @maxlen: Maximum length of iomap
+ *
+ * Managed pci_iomap_wc(). Map is automatically unmapped on driver
+ * detach.
+ */
+void __iomem *pcim_iomap_wc(struct pci_dev *pdev, int bar, unsigned long maxlen)
+{
+	void __iomem **tbl;
+
+	BUG_ON(bar >= PCIM_IOMAP_MAX);
+
+	tbl = (void __iomem **)pcim_iomap_table(pdev);
+	if (!tbl || tbl[bar])	/* duplicate mappings not allowed */
+		return NULL;
+
+	tbl[bar] = pci_iomap_wc(pdev, bar, maxlen);
+	return tbl[bar];
+}
+
+/**
  * pcim_iounmap - Managed pci_iounmap()
  * @pdev: PCI device to iounmap for
  * @addr: Address to unmap
@@ -383,6 +406,59 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
 EXPORT_SYMBOL(pcim_iomap_regions);
 
 /**
+ * pcim_iomap_wc_regions - Request and iomap PCI BARs with write-combining
+ * @pdev: PCI device to map IO resources for
+ * @mask: Mask of BARs to request and iomap
+ * @name: Name used when requesting regions
+ *
+ * Request and iomap regions specified by @mask with a preference for
+ * write-combining.
+ */
+int pcim_iomap_wc_regions(struct pci_dev *pdev, int mask, const char *name)
+{
+	void __iomem * const *iomap;
+	int i, rc;
+
+	iomap = pcim_iomap_table(pdev);
+	if (!iomap)
+		return -ENOMEM;
+
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		unsigned long len;
+
+		if (!(mask & (1 << i)))
+			continue;
+
+		rc = -EINVAL;
+		len = pci_resource_len(pdev, i);
+		if (!len)
+			goto err_inval;
+
+		rc = pci_request_region(pdev, i, name);
+		if (rc)
+			goto err_inval;
+
+		rc = -ENOMEM;
+		if (!pcim_iomap_wc(pdev, i, 0))
+			goto err_region;
+	}
+
+	return 0;
+
+ err_region:
+	pci_release_region(pdev, i);
+ err_inval:
+	while (--i >= 0) {
+		if (!(mask & (1 << i)))
+			continue;
+		pcim_iounmap(pdev, iomap[i]);
+		pci_release_region(pdev, i);
+	}
+
+	return rc;
+}
+
+/**
  * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones
  * @pdev: PCI device to map IO resources for
  * @mask: Mask of BARs to iomap
-- 
2.3.2.209.gd67f9d5.dirty
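The request/map loop and its two-label unwind in pcim_iomap_wc_regions() can be tried out in userspace. What follows is only a sketch of that control flow: `struct fake_dev` and every `fake_*` function are hypothetical stand-ins invented for illustration, not PCI or devres APIs, and the zero-length BAR check from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_BARS 6	/* stand-in for DEVICE_COUNT_RESOURCE in this sketch */

/* Hypothetical bookkeeping that stands in for real PCI state. */
struct fake_dev {
	bool requested[NUM_BARS];
	bool mapped[NUM_BARS];
	int fail_map_at;	/* BAR whose mapping should fail, or -1 for none */
};

static int fake_request_region(struct fake_dev *d, int bar)
{
	assert(bar >= 0 && bar < NUM_BARS);
	d->requested[bar] = true;
	return 0;
}

static void fake_release_region(struct fake_dev *d, int bar)
{
	d->requested[bar] = false;
}

static bool fake_iomap_wc(struct fake_dev *d, int bar)
{
	if (bar == d->fail_map_at)
		return false;
	d->mapped[bar] = true;
	return true;
}

static void fake_iounmap(struct fake_dev *d, int bar)
{
	d->mapped[bar] = false;
}

/* Same shape as the loop in the patch: request, then map, per masked BAR. */
static int fake_iomap_wc_regions(struct fake_dev *d, int mask)
{
	int i, rc;

	for (i = 0; i < NUM_BARS; i++) {
		if (!(mask & (1 << i)))
			continue;

		rc = fake_request_region(d, i);
		if (rc)
			goto err_inval;

		rc = -ENOMEM;
		if (!fake_iomap_wc(d, i))
			goto err_region;
	}

	return 0;

err_region:
	/* BAR i was requested but never mapped: only release it. */
	fake_release_region(d, i);
err_inval:
	/* Earlier BARs in the mask were fully set up: unmap and release. */
	while (--i >= 0) {
		if (!(mask & (1 << i)))
			continue;
		fake_iounmap(d, i);
		fake_release_region(d, i);
	}

	return rc;
}
```

With a mask of 0x0b (BARs 0, 1 and 3) and a simulated mapping failure on BAR 3, the function returns -ENOMEM and leaves no BAR requested or mapped, which is the property the two labels are there to provide.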
[PATCH v5 0/3] atyfb: address MTRR corner case
From: "Luis R. Rodriguez"

Andrew,

Forgive me for the TL;DR, I'm afraid I need to be crystal clear on this
patchset as it's the most complex in the entire series. The skinny is
that this patchset addresses a complex work around with APIs now merged
upstream going in for v4.2. The driver maintainer hasn't followed up
with the driver changes for over a month and no one else has provided
Acks for these device driver changes [0]. We have a few options:

0) Sit and wait for a driver maintainer to review this
1) Merge this as-is and hope for reports
2) Go with the nopat requirement as with the ivtv and ipath driver

I'd prefer to merge this as is, and only if reports come back with
issues should we then consider 2), as we'd then have at least a well
documented work effort required for this transformation. This device
driver is also old, so I don't expect many reports anyway.

The TL;DR:

As part of the long haul effort to rid the world of direct MTRR use [1]
we've had to also work on alternative solutions which can co-exist with
PAT interfaces. Most of the transformation of device drivers to use PAT
was fairly easy (TM): so long as ioremap_wc() was used we could then
convert over the drivers using mtrr_add() over to the arch-agnostic and
PAT-aware (ignored when PAT is enabled) arch_phys_wc_add(). This was
typically easy to do, for instance in cases where a full PCI BAR was
used for MMIO registers and another PCI BAR was used with
write-combining effects desirable. In some cases we just needed new WC
APIs for some buses. This was the case for most modern devices, but a
few old devices had a combined set of MMIO registers and the
write-combined area mixed. In such situations even when using MTRR one
had to figure out creative solutions to make things work, especially
considering MTRRs were limited and they had size constraints: an MTRR
base and size must be a power of two.
The good news is that on Linux there were only three device drivers in
total that we ended up having radical issues with when converting them
over to PAT interfaces. One was the ivtv media device driver, another
was the infiniband ipath device driver. The other one was the
framebuffer atyfb device driver that this series addresses. For both
ivtv and ipath we've decided to simply require users of those devices
to boot with the nopat kernel parameter because both device drivers are
ancient and the work required to fully convert to PAT interfaces is
significant (in the ipath case) or nearly impossible (ivtv). For
details please refer to the respective and now upstream commits:

7ea402d x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
1bf1735 x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled

To demo exactly how much effort would have been required I decided to
venture into atyfb and try to fix that device driver first, considering
it had the worst case situation to address as it used size hackery and
MTRR combinations of different types. In order to accomplish this we
needed to map out all possible combinatorial effects of PAT page
entries with write-combining, and page attributes (PAT, PCD, PWT) with
write-combining effects for non-PAT systems. We did this not only for
atyfb's sake but also for any other possible future driver which might
meet these same needs. We needed to take this a bit more seriously
given that our long term goal was also to change the default behaviour
of ioremap_nocache() to use strong UC instead of UC-; we needed to take
this into consideration when converting drivers over.
The documentation table for all these possible combinatorial entries is
now upstream:

2f9e897 x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages

Of importance to this patch set is this table:

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------

In the atyfb case it used to use two MTRR calls, a large WC MTRR
followed by a UC MTRR "hole" call for the MMIO registers. This was done
this way on atyfb because the offset and size of the framebuffer area
would only work well this way, otherwise you'd also have to try a
series of small MTRR calls and you might end up running out of MTRRs.
For non-PAT systems we take advantage of the above map to protect an
MMIO region with 011 page attributes (this maps to strong UC for PAT
systems) so that if a
[PATCH v8 5/9] PCI: Add pci_iomap_wc() variants
From: "Luis R. Rodriguez" PCI BARs tell us whether prefetching is safe, but they don't say anything about write combining (WC). WC changes ordering rules and allows writes to be collapsed, so it's not safe in general to use it on a prefetchable region. Add pci_iomap_wc() and pci_iomap_wc_range() so drivers can take advantage of write combining when they know it's safe. On architectures that don't fully support WC, e.g., x86 without PAT, drivers for legacy framebuffers may get some of the benefit by using arch_phys_wc_add() in addition to pci_iomap_wc(). But arch_phys_wc_add() is unreliable and should be avoided in general. On x86, it uses MTRRs, which are limited in number and size, so the results will vary based on driver loading order. The goals of adding pci_iomap_wc() are to: - Give drivers an architecture-independent way to use WC so they can stop using interfaces like mtrr_add() (on x86, pci_iomap_wc() uses PAT when available) - Move toward using _PAGE_CACHE_MODE_UC, not _PAGE_CACHE_MODE_UC_MINUS, on x86 on ioremap_nocache() (see de33c442ed2a ("x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()") Link: http://lkml.kernel.org/r/1426893517-2511-6-git-send-email-mcg...@do-not-panic.com Original-posting: http://lkml.kernel.org/r/1432163293-20965-1-git-send-email-mcg...@do-not-panic.com Cc: Toshi Kani Cc: Andy Lutomirski Cc: Suresh Siddha Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Juergen Gross Cc: Daniel Vetter Cc: Dave Airlie Cc: Bjorn Helgaas Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: Dave Hansen Cc: Arnd Bergmann Cc: Michael S. 
Tsirkin Cc: venkatesh.pallip...@intel.com Cc: Stefan Bader Cc: Ville Syrjälä Cc: Mel Gorman Cc: Vlastimil Babka Cc: Borislav Petkov Cc: Davidlohr Bueso Cc: konrad.w...@oracle.com Cc: ville.syrj...@linux.intel.com Cc: david.vra...@citrix.com Cc: jbeul...@suse.com Cc: Roger Pau Monné Cc: linux-fb...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-de...@lists.xensource.com Acked-by: Arnd Bergmann Signed-off-by: Luis R. Rodriguez --- include/asm-generic/pci_iomap.h | 14 ++ lib/pci_iomap.c | 61 + 2 files changed, 75 insertions(+) diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h index 7389c87..b1e17fc 100644 --- a/include/asm-generic/pci_iomap.h +++ b/include/asm-generic/pci_iomap.h @@ -15,9 +15,13 @@ struct pci_dev; #ifdef CONFIG_PCI /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */ extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max); extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen); +extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, + unsigned long offset, + unsigned long maxlen); /* Create a virtual mapping cookie for a port on a given PCI device. 
* Do not call this directly, it exists to make it easier for architectures * to override */ @@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon return NULL; } +static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max) +{ + return NULL; +} static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, unsigned long offset, unsigned long maxlen) { return NULL; } +static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, + unsigned long offset, + unsigned long maxlen) +{ + return NULL; +} #endif #endif /* __ASM_GENERIC_IO_H */ diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c index bcce5f1..9604dcb 100644 --- a/lib/pci_iomap.c +++ b/lib/pci_iomap.c @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev, EXPORT_SYMBOL(pci_iomap_range); /** + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR + * @dev: PCI device that owns the BAR + * @bar: BAR number + * @offset: map memory at the given offset in BAR + * @maxlen: max length of the memory to map + * + * Using this function you will get a __iomem address to your device BAR. + * You can access it using ioread*() and iowrite*(). These functions hide + * the details if this is a MMIO or PIO address space and will just do what + * you expect from them in the correct way. When possible write combining + * is used. + * + * @maxlen specifies the maximum length to map. If you want to get access to + * the complete BAR from offset to
Re: [PATCH v4 4/9] staging:lustre: merge socklnd_lib-linux.h into socklnd.h
On Wed, Jun 24, 2015 at 02:37:51PM +0200, Geert Uytterhoeven wrote:
> Hi James,
>
> On Thu, Jun 11, 2015 at 9:18 PM, James Simmons wrote:
> > From: John L. Hammond
> >
> > Originally socklnd_lib-linux.h contained linux specific
> > wrappers and defines but since the linux kernel is the
> > only supported platform now we can merge what little
> > remains in the header into socklnd.h. This is broken
> > out of the original patch 12932 that was merged to the
> > Intel/OpenSFS branch.
> >
> > Signed-off-by: John L. Hammond
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
> > Reviewed-on: http://review.whamcloud.com/12932
> > Reviewed-by: Isaac Huang
> > Reviewed-by: James Simmons
> > Reviewed-by: Oleg Drokin
> > Signed-off-by: James Simmons
> > ---
> >  .../staging/lustre/lnet/klnds/socklnd/socklnd.h    | 39 +-
> >  .../lustre/lnet/klnds/socklnd/socklnd_lib-linux.h  | 86
> >  .../lustre/lnet/klnds/socklnd/socklnd_lib.c        |  4 +-
> >  3 files changed, 40 insertions(+), 89 deletions(-)
> >  delete mode 100644 drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib-linux.h
> >
> > diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h
> > index 53275f9..7125eb9 100644
> > --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h
> > +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h
> > @@ -25,16 +25,40 @@
> >   *
> >   */
> >
> > +#ifndef _SOCKLND_SOCKLND_H_
> > +#define _SOCKLND_SOCKLND_H_
> > +
> >  #define DEBUG_PORTAL_ALLOC
> >  #define DEBUG_SUBSYSTEM S_LND
> >
> > -#include "socklnd_lib-linux.h"
> > +#include
> > +#include
> > +#include
>
> Including first causes a build failure for m68k/allmodconfig:
>
> arch/m68k/include/asm/irq.h:77:12: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
> arch/m68k/include/asm/irq.h:78:1: error: unknown type name 'atomic_t'
> arch/m68k/include/asm/irq.h:77:12: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
> arch/m68k/include/asm/irq.h:78:1: error: unknown type name 'atomic_t'
>
> http://kisskb.ellerman.id.au/kisskb/buildresult/12448922/
>
> Fixing it inside arch/m68k/include/asm/irq.h might cause Include Hell,
> so perhaps you can just move the include below all
> includes?
>

Hi Geert,

I have not tested it, but I think the following may fix the problem
while avoiding any include problems. Since pt_regs is used in the file,
one could argue that it should be declared.

Thanks,
Guenter

--
diff --git a/arch/m68k/include/asm/irq.h b/arch/m68k/include/asm/irq.h
index 81ca118d58af..28ffa8d59cf0 100644
--- a/arch/m68k/include/asm/irq.h
+++ b/arch/m68k/include/asm/irq.h
@@ -74,6 +74,8 @@ extern unsigned int irq_canonicalize(unsigned int irq);
 #define irq_canonicalize(irq)	(irq)
 #endif /* !(CONFIG_M68020 || CONFIG_M68030 || CONFIG_M68040 || CONFIG_M68060) */
 
+struct pt_regs;
+
 asmlinkage void do_IRQ(int irq, struct pt_regs *regs);
 extern atomic_t irq_err_count;
 
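For readers less familiar with the idiom in the proposed hunk, here is a small standalone sketch of what a forward declaration such as `struct pt_regs;` provides in C. It says nothing about whether the hunk resolves the m68k errors quoted above (the message itself notes it is untested). The function `do_irq_sketch()` and the single `pc` field are invented for this example and are not the kernel's definitions.

```c
#include <assert.h>

/*
 * Forward declaration: "struct pt_regs" is now a known (incomplete) type.
 * That is enough for any declaration that only uses a pointer to it.
 */
struct pt_regs;

/* Hypothetical prototype modelled on do_IRQ(); it needs no struct layout. */
int do_irq_sketch(int irq, struct pt_regs *regs);

/*
 * The full definition can come later, or live in another header. The
 * single field here is invented for the example.
 */
struct pt_regs {
	unsigned long pc;
};

int do_irq_sketch(int irq, struct pt_regs *regs)
{
	assert(irq >= 0);

	/* Only code that dereferences the pointer needs the full definition. */
	return regs->pc ? irq : -1;
}
```

The design point is that a header declaring only prototypes can name the struct without pulling in the header that defines it, which is one way to sidestep the include-ordering concern raised in the quoted text.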
[PATCH v8 4/9] video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
From: "Luis R. Rodriguez"

The driver doesn't use mtrr_add() or arch_phys_wc_add() but since we
know the framebuffer is isolated already on an ioremap() we can take
advantage of write combining for performance where possible.

In this case there are a few motivations for this:

a) Take advantage of PAT when available

b) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
   de33c442e titled "x86 PAT: fix performance drop for glx, use UC
   minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()")

Cc: Laurent Pinchart
Cc: Rob Clark
Cc: Geert Uytterhoeven
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Tomi Valkeinen
Signed-off-by: Luis R. Rodriguez
---
 drivers/video/fbdev/gxt4500.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gxt4500.c b/drivers/video/fbdev/gxt4500.c
index 135d78a..f19133a 100644
--- a/drivers/video/fbdev/gxt4500.c
+++ b/drivers/video/fbdev/gxt4500.c
@@ -662,7 +662,7 @@ static int gxt4500_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	info->fix.smem_start = fb_phys;
 	info->fix.smem_len = pci_resource_len(pdev, 1);
-	info->screen_base = pci_ioremap_bar(pdev, 1);
+	info->screen_base = pci_ioremap_wc_bar(pdev, 1);
 	if (!info->screen_base) {
 		dev_err(&pdev->dev, "gxt4500: cannot map framebuffer\n");
 		goto err_unmap_regs;
-- 
2.3.2.209.gd67f9d5.dirty
Re: [RFC] virtio_net: Adding tx_timeout function.
2015-06-24 3:10 GMT-03:00 Michael S. Tsirkin:
>
> On Tue, Jun 23, 2015 at 10:44:29PM -0300, Julio Faracco wrote:
> > virtio_net paravirtualized driver does not have a tx_timeout() function to
> > guarantee that the driver will recover properly after receiving a timeout
> > during a transmission of a packet. This patch add this feature and throw a
> > timeout exception after 5 HZ. Considering some tests, this is the best
> > time to use here.
> >
> > Signed-off-by: Julio Faracco
> > Cc: Jason Wang
>
> Looks like a bunch of locks and flushes are missing in this patch. IMHO
> that's just too painful with current hardware. IMO the right thing to
> do here is to add ability to reset specific queues to hardware.
>

I agree, Michael.
This patch follows the default model, which resets the whole device on a
transmission timeout. For better performance, only the affected queues
should be reset.

> > ---
> > drivers/net/virtio_net.c | 69 +-
> > 1 file changed, 68 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 63c7810..75ac45c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -135,6 +135,9 @@ struct virtnet_info {
> >  	/* Work struct for config space updates */
> >  	struct work_struct config_work;
> >
> > +	/* Work struct for resetting the virtio-net driver. */
> > +	struct work_struct reset_task;
> > +
> >  	/* Does the affinity hint is set for virtqueues? */
> >  	bool affinity_hint_set;
> >
> > @@ -1394,6 +1397,18 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> >  	return 0;
> >  }
> >
> > +static void virtnet_tx_timeout(struct net_device *dev)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +
> > +	dev_warn(&dev->dev, "TX Timeout exception with latency: %ld\n",
> > +		 jiffies - dev_trans_start(dev));
> > +
> > +	schedule_work(&vi->reset_task);
>
> What if after this triggers user does something
> to the device (e.g. attempts to remove it)?
> Or if a packet is transmitted or used?

Yes, you are right. At some point this work must be canceled,
especially when the driver is being removed.

>
> > +}
> > +
> > +static void virtnet_reset_task(struct work_struct *work);
> > +
> >  static const struct net_device_ops virtnet_netdev = {
> >  	.ndo_open = virtnet_open,
> >  	.ndo_stop = virtnet_close,
> > @@ -1405,6 +1420,7 @@ static const struct net_device_ops virtnet_netdev = {
> >  	.ndo_get_stats64 = virtnet_stats,
> >  	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> >  	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> > +	.ndo_tx_timeout = virtnet_tx_timeout,
> >  #ifdef CONFIG_NET_POLL_CONTROLLER
> >  	.ndo_poll_controller = virtnet_netpoll,
> >  #endif
> > @@ -1750,6 +1766,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> >  	dev->netdev_ops = &virtnet_netdev;
> >  	dev->features = NETIF_F_HIGHDMA;
> >
> > +	dev->watchdog_timeo = 5 * HZ;
> >  	dev->ethtool_ops = &virtnet_ethtool_ops;
> >  	SET_NETDEV_DEV(dev, &vdev->dev);
> >
> > @@ -1811,6 +1828,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> >  	}
> >
> >  	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> > +	INIT_WORK(&vi->reset_task, virtnet_reset_task);
> >
> >  	/* If we can receive ANY GSO packets, we must allocate large ones. */
> >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > @@ -1891,7 +1909,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> >  		netif_carrier_on(dev);
> >  	}
> >
> > -	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> > +	pr_debug("virtio_net: registered device %s with %d RX and TX vq's\n",
> >  		 dev->name, max_queue_pairs);
> >
> >  	return 0;
> > @@ -2001,6 +2019,55 @@ static int virtnet_restore(struct virtio_device *vdev)
> >  }
> >  #endif
> >
> > +static void virtnet_reset_task(struct work_struct *work)
> > +{
> > +	struct virtnet_info *vi =
> > +		container_of(work, struct virtnet_info, reset_task);
> > +	struct net_device *dev = vi->dev;
> > +	struct virtio_device *vdev = vi->vdev;
> > +	int err, i;
> > +
> > +	flush_work(&vi->config_work);
> > +
> > +	netif_device_detach(vi->dev);
> > +	cancel_delayed_work_sync(&vi->refill);
> > +
> > +	if (netif_running(vi->dev)) {
> > +		for (i = 0; i < vi->max_queue_pairs; i++) {
> > +			napi_disable(&vi->rq[i].napi);
> > +			napi_hash_del(&vi->rq[i].napi);
> > +			netif_napi_del(&vi->rq[i].napi);
> > +		}
> > +	}
> > +
> > +	remove_vq_common(vi);
> > +
> > +	dev->stats.tx_errors++;
> > +
> > +	err = init_vqs(vi);
> > +	if (err) {
> > +		dev_warn(>dev, "virtio_net: virtqueue initialization failed.\n");
> > +		return;
> > +	}
> > +
[PATCH v8 3/9] video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
From: "Luis R. Rodriguez"

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
   de33c442e titled "x86 PAT: fix performance drop for glx,
   use UC minus for ioremap(), ioremap_nocache() and
   pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @ expression index, base, size; @@ -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1); +index = arch_phys_wc_add(base, size); @ mtrr_rm depends on mtrr_found @ expression mtrr_found.index, mtrr_found.base, mtrr_found.size; @@ -mtrr_del(index, base, size); +arch_phys_wc_del(index); @ mtrr_rm_zero_arg depends on mtrr_found @ expression mtrr_found.index; @@ -mtrr_del(index, 0, 0); +arch_phys_wc_del(index); @ mtrr_rm_fb_info depends on mtrr_found @ struct fb_info *info; expression mtrr_found.index; @@ -mtrr_del(index, info->fix.smem_start, info->fix.smem_len); +arch_phys_wc_del(index); @ ioremap_replace_nocache depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap_nocache(base, size); +info->screen_base = ioremap_wc(base, size); @ ioremap_replace_default depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap(base, size); +info->screen_base = ioremap_wc(base, size); Generated-by: Coccinelle SmPL Cc: Jingoo Han Cc: Geert Uytterhoeven Cc: Laurent Pinchart Cc: Suresh Siddha Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Juergen Gross Cc: Daniel Vetter Cc: Andy Lutomirski Cc: Dave Airlie Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: linux-fb...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Tomi Valkeinen Signed-off-by: Luis R. 
Rodriguez --- drivers/video/fbdev/kyro/fbdev.c | 33 +++-- include/video/kyro.h | 4 +--- 2 files changed, 12 insertions(+), 25 deletions(-) diff --git a/drivers/video/fbdev/kyro/fbdev.c b/drivers/video/fbdev/kyro/fbdev.c index 65041e1..5bb0153 100644 --- a/drivers/video/fbdev/kyro/fbdev.c +++ b/drivers/video/fbdev/kyro/fbdev.c @@ -22,9 +22,6 @@ #include #include #include -#ifdef CONFIG_MTRR -#include -#endif #include @@ -84,9 +81,7 @@ static device_info_t deviceInfo; static char *mode_option = NULL; static int nopan = 0; static int nowrap = 1; -#ifdef CONFIG_MTRR static int nomtrr = 0; -#endif /* PCI driver prototypes */ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent); @@ -570,10 +565,8 @@ static int __init kyrofb_setup(char *options) nopan = 1; } else if (strcmp(this_opt, "nowrap") == 0) { nowrap = 1; -#ifdef CONFIG_MTRR } else if (strcmp(this_opt, "nomtrr") == 0) { nomtrr = 1; -#endif } else { mode_option = this_opt; } @@ -691,17 +684,16 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent) currentpar->regbase = deviceInfo.pSTGReg = ioremap_nocache(kyro_fix.mmio_start, kyro_fix.mmio_len); + if (!currentpar->regbase) + goto out_free_fb; - info->screen_base = ioremap_nocache(kyro_fix.smem_start, - kyro_fix.smem_len); + info->screen_base = pci_ioremap_wc_bar(pdev, 0); + if (!info->screen_base) + goto out_unmap_regs; -#ifdef CONFIG_MTRR if (!nomtrr) - currentpar->mtrr_handle = - mtrr_add(kyro_fix.smem_start, -kyro_fix.smem_len, -MTRR_TYPE_WRCOMB, 1); -#endif + currentpar->wc_cookie = arch_phys_wc_add(kyro_fix.smem_start, +kyro_fix.smem_len); kyro_fix.ypanstep = nopan ? 0 : 1; kyro_fix.ywrapstep = nowrap ? 0 : 1; @@ -745,8 +737,10 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent) return 0; out_unmap: - iounmap(currentpar->regbase); iounmap(info->screen_base); +out_unmap_regs: + iounmap(currentpar->regbase); +out_free_fb:
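
The reworked error path in the hunk above unwinds in reverse order of acquisition: each label releases one resource and falls through to the labels for everything acquired before it. A minimal userspace sketch of that pattern follows. Everything in it is a stand-in and an assumption: mock_map(), mock_probe(), struct mock_fb and the fail_at parameter are hypothetical, malloc()/free() replace ioremap()/iounmap(), and what follows out_free_fb in the real driver is not visible in the diff, so the sketch simply returns an error there.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct mock_fb {
	void *regs;	/* plays the role of currentpar->regbase */
	void *screen;	/* plays the role of info->screen_base */
};

/* Stand-in for ioremap(): returns NULL when asked to fail. */
static void *mock_map(int should_fail)
{
	return should_fail ? NULL : malloc(16);
}

/*
 * fail_at selects where the mock probe fails:
 * 0 = success, 1 = register mapping, 2 = framebuffer mapping,
 * 3 = a later step after both mappings exist.
 */
int mock_probe(struct mock_fb *fb, int fail_at)
{
	fb->regs = NULL;
	fb->screen = NULL;

	fb->regs = mock_map(fail_at == 1);
	if (!fb->regs)
		goto out_free_fb;

	fb->screen = mock_map(fail_at == 2);
	if (!fb->screen)
		goto out_unmap_regs;

	if (fail_at == 3)
		goto out_unmap;

	return 0;

out_unmap:
	/* Both mappings exist: release the newest one first. */
	free(fb->screen);
	fb->screen = NULL;
out_unmap_regs:
	free(fb->regs);
	fb->regs = NULL;
out_free_fb:
	return -1;
}
```

Because every label only undoes what was set up before the corresponding jump, no path releases a mapping that was never created, which is what the added NULL checks in the patch make possible.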
[PATCH v8 2/9] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
From: "Luis R. Rodriguez"

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
   de33c442e titled "x86 PAT: fix performance drop for glx,
   use UC minus for ioremap(), ioremap_nocache() and
   pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @ expression index, base, size; @@ -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1); +index = arch_phys_wc_add(base, size); @ mtrr_rm depends on mtrr_found @ expression mtrr_found.index, mtrr_found.base, mtrr_found.size; @@ -mtrr_del(index, base, size); +arch_phys_wc_del(index); @ mtrr_rm_zero_arg depends on mtrr_found @ expression mtrr_found.index; @@ -mtrr_del(index, 0, 0); +arch_phys_wc_del(index); @ mtrr_rm_fb_info depends on mtrr_found @ struct fb_info *info; expression mtrr_found.index; @@ -mtrr_del(index, info->fix.smem_start, info->fix.smem_len); +arch_phys_wc_del(index); @ ioremap_replace_nocache depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap_nocache(base, size); +info->screen_base = ioremap_wc(base, size); @ ioremap_replace_default depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap(base, size); +info->screen_base = ioremap_wc(base, size); Generated-by: Coccinelle SmPL Cc: Jingoo Han Cc: Bjorn Helgaas Cc: Geert Uytterhoeven Cc: Rob Clark Cc: Benoit Taine Cc: Suresh Siddha Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Juergen Gross Cc: Daniel Vetter Cc: Andy Lutomirski Cc: Dave Airlie Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: linux-fb...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Tomi Valkeinen Signed-off-by: Luis R. 
Rodriguez --- drivers/video/fbdev/i740fb.c | 35 ++- 1 file changed, 6 insertions(+), 29 deletions(-) diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c index a2b4204..452e116 100644 --- a/drivers/video/fbdev/i740fb.c +++ b/drivers/video/fbdev/i740fb.c @@ -27,24 +27,15 @@ #include #include -#ifdef CONFIG_MTRR -#include -#endif - #include "i740_reg.h" static char *mode_option; - -#ifdef CONFIG_MTRR static int mtrr = 1; -#endif struct i740fb_par { unsigned char __iomem *regs; bool has_sgram; -#ifdef CONFIG_MTRR - int mtrr_reg; -#endif + int wc_cookie; bool ddc_registered; struct i2c_adapter ddc_adapter; struct i2c_algo_bit_data ddc_algo; @@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent) goto err_request_regions; } - info->screen_base = pci_ioremap_bar(dev, 0); + info->screen_base = pci_ioremap_wc_bar(dev, 0); if (!info->screen_base) { dev_err(info->device, "error remapping base\n"); ret = -ENOMEM; @@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent) fb_info(info, "%s frame buffer device\n", info->fix.id); pci_set_drvdata(dev, info); -#ifdef CONFIG_MTRR - if (mtrr) { - par->mtrr_reg = -1; - par->mtrr_reg = mtrr_add(info->fix.smem_start, - info->fix.smem_len, MTRR_TYPE_WRCOMB, 1); - } -#endif + if (mtrr) + par->wc_cookie = arch_phys_wc_add(info->fix.smem_start, + info->fix.smem_len); return 0; err_reg_framebuffer: @@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev) if (info) { struct i740fb_par *par = info->par; - -#ifdef CONFIG_MTRR - if (par->mtrr_reg >= 0) { - mtrr_del(par->mtrr_reg, 0, 0); - par->mtrr_reg = -1; - } -#endif + arch_phys_wc_del(par->wc_cookie); unregister_framebuffer(info); fb_dealloc_cmap(>cmap); if (par->ddc_registered) @@ -1287,10 +1268,8 @@ static int __init i740fb_setup(char *options) while ((opt = strsep(, ",")) != NULL) { if (!*opt) continue; -#ifdef
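
The i740fb hunks above drop the "mtrr_reg >= 0" bookkeeping and call arch_phys_wc_del() unconditionally on the stored cookie. The userspace mock below sketches why that can be safe, assuming a cookie convention in which 0 means "nothing to undo" (write-combining already comes from the mapping itself), a negative value means failure, and only positive cookies refer to a reserved MTRR. All names here (mock_phys_wc_add(), mock_phys_wc_del(), mock_pat_enabled, the slot count and the offset) are hypothetical and are not the kernel's implementation.

```c
#include <stdbool.h>

#define MOCK_MTRR_SLOTS		8	/* hypothetical number of MTRR registers */
#define MOCK_WC_COOKIE_OFFSET	1000	/* hypothetical: keeps valid cookies above 0 */

/* Mock platform state, for illustration only. */
bool mock_pat_enabled;
int mock_mtrr_in_use;

/* Returns 0 if no MTRR is needed, < 0 on failure, > 0 for a reserved MTRR. */
int mock_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;

	if (mock_pat_enabled)
		return 0;	/* the WC mapping is enough, nothing to undo later */
	if (mock_mtrr_in_use >= MOCK_MTRR_SLOTS)
		return -1;	/* no free register */
	return MOCK_WC_COOKIE_OFFSET + mock_mtrr_in_use++;
}

/* Safe to call with any value returned by mock_phys_wc_add(). */
void mock_phys_wc_del(int cookie)
{
	if (cookie < 1)
		return;		/* 0 or error: nothing was reserved */
	mock_mtrr_in_use--;
}
```

Under this convention the caller stores whatever the add routine returned and hands it back on removal without inspecting it, which is the shape the converted remove path takes.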
[PATCH v8 1/9] pci: add pci_ioremap_wc_bar()
From: "Luis R. Rodriguez"

This lets drivers take advantage of PAT when available. This
should help with the transition of converting video drivers over
to ioremap_wc() to help with the goal of eventually using
_PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
ioremap_nocache() (de33c442e titled "x86 PAT: fix performance
drop for glx, use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

Cc: Toshi Kani
Cc: Bjorn Helgaas
Cc: Suresh Siddha
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Juergen Gross
Cc: Daniel Vetter
Cc: Andy Lutomirski
Cc: Dave Airlie
Cc: Antonino Daplas
Cc: Jean-Christophe Plagniol-Villard
Cc: Tomi Valkeinen
Cc: Ville Syrjälä
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: Borislav Petkov
Cc: Davidlohr Bueso
Cc: linux-fb...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann
Signed-off-by: Luis R. Rodriguez
---
 drivers/pci/pci.c   | 14 ++
 include/linux/pci.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0008c95..fdae37b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -138,6 +138,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
 	return ioremap_nocache(res->start, resource_size(res));
 }
 EXPORT_SYMBOL_GPL(pci_ioremap_bar);
+
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
+{
+	/*
+	 * Make sure the BAR is actually a memory resource, not an IO resource
+	 */
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
+		WARN_ON(1);
+		return NULL;
+	}
+	return ioremap_wc(pci_resource_start(pdev, bar),
+			  pci_resource_len(pdev, bar));
+}
+EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif

 #define PCI_FIND_CAP_TTL 48
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c0dd4ab..1193975 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1657,6 +1657,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_ext_cfg_avail(void);

 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);

 #ifdef CONFIG_PCI_IOV
 int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
--
2.3.2.209.gd67f9d5.dirty
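
The control flow of the new helper can be exercised outside the kernel with a small mock: refuse anything that is not a memory BAR, otherwise map it. The sketch below is only an illustration under stated assumptions. The mock_* types and functions are hypothetical stand-ins, the flag values are illustrative, and mock_ioremap_wc() does not map anything; it just returns a non-NULL token so the two outcomes can be told apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits standing in for IORESOURCE_IO / IORESOURCE_MEM. */
#define MOCK_IORESOURCE_IO	0x00000100u
#define MOCK_IORESOURCE_MEM	0x00000200u

/* Hypothetical, trimmed-down view of one BAR. */
struct mock_bar {
	uint64_t start;
	uint64_t len;
	unsigned int flags;
};

struct mock_pci_dev {
	struct mock_bar bar[6];
};

/* Stand-in for ioremap_wc(): no real mapping, just a non-NULL token. */
static void *mock_ioremap_wc(uint64_t start, uint64_t len)
{
	static char fake_mapping;

	(void)start;
	return len ? &fake_mapping : NULL;
}

/* Mirrors the shape of the patch: reject non-memory BARs, then map. */
void *mock_pci_ioremap_wc_bar(const struct mock_pci_dev *pdev, int bar)
{
	const struct mock_bar *res = &pdev->bar[bar];

	if (!(res->flags & MOCK_IORESOURCE_MEM))
		return NULL;	/* the kernel helper also warns here */

	return mock_ioremap_wc(res->start, res->len);
}
```

A driver-style caller would treat NULL as a probe failure, as the i740fb and kyrofb conversions in this series do for info->screen_base.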
[PATCH v8 0/9] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()
From: "Luis R. Rodriguez"

Boris,

This patchset is part of the long-haul set of series that address
removal of direct use of MTRR and transform drivers over to use PAT
interfaces when available [0]. Other than this series there is only
one more pending series for that effort: the atyfb device driver
specific changes, which no one has replied to for over one month and
which I'll soon repost in the hope that Andrew might pick them up.

The patches in this series were originally split in two series but
I've combined them now, given all Acks have been collected and they
are all related. Tomi has provided his Acked-by for all device driver
changes. Bjorn had originally reviewed this series and was comfortable
with all the code except for the use of EXPORT_SYMBOL_GPL(), despite
new clarifications of how we can use this for new symbols and our
preference for it on new PAT interfaces [1]. Despite this, Bjorn has
clarified he's comfortable with this going in through another
maintainer, and in particular Arnd [2].

The v7 series was posted addressing Arnd. Arnd provided his Acked-by
for all PCI and devres changes but noted he's on parental leave and
not taking any patches for arm-soc or asm-generic until he's back at
work in around 3 months from now [2], so he suggested I see if I could
find another maintainer to have these go through.

This v8 goes unmodified, except for the devres commit: since those
routines are not yet used by any device driver, for now I've just
skipped exporting the symbols, but did note that if they are exported
it must be with EXPORT_SYMBOL_GPL(). Once we have a driver that needs
them upstream we can export these.

Although I had test compiled this before, just to be safe I went ahead
and successfully test-compiled this set with allmodconfig, especially
since I've now removed the exports for the devres routines. Please let
me know if these might be able to go through you or if there are any
questions.

I will note that the recent discussion with Benjamin over the v7
series concluded that the idea we both were alluding to, automating
the WC effects for devices instead, seems a bit too idealistic for
PCI / PCIE for now, but perhaps we should at least consider this in
the future for userspace mmap() calls [3].

[0] http://lkml.kernel.org/r/CAB=NE6UgtdSoBsA=8+ueyrazhdnwusmqaohhaaefqudbrsy...@mail.gmail.com
[1] http://lkml.kernel.org/r/caerspo4sha-f83x1nw2qdlt9gdubfxcq7uejmsffc5gbjj8...@mail.gmail.com
[2] http://lkml.kernel.org/r/caerspo7cnh1wpgqjceu8etxifnp_piq3cbwnkiwqpuad-fd...@mail.gmail.com
[3] http://lkml.kernel.org/r/1435193521.3790.26.ca...@kernel.crashing.org

Luis R. Rodriguez (9):
  pci: add pci_ioremap_wc_bar()
  video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
  PCI: Add pci_iomap_wc() variants
  lib: devres: add pcim_iomap_wc() variants
  video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
  video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
  video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()

 drivers/pci/pci.c                | 14
 drivers/video/fbdev/arkfb.c      | 36 +++
 drivers/video/fbdev/gxt4500.c    |  2 +-
 drivers/video/fbdev/i740fb.c     | 35 --
 drivers/video/fbdev/kyro/fbdev.c | 33 ++---
 drivers/video/fbdev/s3fb.c       | 35 --
 drivers/video/fbdev/vt8623fb.c   | 31
 include/asm-generic/pci_iomap.h  | 14
 include/linux/pci.h              |  3 ++
 include/video/kyro.h             |  4 +--
 lib/devres.c                     | 76
 lib/pci_iomap.c                  | 61
 12 files changed, 204 insertions(+), 140 deletions(-)

--
2.3.2.209.gd67f9d5.dirty
Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
On Fri, Jun 19, 2015 at 3:22 PM, Luis R. Rodriguez wrote:
> Tomi, Dave, Andy,
>
> Its' been one month now since posting the last unmodified version
> (other than commit log) of this series [0] and no word or follow up
> from Ville. The merge window is closing in and other than the PCI
> changes this would be the last pending series. Can I trouble one of
> you for your review ? I will note that this series depends on the
> ioremap_uc() which went in through Ingo's tree and visible on
> linux-next.
>
> [0] http://lkml.kernel.org/r/20150529174051.gc23...@wotan.suse.de

Alright, I'll poke to see if Andrew might take these then. I'll post a
new clean series just to be crystal clear, as this is a complex set, I
admit, and it may be worth reiterating things.

 Luis
Re: [PATCH 02/12] [media] dvb-pll: Add support for THOMSON DTT7546X tuner.
On Wed, 2015-06-24 at 16:11 +0100, Peter Griffin wrote: > This is used in conjunction with the STV0367 demodulator on > the STV0367-NIM-V1.0 NIM card which can be used with the STi > STB SoC's. Barely associated to this specific patch, but for dvb-pll.c, another thing that seems possible is to convert the struct dvb_pll_desc uses to const and change the "entries" fixed array size from 12 to [] It'd save a couple KB overall and remove ~5KB of data. $ size drivers/media/dvb-frontends/dvb-pll.o* textdata bss dec hex filename 852015522120 121922fa0 drivers/media/dvb-frontends/dvb-pll.o.new 562463632120 14107371b drivers/media/dvb-frontends/dvb-pll.o.old --- drivers/media/dvb-frontends/dvb-pll.c | 50 +-- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/drivers/media/dvb-frontends/dvb-pll.c b/drivers/media/dvb-frontends/dvb-pll.c index 6d8fe88..53089e1 100644 --- a/drivers/media/dvb-frontends/dvb-pll.c +++ b/drivers/media/dvb-frontends/dvb-pll.c @@ -34,7 +34,7 @@ struct dvb_pll_priv { struct i2c_adapter *i2c; /* the PLL descriptor */ - struct dvb_pll_desc *pll_desc; + const struct dvb_pll_desc *pll_desc; /* cached frequency/bandwidth */ u32 frequency; @@ -57,7 +57,7 @@ MODULE_PARM_DESC(id, "force pll id to use (DEBUG ONLY)"); /* --- */ struct dvb_pll_desc { - char *name; + const char *name; u32 min; u32 max; u32 iffreq; @@ -71,13 +71,13 @@ struct dvb_pll_desc { u32 stepsize; u8 config; u8 cb; - } entries[12]; + } entries[]; }; /* --- */ /* descriptions*/ -static struct dvb_pll_desc dvb_pll_thomson_dtt7579 = { +static const struct dvb_pll_desc dvb_pll_thomson_dtt7579 = { .name = "Thomson dtt7579", .min = 17700, .max = 85800, @@ -99,7 +99,7 @@ static void thomson_dtt759x_bw(struct dvb_frontend *fe, u8 *buf) buf[3] |= 0x10; } -static struct dvb_pll_desc dvb_pll_thomson_dtt759x = { +static const struct dvb_pll_desc dvb_pll_thomson_dtt759x = { .name = "Thomson dtt759x", .min = 17700, .max = 89600, @@ -123,7 +123,7 @@ static void thomson_dtt7520x_bw(struct 
dvb_frontend *fe, u8 *buf) buf[3] ^= 0x10; } -static struct dvb_pll_desc dvb_pll_thomson_dtt7520x = { +static const struct dvb_pll_desc dvb_pll_thomson_dtt7520x = { .name = "Thomson dtt7520x", .min = 18500, .max = 9, @@ -141,7 +141,7 @@ static struct dvb_pll_desc dvb_pll_thomson_dtt7520x = { }, }; -static struct dvb_pll_desc dvb_pll_lg_z201 = { +static const struct dvb_pll_desc dvb_pll_lg_z201 = { .name = "LG z201", .min = 17400, .max = 86200, @@ -157,7 +157,7 @@ static struct dvb_pll_desc dvb_pll_lg_z201 = { }, }; -static struct dvb_pll_desc dvb_pll_unknown_1 = { +static const struct dvb_pll_desc dvb_pll_unknown_1 = { .name = "unknown 1", /* used by dntv live dvb-t */ .min = 17400, .max = 86200, @@ -179,7 +179,7 @@ static struct dvb_pll_desc dvb_pll_unknown_1 = { /* Infineon TUA6010XS * used in Thomson Cable Tuner */ -static struct dvb_pll_desc dvb_pll_tua6010xs = { +static const struct dvb_pll_desc dvb_pll_tua6010xs = { .name = "Infineon TUA6010XS", .min = 4425, .max = 85800, @@ -193,7 +193,7 @@ static struct dvb_pll_desc dvb_pll_tua6010xs = { }; /* Panasonic env57h1xd5 (some Philips PLL ?) 
*/ -static struct dvb_pll_desc dvb_pll_env57h1xd5 = { +static const struct dvb_pll_desc dvb_pll_env57h1xd5 = { .name = "Panasonic ENV57H1XD5", .min = 4425, .max = 85800, @@ -217,7 +217,7 @@ static void tda665x_bw(struct dvb_frontend *fe, u8 *buf) buf[3] |= 0x08; } -static struct dvb_pll_desc dvb_pll_tda665x = { +static const struct dvb_pll_desc dvb_pll_tda665x = { .name = "Philips TDA6650/TDA6651", .min = 4425, .max = 85800, @@ -251,7 +251,7 @@ static void tua6034_bw(struct dvb_frontend *fe, u8 *buf) buf[3] |= 0x08; } -static struct dvb_pll_desc dvb_pll_tua6034 = { +static const struct dvb_pll_desc dvb_pll_tua6034 = { .name = "Infineon TUA6034", .min = 4425, .max = 85800, @@ -275,7 +275,7 @@ static void tded4_bw(struct dvb_frontend *fe, u8 *buf) buf[3] |= 0x04; } -static struct dvb_pll_desc dvb_pll_tded4 = { +static const struct dvb_pll_desc dvb_pll_tded4 = { .name = "ALPS TDED4", .min = 4700, .max = 86300, @@ -293,7 +293,7 @@ static struct dvb_pll_desc
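
The space saving claimed above comes from replacing a fixed 12-slot array, which every descriptor pays for in full, with a flexible array member sized by its initializer. The sketch below illustrates the effect with hypothetical mock_* types that only loosely follow struct dvb_pll_desc; the field layout is an assumption. It also assumes a compiler that accepts static initialization of a flexible array member, which is a GNU C extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry. */
struct mock_entry {
	uint32_t limit;
	uint32_t stepsize;
	uint8_t config;
	uint8_t cb;
};

/* Old layout: every descriptor reserves room for 12 entries. */
struct mock_desc_fixed {
	const char *name;
	int count;
	struct mock_entry entries[12];
};

/* New layout: the array is exactly as long as the initializer. */
struct mock_desc_flex {
	const char *name;
	int count;
	struct mock_entry entries[];
};

/* Statically initializing entries[] relies on the GNU C extension. */
const struct mock_desc_flex mock_two_entries = {
	.name = "mock tuner",
	.count = 2,
	.entries = {
		{ 100, 10, 0x01, 0x02 },
		{ 200, 20, 0x03, 0x04 },
	},
};

/* Storage actually needed by a flexible descriptor with 'count' entries. */
size_t mock_flex_size(int count)
{
	return sizeof(struct mock_desc_flex) +
	       (size_t)count * sizeof(struct mock_entry);
}
```

Note that sizeof(struct mock_desc_flex) does not include the trailing entries, so code that copies or allocates such descriptors has to add the entry storage explicitly, as mock_flex_size() does. Marking the tables const additionally lets the linker place them in read-only data.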
Re: [Xen-devel] [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants
On Wed, 2015-06-24 at 17:58 -0700, Luis R. Rodriguez wrote:
> On Wed, Jun 24, 2015 at 5:52 PM, Benjamin Herrenschmidt
> wrote:
> > On Thu, 2015-06-25 at 02:08 +0200, Luis R. Rodriguez wrote:
> >>
> >> OK thanks I'll proceed with these patches then.
> >>
> >> > As for user mappings,
> >>
> >> Which APIs were you considering in this regard BTW?
> >
> > mmap of the generic /sys/bus/pci/.../resource*
>
> Like? Got a demo patch in mind ? :)

Nope. I was just thinking out loud. Today I have yet to see a problem
with what we do so ...

Cheers,
Ben.
RE: [PATCH v2 03/28] ACPICA: Hardware: Enable 64-bit firmware waking vector for selected FACS.
Hi, Rafael > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net] > Sent: Wednesday, June 24, 2015 10:06 PM > > On Wednesday, June 24, 2015 11:02:10 AM Lv Zheng wrote: > > ACPICA commit 7aa598d711644ab0de5f70ad88f1e2de253115e4 > > > > The following commit is reported to have broken s2ram on some platforms: > > Commit: 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd > > ACPICA: Add option to favor 32-bit FADT addresses. > > The platform reports 2 FACS tables (which is not allowed by ACPI > > specification) and the new 32-bit address favor rule forces OSPMs to use > > the FACS table reported via FADT's X_FIRMWARE_CTRL field. > > > > The root cause of the reported bug might be one of the followings: > > 1. BIOS may favor the 64-bit firmware waking vector address when the > >version of the FACS is greater than 0 and Linux currently only supports > >resuming from the real mode, so the 64-bit firmware waking vector has > >never been set and might be invalid to BIOS while the commit enables > >higher version FACS. > > 2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the > >FADT while the commit doesn't set the firmware waking vector address of > >the FACS reported by "FIRMWARE_CTRL", it only sets the firware waking > >vector address of the FACS reported by "X_FIRMWARE_CTRL". > > > > This patch excludes the cases that can trigger the bugs caused by the root > > cause 1. > > > > ACPI specification says: > > A. 32-bit FACS address (FIRMWARE_CTRL field in FADT): > >Physical memory address of the FACS, where OSPM and firmware exchange > >control information. > >If the X_FIRMWARE_CTRL field contains a non zero value then this field > >must be zero. > >A zero value indicates that no FACS is specified by this field. > > B. 64-bit FACS address (X_FIRMWARE_CTRL field in FADT): > >64bit physical memory address of the FACS. > >This field is used when the physical address of the FACS is above 4GB. 
> >If the FIRMWARE_CTRL field contains a non zero value then this field > >must be zero. > >A zero value indicates that no FACS is specified by this field. > > Thus the 32bit and 64bit firmware waking vector should indicate completely > > different resuming environment - real mode (1MB addressable) and non real > > mode (4GB+ addressable) and currently Linux only supports resuming from > > real mode. > > > > This patch enables 64-bit firmware waking vector for selected FACS via > > acpi_set_firmware_waking_vector() so that it's up to OSPMs to determine > > which > > resuming mode should be used by BIOS and ACPICA changes won't trigger the > > bugs caused by the root cause 1. For example, Linux can pass > > physical_address64=0 as the parameter of acpi_set_firmware_waking_vector() > > to > > indicate no 64bit waking vector support. Lv Zheng. > > > > This patch also updates acpi_set_firmware_waking_vector() invocations in > > order to keep 32-bit firmware waking vector favor for Linux. 64-bit > > firmware waking vector has never been enabled by Linux. The > > (acpi_physical_address)0 for 64-bit address can be used to force ACPICA to > > set only 32-bit firmware waking vector for Linux. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021 > > Link: https://github.com/acpica/acpica/commit/7aa598d7 > > Cc: 3.14.1+ # 3.14.1+ > > Reported-and-tested-by: Oswald Buddenhagen > > Signed-off-by: Lv Zheng > > Signed-off-by: Bob Moore > > Cc: Thomas Gleixner > > Cc: Ingo Molnar > > Cc: "H. 
Peter Anvin" > > Cc: x...@kernel.org > > Cc: Tony Luck > > Cc: Fenghua Yu > > Cc: linux-i...@vger.kernel.org > > --- > > arch/ia64/include/asm/acpi.h|3 +- > > arch/ia64/kernel/acpi.c |2 -- > > arch/x86/include/asm/acpi.h |3 +- > > drivers/acpi/acpica/hwxfsleep.c | 61 > > --- > > drivers/acpi/sleep.c|8 +++-- > > include/acpi/acpixf.h | 11 +++ > > 6 files changed, 33 insertions(+), 55 deletions(-) > > > > diff --git a/arch/ia64/include/asm/acpi.h b/arch/ia64/include/asm/acpi.h > > index aa0fdf1..0ac4fab 100644 > > --- a/arch/ia64/include/asm/acpi.h > > +++ b/arch/ia64/include/asm/acpi.h > > @@ -79,7 +79,8 @@ int acpi_gsi_to_irq (u32 gsi, unsigned int *irq); > > /* Low-level suspend routine. */ > > extern int acpi_suspend_lowlevel(void); > > > > -extern unsigned long acpi_wakeup_address; > > +#define acpi_wakeup_address((acpi_physical_address)0) > > +#define acpi_wakeup_address64 ((acpi_physical_address)0) > > > > /* > > * Record the cpei override flag and current logical cpu. This is > > diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c > > index b1698bc..1b08d6f 100644 > > --- a/arch/ia64/kernel/acpi.c > > +++ b/arch/ia64/kernel/acpi.c > > @@ -60,8 +60,6 @@ int acpi_lapic; > > unsigned int acpi_cpei_override; > > unsigned int acpi_cpei_phys_cpuid; > > > > -unsigned long acpi_wakeup_address = 0; > > - > > #ifdef
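
The behaviour described in the commit message, where the caller supplies both a 32-bit and a 64-bit waking vector and passes 0 for the latter to keep resuming in real mode, can be sketched with a small mock. This is not the ACPICA code: struct mock_facs is a trimmed, hypothetical table, mock_set_firmware_waking_vector() is an invented name, and gating the 64-bit field on version >= 1 is an assumption drawn from the "version of the FACS is greater than 0" remark in the commit message.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down FACS; the real table has more fields. */
struct mock_facs {
	uint8_t version;
	uint32_t firmware_waking_vector;	/* 32-bit vector */
	uint64_t x_firmware_waking_vector;	/* 64-bit vector */
};

/*
 * Always program the 32-bit vector. Program the 64-bit vector only
 * for tables that advertise a newer version (an assumption of this
 * sketch); passing addr64 == 0 then tells firmware that no 64-bit
 * resume entry point is offered.
 */
void mock_set_firmware_waking_vector(struct mock_facs *facs,
				     uint32_t addr32, uint64_t addr64)
{
	facs->firmware_waking_vector = addr32;

	if (facs->version >= 1)
		facs->x_firmware_waking_vector = addr64;
}
```

In this sketch an OS that can only resume from real mode calls mock_set_firmware_waking_vector(facs, wakeup_address, 0), so a stale 64-bit value left in a newer table is cleared rather than picked up by firmware.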
Re: [PATCH] sched: split sched_switch trace event into two
On Wed, 24 Jun 2015 16:19:33 -0700 Cong Wang wrote:

> For compatibility, the sched_switch event is not touched.

Yes, and sched_out() should not be added.

>
> Cc: Steven Rostedt
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Signed-off-by: Cong Wang
> Signed-off-by: Cong Wang
> ---
> include/trace/events/sched.h | 51 +++-
> kernel/sched/core.c | 2 ++
> 2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index d57a575..c31f1e0 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -112,8 +112,57 @@ static inline long __trace_sched_switch_state(struct task_struct *p)
>  #endif /* CREATE_TRACE_POINTS */
>
>  /*
> - * Tracepoint for task switches, performed by the scheduler:
> + * Tracepoints for task switches, performed by the scheduler:
>   */
> +TRACE_EVENT(sched_out,
> +
> +	TP_PROTO(struct task_struct *curr),
> +
> +	TP_ARGS(curr),
> +
> +	TP_STRUCT__entry(
> +		__array(char, comm, TASK_COMM_LEN )
> +		__field(int, prio)
> +		__field(long, state )
> +	),
> +
> +	TP_fast_assign(
> +		__entry->prio = curr->prio;
> +		__entry->state = __trace_sched_switch_state(curr);
> +		memcpy(__entry->comm, curr->comm, TASK_COMM_LEN);
> +	),
> +
> +	TP_printk("comm=%s prio=%d state=%s%s",
> +		__entry->comm, __entry->prio,
> +		__entry->state & (TASK_STATE_MAX-1) ?
> +		  __print_flags(__entry->state & (TASK_STATE_MAX-1), "|",
> +				{ 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
> +				{ 16, "Z" }, { 32, "X" }, { 64, "x" },
> +				{ 128, "K" }, { 256, "W" }, { 512, "P" },
> +				{ 1024, "N" }) : "R",
> +		__entry->state & TASK_STATE_MAX ? "+" : "")
> +);
> +
> +TRACE_EVENT(sched_in,
> +
> +	TP_PROTO(struct task_struct *next),
> +
> +	TP_ARGS(next),
> +
> +	TP_STRUCT__entry(
> +		__array(char, comm, TASK_COMM_LEN )
> +		__field(int, prio)
> +	),
> +
> +	TP_fast_assign(
> +		memcpy(__entry->comm, next->comm, TASK_COMM_LEN);
> +		__entry->prio = next->prio;
> +	),
> +
> +	TP_printk("comm=%s prio=%d",
> +		__entry->comm, __entry->prio)
> +);
> +
>  TRACE_EVENT(sched_switch,
>
>  	TP_PROTO(struct task_struct *prev,
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c86935a..681fc50 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2219,6 +2219,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>  		    struct task_struct *next)
>  {
>  	trace_sched_switch(prev, next);
> +	trace_sched_out(prev);

Tracepoints are low overhead, but they do take up space. This is a
useless tracepoint. If anything, I'll work on adding an alias or
something. But please don't add a tracepoint next to a tracepoint that
encompasses the data.

>  	sched_info_switch(rq, prev, next);
>  	perf_event_task_sched_out(prev, next);
>  	fire_sched_out_preempt_notifiers(prev, next);
> @@ -2288,6 +2289,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	}
>
>  	tick_nohz_task_switch(current);
> +	trace_sched_in(current);

Why not have a:

	sched_switch_post(prev, current);

That way, the hook can be useful for other tools.

-- Steve

>  	return rq;
>  }
>
Re: [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants
On Wed, 2015-06-24 at 18:38 +0200, Luis R. Rodriguez wrote:
> On Wed, Jun 24, 2015 at 08:42:23AM +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2015-06-19 at 15:08 -0700, Luis R. Rodriguez wrote:
> > > From: "Luis R. Rodriguez"
> > >
> > > PCI BARs tell us whether prefetching is safe, but they don't say anything
> > > about write combining (WC). WC changes ordering rules and allows writes to
> > > be collapsed, so it's not safe in general to use it on a prefetchable
> > > region.
> >
> > Well, the PCIe spec at least specifies that a prefetchable BAR also
> > tolerates write merging...
>
> How can that be determined and can that be used as a full bullet proof hint
> to enable wc ? And are you sure? :)

Well, I'm sure the spec says that ;-) But it could be new to PCIe, I
haven't checked legacy PCI.

> Reason all this was stated was to be
> apologetic over why we can't automate this behind the scenes. Otherwise
> we could amend what you stated into the commit log to elaborate on our
> technical apology. Let me know!

At least on powerpc, for mmap of resource to userspace, we take off the
guarded bit in the PTE for prefetchable BARs. This has the effect
architecturally of enabling both prefetch and write combine (ie. side
effect) though afaik, the implementations probably don't actually
prefetch. We've done that for years. In fact we don't have a way to
split the notions, it's either G or no G, which carries both meanings.

Do you have example/case of a device having problems ?

Cheers,
Ben.
Re: [PATCH] ARM64: smp: Silence suspicious RCU usage with ipi tracepoints
On Wed, 24 Jun 2015 23:29:30 +0200 Peter Zijlstra wrote:

> On Wed, Jun 24, 2015 at 01:14:18PM -0700, Stephen Boyd wrote:
> > John Stultz reported an RCU splat on ARM with ipi trace events
> > enabled. It looks like the same problem exists on ARM64.
> >
> > At this point in the IPI handling path we haven't called
> > irq_enter() yet, so RCU doesn't know that we're about to exit
> > idle and properly warns that we're using RCU from an idle CPU.
> > Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
> > that RCU is informed about our exit from idle.
>
> I have a problem with $subject. It says 'silence', whereas afaict this
> fixes an actual bug, so it should be 'fixes'.

Agreed, otherwise

Acked-by: Steven Rostedt

-- Steve
Re: changing format/size of data in TRACE_EVENT(extlog_mem_event)
On Wed, 24 Jun 2015 14:56:49 -0700 "Luck, Tony" wrote:

> So the question is - how can we update the trace event to include these
> new wider fields with the minimum pain to applications that look at it?
> I don't know if there are any other consumers besides rasdaemon at the
> moment ... but we don't want ugly transitions where you have to guess
> which version of the application you need to run to work with a given
> kernel version.

It comes down to if the rasdaemon (and any other user) included the
event_parse.c "library" (it's not a public library yet, and we really
should make it one). Because if it did, it doesn't matter what the field
is, the event descriptions will give the size, and as long as the name of
a field exists, and it doesn't change type (that is, from integer to
string), it should be fine.

-- Steve
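To illustrate the point about field sizes, here is a minimal sketch. It is not the event_parse.c API: `struct field_desc`, `find_field()` and `read_field()` are hypothetical names made up for this example. It only shows why a consumer that looks fields up by name, and takes offset and size from the event description, keeps working when an integer field is widened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified description of one field of a trace event,
 * as a parser might build it from the event's format description. */
struct field_desc {
	const char *name;
	size_t offset;	/* byte offset inside the raw record */
	size_t size;	/* field width in bytes: 1, 2, 4 or 8 */
};

/* Look a field up by name; returns NULL if the event has no such field. */
static const struct field_desc *
find_field(const struct field_desc *fields, size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(fields[i].name, name) == 0)
			return &fields[i];
	return NULL;
}

/* Read an unsigned integer field of any supported width (host endian).
 * Returns 0 on success, -1 if the field is missing or has an odd size. */
static int read_field(const struct field_desc *fields, size_t nr,
		      const void *record, const char *name, uint64_t *val)
{
	const struct field_desc *f = find_field(fields, nr, name);
	const unsigned char *p = record;

	if (!f)
		return -1;

	switch (f->size) {
	case 1: { uint8_t v;  memcpy(&v, p + f->offset, 1); *val = v; break; }
	case 2: { uint16_t v; memcpy(&v, p + f->offset, 2); *val = v; break; }
	case 4: { uint32_t v; memcpy(&v, p + f->offset, 4); *val = v; break; }
	case 8: { uint64_t v; memcpy(&v, p + f->offset, 8); *val = v; break; }
	default:
		return -1;
	}
	return 0;
}
```

A caller that asks for a field by name gets the right value whether the description says the field is 4 or 8 bytes wide, which is the property the reply relies on. A change of type (integer to string) would still break such a caller.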
Re: [PATCH v7 00/16] libnvdimm: non-volatile memory devices
On Wed, 2015-06-17 at 19:13 -0400, Dan Williams wrote:
> A new sub-system in support of non-volatile memory storage devices.
>
> Stephen, please add libnvdimm-for-next to -next:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm libnvdimm-for-next
>
> Changes since v6 [1]:
>
> 1/ Deferred the patches dependent on ->rw_bytes() (BTT - stacked block
>    driver, BLK - mmio aperture windows driver, NFIT_TEST - unit test
>    infrastructure for all libnvdimm + nfit components) to their own
>    patchset. Make the ->rw_bytes() implementation the first patch in
>    that series (Christoph)
>
> 2/ Collected acks from Christoph and Rafael!
>
> 3/ Add a HAS_IOMEM dependency to CONFIG_BLK_DEV_PMEM following commit
>    b6f2098fb708 "block: pmem: Add dependency on HAS_IOMEM" in 4.1-rc8.
>
> 4/ Move libnvdimm to subsys_initcall() and move arch/x86/kernel/pmem.c
>    back to device_initcall(). This allows ACPI_NFIT to be built-in.
>    (Linda)
>
> 5/ Drop the ACPI_DRIVER_ALL_NOTIFY_EVENTS flag in the nfit driver.
>    (Rafael)
>
> 6/ Reference count the nvdimm_drvdata object. This fixes a bug that was
>    found when the unit tests were extended to test disabling an nvdimm
>    while a region device still had references to label data.
> :
> Dan Williams (16):
>   e820, efi: add ACPI 6.0 persistent memory types
>   libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
>   libnvdimm: control character device and nvdimm_bus sysfs attributes
>   libnvdimm, nfit: dimm/memory-devices
>   libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
>   libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
>   libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
>   libnvdimm: support for legacy (non-aliasing) nvdimms
>   libnvdimm, pmem: move pmem to drivers/nvdimm/
>   libnvdimm, pmem: add libnvdimm support to the pmem driver
>   libnvdimm, nfit: add interleave-set state-tracking infrastructure
>   libnvdimm: namespace indices: read and validate
>   libnvdimm: pmem label sets and namespace instantiation.
>   libnvdimm: blk labels and namespace instantiation
>   libnvdimm: write pmem label set
>   libnvdimm: write blk label set

We have been successfully running this patchset on our NFIT-enabled
prototype systems with pmem. (Intel example _DSM, label, blk are not
available for testing.) So for patch 1/16 to 4/16, and 6/16 to 10/16.

Tested-by: Toshi Kani

Thanks,
-Toshi
Re: [Xen-devel] [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants
On Wed, Jun 24, 2015 at 5:52 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2015-06-25 at 02:08 +0200, Luis R. Rodriguez wrote:
>>
>> OK thanks I'll proceed with these patches then.
>>
>> > As for user mappings,
>>
>> Which APIs were you considering in this regard BTW?
>
> mmap of the generic /sys/bus/pci/.../resource*

Like? Got a demo patch in mind ? :)

 Luis
Re: [RFCv2][PATCH 1/7] fs: optimize inotify/fsnotify code for unwatched files
On Wed, 2015-06-24 at 17:16 -0700, Dave Hansen wrote:
> From: Dave Hansen
>
> I have a _tiny_ microbenchmark that sits in a loop and writes
> single bytes to a file. Writing one byte to a tmpfs file is
> around 2x slower than reading one byte from a file, which is a
> _bit_ more than I expected. This is a dumb benchmark, but I think
> it's hard to deny that write() is a hot path and we should avoid
> unnecessary overhead there.
>
> I did a 'perf record' of 30-second samples of read and write.
> The top item in a diffprofile is srcu_read_lock() from
> fsnotify(). There are active inotify fd's from systemd, but
> nothing is actually listening to the file or its part of
> the filesystem.
>
> I *think* we can avoid taking the srcu_read_lock() for the
> common case where there are no actual marks on the file.
> This means that there will both be nothing to notify for
> *and* implies that there is no need for clearing the ignore
> mask.
>
> This patch gave a 13.8% speedup in writes/second on my test,
> which is an improvement from the 10.8% that I saw with the
> last version.
>
> Signed-off-by: Dave Hansen
> Cc: Andrew Morton
> Cc: Jan Kara
> Cc: Al Viro
> Cc: Eric Paris
> Cc: John McCutchan
> Cc: Robert Love
> Cc: Tim Chen
> Cc: Andi Kleen
> Cc: linux-kernel@vger.kernel.org
> ---
>
>  b/fs/notify/fsnotify.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff -puN fs/notify/fsnotify.c~optimize-fsnotify fs/notify/fsnotify.c
> --- a/fs/notify/fsnotify.c~optimize-fsnotify 2015-06-24 17:14:34.573109264 -0700
> +++ b/fs/notify/fsnotify.c 2015-06-24 17:14:34.576109399 -0700
> @@ -213,6 +213,16 @@ int fsnotify(struct inode *to_tell, __u3
>  	    !(test_mask & to_tell->i_fsnotify_mask) &&
>  	    !(mnt && test_mask & mnt->mnt_fsnotify_mask))
>  		return 0;
> +	/*
> +	 * Optimization: srcu_read_lock() has a memory barrier which can
> +	 * be expensive. It protects walking the *_fsnotify_marks lists.
> +	 * However, if we do not walk the lists, we do not have to do
> +	 * SRCU because we have no references to any objects and do not
> +	 * need SRCU to keep them "alive".
> +	 */
> +	if (!to_tell->i_fsnotify_marks.first &&
> +	    (!mnt || !mnt->mnt_fsnotify_marks.first))
> +		return 0;

two useless peeps from the old peanut gallery of long lost

1) should you actually move this check up before the IN_MODIFY check?
   This seems like it would be by far the most common case, and you'd
   save yourself a bunch of useless conditionals/bit operations.

2) do you want to use hlist_empty(&to_tell->i_fsnotify_marks) instead,
   for readability (and they are static inline, so compiled code is the
   same)

It is fine as it is. Don't know how much you want to try to bikeshed...
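Suggestion 2) can be sketched in userspace as follows. The hlist types below are cut-down stand-ins for the kernel's include/linux/list.h definitions, and `struct inode_stub`, `struct mount_stub` and `no_marks_to_walk()` are hypothetical names used only for this illustration; this is not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline bool hlist_empty(const struct hlist_head *h)
{
	return !h->first;
}

/* Hypothetical cut-down versions of the objects fsnotify() looks at. */
struct inode_stub { struct hlist_head i_fsnotify_marks; };
struct mount_stub { struct hlist_head mnt_fsnotify_marks; };

/* True when neither the inode nor the (optional) mount has any marks,
 * i.e. when the mark lists would not be walked at all. */
static bool no_marks_to_walk(const struct inode_stub *to_tell,
			     const struct mount_stub *mnt)
{
	return hlist_empty(&to_tell->i_fsnotify_marks) &&
	       (!mnt || hlist_empty(&mnt->mnt_fsnotify_marks));
}
```

The condition is the same test as the open-coded `.first` checks in the patch; only the spelling changes, which is why the reply calls it a readability question.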
Re: [PATCH v7 5/9] PCI: Add pci_iomap_wc() variants
On Thu, 2015-06-25 at 02:08 +0200, Luis R. Rodriguez wrote:
>
> OK thanks I'll proceed with these patches then.
>
> > As for user mappings,
>
> Which APIs were you considering in this regard BTW?

mmap of the generic /sys/bus/pci/.../resource*

> > maybe the right thing to do is to let us do what we do by
> > default with a quirk that can set a flag in pci_dev to disable that
> > behaviour (maybe on a per BAR basis ?).
>
> That might mean it could restrict userspace WC to require devices
> to have WC parts on a full PCI BAR. Although this is restrictive
> having reviewed most WC uses in the kernel I'd think this would be
> a fair compromise to make, but again, if things are still murky
> perhaps best we kiss this idea good bye for now and hope for it
> to come in on future buses or amendments (if that's even possible?).
>
> > I think the common case is that WC works.
>
> If WC does not I will note one hack which might be worth mentioning --
> just for the record, this was devised as a shortcoming of a device where
> they failed to split things properly and that *without* WC performance
> suffered quite a bit so they made one full PCI BAR WC and as a work
> around this:
>
> http://lkml.kernel.org/r/20150416041837.GA5712@hykim-PC
>
> That is for registers that needed it:
>
> write; wmb;
>
> Then if they wanted to wait till the NIC has seen the write, they did:
>
> write; wmb; read;
>

Right, and as I mentioned, on some archs like powerpc (and possibly
more), writel() and co contains an implicit mb()

> Luis
[RFC PATCH 09/10] mm/compaction: redesign compaction
Currently, compaction works as follows.

1) The migration scanner scans from zone_start_pfn to zone_end_pfn
   to find migratable pages
2) The free scanner scans from zone_end_pfn to zone_start_pfn to
   find free pages
3) If both scanners cross, compaction is finished.

This algorithm has some drawbacks. 1) The back of the zone cannot be
scanned by the migration scanner because the migration scanner can't
pass over the freepage scanner. So, although there are some high order
page candidates at the back of the zone, we can't utilize them.
Another weakness is 2) compaction's success highly depends on the
amount of freepages. Compaction can migrate used pages only up to the
amount of freepages. If we can't make a high order page by this
effort, both scanners meet and compaction fails.

We can easily observe problem 1) with the following test.

Memory is artificially fragmented to make order 3 allocation hard. And,
most of pageblocks are changed to unmovable migratetype.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and 200 MB is
          occupied by memory hogger. Most pageblocks are movable
          migratetype.
  Fragmentation: Successful order 3 allocation candidates may be around
          1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
          with GFP_NORETRY. This value is determined to saturate
          allocation success.

Test: hogger-frag-movable

                                nonmovable
compact_free_scanned               5883401
compact_isolated                     83201
compact_migrate_scanned            2755690
compact_stall                          664
compact_success                        102
pgmigrate_success                    38663
Success:                                26
Success(N):                             56

Columns 'Success' and 'Success(N)' are calculated by the following
equations.

Success = successful allocation * 100 / attempts
Success(N) = successful allocation * 100 /
                number of successful order-3 allocation

As mentioned above, there are roughly 1500 high order page candidates,
but, compaction just returns 56% of them, because the migration scanner
can't pass over the freepage scanner. With the new compaction approach,
it can be increased to 94% by this patch.
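For reference, the two ratios can be written out as below. This is only a restatement of the equations above with integer arithmetic; the function names are hypothetical, and the count of 780 successful allocations used in the note that follows is a made-up illustration, not a number from the test.

```c
#include <assert.h>

/* Success: percentage of allocation attempts that succeeded. */
static unsigned long success_rate(unsigned long succeeded,
				  unsigned long attempts)
{
	return succeeded * 100 / attempts;
}

/* Success(N): percentage of the order-3 candidates that were obtained. */
static unsigned long success_rate_n(unsigned long succeeded,
				    unsigned long candidates)
{
	return succeeded * 100 / candidates;
}
```

With a hypothetical 780 successful allocations, 3000 attempts and 1500 candidates, `success_rate()` gives 26 and `success_rate_n()` gives 52. Since the candidate count in the description is only "roughly 1500", Success(N) is an approximate figure.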
To check 2), the hogger-frag-movable benchmark is used again, but, with
some tweaks. The amount of memory allocated by the memory hogger varies.

Test: hogger-frag-movable with free memory variation

bzImage-improve-base
Hogger:         150MB   200MB   250MB   300MB
Success:           41      25      17       9
Success(N):        87      53      37      22

As background knowledge, up to 250MB, there is enough memory to
succeed all order-3 allocation attempts. In the 300MB case, available
memory before starting allocation attempts is just 57MB, so not all
attempts can succeed.

Anyway, as free memory decreases, the compaction success rate also
decreases. It is better to remove this dependency to get a stable
compaction result in any case.

This patch solves the problems mentioned above. The freepage scanner
is changed to scan the zone from zone_start_pfn to zone_end_pfn. And,
by this change, the compaction finish condition is also changed: the
migration scanner reaches zone_end_pfn. With these changes, the
migration scanner can traverse anywhere in the zone.

To prevent back and forth migration within one compaction iteration,
the freepage scanner marks the skip-bit when scanning a pageblock. The
migration scanner checks it and skips the marked pageblock, so back and
forth migration is not possible in one compaction iteration.

If the freepage scanner reaches the end of the zone, it restarts at
zone_start_pfn. This time, the freepage scanner would scan the
pageblocks where the migration scanner tried to migrate some pages but
failed to make a high order page. These leftover freepages can't
become a high order page due to fragmentation, so they are a good
source for the freepage scanner.
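The interaction between the two scanners described above can be modelled with a toy sketch. Everything here is an assumption made for illustration: `struct toy_zone`, the 8-pageblock zone size and both function names are invented, and the real patch works on pfns, isolates pages and takes locks, none of which is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS 8	/* hypothetical zone size, in pageblocks */

struct toy_zone {
	bool skip[NR_BLOCKS];	/* skip bit per pageblock */
	size_t free_pos;	/* next pageblock for the freepage scanner */
	size_t migrate_pos;	/* next pageblock for the migration scanner */
};

/* Freepage scanner: take the next pageblock, mark its skip bit, and wrap
 * around to the start of the zone after the last pageblock. */
static size_t free_scanner_next(struct toy_zone *z)
{
	size_t block = z->free_pos;

	z->skip[block] = true;
	z->free_pos = (block + 1) % NR_BLOCKS;
	return block;
}

/* Migration scanner: return the next pageblock that the freepage scanner
 * has not marked, or -1 once the end of the zone is reached, which is
 * the finish condition in this model. */
static long migration_scanner_next(struct toy_zone *z)
{
	while (z->migrate_pos < NR_BLOCKS) {
		size_t block = z->migrate_pos++;

		if (!z->skip[block])
			return (long)block;
	}
	return -1;
}
```

In this model a pageblock that served as a migration target is never picked as a migration source in the same pass, which is the back and forth migration the skip bit is meant to prevent.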
With this change, above test result is:

Test: hogger-frag-movable

                                nonmovable    redesign
compact_free_scanned               5883401     8103231
compact_isolated                     83201     3108978
compact_migrate_scanned            2755690     4316163
compact_stall                          664        2117
compact_success                        102         234
pgmigrate_success                    38663     1547318
Success:                                26          45
Success(N):                             56          94

Test: hogger-frag-movable with free memory variation

Hogger:         150MB   200MB   250MB   300MB
bzImage-improve-base
Success:           41      25      17       9
Success(N):        87      53      37      22

bzImage-improve-threshold
Success:           44      44      42      37
Success(N):        94      92      91      80

Compaction gives us almost all possible high order page. Overhead is
highly increased, but, further patch will reduce it greatly by
[RFC PATCH 05/10] mm/compaction: make freepage scanner scans non-movable pageblock
Currently, the freepage scanner doesn't scan non-movable pageblocks,
because if freepages in a non-movable pageblock are exhausted, another
movable pageblock would be used for non-movable allocation and it could
cause fragmentation.

But, we should know that the watermark check for compaction doesn't
consider this reality. So, if all freepages are in non-movable
pageblocks, although the system has enough freepages and the watermark
check is passed, the freepage scanner can't get any freepage and
compaction will fail. There is no way to get the precise number of
freepages on movable pageblocks and no way to reclaim only used pages
in movable pageblocks. Therefore, I think that the best way to overcome
this situation is to use freepages in non-movable pageblocks in
compaction.

My test setup for this situation is:

Memory is artificially fragmented to make order 3 allocation hard. And,
most of pageblocks are changed to unmovable migratetype.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and kernel
          build is running on background.
  Fragmentation: Successful order 3 allocation candidates may be around
          1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
          with GFP_NORETRY. This value is determined to saturate
          allocation success.

Below is the result of this test.

Test: build-frag-unmovable

                                      base  nonmovable
compact_free_scanned               5032378     4110920
compact_isolated                     53368      330762
compact_migrate_scanned            1456516     6164677
compact_stall                          538         746
compact_success                         93         350
pgmigrate_success                    19926      152754
Success:                                15          31
Success(N):                             33          65

Columns 'Success' and 'Success(N)' are calculated by the following
equations.

Success = successful allocation * 100 / attempts
Success(N) = successful allocation * 100 / order 3 candidate

The result shows that the success rate is doubled in this case because
we can search more area. But, we can observe a regression in another
case.
Test: stress-highalloc in mmtests
(tweaks to request order-7 unmovable allocation)

Ops 1                                30.00        8.33
Ops 2                                32.33       26.67
Ops 3                                91.67       92.00
Compaction stalls                     5110        5581
Compaction success                    1787        1807
Compaction failures                   3323        3774
Compaction pages isolated          6370911    15421622
Compaction migrate scanned        52681405    83721428
Compaction free scanned          418049611   579768237
Compaction cost                       3745        8822

Although this regression is bad, there are also much improvement
in other cases that most of pageblocks are non-movable migratetype.
IMHO, this patch can be justified by this improvement. Moreover,
this regression disappears after applying following patches, so
we don't need to worry about regression much.

Migration scanner already scans non-movable pageblock and make some
freepage in that pageblock through migration. So, even if freepage
scanner scans non-movable pageblock and uses freepage in that pageblock,
number of freepages on non-movable pageblock wouldn't diminish much and
wouldn't cause much fragmentation.

Signed-off-by: Joonsoo Kim
---
 mm/compaction.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index dd2063b..8d1b3b5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -905,12 +905,8 @@ static bool suitable_migration_target(struct page *page)
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
-		return true;
-
-	/* Otherwise skip the block */
-	return false;
+	/* Otherwise scan the block */
+	return true;
 }
 
 /*
-- 
1.9.1
[RFC PATCH 00/10] redesign compaction algorithm
Recently, I got a report that android gets slow due to order-2 page
allocation. With some investigation, I found that compaction usually
fails and many pages are reclaimed to make an order-2 freepage. I can't
analyze the detailed reason that causes compaction to fail because I
don't have a reproducible environment and the compaction code has
changed so much from that version, v3.10. But, I was inspired by this
report and started to think about the limitations of the current
compaction algorithm.

Limitations of the current compaction algorithm:

1) The migrate scanner can't scan behind the free scanner, because
the scanners start at opposite sides of the zone and go toward each
other. If they meet at some point, compaction is stopped and the
scanners' positions are reset to both sides of the zone again. From my
experience, the migrate scanner usually doesn't scan beyond half of the
zone range.

2) Compaction capability highly depends on the amount of free memory.
If there is 50 MB free memory on a 4 GB system, the migrate scanner can
migrate 50 MB of used pages at maximum and then will meet the free
scanner. If compaction can't make enough high order freepages during
this amount of work, compaction fails. There is no way to escape this
failure situation in the current algorithm and it will scan the same
region and fail again and again. And then, it goes into the compaction
deferring logic and will be deferred for some time.

3) Compaction capability highly depends on the migratetype of memory,
because the freepage scanner doesn't scan unmovable pageblocks.

To investigate compaction limitations, I made some compaction
benchmarks. The base environment of these benchmarks is fragmented
memory. Before testing, 25% of the total size of memory is allocated.
With some tricks, these allocations are evenly distributed over the
whole memory range. So, after allocation is finished, memory is highly
fragmented and the possibility of a successful order-3 allocation is
very low. Roughly 1500 order-3 allocations can be successful.
Tests attempt an excessive amount of allocation requests, that is,
3000, to find out the algorithm's limitation.

There are two variations.

pageblock type (unmovable / movable):

One is that most pageblocks are unmovable migratetype and the other is
that most pageblocks are movable migratetype.

memory usage (memory hogger 200 MB / kernel build with -j8):

Memory hogger means that 200 MB of free memory is occupied by the
hogger. Kernel build means that a kernel build is running in the
background and it will consume free memory, but the amount of
consumption fluctuates a lot.

With these variations, I made 4 test cases by mixing them.

hogger-frag-unmovable
hogger-frag-movable
build-frag-unmovable
build-frag-movable

All tests are conducted on a 512 MB QEMU virtual machine with 8 CPUs.

I can easily check the weakness of the compaction algorithm with the
following test.

To check 1), the hogger-frag-movable benchmark is used. The result is
as follows.

                                bzImage-improve-base
compact_free_scanned               5240676
compact_isolated                     75048
compact_migrate_scanned            2468387
compact_stall                          710
compact_success                         98
pgmigrate_success                    34869
Success:                                25
Success(N):                             53

Columns 'Success' and 'Success(N)' are calculated by the following
equations.

Success = successful allocation * 100 / attempts
Success(N) = successful allocation * 100 /
                number of successful order-3 allocation

As mentioned above, there are roughly 1500 high order page candidates,
but, compaction just returns 53% of them. With the new compaction
approach, it can be increased to 94%. See the result at the end of this
cover-letter.

To check 2), the hogger-frag-movable benchmark is used again, but, with
some tweaks. The amount of memory allocated by the memory hogger varies.

bzImage-improve-base
Hogger:         150MB   200MB   250MB   300MB
Success:           41      25      17       9
Success(N):        87      53      37      22

As background knowledge, up to 250MB, there is enough memory to
succeed all order-3 allocation attempts. In the 300MB case, available
memory before starting allocation attempts is just 57MB, so not all
attempts can succeed.
Anyway, as free memory decreases, compaction success rate also
decreases. It is better to remove this dependency to get stable
compaction result in any case.

To check 3), build-frag-unmovable/movable benchmarks are used.
All factors are same except pageblock migratetypes.

Test: build-frag-unmovable

                                bzImage-improve-base
compact_free_scanned               5032378
compact_isolated                     53368
compact_migrate_scanned            1456516
compact_stall                          538
compact_success                         93
pgmigrate_success                    19926
Success:                                15
Success(N):                             33

Test: build-frag-movable

                                bzImage-improve-base
compact_free_scanned               3059086
compact_isolated                    129085
compact_migrate_scanned            5029856
compact_stall                          388
compact_success                         99
[RFC PATCH 03/10] mm/compaction: always update cached pfn
Signed-off-by: Joonsoo Kim
---
 mm/compaction.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9c5d43c..2d8e211 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -510,6 +510,10 @@ isolate_fail:
 	if (locked)
 		spin_unlock_irqrestore(&cc->zone->lock, flags);
 
+	if (blockpfn == end_pfn &&
+		blockpfn > cc->zone->compact_cached_free_pfn)
+		cc->zone->compact_cached_free_pfn = blockpfn;
+
 	update_pageblock_skip(cc, valid_page, total_isolated,
 			*start_pfn, end_pfn, blockpfn, false);
 
@@ -811,6 +815,13 @@ isolate_success:
 	if (locked)
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 
+	if (low_pfn == end_pfn && cc->mode != MIGRATE_ASYNC) {
+		int sync = cc->mode != MIGRATE_ASYNC;
+
+		if (low_pfn > zone->compact_cached_migrate_pfn[sync])
+			zone->compact_cached_migrate_pfn[sync] = low_pfn;
+	}
+
 	update_pageblock_skip(cc, valid_page, nr_isolated,
 			start_pfn, end_pfn, low_pfn, true);
 
-- 
1.9.1
[RFC PATCH 07/10] mm/compaction: limit compaction activity in compaction depleted state
Compaction deferring was introduced to reduce the overhead of
compaction when a compaction attempt is expected to fail. But, it has a
problem. The whole zone is rescanned after some compaction attempts are
deferred and this rescan overhead is quite big. And, it imposes large
latency on one random requestor while others get nearly zero latency to
fail due to deferring compaction. This patch tries to handle this
situation differently to solve the above problems.

At first, we should know when compaction will fail. The previous patch
defines the compaction depleted state. In this state, compaction
failure is highly expected so we don't need to put much effort into
compaction. So, this patch forces the migration scanner to scan a
restricted number of pages in this state. This way, we can evenly
distribute compaction overhead to all compaction requestors. And, there
is a way to escape from the compaction depleted state so we don't need
to defer a specific number of compaction attempts unconditionally if
compaction possibility recovers.

In this patch, the migration scanner limit is defined to imitate the
current compaction deferring approach. But, we can tune it easily if
this overhead doesn't look appropriate. It would be further work.

There would be a situation where the compaction depleted state is
maintained for a long time. In this case, repeated compaction attempts
would cause useless overhead continually. To optimize this case, this
patch introduces a compaction depletion depth and makes the migration
scanner limit diminish according to this depth. It effectively reduces
compaction overhead in this situation.
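The scan limit arithmetic can be tried out in userspace with the sketch below. It follows the sync-compaction path of this patch's set_migration_scan_limit(), but it is not the kernel code: `depleted_scan_limit()` and `max_ul()` are hypothetical names, and the two constants are assumed values standing in for pageblock_nr_pages and COMPACT_CLUSTER_MAX, which depend on the kernel configuration.

```c
#include <assert.h>

/* Assumed values; the real ones depend on kernel configuration. */
#define PAGEBLOCK_NR_PAGES	512UL	/* stands in for pageblock_nr_pages */
#define CLUSTER_MAX		32UL	/* stands in for COMPACT_CLUSTER_MAX */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Number of pages the migration scanner may scan in the depleted state. */
static unsigned long depleted_scan_limit(unsigned long managed_pages,
					 unsigned long depth)
{
	unsigned long limit = managed_pages >> 6;	/* 1/64 of the zone */

	/* Guard added for this sketch: avoid an oversized shift below. */
	if (depth >= 8 * sizeof(limit))
		return CLUSTER_MAX;

	limit <<= 1;		/* extra credit for sync compaction */
	limit = max_ul(limit, PAGEBLOCK_NR_PAGES);
	limit >>= depth;	/* shrink while depletion persists */
	return max_ul(limit, CLUSTER_MAX);
}
```

For a zone of 131072 managed pages (512 MB with 4 KB pages), the limit starts at 4096 pages at depth 0, halves with each extra level of depth, and bottoms out at the cluster size.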
Signed-off-by: Joonsoo Kim
---
 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 61 --
 mm/internal.h          |  1 +
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd9f1a5..700e9b5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -518,6 +518,7 @@ struct zone {
 	unsigned int		compact_defer_shift;
 	int			compact_order_failed;
 	unsigned long		compact_success;
+	unsigned long		compact_depletion_depth;
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
diff --git a/mm/compaction.c b/mm/compaction.c
index 9f259b9..aff536f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -130,6 +130,7 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 #define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
 
 static bool compaction_depleted(struct zone *zone)
 {
@@ -147,6 +148,48 @@ static bool compaction_depleted(struct zone *zone)
 	return true;
 }
 
+static void set_migration_scan_limit(struct compact_control *cc)
+{
+	struct zone *zone = cc->zone;
+	int order = cc->order;
+	unsigned long limit;
+
+	cc->migration_scan_limit = LONG_MAX;
+	if (order < 0)
+		return;
+
+	if (!test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+		return;
+
+	if (!zone->compact_depletion_depth)
+		return;
+
+	/* Stop async migration if depleted */
+	if (cc->mode == MIGRATE_ASYNC) {
+		cc->migration_scan_limit = -1;
+		return;
+	}
+
+	/*
+	 * Deferred compaction restart compaction every 64 compaction
+	 * attempts and it rescans whole zone range. If we limit
+	 * migration scanner to scan 1/64 range when depleted, 64
+	 * compaction attempts will rescan whole zone range as same
+	 * as deferred compaction.
+	 */
+	limit = zone->managed_pages >> 6;
+
+	/*
+	 * We don't do async compaction.
+	 * Instead, give extra credit
+	 * to sync compaction
+	 */
+	limit <<= 1;
+	limit = max(limit, COMPACT_MIN_SCAN_LIMIT);
+
+	/* Degradation scan limit according to depletion depth. */
+	limit >>= zone->compact_depletion_depth;
+	cc->migration_scan_limit = max(limit, COMPACT_CLUSTER_MAX);
+}
 /*
  * Compaction is deferred when compaction fails to result in a page
  * allocation success. 1 << compact_defer_limit compactions are skipped up
@@ -243,8 +286,14 @@ static void __reset_isolation_suitable(struct zone *zone)
 	zone->compact_cached_free_pfn = end_pfn;
 	zone->compact_blockskip_flush = false;
 
-	if (compaction_depleted(zone))
-		set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+	if (compaction_depleted(zone)) {
+		if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+			zone->compact_depletion_depth++;
+		else {
+			set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+			zone->compact_depletion_depth = 0;
+		}
+	}
 	zone->compact_success = 0;
 
 	/* Walk the zone and mark
[RFC PATCH 06/10] mm/compaction: introduce compaction depleted state on zone
Further compaction attempts are deferred when some compaction attempts
have already failed. But, after some number of attempts are skipped,
compaction restarts to check whether compaction is now possible or not.
It scans the whole range of the zone to determine this possibility and
if the compaction possibility doesn't recover, this whole range scan is
quite a big overhead. As a first step to reduce this overhead, this
patch implements a compaction depleted state on the zone.

The way to determine depletion of compaction possibility is checking
the number of successes in previous compaction attempts. If the number
of successful compactions is below the specified threshold, we guess
that compaction will not be successful next time so mark the zone as
compaction depleted. In this patch, the threshold is chosen as 1 to
imitate the current compaction deferring algorithm. In the following
patch, the compaction algorithm will be changed and this threshold is
also adjusted to that change.

In this patch, only the state definition is implemented. There is no
action for this new state so no functional change. But, a following
patch will add some handling for this new state.
Signed-off-by: Joonsoo Kim
---
 include/linux/mmzone.h |  2 ++
 mm/compaction.c        | 38 +++---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 754c259..bd9f1a5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -517,6 +517,7 @@ struct zone {
 	unsigned int		compact_considered;
 	unsigned int		compact_defer_shift;
 	int			compact_order_failed;
+	unsigned long		compact_success;
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -543,6 +544,7 @@ enum zone_flags {
 					 * many pages under writeback
 					 */
 	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
+	ZONE_COMPACTION_DEPLETED,	/* compaction possibility depleted */
 };
 
 static inline unsigned long zone_end_pfn(const struct zone *zone)
diff --git a/mm/compaction.c b/mm/compaction.c
index 8d1b3b5..9f259b9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
+#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+
+static bool compaction_depleted(struct zone *zone)
+{
+	unsigned long threshold;
+	unsigned long success = zone->compact_success;
+
+	/*
+	 * Now, to imitate current compaction deferring approach,
+	 * choose threshold to 1. It will be changed in the future.
+*/ + threshold = COMPACT_MIN_DEPLETE_THRESHOLD; + if (success >= threshold) + return false; + + return true; +} /* * Compaction is deferred when compaction fails to result in a page @@ -226,6 +243,10 @@ static void __reset_isolation_suitable(struct zone *zone) zone->compact_cached_free_pfn = end_pfn; zone->compact_blockskip_flush = false; + if (compaction_depleted(zone)) + set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags); + zone->compact_success = 0; + /* Walk the zone and mark every pageblock as suitable for isolation */ for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { struct page *page; @@ -1197,22 +1218,28 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc, bool can_steal; /* Job done if page is free of the right migratetype */ - if (!list_empty(&area->free_list[migratetype])) + if (!list_empty(&area->free_list[migratetype])) { + zone->compact_success++; return COMPACT_PARTIAL; + } #ifdef CONFIG_CMA /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */ if (migratetype == MIGRATE_MOVABLE && - !list_empty(&area->free_list[MIGRATE_CMA])) + !list_empty(&area->free_list[MIGRATE_CMA])) { + zone->compact_success++; return COMPACT_PARTIAL; + } #endif /* * Job done if allocation would steal freepages from * other migratetype buddy lists. */ if (find_suitable_fallback(area, order, migratetype, - true, &can_steal) != -1) + true, &can_steal) != -1) { + zone->compact_success++; return COMPACT_PARTIAL; + } } return COMPACT_NO_SUITABLE_PAGE; @@ -1452,6 +1479,11 @@ out: trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
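For readers who want to step through the depleted-state bookkeeping from the patch above outside the kernel, here is a minimal user-space sketch. It is only an illustration under stated assumptions: struct zone_sketch, its depleted field and reset_isolation_sketch() are hypothetical stand-ins, not kernel API. Only compact_success, COMPACT_MIN_DEPLETE_THRESHOLD and the comparison in compaction_depleted() are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL

/* Hypothetical stand-in for the struct zone fields touched by the patch. */
struct zone_sketch {
	unsigned long compact_success; /* successes since the last reset */
	bool depleted;                 /* models the ZONE_COMPACTION_DEPLETED bit */
};

/* Same comparison as compaction_depleted() in the diff. */
static bool compaction_depleted(const struct zone_sketch *zone)
{
	return zone->compact_success < COMPACT_MIN_DEPLETE_THRESHOLD;
}

/* Models the hunk the patch adds to __reset_isolation_suitable(). */
static void reset_isolation_sketch(struct zone_sketch *zone)
{
	assert(zone != NULL);

	/* Mark the zone if too few compactions succeeded since the last reset. */
	if (compaction_depleted(zone))
		zone->depleted = true;

	/* Start counting successes afresh for the next scan cycle. */
	zone->compact_success = 0;
}
```

As in the patch, the flag is only ever set here. Nothing in this sketch clears it or acts on it, which matches the "no functional change" statement in the commit message.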
RE: [PATCH v2 05/28] ACPICA: Hardware: Enable firmware waking vector for both 32-bit and 64-bit FACS.
Hi, Rafael > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net] > Sent: Thursday, June 25, 2015 7:57 AM > > On Wednesday, June 24, 2015 11:02:54 AM Lv Zheng wrote: > > ACPICA commit 368eb60778b27b6ae94d3658ddc902ca1342a963 > > ACPICA commit 70f62a80d65515e1285fdeeb50d94ee6f07df4bd > > > > The following commit is reported to have broken s2ram on some platforms: > > Commit: 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd > > ACPICA: Add option to favor 32-bit FADT addresses. > > The platform reports 2 FACS tables (which is not allowed by ACPI > > specification) and the new 32-bit address favor rule forces OSPMs to use > > the FACS table reported via FADT's X_FIRMWARE_CTRL field. > > > > The root cause of the reported bug might be one of the followings: > > 1. BIOS may favor the 64-bit firmware waking vector address when the > >version of the FACS is greater than 0 and Linux currently only supports > >resuming from the real mode, so the 64-bit firmware waking vector has > >never been set and might be invalid to BIOS while the commit enables > >higher version FACS. > > 2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the > >FADT while the commit doesn't set the firmware waking vector address of > >the FACS reported by "FIRMWARE_CTRL", it only sets the firware waking > >vector address of the FACS reported by "X_FIRMWARE_CTRL". > > > > This patch excludes the cases that can trigger the bugs caused by the root > > cause 2. > > > > There is no handshaking mechanism can be used by OSPM to tell BIOS which > > FACS is currently used. Thus the FACS reported by "FIRMWARE_CTRL" may still > > be used by BIOS and the 0 value of the 32-bit firmware waking vector might > > trigger such failure. > > > > This patch enables the firmware waking vectors for both 32bit/64bit FACS > > tables in order to ensure we can exclude the cases that trigger the bugs > > caused by the root cause 2. 
The exclusion is split into 2 commits so that > > if it turns out not to be necessary, this single commit can be reverted > > without affecting the useful one. Lv Zheng, Bob Moore. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021 > > Link: https://github.com/acpica/acpica/commit/368eb607 > > Link: https://github.com/acpica/acpica/commit/70f62a80 > > Reported-and-tested-by: Oswald Buddenhagen > > Signed-off-by: Lv Zheng > > Signed-off-by: Bob Moore > > --- > > drivers/acpi/acpica/acglobal.h |2 ++ > > drivers/acpi/acpica/hwxfsleep.c | 74 > > --- > > drivers/acpi/acpica/tbutils.c | 14 > > 3 files changed, 71 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/acpi/acpica/acglobal.h b/drivers/acpi/acpica/acglobal.h > > index a0c4787..53f96a3 100644 > > --- a/drivers/acpi/acpica/acglobal.h > > +++ b/drivers/acpi/acpica/acglobal.h > > @@ -61,6 +61,8 @@ ACPI_GLOBAL(struct acpi_table_header, > > acpi_gbl_original_dsdt_header); > > > > #if (!ACPI_REDUCED_HARDWARE) > > ACPI_GLOBAL(struct acpi_table_facs *, acpi_gbl_FACS); > > +ACPI_GLOBAL(struct acpi_table_facs *, acpi_gbl_facs32); > > +ACPI_GLOBAL(struct acpi_table_facs *, acpi_gbl_facs64); > > > > #endif /* !ACPI_REDUCED_HARDWARE */ > > > > diff --git a/drivers/acpi/acpica/hwxfsleep.c > > b/drivers/acpi/acpica/hwxfsleep.c > > index c67cd32..e273b2e 100644 > > --- a/drivers/acpi/acpica/hwxfsleep.c > > +++ b/drivers/acpi/acpica/hwxfsleep.c > > @@ -50,6 +50,13 @@ > > ACPI_MODULE_NAME("hwxfsleep") > > > > /* Local prototypes */ > > +#if (!ACPI_REDUCED_HARDWARE) > > +static acpi_status > > +acpi_hw_set_firmware_waking_vector(struct acpi_table_facs *facs, > > + acpi_physical_address physical_address, > > + acpi_physical_address physical_address64); > > +#endif > > + > > static acpi_status acpi_hw_sleep_dispatch(u8 sleep_state, u32 function_id); > > > > /* > > @@ -79,9 +86,10 @@ static struct acpi_sleep_functions acpi_sleep_dispatch[] > > = { > > #if (!ACPI_REDUCED_HARDWARE) > > > > /*** > > * > > - * 
FUNCTION:acpi_set_firmware_waking_vector > > + * FUNCTION:acpi_hw_set_firmware_waking_vector > > * > > - * PARAMETERS: physical_address- 32-bit physical address of ACPI real > > mode > > + * PARAMETERS: facs- Pointer to FACS table > > + * physical_address- 32-bit physical address of ACPI real > > mode > > *entry point > > * physical_address64 - 64-bit physical address of ACPI > > protected > > *entry point > > @@ -92,11 +100,12 @@ static struct acpi_sleep_functions > > acpi_sleep_dispatch[] = { > > * > > > > **/ > > > > -acpi_status > >
[RFC PATCH 10/10] mm/compaction: new threshold for compaction depleted zone
Now the compaction algorithm has become powerful: the migration scanner traverses the whole zone range. So the old threshold for a depleted zone, which was designed to imitate the compaction deferring approach, isn't appropriate for the current compaction algorithm. If we adhere to the current threshold, 1, we can't avoid excessive overhead caused by compaction, because one compaction for a low order allocation would easily succeed in any situation. This patch re-implements the threshold calculation based on zone size and the requested allocation order. We judge whether compaction possibility is depleted or not by the number of successful compactions. Roughly, 1/100 of the future scanned area should be allocated for high order pages during one compaction iteration in order to determine whether the zone's compaction possibility is depleted or not. Below is the test result with the following setup. Memory is artificially fragmented to make order 3 allocation hard, and most pageblocks are changed to the unmovable migratetype. System: 512 MB with 32 MB Zram Memory: 25% memory is allocated to make fragmentation and 200 MB is occupied by memory hogger. Most pageblocks are unmovable migratetype. Fragmentation: Successful order 3 allocation candidates may be around 1500 roughly. Allocation attempts: Roughly 3000 order 3 allocation attempts with GFP_NORETRY. This value is determined to saturate allocation success. Test: hogger-frag-unmovable redesign threshold compact_free_scanned 64410952235764 compact_isolated 2711081 647701 compact_migrate_scanned41754641697292 compact_stall 2059 2092 compact_success207210 pgmigrate_success 1348113 318395 Success:44 40 Success(N): 90 83 This change greatly decreases compaction overhead when the zone's compaction possibility is nearly depleted. But I should admit that it's not perfect, because the compaction success rate is decreased. More precise tuning of the threshold would restore this regression, but it highly depends on workload, so I'm not doing it here. The other test doesn't show any regression.
System: 512 MB with 32 MB Zram Memory: 25% memory is allocated to make fragmentation and kernel build is running on background. Most pageblocks are movable migratetype. Fragmentation: Successful order 3 allocation candidates may be around 1500 roughly. Allocation attempts: Roughly 3000 order 3 allocation attempts with GFP_NORETRY. This value is determined to saturate allocation success. Test: build-frag-movable redesign threshold compact_free_scanned 23595531461131 compact_isolated907515 387373 compact_migrate_scanned37856052177090 compact_stall 2195 2157 compact_success247225 pgmigrate_success 439739 182366 Success:43 43 Success(N): 89 90 Signed-off-by: Joonsoo Kim --- mm/compaction.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 99f533f..63702b3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn, /* Do not skip compaction more than 64 times */ #define COMPACT_MAX_FAILED 4 -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages) static bool compaction_depleted(struct zone *zone) { - unsigned long threshold; + unsigned long nr_possible; unsigned long success = zone->compact_success; + unsigned long threshold; - /* -* Now, to imitate current compaction deferring approach, -* choose threshold to 1. It will be changed in the future. 
-*/ - threshold = COMPACT_MIN_DEPLETE_THRESHOLD; + nr_possible = zone->managed_pages >> zone->compact_order_failed; + + /* Migration scanner can scans more than 1/4 range of zone */ + nr_possible >>= 2; + + /* We hope to succeed more than 1/100 roughly */ + threshold = nr_possible >> 7; + + threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD); if (success >= threshold) return false; -- 1.9.1
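The threshold arithmetic in the patch above can be tried in isolation with a small user-space sketch. This is an illustration only: depletion_threshold() is a hypothetical helper, and its two parameters stand in for zone->managed_pages and zone->compact_order_failed. The shifts and the minimum of 4 are copied from the diff.

```c
#include <assert.h>

#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL

/*
 * Mirrors the arithmetic in the patched compaction_depleted().
 * managed_pages and order_failed are stand-ins for zone->managed_pages
 * and zone->compact_order_failed.
 */
static unsigned long depletion_threshold(unsigned long managed_pages,
					 unsigned int order_failed)
{
	unsigned long nr_possible;
	unsigned long threshold;

	/* A shift by the full word width or more would be undefined. */
	assert(order_failed < 8 * sizeof(unsigned long));

	/* Upper bound on how many blocks of the failed order fit in the zone. */
	nr_possible = managed_pages >> order_failed;

	/* The patch takes a quarter of the zone as the scanned range. */
	nr_possible >>= 2;

	/* ">> 7" divides by 128, the patch's rough stand-in for 1/100. */
	threshold = nr_possible >> 7;

	return threshold > COMPACT_MIN_DEPLETE_THRESHOLD ?
		threshold : COMPACT_MIN_DEPLETE_THRESHOLD;
}
```

As a worked example, assume a single zone with 131072 managed pages (512 MB of 4 KiB pages, ignoring reserved memory) and a failed order of 3. That gives 131072 >> 3 = 16384, then 16384 >> 2 = 4096, then 4096 >> 7 = 32, so the zone would count as depleted if fewer than 32 compactions succeeded since the last reset. A zone of 1024 pages computes 0 and falls back to the minimum of 4.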
[RFC PATCH 08/10] mm/compaction: remove compaction deferring
Now we have a way to determine the compaction depleted state, and compaction activity will be limited according to this state and the depletion depth, so compaction overhead will be well controlled without compaction deferring. So this patch removes compaction deferring completely. Various functions are renamed and tracepoint outputs are changed due to this removal. Signed-off-by: Joonsoo Kim --- include/linux/compaction.h| 14 +--- include/linux/mmzone.h| 3 +- include/trace/events/compaction.h | 30 +++- mm/compaction.c | 74 ++- mm/page_alloc.c | 2 +- mm/vmscan.c | 4 +-- 6 files changed, 37 insertions(+), 90 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index aa8f61c..8d98f3c 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -45,11 +45,8 @@ extern void reset_isolation_suitable(pg_data_t *pgdat); extern unsigned long compaction_suitable(struct zone *zone, int order, int alloc_flags, int classzone_idx); -extern void defer_compaction(struct zone *zone, int order); -extern bool compaction_deferred(struct zone *zone, int order); -extern void compaction_defer_reset(struct zone *zone, int order, +extern void compaction_failed_reset(struct zone *zone, int order, bool alloc_success); -extern bool compaction_restarting(struct zone *zone, int order); #else static inline unsigned long try_to_compact_pages(gfp_t gfp_mask, @@ -74,15 +71,6 @@ static inline unsigned long compaction_suitable(struct zone *zone, int order, return COMPACT_SKIPPED; } -static inline void defer_compaction(struct zone *zone, int order) -{ -} - -static inline bool compaction_deferred(struct zone *zone, int order) -{ - return true; -} - #endif /* CONFIG_COMPACTION */ #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 700e9b5..e13b732 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -514,8 +514,7 @@ struct zone { * are skipped before trying
again. The number attempted since * last failure is tracked with compact_considered. */ - unsigned intcompact_considered; - unsigned intcompact_defer_shift; + int compact_failed; int compact_order_failed; unsigned long compact_success; unsigned long compact_depletion_depth; diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index 9a6a3fe..323e614 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -239,7 +239,7 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable, ); #ifdef CONFIG_COMPACTION -DECLARE_EVENT_CLASS(mm_compaction_defer_template, +DECLARE_EVENT_CLASS(mm_compaction_deplete_template, TP_PROTO(struct zone *zone, int order), @@ -249,8 +249,9 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template, __field(int, nid) __field(char *, name) __field(int, order) - __field(unsigned int, considered) - __field(unsigned int, defer_shift) + __field(unsigned long, success) + __field(unsigned long, depletion_depth) + __field(int, failed) __field(int, order_failed) ), @@ -258,35 +259,30 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template, __entry->nid = zone_to_nid(zone); __entry->name = (char *)zone->name; __entry->order = order; - __entry->considered = zone->compact_considered; - __entry->defer_shift = zone->compact_defer_shift; + __entry->success = zone->compact_success; + __entry->depletion_depth = zone->compact_depletion_depth; + __entry->failed = zone->compact_failed; __entry->order_failed = zone->compact_order_failed; ), - TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu", + TP_printk("node=%d zone=%-8s order=%d failed=%d order_failed=%d consider=%lu depth=%lu", __entry->nid, __entry->name, __entry->order, + __entry->failed, __entry->order_failed, - __entry->considered, - 1UL << __entry->defer_shift) + __entry->success, + __entry->depletion_depth) ); -DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred, 
+DEFINE_EVENT(mm_compaction_deplete_template, mm_compaction_fail_compaction, TP_PROTO(struct zone *zone, int order), TP_ARGS(zone, order) );
[RFC PATCH 04/10] mm/compaction: clean-up restarting condition check
Rename the check function and move one outer condition check into it. There is no functional change. Signed-off-by: Joonsoo Kim --- mm/compaction.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 2d8e211..dd2063b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -188,8 +188,11 @@ void compaction_defer_reset(struct zone *zone, int order, } /* Returns true if restarting compaction after many failures */ -bool compaction_restarting(struct zone *zone, int order) +static bool compaction_direct_restarting(struct zone *zone, int order) { + if (current_is_kswapd()) + return false; + if (order < zone->compact_order_failed) return false; @@ -1327,7 +1330,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc) * is about to be retried after being deferred. kswapd does not do * this reset as it'll reset the cached information when going to sleep. */ - if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) + if (compaction_direct_restarting(zone, cc->order)) __reset_isolation_suitable(zone); /* -- 1.9.1
[RFC PATCH 01/10] mm/compaction: update skip-bit if whole pageblock is really scanned
Scanning of a pageblock stops in the middle of the pageblock if enough pages are isolated. In the next run, it begins again at this position, and if it finds no isolation candidate from the middle of the pageblock to the end of the pageblock, it updates the skip-bit. In this case, the scanner didn't start at the beginning of the pageblock, so it is not appropriate to set the skip-bit. This patch fixes this so that the skip-bit is updated only when the whole pageblock has really been scanned. Signed-off-by: Joonsoo Kim --- mm/compaction.c | 32 ++-- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 6ef2fdf..4397bf7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -261,7 +261,8 @@ void reset_isolation_suitable(pg_data_t *pgdat) */ static void update_pageblock_skip(struct compact_control *cc, struct page *page, unsigned long nr_isolated, - bool migrate_scanner) + unsigned long start_pfn, unsigned long end_pfn, + unsigned long curr_pfn, bool migrate_scanner) { struct zone *zone = cc->zone; unsigned long pfn; @@ -275,6 +276,13 @@ static void update_pageblock_skip(struct compact_control *cc, if (nr_isolated) return; + /* Update the pageblock-skip if the whole pageblock was scanned */ + if (curr_pfn != end_pfn) + return; + + if (start_pfn != round_down(end_pfn - 1, pageblock_nr_pages)) + return; + set_pageblock_skip(page); pfn = page_to_pfn(page); @@ -300,7 +308,8 @@ static inline bool isolation_suitable(struct compact_control *cc, static void update_pageblock_skip(struct compact_control *cc, struct page *page, unsigned long nr_isolated, - bool migrate_scanner) + unsigned long start_pfn, unsigned long end_pfn, + unsigned long curr_pfn, bool migrate_scanner) { } #endif /* CONFIG_COMPACTION */ @@ -493,9 +502,6 @@ isolate_fail: trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn, nr_scanned, total_isolated); - /* Record how far we have got within the block */ - *start_pfn = blockpfn; - /* * If strict isolation is requested by CMA then check
that all the * pages requested were isolated. If there were any failures, 0 is @@ -507,9 +513,11 @@ isolate_fail: if (locked) spin_unlock_irqrestore(&cc->zone->lock, flags); - /* Update the pageblock-skip if the whole pageblock was scanned */ - if (blockpfn == end_pfn) - update_pageblock_skip(cc, valid_page, total_isolated, false); + update_pageblock_skip(cc, valid_page, total_isolated, + *start_pfn, end_pfn, blockpfn, false); + + /* Record how far we have got within the block */ + *start_pfn = blockpfn; count_compact_events(COMPACTFREE_SCANNED, nr_scanned); if (total_isolated) @@ -806,12 +814,8 @@ isolate_success: if (locked) spin_unlock_irqrestore(&zone->lru_lock, flags); - /* -* Update the pageblock-skip information and cached scanner pfn, -* if the whole pageblock was scanned without isolating any page. -*/ - if (low_pfn == end_pfn) - update_pageblock_skip(cc, valid_page, nr_isolated, true); + update_pageblock_skip(cc, valid_page, nr_isolated, + start_pfn, end_pfn, low_pfn, true); trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn, nr_scanned, nr_isolated); -- 1.9.1
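The two conditions the patch above adds to update_pageblock_skip() can be checked in user space with a short sketch. It is an illustration under assumptions: whole_pageblock_scanned() and round_down_pow2() are hypothetical helpers, the pageblock size is passed as a parameter rather than read from pageblock_nr_pages, and it is assumed to be a power of two as the kernel's round_down() requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Power-of-two round down, matching what round_down() does in the diff. */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/*
 * Returns true only when the scan covered the pageblock from its first
 * pfn (start_pfn) all the way to its end (curr_pfn == end_pfn).
 */
static bool whole_pageblock_scanned(unsigned long start_pfn,
				    unsigned long end_pfn,
				    unsigned long curr_pfn,
				    unsigned long pageblock_nr_pages)
{
	/* The masking trick above only works for a power-of-two block size. */
	assert(pageblock_nr_pages != 0 &&
	       (pageblock_nr_pages & (pageblock_nr_pages - 1)) == 0);
	assert(end_pfn > 0);

	/* The scanner stopped before reaching the end of the block. */
	if (curr_pfn != end_pfn)
		return false;

	/* The scanner resumed in the middle of the block. */
	return start_pfn == round_down_pow2(end_pfn - 1, pageblock_nr_pages);
}
```

With a hypothetical 512-page pageblock spanning pfns 1024 to 1535, a scan from 1024 to 1536 qualifies, while a scan that resumed at 1200 or stopped at 1300 does not, so the skip-bit would stay clear in those two cases.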
[RFC PATCH 02/10] mm/compaction: skip useless pfn for scanner's cached pfn
The scanner's cached pfn is used to determine the start position of the scanner at the next compaction run. Currently the cached pfn points to the skipped pageblock, so we uselessly check whether that pageblock is valid for compaction and whether its skip-bit is set. If we set the scanner's cached pfn to the next pfn beyond the skipped pageblock, we don't need to do this check. Signed-off-by: Joonsoo Kim --- mm/compaction.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 4397bf7..9c5d43c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -265,7 +265,6 @@ static void update_pageblock_skip(struct compact_control *cc, unsigned long curr_pfn, bool migrate_scanner) { struct zone *zone = cc->zone; - unsigned long pfn; if (cc->ignore_skip_hint) return; @@ -285,18 +284,16 @@ static void update_pageblock_skip(struct compact_control *cc, set_pageblock_skip(page); - pfn = page_to_pfn(page); - /* Update where async and sync compaction should restart */ if (migrate_scanner) { - if (pfn > zone->compact_cached_migrate_pfn[0]) - zone->compact_cached_migrate_pfn[0] = pfn; + if (end_pfn > zone->compact_cached_migrate_pfn[0]) + zone->compact_cached_migrate_pfn[0] = end_pfn; if (cc->mode != MIGRATE_ASYNC && - pfn > zone->compact_cached_migrate_pfn[1]) - zone->compact_cached_migrate_pfn[1] = pfn; + end_pfn > zone->compact_cached_migrate_pfn[1]) + zone->compact_cached_migrate_pfn[1] = end_pfn; } else { - if (pfn < zone->compact_cached_free_pfn) - zone->compact_cached_free_pfn = pfn; + if (start_pfn < zone->compact_cached_free_pfn) + zone->compact_cached_free_pfn = start_pfn; } } #else -- 1.9.1
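The cached-pfn update from the patch above, sketched in user space. This is an illustration only: struct cached_pfns and update_cached_pfns() are hypothetical stand-ins for the zone fields and for the tail of update_pageblock_skip(), and the sync_mode flag replaces the cc->mode != MIGRATE_ASYNC test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached scanner positions in struct zone. */
struct cached_pfns {
	unsigned long migrate_pfn[2]; /* [0] async restart, [1] sync restart */
	unsigned long free_pfn;       /* free scanner restart position */
};

/*
 * After a pageblock [start_pfn, end_pfn) has been marked as skipped,
 * move the cached position past it instead of leaving it inside it.
 */
static void update_cached_pfns(struct cached_pfns *c,
			       unsigned long start_pfn,
			       unsigned long end_pfn,
			       bool migrate_scanner, bool sync_mode)
{
	assert(c != NULL);

	if (migrate_scanner) {
		/* The migration scanner moves upward: restart at the block end. */
		if (end_pfn > c->migrate_pfn[0])
			c->migrate_pfn[0] = end_pfn;
		if (sync_mode && end_pfn > c->migrate_pfn[1])
			c->migrate_pfn[1] = end_pfn;
	} else {
		/* The free scanner moves downward: restart at the block start. */
		if (start_pfn < c->free_pfn)
			c->free_pfn = start_pfn;
	}
}
```

The comparisons keep the cached positions monotonic, so an update for a block the scanner has already moved past leaves them unchanged.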
[PULL] Documentation for 4.2
The following changes since commit d4a4f75cd8f29cd9464a5a32e9224a91571d6649: Linux 4.1-rc7 (2015-06-07 20:23:50 -0700) are available in the git repository at: git://git.lwn.net/linux-2.6.git tags/docs-for-linus for you to fetch changes up to 36f95a0b34cb980dcfff9c1082ca5d8f0dc5e78b: doc:md: fix typo in md.txt. (2015-06-23 06:49:44 -0600) Documentation updates for 4.2 The main thing here is Ingo's big subdirectory documenting feature support for each architecture. Beyond that, it's the usual pile of fixes, tweaks, and small additions. Alexander Kuleshov (1): Documentation/kernel-parameters: add missing pciserial to the earlyprintk Andreas Gruenbacher (1): vfs: Minor documentation fix Anish Bhatt (1): kbuild : Fix documentation of INSTALL_HDR_PATH Baruch Siach (1): Documentation/CodingStyle: fix example macro parenthesis imbalance Ben Hutchings (1): firmware: Update information in linux.git about adding firmware Chen Gang (1): Docs: blackfin: Use new switch macro SAMPLE_IRQ_TIMER instead of IRQ_TIMER5 Chen Hanxiao (2): Docs: proc: fix kernel version docs: add VmPMD description in proc Christoffer Dall (1): stable: Update documentation to clarify preferred procedure Frans Klaver (1): Doc: networking: txtimestamp: fix printf format warning Geert Uytterhoeven (4): Documentation/magic-number: Remove SCI_MAGIC Documentation/magic-number: Remove SCC_MAGIC DMA-API: Spelling s/This/Think/ gpiolib: Grammar s/an negative/a negative/ H. 
Nikolaus Schaller (1): Documentation usb serial: fixed how to provide vendor and product id Ingo Molnar (44): Documentation/features/vm: Add feature description and arch support status file for 'numa-memblock' Documentation/features/vm: Add feature description and arch support status file for 'PG_uncached' Documentation/features/lib: Add feature description and arch support status file for 'strncasecmp' Documentation/features/io: Add feature description and arch support status file for 'sg-chain' Documentation/features/vm: Add feature description and arch support status file for 'huge-vmap' Documentation/features/vm: Add feature description and arch support status file for 'pte_special' Documentation/features/vm: Add feature description and arch support status file for 'pmdp_splitting_flush' Documentation/features/debug: Add feature description and arch support status file for 'KASAN' Documentation/features/time: Add feature description and arch support status file for 'modern-timekeeping' Documentation/features/time: Add feature description and arch support status file for 'virt-cpuacct' Documentation/features/time: Add feature description and arch support status file for 'irq-time-acct' Documentation/features/vm: Add feature description and arch support status file for 'THP' Documentation/features/locking: Add feature description and arch support status file for 'rwsem-optimized' Documentation/features/sched: Add feature description and arch support status file for 'numa-balancing' Documentation/features/io: Add feature description and arch support status file for 'dma-contiguous' Documentation/features/io: Add feature description and arch support status file for 'dma_map_attrs' Documentation/features/core: Add feature description and arch support status file for 'tracehook' Documentation/features/vm: Add feature description and arch support status file for 'ioremap_prot' Documentation/features/locking: Add feature description and arch support status file for 
'lockdep' Documentation/features/debug: Add feature description and arch support status file for 'stackprotector' Documentation/features/core: Add feature description and arch support status file for 'jump-labels' Documentation/features/seccomp: Add feature description and arch support status file for 'seccomp-filter' Documentation/features/time: Add feature description and arch support status file for 'context-tracking' Documentation/features/debug: Add feature description and arch support status file for 'kgdb' Documentation/features/time: Add feature description and arch support status file for 'clockevents' Documentation/features/vm: Add feature description and arch support status file for 'ELF-ASLR' Documentation/features/time: Add feature description and arch support status file for 'arch-tick-broadcast' Documentation/features/debug: Add feature description and arch support status file for 'kprobes' Documentation/features/debug: Add feature description and arch support status file for 'optprobes' Documentation/features/debug: Add feature description and arch support status
Re: [RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory
On Wed, Jun 24, 2015 at 4:54 PM, Mathieu Desnoyers wrote: > > OK, see below. This time the fault occurred at an unaligned address. > It fails on the !pte_present(*pte_ref) check. So every time, %rcx is 0x001fb. Once, your rdx value (which is remaining bytes after the movsq) was 3, the other two times it's 0. What's so magical about that 4056-byte copy (+3 bytes once)? Are you *sure* that copy is valid? Linus
Re: [PATCH v7 00/14] crypto: add a new driver for Marvell's CESA
Hello Paul, On Thu, Jun 25, 2015 at 2:00 AM, Paul Gortmaker wrote: > [Re: [PATCH v7 00/14] crypto: add a new driver for Marvell's CESA] On > 22/06/2015 (Mon 15:59) Herbert Xu wrote: > >> On Mon, Jun 22, 2015 at 09:23:36AM +0200, Boris Brezillon wrote: >> > Hi Herbert, >> > >> > On Sun, 21 Jun 2015 16:27:17 +0800 >> > Herbert Xu wrote: >> > >> > > On Sun, Jun 21, 2015 at 10:24:18AM +0200, Boris Brezillon wrote: >> > > > >> > > > Indeed. Here is a patch fixing that. >> > > >> > > I think you should just kill COMPILE_TEST instead of adding ARM. >> > >> > The following patch is killing the COMPILE_TEST dependency. >> >> Patch applied. > > Just a heads up, this driver is still killing a couple of linux-next > builds today and for the past few days. > > drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function > 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] > > http://kisskb.ellerman.id.au/kisskb/buildresult/12448851/ > http://kisskb.ellerman.id.au/kisskb/buildresult/12448776/ > > Missing dependency on CONFIG_OF_ presumably. > I haven't looked at the series, but <linux/genalloc.h> has a stub of_get_named_gen_pool() function if CONFIG_OF is not enabled [0]. So it seems that the problem is rather that the header is not being included in some file. > Paul. > -- > Best regards, Javier [0]: http://lxr.free-electrons.com/source/include/linux/genalloc.h#L131
Re: [RFC PATCH] Fix: x86 unaligned __memcpy to/from virtual memory
- On Jun 24, 2015, at 7:54 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: > - On Jun 24, 2015, at 3:15 PM, Linus Torvalds > torva...@linux-foundation.org > wrote: > >> On Wed, Jun 24, 2015 at 11:49 AM, Mathieu Desnoyers >> wrote: >>> >>> Here is the output. I added the printk just after the initial range >>> check within vmalloc_fault. >> >> Good. Can you add printk's to the error return paths too, so that we >> see which one it is that triggers. > > OK, see below. This time the fault occurred at an unaligned address. > It fails on the !pte_present(*pte_ref) check. I just tried to do a bytewise copy in C rather than call memcpy, and I got the fault to trigger. So I guess I was on the wrong track assuming __memcpy would be the culprit. What is odd is that if I issue vmalloc_sync_all() after each vmalloc call, the OOPS never triggers. It is clearly a test case that ends up stressing vfree/vmalloc. [ 34.751984] DEBUG: vmalloc_fault at address 0xc9000729 [ 34.753188] DEBUG: !pte_present(*pte_ref) error [ 34.753188] BUG: unable to handle kernel paging request at c9000729 [ 34.753188] IP: [] lttng_event_write+0x90/0xd0 [lttng_ring_buffer_metadata_client] [ 34.753188] PGD 236c92067 PUD 236c93067 PMD b6964067 PTE 0 [ 34.753188] Oops: [#1] SMP [ 34.753188] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) 
lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio [last unloaded: lttng_statedump] [ 34.753188] CPU: 26 PID: 3563 Comm: lttng-consumerd Tainted: G O 4.1.0+ #11 [ 34.753188] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 34.753188] task: 880234d94880 ti: 88022af6c000 task.ti: 88022af6c000 [ 34.753188] RIP: 0010:[] [] lttng_event_write+0x90/0xd0 [lttng_ring_buffer_metadata_client] [ 34.753188] RSP: 0018:88022af6fda8 EFLAGS: 00010212 [ 34.753188] RAX: 009d RBX: 0fd8 RCX: 0025 [ 34.753188] RDX: 8800b7681120 RSI: c9000728ff63 RDI: [ 34.753188] RBP: 88022af6fdb8 R08: 009d R09: 88022ea33025 [ 34.753188] R10: 003b R11: 0246 R12: 88022af6fdc8 [ 34.753188] R13: 880231565c00 R14: 0fd8 R15: 0fd8 [ 34.753188] FS: 7fd64b5f2700() GS:88023754() knlGS: [ 34.753188] CS: 0010 DS: ES: CR0: 8005003b [ 34.753188] CR2: c9000729 CR3: 000233803000 CR4: 06e0 [ 34.753188] Stack: [ 34.753188] 880234cbff00 880234cbff50 88022af6fe48 a048e060 [ 34.753188] 880231565c00 0fd8 0001 [ 34.753188] 88023155d000 0fd8 4025 4025 [ 34.753188] Call Trace: [ 34.753188] [] lttng_metadata_output_channel+0xd0/0x120 [lttng_tracer] [ 34.753188] [] lttng_metadata_ring_buffer_ioctl+0x79/0xd0 [lttng_tracer] [ 34.753188] [] do_vfs_ioctl+0x2e0/0x4e0 [ 34.753188] [] ? file_has_perm+0x87/0xa0 [ 34.753188] [] SyS_ioctl+0x81/0xa0 [ 34.753188] [] ? 
syscall_trace_leave+0xd1/0xe0 [ 34.753188] [] tracesys_phase2+0x84/0x89 [ 34.753188] Code: d9 48 0f 47 cb 48 39 cb 75 46 48 8d 57 02 25 ff 0f 00 00 45 31 c0 48 89 c1 31 c0 48 c1 e2 04 4c 01 ca 66 0f 1f 84 00 00 00 00 00 <44> 0f b6 14 06 49 89 c9 4c 03 0a 41 83 c0 01 45 88 14 01 49 63 [ 34.753188] RIP [] lttng_event_write+0x90/0xd0 [lttng_ring_buffer_metadata_client] [ 34.753188] RSP [ 34.753188] CR2: c9000729 [ 34.753188] ---[ end trace 28951381246c3a2e ]--- -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
RE: [PATCH v2 03/28] ACPICA: Hardware: Enable 64-bit firmware waking vector for selected FACS.
Hi, Rafael > From: Rafael J. Wysocki [mailto:r...@rjwysocki.net] > Sent: Thursday, June 25, 2015 7:24 AM > To: Zheng, Lv > > On Wednesday, June 24, 2015 04:05:42 PM Rafael J. Wysocki wrote: > > On Wednesday, June 24, 2015 11:02:10 AM Lv Zheng wrote: > > > ACPICA commit 7aa598d711644ab0de5f70ad88f1e2de253115e4 > > > > > > The following commit is reported to have broken s2ram on some platforms: > > > Commit: 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd > > > ACPICA: Add option to favor 32-bit FADT addresses. > > > The platform reports 2 FACS tables (which is not allowed by ACPI > > > specification) and the new 32-bit address favor rule forces OSPMs to use > > > the FACS table reported via FADT's X_FIRMWARE_CTRL field. > > > > > > The root cause of the reported bug might be one of the followings: > > > 1. BIOS may favor the 64-bit firmware waking vector address when the > > >version of the FACS is greater than 0 and Linux currently only supports > > >resuming from the real mode, so the 64-bit firmware waking vector has > > >never been set and might be invalid to BIOS while the commit enables > > >higher version FACS. > > > 2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the > > >FADT while the commit doesn't set the firmware waking vector address of > > >the FACS reported by "FIRMWARE_CTRL", it only sets the firware waking > > >vector address of the FACS reported by "X_FIRMWARE_CTRL". > > > > > > This patch excludes the cases that can trigger the bugs caused by the root > > > cause 1. > > > > > > ACPI specification says: > > > A. 32-bit FACS address (FIRMWARE_CTRL field in FADT): > > >Physical memory address of the FACS, where OSPM and firmware exchange > > >control information. > > >If the X_FIRMWARE_CTRL field contains a non zero value then this field > > >must be zero. > > >A zero value indicates that no FACS is specified by this field. > > > B. 64-bit FACS address (X_FIRMWARE_CTRL field in FADT): > > >64bit physical memory address of the FACS. 
> > >This field is used when the physical address of the FACS is above 4GB. > > >If the FIRMWARE_CTRL field contains a non zero value then this field > > >must be zero. > > >A zero value indicates that no FACS is specified by this field. > > > Thus the 32bit and 64bit firmware waking vector should indicate completely > > > different resuming environment - real mode (1MB addressable) and non real > > > mode (4GB+ addressable) and currently Linux only supports resuming from > > > real mode. > > > > > > This patch enables 64-bit firmware waking vector for selected FACS via > > > acpi_set_firmware_waking_vector() so that it's up to OSPMs to determine > > > which > > > resuming mode should be used by BIOS and ACPICA changes won't trigger the > > > bugs caused by the root cause 1. For example, Linux can pass > > > physical_address64=0 as the parameter of > > > acpi_set_firmware_waking_vector() to > > > indicate no 64bit waking vector support. Lv Zheng. > > > > > > This patch also updates acpi_set_firmware_waking_vector() invocations in > > > order to keep 32-bit firmware waking vector favor for Linux. 64-bit > > > firmware waking vector has never been enabled by Linux. The > > > (acpi_physical_address)0 for 64-bit address can be used to force ACPICA to > > > set only 32-bit firmware waking vector for Linux. > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021 > > > Link: https://github.com/acpica/acpica/commit/7aa598d7 > > > Cc: 3.14.1+ # 3.14.1+ > > > Reported-and-tested-by: Oswald Buddenhagen > > > Signed-off-by: Lv Zheng > > > Signed-off-by: Bob Moore > > > Cc: Thomas Gleixner > > > Cc: Ingo Molnar > > > Cc: "H. 
Peter Anvin" > > > Cc: x...@kernel.org > > > Cc: Tony Luck > > > Cc: Fenghua Yu > > > Cc: linux-i...@vger.kernel.org > > > --- > > > arch/ia64/include/asm/acpi.h|3 +- > > > arch/ia64/kernel/acpi.c |2 -- > > > arch/x86/include/asm/acpi.h |3 +- > > > drivers/acpi/acpica/hwxfsleep.c | 61 > > > --- > > > drivers/acpi/sleep.c|8 +++-- > > > include/acpi/acpixf.h | 11 +++ > > > 6 files changed, 33 insertions(+), 55 deletions(-) > > > > > > diff --git a/arch/ia64/include/asm/acpi.h b/arch/ia64/include/asm/acpi.h > > > index aa0fdf1..0ac4fab 100644 > > > --- a/arch/ia64/include/asm/acpi.h > > > +++ b/arch/ia64/include/asm/acpi.h > > > @@ -79,7 +79,8 @@ int acpi_gsi_to_irq (u32 gsi, unsigned int *irq); > > > /* Low-level suspend routine. */ > > > extern int acpi_suspend_lowlevel(void); > > > > > > -extern unsigned long acpi_wakeup_address; > > > +#define acpi_wakeup_address ((acpi_physical_address)0) > > > +#define acpi_wakeup_address64((acpi_physical_address)0) > > > > > > /* > > > * Record the cpei override flag and current logical cpu. This is > > > diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c > > > index
RE: [PATCH v2 02/28] ACPICA: Linuxize: Replace __FUNCTION__ with __func__.
Hi,

> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Wednesday, June 24, 2015 8:56 PM
>
> On Wed, Jun 24, 2015 at 11:02:03AM +0800, Lv Zheng wrote:
> > ACPICA commit cb3d1c79f862cd368d749c9b8d9dced40111b0d0
> >
> > __FUNCTION__ is MSVC only, in Linux, it is __func__. Lv Zheng.
> >
> > In ACPICA, this is achieved by string replacement in release script and
> > this patch contains the source code difference between the Linux upstream
> > and ACPICA that is caused by the back porting.
>
> __func__ is in C99 and newer. __FUNCTION__ is an old extension supported
> by various compilers.

This patch description is the one used in ACPICA upstream. In the ACPICA code base, __FUNCTION__ is only used for its MSVC builds, and the linuxize release script converts __FUNCTION__ to __func__.
See the original commit here:
https://github.com/acpica/acpica/commit/cb3d1c79

So this is simply automated release output. Without this patch merged, the source code differences between Linux upstream and ACPICA upstream will hurt the automation.

Thanks and best regards
-Lv
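For readers following the __FUNCTION__ vs. __func__ point, here is a minimal sketch of the C99 identifier in use. It is not ACPICA or kernel code; current_function_name() and format_entry() are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * C99 defines __func__ as if each function body began with
 *   static const char __func__[] = "function-name";
 * so the returned pointer stays valid after the function returns.
 */
static const char *current_function_name(void)
{
	return __func__;
}

/*
 * Hypothetical logging helper (an assumption for this sketch):
 * formats "<function>: <msg>" into buf and returns snprintf()'s result.
 */
static int format_entry(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s: %s", __func__, msg);
}
```

Because __func__ is part of the language standard, code written this way should not need the string replacement step that a compiler-specific spelling requires.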
[PATCH 0/3] mfd: ChromeOS EC Kconfig dependency cleanup
Hello,

This is a trivial series that changes the dependencies of the ChromeOS EC drivers' Kconfig symbols.

The patches are on top of Paul's patch "mfd: fix dependency warning for CHROME_PLATFORMS on !X86, !ARM": https://lkml.org/lkml/2015/6/20/219.

Paul fixed a warning about unmet dependencies, but I think the correct fix is to remove the unneeded dependencies. That is what this series does. It is composed of the following patches:

Javier Martinez Canillas (3):
  platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM
  mfd: Remove MFD_CROS_EC depends on X86 || ARM
  mfd: Remove MFD_CROS_EC_SPI depends on OF

 drivers/mfd/Kconfig             | 3 +--
 drivers/platform/chrome/Kconfig | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

-- 
2.1.4
[PATCH 3/3] mfd: Remove MFD_CROS_EC_SPI depends on OF
The ChromeOS EC SPI transport driver has a dependency on OF because it uses some OF helpers from the header. But there is no need for an explicit dependency, since the header provides stub functions when CONFIG_OF is not defined.

Also, MFD_CROS_EC_SPI already depends on MFD_CROS_EC, which in turn has a dependency on OF, so in practice it can't be selected without CONFIG_OF.

Signed-off-by: Javier Martinez Canillas
---
 drivers/mfd/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 653815950aa2..3f68dd251ce8 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -115,7 +115,7 @@ config MFD_CROS_EC_I2C
 config MFD_CROS_EC_SPI
 	tristate "ChromeOS Embedded Controller (SPI)"
-	depends on MFD_CROS_EC && CROS_EC_PROTO && SPI && OF
+	depends on MFD_CROS_EC && CROS_EC_PROTO && SPI
 	---help---
 	  If you say Y here, you get support for talking to the ChromeOS EC
-- 
2.1.4
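The "stub functions when the option is off" pattern that the commit message relies on can be sketched as follows. This is an illustration only, not the kernel's OF header: DEMO_CONFIG_OF, struct demo_node, demo_of_read_reg(), demo_probe() and DEMO_DEFAULT_REG are hypothetical names, and returning -ENOSYS from the stub is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device-tree node. */
struct demo_node {
	int reg;
};

#ifdef DEMO_CONFIG_OF
/* "Real" helper, built only when the option is enabled. */
static inline int demo_of_read_reg(const struct demo_node *np, int *out)
{
	*out = np->reg;
	return 0;
}
#else
/*
 * Stub with the same signature, built when the option is disabled.
 * Callers compile unchanged and simply see an error code.
 */
static inline int demo_of_read_reg(const struct demo_node *np, int *out)
{
	(void)np;
	(void)out;
	return -ENOSYS;
}
#endif

#define DEMO_DEFAULT_REG 0

/* Driver-side caller: needs no #ifdef, it falls back when the helper fails. */
static int demo_probe(const struct demo_node *np)
{
	int reg;

	assert(np != NULL);
	if (demo_of_read_reg(np, &reg) < 0)
		return DEMO_DEFAULT_REG;
	return reg;
}
```

With the option undefined, demo_probe() still builds and returns the default, which is why a driver using helpers declared this way does not need the option as a build-time dependency.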
[PATCH 1/3] platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM
The Chrome platform support depends on X86 || ARM because there are only Chromebooks using those architectures. But only some drivers depend on a given architecture, and the ones that do already declare that dependency in their own Kconfig symbol entries.

An option is to also make CHROME_PLATFORMS depend on || COMPILE_TEST, but it is more future proof to remove the dependency and let the drivers be built on all architectures where possible, for more build coverage.

Signed-off-by: Javier Martinez Canillas
---
 drivers/platform/chrome/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
index cb1329919527..3271cd1abe7c 100644
--- a/drivers/platform/chrome/Kconfig
+++ b/drivers/platform/chrome/Kconfig
@@ -4,7 +4,6 @@
 menuconfig CHROME_PLATFORMS
 	bool "Platform support for Chrome hardware"
-	depends on X86 || ARM
 	---help---
 	  Say Y here to get to see options for platform support for
 	  various Chromebooks and Chromeboxes. This option alone does
-- 
2.1.4
[PATCH 2/3] mfd: Remove MFD_CROS_EC depends on X86 || ARM
A dependency on X86 || ARM for MFD_CROS_EC was added to fix the warning:

(MFD_CROS_EC) selects CHROME_PLATFORMS which has unmet direct dependencies (X86 || ARM)

This happened because CHROME_PLATFORMS had a dependency on X86 || ARM, but that dependency was removed since there is no reason why the option cannot be selected on other architectures.

The above warning can therefore no longer occur, and the MFD_CROS_EC dependency can be removed since it is not needed.

Signed-off-by: Javier Martinez Canillas
---
 drivers/mfd/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d3235e6f1953..653815950aa2 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -94,7 +94,6 @@ config MFD_AXP20X
 config MFD_CROS_EC
 	tristate "ChromeOS Embedded Controller"
-	depends on X86 || ARM
 	select MFD_CORE
 	select CHROME_PLATFORMS
 	select CROS_EC_PROTO
-- 
2.1.4
[RFCv2][PATCH 7/7] fsnotify: track when ignored mask clearing is needed
From: Dave Hansen According to Jan Kara: You can have ignored mask set without any of the notification masks set and you are expected to clear the ignored mask on the first IN_MODIFY event. But, the only way we currently have to go and find if we need to do this ignored-mask-clearing is to go through the mark lists and look for them. That mark list iteration requires an srcu_read_lock() which has a memory barrier and can be expensive. The calculation of 'has_ignore' is pretty cheap because we store it next to another value which we are updating and we do it inside of a loop we were already running. This patch will really only matter when we have a workload where a file is being modified often _and_ there is an active fsnotify mark on it. Otherwise the checks against *_fsnotify.marks.first will keep us out of the expensive srcu_read_lock() call. Cc: Jan Kara Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney Cc: Tim Chen Cc: Andi Kleen Signed-off-by: Dave Hansen --- b/fs/notify/fsnotify.c | 44 ++-- b/fs/notify/mark.c |8 +-- b/include/linux/fsnotify_head.h |1 3 files changed, 45 insertions(+), 8 deletions(-) diff -puN fs/notify/fsnotify.c~fsnotify-ignore-present fs/notify/fsnotify.c --- a/fs/notify/fsnotify.c~fsnotify-ignore-present 2015-06-24 17:14:37.187226743 -0700 +++ b/fs/notify/fsnotify.c 2015-06-24 17:14:37.194227057 -0700 @@ -183,6 +183,34 @@ static int send_to_group(struct inode *t } /* + * The "logical or" of all of the marks' ->mask is kept in the + * i/mnt_fsnotify.mask. We can check it instead of going + * through all of the marks. fsnotify_recalc_mask() does the + * updates. + */ +static int some_mark_is_interested(__u32 mask, struct inode *inode, struct mount *mnt) +{ + if (mask & inode->i_fsnotify.mask) + return 1; + if (mnt && (mask & mnt->mnt_fsnotify.mask)) + return 1; + return 0; +} + +/* + * fsnotify_recalc_mask() recalculates "has_ignore" whenever any + * mark's flags change. 
+ */ +static int some_mark_needs_ignore_clear(struct inode *inode, struct mount *mnt) +{ + if (inode->i_fsnotify.has_ignore) + return 1; + if (mnt && mnt->mnt_fsnotify.has_ignore) + return 1; + return 0; +} + +/* * This is the main call to fsnotify. The VFS calls into hook specific functions * in linux/fsnotify.h. Those functions then in turn call here. Here will call * out to all of the registered fsnotify_group. Those groups can then use the @@ -205,14 +233,18 @@ int fsnotify(struct inode *to_tell, __u3 mnt = NULL; /* -* if this is a modify event we may need to clear the ignored masks -* otherwise return if neither the inode nor the vfsmount care about -* this type of event. +* We must clear the (user-visible) ignored mask on the first IN_MODIFY +* event despite the 'mask' which is passed in here. But we can safely +* skip that step if we know there are no marks which need this action. +* +* We can also skip looking at the list of marks if we know that none +* of the marks are interested in the events in our 'mask'. */ - if (!(mask & FS_MODIFY) && - !(test_mask & to_tell->i_fsnotify.mask) && - !(mnt && test_mask & mnt->mnt_fsnotify.mask)) + if ((mask & FS_MODIFY) && !some_mark_needs_ignore_clear(to_tell, mnt)) + return 0; + else if (!some_mark_is_interested(test_mask, to_tell, mnt)) return 0; + /* * Optimization: srcu_read_lock() has a memory barrier which can * be expensive. It protects walking the *_fsnotify_marks lists. 
diff -puN fs/notify/mark.c~fsnotify-ignore-present fs/notify/mark.c --- a/fs/notify/mark.c~fsnotify-ignore-present 2015-06-24 17:14:37.189226832 -0700 +++ b/fs/notify/mark.c 2015-06-24 17:14:37.194227057 -0700 @@ -116,10 +116,14 @@ void fsnotify_recalc_mask(struct fsnotif { u32 new_mask = 0; struct fsnotify_mark *mark; + u32 has_ignore = 0; - hlist_for_each_entry(mark, >marks, obj_list) + hlist_for_each_entry(mark, >marks, obj_list) { + if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + has_ignore = 1; new_mask |= mark->mask; - + } + fsn->has_ignore = has_ignore; fsn->mask = new_mask; } diff -puN include/linux/fsnotify_head.h~fsnotify-ignore-present include/linux/fsnotify_head.h --- a/include/linux/fsnotify_head.h~fsnotify-ignore-present 2015-06-24 17:14:37.190226877 -0700 +++ b/include/linux/fsnotify_head.h 2015-06-24 17:14:37.193227012 -0700 @@ -11,6 +11,7 @@ struct fsnotify_head { #ifdef CONFIG_FSNOTIFY __u32 mask; /* all events this object