from:"Rafael J. Wysocki"

Re: [PATCH v3 1/5] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-02-06 Thread Rafael J. Wysocki

On Fri, Feb 2, 2024 at 5:09 PM Mario Limonciello
 wrote:
>
> On 2/2/2024 10:07, Rafael J. Wysocki wrote:
> > On Thu, Feb 1, 2024 at 11:11 PM Mario Limonciello
> >  wrote:
> >>
> >> The ACPI specification allows for an EDID to be up to 512 bytes but
> >> the _DDC EDID fetching code will only try up to 256 bytes.
> >>
> >> Modify the code to instead start at 512 bytes and work its way
> >> down.
> >>
> >> As _DDC is now called up to 4 times on a machine, debugging messages
> >> are noisier than necessary.  Decrease from info to debug.
> >>
> >> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> >> Signed-off-by: Mario Limonciello 
> >
> > Acked-by: Rafael J. Wysocki 
> >
> > or I can apply it if that's preferred.
>
> Thanks!
>
> I think go ahead and apply this one to your -next tree.

Applied now.

Barring any issues with it, it will get into linux-next in a couple of days.

Thanks!

Re: [PATCH v3 1/5] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-02-02 Thread Rafael J. Wysocki

On Thu, Feb 1, 2024 at 11:11 PM Mario Limonciello
 wrote:
>
> The ACPI specification allows for an EDID to be up to 512 bytes but
> the _DDC EDID fetching code will only try up to 256 bytes.
>
> Modify the code to instead start at 512 bytes and work its way
> down.
>
> As _DDC is now called up to 4 times on a machine, debugging messages
> are noisier than necessary.  Decrease from info to debug.
>
> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> Signed-off-by: Mario Limonciello 

Acked-by: Rafael J. Wysocki 

or I can apply it if that's preferred.

Thanks!

> ---
> v1->v2:
>  * Use for loop for acpi_video_get_edid()
>  * Use one of Rafael's suggestions for acpi_video_device_EDID()
>  * Decrease message level too
> ---
>  drivers/acpi/acpi_video.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> index 4afdda9db019..3bfd013e09d2 100644
> --- a/drivers/acpi/acpi_video.c
> +++ b/drivers/acpi/acpi_video.c
> @@ -625,12 +625,9 @@ acpi_video_device_EDID(struct acpi_video_device *device,
>
> if (!device)
> return -ENODEV;
> -   if (length == 128)
> -   arg0.integer.value = 1;
> -   else if (length == 256)
> -   arg0.integer.value = 2;
> -   else
> +   if (!length || (length % 128))
> return -EINVAL;
> +   arg0.integer.value = length / 128;
>
> status = acpi_evaluate_object(device->dev->handle, "_DDC", &args, &buffer);
> if (ACPI_FAILURE(status))
> @@ -641,7 +638,8 @@ acpi_video_device_EDID(struct acpi_video_device *device,
> if (obj && obj->type == ACPI_TYPE_BUFFER)
> *edid = obj;
> else {
> -   acpi_handle_info(device->dev->handle, "Invalid _DDC data\n");
> +   acpi_handle_debug(device->dev->handle,
> +"Invalid _DDC data for length %ld\n", length);
> status = -EFAULT;
> kfree(obj);
> }
> @@ -1447,7 +1445,6 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>
> for (i = 0; i < video->attached_count; i++) {
> video_device = video->attached_array[i].bind_info;
> -   length = 256;
>
> if (!video_device)
> continue;
> @@ -1478,18 +1475,14 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
> continue;
> }
>
> -   status = acpi_video_device_EDID(video_device, &buffer, length);
> -
> -   if (ACPI_FAILURE(status) || !buffer ||
> -   buffer->type != ACPI_TYPE_BUFFER) {
> -   length = 128;
> +   for (length = 512; length > 0; length -= 128) {
> status = acpi_video_device_EDID(video_device, &buffer,
> length);
> -   if (ACPI_FAILURE(status) || !buffer ||
> -   buffer->type != ACPI_TYPE_BUFFER) {
> -   continue;
> -   }
> +   if (ACPI_SUCCESS(status))
> +   break;
> }
> +   if (!length)
> +   continue;
>
> *edid = buffer->buffer.pointer;
> return length;
> --
> 2.34.1
>

Re: [PATCH 1/2] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-01-29 Thread Rafael J. Wysocki

On Fri, Jan 26, 2024 at 7:55 PM Mario Limonciello
 wrote:
>
> The ACPI specification allows for an EDID to be up to 512 bytes but
> the _DDC EDID fetching code will only try up to 256 bytes.
>
> Modify the code to instead start at 512 bytes and work its way
> down.
>
> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/acpi/acpi_video.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> index 62f4364e4460..b3b15dd4755d 100644
> --- a/drivers/acpi/acpi_video.c
> +++ b/drivers/acpi/acpi_video.c
> @@ -624,6 +624,10 @@ acpi_video_device_EDID(struct acpi_video_device *device,
> arg0.integer.value = 1;
> else if (length == 256)
> arg0.integer.value = 2;
> +   else if (length == 384)
> +   arg0.integer.value = 3;
> +   else if (length == 512)
> +   arg0.integer.value = 4;

It looks like switch () would be somewhat better.

Or maybe even

arg0.integer.value = length / 128;

The validation could be added too:

if (arg0.integer.value > 4 || arg0.integer.value * 128 != length)
return -EINVAL;

but it is pointless, because the caller is never passing an invalid
number to it AFAICS.
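
For illustration only, the division-based variant with the validation folded in could look like the userspace sketch below. The helper name and the DDC_* macros are made up for this example and do not exist in acpi_video.c; the 512-byte upper limit is the one quoted from the changelog.

```c
#include <assert.h>
#include <errno.h>

#define DDC_BLOCK_SIZE 128	/* _DDC takes the EDID length in 128-byte units */
#define DDC_MAX_BLOCKS 4	/* 4 * 128 = 512 bytes, the limit discussed here */

/*
 * Hypothetical helper: convert a requested EDID length in bytes to the
 * integer argument passed to _DDC.  Returns -EINVAL if the length is
 * zero, not a multiple of 128, or larger than 512.
 */
int edid_length_to_ddc_arg(unsigned int length)
{
	unsigned int blocks = length / DDC_BLOCK_SIZE;

	if (!blocks || blocks > DDC_MAX_BLOCKS ||
	    blocks * DDC_BLOCK_SIZE != length)
		return -EINVAL;

	return (int)blocks;
}

/* Usage: only 128, 256, 384 and 512 are accepted. */
void edid_length_example(void)
{
	assert(edid_length_to_ddc_arg(256) == 2);
	assert(edid_length_to_ddc_arg(384) == 3);
	assert(edid_length_to_ddc_arg(100) == -EINVAL);
}
```

Whether the range check is worth having depends on who the callers are, as noted above.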

> else
> return -EINVAL;
>
> @@ -1443,7 +1447,7 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>
> for (i = 0; i < video->attached_count; i++) {
> video_device = video->attached_array[i].bind_info;
> -   length = 256;
> +   length = 512;
>
> if (!video_device)
> continue;
> @@ -1478,13 +1482,18 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>
> if (ACPI_FAILURE(status) || !buffer ||
> buffer->type != ACPI_TYPE_BUFFER) {
> -   length = 128;
> -   status = acpi_video_device_EDID(video_device, &buffer,
> -   length);
> -   if (ACPI_FAILURE(status) || !buffer ||
> -   buffer->type != ACPI_TYPE_BUFFER) {
> -   continue;
> +   while (length) {

I would prefer a do {} while () loop here, which could include the
first invocation of acpi_video_device_EDID() too (and reduce code
duplication a bit).

> +   length -= 128;
> +   status = acpi_video_device_EDID(video_device, &buffer,
> +   length);

No line break, please.

> +   if (ACPI_FAILURE(status) || !buffer ||
> +   buffer->type != ACPI_TYPE_BUFFER) {
> +   continue;
> +   }
> +   break;
> }
> +   if (!length)
> +   continue;
> }
>
> *edid = buffer->buffer.pointer;
> --
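
A rough userspace sketch of the do {} while () shape suggested above, with the first attempt folded into the loop. The callback type and the mock firmware are hypothetical stand-ins for acpi_video_device_EDID(), used only so that the loop can be exercised outside the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for acpi_video_device_EDID(): returns 0 on success. */
typedef int (*edid_fetch_fn)(unsigned int length, void *ctx);

/*
 * Try the largest EDID size first and step down by 128 bytes until a
 * fetch succeeds.  Returns the length that worked, or 0 if none did.
 */
unsigned int fetch_largest_edid(edid_fetch_fn fetch, void *ctx)
{
	unsigned int length = 512;

	do {
		if (fetch(length, ctx) == 0)
			return length;
		length -= 128;
	} while (length);

	return 0;
}

/* Mock firmware that only returns EDIDs of up to *ctx bytes. */
int mock_fetch(unsigned int length, void *ctx)
{
	unsigned int max_supported = *(unsigned int *)ctx;

	return length <= max_supported ? 0 : -1;
}

void fetch_largest_edid_example(void)
{
	unsigned int max = 256;

	/* 512 and 384 fail, 256 succeeds on the third call. */
	assert(fetch_largest_edid(mock_fetch, &max) == 256);
}
```

With this shape there is a single call site for the fetch, and a return value of 0 tells the caller to move on to the next device.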

Re: [PATCH v3 1/2] pm: runtime: Simplify pm_runtime_get_if_active() usage

2024-01-22 Thread Rafael J. Wysocki

On Mon, Jan 22, 2024 at 7:12 PM Bjorn Helgaas  wrote:
>
> On Mon, Jan 22, 2024 at 01:41:21PM +0200, Sakari Ailus wrote:
> > There are two ways to opportunistically increment a device's runtime PM
> > usage count, calling either pm_runtime_get_if_active() or
> > pm_runtime_get_if_in_use(). The former has an argument to tell whether to
> > ignore the usage count or not, and the latter simply calls the former with
> > ign_usage_count set to false. The other users that want to ignore the
> > usage_count will have to explitly set that argument to true which is a bit
> > cumbersome.
>
> s/explitly/explicitly/
>
> > To make this function more practical to use, remove the ign_usage_count
> > argument from the function. The main implementation is renamed as
> > pm_runtime_get_conditional().
> >
> > Signed-off-by: Sakari Ailus 
> > Reviewed-by: Alex Elder  # drivers/net/ipa/ipa_smp2p.c
> > Reviewed-by: Laurent Pinchart 
> > Acked-by: Takashi Iwai  # sound/
> > Reviewed-by: Jacek Lawrynowicz  # 
> > drivers/accel/ivpu/
> > Acked-by: Rodrigo Vivi  # drivers/gpu/drm/i915/
> > Reviewed-by: Rodrigo Vivi 
>
> Acked-by: Bjorn Helgaas  # drivers/pci/
>
> > -EXPORT_SYMBOL_GPL(pm_runtime_get_if_active);
> > +EXPORT_SYMBOL_GPL(pm_runtime_get_conditional);
>
> If pm_runtime_get_conditional() is exported, shouldn't it also be
> documented in Documentation/power/runtime_pm.rst?
>
> But I'm dubious about exporting it because
> __intel_runtime_pm_get_if_active() is the only caller, and you end up
> with the same pattern there that we have before this series in the PM
> core.  Why can't intel_runtime_pm.c be updated to use
> pm_runtime_get_if_active() or pm_runtime_get_if_in_use() directly, and
> make pm_runtime_get_conditional() static?

Sounds like a good suggestion to me.
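
To make the suggested layering concrete, here is a toy userspace model of it. Everything in it (struct toy_dev and the toy_* functions) is invented for illustration and heavily simplified; the real helpers also deal with locking and with runtime PM being disabled, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state (not struct device). */
struct toy_dev {
	bool active;		/* device is in the active state */
	int usage_count;	/* runtime PM usage counter */
};

/*
 * Internal helper, kept static: bump the usage count only if the device
 * is active and, unless ign_usage_count is set, already in use.
 * Returns 1 if the count was incremented and 0 otherwise.
 */
static int toy_get_conditional(struct toy_dev *dev, bool ign_usage_count)
{
	if (!dev->active)
		return 0;
	if (!ign_usage_count && dev->usage_count == 0)
		return 0;

	dev->usage_count++;
	return 1;
}

/* The two public entry points only pick the flag value. */
int toy_get_if_active(struct toy_dev *dev)
{
	return toy_get_conditional(dev, true);
}

int toy_get_if_in_use(struct toy_dev *dev)
{
	return toy_get_conditional(dev, false);
}

void toy_get_example(void)
{
	struct toy_dev dev = { .active = true, .usage_count = 0 };

	assert(toy_get_if_in_use(&dev) == 0);	/* active, but not in use */
	assert(toy_get_if_active(&dev) == 1);	/* usage count is ignored */
	assert(dev.usage_count == 1);
}
```

Callers then never see the boolean argument, which is the point of the series.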

Re: Question about device links between supplier and consumer

2023-12-07 Thread Rafael J. Wysocki

+Saravana

On Thu, Dec 7, 2023 at 10:51 AM richard clark
 wrote:
>
> Hi,
>
> I have to comment out the code below to get the mmc driver probed
> before the kernel tries to run init, which mounts the rootfs from the
> dev node generated by the driver:
>
> really_probe(...)
> {
>...
> #if 0
> link_ret = device_links_check_suppliers(dev);
> if (link_ret == -EPROBE_DEFER)
> return link_ret;
> ...
> if (!list_empty(&dev->devres_head)) {
> dev_crit(dev, "Resources present before probing\n");
> ret = -EBUSY;
> goto done;
> }
> #endif
> ...
> }
>
> Otherwise, probing of the mmc driver is deferred until after init has
> been executed and, as you can imagine, init complains that it cannot
> find the dev node specified by 'root=/dev/xxx' on the kernel command
> line.
>
> This is really bad! I don't know how to check the device dependency or
> what I should do in my driver to make it follow the device dependency
> link rules. Or is there something I am missing?
>
> Thanks!

Re: [PATCH] driver core: make device_is_dependent() static

2023-11-28 Thread Rafael J. Wysocki

On Tue, Nov 28, 2023 at 11:28 AM Greg Kroah-Hartman
 wrote:
>
> The function device_is_dependent() is only called by the driver core
> internally and should not, at this time, be called by anyone else
> outside of it, so mark it as static so as not to give driver authors the
> wrong idea.
>
> Cc: "Rafael J. Wysocki" 
> Cc: Saravana Kannan 
> Signed-off-by: Greg Kroah-Hartman 

Acked-by: "Rafael J. Wysocki" 

> ---
>  drivers/base/core.c| 2 +-
>  include/linux/device.h | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index dafdb9970901..6dcc26eec096 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -298,7 +298,7 @@ static inline bool device_link_flag_is_sync_state_only(u32 flags)
>   * Check if @target depends on @dev or any device dependent on it (its child or
>   * its consumer etc).  Return 1 if that is the case or 0 otherwise.
>   */
> -int device_is_dependent(struct device *dev, void *target)
> +static int device_is_dependent(struct device *dev, void *target)
>  {
> struct device_link *link;
> int ret;
> diff --git a/include/linux/device.h b/include/linux/device.h
> index c11d60cabaab..6a4ee40af3df 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1071,7 +1071,6 @@ int device_rename(struct device *dev, const char *new_name);
>  int device_move(struct device *dev, struct device *new_parent,
> enum dpm_order dpm_order);
>  int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid);
> -int device_is_dependent(struct device *dev, void *target);
>
>  static inline bool device_supports_offline(struct device *dev)
>  {
> --
> 2.43.0
>

Re: [V11 1/8] ACPI: Add support for AMD ACPI based Wifi band RFI mitigation feature

2023-09-18 Thread Rafael J. Wysocki

On Thu, Aug 31, 2023 at 8:21 AM Evan Quan  wrote:
>
> Due to electrical and mechanical constraints in certain platform designs
> there may be likely interference of relatively high-powered harmonics of
> the (G-)DDR memory clocks with local radio module frequency bands used
> by Wifi 6/6e/7.
>
> To mitigate this, AMD has introduced a mechanism that devices can use to
> notify active use of particular frequencies so that other devices can make
> relative internal adjustments as necessary to avoid this resonance.

The changelog is only marginally useful IMV.

It doesn't even mention the role of ACPI in all this, so it is quite
unclear what the patch is all about, why it does what it does and what
is actually done in it.

It is also unclear why this code is put into drivers/acpi/, which
should be explained.

> Signed-off-by: Evan Quan 
> --
> v10->v11:
>   - fix typo(Simon)
> ---
>  drivers/acpi/Kconfig  |  17 ++
>  drivers/acpi/Makefile |   2 +
>  drivers/acpi/amd_wbrf.c   | 414 ++
>  include/linux/acpi_amd_wbrf.h | 140 
>  4 files changed, 573 insertions(+)
>  create mode 100644 drivers/acpi/amd_wbrf.c
>  create mode 100644 include/linux/acpi_amd_wbrf.h
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 00dd309b6682..a092ea72d152 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -594,6 +594,23 @@ config ACPI_PRMT
>   substantially increase computational overhead related to the
>   initialization of some server systems.
>
> +config WBRF_AMD_ACPI
> +   bool "ACPI based WBRF mechanism introduced by AMD"
> +   depends on ACPI
> +   default n
> +   help
> + Wifi band RFI mitigation mechanism allows multiple drivers from
> + different domains to notify the frequencies in use so that hardware
> + can be reconfigured to avoid harmonic conflicts.

So drivers can notify the platform firmware IIUC, but that is not
really clear from the above.  I'm not even sure what the phrase
"notify the frequencies in use" is supposed to mean.

> +
> + AMD has introduced an ACPI based mechanism to support WBRF for some
> + platforms with AMD dGPU and WLAN. This needs support from BIOS equipped
> + with necessary AML implementations and dGPU firmwares.
> +
> + Before enabling this ACPI based mechanism, it is suggested to confirm
> + with the hardware designer/provider first whether your platform
> + equipped with necessary BIOS and firmwares.

No, this doesn't work.

Say you are a distro and you want to supply all of your users with the
same binary kernel image.  What are you expected to do to address the
above?

> +
>  endif  # ACPI
>
>  config X86_PM_TIMER
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index eaa09bf52f17..a3d2f259d0a5 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -132,3 +132,5 @@ obj-$(CONFIG_ARM64) += arm64/
>  obj-$(CONFIG_ACPI_VIOT) += viot.o
>
>  obj-$(CONFIG_RISCV) += riscv/
> +
> +obj-$(CONFIG_WBRF_AMD_ACPI) += amd_wbrf.o
> diff --git a/drivers/acpi/amd_wbrf.c b/drivers/acpi/amd_wbrf.c
> new file mode 100644
> index ..8ee0e2977a30
> --- /dev/null
> +++ b/drivers/acpi/amd_wbrf.c
> @@ -0,0 +1,414 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Wifi Band Exclusion Interface (AMD ACPI Implementation)
> + * Copyright (C) 2023 Advanced Micro Devices
> + *

Please document the code in this file at least basically.

You don't even explain what Wifi Band Exclusion means.

The OS-firmware interface that this code is based on should be
described here or a link to its description should be provided at the
very least.

As it is now, one needs to reverse engineer the patch in order to get
any idea about how this interface is designed.

> + */
> +
> +#include 
> +#include 
> +
> +#define ACPI_AMD_WBRF_METHOD   "\\WBRF"
> +
> +/*
> + * Functions bit vector for WBRF method
> + *
> + * Bit 0: Supported for any functions other than function 0.
> + * Bit 1: Function 1 (Add / Remove frequency) is supported.
> + * Bit 2: Function 2 (Get frequency list) is supported.
> + */

Without any additional information, the comment above is meaningless.
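
For what it is worth, one possible reading of that comment is sketched below in userspace C. The assumed semantics (bit 0 gates everything, bit N means function N is available) are a guess based on the quoted comment alone and not on any description of the interface, and the enum and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Function indices, mirroring the WBRF_* values in the quoted patch. */
enum wbrf_func {
	WBRF_FUNC_ENABLED  = 0,
	WBRF_FUNC_RECORD   = 1,
	WBRF_FUNC_RETRIEVE = 2,
};

/*
 * Assumed semantics: bit 0 says that some function other than function 0
 * is available at all, and bit N says that function N is available.
 */
bool wbrf_func_supported(uint32_t mask, enum wbrf_func func)
{
	if (!(mask & (UINT32_C(1) << WBRF_FUNC_ENABLED)))
		return false;

	return (mask & (UINT32_C(1) << func)) != 0;
}

void wbrf_func_example(void)
{
	/* 0x3: bits 0 and 1 set, so "record" only. */
	assert(wbrf_func_supported(0x3, WBRF_FUNC_RECORD));
	assert(!wbrf_func_supported(0x3, WBRF_FUNC_RETRIEVE));
}
```

If that reading is wrong, it only underlines that the interface needs to be described properly.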

> +#define WBRF_ENABLED   0x0
> +#define WBRF_RECORD    0x1
> +#define WBRF_RETRIEVE  0x2
> +
> +/* record actions */
> +#define WBRF_RECORD_ADD    0x0
> +#define WBRF_RECORD_REMOVE 0x1
> +
> +#define WBRF_REVISION  0x1
> +
> +/*
> + * The data structure used for WBRF_RETRIEVE is not naturally aligned.
> + * And unfortunately the design has been settled down.
> + */
> +struct amd_wbrf_ranges_out {
> +   u32 num_of_ranges;
> +   struct exclusion_range  band_list[MAX_NUM_OF_WBRF_RANGES];
> +} __packed;
> +
> +static const guid_t wifi_acpi_dsm_guid =
> +

Re: [PATCH v5 03/11] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-22 Thread Rafael J. Wysocki

> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before the device qos object is
> needed and before aquiring dev_pm_qos_mtx.  This way the allocations can

acquiring

> be done without holding the mutex.  In the case that we raced with
> another thread to allocate the qos object, detect this *after* acquiring
> the dev_pm_qos_mtx and simply free the redundant allocations.
>
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Rob Clark 

Please feel free to add

Acked-by: Rafael J. Wysocki 

to this patch and the next 2 PM QoS ones in this series.

Thanks!
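
The general pattern described in the changelog (allocate with no lock held, then install or discard under the lock) can be sketched in userspace C as follows. This is a simplified model and not the code from the patch: the struct and function names are invented, a pthread mutex stands in for dev_pm_qos_mtx, and the unlocked peek at dev->qos relies on the pointer only ever going from NULL to non-NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct constraints {
	int resume_latency;
};

struct toy_device {
	struct constraints *qos;	/* NULL until first needed */
};

static pthread_mutex_t qos_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Step 1, no lock held: allocate speculatively unless already installed. */
struct constraints *constraints_allocate(struct toy_device *dev)
{
	if (dev->qos)
		return NULL;

	return calloc(1, sizeof(struct constraints));
}

/* Step 2, qos_mtx held: install the object, or free it if the race was lost. */
void constraints_set(struct toy_device *dev, struct constraints *qos)
{
	if (dev->qos) {
		free(qos);	/* free(NULL) is a no-op */
		return;
	}
	dev->qos = qos;
}

/* Returns 0 on success or -1 if no constraints object could be set up. */
int add_request(struct toy_device *dev, int latency)
{
	struct constraints *qos = constraints_allocate(dev);
	int ret = 0;

	pthread_mutex_lock(&qos_mtx);
	constraints_set(dev, qos);
	if (!dev->qos)
		ret = -1;	/* allocation failed and nobody else installed one */
	else
		dev->qos->resume_latency = latency;
	pthread_mutex_unlock(&qos_mtx);

	return ret;
}

void add_request_example(void)
{
	struct toy_device dev = { NULL };

	assert(add_request(&dev, 5) == 0);
	assert(dev.qos->resume_latency == 5);
	free(dev.qos);
}
```

The allocator is never entered with the mutex held, so reclaim triggered by the allocation cannot end up waiting on it.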

> ---
>  drivers/base/power/qos.c | 76 +---
>  1 file changed, 56 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 8e93167f1783..7e95760d16dc 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -185,27 +185,33 @@ static int apply_constraint(struct dev_pm_qos_request *req,
>  }
>
>  /*
> - * dev_pm_qos_constraints_allocate
> + * dev_pm_qos_constraints_allocate: Allocate and initialize qos constraints
>   * @dev: device to allocate data for
>   *
> - * Called at the first call to add_request, for constraint data allocation
> - * Must be called with the dev_pm_qos_mtx mutex held
> + * Called to allocate constraints before dev_pm_qos_mtx mutex is held.  Should
> + * be matched with a call to dev_pm_qos_constraints_set() once dev_pm_qos_mtx
> + * is held.
>   */
> -static int dev_pm_qos_constraints_allocate(struct device *dev)
> +static struct dev_pm_qos *dev_pm_qos_constraints_allocate(struct device *dev)
>  {
> struct dev_pm_qos *qos;
> struct pm_qos_constraints *c;
> struct blocking_notifier_head *n;
>
> -   qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> +   /*
> +* If constraints are already allocated, we can skip speculatively
> +* allocating a new one, as we don't have to worry about qos transitioning
> +* from non-null to null.  The constraints are only freed on device
> +* removal.
> +*/
> +   if (dev->power.qos)
> +   return NULL;
> +
> +   qos = kzalloc(sizeof(*qos) + 3 * sizeof(*n), GFP_KERNEL);
> if (!qos)
> -   return -ENOMEM;
> +   return NULL;
>
> -   n = kzalloc(3 * sizeof(*n), GFP_KERNEL);
> -   if (!n) {
> -   kfree(qos);
> -   return -ENOMEM;
> -   }
> +   n = (struct blocking_notifier_head *)(qos + 1);
>
> c = &qos->resume_latency;
> plist_head_init(&c->list);
> @@ -227,11 +233,29 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
>
> INIT_LIST_HEAD(&qos->flags.list);
>
> +   return qos;
> +}
> +
> +/*
> + * dev_pm_qos_constraints_set: Ensure dev->power.qos is set
> + *
> + * If dev->power.qos is already set, free the newly allocated qos constraints.
> + * Otherwise set dev->power.qos.  Must be called with dev_pm_qos_mtx held.
> + *
> + * This split unsynchronized allocation and synchronized set moves allocation
> + * out from under dev_pm_qos_mtx, so that lockdep does not get angry about
> + * drivers which use dev_pm_qos in paths related to shrinker/reclaim.
> + */
> +static void dev_pm_qos_constraints_set(struct device *dev, struct dev_pm_qos *qos)
> +{
> +   if (dev->power.qos) {
> +   kfree(qos);
> +   return;
> +   }
> +
> spin_lock_irq(&dev->power.lock);
> dev->power.qos = qos;
> spin_unlock_irq(&dev->power.lock);
> -
> -   return 0;
>  }
>
>  static void __dev_pm_qos_hide_latency_limit(struct device *dev);
> @@ -309,7 +333,6 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
> dev->power.qos = ERR_PTR(-ENODEV);
> spin_unlock_irq(&dev->power.lock);
>
> -   kfree(qos->resume_latency.notifiers);
> kfree(qos);
>
>   out:
> @@ -341,7 +364,7 @@ static int __dev_pm_qos_add_request(struct device *dev,
> if (IS_ERR(dev->power.qos))
> ret = -ENODEV;
> else if (!dev->power.qos)
> -   ret = dev_pm_qos_constraints_allocate(dev);
> +   ret = -ENOMEM;
>
> trace_dev_pm_qos_add_req

Re: [RFC] PM / QoS: Decouple request alloc from dev_pm_qos_mtx (alternative solution)

2023-08-07 Thread Rafael J. Wysocki

On Fri, Aug 4, 2023 at 11:41 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Similar to the previous patch, move the allocation out from under
> dev_pm_qos_mtx, by speculatively doing the allocation and handle
> any race after acquiring dev_pm_qos_mtx by freeing the redundant
> allocation.
>
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Rob Clark 
> ---
> This is an alternative to 
> https://patchwork.freedesktop.org/patch/551417/?series=115028=4
>
> So, this does _slightly_ change error paths, for ex
> dev_pm_qos_update_user_latency_tolerance() will now allocate
> dev->power.qos in some error cases.  But this seems harmless?

It is harmless AFAICS.

> A slightly more complicated version of this could conserve the
> previous error path behavior, but I figured I'd try the simpler
> thing first.

Good choice!

>  drivers/base/power/qos.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 1b73a704aac1..c7ba85e89c42 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -920,8 +920,12 @@ s32 dev_pm_qos_get_user_latency_tolerance(struct device *dev)
>  int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
>  {
> struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate();
> +   struct dev_pm_qos_request *req = NULL;
> int ret = 0;
>
> +   if (!dev->power.qos->latency_tolerance_req)
> +   req = kzalloc(sizeof(*req), GFP_KERNEL);
> +
> mutex_lock(&dev_pm_qos_mtx);
>
> dev_pm_qos_constraints_set(dev, qos);
> @@ -935,8 +939,6 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
> goto out;
>
> if (!dev->power.qos->latency_tolerance_req) {
> -   struct dev_pm_qos_request *req;
> -
> if (val < 0) {
> if (val == PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT)
> ret = 0;
> @@ -944,17 +946,15 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
> ret = -EINVAL;
> goto out;
> }
> -   req = kzalloc(sizeof(*req), GFP_KERNEL);
> if (!req) {
> ret = -ENOMEM;
> goto out;
> }
> ret = __dev_pm_qos_add_request(dev, req, DEV_PM_QOS_LATENCY_TOLERANCE, val);
> -   if (ret < 0) {
> -   kfree(req);
> +   if (ret < 0)
> goto out;
> -   }
> dev->power.qos->latency_tolerance_req = req;
> +   req = NULL;
> } else {
> if (val < 0) {
> __dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_LATENCY_TOLERANCE);
> @@ -966,6 +966,7 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
>
>   out:
> mutex_unlock(&dev_pm_qos_mtx);
> +   kfree(req);
> return ret;
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_qos_update_user_latency_tolerance);
> --

Yes, something like this, please!
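
Stripped of the PM QoS specifics, the shape I have in mind is roughly the userspace sketch below: preallocate before the lock, clear the local pointer once ownership has been handed over, and free whatever is left on the single exit path. The types and names are invented for this sketch and the locking is only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct request {
	int value;
};

struct toy_qos {
	struct request *latency_tolerance_req;
};

int update_tolerance(struct toy_qos *qos, int val)
{
	struct request *req = NULL;
	int ret = 0;

	/* Preallocate with no lock held, only if a request may be needed. */
	if (!qos->latency_tolerance_req)
		req = malloc(sizeof(*req));

	/* The mutex would be taken here. */
	if (!qos->latency_tolerance_req) {
		if (val < 0) {
			ret = -EINVAL;
			goto out;
		}
		if (!req) {
			ret = -ENOMEM;
			goto out;
		}
		req->value = val;
		qos->latency_tolerance_req = req;
		req = NULL;	/* ownership has moved to qos */
	} else if (val < 0) {
		/* A negative value drops the existing request. */
		free(qos->latency_tolerance_req);
		qos->latency_tolerance_req = NULL;
	} else {
		qos->latency_tolerance_req->value = val;
	}
out:
	/* The mutex would be released here. */
	free(req);	/* frees only a preallocation that was not used */
	return ret;
}

void update_tolerance_example(void)
{
	struct toy_qos qos = { NULL };

	assert(update_tolerance(&qos, 10) == 0);
	assert(qos.latency_tolerance_req->value == 10);
	assert(update_tolerance(&qos, -1) == 0);
	assert(qos.latency_tolerance_req == NULL);
}
```

Because free(NULL) is a no-op, the error paths need no cleanup of their own.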

Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-04 Thread Rafael J. Wysocki

On Fri, Aug 4, 2023 at 10:38 PM Rob Clark  wrote:
>
> On Fri, Aug 4, 2023 at 12:11 PM Rafael J. Wysocki  wrote:
> >
> > On Fri, Aug 4, 2023 at 8:38 PM Rob Clark  wrote:
> > >
> > > On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki  
> > > wrote:
> > > >
> > > > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark  wrote:
> > > > >
> > > > > From: Rob Clark 
> > > > >
> > > > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > > > this lockdep splat:
> > > > >
> > > > >==
> > > > >WARNING: possible circular locking dependency detected
> > > > >6.2.0-rc8-debug+ #558 Tainted: GW
> > > > >--
> > > > >ring0/125 is trying to acquire lock:
> > > > >ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> > > > > dev_pm_qos_update_request+0x38/0x68
> > > > >
> > > > >but task is already holding lock:
> > > > >ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> > > > > msm_gpu_submit+0xec/0x178
> > > > >
> > > > >which lock already depends on the new lock.
> > > > >
> > > > >the existing dependency chain (in reverse order) is:
> > > > >
> > > > >-> #4 (>active_lock){+.+.}-{3:3}:
> > > > >   __mutex_lock+0xcc/0x3c8
> > > > >   mutex_lock_nested+0x30/0x44
> > > > >   msm_gpu_submit+0xec/0x178
> > > > >   msm_job_run+0x78/0x150
> > > > >   drm_sched_main+0x290/0x370
> > > > >   kthread+0xf0/0x100
> > > > >   ret_from_fork+0x10/0x20
> > > > >
> > > > >-> #3 (dma_fence_map){}-{0:0}:
> > > > >   __dma_fence_might_wait+0x74/0xc0
> > > > >   dma_resv_lockdep+0x1f4/0x2f4
> > > > >   do_one_initcall+0x104/0x2bc
> > > > >   kernel_init_freeable+0x344/0x34c
> > > > >   kernel_init+0x30/0x134
> > > > >   ret_from_fork+0x10/0x20
> > > > >
> > > > >-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > > > >   fs_reclaim_acquire+0x80/0xa8
> > > > >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > >   __kmem_cache_alloc_node+0x60/0x1cc
> > > > >   __kmalloc+0xd8/0x100
> > > > >   topology_parse_cpu_capacity+0x8c/0x178
> > > > >   get_cpu_for_node+0x88/0xc4
> > > > >   parse_cluster+0x1b0/0x28c
> > > > >   parse_cluster+0x8c/0x28c
> > > > >   init_cpu_topology+0x168/0x188
> > > > >   smp_prepare_cpus+0x24/0xf8
> > > > >   kernel_init_freeable+0x18c/0x34c
> > > > >   kernel_init+0x30/0x134
> > > > >   ret_from_fork+0x10/0x20
> > > > >
> > > > >-> #1 (fs_reclaim){+.+.}-{0:0}:
> > > > >   __fs_reclaim_acquire+0x3c/0x48
> > > > >   fs_reclaim_acquire+0x54/0xa8
> > > > >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > >   __kmem_cache_alloc_node+0x60/0x1cc
> > > > >   kmalloc_trace+0x50/0xa8
> > > > >   dev_pm_qos_constraints_allocate+0x38/0x100
> > > > >   __dev_pm_qos_add_request+0xb0/0x1e8
> > > > >   dev_pm_qos_add_request+0x58/0x80
> > > > >   dev_pm_qos_expose_latency_limit+0x60/0x13c
> > > > >   register_cpu+0x12c/0x130
> > > > >   topology_init+0xac/0xbc
> > > > >   do_one_initcall+0x104/0x2bc
> > > > >   kernel_init_freeable+0x344/0x34c
> > > > >   kernel_init+0x30/0x134
> > > > >   ret_from_fork+0x10/0x20
> > > > >
> > > > >-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > > > >   __lock_acquire+0xe00/0x1060
> > > > >   lock_acquire+0x1e0/0x2f8
> > > > >   __mutex_lock+0xcc/0x3c8
> > >

Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-04 Thread Rafael J. Wysocki

On Fri, Aug 4, 2023 at 8:38 PM Rob Clark  wrote:
>
> On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki  wrote:
> >
> > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark  wrote:
> > >
> > > From: Rob Clark 
> > >
> > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > this lockdep splat:
> > >
> > >==
> > >WARNING: possible circular locking dependency detected
> > >6.2.0-rc8-debug+ #558 Tainted: GW
> > >--
> > >ring0/125 is trying to acquire lock:
> > >ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> > > dev_pm_qos_update_request+0x38/0x68
> > >
> > >but task is already holding lock:
> > >ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> > > msm_gpu_submit+0xec/0x178
> > >
> > >which lock already depends on the new lock.
> > >
> > >the existing dependency chain (in reverse order) is:
> > >
> > >-> #4 (>active_lock){+.+.}-{3:3}:
> > >   __mutex_lock+0xcc/0x3c8
> > >   mutex_lock_nested+0x30/0x44
> > >   msm_gpu_submit+0xec/0x178
> > >   msm_job_run+0x78/0x150
> > >   drm_sched_main+0x290/0x370
> > >   kthread+0xf0/0x100
> > >   ret_from_fork+0x10/0x20
> > >
> > >-> #3 (dma_fence_map){}-{0:0}:
> > >   __dma_fence_might_wait+0x74/0xc0
> > >   dma_resv_lockdep+0x1f4/0x2f4
> > >   do_one_initcall+0x104/0x2bc
> > >   kernel_init_freeable+0x344/0x34c
> > >   kernel_init+0x30/0x134
> > >   ret_from_fork+0x10/0x20
> > >
> > >-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > >   fs_reclaim_acquire+0x80/0xa8
> > >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > >   __kmem_cache_alloc_node+0x60/0x1cc
> > >   __kmalloc+0xd8/0x100
> > >   topology_parse_cpu_capacity+0x8c/0x178
> > >   get_cpu_for_node+0x88/0xc4
> > >   parse_cluster+0x1b0/0x28c
> > >   parse_cluster+0x8c/0x28c
> > >   init_cpu_topology+0x168/0x188
> > >   smp_prepare_cpus+0x24/0xf8
> > >   kernel_init_freeable+0x18c/0x34c
> > >   kernel_init+0x30/0x134
> > >   ret_from_fork+0x10/0x20
> > >
> > >-> #1 (fs_reclaim){+.+.}-{0:0}:
> > >   __fs_reclaim_acquire+0x3c/0x48
> > >   fs_reclaim_acquire+0x54/0xa8
> > >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > >   __kmem_cache_alloc_node+0x60/0x1cc
> > >   kmalloc_trace+0x50/0xa8
> > >   dev_pm_qos_constraints_allocate+0x38/0x100
> > >   __dev_pm_qos_add_request+0xb0/0x1e8
> > >   dev_pm_qos_add_request+0x58/0x80
> > >   dev_pm_qos_expose_latency_limit+0x60/0x13c
> > >   register_cpu+0x12c/0x130
> > >   topology_init+0xac/0xbc
> > >   do_one_initcall+0x104/0x2bc
> > >   kernel_init_freeable+0x344/0x34c
> > >   kernel_init+0x30/0x134
> > >   ret_from_fork+0x10/0x20
> > >
> > >-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > >   __lock_acquire+0xe00/0x1060
> > >   lock_acquire+0x1e0/0x2f8
> > >   __mutex_lock+0xcc/0x3c8
> > >   mutex_lock_nested+0x30/0x44
> > >   dev_pm_qos_update_request+0x38/0x68
> > >   msm_devfreq_boost+0x40/0x70
> > >   msm_devfreq_active+0xc0/0xf0
> > >   msm_gpu_submit+0x10c/0x178
> > >   msm_job_run+0x78/0x150
> > >   drm_sched_main+0x290/0x370
> > >   kthread+0xf0/0x100
> > >   ret_from_fork+0x10/0x20
> > >
> > >other info that might help us debug this:
> > >
> > >Chain exists of:
> > >  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
> > >
> > > Possible unsafe locking scenario:
> > >
> > >   CPU0CPU1
> > >   
> > >  lock(>active_lock);
> > >   lock(dma_fence_ma

Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-04 Thread Rafael J. Wysocki

On Fri, Aug 4, 2023 at 12:02 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
>==
>WARNING: possible circular locking dependency detected
>6.2.0-rc8-debug+ #558 Tainted: GW
>--
>ring0/125 is trying to acquire lock:
>ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> dev_pm_qos_update_request+0x38/0x68
>
>but task is already holding lock:
>ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>
>-> #4 (>active_lock){+.+.}-{3:3}:
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   msm_gpu_submit+0xec/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>-> #3 (dma_fence_map){}-{0:0}:
>   __dma_fence_might_wait+0x74/0xc0
>   dma_resv_lockdep+0x1f4/0x2f4
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
>   fs_reclaim_acquire+0x80/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   __kmalloc+0xd8/0x100
>   topology_parse_cpu_capacity+0x8c/0x178
>   get_cpu_for_node+0x88/0xc4
>   parse_cluster+0x1b0/0x28c
>   parse_cluster+0x8c/0x28c
>   init_cpu_topology+0x168/0x188
>   smp_prepare_cpus+0x24/0xf8
>   kernel_init_freeable+0x18c/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #1 (fs_reclaim){+.+.}-{0:0}:
>   __fs_reclaim_acquire+0x3c/0x48
>   fs_reclaim_acquire+0x54/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   kmalloc_trace+0x50/0xa8
>   dev_pm_qos_constraints_allocate+0x38/0x100
>   __dev_pm_qos_add_request+0xb0/0x1e8
>   dev_pm_qos_add_request+0x58/0x80
>   dev_pm_qos_expose_latency_limit+0x60/0x13c
>   register_cpu+0x12c/0x130
>   topology_init+0xac/0xbc
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
>   __lock_acquire+0xe00/0x1060
>   lock_acquire+0x1e0/0x2f8
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   dev_pm_qos_update_request+0x38/0x68
>   msm_devfreq_boost+0x40/0x70
>   msm_devfreq_active+0xc0/0xf0
>   msm_gpu_submit+0x10c/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>other info that might help us debug this:
>
>Chain exists of:
>  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(>active_lock);
>                                lock(dma_fence_map);
>                                lock(>active_lock);
>   lock(dev_pm_qos_mtx);
>
> *** DEADLOCK ***
>
>3 locks held by ring0/123:
> #0: ff8087251170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> #1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: 
> msm_job_run+0x68/0x150
> #2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>stack backtrace:
>CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
>Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>Call trace:
> dump_backtrace.part.0+0xb4/0xf8
> show_stack+0x20/0x38
> dump_stack_lvl+0x9c/0xd0
> dump_stack+0x18/0x34
> print_circular_bug+0x1b4/0x1f0
> check_noncircular+0x78/0xac
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Rafael J. Wysocki

On Fri, Jun 23, 2023 at 6:48 PM Limonciello, Mario
 wrote:
>
>
> On 6/23/2023 11:28 AM, Rafael J. Wysocki wrote:
> > On Fri, Jun 23, 2023 at 5:57 PM Limonciello, Mario
> >  wrote:
> >>
> >> On 6/23/2023 9:52 AM, Rafael J. Wysocki wrote:
> >>> On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:
> >>>> From: Mario Limonciello 
> >>>>
> >>>> Due to electrical and mechanical constraints in certain platform designs
> >>>> there may be likely interference of relatively high-powered harmonics of
> >>>> the (G-)DDR memory clocks with local radio module frequency bands used
> >>>> by Wifi 6/6e/7.
> >>>>
> >>>> To mitigate this, AMD has introduced an ACPI based mechanism that
> >>>> devices can use to notify active use of particular frequencies so
> >>>> that devices can make relative internal adjustments as necessary
> >>>> to avoid this resonance.
> >>>>
> >>>> In order for a device to support this, the expected flow for device
> >>>> driver or subsystems:
> >>>>
> >>>> Drivers/subsystems contributing frequencies:
> >>>>
> >>>> 1) During probe, check `wbrf_supported_producer` to see if WBRF supported
> >>> The prefix should be acpi_wbrf_ or acpi_amd_wbrf_ even, so it is clear
> >>> that this uses ACPI and is AMD-specific.
> >> I guess if we end up with an intermediary library approach
> >> wbrf_supported_producer makes sense and that could call acpi_wbrf_*.
> >>
> >> But with no intermediate library your suggestion makes sense.
> >>
> >> I would prefer not to make it acpi_amd as there is no reason that
> >> this exact same problem couldn't happen on an
> >> Wifi 6e + Intel SOC + AMD dGPU design too and OEMs could use the
> >> same mitigation mechanism as Wifi6e + AMD SOC + AMD dGPU too.
> > The mitigation mechanism might be the same, but the AML interface very
> > well may be different.
>
>
> Right.  I suppose right now we should keep it prefixed as "amd",
> and if it later is promoted as a standard it can be renamed.
>
>
> >
> > My point is that this particular interface is AMD-specific ATM and I'm
> > not aware of any plans to make it "standard" in some way.
>
>
> Yeah; this implementation is currently AMD specific AML, but I
> expect the exact same AML would be delivered to OEMs using the
> dGPUs.
>
>
> >
> > Also if the given interface is specified somewhere, it would be good
> > to have a pointer to that place.
>
>
> It's a code first implementation.  I'm discussing with the
> owners when they will release it.
>
>
> >
> >>> Whether or not there needs to be an intermediate library wrapped
> >>> around this is a different matter.
> > IMO individual drivers should not be expected to use this interface
> > directly, as that would add to boilerplate code and overall bloat.
>
> The thing is the ACPI method is not a platform method.  It's
> a function of the device (_DSM).

_DSM is an interface to the platform like any other AML, so I'm not
really sure what you mean.

> The reason for having acpi_wbrf.c in the first place is to
> avoid the boilerplate of the _DSM implementation across multiple
> drivers.

Absolutely, drivers should not be bothered with having to use _DSM in
any case.  However, they may not even realize that they are running on
a system using ACPI and I'm not sure if they really should care.

> >
> > Also whoever uses it, would first need to check if the device in
> > question has an ACPI companion.
>
>
> Which comes back to Andrew's point.
> Either we:
>
> Have a generic wbrf_ helper that takes struct *device and
> internally checks if there is an ACPI companion and support.
>
> or
>
> Do the check for support in mac80211 + applicable drivers
> and only call the AMD WBRF ACPI method in those drivers in
> those cases.

Either of the above has problems IMO.

The problem with the wbrf_ helper approach is that it adds
(potentially) several pieces of interaction with the platform,
potentially for every driver, in places where drivers don't do such
things as a rule.

The problem with the other approach is that the drivers in question
now need to be aware of ACPI in general and the AMD WBRF interface in
particular and if other similar interfaces are added by other vendors,
they will have to learn about those as well.

I think that we need to start over with a general problem statement
that in some cases the platform needs to be consulted regarding radio
frequencies that drivers would like to use, because it may need to
adjust or simply say which ranges are "noisy" (or even completely
unusable for that matter).  That should allow us to figure out what the
interface should look like from the driver side and it should be
possible to hook up the existing platform interface to that.
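
To make the problem statement above more concrete, here is a minimal
userspace sketch of what a platform-neutral, driver-side query could
look like.  Every name in it (struct rf_range, struct rf_platform_ops,
rf_range_is_noisy() and the demo helpers) is hypothetical and does not
exist in the kernel, and the frequency values are arbitrary.  The only
point is that the driver asks a generic question and a backend (the
existing ACPI one or any other) answers it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them exist in the kernel. */
struct rf_range {
	unsigned long long start_hz;
	unsigned long long end_hz;
};

/* Backend hook that a platform interface (ACPI-based or other) would fill in. */
struct rf_platform_ops {
	/* Copy up to max noisy ranges into buf and return how many were copied. */
	size_t (*noisy_ranges)(struct rf_range *buf, size_t max);
};

/* Driver-side query: true if "want" overlaps a range the platform calls noisy. */
bool rf_range_is_noisy(const struct rf_platform_ops *ops,
		       const struct rf_range *want)
{
	struct rf_range noisy[8];
	size_t i, n;

	/* No backend registered: the driver carries on as it does today. */
	if (!ops || !ops->noisy_ranges)
		return false;

	n = ops->noisy_ranges(noisy, 8);
	for (i = 0; i < n; i++) {
		/* Two closed intervals overlap if each starts before the other ends. */
		if (want->start_hz <= noisy[i].end_hz &&
		    noisy[i].start_hz <= want->end_hz)
			return true;
	}

	return false;
}

/* Fake backend for the demo: reports one arbitrary noisy range. */
static size_t demo_noisy_ranges(struct rf_range *buf, size_t max)
{
	if (max < 1)
		return 0;

	buf[0].start_hz = 5925000000ULL;
	buf[0].end_hz = 6425000000ULL;
	return 1;
}

void demo_rf_query(void)
{
	const struct rf_platform_ops ops = { .noisy_ranges = demo_noisy_ranges };
	const struct rf_range overlapping = { 6000000000ULL, 6160000000ULL };
	const struct rf_range clear = { 2412000000ULL, 2432000000ULL };

	assert(rf_range_is_noisy(&ops, &overlapping));
	assert(!rf_range_is_noisy(&ops, &clear));
	/* Without a backend the query is a no-op. */
	assert(!rf_range_is_noisy(NULL, &overlapping));
}
```

With a shape like this the driver never sees ACPI, _DSM or any
vendor-specific detail; whether the real interface should be a query, a
notification or both is left open here.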

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Rafael J. Wysocki

On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:
>
> From: Mario Limonciello 
>
> Due to electrical and mechanical constraints in certain platform designs
> there may be likely interference of relatively high-powered harmonics of
> the (G-)DDR memory clocks with local radio module frequency bands used
> by Wifi 6/6e/7.
>
> To mitigate this, AMD has introduced an ACPI based mechanism that
> devices can use to notify active use of particular frequencies so
> that devices can make relative internal adjustments as necessary
> to avoid this resonance.
>
> In order for a device to support this, the expected flow for device
> driver or subsystems:
>
> Drivers/subsystems contributing frequencies:
>
> 1) During probe, check `wbrf_supported_producer` to see if WBRF supported
>for the device.
> 2) If adding frequencies, then call `wbrf_add_exclusion` with the
>start and end ranges of the frequencies.
> 3) If removing frequencies, then call `wbrf_remove_exclusion` with
>start and end ranges of the frequencies.
>
> Drivers/subsystems responding to frequencies:
>
> 1) During probe, check `wbrf_supported_consumer` to see if WBRF is supported
>for the device.
> 2) Call the `wbrf_retrieve_exclusions` to retrieve the current
>exclusions on receiving an ACPI notification for a new frequency
>change.
>
> Signed-off-by: Mario Limonciello 
> Co-developed-by: Evan Quan 
> Signed-off-by: Evan Quan 
> --
> v1->v2:
>   - move those wlan specific implementations to net/mac80211(Mario)
> ---
>  drivers/acpi/Kconfig |   7 ++
>  drivers/acpi/Makefile|   2 +
>  drivers/acpi/acpi_wbrf.c | 215 +++
>  include/linux/wbrf.h |  55 ++
>  4 files changed, 279 insertions(+)
>  create mode 100644 drivers/acpi/acpi_wbrf.c
>  create mode 100644 include/linux/wbrf.h
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index ccbeab9500ec..0276c1487fa2 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -611,3 +611,10 @@ config X86_PM_TIMER
>
>   You should nearly always say Y here because many modern
>   systems require this timer.
> +
> +config ACPI_WBRF
> +   bool "ACPI Wifi band RF mitigation mechanism"
> +   help
> + Wifi band RF mitigation mechanism allows multiple drivers from
> + different domains to notify the frequencies in use so that hardware
> + can be reconfigured to avoid harmonic conflicts.
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index feb36c0b9446..14863b0c619f 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -131,3 +131,5 @@ obj-y   += dptf/
>  obj-$(CONFIG_ARM64)+= arm64/
>
>  obj-$(CONFIG_ACPI_VIOT)+= viot.o
> +
> +obj-$(CONFIG_ACPI_WBRF)+= acpi_wbrf.o
> diff --git a/drivers/acpi/acpi_wbrf.c b/drivers/acpi/acpi_wbrf.c
> new file mode 100644
> index 000000000000..8c275998ac29
> --- /dev/null
> +++ b/drivers/acpi/acpi_wbrf.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD Wifi Band Exclusion Interface

Where is the AML interface for this defined and how does it work?

> + * Copyright (C) 2023 Advanced Micro Devices
> + *
> + */
> +
> +#include 
> +
> +/* functions */
> +#define WBRF_RECORD0x1
> +#define WBRF_RETRIEVE  0x2
> +
> +/* record actions */
> +#define WBRF_RECORD_ADD0x0
> +#define WBRF_RECORD_REMOVE 0x1
> +
> +#define WBRF_REVISION  0x1
> +
> +static const guid_t wifi_acpi_dsm_guid =
> +   GUID_INIT(0x7b7656cf, 0xdc3d, 0x4c1c,
> + 0x83, 0xe9, 0x66, 0xe7, 0x21, 0xde, 0x30, 0x70);
> +
> +static int wbrf_dsm(struct acpi_device *adev, u8 fn,
> +   union acpi_object *argv4,
> +   union acpi_object **out)
> +{
> +   union acpi_object *obj;
> +   int rc;
> +
> +   obj = acpi_evaluate_dsm(adev->handle, &wifi_acpi_dsm_guid,
> +   WBRF_REVISION, fn, argv4);
> +   if (!obj)
> +   return -ENXIO;
> +
> +   switch (obj->type) {
> +   case ACPI_TYPE_BUFFER:
> +   if (!*out) {
> +   rc = -EINVAL;
> +   break;

I'm not sure why you want to return an error in this case.  Did you
really mean !out?

> +   }
> +   *out = obj;
> +   return 0;
> +
> +   case ACPI_TYPE_INTEGER:
> +   rc =  obj->integer.value ? -EINVAL : 0;
> +   break;

An empty line here, please, as you added one after the return statement above.

> +   default:
> +   rc = -EOPNOTSUPP;
> +   }
> +   ACPI_FREE(obj);
> +
> +   return rc;

How does the caller know whether or not they need to free the out
object after calling this function?
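
One possible way to settle both questions above, shown as a userspace
model rather than kernel code.  The types model_object and model_type
and the functions model_dsm_result() and model_new() are made up for
this sketch and merely stand in for union acpi_object and the _DSM
evaluation; the error codes mirror the quoted hunk.  The assumed
contract is: check the pointer itself (!out), always initialize *out,
and hand the object to the caller only when 0 is returned for a buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ACPICA object types; not kernel definitions. */
enum model_type { MODEL_INTEGER, MODEL_BUFFER, MODEL_OTHER };

struct model_object {
	enum model_type type;
	unsigned long long value;	/* meaningful for MODEL_INTEGER only */
};

/*
 * Takes ownership of obj.  If the result is a buffer and out is not NULL,
 * the object is passed to the caller through *out and the caller frees it.
 * In every other case the object is freed here and *out is left NULL, so
 * "*out != NULL" is the only thing the caller has to check.
 */
int model_dsm_result(struct model_object *obj, struct model_object **out)
{
	int rc;

	if (out)
		*out = NULL;

	if (!obj)
		return -ENXIO;

	switch (obj->type) {
	case MODEL_BUFFER:
		if (!out) {
			rc = -EINVAL;
			break;
		}
		*out = obj;
		return 0;

	case MODEL_INTEGER:
		rc = obj->value ? -EINVAL : 0;
		break;

	default:
		rc = -EOPNOTSUPP;
	}

	free(obj);
	return rc;
}

/* Demo helper: allocate an object of the given type. */
static struct model_object *model_new(enum model_type type,
				      unsigned long long value)
{
	struct model_object *obj = malloc(sizeof(*obj));

	if (obj) {
		obj->type = type;
		obj->value = value;
	}
	return obj;
}

void demo_dsm_result(void)
{
	struct model_object *out = NULL;

	/* Buffer result: ownership moves to the caller. */
	assert(model_dsm_result(model_new(MODEL_BUFFER, 0), &out) == 0);
	assert(out != NULL);
	free(out);

	/* Integer results are consumed inside the helper. */
	assert(model_dsm_result(model_new(MODEL_INTEGER, 0), &out) == 0);
	assert(out == NULL);
	assert(model_dsm_result(model_new(MODEL_INTEGER, 1), &out) == -EINVAL);
	assert(out == NULL);

	/* Buffer result with nowhere to put it. */
	assert(model_dsm_result(model_new(MODEL_BUFFER, 0), NULL) == -EINVAL);
}
```
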

> +}
> +
> +static int wbrf_record(struct acpi_device *adev, uint8_t action,
> +  struct wbrf_ranges_in *in)
> +{
> +   union

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Rafael J. Wysocki

On Fri, Jun 23, 2023 at 5:57 PM Limonciello, Mario
 wrote:
>
>
> On 6/23/2023 9:52 AM, Rafael J. Wysocki wrote:
> > On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:
> >> From: Mario Limonciello 
> >>
> >> Due to electrical and mechanical constraints in certain platform designs
> >> there may be likely interference of relatively high-powered harmonics of
> >> the (G-)DDR memory clocks with local radio module frequency bands used
> >> by Wifi 6/6e/7.
> >>
> >> To mitigate this, AMD has introduced an ACPI based mechanism that
> >> devices can use to notify active use of particular frequencies so
> >> that devices can make relative internal adjustments as necessary
> >> to avoid this resonance.
> >>
> >> In order for a device to support this, the expected flow for device
> >> driver or subsystems:
> >>
> >> Drivers/subsystems contributing frequencies:
> >>
> >> 1) During probe, check `wbrf_supported_producer` to see if WBRF supported
> > The prefix should be acpi_wbrf_ or acpi_amd_wbrf_ even, so it is clear
> > that this uses ACPI and is AMD-specific.
>
> I guess if we end up with an intermediary library approach
> wbrf_supported_producer makes sense and that could call acpi_wbrf_*.
>
> But with no intermediate library your suggestion makes sense.
>
> I would prefer not to make it acpi_amd as there is no reason that
> this exact same problem couldn't happen on an
> Wifi 6e + Intel SOC + AMD dGPU design too and OEMs could use the
> same mitigation mechanism as Wifi6e + AMD SOC + AMD dGPU too.

The mitigation mechanism might be the same, but the AML interface very
well may be different.

My point is that this particular interface is AMD-specific ATM and I'm
not aware of any plans to make it "standard" in some way.

Also if the given interface is specified somewhere, it would be good
to have a pointer to that place.

> >
> > Whether or not there needs to be an intermediate library wrapped
> > around this is a different matter.

IMO individual drivers should not be expected to use this interface
directly, as that would add to boilerplate code and overall bloat.

Also whoever uses it, would first need to check if the device in
question has an ACPI companion.
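
As an illustration of the check mentioned above, here is a small
userspace sketch.  struct model_device, struct model_acpi_companion and
model_wbrf_check() are hypothetical stand-ins, not the kernel's struct
device or any existing ACPI helper; the sketch only shows the order of
the checks a common helper would have to do on behalf of drivers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct device or ACPI types. */
struct model_acpi_companion {
	bool has_wbrf_dsm;	/* firmware exposes the expected _DSM */
};

struct model_device {
	struct model_acpi_companion *companion;	/* NULL on non-ACPI systems */
};

/* Returns 0 if the interface can be used for dev, a negative errno if not. */
int model_wbrf_check(const struct model_device *dev)
{
	if (!dev)
		return -EINVAL;

	/* No ACPI companion: nothing to evaluate, and that is not a bug. */
	if (!dev->companion)
		return -ENODEV;

	/* Companion present, but the firmware does not provide the method. */
	if (!dev->companion->has_wbrf_dsm)
		return -EOPNOTSUPP;

	return 0;
}

void demo_wbrf_check(void)
{
	struct model_acpi_companion with_dsm = { .has_wbrf_dsm = true };
	struct model_acpi_companion without_dsm = { .has_wbrf_dsm = false };
	struct model_device no_acpi = { .companion = NULL };
	struct model_device supported = { .companion = &with_dsm };
	struct model_device unsupported = { .companion = &without_dsm };

	assert(model_wbrf_check(&no_acpi) == -ENODEV);
	assert(model_wbrf_check(&unsupported) == -EOPNOTSUPP);
	assert(model_wbrf_check(&supported) == 0);
}
```

Doing this once in a shared helper keeps the companion check out of
every individual driver.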

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Rafael J. Wysocki

On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:
>
> From: Mario Limonciello 
>
> Due to electrical and mechanical constraints in certain platform designs
> there may be likely interference of relatively high-powered harmonics of
> the (G-)DDR memory clocks with local radio module frequency bands used
> by Wifi 6/6e/7.
>
> To mitigate this, AMD has introduced an ACPI based mechanism that
> devices can use to notify active use of particular frequencies so
> that devices can make relative internal adjustments as necessary
> to avoid this resonance.
>
> In order for a device to support this, the expected flow for device
> driver or subsystems:
>
> Drivers/subsystems contributing frequencies:
>
> 1) During probe, check `wbrf_supported_producer` to see if WBRF supported

The prefix should be acpi_wbrf_ or acpi_amd_wbrf_ even, so it is clear
that this uses ACPI and is AMD-specific.

Whether or not there needs to be an intermediate library wrapped
around this is a different matter.

Re: [PATCH v2 17/23] PM / QoS: Fix constraints alloc vs reclaim locking

2023-03-27 Thread Rafael J. Wysocki

On Mon, Mar 20, 2023 at 3:45 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
>==
>WARNING: possible circular locking dependency detected
>6.2.0-rc8-debug+ #558 Tainted: GW
>--
>ring0/125 is trying to acquire lock:
>ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> dev_pm_qos_update_request+0x38/0x68
>
>but task is already holding lock:
>ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>
>-> #4 (>active_lock){+.+.}-{3:3}:
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   msm_gpu_submit+0xec/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>-> #3 (dma_fence_map){}-{0:0}:
>   __dma_fence_might_wait+0x74/0xc0
>   dma_resv_lockdep+0x1f4/0x2f4
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
>   fs_reclaim_acquire+0x80/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   __kmalloc+0xd8/0x100
>   topology_parse_cpu_capacity+0x8c/0x178
>   get_cpu_for_node+0x88/0xc4
>   parse_cluster+0x1b0/0x28c
>   parse_cluster+0x8c/0x28c
>   init_cpu_topology+0x168/0x188
>   smp_prepare_cpus+0x24/0xf8
>   kernel_init_freeable+0x18c/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #1 (fs_reclaim){+.+.}-{0:0}:
>   __fs_reclaim_acquire+0x3c/0x48
>   fs_reclaim_acquire+0x54/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   kmalloc_trace+0x50/0xa8
>   dev_pm_qos_constraints_allocate+0x38/0x100
>   __dev_pm_qos_add_request+0xb0/0x1e8
>   dev_pm_qos_add_request+0x58/0x80
>   dev_pm_qos_expose_latency_limit+0x60/0x13c
>   register_cpu+0x12c/0x130
>   topology_init+0xac/0xbc
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
>   __lock_acquire+0xe00/0x1060
>   lock_acquire+0x1e0/0x2f8
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   dev_pm_qos_update_request+0x38/0x68
>   msm_devfreq_boost+0x40/0x70
>   msm_devfreq_active+0xc0/0xf0
>   msm_gpu_submit+0x10c/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>other info that might help us debug this:
>
>Chain exists of:
>  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(>active_lock);
>                                lock(dma_fence_map);
>                                lock(>active_lock);
>   lock(dev_pm_qos_mtx);
>
> *** DEADLOCK ***
>
>3 locks held by ring0/123:
> #0: ff8087251170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> #1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: 
> msm_job_run+0x68/0x150
> #2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>stack backtrace:
>CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
>Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>Call trace:
> dump_backtrace.part.0+0xb4/0xf8
> show_stack+0x20/0x38
> dump_stack_lvl+0x9c/0xd0
> dump_stack+0x18/0x34
> print_circular_bug+0x1b4/0x1f0
> check_noncircular+0x78/0xac
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before

Re: [PATCH 10/13] PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order

2023-03-13 Thread Rafael J. Wysocki

On Sun, Mar 12, 2023 at 9:42 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Annotate dev_pm_qos_mtx to teach lockdep to scream about allocations
> that could trigger reclaim under dev_pm_qos_mtx.

So why is this needed?

> Signed-off-by: Rob Clark 
> ---
>  drivers/base/power/qos.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 9cba334b3729..d4addda3944a 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -1012,3 +1012,14 @@ void dev_pm_qos_hide_latency_tolerance(struct device 
> *dev)
> pm_runtime_put(dev);
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_qos_hide_latency_tolerance);
> +
> +static int __init dev_pm_qos_init(void)
> +{
> +   /* Teach lockdep about lock ordering wrt. shrinker: */
> +   fs_reclaim_acquire(GFP_KERNEL);
> +   might_lock(&dev_pm_qos_mtx);
> +   fs_reclaim_release(GFP_KERNEL);
> +
> +   return 0;
> +}
> +early_initcall(dev_pm_qos_init);
> --
> 2.39.2
>

Re: [PATCH 08/13] PM / QoS: Fix constraints alloc vs reclaim locking

2023-03-13 Thread Rafael J. Wysocki

On Sun, Mar 12, 2023 at 9:42 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
>==
>WARNING: possible circular locking dependency detected
>6.2.0-rc8-debug+ #558 Tainted: GW
>--
>ring0/125 is trying to acquire lock:
>ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> dev_pm_qos_update_request+0x38/0x68
>
>but task is already holding lock:
>ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>
>-> #4 (>active_lock){+.+.}-{3:3}:
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   msm_gpu_submit+0xec/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>-> #3 (dma_fence_map){}-{0:0}:
>   __dma_fence_might_wait+0x74/0xc0
>   dma_resv_lockdep+0x1f4/0x2f4
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
>   fs_reclaim_acquire+0x80/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   __kmalloc+0xd8/0x100
>   topology_parse_cpu_capacity+0x8c/0x178
>   get_cpu_for_node+0x88/0xc4
>   parse_cluster+0x1b0/0x28c
>   parse_cluster+0x8c/0x28c
>   init_cpu_topology+0x168/0x188
>   smp_prepare_cpus+0x24/0xf8
>   kernel_init_freeable+0x18c/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #1 (fs_reclaim){+.+.}-{0:0}:
>   __fs_reclaim_acquire+0x3c/0x48
>   fs_reclaim_acquire+0x54/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   kmalloc_trace+0x50/0xa8
>   dev_pm_qos_constraints_allocate+0x38/0x100
>   __dev_pm_qos_add_request+0xb0/0x1e8
>   dev_pm_qos_add_request+0x58/0x80
>   dev_pm_qos_expose_latency_limit+0x60/0x13c
>   register_cpu+0x12c/0x130
>   topology_init+0xac/0xbc
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
>   __lock_acquire+0xe00/0x1060
>   lock_acquire+0x1e0/0x2f8
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   dev_pm_qos_update_request+0x38/0x68
>   msm_devfreq_boost+0x40/0x70
>   msm_devfreq_active+0xc0/0xf0
>   msm_gpu_submit+0x10c/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>other info that might help us debug this:
>
>Chain exists of:
>  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(>active_lock);
>                                lock(dma_fence_map);
>                                lock(>active_lock);
>   lock(dev_pm_qos_mtx);
>
> *** DEADLOCK ***
>
>3 locks held by ring0/123:
> #0: ff8087251170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> #1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: 
> msm_job_run+0x68/0x150
> #2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>stack backtrace:
>CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
>Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>Call trace:
> dump_backtrace.part.0+0xb4/0xf8
> show_stack+0x20/0x38
> dump_stack_lvl+0x9c/0xd0
> dump_stack+0x18/0x34
> print_circular_bug+0x1b4/0x1f0
> check_noncircular+0x78/0xac
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before

Re: [PATCH] ACPI: video: Add backlight=native DMI quirk for Asus U46E

2023-01-23 Thread Rafael J. Wysocki

On Thu, Jan 19, 2023 at 6:24 PM Hans de Goede  wrote:
>
> The Asus U46E backlight tables have a set of interesting problems:
>
> 1. Its ACPI tables do make _OSI ("Windows 2012") checks, so
>acpi_osi_is_win8() should return true.
>
>But the tables have 2 sets of _OSI calls, one from the usual global
>_INI method setting a global OSYS variable and a second set of _OSI
>calls from a MSOS method and the MSOS method is the only one calling
>_OSI ("Windows 2012").
>
>The MSOS method only gets called in the following cases:
>1. From some Asus specific WMI methods
>2. From _DOD, which only runs after acpi_video_get_backlight_type()
>   has already been called by the i915 driver
>3. From other ACPI video bus methods which never run (see below)
>4. From some EC query callbacks
>
>So when i915 calls acpi_video_get_backlight_type() MSOS has never run
>and acpi_osi_is_win8() returns false, so acpi_video_get_backlight_type()
>returns acpi_video as the desired backlight type, which causes
>the intel_backlight device to not register.
>
> 2. _DOD effectively does this:
>
> Return (Package (0x01)
> {
> 0x0400
> })
>
>causing acpi_video_device_in_dod() to return false, which causes
>the acpi_video backlight device to not register.
>
> Leaving the user with no backlight device at all. Note that before 6.1.y
> the i915 driver would register the intel_backlight device unconditionally
> and since that then was the only backlight device userspace would use that.
>
> Add a backlight=native DMI quirk for this special laptop to restore
> the old (and working) behavior of the intel_backlight device registering.
>
> Fixes: fb1836c91317 ("ACPI: video: Prefer native over vendor")
> Signed-off-by: Hans de Goede 
> ---
>  drivers/acpi/video_detect.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
> index aa6196e5e574..64eab35037c3 100644
> --- a/drivers/acpi/video_detect.c
> +++ b/drivers/acpi/video_detect.c
> @@ -610,6 +610,14 @@ static const struct dmi_system_id 
> video_detect_dmi_table[] = {
> DMI_MATCH(DMI_PRODUCT_NAME, "GA503"),
> },
> },
> +   {
> +.callback = video_detect_force_native,
> +/* Asus U46E */
> +.matches = {
> +   DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK Computer Inc."),
> +   DMI_MATCH(DMI_PRODUCT_NAME, "U46E"),
> +   },
> +   },
> {
>  .callback = video_detect_force_native,
>  /* Asus UX303UB */
> --

Applied as 6.2-rc material, thanks!

Re: [PATCH 0/2] ACPI: video: More backlight quirks

2023-01-23 Thread Rafael J. Wysocki

On Thu, Jan 19, 2023 at 5:38 PM Hans de Goede  wrote:
>
> Hi Rafael,
>
> With the backlight changes landing in 6.1.y now showing up in
> distribution repositories I have been receiving a steady stream of
> backlight bug reports by email.
>
> These bug-reports fall into various categories and most of them are
> already fixed with some recent fixes which are in 6.1.7 and later.
>
> One category (unfortunately) requires adding DMI quirks.
>
> I have been receiving reports from users with pre Windows 8 laptops,
> who used to pass acpi_backlight=vendor on the kernel commandline to hide
> a non functioning acpi_video# backlight device, so that userspace will
> use the native (GPU driver) backlight device instead.
>
> Starting with 6.1.y acpi_backlight=vendor is now also honored by
> the native backlight drivers, hiding the native backlight device,
> leaving these users with no backlight device at all.
>
> This leads to them sending me a bug-report. Which in a way is a good
> thing because these models really needed to have a DMI quirk added
> all along, but this was never reported upstream.
>
> The fix here is to use "acpi_backlight=native" and to set this through
> a DMI quirk so that things will work out of the box.
>
> The Acer Aspire 4810T quirk from a couple of days ago was like this and
> the first quirk in this series is too.
>
> I expect to receive more bug-reports like this, so you can expect
> a steady trickle of backlight quirk patches from me the coming time.
>
> Note the second quirk in this series is also an "acpi_backlight=native"
> quirk, but the root cause is somewhat different, see the commit msg.
>
> Regards,
>
> Hans
>
>
>
> Hans de Goede (2):
>   ACPI: video: Add backlight=native DMI quirk for HP Pavilion g6-1d80nr
>   ACPI: video: Add backlight=native DMI quirk for HP EliteBook 8460p
>
>  drivers/acpi/video_detect.c | 17 +
>  1 file changed, 17 insertions(+)
>
> --

Both applied as 6.2-rc material, thanks!

Re: [PATCH] ACPI: video: Add backlight=native DMI quirk for Acer Aspire 4810T

2023-01-17 Thread Rafael J. Wysocki

On Fri, Jan 13, 2023 at 12:41 PM Hans de Goede  wrote:
>
> The Acer Aspire 4810T predates Windows 8, so it defaults to using
> acpi_video# for backlight control, but this is non functional on
> this model.
>
> Add a DMI quirk to use the native backlight interface which does
> work properly.
>
> Signed-off-by: Hans de Goede 
> ---
>  drivers/acpi/video_detect.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
> index d4edd64dcc2f..fb526ba8825b 100644
> --- a/drivers/acpi/video_detect.c
> +++ b/drivers/acpi/video_detect.c
> @@ -515,6 +515,14 @@ static const struct dmi_system_id 
> video_detect_dmi_table[] = {
> DMI_MATCH(DMI_PRODUCT_NAME, "Precision 7510"),
> },
> },
> +   {
> +.callback = video_detect_force_native,
> +/* Acer Aspire 4810T */
> +.matches = {
> +   DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
> +   DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 4810T"),
> +   },
> +   },
> {
>  .callback = video_detect_force_native,
>  /* Acer Aspire 5738z */
> --

Applied as 6.2-rc material, thanks!

Re: [PATCH v2] ACPI: Fix selecting the wrong ACPI fwnode for the iGPU on some Dell laptops

2023-01-11 Thread Rafael J. Wysocki

On Wed, Jan 11, 2023 at 9:23 PM Hans de Goede  wrote:
>
> Hi,
>
> On 1/11/23 21:16, Rafael J. Wysocki wrote:
> > On Tue, Jan 10, 2023 at 4:30 PM Hans de Goede  wrote:
> >>
> >> The Dell Latitude E6430 both with and without the optional NVidia dGPU
> >> has a bug in its ACPI tables which is causing Linux to assign the wrong
> >> ACPI fwnode / companion to the pci_device for the i915 iGPU.
> >>
> >> Specifically under the PCI root bridge there are these 2 ACPI Device()s :
> >>
> >>  Scope (_SB.PCI0)
> >>  {
> >>  Device (GFX0)
> >>  {
> >>  Name (_ADR, 0x0002)  // _ADR: Address
> >>  }
> >>
> >>  ...
> >>
> >>  Device (VID)
> >>  {
> >>  Name (_ADR, 0x0002)  // _ADR: Address
> >>  ...
> >>
> >>  Method (_DOS, 1, NotSerialized)  // _DOS: Disable Output Switching
> >>  {
> >>  VDP8 = Arg0
> >>  VDP1 (One, VDP8)
> >>  }
> >>
> >>  Method (_DOD, 0, NotSerialized)  // _DOD: Display Output Devices
> >>  {
> >>  ...
> >>  }
> >>  ...
> >>  }
> >>  }
> >>
> >> The non-functional GFX0 ACPI device is a problem, because this gets
> >> returned as ACPI companion-device by acpi_find_child_device() for the iGPU.
> >>
> >> This is a long standing problem and the i915 driver does use the ACPI
> >> companion for some things, but works fine without it.
> >>
> >> However since commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
> >> acpi_get_pci_dev() relies on the physical-node pointer in the acpi_device
> >> and that is set on the wrong acpi_device because of the wrong
> >> acpi_find_child_device() return. This breaks the ACPI video code,
> >> leading to non working backlight control in some cases.
> >>
> >> Add a type.backlight flag, mark ACPI video bus devices with this and make
> >> find_child_checks() return a higher score for children with this flag set,
> >> so that it picks the right companion-device.
> >>
> >> Co-developed-by: Rafael J. Wysocki 
> >> Signed-off-by: Hans de Goede 
> >> ---
> >> Changes in v2:
> >> - Switch to Rafael's suggested implementation using a type.backlight flag
> >>   and only make find_child_checks() return a higher score when this is set
> >> ---
> >>  drivers/acpi/glue.c | 14 --
> >>  drivers/acpi/scan.c |  7 +--
> >>  include/acpi/acpi_bus.h |  3 ++-
> >>  3 files changed, 19 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
> >> index 204fe94c7e45..a194f30876c5 100644
> >> --- a/drivers/acpi/glue.c
> >> +++ b/drivers/acpi/glue.c
> >> @@ -75,7 +75,8 @@ static struct acpi_bus_type *acpi_get_bus_type(struct 
> >> device *dev)
> >>  }
> >>
> >>  #define FIND_CHILD_MIN_SCORE   1
> >> -#define FIND_CHILD_MAX_SCORE   2
> >> +#define FIND_CHILD_MID_SCORE   2
> >> +#define FIND_CHILD_MAX_SCORE   3
> >>
> >>  static int match_any(struct acpi_device *adev, void *not_used)
> >>  {
> >> @@ -96,8 +97,17 @@ static int find_child_checks(struct acpi_device *adev, 
> >> bool check_children)
> >> return -ENODEV;
> >>
> >> status = acpi_evaluate_integer(adev->handle, "_STA", NULL, &sta);
> >> -   if (status == AE_NOT_FOUND)
> >> +   if (status == AE_NOT_FOUND) {
> >> +   /*
> >> +* Special case: backlight device objects without _STA are
> >> +* preferred to other objects with the same _ADR value, 
> >> because
> >> +* it is more likely that they are actually useful.
> >> +*/
> >> +   if (adev->pnp.type.backlight)
> >> +   return FIND_CHILD_MID_SCORE;
> >> +
> >> return FIND_CHILD_MIN_SCORE;
> >> +   }
> >>
> >> if (ACPI_FAILURE(status) || !(sta & ACPI_STA_DEVICE_ENABLED))
> >> return -ENODEV;
> >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> >> index 274344434282..0c6f06abe3f4 100644
> >> --- a/drivers/acpi/scan.c
> >> +++ b/drivers/ac

Re: [PATCH v2] ACPI: Fix selecting the wrong ACPI fwnode for the iGPU on some Dell laptops

2023-01-11 Thread Rafael J. Wysocki

On Tue, Jan 10, 2023 at 4:30 PM Hans de Goede  wrote:
>
> The Dell Latitude E6430 both with and without the optional NVidia dGPU
> has a bug in its ACPI tables which is causing Linux to assign the wrong
> ACPI fwnode / companion to the pci_device for the i915 iGPU.
>
> Specifically under the PCI root bridge there are these 2 ACPI Device()s :
>
>  Scope (_SB.PCI0)
>  {
>  Device (GFX0)
>  {
>  Name (_ADR, 0x0002)  // _ADR: Address
>  }
>
>  ...
>
>  Device (VID)
>  {
>  Name (_ADR, 0x0002)  // _ADR: Address
>  ...
>
>  Method (_DOS, 1, NotSerialized)  // _DOS: Disable Output Switching
>  {
>  VDP8 = Arg0
>  VDP1 (One, VDP8)
>  }
>
>  Method (_DOD, 0, NotSerialized)  // _DOD: Display Output Devices
>  {
>  ...
>  }
>  ...
>  }
>  }
>
> The non-functional GFX0 ACPI device is a problem, because this gets
> returned as ACPI companion-device by acpi_find_child_device() for the iGPU.
>
> This is a long standing problem and the i915 driver does use the ACPI
> companion for some things, but works fine without it.
>
> However since commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
> acpi_get_pci_dev() relies on the physical-node pointer in the acpi_device
> and that is set on the wrong acpi_device because of the wrong
> acpi_find_child_device() return. This breaks the ACPI video code,
> leading to non working backlight control in some cases.
>
> Add a type.backlight flag, mark ACPI video bus devices with this and make
> find_child_checks() return a higher score for children with this flag set,
> so that it picks the right companion-device.
>
> Co-developed-by: Rafael J. Wysocki 
> Signed-off-by: Hans de Goede 
> ---
> Changes in v2:
> - Switch to Rafael's suggested implementation using a type.backlight flag
>   and only make find_child_checks() return a higher score when this is set
> ---
>  drivers/acpi/glue.c | 14 --
>  drivers/acpi/scan.c |  7 +--
>  include/acpi/acpi_bus.h |  3 ++-
>  3 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
> index 204fe94c7e45..a194f30876c5 100644
> --- a/drivers/acpi/glue.c
> +++ b/drivers/acpi/glue.c
> @@ -75,7 +75,8 @@ static struct acpi_bus_type *acpi_get_bus_type(struct 
> device *dev)
>  }
>
>  #define FIND_CHILD_MIN_SCORE   1
> -#define FIND_CHILD_MAX_SCORE   2
> +#define FIND_CHILD_MID_SCORE   2
> +#define FIND_CHILD_MAX_SCORE   3
>
>  static int match_any(struct acpi_device *adev, void *not_used)
>  {
> @@ -96,8 +97,17 @@ static int find_child_checks(struct acpi_device *adev, 
> bool check_children)
> return -ENODEV;
>
> status = acpi_evaluate_integer(adev->handle, "_STA", NULL, &sta);
> -   if (status == AE_NOT_FOUND)
> +   if (status == AE_NOT_FOUND) {
> +   /*
> +* Special case: backlight device objects without _STA are
> +* preferred to other objects with the same _ADR value, 
> because
> +* it is more likely that they are actually useful.
> +*/
> +   if (adev->pnp.type.backlight)
> +   return FIND_CHILD_MID_SCORE;
> +
> return FIND_CHILD_MIN_SCORE;
> +   }
>
> if (ACPI_FAILURE(status) || !(sta & ACPI_STA_DEVICE_ENABLED))
> return -ENODEV;
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 274344434282..0c6f06abe3f4 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1370,9 +1370,12 @@ static void acpi_set_pnp_ids(acpi_handle handle, 
> struct acpi_device_pnp *pnp,
>  * Some devices don't reliably have _HIDs & _CIDs, so add
>  * synthetic HIDs to make sure drivers can find them.
>  */
> -   if (acpi_is_video_device(handle))
> +   if (acpi_is_video_device(handle)) {
> acpi_add_id(pnp, ACPI_VIDEO_HID);
> -   else if (acpi_bay_match(handle))
> +   pnp->type.backlight = 1;
> +   break;
> +   }
> +   if (acpi_bay_match(handle))
> acpi_add_id(pnp, ACPI_BAY_HID);
> else if (acpi_dock_match(handle))
> acpi_add_id(pnp, ACPI_DOCK_HID);
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index cd3b75e08ec3..e44be31115a6 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -230,7 +230,8 @@ struct acpi_pnp_type {
> u32 hardware_id:1;
> u32 bus_address:1;
> u32 platform_id:1;
> -   u32 reserved:29;
> +   u32 backlight:1;
> +   u32 reserved:28;
>  };
>
>  struct acpi_device_pnp {
> --

Applied as 6.2-rc material, thanks!
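
For readers following the scoring change above, here is a minimal userspace
sketch of the idea. It is not the kernel code: struct fake_adev, score_child()
and pick_companion() are hypothetical names, the real find_child_checks() has
more checks than shown, and "highest score wins" is an assumption about how
the scores are consumed.

```c
#include <stddef.h>

/* Score values mirror the FIND_CHILD_* constants from the patch. */
enum { SCORE_NONE = 0, SCORE_MIN = 1, SCORE_MID = 2, SCORE_MAX = 3 };

/* Hypothetical stand-in for struct acpi_device (illustration only). */
struct fake_adev {
	const char *name;
	int has_sta;     /* the object provides _STA */
	int sta_enabled; /* _STA reports the device as enabled */
	int backlight;   /* plays the role of pnp.type.backlight */
};

/* Simplified scoring: objects without _STA get a low score, but a
 * backlight-capable one is preferred over a plain one. */
static int score_child(const struct fake_adev *adev)
{
	if (!adev->has_sta)
		return adev->backlight ? SCORE_MID : SCORE_MIN;
	if (!adev->sta_enabled)
		return SCORE_NONE;
	return SCORE_MAX;
}

/* Assumed selection rule: the highest score wins, first match on ties. */
static const struct fake_adev *pick_companion(const struct fake_adev *c,
					      size_t n)
{
	const struct fake_adev *best = NULL;
	int best_score = SCORE_NONE;
	size_t i;

	for (i = 0; i < n; i++) {
		int s = score_child(&c[i]);

		if (s > best_score) {
			best = &c[i];
			best_score = s;
		}
	}
	return best;
}
```

With two children sharing one _ADR and neither providing _STA, as in the
GFX0/VID case from the changelog, the one flagged as backlight-capable is
picked even though it comes second.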

Re: [PATCH] ACPI: Fix selecting the wrong ACPI fwnode for the iGPU on some Dell laptops

2023-01-10 Thread Rafael J. Wysocki

On Monday, January 9, 2023 9:57:21 PM CET Hans de Goede wrote:
> The Dell Latitude E6430 both with and without the optional NVidia dGPU
> has a bug in its ACPI tables which is causing Linux to assign the wrong
> ACPI fwnode / companion to the pci_device for the i915 iGPU.
> 
> Specifically under the PCI root bridge there are these 2 ACPI Device()s :
> 
>  Scope (_SB.PCI0)
>  {
>  Device (GFX0)
>  {
>  Name (_ADR, 0x0002)  // _ADR: Address
>  }
> 
>  ...
> 
>  Device (VID)
>  {
>  Name (_ADR, 0x0002)  // _ADR: Address
>  ...
> 
>  Method (_DOS, 1, NotSerialized)  // _DOS: Disable Output Switching
>  {
>  VDP8 = Arg0
>  VDP1 (One, VDP8)
>  }
> 
>  Method (_DOD, 0, NotSerialized)  // _DOD: Display Output Devices
>  {
>  ...
>  }
>  ...
>  }
>  }
> 
> The non-functional GFX0 ACPI device is a problem, because this gets
> returned as ACPI companion-device by acpi_find_child_device() for the iGPU.
> 
> This is a long standing problem and the i915 driver does use the ACPI
> companion for some things, but works fine without it.
> 
> However since commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
> acpi_get_pci_dev() relies on the physical-node pointer in the acpi_device
> and that is set on the wrong acpi_device because of the wrong
> acpi_find_child_device() return. This breaks the ACPI video code, leading
> to non working backlight control in some cases.

Interesting.  Sorry for the trouble.

> Make find_child_checks() return a higher score for children which have
> pnp-ids set by various scan helpers like acpi_is_video_device(), so
> that it picks the right companion-device.

This has a potential of changing the behavior in some cases that are not
relevant here which is generally risky.

> An alternative approach would be to directly call acpi_is_video_device()
> from find_child_checks() but that would be somewhat computationally
> expensive given that acpi_find_child_device() iterates over all the
> PCI0 children every time it is called.

I agree with the above, but my fix would be something like the patch below (not
really tested, but it builds).

---
 drivers/acpi/glue.c |   14 --
 drivers/acpi/scan.c |7 +--
 include/acpi/acpi_bus.h |3 ++-
 3 files changed, 19 insertions(+), 5 deletions(-)

Index: linux-pm/include/acpi/acpi_bus.h
===
--- linux-pm.orig/include/acpi/acpi_bus.h
+++ linux-pm/include/acpi/acpi_bus.h
@@ -230,7 +230,8 @@ struct acpi_pnp_type {
u32 hardware_id:1;
u32 bus_address:1;
u32 platform_id:1;
-   u32 reserved:29;
+   u32 backlight:1;
+   u32 reserved:28;
 };
 
 struct acpi_device_pnp {
Index: linux-pm/drivers/acpi/scan.c
===
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1370,9 +1370,12 @@ static void acpi_set_pnp_ids(acpi_handle
 * Some devices don't reliably have _HIDs & _CIDs, so add
 * synthetic HIDs to make sure drivers can find them.
 */
-   if (acpi_is_video_device(handle))
+   if (acpi_is_video_device(handle)) {
acpi_add_id(pnp, ACPI_VIDEO_HID);
-   else if (acpi_bay_match(handle))
+   pnp->type.backlight = 1;
+   break;
+   }
+   if (acpi_bay_match(handle))
acpi_add_id(pnp, ACPI_BAY_HID);
else if (acpi_dock_match(handle))
acpi_add_id(pnp, ACPI_DOCK_HID);
Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -75,7 +75,8 @@ static struct acpi_bus_type *acpi_get_bu
 }
 
 #define FIND_CHILD_MIN_SCORE   1
-#define FIND_CHILD_MAX_SCORE   2
+#define FIND_CHILD_MID_SCORE   2
+#define FIND_CHILD_MAX_SCORE   3
 
 static int match_any(struct acpi_device *adev, void *not_used)
 {
@@ -96,8 +97,17 @@ static int find_child_checks(struct acpi
return -ENODEV;
 
status = acpi_evaluate_integer(adev->handle, "_STA", NULL, &sta);
-   if (status == AE_NOT_FOUND)
+   if (status == AE_NOT_FOUND) {
+   /*
+* Special case: backlight device objects without _STA are
+* preferred to other objects with the same _ADR value, because
+* it is more likely that they are actually useful.
+*/
+   if (adev->pnp.type.backlight)
+   return FIND_CHILD_MID_SCORE;
+
return FIND_CHILD_MIN_SCORE;
+   }
 
if (ACPI_FAILURE(status) || !(sta & ACPI_STA_DEVICE_ENABLED))
return -ENODEV;
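
A side note on the acpi_bus.h hunk: the new flag is carved out of the
reserved bits, so the structure should keep its size. A minimal userspace
sketch of that, assuming a typical ABI with 32-bit unsigned int
(pnp_type_sketch and mark_backlight() are hypothetical names, and u32 is
spelled unsigned int here):

```c
/* Userspace copy of the bit-field layout after the change. */
struct pnp_type_sketch {
	unsigned int hardware_id:1;
	unsigned int bus_address:1;
	unsigned int platform_id:1;
	unsigned int backlight:1;   /* new flag, taken from the reserved bits */
	unsigned int reserved:28;   /* was 29 before the change */
};

/* 1 + 1 + 1 + 1 + 28 = 32 bits, so the flags still fit in one word. */
_Static_assert(sizeof(struct pnp_type_sketch) == sizeof(unsigned int),
	       "flags are expected to fit in one 32-bit word");

/* Hypothetical helper: set the new flag without touching the others. */
static void mark_backlight(struct pnp_type_sketch *t)
{
	t->backlight = 1;
}
```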

Re: [PATCH 3/5] kobject: kset_uevent_ops: make filter() callback take a const *

2022-11-21 Thread Rafael J. Wysocki

On Mon, Nov 21, 2022 at 10:47 AM Greg Kroah-Hartman
 wrote:
>
> The filter() callback in struct kset_uevent_ops does not modify the
> kobject passed into it, so make the pointer const to enforce this
> restriction.  When doing so, fix up all existing filter() callbacks to
> have the correct signature to preserve the build.
>
> Cc: "Rafael J. Wysocki" 
> Cc: Sumit Semwal 
> Cc: "Christian König" 
> Cc: linux-me...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Greg Kroah-Hartman 

Acked-by: Rafael J. Wysocki 

> ---
>  drivers/base/bus.c| 2 +-
>  drivers/base/core.c   | 4 ++--
>  drivers/dma-buf/dma-buf-sysfs-stats.c | 2 +-
>  include/linux/kobject.h   | 2 +-
>  kernel/params.c   | 2 +-
>  5 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 7ca47e5b3c1f..4ec6dbab73be 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -163,7 +163,7 @@ static struct kobj_type bus_ktype = {
> .release= bus_release,
>  };
>
> -static int bus_uevent_filter(struct kobject *kobj)
> +static int bus_uevent_filter(const struct kobject *kobj)
>  {
> const struct kobj_type *ktype = get_ktype(kobj);
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a79b99ecf4d8..005a2b092f3e 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2362,12 +2362,12 @@ static struct kobj_type device_ktype = {
>  };
>
>
> -static int dev_uevent_filter(struct kobject *kobj)
> +static int dev_uevent_filter(const struct kobject *kobj)
>  {
> const struct kobj_type *ktype = get_ktype(kobj);
>
> if (ktype == &device_ktype) {
> -   struct device *dev = kobj_to_dev(kobj);
> +   const struct device *dev = kobj_to_dev(kobj);
> if (dev->bus)
> return 1;
> if (dev->class)
> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c 
> b/drivers/dma-buf/dma-buf-sysfs-stats.c
> index 2bba0babcb62..f69d68122b9b 100644
> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> @@ -132,7 +132,7 @@ void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>
>
>  /* Statistics files do not need to send uevents. */
> -static int dmabuf_sysfs_uevent_filter(struct kobject *kobj)
> +static int dmabuf_sysfs_uevent_filter(const struct kobject *kobj)
>  {
> return 0;
>  }
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index 5a2d58e10bf5..640f59d4b3de 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -135,7 +135,7 @@ struct kobj_uevent_env {
>  };
>
>  struct kset_uevent_ops {
> -   int (* const filter)(struct kobject *kobj);
> +   int (* const filter)(const struct kobject *kobj);
> const char *(* const name)(struct kobject *kobj);
> int (* const uevent)(struct kobject *kobj, struct kobj_uevent_env 
> *env);
>  };
> diff --git a/kernel/params.c b/kernel/params.c
> index 5b92310425c5..d2237209ceda 100644
> --- a/kernel/params.c
> +++ b/kernel/params.c
> @@ -926,7 +926,7 @@ static const struct sysfs_ops module_sysfs_ops = {
> .store = module_attr_store,
>  };
>
> -static int uevent_filter(struct kobject *kobj)
> +static int uevent_filter(const struct kobject *kobj)
>  {
> const struct kobj_type *ktype = get_ktype(kobj);
>
> --
> 2.38.1
>
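
For illustration, a minimal userspace sketch of what the const-qualified
filter() signature buys. The type and function names below (kobj_sketch,
uevent_ops_sketch, bus_filter_sketch, should_send_uevent) are hypothetical
and only mimic the shape of the kernel structures:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kobject. */
struct kobj_sketch {
	const char *name;
	int has_bus;
};

/* Same shape as the changed member: a const pointer to a callback
 * that takes a pointer to a const object. */
struct uevent_ops_sketch {
	int (* const filter)(const struct kobj_sketch *kobj);
};

static int bus_filter_sketch(const struct kobj_sketch *kobj)
{
	/* Writing "kobj->has_bus = 0;" here would be rejected by the
	 * compiler, which is the restriction the patch is after. */
	return kobj->has_bus ? 1 : 0;
}

static const struct uevent_ops_sketch ops_sketch = {
	.filter = bus_filter_sketch,
};

/* Returns 1 if an event would be sent, 0 if the filter drops it. */
static int should_send_uevent(const struct kobj_sketch *kobj)
{
	if (ops_sketch.filter && !ops_sketch.filter(kobj))
		return 0;
	return 1;
}
```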

Re: [PATCH] ACPICA: Fix return

2022-11-08 Thread Rafael J. Wysocki

On Tue, Nov 8, 2022 at 12:48 PM  wrote:
>
> return is not a function, parentheses are not required
>
> Signed-off-by: KaiLong Wang 

ACPICA material is to be submitted to the upstream project at GitHub
(please see MAINTAINERS for the link).

You may notice, however, that your changes do not align with the
coding style there.

Moreover, the patch contains non-ACPICA changes that are not mentioned
in the changelog.

> ---
>  drivers/acpi/acpica/evsci.c | 12 +---
>  drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 17 +++--
>  2 files changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/acpi/acpica/evsci.c b/drivers/acpi/acpica/evsci.c
> index 3915ff61412b..63dd2aa2d16a 100644
> --- a/drivers/acpi/acpica/evsci.c
> +++ b/drivers/acpi/acpica/evsci.c
> @@ -38,9 +38,8 @@ u32 acpi_ev_sci_dispatch(void)
>
> /* Are there any host-installed SCI handlers? */
>
> -   if (!acpi_gbl_sci_handler_list) {
> -   return (int_status);
> -   }
> +   if (!acpi_gbl_sci_handler_list)
> +   return int_status;
>
> flags = acpi_os_acquire_lock(acpi_gbl_gpe_lock);
>
> @@ -57,7 +56,7 @@ u32 acpi_ev_sci_dispatch(void)
> }
>
> acpi_os_release_lock(acpi_gbl_gpe_lock, flags);
> -   return (int_status);
> +   return int_status;
>  }
>
>  
> /***
> @@ -193,9 +192,8 @@ acpi_status acpi_ev_remove_all_sci_handlers(void)
> acpi_os_remove_interrupt_handler((u32) 
> acpi_gbl_FADT.sci_interrupt,
>  acpi_ev_sci_xrupt_handler);
>
> -   if (!acpi_gbl_sci_handler_list) {
> -   return (status);
> -   }
> +   if (!acpi_gbl_sci_handler_list)
> +   return status;
>
> flags = acpi_os_acquire_lock(acpi_gbl_gpe_lock);
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
> index 38d71b5c1f2d..1a20117b 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
> @@ -29,7 +29,6 @@
>  #include "core_types.h"
>  #include "resource.h"
>  #include "ipp.h"
> -#include "timing_generator.h"
>  #include "dc_dmub_srv.h"
>
>  #define DC_LOGGER dc->ctx->logger
> @@ -152,9 +151,8 @@ static void dc_stream_free(struct kref *kref)
>
>  void dc_stream_release(struct dc_stream_state *stream)
>  {
> -   if (stream != NULL) {
> +   if (stream != NULL)
> kref_put(&stream->refcount, dc_stream_free);
> -   }
>  }
>
>  struct dc_stream_state *dc_create_stream_for_sink(
> @@ -316,11 +314,11 @@ bool dc_stream_set_cursor_attributes(
> struct dc  *dc;
> bool reset_idle_optimizations = false;
>
> -   if (NULL == stream) {
> +   if (stream == NULL) {
> dm_error("DC: dc_stream is NULL!\n");
> return false;
> }
> -   if (NULL == attributes) {
> +   if (attributes == NULL) {
> dm_error("DC: attributes is NULL!\n");
> return false;
> }
> @@ -399,12 +397,12 @@ bool dc_stream_set_cursor_position(
> struct dc  *dc = stream->ctx->dc;
> bool reset_idle_optimizations = false;
>
> -   if (NULL == stream) {
> +   if (stream == NULL) {
> dm_error("DC: dc_stream is NULL!\n");
> return false;
> }
>
> -   if (NULL == position) {
> +   if (position == NULL) {
> dm_error("DC: cursor position is NULL!\n");
> return false;
> }
> @@ -468,9 +466,8 @@ bool dc_stream_add_writeback(struct dc *dc,
> }
> }
>
> -   if (!isDrc) {
> +   if (!isDrc)
> stream->writeback_info[stream->num_wb_info++] = *wb_info;
> -   }
>
> if (dc->hwss.enable_writeback) {
> struct dc_stream_status *stream_status = 
> dc_stream_get_status(stream);
> @@ -526,7 +523,7 @@ bool dc_stream_remove_writeback(struct dc *dc,
> /* remove writeback info for disabled writeback pipes from stream */
> for (i = 0, j = 0; i < stream->num_wb_info; i++) {
> if (stream->writeback_info[i].wb_enabled) {
> -   if (j < i)
> +   if (i != j)
> /* trim the array */
> stream->writeback_info[j] = 
> stream->writeback_info[i];
> j++;
> --
> 2.36.1

Re: [PATCH v5 02/31] drm/i915: Don't register backlight when another backlight should be used (v2)

2022-10-27 Thread Rafael J. Wysocki

On Thu, Oct 27, 2022 at 2:17 PM Hans de Goede  wrote:
>
> Hi,
>
> On 10/27/22 14:09, Rafael J. Wysocki wrote:
> > On Thu, Oct 27, 2022 at 12:37 PM Hans de Goede  wrote:
> >>
> >> Hi,
> >>
> >> On 10/27/22 11:52, Matthew Garrett wrote:
> >>> On Thu, Oct 27, 2022 at 11:39:38AM +0200, Hans de Goede wrote:
> >>>
> >>>> The *only* behavior which actually is new in 6.1 is the native GPU
> >>>> drivers now doing the equivalent of:
> >>>>
> >>>>  if (acpi_video_get_backlight_type() != acpi_backlight_native)
> >>>>  return;
> >>>>
> >>>> In their backlight register paths (i), which is causing the native
> >>>> backlight to disappear on your custom laptop setup and on Chromebooks
> >>>> (with the Chromebooks case being already solved I hope.).
> >>>
> >>> It's causing the backlight control to vanish on any machine that isn't
> >>> ((acpi_video || vendor interface) || !acpi). Most machines that fall
> >>> into that are either weird or Chromebooks or old, but there are machines
> >>> that fall into that.
> >>
> >> I acknowledge that there are machines that fall into this category,
> >> but I expect / hope there to be so few of them that we can just DMI
> >> quirk our way out of this.
> >>
> >> I believe the old group to be small because:
> >>
> >> 1. Generally speaking the "native" control method is usually not
> >> present on the really old (pre ACPI video spec) mobile GPUs.
> >>
> >> 2. On most old laptops I would still expect there to be a vendor
> >> interface too, and if both get registered standard desktop environments
> >> will prefer the vendor one, so then we need a native DMI quirk to
> >> disable the vendor interface anyways and we already have a bunch of
> >> those, so some laptops in this group are already covered by DMI quirks.
> >>
> >> And a fix for the Chromebook case is already in Linus' tree, which
> >> just leaves the weird case, of which there will hopefully be only
> >> a few.
> >>
> >> I do share your worry that this might break some machines, but
> >> the only way to really find out is to get this code out there
> >> I'm afraid.
> >>
> >> I have just written a blog post asking for people to check if
> >> their laptop might be affected; and to report various details
> >> to me if their laptop is affected:
> >>
> >> https://hansdegoede.dreamwidth.org/26548.html
> >>
> >> Let's wait and see how this goes. If I get (too) many reports then
> >> I will send a revert of the addition of the:
> >>
> >> if (acpi_video_get_backlight_type() != acpi_backlight_native)
> >> return;
> >>
> >> check to the i915 / radeon / amd / nouveau drivers.
> >>
> >> (And if I only get a couple of reports I will probably just submit
> >> DMI quirks for the affected models).
> >
> > Sounds reasonable to me, FWIW.
> >
> > And IIUC the check above can be overridden by passing
> > acpi_backlight=native in the kernel command line, right?
>
> Right, that can be used as a quick workaround, but we really do
> want this to work OOTB everywhere.

Sure.

My point is that if it doesn't work OOTB for someone, and say it used
to, they can use the above workaround and report back.
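
To make the workaround concrete, here is a minimal userspace sketch of the
gate discussed above and of how a command-line override would get around it.
Everything in it is an assumption made for illustration: the enum, the two
helpers and the three option values handled are stand-ins, not the actual
acpi_video_get_backlight_type() implementation.

```c
#include <string.h>

enum bl_type_sketch { BL_VIDEO, BL_VENDOR, BL_NATIVE };

/*
 * cmdline is the value given to acpi_backlight= (NULL if not passed);
 * detected is what autodetection came up with.  Only three option
 * values are handled in this sketch.
 */
static enum bl_type_sketch resolve_type(const char *cmdline,
					enum bl_type_sketch detected)
{
	if (cmdline) {
		if (!strcmp(cmdline, "native"))
			return BL_NATIVE;
		if (!strcmp(cmdline, "video"))
			return BL_VIDEO;
		if (!strcmp(cmdline, "vendor"))
			return BL_VENDOR;
	}
	return detected;
}

/* Mirrors the "if (type != native) return;" check in the GPU drivers. */
static int native_driver_registers(const char *cmdline,
				   enum bl_type_sketch detected)
{
	return resolve_type(cmdline, detected) == BL_NATIVE;
}
```

So a machine on which autodetection picks something other than native loses
the native backlight device by default, and gets it back when
acpi_backlight=native is passed.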

Re: [PATCH v5 02/31] drm/i915: Don't register backlight when another backlight should be used (v2)

2022-10-27 Thread Rafael J. Wysocki

On Thu, Oct 27, 2022 at 12:37 PM Hans de Goede  wrote:
>
> Hi,
>
> On 10/27/22 11:52, Matthew Garrett wrote:
> > On Thu, Oct 27, 2022 at 11:39:38AM +0200, Hans de Goede wrote:
> >
> >> The *only* behavior which actually is new in 6.1 is the native GPU
> >> drivers now doing the equivalent of:
> >>
> >>  if (acpi_video_get_backlight_type() != acpi_backlight_native)
> >>  return;
> >>
> >> In their backlight register paths (i), which is causing the native
> >> backlight to disappear on your custom laptop setup and on Chromebooks
> >> (with the Chromebooks case being already solved I hope.).
> >
> > It's causing the backlight control to vanish on any machine that isn't
> > ((acpi_video || vendor interface) || !acpi). Most machines that fall
> > into that are either weird or Chromebooks or old, but there are machines
> > that fall into that.
>
> I acknowledge that there are machines that fall into this category,
> but I expect / hope there to be so few of them that we can just DMI
> quirk our way out of this.
>
> I believe the old group to be small because:
>
> 1. Generally speaking the "native" control method is usually not
> present on the really old (pre ACPI video spec) mobile GPUs.
>
> 2. On most old laptops I would still expect there to be a vendor
> interface too, and if both get registered standard desktop environments
> will prefer the vendor one, so then we need a native DMI quirk to
> disable the vendor interface anyways and we already have a bunch of
> those, so some laptops in this group are already covered by DMI quirks.
>
> And a fix for the Chromebook case is already in Linus' tree, which
> just leaves the weird case, of which there will hopefully be only
> a few.
>
> I do share your worry that this might break some machines, but
> the only way to really find out is to get this code out there
> I'm afraid.
>
> I have just written a blog post asking for people to check if
> their laptop might be affected; and to report various details
> to me if their laptop is affected:
>
> https://hansdegoede.dreamwidth.org/26548.html
>
> Let's wait and see how this goes. If I get (too) many reports then
> I will send a revert of the addition of the:
>
> if (acpi_video_get_backlight_type() != acpi_backlight_native)
> return;
>
> check to the i915 / radeon / amd / nouveau drivers.
>
> (And if I only get a couple of reports I will probably just submit
> DMI quirks for the affected models).

Sounds reasonable to me, FWIW.

And IIUC the check above can be overridden by passing
acpi_backlight=native in the kernel command line, right?

Re: [PATCH v2] ACPI: video: Fix missing native backlight on Chromebooks

2022-10-24 Thread Rafael J. Wysocki

On Mon, Oct 24, 2022 at 4:32 PM Hans de Goede  wrote:
>
> Hi,
>
> On 10/24/22 16:12, Dmitry Osipenko wrote:
> > Chromebooks don't have backlight in ACPI table, they are supposed to use
> > native backlight in this case. Check presence of the CrOS embedded
> > controller ACPI device and prefer the native backlight if EC found.
> >
> > Suggested-by: Hans de Goede 
> > Fixes: 2600bfa3df99 ("ACPI: video: Add acpi_video_backlight_use_native() 
> > helper")
> > Signed-off-by: Dmitry Osipenko 
> > ---
> >
> > Changelog:
> >
> > v2: - Added explanatory comment to the code and added check for the
> >   native backlight presence, like was requested by Hans de Goede.
>
> Thanks this version looks good to me:
>
> Reviewed-by: Hans de Goede 
>
> Rafael, can you pick this up and send it in a fixes pull-req
> for 6.1 to Linus? Or shall I pick this one up and include it
> in my next pull-req?

It would be better if you could pick this up IMV, so please feel free to add

Acked-by: Rafael J. Wysocki 

to it.

Thanks!

> >
> >  drivers/acpi/video_detect.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
> > index 0d9064a9804c..9cd8797d12bb 100644
> > --- a/drivers/acpi/video_detect.c
> > +++ b/drivers/acpi/video_detect.c
> > @@ -668,6 +668,11 @@ static const struct dmi_system_id 
> > video_detect_dmi_table[] = {
> >   { },
> >  };
> >
> > +static bool google_cros_ec_present(void)
> > +{
> > + return acpi_dev_found("GOOG0004");
> > +}
> > +
> >  /*
> >   * Determine which type of backlight interface to use on this system,
> >   * First check cmdline, then dmi quirks, then do autodetect.
> > @@ -730,6 +735,13 @@ static enum acpi_backlight_type 
> > __acpi_video_get_backlight_type(bool native)
> >   return acpi_backlight_video;
> >   }
> >
> > + /*
> > +  * Chromebooks that don't have backlight handle in ACPI table
> > +  * are supposed to use native backlight if it's available.
> > +  */
> > + if (google_cros_ec_present() && native_available)
> > + return acpi_backlight_native;
> > +
> >   /* No ACPI video (old hw), use vendor specific fw methods. */
> >   return acpi_backlight_vendor;
> >  }
>
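
For reference, a minimal userspace sketch of the fallback order at the end of
the autodetection path as it looks with the hunk above applied. The names are
hypothetical, and treating "ACPI video present" as a single boolean is a
simplification made for this sketch; the command-line and DMI quirk checks
that come first are left out.

```c
enum bl_pick_sketch { PICK_VIDEO, PICK_NATIVE, PICK_VENDOR };

/*
 * Simplified tail of the decision:
 *  1. ACPI video backlight present -> use it,
 *  2. CrOS EC present and a native backlight available -> native,
 *  3. otherwise fall back to the vendor-specific firmware methods.
 */
static enum bl_pick_sketch autodetect_tail(int acpi_video_present,
					   int cros_ec_present,
					   int native_available)
{
	if (acpi_video_present)
		return PICK_VIDEO;

	if (cros_ec_present && native_available)
		return PICK_NATIVE;

	return PICK_VENDOR;
}
```

Before the fix, a Chromebook with no ACPI video backlight would fall through
to the vendor case, which is why the native backlight went missing there.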

Re: [PATCH] ACPI: PCI: Fix device reference counting in acpi_get_pci_dev()

2022-10-19 Thread Rafael J. Wysocki

On Wed, Oct 19, 2022 at 2:22 PM Ville Syrjälä
 wrote:
>
> On Wed, Oct 19, 2022 at 01:35:26PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Oct 19, 2022 at 11:02 AM Ville Syrjälä
> >  wrote:
> > >
> > > On Tue, Oct 18, 2022 at 07:34:03PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki 
> > > >
> > > > Commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()") failed
> > > > to reference count the device returned by acpi_get_pci_dev() as
> > > > expected by its callers which in some cases may cause device objects
> > > > to be dropped prematurely.
> > > >
> > > > Add the missing get_device() to acpi_get_pci_dev().
> > > >
> > > > Fixes: 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
> > >
> > > FYI this (and the rtc-cmos regression discussed in
> > > https://lore.kernel.org/linux-acpi/5887691.lOV4Wx5bFT@kreacher/)
> > > took down the entire Intel gfx CI.
> >
> > Sorry for the disturbance.
> >
> > > I've applied both fixes into our fixup branch and things are looking much
> > > healthier now.
> >
> > Thanks for letting me know.
> >
> > I've just added the $subject patch to my linux-next branch as an
> > urgent fix and the other one has been applied to the RTC tree.
> >
> > > This one caused i915 selftests to eat a lot of POISON_FREE
> > > in the CI. While bisecting it locally I didn't have
> > > poisoning enabled so I got refcount_t underflows instead.
> >
> > Unfortunately, making no mistakes is generally hard to offer.
> >
> > If catching things like this early is better, what about pulling my
> > bleeding-edge branch, where all of my changes are staged before going
> > into linux-next, into the CI?
>
> Pretty sure we don't have the resources to become the CI for
> everyone. So testing random trees is not really possible. And
> the alternative of pulling random trees into drm-tip is probably
> not a popular idea either. We used to pull in the sound tree
> since it's pretty closely tied to graphics, but I think we
> stopped even that because it ended up pulling the whole of
> -rc1 in at random points in time when we weren't expecting it.

I see.

> Ideally each subsystem would have its own CI, or there should
> be some kernel wide thing. But I suppose the progress towards
> something like that is glacial.

Well, I definitely cannot afford a dedicated CI just for my tree and I
haven't got any useful information from KernelCI yet (even though they
pull and test my linux-next branch on a regular basis).

KernelCI seems to be focusing on a different set of hardware, so to speak.

> That said, we do test linux-next to some degree. And looks like
> at least one of these could have been caught a bit earlier through
> that. Unfortunately no one is really keeping an eye on that so
> things tend to slip through. Probably need to figure out something
> to make better use of that.

I think it could also be possible to contribute to KernelCI to get
more useful x86 coverage from it.
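
As background for the refcount underflows mentioned in this thread, a minimal
userspace sketch of the bug class: a lookup helper whose callers expect to
own a reference must take one before returning. The names (dev_sketch,
lookup_no_ref, lookup_with_ref, use_and_put) are hypothetical and the counter
is a plain int rather than a real refcount_t.

```c
/* Hypothetical stand-in for a reference-counted device object. */
struct dev_sketch {
	int refcount;
};

static struct dev_sketch *get_dev_sketch(struct dev_sketch *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void put_dev_sketch(struct dev_sketch *d)
{
	if (d)
		d->refcount--;
}

/* Buggy variant: hands out the pointer without taking a reference. */
static struct dev_sketch *lookup_no_ref(struct dev_sketch *d)
{
	return d;
}

/* Fixed variant: takes the reference the caller will drop later. */
static struct dev_sketch *lookup_with_ref(struct dev_sketch *d)
{
	return get_dev_sketch(d);
}

/* Caller pattern: look the device up, use it, drop "its" reference.
 * Returns the reference count left on the object afterwards. */
static int use_and_put(struct dev_sketch *(*lookup)(struct dev_sketch *),
		       struct dev_sketch *d)
{
	struct dev_sketch *found = lookup(d);

	put_dev_sketch(found);
	return d->refcount;
}
```

Starting from a count of 1 held by the owner, the fixed lookup leaves the
count at 1 after the caller is done, while the buggy one drops it to 0, at
which point a real object would be freed while its owner still uses it.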

Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set

2022-09-27 Thread Rafael J. Wysocki

On Tue, Sep 27, 2022 at 9:47 AM Liu Ying  wrote:
>
> On Mon, 2022-09-26 at 11:47 +0200, Ulf Hansson wrote:
> > On Fri, 23 Sept 2022 at 17:23, Liu Ying  wrote:
> > > On Fri, 2022-09-23 at 15:48 +0200, Ulf Hansson wrote:
> > > > On Fri, 23 Sept 2022 at 14:47, Liu Ying  wrote:
> > > > > After a device transitions to sleep state through its system suspend
> > > > > callback pm_runtime_force_suspend(), the device's driver may still try
> > > > > to do runtime PM for the device (runtime suspend first and then runtime
> > > > > resume) although runtime PM is disabled by that callback.  The runtime
> > > > > PM operations would not touch the device effectively and the device is
> > > > > assumed to be resumed through its system resume callback
> > > > > pm_runtime_force_resume().
> > > >
> > > > This sounds like a fragile use case to me. In principle you want to
> > > > allow the device to be runtime resumed/suspended, after the device has
> > > > already been put into a low power state through the regular system
> > > > suspend callback. Normally it seems better to prevent this from
> > > > happening, completely.
> > >
> > > Not sure if we really may prevent this from happening completely.
> > >
> > > > That said, in this case, I wonder if a better option would be to point
> > > > ->suspend_late() to pm_runtime_force_suspend() and ->resume_early() to
> > > > pm_runtime_force_resume(), rather than using the regular
> > > > ->suspend|resume() callbacks. This should avoid the problem, I think,
> > > > no?
> > >
> > > I thought about this and it actually works for my particular
> > > panel-simple case.  What worries me is that the device(DRM device in my
> > > case) which triggers the runtime PM operations may also use
> > > ->suspend_late/resume_early() callbacks for whatever reasons, hence no
> > > fixed order to suspend/resume the two devices(like panel device and DRM
> > > device).
> > >
> > > Also, not sure if there is any sequence issue by using the
> > > ->suspend_late/resume_early() callbacks in the panel-simple driver,
> > > since it's written for quite a few display panels which may work with
> > > various DRM devices - don't want to break any of them.
> >
> > What you are describing here, is the classical problem we have with
> > suspend/resume ordering of devices.
> >
> > There are in principle two ways to solve this.
> > 1. If it makes sense, the devices might be assigned as parent/child.
> > 2. If it's more a consumer/supplier thing, we can add a device-link
> > between them.
>
> I thought about the two ways for my particular panel-simple case and
> the first impression is that it's not straightforward to use them. For
> DSI panels(with DRM_MODE_CONNECTOR_DSI connector type), it looks like
> panel device's parent is DSI host device(set in mipi_dsi_device_alloc()
> ). For other types of panels, like DPI panels, many show up in device
> tree as child-node of root node and connect a display controller or a
> display bridge through OF graph.  Seems that DRM architecture level
> lacks some sort of glue code to use the two ways.

Well, apparently, the ordering of power management operations
regarding the components in question cannot be arbitrary, but without
any information on the correct ordering in place, there is no way to
guarantee that ordering in every possible code path.  Addressing one
of them is generally insufficient and you will see problems sooner or
later.
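
To illustrate why explicit ordering information matters, here is a minimal
user-space sketch (not kernel code). It assumes each device records at most
one supplier, roughly what a device link expresses, and derives a suspend
order in which every consumer is suspended before its supplier. The names
struct model_dev, suspend_order() and MAX_DEVS are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8	/* hypothetical upper bound for this sketch */

/* Hypothetical user-space model; this is not the kernel's struct device. */
struct model_dev {
	const char *name;
	int supplier;	/* index of the supplier device, or -1 if none */
};

/*
 * Fill @order with device indices so that every consumer appears before
 * its supplier (i.e. a valid suspend order). Returns the number of entries
 * written; fewer than @n means a dependency cycle was found.
 * Assumes n <= MAX_DEVS.
 */
static size_t suspend_order(const struct model_dev *devs, size_t n, int *order)
{
	int done[MAX_DEVS] = { 0 };
	size_t count = 0;

	while (count < n) {
		size_t before = count;

		for (size_t i = 0; i < n; i++) {
			int has_pending_consumer = 0;

			if (done[i])
				continue;
			/* A device may only suspend after all of its consumers. */
			for (size_t j = 0; j < n; j++)
				if (!done[j] && devs[j].supplier == (int)i)
					has_pending_consumer = 1;
			if (!has_pending_consumer) {
				done[i] = 1;
				order[count++] = (int)i;
			}
		}
		if (count == before)
			break;	/* no progress: dependency cycle */
	}
	return count;
}
```

With a panel at index 0 and a display device at index 1 that names the panel
as its supplier, the function places the display device first and the panel
second. Without the supplier field there is nothing to derive that order
from, which is the point made above.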

[PATCH] drm: amd: amdgpu: ACPI: Add comment about ACPI_FADT_LOW_POWER_S0

2022-08-25 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

According to the ACPI specification [1], the ACPI_FADT_LOW_POWER_S0
flag merely means that it is better to use low-power S0 idle on the
given platform than S3 (provided that the latter is supported) and it
doesn't preclude using either of them (which of them will be used
depends on the choices made by user space).

However, on some systems that flag is used to indicate whether or not
to enable special firmware mechanics allowing the system to save more
energy when suspended to idle.  If that flag is unset, doing so is
generally risky.

Accordingly, add a comment to explain the ACPI_FADT_LOW_POWER_S0 check
in amdgpu_acpi_is_s0ix_active(), the purpose of which is otherwise
somewhat unclear.

Link: https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fixed-acpi-description-table-fadt # [1]
Signed-off-by: Rafael J. Wysocki 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-pm/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
===
--- linux-pm.orig/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ linux-pm/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1066,6 +1066,12 @@ bool amdgpu_acpi_is_s0ix_active(struct a
(pm_suspend_target_state != PM_SUSPEND_TO_IDLE))
return false;
 
+   /*
+* If ACPI_FADT_LOW_POWER_S0 is not set in the FADT, it is generally
+* risky to do any special firmware-related preparations for entering
+* S0ix even though the system is suspending to idle, so return false
+* in that case.
+*/
if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
dev_warn_once(adev->dev,
  "Power consumption will be higher as BIOS has not been configured for suspend-to-idle.\n"
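
The decision described in the changelog can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the
amdgpu function: model_s0ix_active() and enum model_suspend_state are
hypothetical names, and the real amdgpu_acpi_is_s0ix_active() may perform
additional checks that are not shown in the hunk above. The flag value
(bit 21 of the FADT flags field) is, to my knowledge, what the kernel
headers define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 21 of the FADT flags field (believed to match the kernel headers). */
#define ACPI_FADT_LOW_POWER_S0	(1u << 21)

/* Hypothetical stand-in for the kernel's suspend target state. */
enum model_suspend_state { MODEL_SUSPEND_TO_IDLE, MODEL_SUSPEND_MEM };

/*
 * Return true only when the system is suspending to idle AND the firmware
 * has set the low-power S0 flag; otherwise skip the S0ix preparations.
 */
static bool model_s0ix_active(uint32_t fadt_flags,
			      enum model_suspend_state target)
{
	if (target != MODEL_SUSPEND_TO_IDLE)
		return false;

	/* Flag unset: special firmware-related preparations are risky. */
	if (!(fadt_flags & ACPI_FADT_LOW_POWER_S0))
		return false;

	return true;
}
```

The flag on its own does not select suspend-to-idle; it only gates the
special preparations once that target state has already been chosen.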

Re: [PATCH v2 00/29] drm/kms: Stop registering multiple /sys/class/backlight devs for a single display

2022-07-14 Thread Rafael J. Wysocki

On Tue, Jul 12, 2022 at 9:39 PM Hans de Goede  wrote:
>
> Hi All,
>
> As mentioned in my RFC titled "drm/kms: control display brightness through
> drm_connector properties":
> https://lore.kernel.org/dri-devel/0d188965-d809-81b5-74ce-7d30c49fe...@redhat.com/
>
> The first step towards this is to deal with some existing technical debt
> in backlight handling on x86/ACPI boards, specifically we need to stop
> registering multiple /sys/class/backlight devs for a single display.
>
> This series implements my RFC describing my plan for these cleanups:
> https://lore.kernel.org/dri-devel/98519ba0-7f18-201a-ea34-652f50343...@redhat.com/
>
> This new version addresses the few small remarks made on version 1 (mainly
> changing patch 1/29) and more importantly this finishes the refactoring by
> also addressing all the bits from the "Other issues" section of
> the refactor RFC (resulting in patches 15-29 which are new in v2).
>
> Please review and test! I hope to be able to make an immutable branch
> based on 5.20-rc1 + this series available for merging into the various
> touched subsystems once 5.20-rc2 is out.

Please feel free to add

Acked-by: Rafael J. Wysocki 

to all of the ACPI video patches in this series.

Thanks!

> Hans de Goede (29):
>   ACPI: video: Add acpi_video_backlight_use_native() helper
>   drm/i915: Don't register backlight when another backlight should be
> used
>   drm/amdgpu: Don't register backlight when another backlight should be
> used
>   drm/radeon: Don't register backlight when another backlight should be
> used
>   drm/nouveau: Don't register backlight when another backlight should be
> used
>   ACPI: video: Drop backlight_device_get_by_type() call from
> acpi_video_get_backlight_type()
>   ACPI: video: Remove acpi_video_bus from list before tearing it down
>   ACPI: video: Simplify acpi_video_unregister_backlight()
>   ACPI: video: Make backlight class device registration a separate step
>   ACPI: video: Remove code to unregister acpi_video backlight when a
> native backlight registers
>   drm/i915: Call acpi_video_register_backlight() (v2)
>   drm/nouveau: Register ACPI video backlight when nv_backlight
> registration fails
>   drm/amdgpu: Register ACPI video backlight when skipping amdgpu
> backlight registration
>   drm/radeon: Register ACPI video backlight when skipping radeon
> backlight registration
>   ACPI: video: Refactor acpi_video_get_backlight_type() a bit
>   ACPI: video: Add Nvidia WMI EC brightness control detection
>   ACPI: video: Add Apple GMUX brightness control detection
>   platform/x86: apple-gmux: Stop calling acpi/video.h functions
>   platform/x86: toshiba_acpi: Stop using
> acpi_video_set_dmi_backlight_type()
>   platform/x86: acer-wmi: Move backlight DMI quirks to
> acpi/video_detect.c
>   platform/x86: asus-wmi: Drop DMI chassis-type check from backlight
> handling
>   platform/x86: asus-wmi: Move acpi_backlight=vendor quirks to ACPI
> video_detect.c
>   platform/x86: asus-wmi: Move acpi_backlight=native quirks to ACPI
> video_detect.c
>   platform/x86: samsung-laptop: Move acpi_backlight=[vendor|native]
> quirks to ACPI video_detect.c
>   ACPI: video: Remove acpi_video_set_dmi_backlight_type()
>   ACPI: video: Drop "Samsung X360" acpi_backlight=native quirk
>   ACPI: video: Drop Clevo/TUXEDO NL5xRU and NL5xNU acpi_backlight=native
> quirks
>   ACPI: video: Fix indentation of video_detect_dmi_table[] entries
>   drm/todo: Add entry about dealing with brightness control on devices
> with > 1 panel
>
>  Documentation/gpu/todo.rst|  68 +++
>  drivers/acpi/Kconfig  |   1 +
>  drivers/acpi/acpi_video.c |  59 ++-
>  drivers/acpi/video_detect.c   | 415 +++---
>  drivers/gpu/drm/Kconfig   |  12 +
>  .../gpu/drm/amd/amdgpu/atombios_encoders.c|  14 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   9 +
>  drivers/gpu/drm/gma500/Kconfig|   2 +
>  drivers/gpu/drm/i915/Kconfig  |   2 +
>  .../gpu/drm/i915/display/intel_backlight.c|   7 +
>  drivers/gpu/drm/i915/display/intel_display.c  |   8 +
>  drivers/gpu/drm/i915/display/intel_panel.c|   3 +
>  drivers/gpu/drm/i915/i915_drv.h   |   2 +
>  drivers/gpu/drm/nouveau/nouveau_backlight.c   |  14 +
>  drivers/gpu/drm/radeon/atombios_encoders.c|   7 +
>  drivers/gpu/drm/radeon/radeon_encoders.c  |  11 +-
>  .../gpu/drm/radeon/radeon_legacy_encoders.c   |   7 +
>  drivers/platform/x86/acer-wmi.c   |  66 ---
>  drivers/platform/x86/apple-gmux.c |   3 -
>  dr

Re: [PATCH][next] treewide: Replace zero-length arrays with flexible-array members

2022-02-16 Thread Rafael J. Wysocki

On Tue, Feb 15, 2022 at 8:24 PM Gustavo A. R. Silva
 wrote:
>
> On Tue, Feb 15, 2022 at 09:19:29PM +0200, Leon Romanovsky wrote:
> > On Tue, Feb 15, 2022 at 01:21:10PM -0600, Gustavo A. R. Silva wrote:
> > > On Tue, Feb 15, 2022 at 10:17:40AM -0800, Kees Cook wrote:
> > > > On Tue, Feb 15, 2022 at 11:47:43AM -0600, Gustavo A. R. Silva wrote:
> > > >
> > > > These all look trivially correct to me. Only two didn't have the end of
> > > > the struct visible in the patch, and checking those showed them to be
> > > > trailing members as well, so:
> > > >
> > > > Reviewed-by: Kees Cook 
> > >
> > > I'll add this to my -next tree.
> >
> > > I would like to ask you to send mlx5 patch separately to netdev. We are working
> > > to delete that file completely and prefer to avoid unnecessary merge conflicts.
>
> Oh OK. Sure thing; I will do so.

Can you also send the ACPI patch separately, please?

We would like to route it through the upstream ACPICA code base.

Re: acpi_get_devices() crash when acpi_disabled==true (was [PATCH v2] drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen)

2022-01-27 Thread Rafael J. Wysocki

On Thu, Jan 27, 2022 at 2:05 PM Hans de Goede  wrote:
>
> Hi,
>
> On 1/26/22 18:11, Rafael J. Wysocki wrote:
> > On Wed, Jan 26, 2022 at 5:41 PM Hans de Goede  wrote:
> >>
> >> Hi,
> >>
> >> On 1/26/22 16:54, Rafael J. Wysocki wrote:
> >>> On Wed, Jan 26, 2022 at 2:47 PM Hans de Goede  wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> On 1/23/22 10:10, Tong Zhang wrote:
> >>>>> when acpi=off is provided in bootarg, kernel crash with
> >>>>>
> >>>>> [1.252739] BUG: kernel NULL pointer dereference, address: 0018
> >>>>> [1.258308] Call Trace:
> >>>>> [1.258490]  ? acpi_walk_namespace+0x147/0x147
> >>>>> [1.258770]  acpi_get_devices+0xe4/0x137
> >>>>> [1.258921]  ? drm_core_init+0xc0/0xc0 [drm]
> >>>>> [1.259108]  detect_thinkpad_privacy_screen+0x5e/0xa8 [drm]
> >>>>> [1.259337]  drm_privacy_screen_lookup_init+0xe/0xe85 [drm]
> >>>>>
> >>>>> The reason is that acpi_walk_namespace expects acpi related stuff
> >>>>> initialized but in fact it wouldn't when acpi is set to off. In this case
> >>>>> we should honor acpi=off in detect_thinkpad_privacy_screen().
> >>>>>
> >>>>> Signed-off-by: Tong Zhang 
> >>>>
> >>>> Thank you for catching this and thank you for your patch. I was about to merge
> >>>> this, but then I realized that this might not be the best way to fix this.
> >>>>
> >>>> A quick grep shows 10 acpi_get_devices() calls outside of drivers/acpi,
> >>>> and at a first glance about half of those are missing an acpi_disabled
> >>>> check. IMHO it would be better to simply add an acpi_disabled check to
> >>>> acpi_get_devices() itself.
> >>>>
> >>>> Rafael, do you agree ?
> >>>
> >>> Yes, I do.
> >>
> >> Did you see my follow-up that that is not going to work because
> >> acpi_get_devices() is an acpica function ?
> >
> > No, I didn't, but it is possible to add a wrapper doing the check
> > around it and convert all of the users.
>
> Yes I did think about that. Note that I've gone ahead and pushed
> the fix which started this to drm-misc-fixes, to resolve the crash
> for now.

OK

> If we add such a wrapper we can remove a bunch of acpi_disabled checks
> from various callers.
>
> > Alternatively, the ACPICA function can check acpi_gbl_root_node
> > against NULL, like in the attached (untested) patch.
>
> That is probably an even better idea, as that avoids the need
> for a wrapper altogether. So I believe that that is the best
> solution.

All right, let me cut an analogous patch for the upstream ACPICA, then.

Re: acpi_get_devices() crash when acpi_disabled==true (was [PATCH v2] drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen)

2022-01-26 Thread Rafael J. Wysocki

On Wed, Jan 26, 2022 at 5:41 PM Hans de Goede  wrote:
>
> Hi,
>
> On 1/26/22 16:54, Rafael J. Wysocki wrote:
> > On Wed, Jan 26, 2022 at 2:47 PM Hans de Goede  wrote:
> >>
> >> Hi All,
> >>
> >> On 1/23/22 10:10, Tong Zhang wrote:
> >>> when acpi=off is provided in bootarg, kernel crash with
> >>>
> >>> [1.252739] BUG: kernel NULL pointer dereference, address: 0018
> >>> [1.258308] Call Trace:
> >>> [1.258490]  ? acpi_walk_namespace+0x147/0x147
> >>> [1.258770]  acpi_get_devices+0xe4/0x137
> >>> [1.258921]  ? drm_core_init+0xc0/0xc0 [drm]
> >>> [1.259108]  detect_thinkpad_privacy_screen+0x5e/0xa8 [drm]
> >>> [1.259337]  drm_privacy_screen_lookup_init+0xe/0xe85 [drm]
> >>>
> >>> The reason is that acpi_walk_namespace expects acpi related stuff
> >>> initialized but in fact it wouldn't when acpi is set to off. In this case
> >>> we should honor acpi=off in detect_thinkpad_privacy_screen().
> >>>
> >>> Signed-off-by: Tong Zhang 
> >>
> >> Thank you for catching this and thank you for your patch. I was about to merge
> >> this, but then I realized that this might not be the best way to fix this.
> >>
> >> A quick grep shows 10 acpi_get_devices() calls outside of drivers/acpi,
> >> and at a first glance about half of those are missing an acpi_disabled
> >> check. IMHO it would be better to simply add an acpi_disabled check to
> >> acpi_get_devices() itself.
> >>
> >> Rafael, do you agree ?
> >
> > Yes, I do.
>
> Did you see my follow-up that that is not going to work because
> acpi_get_devices() is an acpica function ?

No, I didn't, but it is possible to add a wrapper doing the check
around it and convert all of the users.

Alternatively, the ACPICA function can check acpi_gbl_root_node
against NULL, like in the attached (untested) patch.
---
 drivers/acpi/acpica/nswalk.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux-pm/drivers/acpi/acpica/nswalk.c
===
--- linux-pm.orig/drivers/acpi/acpica/nswalk.c
+++ linux-pm/drivers/acpi/acpica/nswalk.c
@@ -169,6 +169,9 @@ acpi_ns_walk_namespace(acpi_object_type
 
 	if (start_node == ACPI_ROOT_OBJECT) {
 		start_node = acpi_gbl_root_node;
+		if (!start_node) {
+			return_ACPI_STATUS(AE_NO_NAMESPACE);
+		}
 	}
 
 	/* Null child means "get first node" */
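
For readers following along, the effect of the added check can be modelled
in user space like this. It is only a sketch under stated assumptions:
model_walk(), struct model_node, model_root_node, MODEL_ROOT_OBJECT and the
status codes are hypothetical stand-ins, not the ACPICA definitions, and the
walk itself is reduced to counting direct children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; not the ACPICA ones. */
enum model_status { MODEL_OK, MODEL_NO_NAMESPACE };

/* Hypothetical namespace node with a first child and a next peer. */
struct model_node {
	struct model_node *child;
	struct model_node *peer;
};

/* Stays NULL when the namespace was never built (e.g. ACPI disabled). */
static struct model_node *model_root_node;

/* Sentinel meaning "start from the root", in the spirit of ACPI_ROOT_OBJECT. */
static struct model_node model_root_sentinel;
#define MODEL_ROOT_OBJECT (&model_root_sentinel)

/* Count the direct children of @start, bailing out if there is no root. */
static enum model_status model_walk(struct model_node *start,
				    unsigned int *visited)
{
	*visited = 0;

	if (start == MODEL_ROOT_OBJECT) {
		start = model_root_node;
		if (!start)	/* the guard added by the patch above */
			return MODEL_NO_NAMESPACE;
	}

	for (struct model_node *n = start->child; n; n = n->peer)
		(*visited)++;

	return MODEL_OK;
}
```

Placing the guard in the walker rather than in each caller means callers
that forgot their own "is ACPI disabled" check get an error status back
instead of dereferencing a NULL root.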

Re: acpi_get_devices() crash when acpi_disabled==true (was [PATCH v2] drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen)

2022-01-26 Thread Rafael J. Wysocki

On Wed, Jan 26, 2022 at 2:47 PM Hans de Goede  wrote:
>
> Hi All,
>
> On 1/23/22 10:10, Tong Zhang wrote:
> > when acpi=off is provided in bootarg, kernel crash with
> >
> > [1.252739] BUG: kernel NULL pointer dereference, address: 0018
> > [1.258308] Call Trace:
> > [1.258490]  ? acpi_walk_namespace+0x147/0x147
> > [1.258770]  acpi_get_devices+0xe4/0x137
> > [1.258921]  ? drm_core_init+0xc0/0xc0 [drm]
> > [1.259108]  detect_thinkpad_privacy_screen+0x5e/0xa8 [drm]
> > [1.259337]  drm_privacy_screen_lookup_init+0xe/0xe85 [drm]
> >
> > The reason is that acpi_walk_namespace expects acpi related stuff
> > initialized but in fact it wouldn't when acpi is set to off. In this case
> > we should honor acpi=off in detect_thinkpad_privacy_screen().
> >
> > Signed-off-by: Tong Zhang 
>
> Thank you for catching this and thank you for your patch. I was about to merge
> this, but then I realized that this might not be the best way to fix this.
>
> A quick grep shows 10 acpi_get_devices() calls outside of drivers/acpi,
> and at a first glance about half of those are missing an acpi_disabled
> check. IMHO it would be better to simply add an acpi_disabled check to
> acpi_get_devices() itself.
>
> Rafael, do you agree ?

Yes, I do.

> Note the just added chrome privacy-screen check uses
> acpi_dev_present(), this is also used in about 10 places outside
> of drivers/acpi and AFAIK none of those do an acpi_disabled check.
>
> acpi_dev_present() uses bus_find_device(&acpi_bus_type, ...)
> but the acpi_bus_type does not get registered when acpi_disabled
> is set. In the end this is fine though since bus_find_device
> checks for the bus not being registered and then just returns
> NULL.

Right.

> > ---
> > v2: fix typo in previous commit -- my keyboard is eating letters
> >
> >  drivers/gpu/drm/drm_privacy_screen_x86.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_privacy_screen_x86.c b/drivers/gpu/drm/drm_privacy_screen_x86.c
> > index a2cafb294ca6..e7aa74ad0b24 100644
> > --- a/drivers/gpu/drm/drm_privacy_screen_x86.c
> > +++ b/drivers/gpu/drm/drm_privacy_screen_x86.c
> > @@ -33,6 +33,9 @@ static bool __init detect_thinkpad_privacy_screen(void)
> >   unsigned long long output;
> >   acpi_status status;
> >
> > + if (acpi_disabled)
> > + return false;
> > +
> >   /* Get embedded-controller handle */
> >   status = acpi_get_devices("PNP0C09", acpi_set_handle, NULL, &ec_handle);
> >   if (ACPI_FAILURE(status) || !ec_handle)
>

Re: Regression report on laptop suspend

2021-12-27 Thread Rafael J. Wysocki

CC Daniel, Thomas and dri-devel.

On Mon, Dec 27, 2021 at 5:32 PM  wrote:
>
> Hello
>
> I've noticed my laptop totally freezes when going to hibernation.
> The git bisect log is appended below.
> Please note however that even though the previous good commit was "good" (i.e.
> the laptop managed to suspend and resume), the system was unstable and froze
> a few minutes later.

So the breakage need not be related to the first bad commit.

Have you tried to revert that commit?  If so, has it helped?

> Hardware specs: AMD Ryzen 5 4600H with Vega graphics + Nvidia 1650Ti (unused)
> Software: Slackware 14.2 / X.org.
>
> Seems to be related to drm stuff.
> I've issued bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=215427
>
> Thanks
>
> git bisect start
> # good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
> git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
> # bad: [a7904a538933c525096ca2ccde1e60d0ee62c08e] Linux 5.16-rc6
> git bisect bad a7904a538933c525096ca2ccde1e60d0ee62c08e
> # bad: [43e1b12927276cde8052122a24ff796649f09d60] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> git bisect bad 43e1b12927276cde8052122a24ff796649f09d60
> # good: [fc02cb2b37fe2cbf1d3334b9f0f0eab9431766c4] Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect good fc02cb2b37fe2cbf1d3334b9f0f0eab9431766c4
> # bad: [d9bd054177fbd2c4762546aec40fc3071bfe4cc0] Merge tag 'amd-drm-next-5.16-2021-10-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
> git bisect bad d9bd054177fbd2c4762546aec40fc3071bfe4cc0
> # skip: [797d72ce8e0f8fa8a808cb189b5411046432cfd3] Merge tag 'drm-misc-next-2021-10-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
> git bisect skip 797d72ce8e0f8fa8a808cb189b5411046432cfd3
> # skip: [bf72ca73aaa6629568cb9b0761be6efdd02a2591] drm/amd/display: [FW Promotion] Release 0.0.85
> git bisect skip bf72ca73aaa6629568cb9b0761be6efdd02a2591
> # good: [bc41f059a080e487c235b539f1e5cdbf605aba9f] drm/i915/dp: fix DG1 and RKL max source rates
> git bisect good bc41f059a080e487c235b539f1e5cdbf605aba9f
> # skip: [58144d283712c9e80e528e001af6ac5aeee71af2] drm/amdgpu: unify BO evicting method in amdgpu_ttm
> git bisect skip 58144d283712c9e80e528e001af6ac5aeee71af2
> # skip: [a5b51a9f8523a0b88ce7e8e8059f75a43c34c57f] drm/i915/gt: add asm/cacheflush.h for use of clflush()
> git bisect skip a5b51a9f8523a0b88ce7e8e8059f75a43c34c57f
> # skip: [40348baedfbc6500e7a090c7da1d55b6c94c334f] drm/amd/display: fix duplicated inclusion
> git bisect skip 40348baedfbc6500e7a090c7da1d55b6c94c334f
> # skip: [7547675b84bf452542463db29adb113cadb7dd6d] drm/virtio: implement context init: track {ring_idx, emit_fence_info} in virtio_gpu_fence
> git bisect skip 7547675b84bf452542463db29adb113cadb7dd6d
> # good: [f01ee019586220c86f238263a4fbde6e72085e11] drm/amd/display: Add DP 2.0 SST DC Support
> git bisect good f01ee019586220c86f238263a4fbde6e72085e11
> # good: [f3ede209d44d71636890a78fa89c5b1c83340320] drm/i915/pci: rename functions to have i915_pci prefix
> git bisect good f3ede209d44d71636890a78fa89c5b1c83340320
> # skip: [4fb530e5caf7cb666948db65f245b350ce520436] drm/virtio: implement context init: support init ioctl
> git bisect skip 4fb530e5caf7cb666948db65f245b350ce520436
> # good: [c7c4dfb6fe704ae3cce1a8f438db75b1a0a9061f] drm/i915/display: Some code improvements and code style fixes for DRRS
> git bisect good c7c4dfb6fe704ae3cce1a8f438db75b1a0a9061f
> # skip: [7a28bee067d524c1b8770aa72a82263eb9fc53f0] drm/amd/display: Disable dpp root clock when not being used
> git bisect skip 7a28bee067d524c1b8770aa72a82263eb9fc53f0
> # good: [5b116c17e6babc6de2e26714bc66228c74038b71] drm/i915/guc: Drop pin count check trick between sched_disable and re-pin
> git bisect good 5b116c17e6babc6de2e26714bc66228c74038b71
> # skip: [9878844094703fbae1c3b301c9bb71253a30efe7] drm/amdgpu: drive all vega asics from the IP discovery table
> git bisect skip 9878844094703fbae1c3b301c9bb71253a30efe7
> # skip: [7194dc998dfffca096c30b3cd39625158608992d] drm/i915/tc: Fix TypeC port init/resume time sanitization
> git bisect skip 7194dc998dfffca096c30b3cd39625158608992d
> # skip: [5c3720be7d46581181782f5cf9585b532feed947] drm/amdgpu: get VCN and SDMA instances from IP discovery table
> git bisect skip 5c3720be7d46581181782f5cf9585b532feed947
> # skip: [a53f2c035e9832d20775d2c66c71495f2dc27699] drm/panfrost: Calculate lock region size correctly
> git bisect skip a53f2c035e9832d20775d2c66c71495f2dc27699
> # skip: [d04287d062a4198ec0bf0112db03618f65d7428a] drm/amdgpu: During s0ix don't wait to signal GFXOFF
> git bisect skip d04287d062a4198ec0bf0112db03618f65d7428a
> # skip: [9ced12182d0d8401d821e9602e56e276459900fc] drm/i915: Catch yet another unconditioal clflush
> git bisect skip 9ced12182d0d8401d821e9602e56e276459900fc
> # skip: [dac3c405b9aedee301d0634b4e275b81f0d74363]

Re: [PATCH v2 12/63] thermal: intel: int340x_thermal: Use struct_group() for memcpy() region

2021-11-24 Thread Rafael J. Wysocki

On Wed, Nov 24, 2021 at 12:53 AM Srinivas Pandruvada
 wrote:
>
> On Tue, 2021-11-23 at 14:19 +0100, Rafael J. Wysocki wrote:
> > On Wed, Aug 18, 2021 at 8:08 AM Kees Cook wrote:
> > >
> > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > field bounds checking for memcpy(), avoid intentionally writing across
> > > neighboring fields.
> > >
> > > Use struct_group() in struct art around members weight, and ac[0-9]_max,
> > > so they can be referenced together. This will allow memcpy() and sizeof()
> > > to more easily reason about sizes, improve readability, and avoid future
> > > warnings about writing beyond the end of weight.
> > >
> > > "pahole" shows no size nor member offset changes to struct art.
> > > "objdump -d" shows no meaningful object code changes (i.e. only source
> > > line number induced differences).
> > >
> > > Cc: Zhang Rui 
> > > Cc: Daniel Lezcano 
> > > Cc: Amit Kucheria 
> > > Cc: linux...@vger.kernel.org
> > > Signed-off-by: Kees Cook 
> >
> > Rui, Srinivas, any comments here?
> Looks good.
> Reviewed-by: Srinivas Pandruvada 

Applied as 5.17 material, thank you!
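
As background for the struct_group() conversion, the following user-space
sketch shows the general idea as I understand it: the members are declared
twice inside an anonymous union, once anonymously so existing field names
keep working, and once as a named struct that spans all of them, so memcpy()
and sizeof() can refer to the whole region. model_struct_group(),
struct model_art and model_copy_data() are hypothetical names for
illustration, the member list is shortened, and the kernel macro has more
variants than shown here. The code relies on C11 anonymous structs/unions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's struct_group(): the same members
 * appear in an anonymous struct (direct access) and in a struct named NAME
 * (access to the group as one object), overlaid by an anonymous union.
 */
#define model_struct_group(NAME, ...)		\
	union {					\
		struct { __VA_ARGS__ };		\
		struct { __VA_ARGS__ } NAME;	\
	}

/* Hypothetical, shortened version of the structure touched by the patch. */
struct model_art {
	void *source;
	void *target;
	model_struct_group(data,
		uint64_t weight;
		uint64_t ac0_max;
		uint64_t ac1_max;
	);
};

/* Copy the whole group at once; the size is taken from the group itself. */
static void model_copy_data(struct model_art *dst, const struct model_art *src)
{
	memcpy(&dst->data, &src->data, sizeof(dst->data));
}
```

Because the copy names the group instead of the first member, a bounds
checker sees a destination that is exactly as large as the copy, rather than
a single u64 being written past its end.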

Re: [PATCH v2 12/63] thermal: intel: int340x_thermal: Use struct_group() for memcpy() region

2021-11-23 Thread Rafael J. Wysocki

On Wed, Aug 18, 2021 at 8:08 AM Kees Cook  wrote:
>
> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memcpy(), avoid intentionally writing across
> neighboring fields.
>
> Use struct_group() in struct art around members weight, and ac[0-9]_max,
> so they can be referenced together. This will allow memcpy() and sizeof()
> to more easily reason about sizes, improve readability, and avoid future
> warnings about writing beyond the end of weight.
>
> "pahole" shows no size nor member offset changes to struct art.
> "objdump -d" shows no meaningful object code changes (i.e. only source
> line number induced differences).
>
> Cc: Zhang Rui 
> Cc: Daniel Lezcano 
> Cc: Amit Kucheria 
> Cc: linux...@vger.kernel.org
> Signed-off-by: Kees Cook 

Rui, Srinivas, any comments here?

> ---
>  .../intel/int340x_thermal/acpi_thermal_rel.c  |  5 +-
>  .../intel/int340x_thermal/acpi_thermal_rel.h  | 48 ++-
>  2 files changed, 29 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> index a478cff8162a..e90690a234c4 100644
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -250,8 +250,9 @@ static int fill_art(char __user *ubuf)
> get_single_name(arts[i].source, art_user[i].source_device);
> get_single_name(arts[i].target, art_user[i].target_device);
> /* copy the rest int data in addition to source and target */
> -   memcpy(&art_user[i].weight, &arts[i].weight,
> -   sizeof(u64) * (ACPI_NR_ART_ELEMENTS - 2));
> +   BUILD_BUG_ON(sizeof(art_user[i].data) !=
> +sizeof(u64) * (ACPI_NR_ART_ELEMENTS - 2));
> +   memcpy(&art_user[i].data, &arts[i].data, sizeof(art_user[i].data));
> }
>
> if (copy_to_user(ubuf, art_user, art_len))
> diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
> index 58822575fd54..78d942477035 100644
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
> @@ -17,17 +17,19 @@
>  struct art {
> acpi_handle source;
> acpi_handle target;
> -   u64 weight;
> -   u64 ac0_max;
> -   u64 ac1_max;
> -   u64 ac2_max;
> -   u64 ac3_max;
> -   u64 ac4_max;
> -   u64 ac5_max;
> -   u64 ac6_max;
> -   u64 ac7_max;
> -   u64 ac8_max;
> -   u64 ac9_max;
> +   struct_group(data,
> +   u64 weight;
> +   u64 ac0_max;
> +   u64 ac1_max;
> +   u64 ac2_max;
> +   u64 ac3_max;
> +   u64 ac4_max;
> +   u64 ac5_max;
> +   u64 ac6_max;
> +   u64 ac7_max;
> +   u64 ac8_max;
> +   u64 ac9_max;
> +   );
>  } __packed;
>
>  struct trt {
> @@ -47,17 +49,19 @@ union art_object {
> struct {
> char source_device[8]; /* ACPI single name */
> char target_device[8]; /* ACPI single name */
> -   u64 weight;
> -   u64 ac0_max_level;
> -   u64 ac1_max_level;
> -   u64 ac2_max_level;
> -   u64 ac3_max_level;
> -   u64 ac4_max_level;
> -   u64 ac5_max_level;
> -   u64 ac6_max_level;
> -   u64 ac7_max_level;
> -   u64 ac8_max_level;
> -   u64 ac9_max_level;
> +   struct_group(data,
> +   u64 weight;
> +   u64 ac0_max_level;
> +   u64 ac1_max_level;
> +   u64 ac2_max_level;
> +   u64 ac3_max_level;
> +   u64 ac4_max_level;
> +   u64 ac5_max_level;
> +   u64 ac6_max_level;
> +   u64 ac7_max_level;
> +   u64 ac8_max_level;
> +   u64 ac9_max_level;
> +   );
> };
> u64 __data[ACPI_NR_ART_ELEMENTS];
>  };
> --
> 2.30.2
>

Re: [PATCH v14 20/39] pwm: tegra: Add runtime PM and OPP support

2021-10-29 Thread Rafael J. Wysocki

On Fri, Oct 29, 2021 at 6:29 PM Dmitry Osipenko  wrote:
>
> 29.10.2021 18:56, Rafael J. Wysocki пишет:
> > On Fri, Oct 29, 2021 at 5:20 PM Dmitry Osipenko  wrote:
> >>
> >> 26.10.2021 01:40, Dmitry Osipenko пишет:
> >>> + ret = devm_pm_runtime_enable(&pdev->dev);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + ret = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + ret = pm_runtime_resume_and_get(&pdev->dev);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>>   /* Set maximum frequency of the IP */
> >>> - ret = clk_set_rate(pwm->clk, pwm->soc->max_frequency);
> >>> + ret = dev_pm_opp_set_rate(pwm->dev, pwm->soc->max_frequency);
> >>>   if (ret < 0) {
> >>>   dev_err(&pdev->dev, "Failed to set max frequency: %d\n", ret);
> >>> - return ret;
> >>> + goto put_pm;
> >>>   }
> >>>
> >>>   /*
> >>> @@ -278,7 +294,7 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> >>>   if (IS_ERR(pwm->rst)) {
> >>>   ret = PTR_ERR(pwm->rst);
> >>>   dev_err(&pdev->dev, "Reset control is not found: %d\n", ret);
> >>> - return ret;
> >>> + goto put_pm;
> >>>   }
> >>>
> >>>   reset_control_deassert(pwm->rst);
> >>> @@ -291,10 +307,15 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> >>>   if (ret < 0) {
> >>>   dev_err(&pdev->dev, "pwmchip_add() failed: %d\n", ret);
> >>>   reset_control_assert(pwm->rst);
> >>> - return ret;
> >>> + goto put_pm;
> >>>   }
> >>>
> >>> + pm_runtime_put(&pdev->dev);
> >>> +
> >>>   return 0;
> >>> +put_pm:
> >>> + pm_runtime_put_sync_suspend(&pdev->dev);
> >>> + return ret;
> >>>  }
> >>>
> >>>  static int tegra_pwm_remove(struct platform_device *pdev)
> >>> @@ -305,20 +326,44 @@ static int tegra_pwm_remove(struct platform_device *pdev)
> >>>
> >>>   reset_control_assert(pc->rst);
> >>>
> >>> + pm_runtime_force_suspend(&pdev->dev);
> >>
> >> I just noticed that RPM core doesn't reset RPM-enable count of a device
> >> on driver's unbind (pm_runtime_reinit). It was a bad idea to use
> >> devm_pm_runtime_enable() + pm_runtime_force_suspend() here, since RPM is
> >> disabled twice on driver's removal, and thus, RPM will never be enabled
> >> again.
> >>
> >> I'll fix it for PWM and other drivers in this series, in v15.
> >
> > Well, for the record, IMV using pm_runtime_force_suspend() is
> > generally a questionable idea.
> >
>
> Please clarify why it's a questionable idea.

There are a few reasons.

Generally speaking, it makes assumptions that may not be satisfied.

For instance, it assumes that the driver will never have to work with
the ACPI PM domain, because the ACPI PM domain has a separate set of
callbacks for system-wide suspend and resume and they are not the same
as its PM-runtime callbacks, so if the driver is combined with the
ACPI PM domain, running pm_runtime_force_suspend() may not work as
expected.

Next, it assumes that PM-runtime is actually enabled for the device
and the RPM_STATUS of it is valid when it is running.

Further, it assumes that the PM-runtime suspend callback of the driver
will always be suitable for system-wide suspend which may not be the
case if the device can generate wakeup signals and it is not allowed
to wake up the system from sleep by user space.

Next, if the driver has to work with a PM domain (other than the ACPI
one) or bus type that doesn't take the pm_runtime_force_suspend()
explicitly into account, it may end up running the runtime-suspend
callback provided by that entity from within its system-wide suspend
callback which may not work as expected.

I guess I could add a few if I had to.
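
The "disabled twice" problem quoted at the top of this message, and the
assumption that PM-runtime is actually enabled, can be pictured with a small
user-space model. This is a sketch under stated assumptions, not kernel
code: struct model_rpm and the model_*() helpers are hypothetical, and the
only behavior borrowed from the kernel is that runtime PM counts as enabled
when a disable-depth counter is zero and that the counter starts at one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the runtime PM disable counter. */
struct model_rpm {
	int disable_depth;	/* runtime PM is enabled only when this is 0 */
};

static void model_rpm_init(struct model_rpm *rpm)
{
	rpm->disable_depth = 1;	/* devices start with runtime PM disabled */
}

static void model_rpm_enable(struct model_rpm *rpm)
{
	if (rpm->disable_depth > 0)
		rpm->disable_depth--;
}

static void model_rpm_disable(struct model_rpm *rpm)
{
	rpm->disable_depth++;
}

static bool model_rpm_enabled(const struct model_rpm *rpm)
{
	return rpm->disable_depth == 0;
}

/* Probe enables runtime PM once, as a devm-managed enable would. */
static void model_probe(struct model_rpm *rpm)
{
	model_rpm_enable(rpm);
}

/* Remove disables twice: explicitly, then again via the devm cleanup. */
static void model_remove(struct model_rpm *rpm)
{
	model_rpm_disable(rpm);	/* stands in for pm_runtime_force_suspend() */
	model_rpm_disable(rpm);	/* stands in for the devm-managed disable */
}
```

After one probe/remove cycle the counter sits at two, so the single enable
in the next probe only brings it back to one and runtime PM stays disabled,
which matches the behavior reported in the quoted text.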

Re: [PATCH v14 20/39] pwm: tegra: Add runtime PM and OPP support

2021-10-29 Thread Rafael J. Wysocki

On Fri, Oct 29, 2021 at 5:20 PM Dmitry Osipenko  wrote:
>
> 26.10.2021 01:40, Dmitry Osipenko пишет:
> > + ret = devm_pm_runtime_enable(&pdev->dev);
> > + if (ret)
> > + return ret;
> > +
> > + ret = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> > + if (ret)
> > + return ret;
> > +
> > + ret = pm_runtime_resume_and_get(&pdev->dev);
> > + if (ret)
> > + return ret;
> > +
> >   /* Set maximum frequency of the IP */
> > - ret = clk_set_rate(pwm->clk, pwm->soc->max_frequency);
> > + ret = dev_pm_opp_set_rate(pwm->dev, pwm->soc->max_frequency);
> >   if (ret < 0) {
> >   dev_err(&pdev->dev, "Failed to set max frequency: %d\n", ret);
> > - return ret;
> > + goto put_pm;
> >   }
> >
> >   /*
> > @@ -278,7 +294,7 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> >   if (IS_ERR(pwm->rst)) {
> >   ret = PTR_ERR(pwm->rst);
> >   dev_err(&pdev->dev, "Reset control is not found: %d\n", ret);
> > - return ret;
> > + goto put_pm;
> >   }
> >
> >   reset_control_deassert(pwm->rst);
> > @@ -291,10 +307,15 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> >   if (ret < 0) {
> >   dev_err(&pdev->dev, "pwmchip_add() failed: %d\n", ret);
> >   reset_control_assert(pwm->rst);
> > - return ret;
> > + goto put_pm;
> >   }
> >
> > + pm_runtime_put(&pdev->dev);
> > +
> >   return 0;
> > +put_pm:
> > + pm_runtime_put_sync_suspend(&pdev->dev);
> > + return ret;
> >  }
> >
> >  static int tegra_pwm_remove(struct platform_device *pdev)
> > @@ -305,20 +326,44 @@ static int tegra_pwm_remove(struct platform_device *pdev)
> >
> >   reset_control_assert(pc->rst);
> >
> > + pm_runtime_force_suspend(&pdev->dev);
>
> I just noticed that RPM core doesn't reset RPM-enable count of a device
> on driver's unbind (pm_runtime_reinit). It was a bad idea to use
> devm_pm_runtime_enable() + pm_runtime_force_suspend() here, since RPM is
> disabled twice on driver's removal, and thus, RPM will never be enabled
> again.
>
> I'll fix it for PWM and other drivers in this series, in v15.

Well, for the record, IMV using pm_runtime_force_suspend() is
generally a questionable idea.

[PATCH v2 2/7] nouveau: ACPI: Use the ACPI_COMPANION() macro directly

2021-10-13 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

The ACPI_HANDLE() macro is a wrapper around the ACPI_COMPANION()
macro and the ACPI handle produced by the former comes from the
ACPI device object produced by the latter, so it is way more
straightforward to evaluate the latter directly instead of passing
the handle produced by the former to acpi_bus_get_device().

Modify nouveau_acpi_edid() accordingly (no intentional functional
impact).

Signed-off-by: Rafael J. Wysocki 
Reviewed-by: Ben Skeggs 
---

v1 -> v2:
   * Resend with a different From and S-o-b address and with R-by from Ben.
 No other changes.

---
 drivers/gpu/drm/nouveau/nouveau_acpi.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/gpu/drm/nouveau/nouveau_acpi.c
===
--- linux-pm.orig/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ linux-pm/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -364,7 +364,6 @@ void *
 nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
 {
struct acpi_device *acpidev;
-   acpi_handle handle;
int type, ret;
void *edid;
 
@@ -377,12 +376,8 @@ nouveau_acpi_edid(struct drm_device *dev
return NULL;
}
 
-   handle = ACPI_HANDLE(dev->dev);
-   if (!handle)
-   return NULL;
-
-   ret = acpi_bus_get_device(handle, &acpidev);
-   if (ret)
+   acpidev = ACPI_COMPANION(dev->dev);
+   if (!acpidev)
return NULL;
 
ret = acpi_video_get_edid(acpidev, type, -1, &edid);

[PATCH v1 2/7] nouveau: ACPI: Use the ACPI_COMPANION() macro directly

2021-10-12 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

The ACPI_HANDLE() macro is a wrapper around the ACPI_COMPANION()
macro and the ACPI handle produced by the former comes from the
ACPI device object produced by the latter, so it is way more
straightforward to evaluate the latter directly instead of passing
the handle produced by the former to acpi_bus_get_device().

Modify nouveau_acpi_edid() accordingly (no intentional functional
impact).

Signed-off-by: Rafael J. Wysocki 
---
 drivers/gpu/drm/nouveau/nouveau_acpi.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/gpu/drm/nouveau/nouveau_acpi.c
===
--- linux-pm.orig/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ linux-pm/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -364,7 +364,6 @@ void *
 nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
 {
struct acpi_device *acpidev;
-   acpi_handle handle;
int type, ret;
void *edid;
 
@@ -377,12 +376,8 @@ nouveau_acpi_edid(struct drm_device *dev
return NULL;
}
 
-   handle = ACPI_HANDLE(dev->dev);
-   if (!handle)
-   return NULL;
-
-   ret = acpi_bus_get_device(handle, &acpidev);
-   if (ret)
+   acpidev = ACPI_COMPANION(dev->dev);
+   if (!acpidev)
return NULL;
 
ret = acpi_video_get_edid(acpidev, type, -1, &edid);

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Rafael J. Wysocki

On Tue, May 11, 2021 at 7:00 PM Stephen Boyd  wrote:
>
> Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement 
> > > > > the
> > > > > logic for system wide suspend/resume so that any PM ops attached to 
> > > > > the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will 
> > > > > run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> I don't really know. I'm mostly noting that today the order of suspend
> is to suspend the bridge device first and then the aggregate device. If
> the suspend of the aggregate device is traversing the devices like
> drm_atomic_helper_shutdown() then it would operate on the bridge device
> after it has been suspended, like is happening during shutdown. But it
> looks like that isn't happening. At least for the msm driver we're
> suspending the aggregate device after the bridge, and there are some
> weird usages of prepare and complete in there (see msm_pm_prepare() and
> msm_pm_complete) which makes me think that it's all working around this
> component code.

Well, it looks like the "prepare" phase is used sort-of against the
rules (because "prepare" is not supposed to make changes to the
hardware configuration or at least that is not its role) in order to
work around an ordering issue that is present in shutdown which
doesn't have a "prepare" phase.

> The prepare phase is going to suspend the display pipeline, and then the
> bridge device will run its suspend hooks, and then the aggregate driver
> will run its suspend hooks. If we had a proper device for the aggregate
> device instead of the bind/unbind component hooks we could clean this
> up.

I'm not sufficiently familiar with the component code to add anything
constructive here, but generally speaking it looks like the "natural"
dpm_list ordering does not match the order in which the devices in
question should be suspended (or shut down for that matter), so indeed
it is necessary to reorder dpm_list this way or another.

Please also note that it generally may not be sufficient to reorder
dpm_list if the devices are suspended and resumed asynchronously
during system-wide transitions, because in that case the callbacks of
different devices are only started in the dpm_list order, but they may
be completed in a different order (and of course they may run in
parallel with each other).

Shutdown is simpler, because it runs the callback synchronously for
all devices IIRC.

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Rafael J. Wysocki

On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:

[cut]

>
> >
> > > I will try it, but then I wonder about things like system wide
> > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > logic for system wide suspend/resume so that any PM ops attached to the
> > > msm device run in the correct order. Right now the bridge PM ops will
> > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > and then the i2c bus PM ops will run. It feels like that could be a
> > > problem if we're suspending the DSI encoder while the bridge is still
> > > active.
> >
> > Yup suspend/resume has the exact same problem as shutdown.
>
> I think suspend/resume has the exact opposite problem. At least I think
> the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> like is happening today. It looks like drm_atomic_helper_shutdown()
> operates from the top down when we want bottom up? I admit I have no
> idea what is supposed to happen here.

Why would the system-wide suspend ordering be different from the
shutdown ordering?

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-10 Thread Rafael J. Wysocki

On Sat, May 8, 2021 at 9:41 AM Stephen Boyd  wrote:
>
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The host device (called master_dev in
> the code), can succeed at probe and be put on the device lists before
> any of the component devices are probed and put on the lists.
>
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
>
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or host with the component framework and
> wait for their bind() callback to be called once the other components
> are ready. We could make various device links between 'master_dev' and
> 'component->dev' but it's not necessary. Let's simply move the hosting
> device to the end of the device lists when the component device fully
> binds. This way we know that all components are present and have probed
> properly and now the host device has really probed so it's safe to
> assume the host driver ops can operate on any component device.

Moving a device to the end of dpm_list is generally risky in cases
when some dependency information may be missing.

For example, if there is a device depending on the hosting one, but
that dependency is not represented by a device link or a direct
ancestor-descendant relationship (or generally a path in the device
dependency graph leading from one of them to the other), then moving
it to the end of dpm_list would cause system-wide suspend to fail (the
hosting device would be suspended before the one depending on it).

That may not be a concern here, but at least it would be good to
document why it is not a concern.
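
To make the concern concrete, here is a small user-space sketch of it.
It is only a model under stated assumptions: the list stands in for
dpm_list, system-wide suspend is assumed to walk it from the tail to the
head, and the device names ("i2c", "bridge", "host", "dependent") as well
as struct dev_list and the helpers are hypothetical.

```c
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for dpm_list: index 0 is the head of the list. */
struct dev_list {
	const char *name[MAX_DEVS];
	int n;
};

/* Return the position of @name in the list, or -1 if it is not there. */
static int index_of(const struct dev_list *l, const char *name)
{
	for (int i = 0; i < l->n; i++)
		if (strcmp(l->name[i], name) == 0)
			return i;
	return -1;
}

/* Model of moving one device to the end of the list. */
static void move_to_end(struct dev_list *l, const char *name)
{
	int i = index_of(l, name);
	const char *moved;

	if (i < 0)
		return;

	moved = l->name[i];
	for (; i < l->n - 1; i++)
		l->name[i] = l->name[i + 1];
	l->name[l->n - 1] = moved;
}

/*
 * Suspend is modelled as walking the list from tail to head, so a
 * consumer is suspended before its supplier only if it sits later in
 * the list.  Returns 1 when that ordering holds, 0 otherwise.
 */
static int suspend_order_ok(const struct dev_list *l, const char *consumer,
			    const char *supplier)
{
	return index_of(l, consumer) > index_of(l, supplier);
}
```

With the list i2c, bridge, host, dependent, the "dependent" device is
suspended before "host".  After move_to_end() is applied to "host", the
host is suspended first, which is the failure mode described above when
nothing else records that "dependent" relies on it.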

Re: NVIDIA GPU fallen off the bus after exiting s2idle

2021-05-06 Thread Rafael J. Wysocki

On Tue, May 4, 2021 at 10:08 AM Chris Chiu  wrote:
>
> Hi,
> We have some Intel laptops (11th generation CPU) with NVIDIA GPU
> suffering the same GPU falling off the bus problem while exiting
> s2idle with external display connected. These laptops connect the
> external display via the HDMI/DisplayPort on a USB Type-C interfaced
> dock. If we enter and exit s2idle with the dock connected, the NVIDIA
> GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port can come
> back to D0 w/o problem. If we enter the s2idle, disconnect the dock,
> then exit the s2idle, both external display and the panel will remain
> with no output. The dmesg as follows shows the "nvidia :01:00.0:
> can't change power state from D3cold to D0 (config space
> inaccessible)" due to the following ACPI error
> [ 154.446781]
> [ 154.446783]
> [ 154.446783] Initialized Local Variables for Method [IPCS]:
> [ 154.446784] Local0: 9863e365  Integer 09C5
> [ 154.446790]
> [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
> defined for method invocation)
> [ 154.446792] Arg0: 25568fbd  Integer 00AC
> [ 154.446795] Arg1: 9ef30e76  Integer 
> [ 154.446798] Arg2: fdf820f0  Integer 0010
> [ 154.446801] Arg3: 9fc2a088  Integer 0001
> [ 154.446804] Arg4: 3a3418f7  Integer 0001
> [ 154.446807] Arg5: 20c4b87c  Integer 
> [ 154.446810] Arg6: 8b965a8a  Integer 
> [ 154.446813]
> [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
> (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
> (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
> (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
> previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
> previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
> previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
> to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> [ 154.446860] acpi device:02: Failed to change power state to D0
> [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
> for parent in (unknown)

If I were to guess, I would say that AML tries to access memory that
is not accessible while suspended, probably PCI config space.

> The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
> which we expect it to prepare everything before bringing back the
> NVIDIA GPU but it's stuck in the infinite loop as described below.
> Please refer to
> https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
> the full DSDT.dsl.

The DSDT alone may not be sufficient.

Can you please create a bug entry at bugzilla.kernel.org for this
issue and attach the full output of acpidump from one of the affected
machines to it?  And please let me know the number of the bug.

Also please attach the output of dmesg including a suspend-resume
cycle including dock disconnection while suspended and the ACPI
messages quoted below.

>While (One)
> {
> If ((!IBSY || (IERR == One)))
> {
> Break
> }
>
> If ((Local0 > TMOV))
> {
> RPKG [Zero] = 0x03
> Return (RPKG) /* \IPCS.RPKG */
> }
>
> Sleep (One)
> Local0++
> }
>
> And the upstream PCIe port of NVIDIA seems to become inaccessible due
> to the messages as follows.
> [ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
> link, after activation
> [ 292.882296] pci :01:00.0: waiting additional 100 ms to become accessible
> [ 316.876997] pci :01:00.0: can't change power state from D3cold
> to D0 (config space inaccessible)
>
> Since the IPCS is the Intel Reference Code and we don't really know
> why the never-end loop happens just because we unplug the dock while
> the system still stays in s2idle. Can anyone from Intel suggest what
> happens here?

This list is not the right channel for inquiries related to Intel
support, we can only help you as Linux kernel developers in this
venue.

> And one thing also worth mentioning, if we unplug the display cable
> from the dock before entering the s2idle, NVIDIA GPU can come back w/o
> problem even if we disconnect the dock before exiting s2idle. Here's
> the lspci information
> https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4 and
> the dmesg log with ACPI trace_state enabled and dynamic debug on for
>

Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-23 Thread Rafael J. Wysocki

On Mon, Nov 23, 2020 at 4:58 PM James Bottomley
 wrote:
>
> On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote:
> > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley
> >  wrote:

[cut]

> >
> > Maintainers routinely review 1-line trivial patches, not to mention
> > internal API changes, etc.
>
> We're also complaining about the inability to recruit maintainers:
>
> https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/
>
> And burn out:
>
> http://antirez.com/news/129

Right.

> The whole crux of your argument seems to be maintainers' time isn't
> important so we should accept all trivial patches ... I'm pushing back
> on that assumption in two places, firstly the valulessness of the time
> and secondly that all trivial patches are valuable.
>
> > If some company does not want to pay for that, that's fine, but they
> > don't get to be maintainers and claim `Supported`.
>
> What I'm actually trying to articulate is a way of measuring value of
> the patch vs cost ... it has nothing really to do with who foots the
> actual bill.
>
> One thesis I'm actually starting to formulate is that this continual
> devaluing of maintainers is why we have so much difficulty keeping and
> recruiting them.

Absolutely.

This is just one of the factors involved, but a significant one IMV.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] Add power/gpu_frequency tracepoint.

2020-11-17 Thread Rafael J. Wysocki


On 11/16/2020 10:05 PM, Steven Rostedt wrote:
> On Mon, 16 Nov 2020 12:55:29 -0800
> Peiyong Lin  wrote:
>
> > Hi there,
> >
> > May I ask whether the merge window has passed? If so is it possible to
> > ask for a review?
>
> This is up to the maintainers of power management to accept this.
>
> Rafael?

I'd say up to the GPU people rather (dri-devel CCed) since that's where
it is going to be used.

Also it would be good to see at least one in-the-tree user of this (or a
usage example at least).

Cheers!



Re: [PATCH v6 2/4] driver core: add deferring probe reason to devices_deferred property

2020-06-26 Thread Rafael J. Wysocki

On Fri, Jun 26, 2020 at 12:01 PM Andrzej Hajda  wrote:
>
> /sys/kernel/debug/devices_deferred property contains list of deferred devices.
> This list does not contain reason why the driver deferred probe, the patch
> improves it.
> The natural place to set the reason is probe_err function introduced recently,
> ie. if probe_err will be called with -EPROBE_DEFER instead of printk the 
> message
> will be attached to deferred device and printed when user read 
> devices_deferred
> property.
>
> Signed-off-by: Andrzej Hajda 
> Reviewed-by: Mark Brown 
> Reviewed-by: Javier Martinez Canillas 
> Reviewed-by: Andy Shevchenko 

Reviewed-by: Rafael J. Wysocki 

> ---
>  drivers/base/base.h |  3 +++
>  drivers/base/core.c |  8 ++--
>  drivers/base/dd.c   | 23 ++-
>  3 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 95c22c0f9036..6954fccab3d7 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -93,6 +93,7 @@ struct device_private {
> struct klist_node knode_class;
> struct list_head deferred_probe;
> struct device_driver *async_driver;
> +   char *deferred_probe_reason;
> struct device *device;
> u8 dead:1;
>  };
> @@ -134,6 +135,8 @@ extern void device_release_driver_internal(struct device 
> *dev,
>  extern void driver_detach(struct device_driver *drv);
>  extern int driver_probe_device(struct device_driver *drv, struct device 
> *dev);
>  extern void driver_deferred_probe_del(struct device *dev);
> +extern void device_set_deferred_probe_reson(const struct device *dev,
> +   struct va_format *vaf);
>  static inline int driver_match_device(struct device_driver *drv,
>   struct device *dev)
>  {
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 3a827c82933f..fee047f03681 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3963,6 +3963,8 @@ define_dev_printk_level(_dev_info, KERN_INFO);
>   * This helper implements common pattern present in probe functions for error
>   * checking: print debug or error message depending if the error value is
>   * -EPROBE_DEFER and propagate error upwards.
> + * In case of -EPROBE_DEFER it sets also defer probe reason, which can be
> + * checked later by reading devices_deferred debugfs attribute.
>   * It replaces code sequence:
>   * if (err != -EPROBE_DEFER)
>   * dev_err(dev, ...);
> @@ -3984,10 +3986,12 @@ int dev_err_probe(const struct device *dev, int err, 
> const char *fmt, ...)
> vaf.fmt = fmt;
> vaf.va = &args;
>
> -   if (err != -EPROBE_DEFER)
> +   if (err != -EPROBE_DEFER) {
> dev_err(dev, "error %d: %pV", err, &vaf);
> -   else
> +   } else {
> +   device_set_deferred_probe_reson(dev, &vaf);
> dev_dbg(dev, "error %d: %pV", err, &vaf);
> +   }
>
> va_end(args);
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9a1d940342ac..dd5683b61f74 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "base.h"
>  #include "power/power.h"
> @@ -136,6 +137,8 @@ void driver_deferred_probe_del(struct device *dev)
> if (!list_empty(&dev->p->deferred_probe)) {
> dev_dbg(dev, "Removed from deferred list\n");
> list_del_init(&dev->p->deferred_probe);
> +   kfree(dev->p->deferred_probe_reason);
> +   dev->p->deferred_probe_reason = NULL;
> }
> mutex_unlock(&deferred_probe_mutex);
>  }
> @@ -211,6 +214,23 @@ void device_unblock_probing(void)
> driver_deferred_probe_trigger();
>  }
>
> +/**
> + * device_set_deferred_probe_reson() - Set defer probe reason message for 
> device
> + * @dev: the pointer to the struct device
> + * @vaf: the pointer to va_format structure with message
> + */
> +void device_set_deferred_probe_reson(const struct device *dev, struct 
> va_format *vaf)
> +{
> +   const char *drv = dev_driver_string(dev);
> +
> +   mutex_lock(&deferred_probe_mutex);
> +
> +   kfree(dev->p->deferred_probe_reason);
> +   dev->p->deferred_probe_reason = kasprintf(GFP_KERNEL, "%s: %pV", drv, 
> vaf);
> +
> +   mutex_unlock(&deferred_probe_mutex);
> +}
> +
>  /*
>   * deferred_devs_show() - Show the devices in the deferred probe pending 
> list.
>   */
> @@ -221,7 +241,8 @@ static int deferred_devs_show(struct seq_file *s, v

Re: [PATCH v6 1/4] driver core: add device probe log helper

2020-06-26 Thread Rafael J. Wysocki

On Fri, Jun 26, 2020 at 12:01 PM Andrzej Hajda  wrote:
>
> During probe every time driver gets resource it should usually check for
> error printk some message if it is not -EPROBE_DEFER and return the error.
> This pattern is simple but requires adding few lines after any resource
> acquisition code, as a result it is often omitted or implemented only
> partially.
> dev_err_probe helps to replace such code sequences with simple call,
> so code:
> if (err != -EPROBE_DEFER)
> dev_err(dev, ...);
> return err;
> becomes:
> return probe_err(dev, err, ...);
>
> Signed-off-by: Andrzej Hajda 

Reviewed-by: Rafael J. Wysocki 

> ---
>  drivers/base/core.c| 42 ++
>  include/linux/device.h |  3 +++
>  2 files changed, 45 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 67d39a90b45c..3a827c82933f 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3953,6 +3953,48 @@ define_dev_printk_level(_dev_info, KERN_INFO);
>
>  #endif
>
> +/**
> + * dev_err_probe - probe error check and log helper
> + * @dev: the pointer to the struct device
> + * @err: error value to test
> + * @fmt: printf-style format string
> + * @...: arguments as specified in the format string
> + *
> + * This helper implements common pattern present in probe functions for error
> + * checking: print debug or error message depending if the error value is
> + * -EPROBE_DEFER and propagate error upwards.
> + * It replaces code sequence:
> + * if (err != -EPROBE_DEFER)
> + * dev_err(dev, ...);
> + * else
> + * dev_dbg(dev, ...);
> + * return err;
> + * with
> + * return dev_err_probe(dev, err, ...);
> + *
> + * Returns @err.
> + *
> + */
> +int dev_err_probe(const struct device *dev, int err, const char *fmt, ...)
> +{
> +   struct va_format vaf;
> +   va_list args;
> +
> +   va_start(args, fmt);
> +   vaf.fmt = fmt;
> +   vaf.va = &args;
> +
> +   if (err != -EPROBE_DEFER)
> +   dev_err(dev, "error %d: %pV", err, &vaf);
> +   else
> +   dev_dbg(dev, "error %d: %pV", err, &vaf);
> +
> +   va_end(args);
> +
> +   return err;
> +}
> +EXPORT_SYMBOL_GPL(dev_err_probe);
> +
>  static inline bool fwnode_is_primary(struct fwnode_handle *fwnode)
>  {
> return fwnode && !IS_ERR(fwnode->secondary);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 15460a5ac024..6b2272ae9af8 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -964,6 +964,9 @@ void device_link_remove(void *consumer, struct device 
> *supplier);
>  void device_links_supplier_sync_state_pause(void);
>  void device_links_supplier_sync_state_resume(void);
>
> +extern __printf(3, 4)
> +int dev_err_probe(const struct device *dev, int err, const char *fmt, ...);
> +
>  /* Create alias, so I can be autoloaded. */
>  #define MODULE_ALIAS_CHARDEV(major,minor) \
> MODULE_ALIAS("char-major-" __stringify(major) "-" __stringify(minor))
> --
> 2.17.1
>
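
As a rough user-space illustration of the helper's control flow (not the
kernel implementation): model_err_probe(), deferred_reason and the buffer
sizes below are hypothetical, and recording the reason on deferral follows
the description in patch 2/4 of this series rather than this patch.  The
only value taken from the kernel is EPROBE_DEFER being 517.

```c
#include <stdarg.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* same numeric value as the kernel's internal errno */

/* Hypothetical stand-in for the per-device deferred probe reason. */
static char deferred_reason[128];

/*
 * Model of the helper: format the message once, report it as an error
 * unless the probe is merely deferred, and always hand @err back so the
 * caller can write "return model_err_probe(...)".
 */
static int model_err_probe(const char *devname, int err, const char *fmt, ...)
{
	char msg[96];
	va_list args;

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	if (err != -EPROBE_DEFER)
		fprintf(stderr, "%s: error %d: %s", devname, err, msg);
	else
		snprintf(deferred_reason, sizeof(deferred_reason), "%s: %s",
			 devname, msg);

	return err;
}
```

A caller would then replace the open-coded check with a single statement
such as "return model_err_probe("pwm", ret, "clock not ready\n");".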

Re: [RESEND][PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model

2020-06-24 Thread Rafael J. Wysocki

On Wed, Jun 10, 2020 at 12:12 PM Lukasz Luba  wrote:
>
> Add support for other devices than CPUs. The registration function
> does not require a valid cpumask pointer and is ready to handle new
> devices. Some of the internal structures has been reorganized in order to
> keep consistent view (like removing per_cpu pd pointers).
>
> Signed-off-by: Lukasz Luba 
> ---
> Hi all,
>
> This is just a small change compared to v8 addressing Rafael's
> comments an Dan's static analyzes.
> Here are the changes:
> - added comment about mutex usage in the unregister function
> - changed 'dev' into @dev in the kerneldoc comments
> - removed 'else' statement from em_create_pd() to calm down static analizers

I've applied the series as 5.9 material with patch [4/8] replaced with this one.

Sorry for the delays in handling this!

Thanks!

>  include/linux/device.h   |   5 +
>  include/linux/energy_model.h |  29 -
>  kernel/power/energy_model.c  | 244 ---
>  3 files changed, 194 insertions(+), 84 deletions(-)
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index ac8e37cd716a..7023d3ea189b 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -13,6 +13,7 @@
>  #define _DEVICE_H_
>
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -559,6 +560,10 @@ struct device {
> struct dev_pm_info  power;
> struct dev_pm_domain*pm_domain;
>
> +#ifdef CONFIG_ENERGY_MODEL
> +   struct em_perf_domain   *em_pd;
> +#endif
> +
>  #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
> struct irq_domain   *msi_domain;
>  #endif
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 7076cb22b247..2d4689964029 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -12,8 +12,10 @@
>
>  /**
>   * em_perf_state - Performance state of a performance domain
> - * @frequency: The CPU frequency in KHz, for consistency with CPUFreq
> - * @power: The power consumed by 1 CPU at this level, in milli-watts
> + * @frequency: The frequency in KHz, for consistency with CPUFreq
> + * @power: The power consumed at this level, in milli-watts (by 1 CPU or
> +   by a registered device). It can be a total power: static and
> +   dynamic.
>   * @cost:  The cost coefficient associated with this level, used during
>   * energy calculation. Equal to: power * max_frequency / 
> frequency
>   */
> @@ -27,12 +29,16 @@ struct em_perf_state {
>   * em_perf_domain - Performance domain
>   * @table: List of performance states, in ascending order
>   * @nr_perf_states:Number of performance states
> - * @cpus:  Cpumask covering the CPUs of the domain
> + * @cpus:  Cpumask covering the CPUs of the domain. It's here
> + * for performance reasons to avoid potential cache
> + * misses during energy calculations in the scheduler
> + * and simplifies allocating/freeing that memory region.
>   *
> - * A "performance domain" represents a group of CPUs whose performance is
> - * scaled together. All CPUs of a performance domain must have the same
> - * micro-architecture. Performance domains often have a 1-to-1 mapping with
> - * CPUFreq policies.
> + * In case of CPU device, a "performance domain" represents a group of CPUs
> + * whose performance is scaled together. All CPUs of a performance domain
> + * must have the same micro-architecture. Performance domains often have
> + * a 1-to-1 mapping with CPUFreq policies. In case of other devices the @cpus
> + * field is unused.
>   */
>  struct em_perf_domain {
> struct em_perf_state *table;
> @@ -71,10 +77,12 @@ struct em_data_callback {
>  #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
>
>  struct em_perf_domain *em_cpu_get(int cpu);
> +struct em_perf_domain *em_pd_get(struct device *dev);
>  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
> struct em_data_callback *cb);
>  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> struct em_data_callback *cb, cpumask_t *span);
> +void em_dev_unregister_perf_domain(struct device *dev);
>
>  /**
>   * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. 
> domain
> @@ -184,10 +192,17 @@ int em_dev_register_perf_domain(struct device *dev, 
> unsigned int nr_states,
>  {
> return -EINVAL;
>  }
> +static inline void em_dev_unregister_perf_domain(struct device *dev)
> +{
> +}
>  static inline struct em_perf_domain *em_cpu_get(int cpu)
>  {
> return NULL;
>  }
> +static inline struct em_perf_domain *em_pd_get(struct device *dev)
> +{
> +   return NULL;
> +}
>  static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
> unsigned long max_util,

Re: [RESEND PATCH v5 3/5] drivers core: allow probe_err accept integer and pointer types

2020-06-24 Thread Rafael J. Wysocki

On Wed, Jun 24, 2020 at 4:44 PM Andrzej Hajda  wrote:
>
>
> On 24.06.2020 14:14, Rafael J. Wysocki wrote:
> > On Wed, Jun 24, 2020 at 1:41 PM Andrzej Hajda  wrote:
> >> Many resource acquisition functions return error value encapsulated in
> >> pointer instead of integer value. To simplify coding we can use macro
> >> which will accept both types of error.
> >> With this patch user can use:
> >>  probe_err(dev, ptr, ...)
> >> instead of:
> >>  probe_err(dev, PTR_ERR(ptr), ...)
> >> Without loosing old functionality:
> >>  probe_err(dev, err, ...)
> >>
> >> Signed-off-by: Andrzej Hajda 
> > The separation of this change from patch [1/5] looks kind of artificial to 
> > me.
> >
> > You are introducing a new function anyway, so why not to make it what
> > you want right away?
>
>
> Two reasons:
>
> 1.This patch is my recent idea, I didn't want to mix it with already
> reviewed code.
>
> 2. This patch could be treated hacky by some devs due to macro
> definition and type-casting.

Fair enough.

There is some opposition against the $subject one, so I guess it may
be dropped even.

Thanks!

Re: [RESEND PATCH v5 3/5] drivers core: allow probe_err accept integer and pointer types

2020-06-24 Thread Rafael J. Wysocki

On Wed, Jun 24, 2020 at 1:41 PM Andrzej Hajda  wrote:
>
> Many resource acquisition functions return error value encapsulated in
> pointer instead of integer value. To simplify coding we can use macro
> which will accept both types of error.
> With this patch user can use:
> probe_err(dev, ptr, ...)
> instead of:
> probe_err(dev, PTR_ERR(ptr), ...)
> Without loosing old functionality:
> probe_err(dev, err, ...)
>
> Signed-off-by: Andrzej Hajda 

The separation of this change from patch [1/5] looks kind of artificial to me.

You are introducing a new function anyway, so why not make it what
you want right away?

> ---
>  drivers/base/core.c| 25 ++---
>  include/linux/device.h | 25 -
>  2 files changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2a96954d5460..df283c62d9c0 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3953,28 +3953,7 @@ define_dev_printk_level(_dev_info, KERN_INFO);
>
>  #endif
>
> -/**
> - * probe_err - probe error check and log helper
> - * @dev: the pointer to the struct device
> - * @err: error value to test
> - * @fmt: printf-style format string
> - * @...: arguments as specified in the format string
> - *
> - * This helper implements common pattern present in probe functions for error
> - * checking: print message if the error is not -EPROBE_DEFER and propagate it.
> - * In case of -EPROBE_DEFER it sets defer probe reason, which can be checked
> - * later by reading devices_deferred debugfs attribute.
> - * It replaces code sequence:
> - * if (err != -EPROBE_DEFER)
> - * dev_err(dev, ...);
> - * return err;
> - * with
> - * return probe_err(dev, err, ...);
> - *
> - * Returns @err.
> - *
> - */
> -int probe_err(const struct device *dev, int err, const char *fmt, ...)
> +int __probe_err(const struct device *dev, int err, const char *fmt, ...)
>  {
> struct va_format vaf;
> va_list args;
> @@ -3992,7 +3971,7 @@ int probe_err(const struct device *dev, int err, const char *fmt, ...)
>
> return err;
>  }
> -EXPORT_SYMBOL_GPL(probe_err);
> +EXPORT_SYMBOL_GPL(__probe_err);
>
>  static inline bool fwnode_is_primary(struct fwnode_handle *fwnode)
>  {
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 40a90d9bf799..22d3c3d4f461 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -965,7 +965,30 @@ void device_links_supplier_sync_state_pause(void);
>  void device_links_supplier_sync_state_resume(void);
>
>  extern __printf(3, 4)
> -int probe_err(const struct device *dev, int err, const char *fmt, ...);
> +int __probe_err(const struct device *dev, int err, const char *fmt, ...);
> +
> +/**
> + * probe_err - probe error check and log helper
> + * @dev: the pointer to the struct device
> + * @err: error value to test, can be integer or pointer type
> + * @fmt: printf-style format string
> + * @...: arguments as specified in the format string
> + *
> + * This helper implements common pattern present in probe functions for error
> + * checking: print message if the error is not -EPROBE_DEFER and propagate it.
> + * In case of -EPROBE_DEFER it sets defer probe reason, which can be checked
> + * later by reading devices_deferred debugfs attribute.
> + * It replaces code sequence:
> + * if (err != -EPROBE_DEFER)
> + * dev_err(dev, ...);
> + * return err;
> + * with
> + * return probe_err(dev, err, ...);
> + *
> + * Returns @err.
> + *
> + */
> +#define probe_err(dev, err, args...) __probe_err(dev, (long)(err), args)
>
>  /* Create alias, so I can be autoloaded. */
>  #define MODULE_ALIAS_CHARDEV(major,minor) \
> --
> 2.17.1
>

Re: [RESEND PATCH v5 2/5] driver core: add deferring probe reason to devices_deferred property

2020-06-24 Thread Rafael J. Wysocki

On Wed, Jun 24, 2020 at 1:41 PM Andrzej Hajda  wrote:
>
> /sys/kernel/debug/devices_deferred property contains list of deferred devices.
> This list does not contain reason why the driver deferred probe, the patch
> improves it.
> The natural place to set the reason is probe_err function introduced recently,
> ie. if probe_err will be called with -EPROBE_DEFER instead of printk the message
> will be attached to deferred device and printed when user read devices_deferred
> property.
>
> Signed-off-by: Andrzej Hajda 
> Reviewed-by: Mark Brown 
> Reviewed-by: Javier Martinez Canillas 
> Reviewed-by: Andy Shevchenko 
> ---
>  drivers/base/base.h |  3 +++
>  drivers/base/core.c | 10 ++
>  drivers/base/dd.c   | 21 -
>  3 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 95c22c0f9036..93ef1c2f4c1f 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -93,6 +93,7 @@ struct device_private {
> struct klist_node knode_class;
> struct list_head deferred_probe;
> struct device_driver *async_driver;
> +   char *deferred_probe_msg;

What about calling this deferred_probe_reason?

> struct device *device;
> u8 dead:1;
>  };
> @@ -134,6 +135,8 @@ extern void device_release_driver_internal(struct device *dev,
>  extern void driver_detach(struct device_driver *drv);
>  extern int driver_probe_device(struct device_driver *drv, struct device *dev);
>  extern void driver_deferred_probe_del(struct device *dev);
> +extern void __deferred_probe_set_msg(const struct device *dev,
> +struct va_format *vaf);

I'd call this device_set_deferred_probe_reason() to follow the naming
convention for the function names in this header file.

>  static inline int driver_match_device(struct device_driver *drv,
>   struct device *dev)
>  {
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index ee9da66bff1b..2a96954d5460 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3962,6 +3962,8 @@ define_dev_printk_level(_dev_info, KERN_INFO);
>   *
>   * This helper implements common pattern present in probe functions for error
>   * checking: print message if the error is not -EPROBE_DEFER and propagate it.
> + * In case of -EPROBE_DEFER it sets defer probe reason, which can be checked
> + * later by reading devices_deferred debugfs attribute.
>   * It replaces code sequence:
>   * if (err != -EPROBE_DEFER)
>   * dev_err(dev, ...);
> @@ -3977,14 +3979,14 @@ int probe_err(const struct device *dev, int err, const char *fmt, ...)
> struct va_format vaf;
> va_list args;
>
> -   if (err == -EPROBE_DEFER)
> -   return err;
> -
> va_start(args, fmt);
> vaf.fmt = fmt;
> vaf.va = &args;
>
> -   dev_err(dev, "error %d: %pV", err, &vaf);
> +   if (err == -EPROBE_DEFER)
> +   __deferred_probe_set_msg(dev, &vaf);
> +   else
> +   dev_err(dev, "error %d: %pV", err, &vaf);
>
> va_end(args);
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9a1d940342ac..f44d26454b6a 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "base.h"
>  #include "power/power.h"
> @@ -136,6 +137,8 @@ void driver_deferred_probe_del(struct device *dev)
> if (!list_empty(&dev->p->deferred_probe)) {
> dev_dbg(dev, "Removed from deferred list\n");
> list_del_init(&dev->p->deferred_probe);
> +   kfree(dev->p->deferred_probe_msg);
> +   dev->p->deferred_probe_msg = NULL;
> }
> mutex_unlock(&deferred_probe_mutex);
>  }
> @@ -211,6 +214,21 @@ void device_unblock_probing(void)
> driver_deferred_probe_trigger();
>  }
>
> +/*
> + * __deferred_probe_set_msg() - Set defer probe reason message for device

I'd change this into a full kerneldoc comment.

> + */
> +void __deferred_probe_set_msg(const struct device *dev, struct va_format *vaf)
> +{
> +   const char *drv = dev_driver_string(dev);
> +
> +   mutex_lock(&deferred_probe_mutex);
> +
> +   kfree(dev->p->deferred_probe_msg);
> +   dev->p->deferred_probe_msg = kasprintf(GFP_KERNEL, "%s: %pV", drv, vaf);
> +
> +   mutex_unlock(&deferred_probe_mutex);
> +}
> +
>  /*
>   * deferred_devs_show() - Show the devices in the deferred probe pending list.
>   */
> @@ -221,7 +239,8 @@ static int deferred_devs_show(struct seq_file *s, void *data)
> mutex_lock(&deferred_probe_mutex);
>
> list_for_each_entry(curr, &deferred_probe_pending_list, deferred_probe)
> -   seq_printf(s, "%s\n", dev_name(curr->device));
> +   seq_printf(s, "%s\t%s", dev_name(curr->device),
> +  curr->device->p->deferred_probe_msg ?: "\n");
>
> mutex_unlock(&deferred_probe_mutex);
>
> --
> 2.17.1
>
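
To make the review comments above concrete, here is a minimal userspace sketch of how the reason string could be stored and shown, with a full kerneldoc-style comment and the suggested "reason" naming. It is not the patch itself: all sketch_* names are hypothetical, malloc()/free() stand in for kasprintf()/kfree(), the locking is omitted, and the driver-name prefix added by the patch is left out.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct sketch_device {
	const char *name;
	char *deferred_probe_reason;	/* NULL when no reason is recorded */
};

/**
 * sketch_set_deferred_probe_reason() - Record why a device deferred its probe
 * @dev: device to annotate
 * @fmt: printf-style format string
 * @...: arguments as specified in the format string
 *
 * Replace any reason stored previously for @dev. If the allocation
 * fails, the reason is left unset.
 */
static void sketch_set_deferred_probe_reason(struct sketch_device *dev,
					     const char *fmt, ...)
{
	va_list args;
	int len;

	free(dev->deferred_probe_reason);
	dev->deferred_probe_reason = NULL;

	/* First pass: compute the length of the formatted message. */
	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);
	if (len < 0)
		return;

	dev->deferred_probe_reason = malloc((size_t)len + 1);
	if (!dev->deferred_probe_reason)
		return;

	/* Second pass: format into the freshly allocated buffer. */
	va_start(args, fmt);
	vsnprintf(dev->deferred_probe_reason, (size_t)len + 1, fmt, args);
	va_end(args);
}

/* Mirror of the "%s\t%s" line format; a missing reason prints as "\n". */
static int sketch_show(const struct sketch_device *dev, char *buf, size_t size)
{
	return snprintf(buf, size, "%s\t%s", dev->name,
			dev->deferred_probe_reason ?
			dev->deferred_probe_reason : "\n");
}
```

As in the quoted seq_printf() call, the reason is expected to carry its own trailing newline, so the fallback for a device without a reason is just "\n".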

Re: [RESEND PATCH v5 1/5] driver core: add probe_err log helper

2020-06-24 Thread Rafael J. Wysocki

On Wed, Jun 24, 2020 at 1:41 PM Andrzej Hajda  wrote:
>
> During probe every time driver gets resource it should usually check for error
> printk some message if it is not -EPROBE_DEFER and return the error. This
> pattern is simple but requires adding few lines after any resource acquisition
> code, as a result it is often omitted or implemented only partially.
> probe_err helps to replace such code sequences with simple call, so code:
> if (err != -EPROBE_DEFER)
> dev_err(dev, ...);
> return err;
> becomes:
> return probe_err(dev, err, ...);
>
> Signed-off-by: Andrzej Hajda 
> Reviewed-by: Javier Martinez Canillas 
> Reviewed-by: Mark Brown 
> Reviewed-by: Andy Shevchenko 

Reviewed-by: Rafael J. Wysocki 
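
The pattern from the changelog can be tried outside the kernel with a small userspace sketch. The assumptions are that sketch_probe_err() and its counter are hypothetical names, fprintf() stands in for dev_err(), and 517 is the kernel's value of EPROBE_DEFER.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Counts printed messages; present only so the behaviour can be checked. */
static int sketch_messages_logged;

/*
 * Userspace stand-in for probe_err(): print a message unless the error
 * is -EPROBE_DEFER, and always hand the error back to the caller.
 */
static int sketch_probe_err(const char *dev_name, int err, const char *fmt, ...)
{
	va_list args;

	if (err == -SKETCH_EPROBE_DEFER)
		return err;

	va_start(args, fmt);
	fprintf(stderr, "%s: error %d: ", dev_name, err);
	vfprintf(stderr, fmt, args);
	va_end(args);
	sketch_messages_logged++;

	return err;
}
```

A probe function can then end an error path with "return sketch_probe_err(name, err, "cannot get clock\n");" instead of the three-line sequence quoted above.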

> ---
>  drivers/base/core.c| 39 +++
>  include/linux/device.h |  3 +++
>  2 files changed, 42 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 67d39a90b45c..ee9da66bff1b 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3953,6 +3953,45 @@ define_dev_printk_level(_dev_info, KERN_INFO);
>
>  #endif
>
> +/**
> + * probe_err - probe error check and log helper
> + * @dev: the pointer to the struct device
> + * @err: error value to test
> + * @fmt: printf-style format string
> + * @...: arguments as specified in the format string
> + *
> + * This helper implements common pattern present in probe functions for error
> + * checking: print message if the error is not -EPROBE_DEFER and propagate it.
> + * It replaces code sequence:
> + * if (err != -EPROBE_DEFER)
> + * dev_err(dev, ...);
> + * return err;
> + * with
> + * return probe_err(dev, err, ...);
> + *
> + * Returns @err.
> + *
> + */
> +int probe_err(const struct device *dev, int err, const char *fmt, ...)
> +{
> +   struct va_format vaf;
> +   va_list args;
> +
> +   if (err == -EPROBE_DEFER)
> +   return err;
> +
> +   va_start(args, fmt);
> +   vaf.fmt = fmt;
> +   vaf.va = &args;
> +
> +   dev_err(dev, "error %d: %pV", err, &vaf);
> +
> +   va_end(args);
> +
> +   return err;
> +}
> +EXPORT_SYMBOL_GPL(probe_err);
> +
>  static inline bool fwnode_is_primary(struct fwnode_handle *fwnode)
>  {
> return fwnode && !IS_ERR(fwnode->secondary);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 15460a5ac024..40a90d9bf799 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -964,6 +964,9 @@ void device_link_remove(void *consumer, struct device *supplier);
>  void device_links_supplier_sync_state_pause(void);
>  void device_links_supplier_sync_state_resume(void);
>
> +extern __printf(3, 4)
> +int probe_err(const struct device *dev, int err, const char *fmt, ...);
> +
>  /* Create alias, so I can be autoloaded. */
>  #define MODULE_ALIAS_CHARDEV(major,minor) \
> MODULE_ALIAS("char-major-" __stringify(major) "-" __stringify(minor))
> --
> 2.17.1
>

Re: [PATCH v3 02/15] ACPI / LPSS: Save Cherry Trail PWM ctx registers only once (at activation)

2020-06-22 Thread Rafael J. Wysocki

On Sat, Jun 20, 2020 at 2:18 PM Hans de Goede  wrote:
>
> The DSDTs on most Cherry Trail devices have an ugly clutch where the PWM
> controller gets turned off from the _PS3 method of the graphics-card dev:
>
> Method (_PS3, 0, Serialized)  // _PS3: Power State 3
> {
> ...
> PWMB = PWMC /* \_SB_.PCI0.GFX0.PWMC */
> PSAT |= 0x03
> Local0 = PSAT /* \_SB_.PCI0.GFX0.PSAT */
> ...
> }
>
> Where PSAT is the power-status register of the PWM controller.
>
> Since the i915 driver will do a pwm_get on the pwm device as it uses it to
> control the LCD panel backlight, there is a device-link marking the i915
> device as a consumer of the pwm device. So that the PWM controller will
> always be suspended after the i915 driver suspends (which is the right
> thing to do). This causes the above GFX0 PS3 AML code to run before
> acpi_lpss.c calls acpi_lpss_save_ctx().
>
> So on these devices the PWM controller will already be off when
> acpi_lpss_save_ctx() runs. This causes it to read/save all 1-s (0xffffffff)
> as ctx register values.
>
> When these bogus values get restored on resume the PWM controller actually
> keeps working, since most bits are reserved, but this does set bit 3 of
> the LPSS General purpose register, which for the PWM controller has the
> following function: "This bit is re-used to support 32kHz slow mode.
> Default is 19.2MHz as PWM source clock".
>
> This causes the clock of the PWM controller to switch from 19.2MHz to
> 32KHz, which is a slow-down of a factor 600. Surprisingly enough so far
> there have been few bug reports about this. This is likely because the
> i915 driver was hardcoding the PWM frequency to 46 KHz, which divided
> by 600 would result in a PWM frequency of approx. 78 Hz, which mostly
> still works fine. There are some bug reports about the LCD backlight
> flickering after suspend/resume which are likely caused by this issue.
>
> But with the upcoming patch-series to finally switch the i915 drivers
> code for external PWM controllers to use the atomic API and to honor
> the PWM frequency specified in the video BIOS (VBT), this becomes a much
> bigger problem. In most cases the VBT specifies either 200 Hz or 20
> KHz as PWM frequency, which with the mentioned issue ends up being either
> 1/3 Hz, where the backlight actually visibly blinks on and off every 3s,
> or in 33 Hz and horrible flickering of the backlight.
>
> There are a number of possible solutions to this problem:
>
> 1. Make acpi_lpss_save_ctx() run before GFX0._PS3
>  Pro: Clean solution from pov of not meddling with save/restore ctx code
>  Con: As mentioned the current ordering is the right thing to do
>  Con: Requires asymmetry in at what suspend/resume phase we do the save vs
>   restore, requiring more suspend/resume ordering hacks in already
>   convoluted acpi_lpss.c suspend/resume code.
> 2. Do some sort of save once mode for the LPSS ctx
>  Pro: Reasonably clean
>  Con: Needs a new LPSS flag + code changes to handle the flag
> 3. Detect we have failed to save the ctx registers and do not restore them
>  Pro: Not PWM specific, might help with issues on other LPSS devices too
>  Con: If we can get away with not restoring the ctx why bother with it at
>   all?
> 4. Do not save the ctx for CHT PWM controllers
>  Pro: Clean, as simple as dropping a flag?
>  Con: Not so simple as dropping a flag, needs a new flag to ensure that
>   we still do lpss_deassert_reset() on device activation.
> 5. Make the pwm-lpss code fixup the LPSS-context registers
>  Pro: Keeps acpi_lpss.c code clean
>  Con: Moves knowledge of LPSS-context into the pwm-lpss.c code
>
> 1 and 5 both do not seem to be a desirable way forward.
>
> 3 and 4 seem ok, but they both assume that restoring the LPSS-context
> registers is not necessary. I have done a couple of tests and those do
> show that restoring the LPSS-context indeed does not seem to be necessary
> on devices using s2idle suspend (and successfully reaching S0i3). But I
> have no hardware to test deep / S3 suspend. So I'm not sure that not
> restoring the context is safe.
>
> That leaves solution 2, which is about as simple / clean as 3 and 4,
> so this commit fixes the described problem by implementing a new
> LPSS_SAVE_CTX_ONCE flag and setting that for the CHT PWM controllers.
>
> Signed-off-by: Hans de Goede 

Acked-by: Rafael J. Wysocki 
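
The difference between saving the context on every suspend and saving it once at activation (solution 2 in the changelog) can be modelled in userspace. This is a rough sketch under assumptions: the sketch_* names and the four-register layout are hypothetical, a powered-off block is modelled as reading back all ones, and the real acpi_lpss.c code paths are more involved.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SAVE_CTX      (1u << 0)	/* save on every suspend */
#define SKETCH_SAVE_CTX_ONCE (1u << 1)	/* save once, at activation */
#define SKETCH_CTX_REGS      4

struct sketch_lpss {
	unsigned int flags;
	bool powered;			/* false models the block being off */
	uint32_t regs[SKETCH_CTX_REGS];	/* live registers */
	uint32_t ctx[SKETCH_CTX_REGS];	/* saved copy */
};

/* Reading a powered-off block yields all ones, so the save is bogus then. */
static void sketch_save_ctx(struct sketch_lpss *d)
{
	for (int i = 0; i < SKETCH_CTX_REGS; i++)
		d->ctx[i] = d->powered ? d->regs[i] : UINT32_MAX;
}

/* At activation the block is known to be on, so a one-time save is safe. */
static void sketch_activate(struct sketch_lpss *d)
{
	d->powered = true;
	if (d->flags & SKETCH_SAVE_CTX_ONCE)
		sketch_save_ctx(d);
}

static void sketch_suspend(struct sketch_lpss *d)
{
	if (d->flags & SKETCH_SAVE_CTX)
		sketch_save_ctx(d);
	d->powered = false;
}

static void sketch_resume(struct sketch_lpss *d)
{
	d->powered = true;
	if (d->flags & (SKETCH_SAVE_CTX | SKETCH_SAVE_CTX_ONCE))
		for (int i = 0; i < SKETCH_CTX_REGS; i++)
			d->regs[i] = d->ctx[i];
}
```

If something else (here, the GFX0 _PS3 method) powers the block off before sketch_suspend() runs, the SKETCH_SAVE_CTX variant restores all ones on resume, while the SKETCH_SAVE_CTX_ONCE variant restores the values captured at activation.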

> ---
> Changes in v2:
> - Move #define LPSS_SAVE_CTX_ONCE define to group it with LPSS_SAVE_CTX
> ---
>  drivers/acpi/acpi_lpss.c | 21 +
>  1 file changed, 17 insertions(+), 4 deleti

Re: [PATCH v3 01/15] ACPI / LPSS: Resume Cherry Trail PWM controller in no-irq phase

2020-06-22 Thread Rafael J. Wysocki

On Sat, Jun 20, 2020 at 2:18 PM Hans de Goede  wrote:
>
> The DSDTs on most Cherry Trail devices have an ugly clutch where the PWM
> controller gets poked from the _PS0 method of the graphics-card device:
>
> Local0 = PSAT /* \_SB_.PCI0.GFX0.PSAT */
> If (((Local0 & 0x03) == 0x03))
> {
> PSAT &= 0xFFFC
> Local1 = PSAT /* \_SB_.PCI0.GFX0.PSAT */
> RSTA = Zero
> RSTF = Zero
> RSTA = One
> RSTF = One
> PWMB |= 0xC000
> PWMC = PWMB /* \_SB_.PCI0.GFX0.PWMB */
> }
>
> Where PSAT is the power-status register of the PWM controller, so if it
> is in D3 when the GFX0 device's PS0 method runs then it will turn it on
> and restore the PWM ctrl register value it saved from its PS3 handler.
> Note not only does it restore it, it ors it with 0xC000 turning it
> on at a time where we may not want it to get turned on at all.
>
> The pwm_get call which the i915 driver does to get a reference to the
> PWM controller, already adds a device-link making the GFX0 device a
> consumer of the PWM device. So it should already have been resumed when
> the above AML runs and the AML should thus not do its undesirable poking
> of the PWM controller register.
>
> But the PCI core powers on PCI devices in the no-irq resume phase and
> thus calls the troublesome PS0 method in the no-irq resume phase.
> Whereas LPSS devices by default are resumed in the early resume phase.
>
> This commit sets the resume_from_noirq flag in the bsw_pwm_dev_desc
> struct, so that Cherry Trail PWM controllers will be resumed in the
> no-irq phase. Together with the device-link added by the pwm-get this
> ensures that the PWM controller will be on when the troublesome PS0
> method runs, which stops it from poking the PWM controller.
>
> Signed-off-by: Hans de Goede 

Acked-by: Rafael J. Wysocki 

> ---
>  drivers/acpi/acpi_lpss.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
> index c5a5a179f49d..446e666b3466 100644
> --- a/drivers/acpi/acpi_lpss.c
> +++ b/drivers/acpi/acpi_lpss.c
> @@ -257,6 +257,7 @@ static const struct lpss_device_desc bsw_pwm_dev_desc = {
> .flags = LPSS_SAVE_CTX | LPSS_NO_D3_DELAY,
> .prv_offset = 0x800,
> .setup = bsw_pwm_setup,
> +   .resume_from_noirq = true,
>  };
>
>  static const struct lpss_device_desc byt_uart_dev_desc = {
> --
> 2.26.2
>

Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model

2020-06-03 Thread Rafael J. Wysocki

On Wed, Jun 3, 2020 at 6:12 PM Lukasz Luba  wrote:
>
>
>
> On 6/3/20 4:40 PM, Rafael J. Wysocki wrote:
> > On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba  wrote:
> >>
> >>
> >>
> >> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> >>> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba  wrote:
> >>>>
> >>>> Hi Daniel,
> >>>>
> >>>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> >>>>> On 27/05/2020 11:58, Lukasz Luba wrote:
> >>>>>> Add support for other devices than CPUs. The registration function
> >>>>>> does not require a valid cpumask pointer and is ready to handle new
> >>>>>> devices. Some of the internal structures has been reorganized in order 
> >>>>>> to
> >>>>>> keep consistent view (like removing per_cpu pd pointers).
> >>>>>>
> >>>>>> Signed-off-by: Lukasz Luba 
> >>>>>> ---
> >>>>>
> >>>>> [ ... ]
> >>>>>
> >>>>>> }
> >>>>>> EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >>>>>> +
> >>>>>> +/**
> >>>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for 
> >>>>>> a device
> >>>>>> + * @dev : Device for which the EM is registered
> >>>>>> + *
> >>>>>> + * Try to unregister the EM for the specified device (but not a CPU).
> >>>>>> + */
> >>>>>> +void em_dev_unregister_perf_domain(struct device *dev)
> >>>>>> +{
> >>>>>> +if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >>>>>> +return;
> >>>>>> +
> >>>>>> +if (_is_cpu_device(dev))
> >>>>>> +return;
> >>>>>> +
> >>>>>> +mutex_lock(&em_pd_mutex);
> >>>>>
> >>>>> Is the mutex really needed?
> >>>>
> >>>> I just wanted to align this unregister code with register. Since there
> >>>> is debugfs dir lookup and the device's EM existence checks I thought it
> >>>> wouldn't harm just to lock for a while and make sure the registration
> >>>> path is not used. These two paths shouldn't affect each other, but with
> >>>> modules loading/unloading I wanted to play safe.
> >>>>
> >>>> I can change it maybe to just dmb() and the end of the function if it's
> >>>> a big performance problem in this unloading path. What do you think?
> >>>
> >>> I would rather leave the mutex locking as is.
> >>>
> >>> However, the question to ask is what exactly may go wrong without that
> >>> locking in place?  Is there any specific race condition that you are
> >>> concerned about?
> >>>
> >>
> >> I tried to test this with module loading & unloading with panfrost
> >> driver and CPU hotplug (which should bail out quickly) and was OK.
> >> I don't see any particular race. I don't know too much about the
> >> debugfs code, though. That's why I tried to protect from some
> >> scripts/services which try to re-load the driver.
> >>
> >> Apart from that, maybe just this dev->em = NULL to be populated to all
> >> CPUs, which mutex_unlock synchronizes for free here.
> >
> > If it may run concurrently with the registration for the same device,
> > the locking is necessary, but in that case the !dev->em_pd check needs
> > to go under the mutex too IMO, or you may end up leaking the pd if the
> > registration can run between that check and the point at which the
> > mutex is taken.
>
> They don't run concurrently for the same device and users of that EM are
> already gone.
> I just wanted to be sure that everything is cleaned and synced properly.
> Here is some example of the directories under
> /sys/kernel/debug/energy_model
> cpu0, cpu4, gpu, dsp, etc
>
> The only worry that I had was the debugfs dir name, which is a
> string from dev_name() and will be the same for the next registration
> if module is re-loaded.

OK, so that needs to be explained in a comment.

> So the 'name' is reused and debugfs_create_dir()
> and debugfs_remove_recursive() uses this fsnotify, but they are
> operating under inode_lock/unlock() on the parent dir 'energy_model'.
> Then there is also

Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model

2020-06-03 Thread Rafael J. Wysocki

On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba  wrote:
>
>
>
> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> > On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba  wrote:
> >>
> >> Hi Daniel,
> >>
> >> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> >>> On 27/05/2020 11:58, Lukasz Luba wrote:
> >>>> Add support for other devices than CPUs. The registration function
> >>>> does not require a valid cpumask pointer and is ready to handle new
> >>>> devices. Some of the internal structures has been reorganized in order to
> >>>> keep consistent view (like removing per_cpu pd pointers).
> >>>>
> >>>> Signed-off-by: Lukasz Luba 
> >>>> ---
> >>>
> >>> [ ... ]
> >>>
> >>>>}
> >>>>EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >>>> +
> >>>> +/**
> >>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a 
> >>>> device
> >>>> + * @dev : Device for which the EM is registered
> >>>> + *
> >>>> + * Try to unregister the EM for the specified device (but not a CPU).
> >>>> + */
> >>>> +void em_dev_unregister_perf_domain(struct device *dev)
> >>>> +{
> >>>> +if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >>>> +return;
> >>>> +
> >>>> +if (_is_cpu_device(dev))
> >>>> +return;
> >>>> +
> >>>> +mutex_lock(&em_pd_mutex);
> >>>
> >>> Is the mutex really needed?
> >>
> >> I just wanted to align this unregister code with register. Since there
> >> is debugfs dir lookup and the device's EM existence checks I thought it
> >> wouldn't harm just to lock for a while and make sure the registration
> >> path is not used. These two paths shouldn't affect each other, but with
> >> modules loading/unloading I wanted to play safe.
> >>
> >> I can change it maybe to just dmb() and the end of the function if it's
> >> a big performance problem in this unloading path. What do you think?
> >
> > I would rather leave the mutex locking as is.
> >
> > However, the question to ask is what exactly may go wrong without that
> > locking in place?  Is there any specific race condition that you are
> > concerned about?
> >
>
> I tried to test this with module loading & unloading with panfrost
> driver and CPU hotplug (which should bail out quickly) and was OK.
> I don't see any particular race. I don't know too much about the
> debugfs code, though. That's why I tried to protect from some
> scripts/services which try to re-load the driver.
>
> Apart from that, maybe just this dev->em = NULL to be populated to all
> CPUs, which mutex_unlock synchronizes for free here.

If it may run concurrently with the registration for the same device,
the locking is necessary, but in that case the !dev->em_pd check needs
to go under the mutex too IMO, or you may end up leaking the pd if the
registration can run between that check and the point at which the
mutex is taken.

Apart from this your kerneldoc comments might be improved IMO.

First of all, you can use @dev inside of a kerneldoc (if @dev
represents an argument of the documented function) and that will
produce the right output automatically.

Second, it is better to avoid saying things like "Try to unregister
..." in kerneldoc comments (the "Try to" part is redundant).  Simply
say "Unregister ..." instead.
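
Both points can be illustrated with a small userspace sketch: the existence check moved under the mutex, and a kerneldoc comment that refers to @dev and starts with "Unregister". The assumptions are that all sketch_* names and types are hypothetical, a pthread mutex stands in for the kernel mutex, and the return value exists only so the behaviour can be observed.

```c
#include <pthread.h>
#include <stdlib.h>

struct sketch_perf_domain {
	int nr_states;
};

struct sketch_device {
	struct sketch_perf_domain *em_pd;
};

static pthread_mutex_t sketch_em_pd_mutex = PTHREAD_MUTEX_INITIALIZER;

/**
 * sketch_unregister_perf_domain() - Unregister the Energy Model of a device
 * @dev: device for which the EM is registered
 *
 * Free the performance domain attached to @dev, if there is one.
 *
 * Return: 1 if a domain was released, 0 otherwise.
 */
static int sketch_unregister_perf_domain(struct sketch_device *dev)
{
	int released = 0;

	if (!dev)
		return 0;

	pthread_mutex_lock(&sketch_em_pd_mutex);
	/*
	 * Checking em_pd only after taking the mutex means a concurrent
	 * registration cannot run between the check and the cleanup.
	 */
	if (dev->em_pd) {
		free(dev->em_pd);
		dev->em_pd = NULL;
		released = 1;
	}
	pthread_mutex_unlock(&sketch_em_pd_mutex);

	return released;
}
```

If registration and unregistration can never run concurrently for the same device, the unlocked early check is harmless, and a comment saying so would do instead.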

Thanks!

Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model

2020-06-03 Thread Rafael J. Wysocki

On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba  wrote:
>
> Hi Daniel,
>
> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> > On 27/05/2020 11:58, Lukasz Luba wrote:
> >> Add support for other devices than CPUs. The registration function
> >> does not require a valid cpumask pointer and is ready to handle new
> >> devices. Some of the internal structures has been reorganized in order to
> >> keep consistent view (like removing per_cpu pd pointers).
> >>
> >> Signed-off-by: Lukasz Luba 
> >> ---
> >
> > [ ... ]
> >
> >>   }
> >>   EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >> +
> >> +/**
> >> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a 
> >> device
> >> + * @dev : Device for which the EM is registered
> >> + *
> >> + * Try to unregister the EM for the specified device (but not a CPU).
> >> + */
> >> +void em_dev_unregister_perf_domain(struct device *dev)
> >> +{
> >> +if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >> +return;
> >> +
> >> +if (_is_cpu_device(dev))
> >> +return;
> >> +
> >> +mutex_lock(&em_pd_mutex);
> >
> > Is the mutex really needed?
>
> I just wanted to align this unregister code with register. Since there
> is debugfs dir lookup and the device's EM existence checks I thought it
> wouldn't harm just to lock for a while and make sure the registration
> path is not used. These two paths shouldn't affect each other, but with
> modules loading/unloading I wanted to play safe.
>
> I can change it maybe to just dmb() and the end of the function if it's
> a big performance problem in this unloading path. What do you think?

I would rather leave the mutex locking as is.

However, the question to ask is what exactly may go wrong without that
locking in place?  Is there any specific race condition that you are
concerned about?

Re: [PATCH v8 0/8] Add support for devices in the Energy Model

2020-05-29 Thread Rafael J. Wysocki

On Fri, May 29, 2020 at 5:01 PM Lukasz Luba  wrote:
>
> Hi Rafael,
>
>
> On 5/27/20 10:58 AM, Lukasz Luba wrote:
> > Hi all,
> >
> > Background of this version:
> > This is the v8 of the patch set and it has a smaller scope. I had to split
> > the series into two: EM changes and thermal changes due to devfreq
> > dependencies. The patches from v7 9-14 which change devfreq cooling are
> > going to be sent in separate patch series, just after this set get merged
> > into mainline. These patches related to EM got acks and hopefully can go
> > through linux-pm tree. The later thermal patches will go through thermal
> > tree.
> >
> > The idea and purpose of the Energy Model framework changes:
> > This patch set introduces support for devices in the Energy Model (EM)
> > framework. It will unify the power model for thermal subsystem. It will
> > make simpler to add support for new devices willing to use more
> > advanced features (like Intelligent Power Allocation). Now it should
> > require less knowledge and effort for driver developer to add e.g.
> > GPU driver with simple energy model. A more sophisticated energy model
> > in the thermal framework is also possible, driver needs to provide
> > a dedicated callback function. More information can be found in the
> > updated documentation file.
> >
> > First 7 patches are refactoring Energy Model framework to add support
> > of other devices than CPUs. They change:
> > - naming convention from 'capacity' to 'performance' state,
> > - API arguments adding device pointer and not rely only on cpumask,
> > - change naming when 'cpu' was used, now it's a 'device'
> > - internal structure to maintain registered devices
> > - update users to the new API
> > Patch 8 updates OPP framework helper function to be more generic, not
> > CPU specific.
> >
> > The patch set is based on linux-pm branch linux-next 813946019dfd.
> >
>
> Could you take the patch set via your linux-pm?

I can do that, but I didn't realize that it was targeted at me, so I
need some more time to review the patches.

Thanks!

Re: [PATCH 10/11] kernel/power: constify sysrq_key_op

2020-05-14 Thread Rafael J. Wysocki

On Wed, May 13, 2020 at 11:46 PM Emil Velikov  wrote:
>
> With earlier commits, the API no longer discards the const-ness of the
> sysrq_key_op. As such we can add the notation.
>
> Cc: Greg Kroah-Hartman 
> Cc: Jiri Slaby 
> Cc: linux-ker...@vger.kernel.org
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: linux...@vger.kernel.org
> Signed-off-by: Emil Velikov 

Acked-by: Rafael J. Wysocki 

and I'm assuming that this is going to be applied along with the rest
of the series.

> ---
> Please keep me in the CC list, as I'm not subscribed to the list.
>
> IMHO it would be better if this gets merged via the tty tree.
> ---
>  kernel/power/poweroff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/power/poweroff.c b/kernel/power/poweroff.c
> index 6d475281c730..562aa0e450ed 100644
> --- a/kernel/power/poweroff.c
> +++ b/kernel/power/poweroff.c
> @@ -29,7 +29,7 @@ static void handle_poweroff(int key)
> schedule_work_on(cpumask_first(cpu_online_mask), &poweroff_work);
>  }
>
> -static struct sysrq_key_op sysrq_poweroff_op = {
> +static const struct sysrq_key_op   sysrq_poweroff_op = {
> .handler= handle_poweroff,
> .help_msg   = "poweroff(o)",
> .action_msg = "Power Off",
> --
> 2.25.1
>

Re: linux-next: manual merge of the amdgpu tree with the pm tree

2020-05-08 Thread Rafael J. Wysocki

On Friday, May 8, 2020 6:34:57 AM CEST Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the amdgpu tree got a conflict in:
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> 
> between commit:
> 
>   e07515563d01 ("PM: sleep: core: Rename DPM_FLAG_NEVER_SKIP")
> 
> from the pm tree and commit:
> 
>   500bd19a7e5d ("drm/amdgpu: only set DPM_FLAG_NEVER_SKIP for legacy ATPX BOCO")
> 
> from the amdgpu tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks for resolving this, the resolution looks good to me.

Cheers!




[PATCH v2 7/9] PM: sleep: core: Rename DPM_FLAG_NEVER_SKIP

2020-04-18 Thread Rafael J. Wysocki

From: "Rafael J. Wysocki" 

Rename DPM_FLAG_NEVER_SKIP to DPM_FLAG_NO_DIRECT_COMPLETE which
matches its purpose more closely.

No functional impact.

Signed-off-by: Rafael J. Wysocki 
Acked-by: Bjorn Helgaas  # for PCI parts
Acked-by: Jeff Kirsher 
---

-> v2:
   * Rebased.
   * Added tags received so far.

---
 Documentation/driver-api/pm/devices.rst|  6 +++---
 Documentation/power/pci.rst| 10 +-
 drivers/base/power/main.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c|  2 +-
 drivers/gpu/drm/radeon/radeon_kms.c|  2 +-
 drivers/misc/mei/pci-me.c  |  2 +-
 drivers/misc/mei/pci-txe.c |  2 +-
 drivers/net/ethernet/intel/e1000e/netdev.c |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c  |  2 +-
 drivers/net/ethernet/intel/igc/igc_main.c  |  2 +-
 drivers/pci/pcie/portdrv_pci.c |  2 +-
 include/linux/pm.h |  6 +++---
 13 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index f66c7b9126ea..4ace0eba4506 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -361,9 +361,9 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
runtime PM disabled.
 
This feature also can be controlled by device drivers by using the
-   ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
-   management flags.  [Typically, they are set at the time the driver is
-   probed against the device in question by passing them to the
+   ``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
+   power management flags.  [Typically, they are set at the time the driver
+   is probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index aa1c7fce6cd0..9e1408121bea 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -1004,11 +1004,11 @@ including the PCI bus type.  The flags should be set once at the driver probe
 time with the help of the dev_pm_set_driver_flags() function and they should not
 be updated directly afterwards.
 
-The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
-mechanism allowing device suspend/resume callbacks to be skipped if the device
-is in runtime suspend when the system suspend starts.  That also affects all of
-the ancestors of the device, so this flag should only be used if absolutely
-necessary.
+The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
+direct-complete mechanism allowing device suspend/resume callbacks to be skipped
+if the device is in runtime suspend when the system suspend starts.  That also
+affects all of the ancestors of the device, so this flag should only be used if
+absolutely necessary.
 
 The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
 positive value from pci_pm_prepare() if the ->prepare callback provided by the
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 3170d93e29f9..dbc1e5e7346b 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1844,7 +1844,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
spin_lock_irq(&dev->power.lock);
dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
(ret > 0 || dev->power.no_pm_callbacks) &&
-   !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
+   !dev_pm_test_driver_flags(dev, DPM_FLAG_NO_DIRECT_COMPLETE);
spin_unlock_irq(&dev->power.lock);
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index fd1dc3236eca..a9086ea1ab60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -191,7 +191,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
}
 
if (adev->runpm) {
-   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
+   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index ad719c9602af..9cb2d7548daa 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b

[PATCH 5/7] PM: sleep: core: Rename DPM_FLAG_NEVER_SKIP

2020-04-10 Thread Rafael J. Wysocki

From: "Rafael J. Wysocki" 

Rename DPM_FLAG_NEVER_SKIP to DPM_FLAG_NO_DIRECT_COMPLETE which
matches its purpose more closely.

No functional impact.

Signed-off-by: Rafael J. Wysocki 
---
 Documentation/driver-api/pm/devices.rst|  6 +++---
 Documentation/power/pci.rst| 10 +-
 drivers/base/power/main.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c|  2 +-
 drivers/gpu/drm/radeon/radeon_kms.c|  2 +-
 drivers/misc/mei/pci-me.c  |  2 +-
 drivers/misc/mei/pci-txe.c |  2 +-
 drivers/net/ethernet/intel/e1000e/netdev.c |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c  |  2 +-
 drivers/net/ethernet/intel/igc/igc_main.c  |  2 +-
 drivers/pci/pcie/portdrv_pci.c |  2 +-
 include/linux/pm.h |  6 +++---
 13 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index f66c7b9126ea..4ace0eba4506 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -361,9 +361,9 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
runtime PM disabled.
 
This feature also can be controlled by device drivers by using the
-   ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
-   management flags.  [Typically, they are set at the time the driver is
-   probed against the device in question by passing them to the
+   ``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
+   power management flags.  [Typically, they are set at the time the driver
+   is probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index aa1c7fce6cd0..9e1408121bea 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -1004,11 +1004,11 @@ including the PCI bus type.  The flags should be set once at the driver probe
 time with the help of the dev_pm_set_driver_flags() function and they should not
 be updated directly afterwards.
 
-The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
-mechanism allowing device suspend/resume callbacks to be skipped if the device
-is in runtime suspend when the system suspend starts.  That also affects all of
-the ancestors of the device, so this flag should only be used if absolutely
-necessary.
+The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
+direct-complete mechanism allowing device suspend/resume callbacks to be skipped
+if the device is in runtime suspend when the system suspend starts.  That also
+affects all of the ancestors of the device, so this flag should only be used if
+absolutely necessary.
 
 The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
 positive value from pci_pm_prepare() if the ->prepare callback provided by the
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 21187ee37b22..aa9c8df9fc4b 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1850,7 +1850,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
spin_lock_irq(&dev->power.lock);
dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
(ret > 0 || dev->power.no_pm_callbacks) &&
-   !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
+   !dev_pm_test_driver_flags(dev, DPM_FLAG_NO_DIRECT_COMPLETE);
spin_unlock_irq(&dev->power.lock);
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index fd1dc3236eca..a9086ea1ab60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -191,7 +191,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
}
 
if (adev->runpm) {
-   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
+   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index ad719c9602af..9cb2d7548daa 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -549,7 +549,7 @@ void intel_runtime_pm_enable(struct intel_runtime_pm *rpm)
 * becaue the HDA dri

Re: [PATCH v2 2/2] drm/tegra: Do not implement runtime PM

2019-12-12 Thread Rafael J. Wysocki

On Thu, Dec 12, 2019 at 2:32 PM Ulf Hansson  wrote:
>
> On Thu, 12 Dec 2019 at 13:33, Thierry Reding  wrote:
> >
> > On Thu, Dec 12, 2019 at 09:52:22AM +0100, Ulf Hansson wrote:
> > > On Mon, 9 Dec 2019 at 14:03, Thierry Reding  
> > > wrote:
> > > >
> > > > From: Thierry Reding 
> > > >
> > > > The Tegra DRM driver heavily relies on the implementations for runtime
> > > > suspend/resume to be called at specific times. Unfortunately, there are
> > > > some cases where that doesn't work. One example is if the user disables
> > > > runtime PM for a given subdevice. Another example is that the PM core
> > > > acquires a reference to runtime PM during system sleep, effectively
> > > > preventing devices from going into low power modes. This is intentional
> > > > to avoid nasty race conditions, but it also causes system sleep to not
> > > > function properly on all Tegra systems.
> > >
> > > Are the problems you refer to above, solely for system suspend/resume?
> >
> > No, this patch also fixes potential issues with regular operation of the
> > display driver. The problem is that parts of the driver rely on being
> > able to shut down the hardware during runtime operations, such as
> > disabling an output. Under some circumstances part of this shutdown will
> > imply a reset and, at least on some platforms, we rely on that reset to
> > put the device into a known good state.
> >
> > So if a user decides to prevent the device from runtime suspending, we
> > can potentially run into a situation where we can't properly set a
> > display mode at runtime since we weren't allowed to reset the device.
>
> Thanks for clarifying!
>
> We have very similar issues for SDIO functional drivers (WiFi
> drivers). Typically, at some point there needs to be a guarantee that
> the power has been cut in between a "put" and "get", as to be able to
> re-program a FW.
>
> My worry in regards to this, is that we may reinvent the wheel over
> and over again, just because runtime PM today isn't a good fit.
>
> In principle, if you could, somehow forbid user-space from preventing
> the device from being runtime suspended, that would do the trick,
> wouldn't it?

Treating pm_runtime_suspend() and pm_runtime_resume() as the low-level
device power off and power on routines for the given platform is a
mistake.  It has always been a mistake and I'm not going to accept
changes trying to make it look like it isn't a mistake.

If any generic power off and power on helpers for DT-based platforms
are needed, add them and make PM-runtime use them.  That should be
straightforward enough.

Thanks!

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-12-09 Thread Rafael J. Wysocki

On Mon, Dec 9, 2019 at 12:17 PM Karol Herbst  wrote:
>
> anybody any other ideas?

Not yet, but I'm trying to collect some more information.

> It seems that both patches don't really fix
> the issue and I have no idea left on my side to try out. The only
> thing left I could do to further investigate would be to reverse
> engineer the Nvidia driver as they support runpm on Turing+ GPUs now,
> but I've heard users having similar issues to the one Lyude told us
> about... and I couldn't verify that the patches help there either in a
> reliable way.

It looks like the newer (8+) versions of Windows expect the GPU driver
to prepare the GPU for power removal in some specific way and the
latter fails if the GPU has not been prepared as expected.

Because testing indicates that the Windows 7 path in the platform
firmware works, it may be worth trying to do what it does to the PCIe
link before invoking the _OFF method for the power resource
controlling the GPU power.

If Mika's theory that the Win7 path simply turns the PCIe link off
is correct, then whatever the _OFF method tries to do to the link
after that should not matter.

> On Wed, Nov 27, 2019 at 8:55 PM Lyude Paul  wrote:
> >
> > On Wed, 2019-11-27 at 12:51 +0100, Karol Herbst wrote:
> > > On Wed, Nov 27, 2019 at 12:49 PM Mika Westerberg
> > >  wrote:
> > > > On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> > > > > Hey-this is almost certainly not the right place in this thread to
> > > > > respond,
> > > > > but this thread has gotten so deep evolution can't push the subject
> > > > > further to
> > > > > the right, heh. So I'll just respond here.
> > > >
> > > > :)
> > > >
> > > > > I've been following this and helping out Karol with testing here and
> > > > > there.
> > > > > They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation,
> > > > > which
> > > > > has a turing GPU and 8086:1901 PCI bridge.
> > > > >
> > > > > I was about to say "the patch fixed things, hooray!" but it seems that
> > > > > after
> > > > > trying runtime suspend/resume a couple times things fall apart again:
> > > >
> > > > You mean $subject patch, no?
> > > >
> > >
> > > no, I told Lyude to test the pci/pm branch as the runpm errors we saw
> > > on that machine looked different. Some BAR error the GPU reported
> > > after it got resumed, so I was wondering if the delays were helping
> > > with that. But after some cycles it still caused the same issue, that
> > > the GPU disappeared. Later testing also showed that my patch also
> > > didn't seem to help with this error sadly :/
> > >
> > > > > > [  686.883247] nouveau 0000:01:00.0: DRM: suspending object tree...
> > > > > > [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > > > > [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > > > > [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> > > >
> > > > This is probably the culprit. The same AML code fails to properly turn
> > > > on the device.
> > > >
> > > > Is acpidump from this system available somewhere?
> >
> > Attached it to this email
> >
> > > >
> > --
> > Cheers,
> > Lyude Paul
>

Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-12-03 Thread Rafael J. Wysocki

On Fri, Nov 29, 2019 at 1:07 PM Thierry Reding  wrote:
>
> On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
> >

[cut]

Sorry for the delay.

First off, let me note that I have seen your most recent patches and
thanks for taking the feedback into account, much appreciated!

Nevertheless, I feel that I need to address the below, because it is
really important.

> > Preventing a device from suspending should never be a functional
> > problem.  It may be an energy-efficiency problem, but that's something
> > for user space to consider before writing "on" to a device's control
> > file.
>
> That's really a question of how you define suspension.

In general, yes.

However, if you talk about PM-runtime, there are definitions of
"suspended" and "active" in there already.  Namely, in the PM-runtime
context, "suspended" means "may not be accessible to software" whereas
"active" means "software can access it".

> In the case of
> display drivers we have the somewhat unfortunate situation that in most
> SoCs the display "device" is actually represented by a collection of
> different devices. On Tegra specifically, for example, you have a couple
> of display controllers, then some "encoders" that take pixel streams
> from the display controllers and encode them into some wire format like
> LVDS, HDMI, DSI or DP.
>
> Prohibiting suspension of any of the individual devices causes problems
> because it effectively makes the whole composite display device not
> suspendable.

For PM-runtime, that shouldn't be a problem at all.

PM-runtime is all about (possibly) saving energy by powering down
devices that are not in use.  In particular, it is not about powering
down any devices on demand for any reason other than idleness.
Therefore in PM-runtime a situation in which a given device cannot be
suspended at all is regarded as normal, even though that may not be
desirable for energy-efficiency reasons.  It just means that the
device is in use by somebody all the time.  Moreover, PM-runtime is
designed to make it possible to resume devices at any time (as long as
the hardware works as expected), as soon as they are needed, modulo
some possible delays.  Actually, that's the purpose of a significant
part of the PM-runtime framework.

Accordingly, device drivers may refuse to suspend devices, but
refusing to resume a device is not expected by PM-runtime.

If writing "on" to the "control" file of a device does not cause it to
be resumed (if suspended) and to stay in the "active" meta-state until
"auto" is written to that file, you cannot really claim that
PM-runtime is working correctly on your system.
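
[For reference, a minimal user-space sketch of what writing "on" or
"auto" to a device's "control" file looks like.  It is not part of the
original message.  The helper name set_runtime_pm_control() and the
device directory passed to it are assumptions made for illustration;
only the power/control attribute and its two values come from the
discussion above.]

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "on" or "auto" to <dev_dir>/power/control.
 *
 * dev_dir is a device directory under /sys/devices chosen by the caller
 * (hypothetical here).  Returns 0 on success or a negative errno value.
 */
static int set_runtime_pm_control(const char *dev_dir, const char *mode)
{
	char path[4096];
	FILE *f;
	int n;

	/* Only the two values discussed in the thread are accepted here. */
	if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
		return -EINVAL;

	n = snprintf(path, sizeof(path), "%s/power/control", dev_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fputs(mode, f) == EOF) {
		int err = errno;

		fclose(f);
		return err ? -err : -EIO;
	}

	/* The write may only reach the kernel when the stream is flushed. */
	if (fclose(f) != 0)
		return errno ? -errno : -EIO;

	return 0;
}
```

[Calling set_runtime_pm_control(dir, "on") is expected to keep the
device in the "active" meta-state until a later call passes "auto".]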

> Doing so in turn usually means that you can't change the
> display configuration anymore because devices need to be powered up and
> down in order to change the configuration.
>
> I consider powering up and down the devices a form of suspension. Hence
> it seemed natural to implement using runtime PM.

Unfortunately, that's not the case.

The purpose of PM-runtime is to allow idle devices to be put into
power states in which it may not be safe to access them and to make
them go back into the "accessible and responsive" state whenever
software wants/needs to access them in a coordinated fashion.  IOW, it
kind of is a counterpart of CPU idle time management.

> It sounds to me like userspace preventing runtime PM is problematic in
> most scenarios that involve composite devices because it makes all of
> the interactions between the devices a bit complicated.

Even so, that's how it works.

User space can expect to be able to block runtime suspend of devices
at any level of device hierarchy, at least for diagnostics if nothing
else, and the kernel is responsible for ensuring that.

> > > but I would end up reimplementing some of the same concepts. I'd
> > > rather use something that's supported by the PM core and that might be
> > > useful to other drivers than reinvent the wheel.
> >
> > Which doesn't have to be by using PM-runtime suspend for the handling
> > of system-wide suspend, at least in my view.
>
> Well, runtime PM is very convenient for this, though. It would allow the
> same code paths to be used in all cases.

The same low-level power-up and power-down code can be used in all
cases, but PM-runtime is not low-level enough.  It is also
opportunistic, so if you need to power down a device for reasons other
than "natural" idleness, PM-runtime is not the right tool for that
task.

Of course, PM-runtime callbacks can invoke the low-level power-up and
power-down code, but as you said there are reasons for powering down
devices not just because they happen to be idle.  System-wide suspend
is one of them.
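
[The layering described above can be sketched as a toy user-space
model.  It is not kernel code and not part of the original message:
struct toy_dev and every toy_* function are made-up names.  The point
is only that the opportunistic PM-runtime path and the system-wide
suspend path both call the same low-level power-off and power-on
routines, while only the former can be vetoed.]

```c
#include <stdbool.h>

/* Toy device: not a kernel structure, all fields are illustrative. */
struct toy_dev {
	bool powered;         /* state of the imaginary hardware */
	bool runtime_blocked; /* user space wrote "on" to power/control */
	int usage_count;      /* outstanding runtime-PM references */
};

/* Low-level routines: unconditionally switch the hardware. */
static void toy_power_off(struct toy_dev *d) { d->powered = false; }
static void toy_power_on(struct toy_dev *d) { d->powered = true; }

/* Opportunistic path: only powers down an idle, unblocked device. */
static bool toy_runtime_suspend(struct toy_dev *d)
{
	if (d->runtime_blocked || d->usage_count > 0)
		return false;

	toy_power_off(d);
	return true;
}

/* Resume is never refused in this model. */
static void toy_runtime_resume(struct toy_dev *d)
{
	if (!d->powered)
		toy_power_on(d);
}

/* System-wide suspend: must work regardless of runtime-PM settings. */
static void toy_system_suspend(struct toy_dev *d)
{
	if (d->powered)
		toy_power_off(d);
}

static void toy_system_resume(struct toy_dev *d)
{
	toy_power_on(d);
}
```

[With runtime_blocked set, toy_runtime_suspend() leaves the device
powered, yet toy_system_suspend() still powers it down through the
same low-level helper.]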

Re: [PATCH 1/3] ACPI / LPSS: Rename pwm_backlight pwm-lookup to pwm_soc_backlight

2019-11-29 Thread Rafael J. Wysocki

On Tuesday, November 19, 2019 4:18:16 PM CET Hans de Goede wrote:
> At least Bay Trail (BYT) and Cherry Trail (CHT) devices can use 1 of 2
> different PWM controllers for controlling the LCD's backlight brightness.
> Either the one integrated into the PMIC or the one integrated into the
> SoC (the 1st LPSS PWM controller).
> 
> So far in the LPSS code on BYT we have skipped registering the LPSS PWM
> controller "pwm_backlight" lookup entry when a Crystal Cove PMIC is
> present, assuming that in this case the PMIC PWM controller will be used.
> 
> On CHT we have been relying on only 1 of the 2 PWM controllers being
> enabled in the DSDT at the same time; and always registered the lookup.
> 
> So far this has been working, but the correct way to determine which PWM
> controller needs to be used is by checking a bit in the VBT table and
> recently I've learned about 2 different BYT devices:
> Point of View MOBII TAB-P800W
> Acer Switch 10 SW5-012
> 
> Which use a Crystal Cove PMIC, yet the LCD is connected to the SoC/LPSS
> PWM controller (and the VBT correctly indicates this), so here our old
> heuristics fail.
> 
> Since only the i915 driver has access to the VBT, this commit renames
> the "pwm_backlight" lookup entries for the 1st BYT/CHT LPSS PWM controller
> to "pwm_soc_backlight" so that the i915 driver can do a pwm_get() for
> the right controller depending on the VBT bit, instead of the i915 driver
> relying on a "pwm_backlight" lookup getting registered which magically
> points to the right controller.
> 
> Signed-off-by: Hans de Goede 

Acked-by: Rafael J. Wysocki 

Or please let me know if you want me to take the whole series.

Thanks!

> ---
>  drivers/acpi/acpi_lpss.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
> index 751ed38f2a10..63e81d8e675b 100644
> --- a/drivers/acpi/acpi_lpss.c
> +++ b/drivers/acpi/acpi_lpss.c
> @@ -69,10 +69,6 @@ ACPI_MODULE_NAME("acpi_lpss");
>  #define LPSS_SAVE_CTX	BIT(4)
>  #define LPSS_NO_D3_DELAY BIT(5)
>  
> -/* Crystal Cove PMIC shares same ACPI ID between different platforms */
> -#define BYT_CRC_HRV  2
> -#define CHT_CRC_HRV  3
> -
>  struct lpss_private_data;
>  
>  struct lpss_device_desc {
> @@ -158,7 +154,7 @@ static void lpss_deassert_reset(struct lpss_private_data *pdata)
>   */
>  static struct pwm_lookup byt_pwm_lookup[] = {
>   PWM_LOOKUP_WITH_MODULE("80860F09:00", 0, "0000:00:02.0",
> -"pwm_backlight", 0, PWM_POLARITY_NORMAL,
> +"pwm_soc_backlight", 0, PWM_POLARITY_NORMAL,
>  "pwm-lpss-platform"),
>  };
>  
> @@ -170,8 +166,7 @@ static void byt_pwm_setup(struct lpss_private_data *pdata)
>   if (!adev->pnp.unique_id || strcmp(adev->pnp.unique_id, "1"))
>   return;
>  
> - if (!acpi_dev_present("INT33FD", NULL, BYT_CRC_HRV))
> - pwm_add_table(byt_pwm_lookup, ARRAY_SIZE(byt_pwm_lookup));
> + pwm_add_table(byt_pwm_lookup, ARRAY_SIZE(byt_pwm_lookup));
>  }
>  
>  #define LPSS_I2C_ENABLE  0x6c
> @@ -204,7 +199,7 @@ static void byt_i2c_setup(struct lpss_private_data *pdata)
>  /* BSW PWM used for backlight control by the i915 driver */
>  static struct pwm_lookup bsw_pwm_lookup[] = {
>   PWM_LOOKUP_WITH_MODULE("80862288:00", 0, "0000:00:02.0",
> -"pwm_backlight", 0, PWM_POLARITY_NORMAL,
> +"pwm_soc_backlight", 0, PWM_POLARITY_NORMAL,
>  "pwm-lpss-platform"),
>  };
>  
> 





Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-11-29 Thread Rafael J. Wysocki

On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding
 wrote:
>
> On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > >
> > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding wrote:
> > > > > >
> > > > > > From: Thierry Reding 
> > > > > >
> > > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > > reference for devices before system sleep is entered. This is needed
> > > > > > to avoid potential issues related to devices' parents getting put to
> > > > > > runtime suspend at the wrong time and causing problems with their
> > > > > > children.
> > > > >=20
> > > > > Not only for that.
> > > > >=20
> > > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > > regularly during system sleep.
> > > > >=20
> > > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > > it can't be made.
> > > >
> > > > Anything in particular that I can look at? I'm not seeing any issues
> > > > when I test this, which could of course mean that I'm just getting
> > > > lucky.
> > >
> > > There are races and such that you may never hit during casual testing.
> > >
> > > > One thing that irritated me is that I think this used to work. I do
> > > > recall testing suspend/resume a few years ago and devices would get
> > > > properly runtime suspended/resumed.
> > >
> > > Not true at all.
> > >
> > > The PM core has always taken PM-runtime references on all devices pretty 
> > > much
> > > since when PM-runtime was introduced.
> > >
> > > > I did some digging but couldn't
> > > > find anything that would have had an impact on this.
> > > >
> > > > Given that this is completely opt-in feature, why are you categorically
> > > > NAK'ing this?
> > >
> > > The general problem is that if any device has been touched by system-wide
> > > suspend code, it should not be subject to PM-runtime any more until the
> > > subsequent system-wide resume is able to undo whatever the suspend did.
> > >
> > > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > > may mishandle it, in general.  That's why PM-runtime suspend is not 
> > > allowed
> > > during system-wide transitions at all.  And it has always been like that.
> > >
> > > For a specific platform you may be able to overcome these limitations if
> > > you are careful enough, but certainly they are there in general and surely
> > > you cannot prevent people from using your opt-in just because they think
> > > that they know what they are doing.
> >
> > BTW, what if user space prevents PM-runtime from suspending devices by 
> > writing
> > "on" to their "control" files?
> >
> > System-wide suspend is (of course) still expected to work in that case, so 
> > how
> > exactly would you overcome that?
>
> I suppose one way to overcome that would be to make it an error to write
> "on" to the "control" files for these devices.

Seeing suggestions like this in messages from seasoned kernel
developers is seriously disappointing. :-/

> Currently doing this is likely going to break display support on Tegra,
> so this would be a good idea in this case anyway.

PM-runtime has always allowed user space to prevent devices from being
suspended and it seems that this has not been taken into account by
Tegra display support developers at all.

> Again, I could avoid all of these issues by avoiding runtime PM in this 
> driver,

I don't quite see the connection here.

Preventing a device from suspending should never be a functional
problem.  It may be an energy-efficiency problem, but that's something
for user space to consider before writing "on" to a device's control
file.

> but I would end up reimplementing some of the same concepts. I'd
> rather use something that's supported by the PM core and that might be
> useful to other drivers than reinvent the wheel.

Which doesn't have to be by using PM-runtime suspend for the handling
of system-wide suspend, at least in my view.

Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-11-29 Thread Rafael J. Wysocki

On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding
 wrote:
>
> On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > >
> > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding wrote:
> > > > >
> > > > > From: Thierry Reding 
> > > > >
> > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > reference for devices before system sleep is entered. This is needed
> > > > > to avoid potential issues related to devices' parents getting put to
> > > > > runtime suspend at the wrong time and causing problems with their
> > > > > children.
> > > >=20
> > > > Not only for that.
> > > >=20
> > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > regularly during system sleep.
> > > >=20
> > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > it can't be made.
> > >
> > > Anything in particular that I can look at? I'm not seeing any issues
> > > when I test this, which could of course mean that I'm just getting
> > > lucky.
> >
> > There are races and such that you may never hit during casual testing.
> >
> > > One thing that irritated me is that I think this used to work. I do
> > > recall testing suspend/resume a few years ago and devices would get
> > > properly runtime suspended/resumed.
> >
> > Not true at all.
> >
> > The PM core has always taken PM-runtime references on all devices pretty 
> > much
> > since when PM-runtime was introduced.
>
> You're right. I was finally able to find a toolchain that I could build
> an old version of the kernel with. I tested system suspend/resume on the
> v4.8 release, which is the first one that had the runtime PM changes as
> well as the subsystem suspend/resume support wired up, and I can't see
> the runtime PM callbacks invoked during system suspend/resume.
>
> So I must be misremembering, or I'm confusing it with some other tests I
> was running at the time.
>
> > > I did some digging but couldn't
> > > find anything that would have had an impact on this.
> > >
> > > Given that this is completely opt-in feature, why are you categorically
> > > NAK'ing this?
> >
> > The general problem is that if any device has been touched by system-wide
> > suspend code, it should not be subject to PM-runtime any more until the
> > subsequent system-wide resume is able to undo whatever the suspend did.
> >
> > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > during system-wide transitions at all.  And it has always been like that.
>
> For this particular use-case the above should all be irrelevant. None of
> the drivers involved here do anything special at system suspend, because
> runtime suspend already puts the devices into the lowest possible power
> state. Basically when these devices are put into runtime suspend, they
> are completely turned off. The only exception is for things like HDMI
> where the +5V pin remains powered, so that hotplug detection will work.
>
> The runtime PM state of the devices involved is managed by the subsystem
> system suspend/resume helpers in DRM/KMS. Basically those helpers turn
> off all the devices in the composite device, which ultimately results in
> their last runtime PM reference being released. So for system suspend
> and resume, these devices aren't touched, other than maybe for the PM
> core's internal book-keeping.

OK, so you actually want system-wide PM to work like PM-runtime on the
platform in question, but there are substantial differences.

First, PM-runtime suspend can be effectively disabled by user space
and system-wide suspend is always expected to work.

Second, if system wakeup devices are involved, their handling during
system-wide suspend depends on the return value of device_may_wakeup()
which depends on what user space does, whereas PM-runtime assumes
device wakeup to be always enabled.
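
To put the two differences above in concrete terms, here is a small
user-space C model.  It is only a sketch and not kernel code: struct
toy_dev and the toy_*() helpers are hypothetical names made up for this
illustration.  The only things taken from the discussion are the "on"
setting user space can write to the "control" file and the role of
device_may_wakeup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-device state; all field names are hypothetical. */
struct toy_dev {
	bool control_on;     /* user space wrote "on" to the "control" file */
	bool wakeup_capable; /* the device can signal wakeup at all */
	bool may_wakeup;     /* what device_may_wakeup() would report */
};

/* PM-runtime suspend can be blocked by user space. */
bool toy_runtime_suspend_allowed(const struct toy_dev *d)
{
	assert(d != NULL);
	return !d->control_on;
}

/* System-wide suspend is expected to work regardless of "control". */
bool toy_system_suspend_must_work(const struct toy_dev *d)
{
	(void)d;
	return true;
}

/* PM-runtime treats wakeup as always enabled for capable devices. */
bool toy_wakeup_armed_runtime(const struct toy_dev *d)
{
	return d->wakeup_capable;
}

/* System-wide suspend follows the user-space wakeup setting. */
bool toy_wakeup_armed_system(const struct toy_dev *d)
{
	return d->wakeup_capable && d->may_wakeup;
}
```

With control_on set and may_wakeup clear, the model shows a device that
PM-runtime may not suspend but that system-wide suspend still has to
handle, and whose wakeup setting differs between the two cases.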

> > F

Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-11-28 Thread Rafael J. Wysocki

On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > 
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding wrote:
> > > >
> > > > From: Thierry Reding 
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >=20
> > > Not only for that.
> > >=20
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >=20
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> > 
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
> 
> There are races and such that you may never hit during casual testing.
> 
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
> 
> Not true at all.
> 
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.
> 
> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> > 
> > Given that this is completely opt-in feature, why are you categorically
> > NAK'ing this?
> 
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
> 
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> during system-wide transitions at all.  And it has always been like that.
> 
> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

BTW, what if user space prevents PM-runtime from suspending devices by writing
"on" to their "control" files?

System-wide suspend is (of course) still expected to work in that case, so how
exactly would you overcome that?



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-11-28 Thread Rafael J. Wysocki

On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> 
> On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding wrote:
> > >
> > > From: Thierry Reding 
> > >
> > > Currently the driver PM core will automatically acquire a runtime PM
> > > reference for devices before system sleep is entered. This is needed
> > > to avoid potential issues related to devices' parents getting put to
> > > runtime suspend at the wrong time and causing problems with their
> > > children.
> >=20
> > Not only for that.
> >=20
> > > In some cases drivers are carefully written to avoid such issues and
> > > the default behaviour can be changed to allow runtime PM to operate
> > > regularly during system sleep.
> >=20
> > But this change breaks quite a few assumptions in the core too, so no,
> > it can't be made.
> 
> Anything in particular that I can look at? I'm not seeing any issues
> when I test this, which could of course mean that I'm just getting
> lucky.

There are races and such that you may never hit during casual testing.

> One thing that irritated me is that I think this used to work. I do
> recall testing suspend/resume a few years ago and devices would get
> properly runtime suspended/resumed.

Not true at all.

The PM core has always taken PM-runtime references on all devices pretty much
since when PM-runtime was introduced.

> I did some digging but couldn't
> find anything that would have had an impact on this.
> 
> Given that this is completely opt-in feature, why are you categorically
> NAK'ing this?

The general problem is that if any device has been touched by system-wide
suspend code, it should not be subject to PM-runtime any more until the
subsequent system-wide resume is able to undo whatever the suspend did.

Moreover, if a device is runtime-suspended, the system-wide suspend code
may mishandle it, in general.  That's why PM-runtime suspend is not allowed
during system-wide transitions at all.  And it has always been like that.

For a specific platform you may be able to overcome these limitations if
you are careful enough, but certainly they are there in general and surely
you cannot prevent people from using your opt-in just because they think
that they know what they are doing.

> Is there some other alternative that I can look into?

First of all, ensure that the dpm_list ordering is what it should be on the
system/platform in question.  That can be done with the help of device links.

In addition, make sure that the devices needed to suspend other devices are
suspended in the noirq phase of system-wide suspend and resumed in the
noirq phase of system-wide resume.  Or at least all of the other devices
need to be suspended before them and resumed after them.

These two things should allow you to cover the vast majority of cases if
not all of them without messing up with the rules.
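
As a rough illustration of the ordering rule above, the following
user-space C model checks that every device is suspended before the
devices it depends on, and derives the resume order as the mirror image.
It is a sketch under stated assumptions, not kernel code: struct toy_link
and the toy_*() functions are hypothetical, and devices are just integer
IDs.  In the kernel, the dependency information would come from device
links rather than from a hand-written table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One dependency: "consumer" needs "supplier" to be functional. */
struct toy_link {
	int supplier;
	int consumer;
};

/* Index of dev in order[], or -1 if it is not listed. */
int toy_pos(const int *order, size_t n, int dev)
{
	for (size_t i = 0; i < n; i++)
		if (order[i] == dev)
			return (int)i;
	return -1;
}

/* A suspend order is valid if each consumer precedes its supplier. */
bool toy_suspend_order_ok(const int *order, size_t n,
			  const struct toy_link *links, size_t nlinks)
{
	for (size_t i = 0; i < nlinks; i++) {
		int s = toy_pos(order, n, links[i].supplier);
		int c = toy_pos(order, n, links[i].consumer);

		if (s < 0 || c < 0 || c >= s)
			return false;
	}
	return true;
}

/* Resume walks the suspend order backwards: suppliers come up first. */
void toy_resume_order(const int *suspend, size_t n, int *resume)
{
	for (size_t i = 0; i < n; i++)
		resume[i] = suspend[n - 1 - i];
}
```

For a chain 0 <- 1 <- 2 (0 supplies 1, 1 supplies 2), the order 2, 1, 0
passes the check and 0, 1, 2 does not.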

Thanks!


Re: [PATCH 1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

2019-11-28 Thread Rafael J. Wysocki

On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding  wrote:
>
> From: Thierry Reding 
>
> Currently the driver PM core will automatically acquire a runtime PM
> reference for devices before system sleep is entered. This is needed
> to avoid potential issues related to devices' parents getting put to
> runtime suspend at the wrong time and causing problems with their
> children.

Not only for that.

> In some cases drivers are carefully written to avoid such issues and
> the default behaviour can be changed to allow runtime PM to operate
> regularly during system sleep.

But this change breaks quite a few assumptions in the core too, so no,
it can't be made.

Thanks!

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki

On Fri, Nov 22, 2019 at 12:52 PM Mika Westerberg
 wrote:
>

[cut]

[I'm really running out of time for today, unfortunately.]

> > > > The current design is mostly based on the PCI PM Spec 1.2, so it would
> > > > be consistent to follow system-wide suspend in PM-runtime and avoid
> > > > putting PCIe ports holding devices in D0 into any low-power states,
> > > > but that would make the approach in the $subject patch ineffective.
> > > >
> > > > Moreover, the fact that there are separate branches for "Windows 7"
> > > > and "Windows 8+" kind of suggests a change in the expected behavior
> > > > between Windows 7 and Windows 8, from the AML perspective.  I would
> > > > guess that Windows 7 followed PCI PM 1.2 and Windows 8 (and later)
> > > > does something else.
> > >
> > > My understanding (which may not be correct) is that up to Windows 7 it
> > > never put the devices into D3cold runtime. Only when the system entered
> > > Sx states it evaluated the _OFF methods.
> >
> > I see.

I think I have misunderstood what you said.

I also think that Windows 7 and before didn't do RTD3, but it did PCI
PM nevertheless and platform firmware could expect it to behave in a
specific way in that respect.  That expected behavior seems to have
changed in the next generations of Windows, as reflected by the OS
version and _REV checks in ASL.

> > > Starting from Windows 8 it started doing this runtime so devices can
> > > enter D3cold even when system is in S0.
> >
> > Hmm.  So by setting _REV to 5 we effectively change the _OFF into a NOP?
>
> No, there are two paths in the _OFF() and then some common code such as
> removing power etc.
>
> What the _REV 5 did is that it went into path where the AML seemed to
> directly disable the link.
>
> The other path that is taken with Windows 8+ does not disable the link
> but instead it puts it to low power L2 or L3 state (I suppose L3 since
> it removes the power and the GPU probably does not support wake).

OK, so the very existence of the two paths means that the OS behavior
expected by the firmware in the two cases represented by them is
different.  Presumably, the expected hardware configuration in which
the AML runs also is different in these two cases.
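
The branch selection can be written down as a small C predicate.  This is
only a sketch of the condition in the PGOF method quoted below, read here
as (OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05)); the enum and
function names are made up for illustration, and what each OSYS value
stands for is not spelled out in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* The two _OFF paths described above (hypothetical names). */
enum toy_off_path {
	TOY_PATH_LINK_DISABLE, /* the AML directly disables the link */
	TOY_PATH_LINK_L2_L3,   /* the AML puts the link into L2 or L3 */
};

/* Mirrors only the OSYS/_REV test from PGOF, nothing else. */
enum toy_off_path toy_pgof_path(uint16_t osys, uint32_t rev)
{
	if (osys <= 0x07D9 || (osys == 0x07DF && rev == 0x05))
		return TOY_PATH_LINK_DISABLE;

	return TOY_PATH_LINK_L2_L3;
}
```

So the same firmware takes the link-disable path for OSYS == 0x07DF only
when _REV is 5, which is why the _REV value reported by the OS matters.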

> The ASL code is below. PGOF() gets called from the power resource
> _OFF():

I'll look at it in detail when I have some more time later.

> Method (PGOF, 1, Serialized)
> {
> PIOF = Arg0
> If ((PIOF == Zero))
> {
> If ((SGGP == Zero))
> {
> Return (Zero)
> }
> }
> ElseIf ((PIOF == One))
> {
> If ((P1GP == Zero))
> {
> Return (Zero)
> }
> }
> ElseIf ((PIOF == 0x02))
> {
> If ((P2GP == Zero))
> {
> Return (Zero)
> }
> }
>
> PEBA = \XBAS /* External reference */
> PDEV = GDEV (PIOF)
> PFUN = GFUN (PIOF)
> Name (SCLK, Package (0x03)
> {
> One,
> 0x80,
> Zero
> })
> If ((CCHK (PIOF, Zero) == Zero))
> {
> Return (Zero)
> }
>
> \_SB.PCI0.PEG0.PEGP.LTRE = \_SB.PCI0.PEG0.LREN
> If ((Arg0 == Zero))
> {
> ELC0 = LCT0 /* \_SB_.PCI0.LCT0 */
> H0VI = S0VI /* \_SB_.PCI0.S0VI */
> H0DI = S0DI /* \_SB_.PCI0.S0DI */
> ECP0 = LCP0 /* \_SB_.PCI0.LCP0 */
> }
> ElseIf ((Arg0 == One))
> {
> ELC1 = LCT1 /* \_SB_.PCI0.LCT1 */
> H1VI = S1VI /* \_SB_.PCI0.S1VI */
> H1DI = S1DI /* \_SB_.PCI0.S1DI */
> ECP1 = LCP1 /* \_SB_.PCI0.LCP1 */
> }
> ElseIf ((Arg0 == 0x02))
> {
> ELC2 = LCT2 /* \_SB_.PCI0.LCT2 */
> H2VI = S2VI /* \_SB_.PCI0.S2VI */
> H2DI = S2DI /* \_SB_.PCI0.S2DI */
> ECP2 = LCP2 /* \_SB_.PCI0.LCP2 */
> }
>
> If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05))))
> {
> If ((PIOF == Zero))
> {
> P0LD = One
> TCNT = Zero
> While ((TCNT < LDLY))
> {
> If ((P0LT == 0x08))
> {
> Break
> }
>
> Sleep (0x10)
> TCNT += 0x10
> }
>
> P0RM = One
> P0AP = 0x03
> }
> ElseIf ((PIOF == One))
> {
> P1LD = One
>

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki

On Fri, Nov 22, 2019 at 12:34 PM Karol Herbst  wrote:
>
> On Fri, Nov 22, 2019 at 12:30 PM Rafael J. Wysocki  wrote:
> >

[cut]

> >
>
> the issue is not AML related at all as I am able to reproduce this
> issue without having to invoke any of that at all, I just need to poke
> into the PCI register directly to cut the power.

Since the register is not documented, you don't actually know what
exactly happens when it is written to.

You basically are saying something like "if I write a specific value
to an undocumented register, that makes things fail".  And yes,
writing things to undocumented registers is likely to cause failure to
happen, in general.

The point is that the kernel will never write into this register by itself.

> The register is not documented, but effectively what the AML code is
> writing to as well.

So that AML code is problematic.  It expects the write to do something
useful, but that's not the case.  Without the AML, the register would
not have been written to at all.

> Of course it might also be that the code I was testing it was doing
> things in a non conformant way and I just hit a different issue as
> well, but in the end I don't think that the AML code is the root cause
> of all of that.

If AML is not involved at all, things work.  You've just said so in
another message in this thread, quoting verbatim:

"yes. In my previous testing I was poking into the PCI registers of the
bridge controller and the GPU directly and that never caused any
issues as long as I limited it to putting the devices into D3hot."

You cannot claim a hardware bug just because a write to an
undocumented register from AML causes things to break.

First, that may be a bug in the AML (which is not unheard of).
Second, and that is more likely, the expectations of the AML code may
not be met at the time it is run.

Assuming the latter, the root cause is really that the kernel executes
the AML in a hardware configuration in which the expectations of that
AML are not met.

We are now trying to understand what those expectations may be and so
how to cause them to be met.

Your observation that the issue can be avoided if the GPU is not put
into D3hot by a PMCSR write is a step in that direction and it is a
good finding.  The information from Mika based on the ASL analysis is
helpful too.  Let's not jump to premature conclusions too quickly,
though.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki

On Fri, Nov 22, 2019 at 11:36 AM Mika Westerberg
 wrote:
>
> On Thu, Nov 21, 2019 at 11:39:23PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
> >  wrote:
> > >
> > > On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > > > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > > > > > which also explains Mikas case that Thunderbolt stuff works as devices
> > > > > > > > > > never get populated under this particular bridge controller, but under
> > > > > > > > > > those "Root Port"s
> > > > > > > > >
> > > > > > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > > > > > >
> > > > > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > > > > > still the same.
> > > > > > > >
> > > > > > > > > Also some custom AML-based power management is involved and that may
> > > > > > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > > > > > us.
> > > > > > > > >
> > > > > > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > > > > > that point, so it looks like that AML tries to access device memory on
> > > > > > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > > > > > accessible in PCI power states below D0.
> > > > > > > >
> > > > > > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > > > > > (as it is the case here). Also then the GPU config space is not
> > > > > > > > accessible.
> > > > > > >
> > > > > > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > > > > > a suspend ordering violation?
> > > > > >
> > > > > > No. We put the GPU into D3hot first,
> > > >
> > > > OK
> > > >
> > > > Does this involve any AML, like a _PS3 under the GPU object?
> > >
> > > I don't see _PS3 (nor _PS0) for that object. If I read it right the GPU
> > > itself is not described in ACPI tables at all.
> >
> > OK
> >
> > > > > > then the root port and then turn
> > > > > > off the power resource (which is attached to the root port) resulting
> > &g

Re: [PATCH v5] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki

On Fri, Nov 22, 2019 at 1:22 AM Karol Herbst  wrote:
>
> Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> states.
>
> v2: convert to pci_dev quirk
> put a proper technical explanation of the issue as a in-code comment
> v3: disable it only for certain combinations of intel and nvidia hardware
> v4: simplify quirk by setting flag on the GPU itself
> v5: restructure quirk to make it easier to add new IDs
> fix whitespace issues
> fix potential NULL pointer access
> update the quirk documentation
>
> Signed-off-by: Karol Herbst 
> Cc: Bjorn Helgaas 
> Cc: Lyude Paul 
> Cc: Rafael J. Wysocki 
> Cc: Mika Westerberg 
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205623
> ---
>  drivers/pci/pci.c|  7 ++
>  drivers/pci/quirks.c | 51 
>  include/linux/pci.h  |  1 +
>  3 files changed, 59 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 57f15a7e6f0b..e08db2daa924 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -850,6 +850,13 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>|| (state == PCI_D2 && !dev->d2_support))
> return -EIO;
>
> +   /*
> +* Check if we have a bad combination of bridge controller and nvidia
> +* GPU, see quirk_broken_nv_runpm for more info
> +*/
> +   if (state != PCI_D0 && dev->broken_nv_runpm)
> +   return 0;

The result of this change in the suspend-to-idle path will be leaving
the device and its PCIe port in D0 while suspended, unless the device
itself has power management methods in the ACPI tables (according to
Mika that is not the case).

I don't think that this is desirable.
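
The effect can be seen in a toy user-space model of the hunk above.  This
is a sketch, not the kernel function: the toy_* names are hypothetical,
the plain assignment stands in for the real PMCSR update, and only the
early-return condition and the broken_nv_runpm flag name are taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_pci_state { TOY_D0, TOY_D1, TOY_D2, TOY_D3HOT };

struct toy_pci_dev {
	enum toy_pci_state current_state;
	bool broken_nv_runpm; /* flag name taken from the patch */
};

/* Models only the early return added by the hunk above. */
int toy_raw_set_power_state(struct toy_pci_dev *dev, enum toy_pci_state state)
{
	/* Quirked device: report success, but leave the state untouched. */
	if (state != TOY_D0 && dev->broken_nv_runpm)
		return 0;

	dev->current_state = state; /* stands in for the PMCSR write */
	return 0;
}
```

A quirked device asked to go to D3hot returns 0 and stays in D0, and the
caller has no way to tell, which is the suspend-to-idle concern raised
above.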

> +
> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>
> /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44c4ae1abd00..24e3f247d291 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5268,3 +5268,54 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
>   PCI_CLASS_DISPLAY_VGA, 8,
>   quirk_reset_lenovo_thinkpad_p50_nvgpu);
> +
> +/*
> + * Some Intel PCIe bridge controllers cause devices to not reappear doing a
> + * D0 -> D3hot -> D3cold -> D0 sequence.

This is inaccurate and not entirely fair AFAICS.

First off, what is a "PCIe bridge controller"?  A PCIe root complex?

Second, I don't think that you really can blame hardware here, because
the problem is related to AML (forcing a different code path in AML
makes it go away, so the same hardware with different AML would work).

More precisely, the behavior of the kernel is not what is expected by
AML associated with the PCIe port holding the device.

> Skipping the intermediate D3hot step
> + * seems to make it work again.

Yes, but the change would need to cover both the PM-runtime and
suspend-to-idle code paths.

Also it may be driver-induced rather than quirk-based.

> + *
> + * This leads to various manifestations of this issue:
> + *  - AIML code execution hits an infinite loop (as the coe waits on device

Typo: coe -> code

> + *memory to change).

Which AML code is this, the power-off part or power-on part?  Is this
AML code associated with the GPU or with the PCIe port holding it (I
guess the latter from what Mika said)?

Also IIRC ACPICA has a mechanism to break infinite loops in AML by
aborting the looping method after a timeout.

> + *  - kernel crashes, as all PCI reads return -1, which most code isn't able
> + *to handle well enough.
> + *  - sudden shutdowns, as the kernel identified an unrecoverable error after
> + *userspace tries to access the GPU.

IMO it would be enough to say that the GPU is not accessible after an
attempt to remove power from it.

> + *
> + * In all cases dmesg will contain at least one line like this:
> + * 'nouveau :01:00.0: Refused to change power state, currently in D3'
> + * followed by a lot of nouveau timeouts.
> + *
> + * ACPI code

Which ACPI code?

> writes bit 0x80 to the not documented PCI register 0x248 of the

0x248 relative to what?  A PCI bar (if so then which one) or the PCI
config space (and which part of it if so)?

> + * Intel PCIe bridge controller (0x1901) in order to power down the GPU.

This doesn't seem accurate.  It rather writes to this register to
change the state of the PCIe link between the GPU and the PC

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki

On Fri, Nov 22, 2019 at 1:13 AM Karol Herbst  wrote:
>
> so while trying to test with d3cold disabled, I noticed that I run
> into the exact same error.

Does this mean that you disabled d3cold on the GPU via sysfs (the
"d3cold_allowed" attribute was 0) and the original problem still
occurred in that configuration?

> And I verified that the
> \_SB.PCI0.PEG0.PG00._STA returns 1, which indicates it should still be
> turned on.

I don't really understand this comment, so can you explain it a bit to
me, please?

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
 wrote:
>
> On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> >  wrote:
> > >
> > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > > > which also explains Mikas case that Thunderbolt stuff works as devices
> > > > > > > > never get populated under this particular bridge controller, but under
> > > > > > > > those "Root Port"s
> > > > > > >
> > > > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > > > >
> > > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > > > still the same.
> > > > > >
> > > > > > > Also some custom AML-based power management is involved and that may
> > > > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > > > us.
> > > > > > >
> > > > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > > > that point, so it looks like that AML tries to access device memory on
> > > > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > > > accessible in PCI power states below D0.
> > > > > >
> > > > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > > > (as it is the case here). Also then the GPU config space is not
> > > > > > accessible.
> > > > >
> > > > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > > > a suspend ordering violation?
> > > >
> > > > No. We put the GPU into D3hot first,
> >
> > OK
> >
> > Does this involve any AML, like a _PS3 under the GPU object?
>
> I don't see _PS3 (nor _PS0) for that object. If I read it right the GPU
> itself is not described in ACPI tables at all.

OK

> > > > then the root port and then turn
> > > > off the power resource (which is attached to the root port) resulting
> > > > the topology entering D3cold.
> > >
> > > I don't see that happening in the AML though.
> >
> > Which AML do you mean, specifically?  The _OFF method for the root
> > port's _PR3 power resource or something else?
>
> The root port's _OFF method for the power resource returned by its _PR3.

OK, so without the $subject patch we (1) program the downstream
component (GPU) into D3hot, then we (2) program the port holding it
into D3hot and then we (3) let the AML (_OFF for the power resource
listed by _PR3 under the port object) run.

Something strange happens at this point (and I guess that _OFF doesn't
even reach the point where it removes power from the port which is why
we see a lock-up).

We know that skipping (1) makes things work and we kind of suspect
that skipping (3) would make things work too, but what about doing
(1) and (3) without (2)?

> > > Basically the difference is that when Windows 7 or Linux (the _REV==5
> > > check) then we directly do link disable whereas in Windows 8+ we

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 5:06 PM Karol Herbst  wrote:
>
> On Thu, Nov 21, 2019 at 4:47 PM Rafael J. Wysocki  wrote:
> >
> > On Thu, Nov 21, 2019 at 1:53 PM Karol Herbst  wrote:
> > >
> > > On Thu, Nov 21, 2019 at 12:46 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > > > which also explains Mikas case that Thunderbolt stuff works as devices
> > > > > > > > never get populated under this particular bridge controller, but under
> > > > > > > > those "Root Port"s
> > > > > > >
> > > > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > > > >
> > > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > > > still the same.
> > > > > >
> > >
> > > yeah, I meant the bridge controller with the ID 0x1901 is on the CPU
> > > side. And if the Nvidia GPU is on a port on the PCH side it all seems
> > > to work just fine.
> >
> > But that may involve different AML too, may it not?
> >
> > > > > > > Also some custom AML-based power management is involved and that may
> > > > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > > > us.
> > > > > > >
> > > > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > > > that point, so it looks like that AML tries to access device memory on
> > > > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > > > accessible in PCI power states below D0.
> > > > > >
> > > > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > > > (as it is the case here). Also then the GPU config space is not
> > > > > > accessible.
> > > > >
> > > > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > > > a suspend ordering violation?
> > > >
> > > > No. We put the GPU into D3hot first, then the root port and then turn
> > > > off the power resource (which is attached to the root port) resulting
> > > > the topology entering D3cold.
> > > >
> > >
> > > If the kernel does a D0 -> D3hot -> D0 cycle this works as well, but
> > > the power savings are way lower, so I kind of prefer skipping D3hot
> > > instead of D3cold. Skipping D3hot doesn't seem to make any difference
> > > in power savings in my testing.
> >
> > OK
> >
> > What exactly did you do to skip D3cold in your testing?
> >
>
> For that I poked into the PCI registers directly and skipped doing the
> ACPI calls and simply checked for the idle power consumption on my
> laptop.

That doesn't involve the PCIe port PM, however.

> But I guess I should retest with calling pci_d3cold_disable
> from nouveau instead? Or is there a different preferable way of
> testing this?

There is a sysfs attribute called "d3cold_allowed" which can be used
for "blocking" D3cold, so can you please retest using that?
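
For reference, a minimal user-space C sketch of setting that attribute
follows.  The helper names are made up for illustration, the device
address passed in is whatever lspci reports for the GPU (0000:01:00.0 is
only an example), and writing the attribute needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of d3cold_allowed; -1 if buf is too small. */
int d3cold_attr_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/d3cold_allowed", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to an attribute file; 0 on success, -1 on error. */
int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* sysfs reports store errors when the buffer is flushed on close. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling d3cold_attr_path() with the GPU's address and then
write_attr(path, "0") blocks D3cold for that device; writing "1" allows
it again.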

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 1:53 PM Karol Herbst  wrote:
>
> On Thu, Nov 21, 2019 at 12:46 PM Mika Westerberg
>  wrote:
> >
> > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > > never get populated under this particular bridge controller, but under
> > > > > > those "Root Port"s
> > > > >
> > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > >
> > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > still the same.
> > > >
>
> yeah, I meant the bridge controller with the ID 0x1901 is on the CPU
> side. And if the Nvidia GPU is on a port on the PCH side it all seems
> to work just fine.

But that may involve different AML too, may it not?

> > > > > Also some custom AML-based power management is involved and that may
> > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > us.
> > > > >
> > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > that point, so it looks like that AML tries to access device memory on
> > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > accessible in PCI power states below D0.
> > > >
> > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > (as it is the case here). Also then the GPU config space is not
> > > > accessible.
> > >
> > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > a suspend ordering violation?
> >
> > No. We put the GPU into D3hot first, then the root port and then turn
> > off the power resource (which is attached to the root port) resulting
> > in the topology entering D3cold.
> >
>
> If the kernel does a D0 -> D3hot -> D0 cycle this works as well, but
> the power savings are way lower, so I kind of prefer skipping D3hot
> instead of D3cold. Skipping D3hot doesn't seem to make any difference
> in power savings in my testing.

OK

What exactly did you do to skip D3cold in your testing?

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
 wrote:
>
> On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > > > > never get populated under this particular bridge controller, but under
> > > > > > those "Root Port"s
> > > > >
> > > > > It always is a PCIe port, but its location within the SoC may matter.
> > > >
> > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > still the same.
> > > >
> > > > > Also some custom AML-based power management is involved and that may
> > > > > be making specific assumptions on the configuration of the SoC and the
> > > > > GPU at the time of its invocation which unfortunately are not known to
> > > > > us.
> > > > >
> > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > that point, so it looks like that AML tries to access device memory on
> > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > accessible in PCI power states below D0.
> > > >
> > > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > > (as it is the case here). Also then the GPU config space is not
> > > > accessible.
> > >
> > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > a suspend ordering violation?
> >
> > No. We put the GPU into D3hot first,

OK

Does this involve any AML, like a _PS3 under the GPU object?

> > then the root port and then turn
> > off the power resource (which is attached to the root port) resulting
> > in the topology entering D3cold.
>
> I don't see that happening in the AML though.

Which AML do you mean, specifically?  The _OFF method for the root
port's _PR3 power resource or something else?

> Basically the difference is that on Windows 7 or Linux (the _REV==5
> check) we directly do link disable whereas on Windows 8+ we invoke the
> LKDS() method that puts the link into L2/L3. None of the fields they
> access seem to touch the GPU itself.

So that may be where the problem is.

Putting the downstream component into PCI D[1-3] is expected to put
the link into L1, so I'm not sure how that plays with the later
attempt to put it into L2/L3 Ready.

Also, L2/L3 Ready is expected to be transient, so finally power should
be removed somehow.

> LKDS() for the first PEG port looks like this:
>
>    P0L2 = One
>    Sleep (0x10)
>    Local0 = Zero
>    While (P0L2)
>    {
>        If ((Local0 > 0x04))
>        {
>            Break
>        }
>
>        Sleep (0x10)
>        Local0++
>    }
>
> One thing that comes to mind is that the loop can end even if P0L2 is
> not cleared as it does only 5 iterations with 16 ms sleep between. Maybe
> Sleep() is implemented differently in Windows? I mean Linux may be
> "faster" here and return prematurely and if we leave the port in D0
> this does not happen, or something. I'm just throwing out ideas :)

But this actually works for the downstream component in D0, doesn't it?

Also, if the downstream component is in D0, the port actually should
stay in D0 too, so what would happen with the $subject patch applied?
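
To make the timing question concrete, the quoted LKDS() loop can be modelled
in plain C.  This is only a sketch: struct link_model, read_p0l2() and
model_sleep() are made-up stand-ins for the P0L2 field and for the AML
Sleep(), and how the real P0L2 bit behaves is not known here.

```c
#include <stdbool.h>

/* Hypothetical model of the P0L2 bit and of the time spent sleeping. */
struct link_model {
    int polls_until_clear; /* reads that still see the bit set; < 0: never clears */
    int polls;             /* how many times the bit has been read so far */
    int slept_ms;          /* total time spent in the modelled Sleep() */
};

/* Stand-in for reading P0L2: reports "set" until enough polls have happened. */
bool read_p0l2(struct link_model *m)
{
    m->polls++;
    return m->polls_until_clear < 0 || m->polls <= m->polls_until_clear;
}

/* Stand-in for the AML Sleep(): only accounts for the requested time. */
void model_sleep(struct link_model *m, int ms)
{
    m->slept_ms += ms;
}

/*
 * Mirrors the control flow of the quoted ASL (the initial P0L2 = One write
 * is implied by the model).  Returns true if P0L2 was seen clear and false
 * if the loop gave up with the bit still set.
 */
bool lkds_model(struct link_model *m)
{
    int local0 = 0;

    model_sleep(m, 0x10);
    while (read_p0l2(m)) {
        if (local0 > 0x04)
            return false; /* Break with P0L2 still set */

        model_sleep(m, 0x10);
        local0++;
    }
    return true;
}
```

With a bit that never clears (polls_until_clear < 0), the model gives up after
6 reads and 96 ms of accumulated sleep with P0L2 still set, which is the case
Mika describes above.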

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
 wrote:
>
> On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > last week or so I found systems where the GPU was under the "PCI
> > > Express Root Port" (name from lspci) and on those systems all of that
> > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > which also explains Mika's case that Thunderbolt stuff works as devices
> > > never get populated under this particular bridge controller, but under
> > > those "Root Port"s
> >
> > It always is a PCIe port, but its location within the SoC may matter.
>
> Exactly. Intel hardware has PCIe ports on CPU side (these are called
> PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> still the same.
>
> > Also some custom AML-based power management is involved and that may
> > be making specific assumptions on the configuration of the SoC and the
> > GPU at the time of its invocation which unfortunately are not known to
> > us.
> >
> > However, it looks like the AML invoked to power down the GPU from
> > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > that point, so it looks like that AML tries to access device memory on
> > the GPU (beyond the PCI config space) or similar which is not
> > accessible in PCI power states below D0.
>
> Or the PCI config space of the GPU when the parent root port is in D3hot
> (as it is the case here). Also then the GPU config space is not
> accessible.

Why would the parent port be in D3hot at that point?  Wouldn't that be
a suspend ordering violation?

> I took a look at the HP Omen ACPI tables which have a similar problem and
> there is also a check for Windows 7 (but not Linux) so I think one
> alternative workaround would be to add these devices into
> acpi_osi_dmi_table[] where .callback is set to dmi_disable_osi_win8 (or
> pass 'acpi_osi="!Windows 2012"' in the kernel command line).

I'd like to understand the facts that have been established so far
before deciding what to do about them. :-)

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 12:17 PM Mika Westerberg
 wrote:
>
> On Thu, Nov 21, 2019 at 12:03:52PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 11:14 AM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> > > > with the branch and patch applied:
> > > > https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt
> > >
> > > Thanks for testing. Too bad it did not help :( I suppose there is no
> > > change if you increase the delay to say 1s?
> >
> > Well, look at the original patch in this thread.
> >
> > What it does is to prevent the device (GPU in this particular case)
> > from going into a PCI low-power state before invoking AML to power it
> > down (the AML is still invoked after this patch AFAICS), so why would
> > that have anything to do with the delays?
>
> Yes, I know what it does :) I was just thinking that maybe it's still
> the link that does not come up when we go back to D0. I guess that's not
> the case here.

I'm not sure why that would be related to putting the device into,
say, PCI D3 before invoking AML to remove power from it.  If it is not
in PCI D3 at this point, the AML still runs and still removes power
from it IIUC, so on the way back the situation is the same regardless:
the device has no power which (again) needs to be restored by AML.
That (in principle) should not depend on what happened to the device
before it lost power.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 12:08 PM Rafael J. Wysocki  wrote:
>
> On Thu, Nov 21, 2019 at 12:03 PM Rafael J. Wysocki  wrote:
> >
> > On Thu, Nov 21, 2019 at 11:14 AM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> > > > with the branch and patch applied:
> > > > https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt
> > >
> > > Thanks for testing. Too bad it did not help :( I suppose there is no
> > > change if you increase the delay to say 1s?
> >
> > Well, look at the original patch in this thread.
> >
> > What it does is to prevent the device (GPU in this particular case)
> > from going into a PCI low-power state before invoking AML to power it
> > down (the AML is still invoked after this patch AFAICS), so why would
> > that have anything to do with the delays?
> >
> > The only reason would be the AML running too early, but that doesn't
> > seem likely.  IMO more likely is that the AML does something which
> > cannot be done to a device in a PCI low-power state.
>
> BTW, I'm wondering if anyone has tried to skip the AML instead of
> skipping the PCI PM in this case (as of 5.4-rc that would be a similar
> patch to skip the invocations of
> __pci_start/complete_power_transition() in pci_set_power_state() for
> the affected device).

Moving the dev->broken_nv_runpm test into
pci_platform_power_transition() (also for transitions into D0) would
be sufficient for that test if I'm not mistaken.
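
To spell out the two experiments being compared, here is a deliberately
simplified C model.  It is not kernel code: enum quirk_mode, struct
transition_log and model_set_power_state() are invented for illustration, and
the real pci_set_power_state() path has more steps than the two tracked here.

```c
#include <stdbool.h>

/* Which of the two mechanisms a modelled low-power transition used. */
struct transition_log {
    bool pci_pm_written;   /* native PCI PM step (device put into D3hot) */
    bool platform_invoked; /* platform step (the AML that removes power) */
};

/* Hypothetical switch between the variants discussed in this thread. */
enum quirk_mode {
    QUIRK_NONE,          /* unmodified behaviour: both steps run */
    QUIRK_SKIP_PCI_PM,   /* what the patch under discussion does */
    QUIRK_SKIP_PLATFORM, /* the experiment suggested above */
};

/* Record which steps would run for a transition out of D0 in each mode. */
struct transition_log model_set_power_state(enum quirk_mode mode)
{
    struct transition_log log = { false, false };

    if (mode != QUIRK_SKIP_PCI_PM)
        log.pci_pm_written = true;

    if (mode != QUIRK_SKIP_PLATFORM)
        log.platform_invoked = true;

    return log;
}
```

The point of the second experiment is that it isolates the AML: if the device
resumes fine with only the PCI PM step, that would point at the platform code
rather than at the D3hot transition itself.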

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 12:03 PM Rafael J. Wysocki  wrote:
>
> On Thu, Nov 21, 2019 at 11:14 AM Mika Westerberg
>  wrote:
> >
> > On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> > > with the branch and patch applied:
> > > https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt
> >
> > Thanks for testing. Too bad it did not help :( I suppose there is no
> > change if you increase the delay to say 1s?
>
> Well, look at the original patch in this thread.
>
> What it does is to prevent the device (GPU in this particular case)
> from going into a PCI low-power state before invoking AML to power it
> down (the AML is still invoked after this patch AFAICS), so why would
> that have anything to do with the delays?
>
> The only reason would be the AML running too early, but that doesn't
> seem likely.  IMO more likely is that the AML does something which
> cannot be done to a device in a PCI low-power state.

BTW, I'm wondering if anyone has tried to skip the AML instead of
skipping the PCI PM in this case (as of 5.4-rc that would be a similar
patch to skip the invocations of
__pci_start/complete_power_transition() in pci_set_power_state() for
the affected device).

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Rafael J. Wysocki

On Thu, Nov 21, 2019 at 11:14 AM Mika Westerberg
 wrote:
>
> On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> > with the branch and patch applied:
> > https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt
>
> Thanks for testing. Too bad it did not help :( I suppose there is no
> change if you increase the delay to say 1s?

Well, look at the original patch in this thread.

What it does is to prevent the device (GPU in this particular case)
from going into a PCI low-power state before invoking AML to power it
down (the AML is still invoked after this patch AFAICS), so why would
that have anything to do with the delays?

The only reason would be the AML running too early, but that doesn't
seem likely.  IMO more likely is that the AML does something which
cannot be done to a device in a PCI low-power state.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 10:40 PM Karol Herbst  wrote:
>
> On Wed, Nov 20, 2019 at 10:37 PM Rafael J. Wysocki  wrote:
> >
> > On Wed, Nov 20, 2019 at 4:53 PM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 04:37:14PM +0100, Karol Herbst wrote:
> > > > On Wed, Nov 20, 2019 at 4:15 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 01:11:52PM +0100, Karol Herbst wrote:
> > > > > > On Wed, Nov 20, 2019 at 1:09 PM Mika Westerberg
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> > > > > > > > overall, what I really want to know is, _why_ does it work on windows?
> > > > > > >
> > > > > > > So do I ;-)
> > > > > > >
> > > > > > > > Or what are we doing differently on Linux so that it doesn't work? If
> > > > > > > > anybody has any idea on how we could dig into this and figure it out
> > > > > > > > on this level, this would probably allow us to get closer to the root
> > > > > > > > cause? no?
> > > > > > >
> > > > > > > Have you tried to use the acpi_rev_override parameter in your system and
> > > > > > > does it have any effect?
> > > > > > >
> > > > > > > Also did you try to trace the ACPI _ON/_OFF() methods? I think that
> > > > > > > should hopefully reveal something.
> > > > > > >
> > > > > >
> > > > > > I think I did in the past and it seemed to have worked, there is just
> > > > > > one big issue with this: it's a Dell specific workaround afaik, and
> > > > > > this issue plagues not just Dell, but we've seen it on HP and Lenovo
> > > > > > laptops as well, and I've heard about users having the same issues on
> > > > > > Asus and MSI laptops as well.
> > > > >
> > > > > Maybe it is not a workaround at all but instead it simply determines
> > > > > whether the system supports RTD3 or something like that (IIRC Windows 8
> > > > > started supporting it). Maybe Dell added check for Linux because at that
> > > > > time Linux did not support it.
> > > > >
> > > >
> > > > the point is, it's not checking it by default, so by default you still
> > > > run into the windows 8 codepath.
> > >
> > > Well you can add the quirk to acpi_rev_dmi_table[] so it goes to that
> > > path by default. There are a bunch of similar entries for Dell machines.
> >
> > OK, so the "Linux path" works and the other doesn't.
> >
> > I thought that this was the other way around, sorry for the confusion.
> >
> > > Of course this does not help the non-Dell users so we would still need
> > > to figure out the root cause.
> >
> > Right.
> >
> > Whatever it is, though, AML appears to be involved in it and AFAICS
> > there's no evidence that it affects any root ports that are not
> > populated with NVidia GPUs.
> >
>
> last week or so I found systems where the GPU was under the "PCI
> Express Root Port" (name from lspci) and on those systems all of that
> seems to work. So I am wondering if it's indeed just the 0x1901 one,
> which also explains Mika's case that Thunderbolt stuff works as devices
> never get populated under this particular bridge controller, but under
> those "Root Port"s

It always is a PCIe port, but its location within the SoC may matter.

Also some custom AML-based power management is involved and that may
be making specific assumptions on the configuration of the SoC and the
GPU at the time of its invocation which unfortunately are not known to
us.

However, it looks like the AML invoked to power down the GPU from
acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
that point, so it looks like that AML tries to access device memory on
the GPU (beyond the PCI config space) or similar which is not
accessible in PCI power states below D0.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 4:53 PM Mika Westerberg
 wrote:
>
> On Wed, Nov 20, 2019 at 04:37:14PM +0100, Karol Herbst wrote:
> > On Wed, Nov 20, 2019 at 4:15 PM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 01:11:52PM +0100, Karol Herbst wrote:
> > > > On Wed, Nov 20, 2019 at 1:09 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> > > > > > overall, what I really want to know is, _why_ does it work on windows?
> > > > >
> > > > > So do I ;-)
> > > > >
> > > > > > Or what are we doing differently on Linux so that it doesn't work? If
> > > > > > anybody has any idea on how we could dig into this and figure it out
> > > > > > on this level, this would probably allow us to get closer to the root
> > > > > > cause? no?
> > > > >
> > > > > Have you tried to use the acpi_rev_override parameter in your system and
> > > > > does it have any effect?
> > > > >
> > > > > Also did you try to trace the ACPI _ON/_OFF() methods? I think that
> > > > > should hopefully reveal something.
> > > > >
> > > >
> > > > I think I did in the past and it seemed to have worked, there is just
> > > > one big issue with this: it's a Dell specific workaround afaik, and
> > > > this issue plagues not just Dell, but we've seen it on HP and Lenovo
> > > > laptops as well, and I've heard about users having the same issues on
> > > > Asus and MSI laptops as well.
> > >
> > > Maybe it is not a workaround at all but instead it simply determines
> > > whether the system supports RTD3 or something like that (IIRC Windows 8
> > > started supporting it). Maybe Dell added check for Linux because at that
> > > time Linux did not support it.
> > >
> >
> > the point is, it's not checking it by default, so by default you still
> > run into the windows 8 codepath.
>
> Well you can add the quirk to acpi_rev_dmi_table[] so it goes to that
> path by default. There are a bunch of similar entries for Dell machines.

OK, so the "Linux path" works and the other doesn't.

I thought that this was the other way around, sorry for the confusion.

> Of course this does not help the non-Dell users so we would still need
> to figure out the root cause.

Right.

Whatever it is, though, AML appears to be involved in it and AFAICS
there's no evidence that it affects any root ports that are not
populated with NVidia GPUs.

Now, one thing is still not clear to me from the discussion so far: is
the _PR3 method you mentioned defined under the GPU device object or
under the port device object?

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 1:10 PM Karol Herbst  wrote:
>
> On Wed, Nov 20, 2019 at 1:06 PM Rafael J. Wysocki  wrote:
> >
> > On Wed, Nov 20, 2019 at 12:51 PM Karol Herbst  wrote:
> > >
> > > > On Wed, Nov 20, 2019 at 12:48 PM Rafael J. Wysocki wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 12:22 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 11:52:22AM +0100, Rafael J. Wysocki wrote:
> > > > > > On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg
> > > > > >  wrote:
> > > > > > >
> >
> > [cut]
> >
> > > > > >
> > > > > > Oh, so does it look like we are trying to work around AML that tried
> > > > > > to work around some problematic behavior in Linux at one point?
> > > > >
> > > > > Yes, it looks like so if I read the ASL right.
> > > >
> > > > OK, so that would call for a DMI-based quirk as the real cause for the
> > > > issue seems to be the AML in question, which means a firmware problem.
> > > >
> > >
> > > And I disagree as this is a linux specific workaround and windows goes
> > > that path and succeeds. This firmware based workaround was added,
> > > because it broke on Linux.
> >
> > Apparently so at the time it was added, but would it still break after
> > the kernel changes made since then?
> >
> > Moreover, has it not become harmful now?  IOW, wouldn't it work after
> > removing the "Linux workaround" from the AML?
> >
> > The only way to verify that I can see would be to run the system with
> > custom ACPI tables without the "Linux workaround" in the AML in
> > question.
> >
>
> the workaround is not enabled by default, because it has to be
> explicitly enabled by the user.

I'm not sure what you are talking about.

I'm talking specifically about the ((OSYS == 0x07DF) && (_REV == 0x05))
check mentioned by Mika which doesn't seem to depend on user input in
any way.
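
To illustrate why user input does not enter into it, the visible part of that
condition can be written as a plain C predicate.  This is a sketch only: the
function and macro names are made up, the constants are copied from the ASL
fragment quoted by Mika, and what each OSYS value stands for is up to the
firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the quoted PGOF() fragment. */
#define OSYS_OLD_OS_MAX  0x07D9u /* upper bound of the first comparison */
#define OSYS_LINUX_CHECK 0x07DFu /* OSYS value paired with the _REV test */
#define REV_LINUX_CHECK  0x05u   /* _REV value the AML treats as "Linux" */

/*
 * Hypothetical helper: true when the visible part of the PGOF() condition
 * holds, i.e. when the AML would take the alternate ("Linux") path.
 * Both inputs come from the firmware and the OS, not from the user.
 */
bool pgof_takes_alternate_path(uint32_t osys, uint32_t rev)
{
    return osys <= OSYS_OLD_OS_MAX ||
           (osys == OSYS_LINUX_CHECK && rev == REV_LINUX_CHECK);
}
```

With OSYS at 0x07DF, the result depends only on _REV: it is true for a _REV
of 5 (which, per Mika, is what acpi_rev_override produces) and false for any
other value.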

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 1:06 PM Rafael J. Wysocki  wrote:
>
> On Wed, Nov 20, 2019 at 12:51 PM Karol Herbst  wrote:
> >
> > > On Wed, Nov 20, 2019 at 12:48 PM Rafael J. Wysocki wrote:
> > >
> > > On Wed, Nov 20, 2019 at 12:22 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 11:52:22AM +0100, Rafael J. Wysocki wrote:
> > > > > On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg
> > > > >  wrote:
> > > > > >
>
> [cut]
>
> > > > >
> > > > > Oh, so does it look like we are trying to work around AML that tried
> > > > > to work around some problematic behavior in Linux at one point?
> > > >
> > > > Yes, it looks like so if I read the ASL right.
> > >
> > > OK, so that would call for a DMI-based quirk as the real cause for the
> > > issue seems to be the AML in question, which means a firmware problem.
> > >
> >
> > And I disagree as this is a linux specific workaround and windows goes
> > that path and succeeds. This firmware based workaround was added,
> > because it broke on Linux.
>
> Apparently so at the time it was added, but would it still break after
> the kernel changes made since then?
>
> Moreover, has it not become harmful now?  IOW, wouldn't it work after
> removing the "Linux workaround" from the AML?
>
> The only way to verify that I can see would be to run the system with
> custom ACPI tables without the "Linux workaround" in the AML in
> question.

Or running it with acpi_rev_override as suggested by Mika, which
effectively would be the same thing.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 12:51 PM Karol Herbst  wrote:
>
> On Wed, Nov 20, 2019 at 12:48 PM Rafael J. Wysocki  wrote:
> >
> > On Wed, Nov 20, 2019 at 12:22 PM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:52:22AM +0100, Rafael J. Wysocki wrote:
> > > > On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg
> > > >  wrote:
> > > > >

[cut]

> > > >
> > > > Oh, so does it look like we are trying to work around AML that tried
> > > > to work around some problematic behavior in Linux at one point?
> > >
> > > Yes, it looks like so if I read the ASL right.
> >
> > OK, so that would call for a DMI-based quirk as the real cause for the
> > issue seems to be the AML in question, which means a firmware problem.
> >
>
> And I disagree as this is a linux specific workaround and windows goes
> that path and succeeds. This firmware based workaround was added,
> because it broke on Linux.

Apparently so at the time it was added, but would it still break after
the kernel changes made since then?

Moreover, has it not become harmful now?  IOW, wouldn't it work after
removing the "Linux workaround" from the AML?

The only way to verify that I can see would be to run the system with
custom ACPI tables without the "Linux workaround" in the AML in
question.

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 12:22 PM Mika Westerberg
 wrote:
>
> On Wed, Nov 20, 2019 at 11:52:22AM +0100, Rafael J. Wysocki wrote:
> > On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg
> >  wrote:
> > >
> > > Hi Karol,
> > >
> > > On Tue, Nov 19, 2019 at 11:26:45PM +0100, Karol Herbst wrote:
> > > > > On Tue, Nov 19, 2019 at 10:50 PM Bjorn Helgaas wrote:
> > > > >
> > > > > [+cc Dave]
> > > > >
> > > > > On Thu, Oct 17, 2019 at 02:19:01PM +0200, Karol Herbst wrote:
> > > > > > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> > > > > > > states.
> > > > > > >
> > > > > > > v2: convert to pci_dev quirk
> > > > > > >     put a proper technical explanation of the issue as an in-code comment
> > > > > > > v3: disable it only for certain combinations of intel and nvidia hardware
> > > > > > > v4: simplify quirk by setting flag on the GPU itself
> > > > >
> > > > > I have zero confidence that we understand the real problem, but we do
> > > > > need to do something with this.  I'll merge it for v5.5 if we get the
> > > > > minor procedural stuff below straightened out.
> > > > >
> > > >
> > > > Thanks, and I agree with your statement, but at this point I think
> > > > only Intel can help out digging deeper as I see no way to debug this
> > > > further.
> > >
> > > I don't have anything against this patch, as long as the quirk stays
> > > limited to the particular root port leading to the NVIDIA GPU. The
> > > reason why I think it should be limited is that I'm pretty certain
> > > the problem is not in the root port itself. I have here a KBL based
> > > Thinkpad X1 Carbon 6th gen that can put the TBT controller into D3cold
> > > (it is connected to PCH root port) and it wakes up there just fine, so
> > > I don't want to break that.
> > >
> > > Now, PCIe devices cannot go into D3cold all by themselves. They always
> > > need help from the platform side which is ACPI in this case. This is
> > > done by having the device have a _PR3 method that returns one or more
> > > power resources that the OS is supposed to turn off when the device is
> > > put into D3cold. All of that is implemented in the form of ACPI methods that
> > > pretty much do the hardware specific things that are outside of PCIe
> > > spec to get the device into D3cold. At high level the _OFF() method
> > > causes the root port to broadcast PME_Turn_Off message that results in the
> > > link entering L2/3 Ready, it then asserts PERST, configures WAKE (both
> > > can be GPIOs) and finally removes power (if the link goes into L3,
> > > otherwise it goes into L2).
> > >
> > > I think this is where the problem actually lies - the ASL methods that
> > > are used to put the device into D3cold and back. We know that in Windows
> > > this all works fine so unless Windows quirks the root port the same way
> > > there is another reason behind this.
> > >
> > > In case of Dell XPS 9560 (IIRC that's the machine you have) the
> > > corresponding power resource is called \_SB.PCI0.PEG0.PG00 and its
> > > _ON/_OFF methods end up calling PGON()/PGOF() accordingly. The methods
> > > themselves do lots of things and it is hard to follow the disassembled
> > > ASL which does not have any comments but there are a couple of things that
> > > stand out where we may go into a different path. One of them is this in
> > > the PGOF() method:
> > >
> > >If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05
> > >
> > > The ((OSYS == 0x07DF) && (_REV == 0x05)) checks specifically for Linux
> > > (see [1] and 18d78b64fddc ("ACPI / init: Make it possible to override
> > > _REV")) so it might be that Dell people tested this at some point in
> > > Linux as well. Added Mario in case he has any ideas.
> > >
> > > Previously I suggested that you try the ACPI method tracing to see what
> > > happens inside PGOF(). Did you have time to try it? It may provide more
> > > information about what is happening inside those methods and hopefully
> > > point us to the root cause.
> > >
> > > Also if you haven't tried already, passing acpi_rev_override in the
> > > command line makes _REV return 5 so it should go into the "Linux"
> > > path in PGOF().
> >
> > Oh, so does it look like we are trying to work around AML that tried
> > to work around some problematic behavior in Linux at one point?
>
> Yes, it looks like so if I read the ASL right.

OK, so that would call for a DMI-based quirk as the real cause for the
issue seems to be the AML in question, which means a firmware problem.
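
As a rough illustration of what a DMI-based quirk amounts to, here is a minimal standalone sketch: a table of machine identification strings and a lookup over it. In the kernel this kind of matching is normally done with a struct dmi_system_id table and dmi_check_system(); the names below (quirk_entry, d3cold_quirk_table, needs_d3cold_quirk) are hypothetical, and the vendor/product strings are assumptions based on the "Dell XPS 9560" mentioned in the thread, not values taken from any posted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one affected machine, identified by DMI strings. */
struct quirk_entry {
	const char *sys_vendor;
	const char *product_name;
};

/* Assumed strings for illustration only; real DMI data must be checked. */
static const struct quirk_entry d3cold_quirk_table[] = {
	{ "Dell Inc.", "XPS 15 9560" },
};

/* Return true if the running machine matches an entry in the quirk table. */
static bool needs_d3cold_quirk(const char *vendor, const char *product)
{
	size_t i;
	size_t n = sizeof(d3cold_quirk_table) / sizeof(d3cold_quirk_table[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(vendor, d3cold_quirk_table[i].sys_vendor) == 0 &&
		    strcmp(product, d3cold_quirk_table[i].product_name) == 0)
			return true;
	}
	return false;
}
```

The point of keying on the machine rather than on the root port is that unaffected systems behind the same kind of root port keep using D3cold.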
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Rafael J. Wysocki

On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg wrote:
>
> Hi Karol,
>
> On Tue, Nov 19, 2019 at 11:26:45PM +0100, Karol Herbst wrote:
> > On Tue, Nov 19, 2019 at 10:50 PM Bjorn Helgaas  wrote:
> > >
> > > [+cc Dave]
> > >
> > > On Thu, Oct 17, 2019 at 02:19:01PM +0200, Karol Herbst wrote:
> > > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> > > > states.
> > > >
> > > > v2: convert to pci_dev quirk
> > > > put a proper technical explanation of the issue as an in-code comment
> > > > v3: disable it only for certain combinations of intel and nvidia hardware
> > > > v4: simplify quirk by setting flag on the GPU itself
> > >
> > > I have zero confidence that we understand the real problem, but we do
> > > need to do something with this.  I'll merge it for v5.5 if we get the
> > > minor procedural stuff below straightened out.
> > >
> >
> > Thanks, and I agree with your statement, but at this point I think
> > only Intel can help out digging deeper as I see no way to debug this
> > further.
>
> I don't have anything against this patch, as long as the quirk stays
> limited to the particular root port leading to the NVIDIA GPU. The
> reason why I think it should be limited is that I'm pretty certain
> the problem is not in the root port itself. I have here a KBL based
> Thinkpad X1 Carbon 6th gen that can put the TBT controller into D3cold
> (it is connected to PCH root port) and it wakes up there just fine, so
> I don't want to break that.
>
> Now, PCIe devices cannot go into D3cold all by themselves. They always
> need help from the platform side, which is ACPI in this case. This is
> done by giving the device a _PR3 method that returns one or more
> power resources that the OS is supposed to turn off when the device is
> put into D3cold. All of that is implemented in the form of ACPI methods that
> pretty much do the hardware-specific things that are outside of the PCIe
> spec to get the device into D3cold. At a high level, the _OFF() method
> causes the root port to broadcast a PME_Turn_Off message, which results in the
> link entering L2/3 Ready; it then asserts PERST, configures WAKE (both
> can be GPIOs) and finally removes power (if the link goes into L3,
> otherwise it goes into L2).
>
> I think this is where the problem actually lies - the ASL methods that
> are used to put the device into D3cold and back. We know that in Windows
> this all works fine so unless Windows quirks the root port the same way
> there is another reason behind this.
>
> In case of Dell XPS 9560 (IIRC that's the machine you have) the
> corresponding power resource is called \_SB.PCI0.PEG0.PG00 and its
> _ON/_OFF methods end up calling PGON()/PGOF() accordingly. The methods
> themselves do lots of things and it is hard to follow the disassembled
> ASL, which does not have any comments, but there are a couple of things that
> stand out where we may go into a different path. One of them is this in
> the PGOF() method:
>
>     If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05))))
>
> The ((OSYS == 0x07DF) && (_REV == 0x05)) checks specifically for Linux
> (see [1] and 18d78b64fddc ("ACPI / init: Make it possible to override
> _REV")) so it might be that Dell people tested this at some point in
> Linux as well. Added Mario in case he has any ideas.
>
> Previously I suggested that you try the ACPI method tracing to see what
> happens inside PGOF(). Did you have time to try it? It may provide more
> information about what is happening inside those methods and hopefully
> point us to the root cause.
>
> Also, if you haven't tried it already, passing acpi_rev_override on the
> command line makes _REV return 5, so it should go into the "Linux"
> path in PGOF().

Oh, so does it look like we are trying to work around AML that tried
to work around some problematic behavior in Linux at one point?
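
To make the branch selection concrete, here is a small sketch that evaluates the quoted PGOF() condition in plain C. It only mirrors the boolean test from the disassembled ASL; what the firmware does on either branch is not shown in this thread. The function name pgof_condition is hypothetical. The sketch assumes, following the discussion above, that the firmware sets OSYS to 0x07DF under Linux, and relies on Linux reporting _REV as 2 by default and as 5 when acpi_rev_override is in effect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the quoted ASL condition. */
#define OSYS_THRESHOLD  0x07D9u /* 2009 */
#define OSYS_LINUX_PAIR 0x07DFu /* 2015 */
#define REV_LINUX_PAIR  0x05u

/*
 * Mirrors the test quoted from PGOF():
 *   (OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05))
 * Returns true when the firmware would take the "If" branch.
 */
static bool pgof_condition(uint32_t osys, uint32_t rev)
{
	return osys <= OSYS_THRESHOLD ||
	       (osys == OSYS_LINUX_PAIR && rev == REV_LINUX_PAIR);
}
```

Under these assumptions, pgof_condition(0x07DF, 2) is false on a default Linux boot and pgof_condition(0x07DF, 5) is true with acpi_rev_override, which is why the override is expected to steer PGOF() into the other path.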

> [1] https://www.kernel.org/doc/html/latest/firmware-guide/acpi/osi.html#do-not-use-rev

Re: [PATCH 4/5] power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call

2019-11-13 Thread Rafael J. Wysocki

On Monday, October 21, 2019 4:51:48 PM CET Geert Uytterhoeven wrote:
> There is no need to cast a typed pointer to a void pointer when calling
> a function that accepts the latter.  Remove it, as the cast prevents
> further compiler checks.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/power/avs/smartreflex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c
> index 4684e7df833a81e9..5376f3d22f31eade 100644
> --- a/drivers/power/avs/smartreflex.c
> +++ b/drivers/power/avs/smartreflex.c
> @@ -905,7 +905,7 @@ static int omap_sr_probe(struct platform_device *pdev)
>   sr_info->dbg_dir = debugfs_create_dir(sr_info->name, sr_dbg_dir);
>  
>   debugfs_create_file("autocomp", S_IRUGO | S_IWUSR, sr_info->dbg_dir,
> - (void *)sr_info, &pm_sr_fops);
> + sr_info, &pm_sr_fops);
>   debugfs_create_x32("errweight", S_IRUGO, sr_info->dbg_dir,
>  &sr_info->err_weight);
>   debugfs_create_x32("errmaxlimit", S_IRUGO, sr_info->dbg_dir,
> 

Applying as 5.5 material, thanks!
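
For readers wondering why the cast is superfluous, here is a minimal standalone sketch of the C rule the patch relies on: any object pointer converts implicitly to void *, so the cast adds nothing, and it can hide a mismatch the compiler would otherwise report. The struct and function names (sr_data, read_autocomp, call_without_cast, call_with_cast) are hypothetical and not taken from the driver.

```c
/* Hypothetical stand-in for the driver's private data. */
struct sr_data {
	int autocomp_active;
};

/* Takes an untyped pointer, like the data argument of debugfs_create_file(). */
static int read_autocomp(void *data)
{
	const struct sr_data *sr = data; /* void * converts back implicitly */

	return sr->autocomp_active;
}

/* No cast needed: struct sr_data * converts to void * implicitly. */
static int call_without_cast(struct sr_data *sr)
{
	return read_autocomp(sr);
}

/*
 * Same behavior with the cast. The difference shows up only when the
 * argument type is wrong: if 'sr' were 'const struct sr_data *', the
 * uncast call would draw a discarded-qualifier diagnostic, while the
 * cast would silence it.
 */
static int call_with_cast(struct sr_data *sr)
{
	return read_autocomp((void *)sr);
}
```

Both helpers return the same value for the same input; the uncast form simply leaves the compiler free to check the argument type.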





Re: [PATCH 4/5] power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call

2019-11-08 Thread Rafael J. Wysocki

On Monday, October 21, 2019 4:51:48 PM CET Geert Uytterhoeven wrote:
> There is no need to cast a typed pointer to a void pointer when calling
> a function that accepts the latter.  Remove it, as the cast prevents
> further compiler checks.
> 
> Signed-off-by: Geert Uytterhoeven 

Greg, have you taken this one by any chance?

> ---
>  drivers/power/avs/smartreflex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c
> index 4684e7df833a81e9..5376f3d22f31eade 100644
> --- a/drivers/power/avs/smartreflex.c
> +++ b/drivers/power/avs/smartreflex.c
> @@ -905,7 +905,7 @@ static int omap_sr_probe(struct platform_device *pdev)
>   sr_info->dbg_dir = debugfs_create_dir(sr_info->name, sr_dbg_dir);
>  
>   debugfs_create_file("autocomp", S_IRUGO | S_IWUSR, sr_info->dbg_dir,
> - (void *)sr_info, &pm_sr_fops);
> + sr_info, &pm_sr_fops);
>   debugfs_create_x32("errweight", S_IRUGO, sr_info->dbg_dir,
>  &sr_info->err_weight);
>   debugfs_create_x32("errmaxlimit", S_IRUGO, sr_info->dbg_dir,
> 




