Hi Laura,

>>> We've received a number of reports of warnings when coming
>>> out of suspend with certain bluetooth firmware configurations:
>>> 
>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>> _request_firmware+0x558/0x810()
>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>> video
>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>> Workqueue: hci0 hci_power_on [bluetooth]
>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>> Call Trace:
>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>> 
>>> This occurs after every resume.
>>> 
>>> When resuming, the bluetooth stack calls hci_register_dev,
>>> allocates a new workqueue, and immediately schedules the
>>> power_on on the newly created workqueue. Since the new
>>> workqueue is not freezable, the work runs immediately and
>>> triggers the warning since resume is still happening and
>>> usermodehelper has not yet been re-enabled. Fix this by
>>> making the request workqueue freezable. This ensures
>>> the work will not run until unfreezing occurs and usermodehelper
>>> is re-enabled.
>>> 
>>> Signed-off-by: Laura Abbott <labb...@fedoraproject.org>
>>> ---
>>> Resend because I think this got lost in the thread.
>>> This should be fixing the actual root cause of the warnings.
>> 
>> so I am not convinced that it actually fixes the root cause. This is just 
>> papering over it.
>> 
>> The problem is pretty clear, the firmware for some of the Bluetooth 
>> controllers is optional and that means during the original hotplug event it 
>> will not be found and the controller keeps operating. However for some 
>> reason instead of actually suspending and resuming the Bluetooth controller, 
>> we see a unplug + replug (since we are going through probe) and that is 
>> causing this funny behaviour.
>> 
> 
> Fundamentally the issue is the request_firmware is being called at the
> wrong time. From Documentation/workqueue.txt:
> 
>  WQ_FREEZABLE
> 
>        A freezable wq participates in the freeze phase of the system
>        suspend operations.  Work items on the wq are drained and no
>        new work item starts execution until thawed.
> 
> 
> By making the request workqueue freezable, any work that gets scheduled
> will not run until the time for tasks to unthaw.
> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> ("PM / sleep: Fix request_firmware() error at resume") fixed the resume
> path such that before all tasks are unthawed, calls to
> usermodehelper_read_trylock will block until usermodehelper is fully
> resumed. This means that any task which is frozen and then woken up
> again should have the right sequencing for usermodehelper. The workqueue
> which handled the bluetooth power on was never being frozen properly so
> there was never any guarantee of when it would run. This patch gives
> it the necessary sequence.
> 
>> So how does making one of the core workqueues freezable fixes this the right 
>> way. I do not even know how many other side effects that might have. That 
>> hdev->req_workqueue is a Bluetooth core internal workqueue that we are using 
>> for multiple tasks.
>> 
>> Rather tell me on why we are probing the USB devices that might need 
>> firmware without having userspace ready. It sounds to me that the USB driver 
>> probe callback should be delayed if we can not guarantee that it can request 
>> firmware. As I explained many times, the call path that causes this is going 
>> through probe callback of the driver itself.
>> 
> 
> I agree that if the driver probe function was requesting firmware
> directly there would be a problem. The power_on function is already
> being called asynchronously on a workqueue. Making that workqueue freezable 
> does exactly the delay you describe.

I am not convinced. Now we are hacking the Bluetooth core layer (which has 
nothing to do with the drivers suspend/resume or probe) to do something 
different so that we do not see this warning.

I can not do anything about the platform in question choosing a unplug/replug 
for suspend/resume instead of having a proper USB suspend and resume handling. 
That is pretty much out of our control. I would rather have the USB subsystem 
delay the probe() callback if we tell it to. Of just have request_firmware() 
actually sleep until userspace is ready. Seriously, why is request_firmware not 
just sleeping for us.

> The side effects should be limited to the change in behavior of
> draining all work items at freeze time and then not running
> again until task thaw. Do you know of any limitations where
> draining the workqueue at freeze time could not happen or would
> block indefinitely?

Has anybody actually looked at the hdev->req_workqueue usage? You are touching 
core code now and unless you convince me that this has no impact, then I am 
cautious first of all.

Regards

Marcel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to