Resending in plain text.
On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud <alainmich...@google.com> wrote: > > Hi Marcel, > > Sorry, just got around to this. > > On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann <mar...@holtmann.org> wrote: >> >> Hi Archie, >> >> >>> There is a possibility that an ACL packet is received before we >> >>> receive the HCI connect event for the corresponding handle. If this >> >>> happens, we discard the ACL packet. >> >>> >> >>> Rather than just ignoring them, this patch provides a queue for >> >>> incoming ACL packet without a handle. The queue is processed when >> >>> receiving a HCI connection event. If 2 seconds elapsed without >> >>> receiving the HCI connection event, assume something bad happened >> >>> and discard the queued packet. >> >>> >> >>> Signed-off-by: Archie Pusaka <apus...@chromium.org> >> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpan...@chromium.org> >> >> >> >> so two things up front. I want to hide this behind a >> >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. >> >> Frankly if this kind of out-of-order happens on UART or SDIO transports, >> >> then something is obviously going wrong. I have no plan to fix up after a >> >> fully serialized transport. >> >> >> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want >> >> this off by default. You can enable it via an experimental setting. The >> >> reason here is that we have to make it really hard and fail as often as >> >> possible so that hardware manufactures and spec writers realize that >> >> something is fundamentally broken here. > > I don't have any objection to making this explicit enable to non serialized > transports. However, I do wonder what the intention is around making this > off by default. We already know there is a race condition between the > interupt and bulk endpoints over USB, so this can and does happen. Hardware > manufaturers can't relly do much about this other than trying to pull the > interupt endpoint more often, but that's only a workaround, it can't avoid it > all together. > > IMO, this seems like a legitimate fix at the host level and I don't see any > obvious benefits to hide this fix under an experimental feature and make it > more difficult for the customers and system integrators to discover. > >> >> >> >> >> I have no problem in running the code and complaining loudly in case the >> >> quirk has been set. Just injecting the packets can only happen if >> >> bluetoothd explicitly enabled it. >> > >> > Got it. >> > >> >> >> >> >> >>> >> >>> --- >> >>> >> >>> include/net/bluetooth/hci_core.h | 8 +++ >> >>> net/bluetooth/hci_core.c | 84 +++++++++++++++++++++++++++++--- >> >>> net/bluetooth/hci_event.c | 2 + >> >>> 3 files changed, 88 insertions(+), 6 deletions(-) >> >>> >> >>> diff --git a/include/net/bluetooth/hci_core.h >> >>> b/include/net/bluetooth/hci_core.h >> >>> index 836dc997ff94..b69ecdd0d15a 100644 >> >>> --- a/include/net/bluetooth/hci_core.h >> >>> +++ b/include/net/bluetooth/hci_core.h >> >>> @@ -270,6 +270,9 @@ struct adv_monitor { >> >>> /* Default authenticated payload timeout 30s */ >> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT 0x0bb8 >> >>> >> >>> +/* Time to keep ACL packets without a corresponding handle queued (2s) >> >>> */ >> >>> +#define PENDING_ACL_TIMEOUT msecs_to_jiffies(2000) >> >>> + >> >> >> >> Do we have some btmon traces with timestamps. Isn’t a second enough? >> >> Actually 2 seconds is an awful long time. >> > >> > When this happens in the test lab, the HCI connect event is about >> > 0.002 second behind the first ACL packet. We can change this if >> > required. >> > >> >> >> >>> struct amp_assoc { >> >>> __u16 len; >> >>> __u16 offset; >> >>> @@ -538,6 +541,9 @@ struct hci_dev { >> >>> struct delayed_work rpa_expired; >> >>> bdaddr_t rpa; >> >>> >> >>> + struct delayed_work remove_pending_acl; >> >>> + struct sk_buff_head pending_acl_q; >> >>> + >> >> >> >> can we name this ooo_q and move it to the other queues in this struct. >> >> Unless we want to add a Kconfig option around it, we don’t need to keep >> >> it here. >> > >> > Ack. >> > >> >> >> >>> #if IS_ENABLED(CONFIG_BT_LEDS) >> >>> struct led_trigger *power_led; >> >>> #endif >> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, >> >>> __le16 ediv, __le64 rand, >> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr, >> >>> u8 *bdaddr_type); >> >>> >> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn >> >>> *conn); >> >>> + >> >>> #define SCO_AIRMODE_MASK 0x0003 >> >>> #define SCO_AIRMODE_CVSD 0x0000 >> >>> #define SCO_AIRMODE_TRANSP 0x0003 >> >>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> >>> index 7959b851cc63..30780242c267 100644 >> >>> --- a/net/bluetooth/hci_core.c >> >>> +++ b/net/bluetooth/hci_core.c >> >>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev) >> >>> skb_queue_purge(&hdev->rx_q); >> >>> skb_queue_purge(&hdev->cmd_q); >> >>> skb_queue_purge(&hdev->raw_q); >> >>> + skb_queue_purge(&hdev->pending_acl_q); >> >>> >> >>> /* Drop last sent command */ >> >>> if (hdev->sent_cmd) { >> >>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct >> >>> notifier_block *nb, unsigned long action, >> >>> return NOTIFY_STOP; >> >>> } >> >>> >> >>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff >> >>> *skb) >> >>> +{ >> >>> + skb_queue_tail(&hdev->pending_acl_q, skb); >> >>> + >> >>> + queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl, >> >>> + PENDING_ACL_TIMEOUT); >> >>> +} >> >>> + >> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn >> >>> *conn) >> >>> +{ >> >>> + struct sk_buff *skb, *tmp; >> >>> + struct hci_acl_hdr *hdr; >> >>> + u16 handle, flags; >> >>> + bool reset_timer = false; >> >>> + >> >>> + skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) { >> >>> + hdr = (struct hci_acl_hdr *)skb->data; >> >>> + handle = __le16_to_cpu(hdr->handle); >> >>> + flags = hci_flags(handle); >> >>> + handle = hci_handle(handle); >> >>> + >> >>> + if (handle != conn->handle) >> >>> + continue; >> >>> + >> >>> + __skb_unlink(skb, &hdev->pending_acl_q); >> >>> + skb_pull(skb, HCI_ACL_HDR_SIZE); >> >>> + >> >>> + l2cap_recv_acldata(conn, skb, flags); >> >>> + reset_timer = true; >> >>> + } >> >>> + >> >>> + if (reset_timer) >> >>> + mod_delayed_work(hdev->workqueue, >> >>> &hdev->remove_pending_acl, >> >>> + PENDING_ACL_TIMEOUT); >> >>> +} >> >>> + >> >>> +/* Remove the oldest pending ACL, and all pending ACLs with the same >> >>> handle */ >> >>> +static void hci_remove_pending_acl(struct work_struct *work) >> >>> +{ >> >>> + struct hci_dev *hdev; >> >>> + struct sk_buff *skb, *tmp; >> >>> + struct hci_acl_hdr *hdr; >> >>> + u16 handle, oldest_handle; >> >>> + >> >>> + hdev = container_of(work, struct hci_dev, remove_pending_acl.work); >> >>> + skb = skb_dequeue(&hdev->pending_acl_q); >> >>> + >> >>> + if (!skb) >> >>> + return; >> >>> + >> >>> + hdr = (struct hci_acl_hdr *)skb->data; >> >>> + oldest_handle = hci_handle(__le16_to_cpu(hdr->handle)); >> >>> + kfree_skb(skb); >> >>> + >> >>> + bt_dev_err(hdev, "ACL packet for unknown connection handle %d", >> >>> + oldest_handle); >> >>> + >> >>> + skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) { >> >>> + hdr = (struct hci_acl_hdr *)skb->data; >> >>> + handle = hci_handle(__le16_to_cpu(hdr->handle)); >> >>> + >> >>> + if (handle == oldest_handle) { >> >>> + __skb_unlink(skb, &hdev->pending_acl_q); >> >>> + kfree_skb(skb); >> >>> + } >> >>> + } >> >>> + >> >>> + if (!skb_queue_empty(&hdev->pending_acl_q)) >> >>> + queue_delayed_work(hdev->workqueue, >> >>> &hdev->remove_pending_acl, >> >>> + PENDING_ACL_TIMEOUT); >> >>> +} >> >>> + >> >> >> >> So I am wondering if we make this too complicated. Since generally >> >> speaking we can only have a single HCI connect complete anyway at a time. >> >> No matter if the controller serializes it for us or we do it for the >> >> controller. So hci_conn_add could just process the queue for packets with >> >> its handle and then flush it. And it can flush it no matter what since >> >> whatever other packets are in the queue, they can not be valid. >> >> >> >> That said, we wouldn’t even need to check the packet handles at all. We >> >> just needed to flag them as already out-of-order queued once and hand >> >> them back into the rx_q at the top. Then the would be processed as usual. >> >> Already ooo packets would cause the same error as before if it is for a >> >> non-existing handle and others would end up being processed. >> >> >> >> For me this means we just need another queue to park the packets until >> >> hci_conn_add gets called. I might have missed something, but I am looking >> >> for the least invasive option for this and least code duplication. >> > >> > I'm not aware of the fact that we can only have a single HCI connect >> > complete event at any time. Is this also true even if two / more >> > peripherals connect at the same time? >> > I was under the impression that if we have device A and B both are >> > connecting to us at the same time, we might receive the packets in >> > this order: >> > (1) ACL A >> > (2) ACL B >> > (3) HCI conn evt B >> > (4) HCI conn evt A >> > Hence the queue and the handle check. >> >> my reading from the LL state machine is that once the first LL_Connect_Req >> is processes, the controller moves out of the advertising state. So no other >> LL_Connect_Req can be processed. So that means that connection attempts are >> serialized. >> >> Now if you run AE and multiple instances, that might be different, but then >> again, these instances are also offset in time and so I don’t see how we can >> get more than one HCI_Connection_Complete event at a time (and with that a >> leading ACL packet). >> >> Regards >> >> Marcel >> >> -- >> You received this message because you are subscribed to the Google Groups >> "ChromeOS Bluetooth Upstreaming" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to chromeos-bluetooth-upstreaming+unsubscr...@chromium.org. >> To post to this group, send email to >> chromeos-bluetooth-upstream...@chromium.org. >> To view this discussion on the web visit >> https://groups.google.com/a/chromium.org/d/msgid/chromeos-bluetooth-upstreaming/7BBB55E0-FBD9-40C0-80D9-D5E7FC9F80D2%40holtmann.org.