Resending in plain text.

On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud <alainmich...@google.com> wrote:
>
> Hi Marcel,
>
> Sorry, just got around to this.
>
> On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann <mar...@holtmann.org> wrote:
>>
>> Hi Archie,
>>
>> >>> There is a possibility that an ACL packet is received before we
>> >>> receive the HCI connect event for the corresponding handle. If this
>> >>> happens, we discard the ACL packet.
>> >>>
>> >>> Rather than just ignoring them, this patch provides a queue for
>> >>> incoming ACL packet without a handle. The queue is processed when
>> >>> receiving a HCI connection event. If 2 seconds elapsed without
>> >>> receiving the HCI connection event, assume something bad happened
>> >>> and discard the queued packet.
>> >>>
>> >>> Signed-off-by: Archie Pusaka <apus...@chromium.org>
>> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpan...@chromium.org>
>> >>
>> >> so two things up front. I want to hide this behind a 
>> >> HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. 
>> >> Frankly if this kind of out-of-order happens on UART or SDIO transports, 
>> >> then something is obviously going wrong. I have no plan to fix up after a 
>> >> fully serialized transport.
>> >>
>> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want 
>> >> this off by default. You can enable it via an experimental setting. The 
>> >> reason here is that we have to make it really hard and fail as often as 
>> >> possible so that hardware manufactures and spec writers realize that 
>> >> something is fundamentally broken here.
>
> I don't have any objection to making this explicit enable to non serialized 
> transports.  However, I do wonder what the intention is around making this 
> off by default.  We already know there is a race condition between the 
> interupt and bulk endpoints over USB, so this can and does happen.  Hardware 
> manufaturers can't relly do much about this other than trying to pull the 
> interupt endpoint more often, but that's only a workaround, it can't avoid it 
> all together.
>
> IMO, this seems like a legitimate fix at the host level and I don't see any 
> obvious benefits to hide this fix under an experimental feature and make it 
> more difficult for the customers and system integrators to discover.
>
>>
>> >>
>> >> I have no problem in running the code and complaining loudly in case the 
>> >> quirk has been set. Just injecting the packets can only happen if 
>> >> bluetoothd explicitly enabled it.
>> >
>> > Got it.
>> >
>> >>
>> >>
>> >>>
>> >>> ---
>> >>>
>> >>> include/net/bluetooth/hci_core.h |  8 +++
>> >>> net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
>> >>> net/bluetooth/hci_event.c        |  2 +
>> >>> 3 files changed, 88 insertions(+), 6 deletions(-)
>> >>>
>> >>> diff --git a/include/net/bluetooth/hci_core.h 
>> >>> b/include/net/bluetooth/hci_core.h
>> >>> index 836dc997ff94..b69ecdd0d15a 100644
>> >>> --- a/include/net/bluetooth/hci_core.h
>> >>> +++ b/include/net/bluetooth/hci_core.h
>> >>> @@ -270,6 +270,9 @@ struct adv_monitor {
>> >>> /* Default authenticated payload timeout 30s */
>> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
>> >>>
>> >>> +/* Time to keep ACL packets without a corresponding handle queued (2s) 
>> >>> */
>> >>> +#define PENDING_ACL_TIMEOUT          msecs_to_jiffies(2000)
>> >>> +
>> >>
>> >> Do we have some btmon traces with timestamps. Isn’t a second enough? 
>> >> Actually 2 seconds is an awful long time.
>> >
>> > When this happens in the test lab, the HCI connect event is about
>> > 0.002 second behind the first ACL packet. We can change this if
>> > required.
>> >
>> >>
>> >>> struct amp_assoc {
>> >>>      __u16   len;
>> >>>      __u16   offset;
>> >>> @@ -538,6 +541,9 @@ struct hci_dev {
>> >>>      struct delayed_work     rpa_expired;
>> >>>      bdaddr_t                rpa;
>> >>>
>> >>> +     struct delayed_work     remove_pending_acl;
>> >>> +     struct sk_buff_head     pending_acl_q;
>> >>> +
>> >>
>> >> can we name this ooo_q and move it to the other queues in this struct. 
>> >> Unless we want to add a Kconfig option around it, we don’t need to keep 
>> >> it here.
>> >
>> > Ack.
>> >
>> >>
>> >>> #if IS_ENABLED(CONFIG_BT_LEDS)
>> >>>      struct led_trigger      *power_led;
>> >>> #endif
>> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, 
>> >>> __le16 ediv, __le64 rand,
>> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>> >>>                             u8 *bdaddr_type);
>> >>>
>> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn 
>> >>> *conn);
>> >>> +
>> >>> #define SCO_AIRMODE_MASK       0x0003
>> >>> #define SCO_AIRMODE_CVSD       0x0000
>> >>> #define SCO_AIRMODE_TRANSP     0x0003
>> >>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >>> index 7959b851cc63..30780242c267 100644
>> >>> --- a/net/bluetooth/hci_core.c
>> >>> +++ b/net/bluetooth/hci_core.c
>> >>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >>>      skb_queue_purge(&hdev->rx_q);
>> >>>      skb_queue_purge(&hdev->cmd_q);
>> >>>      skb_queue_purge(&hdev->raw_q);
>> >>> +     skb_queue_purge(&hdev->pending_acl_q);
>> >>>
>> >>>      /* Drop last sent command */
>> >>>      if (hdev->sent_cmd) {
>> >>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct 
>> >>> notifier_block *nb, unsigned long action,
>> >>>      return NOTIFY_STOP;
>> >>> }
>> >>>
>> >>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff 
>> >>> *skb)
>> >>> +{
>> >>> +     skb_queue_tail(&hdev->pending_acl_q, skb);
>> >>> +
>> >>> +     queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>> >>> +                        PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn 
>> >>> *conn)
>> >>> +{
>> >>> +     struct sk_buff *skb, *tmp;
>> >>> +     struct hci_acl_hdr *hdr;
>> >>> +     u16 handle, flags;
>> >>> +     bool reset_timer = false;
>> >>> +
>> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +             handle = __le16_to_cpu(hdr->handle);
>> >>> +             flags  = hci_flags(handle);
>> >>> +             handle = hci_handle(handle);
>> >>> +
>> >>> +             if (handle != conn->handle)
>> >>> +                     continue;
>> >>> +
>> >>> +             __skb_unlink(skb, &hdev->pending_acl_q);
>> >>> +             skb_pull(skb, HCI_ACL_HDR_SIZE);
>> >>> +
>> >>> +             l2cap_recv_acldata(conn, skb, flags);
>> >>> +             reset_timer = true;
>> >>> +     }
>> >>> +
>> >>> +     if (reset_timer)
>> >>> +             mod_delayed_work(hdev->workqueue, 
>> >>> &hdev->remove_pending_acl,
>> >>> +                              PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>> +/* Remove the oldest pending ACL, and all pending ACLs with the same 
>> >>> handle */
>> >>> +static void hci_remove_pending_acl(struct work_struct *work)
>> >>> +{
>> >>> +     struct hci_dev *hdev;
>> >>> +     struct sk_buff *skb, *tmp;
>> >>> +     struct hci_acl_hdr *hdr;
>> >>> +     u16 handle, oldest_handle;
>> >>> +
>> >>> +     hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
>> >>> +     skb = skb_dequeue(&hdev->pending_acl_q);
>> >>> +
>> >>> +     if (!skb)
>> >>> +             return;
>> >>> +
>> >>> +     hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +     oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
>> >>> +     kfree_skb(skb);
>> >>> +
>> >>> +     bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
>> >>> +                oldest_handle);
>> >>> +
>> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +             handle = hci_handle(__le16_to_cpu(hdr->handle));
>> >>> +
>> >>> +             if (handle == oldest_handle) {
>> >>> +                     __skb_unlink(skb, &hdev->pending_acl_q);
>> >>> +                     kfree_skb(skb);
>> >>> +             }
>> >>> +     }
>> >>> +
>> >>> +     if (!skb_queue_empty(&hdev->pending_acl_q))
>> >>> +             queue_delayed_work(hdev->workqueue, 
>> >>> &hdev->remove_pending_acl,
>> >>> +                                PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>
>> >> So I am wondering if we make this too complicated. Since generally 
>> >> speaking we can only have a single HCI connect complete anyway at a time. 
>> >> No matter if the controller serializes it for us or we do it for the 
>> >> controller. So hci_conn_add could just process the queue for packets with 
>> >> its handle and then flush it. And it can flush it no matter what since 
>> >> whatever other packets are in the queue, they can not be valid.
>> >>
>> >> That said, we wouldn’t even need to check the packet handles at all. We 
>> >> just needed to flag them as already out-of-order queued once and hand 
>> >> them back into the rx_q at the top. Then the would be processed as usual. 
>> >> Already ooo packets would cause the same error as before if it is for a 
>> >> non-existing handle and others would end up being processed.
>> >>
>> >> For me this means we just need another queue to park the packets until 
>> >> hci_conn_add gets called. I might have missed something, but I am looking 
>> >> for the least invasive option for this and least code duplication.
>> >
>> > I'm not aware of the fact that we can only have a single HCI connect
>> > complete event at any time. Is this also true even if two / more
>> > peripherals connect at the same time?
>> > I was under the impression that if we have device A and B both are
>> > connecting to us at the same time, we might receive the packets in
>> > this order:
>> > (1) ACL A
>> > (2) ACL B
>> > (3) HCI conn evt B
>> > (4) HCI conn evt A
>> > Hence the queue and the handle check.
>>
>> my reading from the LL state machine is that once the first LL_Connect_Req 
>> is processes, the controller moves out of the advertising state. So no other 
>> LL_Connect_Req can be processed. So that means that connection attempts are 
>> serialized.
>>
>> Now if you run AE and multiple instances, that might be different, but then 
>> again, these instances are also offset in time and so I don’t see how we can 
>> get more than one HCI_Connection_Complete event at a time (and with that a 
>> leading ACL packet).
>>
>> Regards
>>
>> Marcel
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "ChromeOS Bluetooth Upstreaming" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to chromeos-bluetooth-upstreaming+unsubscr...@chromium.org.
>> To post to this group, send email to 
>> chromeos-bluetooth-upstream...@chromium.org.
>> To view this discussion on the web visit 
>> https://groups.google.com/a/chromium.org/d/msgid/chromeos-bluetooth-upstreaming/7BBB55E0-FBD9-40C0-80D9-D5E7FC9F80D2%40holtmann.org.

Reply via email to