Hi!

I'm working on InfiniTime, an open source firmware for the PineTime. The PineTime is a FOSS smartwatch based on the nRF52832 MCU. InfiniTime uses NimBLE 1.3 (tag nimble_1_3_0_tag on GitHub) and FreeRTOS.

A few months ago, I reached out to this mailing list for help fixing BLE connection issues, and it has worked mostly fine since then.

Mostly... except when sending a new firmware during the OTA procedure: 'sometimes', the transfer just stops for no apparent reason. When this happens, we have to reset the whole MCU because the BLE stack looks completely frozen: no advertising, no connection, ...

I've finally decided to debug this issue using a BLE sniffer (based on the nRF52-DK), btmon connected to the RTT output of NimBLE, and a logic analyzer.

While the transfer is running, the BLE sniffer shows a 'write' packet from the phone and an empty packet from the watch. When the transfer fails, it looks like the watch does not send an empty PDU between the packets sent by the host.

Btmon shows that everything just stops, without any error. For example:

ACL Data RX: Handle 1 flags 0x02 dlen 27 #238433 18446744073709548371.836900
      ATT: Write Command (0x52) len 22
        Handle: 0x0044
          Data: 3df807200fb030bd9df80430042b01d02046f2e7
ACL Data RX: Handle 1 flags 0x02 dlen 27 #238434 18446744073709548371.837500
      ATT: Write Command (0x52) len 22
        Handle: 0x0044
          Data: dde902231568516892689a6020461d605960e8e7
* Drops: cmd 0 evt 0 acl_tx 1 acl_rx 0 sco_tx 0 sco_rx 0 other 0
ACL Data RX: Handle 1 flags 0x02 dlen 27 #238435 18446744073709548371.883500
      ATT: Write Command (0x52) len 22
        Handle: 0x0044
          Data: 2af05cfa104a9df810301468adf80e5068f30003
ACL Data RX: Handle 1 flags 0x02 dlen 27 #238436 18446744073709548371.884100
      ATT: Write Command (0x52) len 22
        Handle: 0x0044
          Data: 0c22adf80c7002968df810308df8042034b1d4e9

The last entry above is actually the last one logged for the transfer when it failed.

Using the logic analyzer connected to debug pins that I set/clear at specific places in the code, I observed that, when the error occurs, the ll_task just looks frozen: it does not handle any events anymore.

On the attached screenshot, you'll find a capture from the logic analyzer:
 - The first channel (D0) shows the activity of the LL task (1 = processing event).
 - The second channel (D3) is set while waiting for an event from the queue (xQueueReceive()).
 - The third channel (D2) is set in npl_freertos_eventq_put().
 - The last channel (D4) is set in npl_freertos_eventq_remove().
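
To make the instrumentation concrete, here is roughly the idea behind those debug pins. This is only a simplified sketch that runs on a host, not the actual InfiniTime or NimBLE code: debug_pins, debug_pin_set(), debug_pin_clear(), debug_pin_get(), ll_process_event() and sample_handler() are hypothetical names, and the GPIO port is simulated with a plain variable instead of the real nRF52 GPIO peripheral. Only the channel-to-pin mapping comes from the capture described above.

```c
#include <stdint.h>

/* ASSUMPTION: simulated GPIO output port. On the real target this would be
 * a write to the nRF52 GPIO peripheral instead of a plain variable. */
static volatile uint32_t debug_pins;

/* Pin numbers follow the logic analyzer channels listed above. */
enum debug_channel {
    DBG_D0_LL_BUSY       = 0, /* LL task is processing an event */
    DBG_D2_EVENTQ_PUT    = 2, /* inside npl_freertos_eventq_put() */
    DBG_D3_QUEUE_WAIT    = 3, /* waiting in xQueueReceive() */
    DBG_D4_EVENTQ_REMOVE = 4  /* inside npl_freertos_eventq_remove() */
};

/* Hypothetical helpers: drive one debug pin high or low, or read it back. */
static void debug_pin_set(enum debug_channel ch)
{
    debug_pins |= (1u << ch);
}

static void debug_pin_clear(enum debug_channel ch)
{
    debug_pins &= ~(1u << ch);
}

static int debug_pin_get(enum debug_channel ch)
{
    return (int)((debug_pins >> ch) & 1u);
}

/* Hypothetical event callback type, standing in for a real LL event. */
typedef void (*event_fn)(void *arg);

/* Mirrors channel D0: the pin is high for exactly as long as the handler
 * runs, so a handler that never returns shows up as a pin stuck at 1. */
static void ll_process_event(event_fn fn, void *arg)
{
    debug_pin_set(DBG_D0_LL_BUSY);
    fn(arg);
    debug_pin_clear(DBG_D0_LL_BUSY);
}

/* Example handler: records the D0 state it observes while it is running. */
static void sample_handler(void *arg)
{
    *(int *)arg = debug_pin_get(DBG_D0_LL_BUSY);
}
```

With this pattern, a pin that stays high on the analyzer points at the instrumented section that was entered but never left, while a pin that stays low means that section was never reached again.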

Most of the time, I can see short bursts of 9 events (like the one on the left of the picture). When it fails, there are fewer than 9 events, and you can see that the last 'put' took more time to run than the previous ones.

It looks like a deadlock occurs somewhere. Is it in my integration of NimBLE? In the FreeRTOS port part of NimBLE? Or maybe a bug in NimBLE or FreeRTOS?

I don't really know where to look next. Any suggestions on how to debug this issue?

Thanks

JF
