> On Nov 7, 2016, at 8:00 AM, Christopher Collins <[email protected]> wrote: > > Hello all, > > Will found a bug in the host, which is probably (hopefully!) the only > bug here. I plan on testing and committing a fix for this bug later > today. In the meantime, here is the bug: > > /* (ble_hs_heartbeat_sched(), ble_hs.c) */ > > /* Reset heartbeat timer if it is not currently scheduled or if the > * specified time is sooner than the current expiration time. > */ > if (!os_callout_queued(&ble_hs_heartbeat_timer.cf_c) || > OS_TIME_TICK_LT(ticks_from_now, ble_hs_heartbeat_timer.cf_c.c_ticks)) { > // ^^^^ ^^^^ > > ble_hs_heartbeat_timer_reset(ticks_from_now); > } > > The mistake is marked with a // comment above. A relative time > (ticks_from_now) is being compared to an absolute time (cf_c.c_ticks), > which of course yields nonsense. I will need to double-check, but I > think this issue is not caught by the any unit tests because each test > starts at time 0, eliminating the distinction between absolute and > relative time. It would be good to advance the OS time by some amount > prior to running timer-dependent tests. I will make sure I can get the > appropriate tests to fail before I fix the bug. >
That is a good idea. I’d prefer starting with random value. Or by a timestamp that’s slightly smaller than 0. So you’ll some testing of the wraparound. > Thanks Will and Simon! > > Chris > > On Sun, Nov 06, 2016 at 08:44:40PM -0800, Christopher Collins wrote: >> Hi Simon, > > >> >> On Sun, Nov 06, 2016 at 01:50:42PM -0800, Simon Ratner wrote: >>> Hi devs, >>> >>> I am seeing some variance in the handling of `duration_ms` param to >>> `ble_gap_connect` (on develop). Below are two log traces that demonstrate >>> it. In both cases, duration_ms=1280; the first shows the timeout event >>> (status=13) after 535 ticks (4280 ms), while the second timeout is at a >>> much more respectable 169 ticks (1352 ms). >>> >>> ``` >>> 23352163:[ts=2050146956ssb, mod=4 level=1] GAP procedure initiated: >>> connect; peer_addr_type=1 peer_addr=46:e9:00:75:ce:af scan_itvl=16 >>> scan_window=16 itvl_min=16 itvl_max=32 latency=0 supervision_timeout=256 >>> min_ce_len=16 max_ce_len=768 >>> 23352693:[ts=2054287572ssb, mod=4 level=1] GAP procedure initiated: >>> advertise; disc_mode=2 peer_addr_type=0 peer_addr=none adv_channel_map=0 >>> own_addr_type=0 adv_filter_policy=0 adv_itvl_min=338 adv_itvl_max=338 >>> adv_data_len=28 >>> 23352698:[ts=2054326632ssb, mod=64 level=2] gatt_cli: connection failed; >>> status=13 >>> ``` >>> >>> ``` >>> 23355205:[ts=2073912596ssb, mod=4 level=1] GAP procedure initiated: >>> connect; peer_addr_type=1 peer_addr=46:e9:00:75:ce:af scan_itvl=16 >>> scan_window=16 itvl_min=16 itvl_max=32 latency=0 supervision_timeout=256 >>> min_ce_len=16 max_ce_len=768 >>> 23355333:[ts=2074912596ssb, mod=4 level=1] GAP procedure initiated: >>> advertise; disc_mode=2 peer_addr_type=0 peer_addr=none adv_channel_map=0 >>> own_addr_type=0 adv_filter_policy=0 adv_itvl_min=338 adv_itvl_max=338 >>> adv_data_len=28 >>> 23355374:[ts=2075232888ssb, mod=64 level=2] gatt_cli: connection failed; >>> status=13 >>> ``` >> >> Hmm, I see what you mean. I think this is probably a bug in the host, >> though I don't see any obvious issues in the code. I can think of a few >> other possibilities, but they are proably not all that likely. I want >> to mention them here just so we can rule them out. >> >> 1. Host event processing code is being starved due to tx or rx of too >> many BLE data packets. >> >> 2. Host parent task doesn't get a chance to run for an extended period >> of time because higher priority tasks are continuously busy. >> >> Option 1 concerns data packets only, not events from the controller, >> such as advertising reports. I think this could only be the case if >> your device is sending or receiving notifications at a very high rate >> (e.g., faster than 10 ms). >> >> Option 2 depends on how tasks in the application are prioritized. Does >> your application create a task that could have gotten stuck in a busy >> loop or similar for 30 seconds? >> >> If neither of these sound likely, then it is probably a bug in Mynewt >> that I will need to replicate. I noticed from the log that you are >> connecting and advertising at the same time, which I will try. Is there >> anything else you can say about what your application might have been >> doing when the problem occurred? >> >> Thanks, >> Chris
