Hi Chris,

I have resolved the issue. It wasn't my mbuf structure but a memory leak in the MSYS_1 pool (caused by my app).

After verifying that the same actions recreated in btshell result in no MSYS_1 memory leak (BTW, SHELL_TASK and its mpool stat is a great tool!), I turned my attention back to my code.

Ha! A couple of hours later I found the guy below in my ble_gattc_read callback:

    static int cb_on_read(..., ..., struct ble_gatt_attr *attr, ...)
    // To *save* time I copy-pasted and inlined the while loop below from
    // the print_mbuf function.
    while (attr->om != NULL) {
        strncpy(p_message, attr->om->om_data, attr->om->om_len);
        p_message += attr->om->om_len;
        attr->om = SLIST_NEXT(attr->om, om_next); // GUY
    }
    // To fix the above, simply use const struct os_mbuf *om = attr->om;
    // before the while loop and operate on a copy of the received
    // attribute's pointer.

I finally have stable, repeated Mynewt app <-> Android communication, even with MSYS_1_BLOCK_COUNT = 8! So happy with it.

Thanks again for the ride this bug turned out to be.

Kind regards,
Łukasz

On Sun, Aug 5, 2018 at 8:22 PM Lukasz Wolnik <lukasz.wol...@gmail.com> wrote:

> Hi Chris,
>
> It's been more than a year but I have finally got some findings.
>
> On another Mynewt-powered device of mine I got the BLE_HS_ENOMEM error,
> this time from ble_gattc_write_flat. I tried changing BLE_GATT_MAX_PROCS
> (down to 2 or up to 8) but with no effect.
>
> My device was keeping a connection alive but stopped receiving any
> communications from its peer. No notifications, reads, writes, etc.
> Luckily it was always happening on the 5th consecutive connection (each
> previous one terminated by my app using ble_gap_terminate(conn_handle,
> BLE_ERR_REM_USER_CONN_TERM)).
>
> By changing MSYS_1_BLOCK_COUNT I started to get the BLE_HS_ENOMEM
> on, respectively:
>
> 8 - 4th connection
> 12 - 5th connection (default value)
> 16 - 6th connection
>
> Yay! This thing is not random at all.
>
> I assume it can't be NimBLE that's not freeing its resources, so I'm
> going to look at my dynamically allocated structure that
> uses os_memblock_get functions.
> I'm going to read up on Mynewt mbufs.
>
> Thank you very much for pointing me in the right direction. I regained
> my faith in the BLE stack. No more panic attacks while pitching my device
> and getting to the demo part :)
>
> Kind regards,
> Łukasz
>
> On Mon, May 15, 2017 at 9:09 PM Łukasz Wolnik <lukasz.wol...@gmail.com>
> wrote:
>
>> Hi Szymon,
>>
>> Thanks for the clarification. I was going to write a queue system for
>> GATT writes/reads but it looks like I should rely on the lower layer and
>> just reconnect if my central app has a problem communicating with
>> peripheral devices.
>>
>> Kind regards,
>> Łukasz
>>
>> On Mon, May 15, 2017 at 9:07 PM, Łukasz Wolnik <lukasz.wol...@gmail.com>
>> wrote:
>>
>>> Hi Chris,
>>>
>>> Thanks a lot for your responses. They are very helpful and are radically
>>> shaping the way I'm going to develop the second version of my app.
>>>
>>> I'm pretty sure, from when I investigated the issue with gdb, that the
>>> problem was not enough GATT procs available. I'll experiment with
>>> MSYS_1_BLOCK_COUNT and BLE_GATT_MAX_PROCS and let you know if increasing
>>> BLE_GATT_MAX_PROCS helped. Thanks a lot for sharing these two config
>>> values. And yes, it'd be great if alongside the error it would tell which
>>> resource is not available and what the current limits are.
>>>
>>> Right, so it's a 30-second, not a 20-second, timeout. My app is a
>>> wearable item and it's crucial for it to be robust. I think what I can do
>>> is manually disconnect a connection handle when I'm not getting a
>>> confirmation within 1 second.
>>>
>>> Kind regards,
>>> Łukasz
>>>
>>> On Mon, May 15, 2017 at 7:06 PM, Christopher Collins <ch...@runtime.io>
>>> wrote:
>>>
>>>> On Mon, May 15, 2017 at 11:01:38AM -0700, Christopher Collins wrote:
>>>> > Hi Łukasz,
>>>> >
>>>> > On Mon, May 15, 2017 at 12:33:59PM +0100, Łukasz Wolnik wrote:
>>>> > > Hello,
>>>> > >
>>>> > > From time to time my ble_gattc_write_flat (run as central) is
>>>> > > timing out after 20 seconds while sending to an Android 6 phone
>>>> > > (in peripheral mode). Is there a way to reduce the timeout to just
>>>> > > 1 second? At the moment if there's an issue with writing, my newt
>>>> > > program has to wait 20 seconds until it can respond to a timeout.
>>>> > >
>>>> > > What's the best strategy here? Keep "bombarding" the peripheral
>>>> > > with multiple writes until receiving the first confirmation? Or
>>>> > > reduce the timeout from 20 seconds (I don't know where this value
>>>> > > is coming from) and resend only when I get an HCI 19 timeout error
>>>> > > in the callback?
>>>>
>>>> Oops, I forgot to respond to your actual question :). Sorry about that.
>>>> The 30-second timeout is hardcoded in the spec, and is currently not
>>>> configurable (Vol. 3, Part F, 3.3.3). It might be useful to make this
>>>> configurable, but the device would not be standards compliant.
>>>>
>>>> Chris
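
For anyone finding this thread later, the fix described in the top message can be sketched as below. This is only an illustration under stated assumptions: sketch_mbuf and sketch_attr are hypothetical, simplified stand-ins for Mynewt's os_mbuf and NimBLE's ble_gatt_attr (the real structures have more fields and are linked with SLIST macros), and the two copy functions are made-up names. The assumption behind the leak is that whoever handed the chain to the callback frees it through attr->om afterwards, so a callback that leaves attr->om set to NULL hides the chain from its owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Mynewt's struct os_mbuf. */
struct sketch_mbuf {
    const uint8_t *om_data;       /* payload of this buffer */
    uint16_t om_len;              /* number of payload bytes */
    struct sketch_mbuf *om_next;  /* next buffer in the chain, or NULL */
};

/* Hypothetical, simplified stand-in for NimBLE's struct ble_gatt_attr. */
struct sketch_attr {
    struct sketch_mbuf *om;       /* head of the chain, owned by the caller */
};

/* Buggy variant: advances attr->om itself. Once the whole chain has been
 * walked, attr->om is NULL and the caller has lost the head pointer. */
static size_t
copy_chain_buggy(struct sketch_attr *attr, uint8_t *dst, size_t dst_len)
{
    size_t copied = 0;

    assert(attr != NULL && dst != NULL);
    while (attr->om != NULL && copied + attr->om->om_len <= dst_len) {
        memcpy(dst + copied, attr->om->om_data, attr->om->om_len);
        copied += attr->om->om_len;
        attr->om = attr->om->om_next;   /* overwrites the caller's pointer */
    }
    return copied;
}

/* Fixed variant: walks the chain through a local cursor, so attr->om still
 * points at the head of the chain when the function returns. */
static size_t
copy_chain_fixed(const struct sketch_attr *attr, uint8_t *dst, size_t dst_len)
{
    const struct sketch_mbuf *om;
    size_t copied = 0;

    assert(attr != NULL && dst != NULL);
    for (om = attr->om; om != NULL; om = om->om_next) {
        if (copied + om->om_len > dst_len) {
            break;                      /* destination full: stop copying */
        }
        memcpy(dst + copied, om->om_data, om->om_len);
        copied += om->om_len;
    }
    return copied;
}
```

With the real structures the same idea would be a local const struct os_mbuf *om = attr->om; advanced with SLIST_NEXT(om, om_next). The sketch also uses memcpy with an explicit bound on the destination instead of strncpy, since the payload is raw bytes rather than a NUL-terminated string; Mynewt's os_mbuf_copydata may be an alternative to a hand-written loop.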