I will need to switch over to the nrf52dk to debug, I'll see if I can do
that in the next couple of days.

In the meantime, I tried increasing hci buffers from 4 to 8 and it has
helped somewhat. I am not seeing -1 returns any more, and incoming
connections are less flaky, but I still see that 14. To clarify a point I
missed in my original email, which your question has surfaced: the 14 isn't
a direct return from ble_gap_conn_initiate, but the status code in
BLE_GAP_EVENT_CONNECT (ctxt->connect.status) following said call. That is
why I wasn't sure if it is actually a BLE_HS_ETIMEOUT from the host, or
coming from another part of the stack entirely.

Just returning to the question of hci buffers briefly, it would help me to
really understand the resource requirements if you could very briefly
describe the sorts of things which consume these buffers, and for how long
they are tied up before being released back into the pool? What does
"max_hci_bufs=4" mean in practical terms? Same goes for all other
resources, I guess. As a tangential example, while playing with
multi-connections, it was not obvious until I read the code that in
addition to a connection descriptor, I also needed 3 available link
channels to be able to accept a new connection. I am still not sure why you
might ever set the number of channels to anything other than 3x connections.


On Jun 20, 2016 7:13 PM, "chris collins" <ch...@runtime.io> wrote:

(Btw, sorry if these emails "look annoying"... my main computer is out of
commission, so I have been using the gmail web interface for the last few
days!)

There is no connection between the mbuf settings and the max_hci_bufs
setting.  I don't have a specific max_hci_bufs value in mind; 4 or 5
seems reasonable, but I am not so enthusiastic about this change anymore.
I am pretty sure my theory of what was causing the BLE_HS_ETIMEOUT error is
incorrect, for the following reasons:

1. I was discussing this with Will, and he reminded me that the controller
always reuses the command HCI buf when it sends an acknowledgement.  In
other words, the controller should never fail to allocate an HCI buf when
sending an acknowledgement to the host.

2. The host code *doesn't* return BLE_HS_ETIMEOUT when an acknowledgment is
not received; it returns -1 (another return code bug!).  I simply don't see
any code path which would yield a return code of 14 here.  I hate to ask,
but... are you sure the 14 is coming from ble_gap_conn_initiate()?

I am fairly confident the -1 return code from ble_gap_disc_cancel() is
indeed caused by a hci buffer shortage, but I have a feeling there is some
sort of bug at the root of these issues.  Are you able to debug your
application in gdb?  I am curious about the state of the nimble stack when
you receive the -1 or 14 error codes.  In particular:

# Print state of HCI buffer pool:
p g_hci_os_event_pool

# Print GAP master and slave states:
p ble_gap_master
p ble_gap_slave

If you could capture that information, that would be much appreciated.

Finally, to answer a lingering question that I seem to have consistently
ignored: there should not be any issue with timing.  After the call to
ble_gap_disc_cancel() returns, you can immediately perform another GAP
procedure.

Chris

On Mon, Jun 20, 2016 at 6:08 PM, Simon Ratner <si...@proxy.co> wrote:

> Ok, so those two sound like they might have the same cause. Perhaps
> related to that, I also stop receiving incoming connections after a short
> while, possibly for the same reason, although there is no indication in the
> logs or anywhere else on the mynewt side - the connecting central just sees
> a failed connection.
>
> I am able to process all the advertisement reports just fine when I don't
> attempt to cancel discovery / connect to those discovered peripherals. Is
> it possible that cancellation is somehow causing or exacerbating this; for
> example some reports have already been received but are still being handled
> by the stack at the time discovery is cancelled, they are never reported to
> the app and corresponding buffers never freed? Just guessing here.
>
> I'll try increasing hci buffers, too. Do you have a recommended value for
> max_hci_bufs? What about the mbuf size passed to ble_ll - is it at all
> correlated with host bufs, should they be allocated in certain ratios?
>
>
>
> On Mon, Jun 20, 2016 at 5:55 PM, chris collins <ch...@runtime.io> wrote:
>
> > Hi Simon,
> >
> > Unfortunately I am not able to reproduce that behavior.  However, I think
> > I can answer one of your questions.  Hopefully that will lead to a full
> > solution.
> >
> > That -1 return code is generated when the stack runs out of HCI command /
> > event buffers.  The actual return code is a bug; BLE_HS_ENOMEM should
> > probably be returned instead.  I am a bit puzzled about the cause of the
> > buffer shortage.  You are probably receiving a lot of advertisement
> > reports from the controller.  I wouldn't expect them to be coming in
> > faster than you can handle them, but I suppose that depends on the
> > particulars of your application.  You can try increasing the number of
> > HCI buffers at host initialization time.  This setting is in the host
> > configuration struct, and it is called max_hci_bufs.
> >
> > Regarding the second problem (ble_gap_conn_initiate() returns
> > BLE_HS_ETIMEOUT): I have a guess.  The return code indicates that the
> > controller did not respond to an HCI command in a timely manner.  My
> > guess is that the controller is unable to allocate an HCI buffer due to
> > the shortage.  From looking at the code, it appears we don't have any
> > statistics indicating the number of times an HCI buffer failed to
> > allocate... this is definitely something that should be added.
> >
> > Chris
> >
> > On Mon, Jun 20, 2016 at 5:07 PM, Simon Ratner <si...@proxy.co> wrote:
> >
> > > Thanks Chris, just tried it out and it seems to do the trick -- half of
> > > the time.
> > >
> > > I see two occasional errors:
> > >
> > > 1. Sometimes, ble_gap_disc_cancel returns (-1); any idea under what
> > > circumstances that might happen?
> > >
> > > 2. Sometimes, ble_gap_disc_cancel returns 0 but ble_gap_conn_initiate
> > > immediately afterwards fails with code 14 (ETIMEOUT? unless it's an hci
> > > error?). Is it possible that this is timing-related somehow and the
> > > link layer hasn't switched to the right state yet? Should I delay the
> > > connect attempt for a tick?
> > >
> > > Both of these occur inconsistently; about half the time it just works.
> > >
> > >
> > > On Sat, Jun 18, 2016 at 10:21 PM, chris collins <ch...@runtime.io> wrote:
> > >
> > > > Hi Simon,
> > > >
> > > > Thanks for the heads up; this is definitely an omission.  You should
> > > > be able to cancel a scan in progress.
> > > >
> > > > Barring any unforeseen complications, the cancel functionality should
> > > > be implemented in the develop branch tomorrow.  This will allow the
> > > > app to cancel the scan and initiate a connect procedure from within
> > > > the advertising event callback.
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Sat, Jun 18, 2016 at 7:38 PM, Simon Ratner <si...@proxy.co> wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > Having initiated an undirected scan with ble_gap_disc(), I would
> > > > > like to connect to my peripheral as soon as I spot it in the scan
> > > > > callback. However, calling ble_gap_conn_initiate() at this point
> > > > > fails with BLE_HS_EALREADY, as ble_gap_master is still in discovery
> > > > > mode. I need to stash the discovered peripheral, wait for the scan
> > > > > to finish, and then try to connect, which is unnecessary state
> > > > > management. Additionally, there doesn't seem to be a way to cancel
> > > > > the scan, so this becomes especially problematic if the scan is
> > > > > long-running.
> > > > >
> > > > > For comparison, while advertising, an incoming connection
> > > > > automatically drops the slave out of advertising mode (which can be
> > > > > resumed immediately if you have enough connection resources).
> > > > >
> > > > > Is this an omission, or by design?
> > > > >
> > > > > Cheers,
> > > > > simon
> > > > >
> > > >
> > >
> >
>
