+1 for efficiency over genericness for me, every time
Lower power and a shorter delay to sleep mode will always win for me with anything wireless, as long as the timing constraints can be respected.
Kevin

On 23/06/16 22:33, will sanfilippo wrote:
Hello:

I wanted to post a question to the dev list to see if folks had opinions regarding the following topic. As others have stated, “this will be a long and dry email”, so be forewarned…

HAL cputime was developed to provide application developers access to a generic, high resolution timer. The API provided by the HAL allows developers to create “timers” that can be added to a timer queue, and also provides a set of routines to convert “normal” time units to HW timer “ticks”. The timer queue is used to provide applications with a callback that will occur at a given ‘cputime’. The term ‘cputime’ refers to the underlying timebase kept by the HAL. Cputime always counts in tick increments, with the time per tick dependent on the underlying HW timer resolution/configuration.

The main impetus behind creating this HAL was for use in networking stacks. BLE (Bluetooth Low Energy) is a good example of such a stack. The specification requires actions to occur at particular times, and many of these actions are relative to the transmission or reception time of a packet. The cputime HAL provides a consistent timebase for the BLE controller stack to interface to the underlying HW and should provide a handy abstraction when porting to various BLE transceivers/SoCs.

Using the current nimBLE stack (mynewt’s BLE stack) as an example, the stack instantiates cputime using a 1 MHz clock. This means that each cputime tick is 1 usec. This timebase was chosen as it provides enough (more than enough!) resolution for the BLE stack and is in a time unit that is a common factor of any time interval used in the specification. For example, advertising events are in units of 625 usecs and connection intervals are in units of 1250 usecs.

While using a 1 usec timebase has its advantages, there are disadvantages as well. The main drawback is that on some HW this timebase would require use of a higher power timer. For example, the nrf52 has a low power timer (they call it the RTC), but this timer has a minimum resolution of 30.517 usecs as it is based on a 32.768 kHz crystal. In its current incarnation, hal cputime cannot support this timer, as the minimum clock frequency accepted by this HAL is 1 MHz.

So, this (finally!) leads to the question I want to ask the community: how does the community feel about sacrificing “genericness” for “efficiency”? If it were up to me, I would sacrifice genericness for efficiency in a microsecond (forgive the bad pun!) in this case.

Let me go into a bit more detail here. It should be obvious to the reader that there are neat tricks you can play when dividing by a power of 2 (it is a simple shift right). In the case of a 32.768 kHz crystal, each tick is 1/32768 seconds in length (this is where we get the ~30.517 usec tick interval). What I would like to do is have a compile-time definition specifying use of a 32.768 kHz crystal for cputime. How this gets defined is outside the scope of this email; it may be a target variable, something in a pkg.yml file, or a newt feature. With this definition, the API that converts ticks to usecs (and vice versa) does a shift instead of a divide or multiply. On the nrf51 this can lead to quite a large savings in time. Using the C library 64-bit divide routine that mynewt uses, it takes about 60 usecs to perform this divide. When we shift a 64-bit number to perform the divide, this time drops to 4 or 5 usecs (slightly more than an order of magnitude savings!).
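A rough sketch of what such a conversion could look like with a compile-time definition of the tick frequency (the macro and function names below are hypothetical, not the existing hal_cputime API; a 32.768 kHz tick source is assumed):

    #include <stdint.h>

    /*
     * Hypothetical compile-time definition of the cputime tick frequency;
     * how it actually gets defined (target variable, pkg.yml, newt feature)
     * is the open question above. 32768 means the 32.768 kHz RTC.
     */
    #define HAL_CPUTIME_FREQ_HZ     (32768U)

    static inline uint32_t
    cputime_ticks_to_usecs(uint32_t ticks)
    {
    #if (HAL_CPUTIME_FREQ_HZ == 32768U)
        /*
         * 32768 is a power of two: usecs = ticks * 1000000 / 32768, and the
         * divide by 32768 becomes a right shift by 15. On the nrf51 this
         * replaces the ~60 usec library divide with a few-usec shift.
         */
        return (uint32_t)(((uint64_t)ticks * 1000000U) >> 15);
    #else
        /* Generic path: a full 64-bit divide by the tick frequency. */
        return (uint32_t)(((uint64_t)ticks * 1000000U) / HAL_CPUTIME_FREQ_HZ);
    #endif
    }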
Of course, on faster processors or processors that support faster divides this might be a moot point, but for those using the nrf51 it is not.

Now you may say “you could have done the same thing in your current HAL cputime with a 1 MHz clock”. In this case, the routine to “convert” ticks to usecs (and vice versa) would simply return the number passed in (see the sketch after this mail). I would personally like to make this change as well. It seems like quite a big win (and it would save some code space too!).

Comments?

Will
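For the 1 MHz case mentioned above, the same kind of compile-time switch would let both conversion routines collapse to the identity; again a minimal sketch with hypothetical names:

    #include <stdint.h>

    /* Hypothetical: same switch, but with the 1 MHz timebase selected. */
    #define HAL_CPUTIME_FREQ_HZ     (1000000U)

    #if (HAL_CPUTIME_FREQ_HZ == 1000000U)
    /* One tick is one usec, so both conversions just return their argument. */
    static inline uint32_t
    cputime_ticks_to_usecs(uint32_t ticks)
    {
        return ticks;
    }

    static inline uint32_t
    cputime_usecs_to_ticks(uint32_t usecs)
    {
        return usecs;
    }
    #endif

With the identity versions written as static inlines, the compiler can drop the conversion entirely at each call site, which is where part of the code-space saving would come from.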
