Hi Hannes,
On 2019-02-14, 11:50, "Hannes Tschofenig" <[email protected]> wrote:
Hi Göran,
I will obviously not be able to convince you to change your research
strategy. So, I will not even try.
This is not just a research topic. But if this means that you respect that
different companies may have different strategies, and want to be able to
choose between solutions with different properties, then I'm grateful for
that. Also, thanks for the pointers comparing ARM processors.
Göran
Anyway, thanks for the performance measurements your co-workers compiled in
the Excel sheets. I will take a closer look at them.
One item worth responding to is the choice of MCU. You wrote:
[GS] Nice application of LwM2M. The showcased device didn't seem very
constrained though, ARM Cortex M4?
The Cortex M4 offers a larger instruction set, including DSP/SIMD
capabilities, compared to something like the M0+. You can see the differences
at https://en.wikipedia.org/wiki/ARM_Cortex-M
In this blog post, Chris Shore shows the difference in the instruction sets
graphically:
https://community.arm.com/processors/b/blog/posts/armv6-m-vs-armv7-m---unpacking-the-microcontrollers
With these extra instructions, code can execute faster. Compilers already
take advantage of them, but hand-crafted assembly code gives an extra
performance improvement on top of that.
My co-workers on the Mbed TLS team have written hand-crafted assembly to
speed up bignum computations, see
https://github.com/ARMmbed/mbedtls/blob/development/include/mbedtls/bn_mul.h#L645
https://github.com/ARMmbed/mbedtls/blob/development/include/mbedtls/bn_mul.h#L582
Executing code faster also lets the device enter a low-power state sooner.
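To make that "race to sleep" argument concrete, here is a back-of-envelope
sketch in Python. All power numbers and timings below are illustrative
assumptions, not measurements:

    # "Race to sleep": the same task at two execution speeds, inside a
    # 1-second duty-cycle window. All numbers are illustrative
    # assumptions, not measurements.

    P_ACTIVE_MW = 10.0   # assumed core power while computing (mW)
    P_SLEEP_MW = 0.01    # assumed deep-sleep power (mW)
    WINDOW_S = 1.0       # duty-cycle window (s)

    def window_energy_mj(active_s):
        """Energy for one window: compute, then sleep for the remainder."""
        return P_ACTIVE_MW * active_s + P_SLEEP_MW * (WINDOW_S - active_s)

    print(window_energy_mj(0.20))   # 2.008 mJ - e.g. plain C bignum code
    print(window_energy_mj(0.05))   # 0.5095 mJ - e.g. 4x faster assembly

The longer the device can spend in deep sleep, the lower the average power,
which is why faster crypto code pays off beyond latency.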
Additionally, if you use sensor fusion then having floating point support
in hardware will make your life easier (and the code faster).
Ciao
Hannes
-----Original Message-----
From: Göran Selander <[email protected]>
Sent: Monday, 4 February 2019 18:41
To: Hannes Tschofenig <[email protected]>; [email protected];
[email protected]
Subject: Re: [Secdispatch] FW: [secdir] EDHOC and Transports
Hi Hannes, secdispatch, and ace,
(It seems Hannes' original mail only went to secdispatch.)
Apologies for a long mail and a late response. I had to ask some people for
help with calculations; see the end of this mail.
On 2019-01-25, 15:15, "Secdispatch on behalf of Hannes Tschofenig"
<[email protected] on behalf of [email protected]> wrote:
Fwd to SecDispatch since it was only posted on the SecDir list
-----Original Message-----
From: Hannes Tschofenig <[email protected]>
Sent: Friday, 25 January 2019 14:07
To: Hannes Tschofenig <[email protected]>; Jim Schaad
<[email protected]>; [email protected]
Subject: RE: [secdir] EDHOC and Transports
A minor follow-up: I mentioned that I am aware of a company using
energy-scavenging devices, and it turns out that this information is actually
public; there is even a short video on YouTube. The company we worked with is
called Alphatronics, and here is the video:
https://www.youtube.com/watch?v=JHpJV_CPYb4
As you can hear in the video, we have been using our Mbed OS together
with our device management solution (LwM2M with DTLS and CoAP) for these
types of devices.
[GS] Nice application of LwM2M. The showcased device didn't seem very
constrained though, ARM Cortex M4?
-----Original Message-----
From: secdir <[email protected]> On Behalf Of Hannes Tschofenig
Sent: Friday, 25 January 2019 13:52
To: Jim Schaad <[email protected]>; [email protected]
Subject: Re: [secdir] EDHOC and Transports
[Hannes] What we are doing here is making an optimization. For some
(unknown) reason we have focused our attention on the over-the-wire
transmission overhead (not code size, RAM utilization, or developer
usability*).
[GS] Exactly my point: it is not enough to reduce transmission overhead. We
should also look at additional memory, flash, and configuration effort.
These parameters are of course implementation dependent, but can to some
extent be inferred from the bulk of the specification and from what
pre-existing code can be reused.
[Hannes] We are doing this optimization mostly based on what other people
tell us rather than on our own experience. The problem is that we have too
few people with hands-on knowledge and/or deployment experience, and those
who have that experience may not like to talk about it. So, we are stepping
around in the dark, chasing mostly perceived problems.
[GS] I don't think this rhetoric is very helpful. Who is "us"? The
co-workers you quote below, are they "us" or the "other people"? The people
active in 6tisch, lpwan or 6lo who are supporting the work on an optimized
key exchange, are they "us" or the "other people"?
[Hannes] Having said that I would like to provide a few remarks to your
list below:
[Jim] 1. Low-power devices that are either battery based or scavenge power
pay a power penalty for every byte of data sent, and thus want the smallest
messages possible.
[Hannes] Low power is a very complex topic, since it is a system issue, and
boiling it down to the transmission overhead of every byte is an
oversimplification. You are making certain assumptions about how the power
consumption of radio technologies works, which will be hard to verify. I
have been working on power measurements recently (but only focused on the
power consumption of crypto, see
https://community.arm.com/arm-research/b/articles/posts/testing-crypto-performance-and-power-consumption).
[GS] These kinds of power measurements of crypto are part of the explanation
for why transmission overhead is important to reduce. Optimizations and
hardware support make the crypto contribution to power consumption
manageable, so there is no reason to deviate from current best practice
crypto in security protocols, even for constrained devices. The energy cost
of transmission, however, is strongly coupled to the laws of physics, which
set a limit on how much it can be optimized.
[Hannes] I doubt that many people on this list, or in the IETF at large,
have enough experience in this field to use it as a basis for an
optimization.
[GS] There are people in 6tisch, lpwan and 6lo who know about power
consumption and constrained device characteristics. Some of them were
supporting EDHOC in ACE when you were chair.
[Hannes] My co-workers, who are active in this space, tell me that there is
nothing like a "per byte" linear relationship (for small quantities of data)
in terms of energy cost. Obviously, if you trigger "an additional
transmission", which requires you to ramp up a PLL, turn on radio
amplifiers, send lengthy preambles, etc., then the incremental cost of
sending 64 bytes in that packet vs 16 bytes might be immeasurably small. The
critical thing appears to be how long the RF amplifiers are powered on.
Hence, you will often see publications telling you that waiting for incoming
packets is actually the most expensive task (in terms of power consumption).
[GS] Energy consumption generally increases with message overhead in
wireless systems. The exact function differs between radio technologies,
data rates, etc. Even if we pick a particular technology like 6tisch,
LoRaWAN or NB-IoT, events like packet loss and retransmission impact the
result. So indeed, this is complicated, but we can still make general claims
as well as estimates for particular technologies. I asked a colleague to
make some power consumption estimates for NB-IoT devices. NB-IoT uses
licensed spectrum, which implies that the devices are allowed to transmit at
a higher power than in unlicensed spectrum. It also means that the
application provider in general does not control how good the coverage is,
since that depends on the location of the base station and the environment.
A comparison [3] between DTLS 1.3 and EDHOC is given at the end of this
mail, but since you mentioned the incremental cost of a device sending 64 vs
16 bytes: the difference is indeed measurable, 992 mJ vs 479 mJ, i.e. half a
Joule of difference in a case of low coverage (see [3]).
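To see how both observations can hold at once, here is a toy per-packet
energy model in Python. The parameter values (rates, powers, fixed cost) are
made-up assumptions for illustration only; the actual NB-IoT estimates are
in the spreadsheet [3]:

    # Toy per-packet energy model: a fixed ramp-up/preamble cost plus
    # airtime at transmit power. All parameter values are made-up
    # assumptions for illustration; the real NB-IoT model is in [3].

    def packet_energy_mj(payload_bytes, rate_bps, p_tx_mw, fixed_mj):
        airtime_s = payload_bytes * 8 / rate_bps
        return fixed_mj + p_tx_mw * airtime_s

    # Fast radio, good coverage: the fixed cost dominates, so 64 vs 16
    # bytes barely differ -- the point made above.
    print(packet_energy_mj(16, 250_000, 60.0, 5.0))   # ~5.0 mJ
    print(packet_energy_mj(64, 250_000, 60.0, 5.0))   # ~5.1 mJ

    # Low effective data rate (e.g. low coverage with repetitions): the
    # per-byte term dominates, so message size matters a great deal.
    print(packet_energy_mj(16, 300, 500.0, 5.0))      # ~218 mJ
    print(packet_energy_mj(64, 300, 500.0, 5.0))      # ~858 mJ

Which regime a device sits in depends on the radio technology and the
coverage, which is exactly why the per-technology estimates matter.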
[GS] About the cost of listening: there are different techniques for
decreasing listening time, like time slots, DRX, etc. These are examples of
where the radio guys can be innovative and make optimizations, in contrast
to transmission overhead for security, where they just have to accept what
the security people decided.
[Jim] 2. CoAP over SMS: SMS has a 140-byte packet size. There are two
approaches for dealing with packets larger than 140 bytes: 1) there is a
method of appending multiple packets together to form a single larger
packet; 2) you can use CoAP blockwise transfer. Using CoAP blockwise would
result in 128-byte packets for the underlying transfer, assuming that only
12 bytes are needed for the CoAP header itself.
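A quick sketch of that blockwise arithmetic (the helper and the message
sizes are hypothetical; the 128-byte block payload follows from the numbers
above):

    import math

    # SMS segments needed for a message, assuming the figures above: a
    # 140-byte SMS carrying a 128-byte block after a 12-byte CoAP header.
    def sms_count(message_bytes, block_payload=128):
        return math.ceil(message_bytes / block_payload)

    print(sms_count(120))   # 1 SMS - a hypothetical message that fits
    print(sms_count(300))   # 3 SMS - a hypothetical larger message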
[Hannes] It turns out that CoAP over SMS is rarely used for delivering data
from IP-based devices, since SMS is a pretty expensive transport. From my
work in the OMA I know that people use SMS to trigger the wake-up of devices
and then switch to regular data transmission over IP. IMHO, optimizing for
use cases that barely anyone uses appears to be a waste of time.
[GS] I strongly disagree with the general argument that what is currently
deployed is the only thing worth working on. One problem with this type of
argument is that it reinforces the existing limitations and becomes a
self-fulfilling prophecy. The fact that key exchange protocol messages
currently do not fit into an SMS contributes to the reason why SMS is not
used much. More SMSs also add to the cost, but the cost depends on the
agreement with the operator, so it is not necessarily a hard limitation. Who
are we to predict what technology will be used, given a more efficient key
exchange protocol? For EDHOC with PSK or RPK, each message fits into one
SMS.
[Jim] 3. 6LoWPAN over IEEE 802.15.4: This has a packet size of 127 bytes.
The maximum frame overhead is 25 bytes, allowing for 102 bytes of message
space. If one assumes 20 bytes of overhead for CoAP, then this means a
protocol packet size of 82 bytes. If one needs to break the message across
multiple packets, then the maximum data size is going to be 64 bytes using
the CoAP blockwise options.
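The same per-frame budget, as a trivial code sketch (only the CoAP overhead
figure is an assumption carried over from the text above):

    # Per-frame payload budget from the figures above.
    PHY_PAYLOAD = 127      # IEEE 802.15.4 packet size (bytes)
    FRAME_OVERHEAD = 25    # maximum frame overhead
    COAP_OVERHEAD = 20     # assumed CoAP overhead

    budget = PHY_PAYLOAD - FRAME_OVERHEAD - COAP_OVERHEAD
    print(budget)          # 82 bytes available without fragmentation
    # CoAP blockwise block sizes are powers of two (16, 32, 64, ...),
    # so the largest block that fits in one frame is 64 bytes.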
[Hannes] For some reason there seems to be a worry that a small MTU size at
the link layer will cause a lot of problems. Some radios do have such a
small MTU size; IEEE 802.15.4 and Bluetooth Low Energy are among them. It
turns out, however, that intermediate layers offer fragmentation and
reassembly support, so the layers above just don't get to see any of this.
For IEEE 802.15.4 this fragmentation and reassembly support is offered by
6lowpan, and in the case of Bluetooth Low Energy the link layer actually
consists of various sub-protocols, one of which offers fragmentation and
reassembly. As such, the problem you describe is actually not a problem.
There is no reason why you always have to put a single application layer
payload into a single link layer frame. We have been using LwM2M (which
uses DTLS and CoAP) over IEEE 802.15.4 networks successfully in big
commercial deployments, and we have not run into problems with the smaller
MTU size at the lower layers.
[GS] I'm happy to hear you don't experience any problems, but MTU sizes do
matter. If message overhead at a higher layer causes fragmentation at a
lower layer, then instead of powering up the radio and sending the physical
preamble once, it will be necessary to do so once per fragment, at the next
transmission opportunity at the MAC layer. On top of this, wireless links
can be quite lossy, particularly with low-power radios like those used with
6tisch. For example, the Packet Delivery Ratio (PDR) you will typically find
indoors with 802.15.4 radios is 60-80% [1]. Now, when you go from a single
frame to multiple fragments, the probability that at least one fragment is
lost grows quickly with the number of fragments (the probability that all n
fragments arrive is PDR^n), and the message then needs to be retransmitted.
It often happens that the endpoint performing the reassembly of the
fragments just drops the whole thing when one of the fragments gets lost.
This then results in retransmission of all fragments by the sending
endpoint, their link-layer retransmissions, etc., all employing the costly
radio operations that you describe. Having this handled by a "lower layer"
only means that the application developer does not have to handle it; the
energy penalty for the system does not go away!
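A small sketch of that effect, assuming independent frame losses and no
link-layer retransmissions (a simplification; real links retransmit, at an
energy cost):

    # Probability that a fragmented message arrives intact, assuming
    # each frame is delivered independently with probability PDR.

    def delivery_prob(pdr, n_fragments):
        return pdr ** n_fragments

    print(delivery_prob(0.7, 1))   # 0.7   - one frame at PDR 70% [1]
    print(delivery_prob(0.7, 4))   # ~0.24 - same message in 4 fragments
    # If the reassembling endpoint drops everything when one fragment is
    # lost, ~76% of attempts then require resending all four fragments.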
[GS] Fragmentation also adds latency in several ways. For example, LoRaWAN
operates in an unlicensed band (868 MHz in Europe) under a 1% duty cycle,
meaning that after each transmission the device must wait an interval 100
times as long as the message sending time before it is allowed to transmit
again. LoRaWAN is currently PSK based, and this is one example where a key
exchange protocol would improve the overall security, both in the PSK and
the RPK case; see [2] for an analysis using EDHOC with PSK ECDHE.
[GS] A comparison [4] of time on air between the DTLS 1.3 handshake and
EDHOC is given at the end of this mail. Since the maximum MTU for LoRaWAN is
242 bytes, the DTLS handshake with RPK ECDHE does not even fit and would
require some fragmentation scheme (plus the 100 times additional delay).
Depending on radio conditions, the higher data rates associated with
242-byte frames may incur too much packet loss, requiring the use of a lower
data rate with an associated lower frame size and even more severe message
overhead restrictions to avoid fragmentation.
[Hannes] When it comes to energy-scavenging devices it becomes even more
challenging, since this is a more rarely used case. I know about one company
doing this, and I have spoken with a researcher at last year's Arm Research
Summit who showcased one device. The device shown by the researcher was a
prototype and didn't use any Internet protocol, nor any security mechanism.
I wouldn't call myself knowledgeable enough to optimize a system based on
this experience, but maybe you have more expertise in this field. I am happy
to learn more.
[GS] As mentioned in my previous mail, the scope of this work is about
optimizing security for deployments that can support some kind of CoAP stack,
e.g. CoAP/UDP/IP or CoAP over some link technology.
[Hannes] The handshake itself is just a very small part of the overall
amount of data that gets transmitted during the lifetime of the device,
since the handshake obviously happens extremely rarely.
[GS] How often a handshake is invoked is application dependent; it could for
example be the result of the device needing to power off, or of the device
rebooting. If one handshake consumes as much energy as months of normal
operation, then this contribution may well be noticeable in the lifetime of
the battery.
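As a rough illustration, here is a sketch combining the low-coverage RPK
handshake energies from [3] with an assumed battery; the battery size and
the rekeying period are assumptions for illustration only, not part of [3]:

    # Handshake energy against an assumed battery budget.
    BATTERY_J = 2.0 * 3.6 * 3600     # assumed 2 Ah cell at 3.6 V = 25920 J
    DTLS_RPK_LOW_J = 4.326           # DTLS 1.3, RPK ECDHE, low coverage [3]
    EDHOC_RPK_LOW_J = 1.677          # EDHOC, RPK ECDHE, low coverage [3]

    def handshakes_per_battery_percent(handshake_j):
        """How many handshakes consume 1% of the battery."""
        return (BATTERY_J / 100) / handshake_j

    print(handshakes_per_battery_percent(DTLS_RPK_LOW_J))   # ~60
    print(handshakes_per_battery_percent(EDHOC_RPK_LOW_J))  # ~155
    # With daily rekeying, the DTLS 1.3 handshakes alone would consume
    # roughly 6% of this battery per year (365 * 4.326 J / 25920 J).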
[Hannes] There are much better ways to optimize traffic and you obviously
have to look at all the data you are transmitting for the device.
[GS] How much further optimization you can do is application dependent, and
for some applications security overhead matters.
Ciao
Hannes
*: In my experience, the difficulty for developers to easily use any of the
performance optimization techniques is the biggest barrier to gaining
performance. Of course, this does not fit nicely into any of the
standardization efforts in the IETF, so the focus has to be somewhere else.
[GS] The need for performance optimizations depends on the design of the
protocol, so there are definitely efforts in the IETF which can make life
easier for developers.
[GS] Now for the comparisons:
NB-IoT
======
Calculations of energy consumption for NB-IoT, comparing the EDHOC and
DTLS 1.3 handshakes, are given in [3].
PSK + ECDHE (normal coverage)
----------------
DTLS 1.3 handshake: 47 mJ
EDHOC: 19 mJ
PSK + ECDHE (low coverage)
----------------
DTLS 1.3 handshake: 2992 mJ
EDHOC: 912 mJ
RPK + ECDHE (normal coverage)
----------------
DTLS 1.3 handshake: 64 mJ
EDHOC: 29 mJ
RPK + ECDHE (low coverage)
----------------
DTLS 1.3 handshake: 4326 mJ
EDHOC: 1677 mJ
We see that the factor 4 in message overhead with PSK ECDHE between the
DTLS 1.3 handshake and EDHOC (appendix E of EDHOC) translates into a factor
2.5-3.3 in energy consumption for an NB-IoT device, depending on coverage.
Analogously, the factor 3 in message overhead with RPK ECDHE translates into
a factor 2.2-2.6 in energy consumption.
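For reference, those factors can be recomputed directly from the numbers
above:

    # Recomputing the energy ratios quoted above from the [3] figures (mJ).
    cases = [
        ("PSK normal coverage", 47, 19),
        ("PSK low coverage", 2992, 912),
        ("RPK normal coverage", 64, 29),
        ("RPK low coverage", 4326, 1677),
    ]
    for name, dtls, edhoc in cases:
        print(f"{name}: {dtls / edhoc:.1f}x")
    # PSK: 2.5x / 3.3x, RPK: 2.2x / 2.6x -- the factors cited above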
LoRaWAN
======
Calculations of time on air for the EDHOC and DTLS 1.3 handshakes over
LoRaWAN are given in [4].
PSK + ECDHE
----------------
DTLS 1.3
Message #1: 564 ms
Message #2: 574 ms
Message #3: 226 ms
EDHOC:
Message #1: 195 ms
Message #2: 205 ms
Message #3: 113 ms
RPK + ECDHE
-----------------
DTLS 1.3: N/A without fragmentation scheme
EDHOC:
Message #1: 184 ms
Message #2: 389 ms
Message #3: 297 ms
As mentioned above, time on air is an important property for LoRaWAN
deployments, since it relates to both power consumption and latency, in
particular due to duty cycles.
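Summing the per-message figures from [4] and applying the 1% duty cycle rule
mentioned earlier gives a rough feel for the end-to-end impact (a sketch; it
ignores processing time and downlink receive windows):

    # Total time on air per handshake from [4], plus the ~100x off-time
    # implied by the 1% duty cycle in the EU 868 MHz band.

    dtls_psk  = [564, 574, 226]   # ms per message, from [4]
    edhoc_psk = [195, 205, 113]
    edhoc_rpk = [184, 389, 297]

    for name, msgs in [("DTLS 1.3 PSK", dtls_psk),
                       ("EDHOC PSK", edhoc_psk),
                       ("EDHOC RPK", edhoc_rpk)]:
        toa_ms = sum(msgs)
        # each transmission blocks further sending for ~100x its airtime
        print(f"{name}: {toa_ms} ms on air, "
              f"~{toa_ms / 10:.0f} s of duty-cycle wait")
    # DTLS 1.3 PSK: 1364 ms on air, ~136 s of duty-cycle wait
    # EDHOC PSK:     513 ms on air,  ~51 s of duty-cycle wait
    # EDHOC RPK:     870 ms on air,  ~87 s of duty-cycle wait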
Summary
=======
There is a lot that speaks in favor of low message overhead, for example:
* Lower transmission energy, which matters in particular where the per-byte
cost is high, e.g. in licensed spectrum
* Lower latency, in particular due to duty cycles in LoRaWAN
* Better fit into MTUs, with less fragmentation and associated overhead
* Smaller probability of packet loss
The comparisons presented here show that DTLS 1.3 is far from optimal. Let
me reiterate that this should not be interpreted as criticism of TLS/DTLS.
We are targeting applications in constrained environments, which the TLS
handshake was explicitly not designed to optimize for. We agree that for
many IoT applications the performance of the handshake is adequate, so there
is no need to change DTLS. We also agree that message overhead is only one
aspect, and it is really important to look at other aspects such as memory,
code footprint and usability, all of which speak in favor of a protocol with
limited functionality that reuses code already existing in the devices, such
as CBOR and COSE. For certain application providers, current IETF protocols
are prohibitive in one or more of these aspects, and unless the performance
is drastically improved, some still consider (in 2019) skipping end-to-end
security (e.g. terminating security in a gateway), designing their own
security protocol, or using more pragmatic key exchange constructions like
Noise [5].
I would like to leave the comparison exercise soon and focus on the security
properties. I hope we have made the point that constrained characteristics
matter. Can the IETF support work on a key exchange protocol designed for
the constrained IoT, or are we restricted to retrofitting some other
protocol with other design goals?
Göran
[1] Muñoz, Jonathan, et al., "Why Channel Hopping Makes Sense, even with
IEEE 802.15.4 OFDM at 2.4 GHz", 2018 Global Internet of Things Summit
(GIoTS), IEEE, 2018.
[2] Sanchez-Iborra, Ramon, et al., "Enhancing LoRaWAN Security through a
Lightweight and Authenticated Key Management Approach", Sensors, 2018
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6021899/
[3] NB-IoT power consumption comparison EDHOC-DTLS 1.3
https://github.com/EricssonResearch/EDHOC/blob/master/docs/NB%20IoT%20power%20consumption.xlsx
[4] LoRaWAN Time-of-Air comparison EDHOC-DTLS 1.3
https://github.com/EricssonResearch/EDHOC/blob/master/docs/LoRaWAN_ToA.xlsx
[5] The Noise Protocol Framework
http://www.noiseprotocol.org/