Grewal, Ken writes:
> [Ken] This may be feasible for stateful devices, but does not work
> for stateless devices (QOS/Statistics/auditing functions). Even in
> stateful devices, it requires coupling between observation on flows
> and the associated heuristics cache engine, which creates an
> additional overhead.

As the draft says, this is mostly meant for stateful devices, and that
has been the main goal of the document. The charter says:

"A standards-track mechanism that allows an intermediary device, such
as a firewall or intrusion detection system ..."

I.e. the main goal was set to be devices doing deeper inspection,
i.e. firewalls and intrusion detection systems.

At least my conclusion, when we discussed the use cases on the list,
was that they were for that kind of stateful device.

Are QOS and auditing devices really stateless?

I would expect QOS devices to have all kinds of reservation systems
and so on, and I would expect those to be keeping state.

For the auditing I have been using, I have usually enabled auditing of
new connections, not all packets, so all the auditing systems I have
set up have been keeping state. What kind of use are completely
stateless auditing devices put to in your example of 10Gbps links?

Statistics devices could even be stateless, but is that really
something we should aim for? I.e. should we wait 5-10 years before we
can use our stateless statistics devices, rather than use stateful
statistics devices to do the same thing this year or next?


> [Ken] These require timestamps (or other ordering / metric based
> approaches) and monitoring to ensure the cache is up to date.

Stateful devices do already have all that.

> Furthermore, it may also provide opportunities for adversaries to
> use periodic replays to provide cache thrash and associated overhead
> in re-executing heuristics engines.

As far as I have understood, we are still talking about the inside of
one enterprise network, not the Internet as a whole. If they do have
untrusted users inside (i.e. attackers), they should enable encryption,
so this is not really an issue.

As ESP-NULL does not offer confidentiality, it can only be used in
trusted environments, where denial-of-service attacks against the
device in the middle should not be a big problem.

> I am not convinced that SW based solutions will scale 10Gbps
> solutions, let alone future 40/100Gbps bandwidth requirements,
> especially at these network 'choke' points, so a HW orientated
> solution may be desirable...which brings us back to
> cost/complexity...

The limits for software-based heuristics are not really related to
the line speed, but to the number of new IPsec connections per second
going through the device.

The line speed does affect the HW-based flow cache lookup (i.e. the
appendix A.1 fastpath part of the processing), but that is doable even
at 10Gbps, as it basically does the same thing as normal stateful
firewall rules (i.e. fetch flow information based on IP address pair,
protocol and, in this case, SPI number instead of port numbers).
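
A minimal sketch of that kind of flow cache, assuming a plain
in-memory dictionary. The names FlowKey, FlowInfo, FlowCache and the
slowpath callback are hypothetical and not taken from the draft, and
the sketch assumes the heuristics can decide from a single packet,
which a real implementation may not be able to do:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlowKey:
    """Lookup key: like a firewall flow key, but with the SPI instead of ports."""
    src: str
    dst: str
    protocol: int  # 50 = ESP
    spi: int


@dataclass
class FlowInfo:
    """Per-flow result of the heuristics (hypothetical layout)."""
    is_esp_null: bool
    icv_len: int = 0
    iv_len: int = 0


class FlowCache:
    def __init__(self, slowpath: Callable[[FlowKey, bytes], FlowInfo]):
        self._flows: Dict[FlowKey, FlowInfo] = {}
        self._slowpath = slowpath
        self.slowpath_calls = 0

    def lookup(self, key: FlowKey, packet: bytes) -> FlowInfo:
        info = self._flows.get(key)
        if info is None:
            # Cache miss: first packet of a new SPI, so run the heuristics once.
            self.slowpath_calls += 1
            info = self._slowpath(key, packet)
            self._flows[key] = info
        # Cache hit: fastpath, no heuristics needed.
        return info
```

With this split the slowpath cost is paid once per new SPI, while
every later packet on the flow only costs one lookup.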

> >As here the heuristics is run on the same device which is running the
> >deep inspection, they do already require methods of transferring that
> >deep inspection state from one device to another, and moving the IPsec
> >SPI cache state at the same time should not be a problem.
> 
> [Ken] But again, this is additional work, which can be avoided if we have no 
> state.

Yes, it is additional data. You need to transfer 6 more bytes per flow
(SPI + ICV len and IV len) when you transfer the whole deep inspection
state from one device to another (which might include the whole TCP
transmission window, which is around 64kB or so), so the increase in
work is about 0.009% (actually it is normally even less, as TCP state
is per TCP flow, and usually one IPsec flow has multiple TCP flows
inside, but here I took the worst-case scenario, where each IPsec flow
has exactly one TCP flow inside).

I do not consider that to really matter.
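
The percentage can be checked with a few lines. This is only the
arithmetic from the paragraph above; the constant and function names
are made up for the illustration, and 64kB is taken as 64 * 1024
bytes:

```python
EXTRA_BYTES_PER_FLOW = 6       # SPI + ICV length + IV length (from the text)
TCP_WINDOW_BYTES = 64 * 1024   # "around 64kB" of TCP transmission window


def overhead_percent(extra: int, existing: int) -> float:
    """Extra state as a percentage of the state already being transferred."""
    return 100.0 * extra / existing


# Worst case: exactly one TCP flow inside each IPsec flow.
worst_case = overhead_percent(EXTRA_BYTES_PER_FLOW, TCP_WINDOW_BYTES)

# More typical: several TCP flows share one IPsec flow (4 is an arbitrary example).
four_flows = overhead_percent(EXTRA_BYTES_PER_FLOW, 4 * TCP_WINDOW_BYTES)

print(f"{worst_case:.4f}% worst case, {four_flows:.4f}% with 4 TCP flows")
```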

(Doing deep inspection on TCP streams usually requires reconstructing
the TCP stream in full, including dropped and retransmitted packets.
Otherwise there are attacks where you only inspect a packet which never
reaches the end node (the attacker causes it to be dropped), and the
retransmitted packet is different from the first packet; if you let
that pass to the end node, the attacker has managed to get an
uninspected packet through. The solution is that you either do the
retransmissions from your internal data, or you verify that
retransmissions sent by the original node contain the same data as the
original packet. Both of them require you to keep the TCP transmission
window data. This text is here just to explain that doing deep
inspection (for IDS or IPS) on a TCP stream is a very costly operation,
and heuristics have minimal cost compared to it.)
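
To illustrate the second option (verifying retransmissions against
the original data), here is a rough sketch. RetransmitChecker is a
hypothetical name, the per-byte dictionary is chosen for clarity
rather than speed, and sequence number wraparound and ACK tracking
are ignored:

```python
class RetransmitChecker:
    """Remembers the bytes seen per sequence number within a bounded window."""

    def __init__(self, window: int = 64 * 1024):
        self._window = window
        self._seen = {}     # sequence number -> byte value already inspected
        self._highest = 0   # one past the highest sequence number seen

    def accept(self, seq: int, data: bytes) -> bool:
        """Return False if data overlaps earlier bytes but differs from them."""
        for offset, value in enumerate(data):
            old = self._seen.get(seq + offset)
            if old is not None and old != value:
                return False  # retransmission does not match what was inspected
        for offset, value in enumerate(data):
            self._seen[seq + offset] = value
        self._highest = max(self._highest, seq + len(data))
        # Forget bytes that have fallen out of the retained window.
        cutoff = self._highest - self._window
        for old_seq in [s for s in self._seen if s < cutoff]:
            del self._seen[old_seq]
        return True
```

The point of the sketch is the memory cost: the device has to hold up
to a full window of data per TCP flow just to make this comparison.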

> >> Auditing / logging / sniffing / sampling are some examples of
> >> stateless devices that do require to peek in the packets. Probably
> >> lots more also, so look for others to provide examples...
> >
> >As those do not affect the forwarding of the packet, then the
> >reliability requirements for them is much lower, meaning that they can
> >also work without storing any state, and running heuristics for per
> >packet basis. That of course do require implementing the heuristics on
> >the hardware if we are talking about gigabit links or faster. My guess
> >is that without any hardware support and only using software with
> >modern CPU you can probably process more than 100MBit/s, but most
> >likely not full 1GBit/s speed.
> 
> [Ken] Agreed on the need to implementing heuristics in HW, but this
> contradicts the original 'benefit' of heuristics, where heuristics
> engine can be run in SW and a resultant cache entry stored in HW.
> Adding an ever changing set of rules + heuristics engine is a
> non-starter - at least that is the feedback I am getting from the HW
> engineers I have spoken with - mucho complexity + cost!

I explained that IF you REQUIRE heuristics to run without state, then
you most likely need to implement them in hardware if you need high
speeds. I didn't mean to say anybody should try that, because I think
it is cheaper to do it by adding state.

I think doing heuristics in software is completely ok and doable, even
at very high speeds, when you do it the smart way, i.e. use state. I
would expect the following implementations to be usable:

1) Keep state of the SPI information, doing slowpath heuristics on SW,
   and doing fastpath part on HW (any speed).

2) Keep state of the SPI information, doing slowpath heuristics on SW,
   and doing fastpath part on SW (most likely up to 1 GBit/s).

If you insist on not using state, then you can most likely do the full
slowpath and fastpath in SW at up to 100MBit/s, and I do not really
see any point in doing a hardware version, as making version 1 or 2
above makes much more sense.

Take the case where you really require a very high new connection
rate: for example, you have a 10Gbit/s link and its traffic consists
entirely of IKEv2 SA creation (4 packets), TCP session creation (3
packets), 2 data packets, TCP session finish (4 packets) and IKEv2 SA
delete (2 packets), i.e. each IPsec SA only contains one TCP session,
which only exchanges one data packet. This means there are 15 packets
in total, 8 in one direction and 7 in the other. The IKE packets are
around 236 + 236 + 208 + 208 bytes (with minimal packet sizes for
AES-SHA1, 1024 bit MODP, pre-shared key) and 100 bytes each for the
IKEv2 deletes, the TCP SYN/FIN packets are 40 bytes each, and the data
packets are 40+64 bytes each. In total that is 1576 bytes, and with
ethernet framing that makes 2146 octets -> 17168 bits.
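
The totals above can be reproduced as follows. The per-frame ethernet
overhead of 38 octets (preamble, header, FCS and inter-frame gap) is
an assumption; it is the value that yields the 2146 octet figure:

```python
# Assumed per-frame overhead: 8 preamble + 14 header + 4 FCS + 12 inter-frame gap.
ETHERNET_OVERHEAD = 38

packets = (
    [236, 236, 208, 208]   # IKEv2 SA creation (4 packets)
    + [40] * 3             # TCP session creation
    + [40 + 64] * 2        # data packets
    + [40] * 4             # TCP session finish
    + [100] * 2            # IKEv2 SA delete
)

payload_bytes = sum(packets)
wire_bytes = payload_bytes + len(packets) * ETHERNET_OVERHEAD
wire_bits = wire_bytes * 8

print(len(packets), payload_bytes, wire_bytes, wire_bits)
```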

On a 10Gbit/s link that means we can do 582479 such exchanges per
second. For a software implementation using a 2GHz CPU that gives
about 3400 cycles per exchange for doing the heuristics. That should
be plenty, as the heuristics do not really have any loops or other
complicated operations. Verifying one packet takes around 10 loads
from the packet and around 15-20 compares, and we might need to do
that 6 times (for different ICV/IV lengths), so in total we have at
most about 60 loads and 90-120 compares plus some basic arithmetic.
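
The cycle budget follows from simple division. The sketch below only
restates the numbers from the text (17168 bits per exchange, 10Gbit/s,
2GHz, 6 attempts); the variable names are made up:

```python
LINK_BITS_PER_SECOND = 10_000_000_000   # 10Gbit/s link
BITS_PER_EXCHANGE = 17168               # whole 15-packet exchange on the wire
CPU_HZ = 2_000_000_000                  # 2GHz CPU

exchanges_per_second = LINK_BITS_PER_SECOND // BITS_PER_EXCHANGE
cycles_per_exchange = CPU_HZ // exchanges_per_second

ATTEMPTS = 6                            # different ICV/IV length guesses
loads = 10 * ATTEMPTS
compares_low, compares_high = 15 * ATTEMPTS, 20 * ATTEMPTS

print(exchanges_per_second, cycles_per_exchange, loads, compares_low, compares_high)
```

Even if every load and compare took several cycles, the work stays
well below the budget per exchange.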

I would expect a 2GHz CPU to keep up with that rate. A 2GHz CPU
cannot keep up with the 10GBit/s line speed or do routing lookups at
that speed, but if routing and non-heuristics packets are processed by
other hardware, one CPU should be able to take care of the worst-case
heuristics.

So even at 10Gbit/s line speeds I do not think you need a hardware
implementation of the slowpath heuristics part.
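
To give a feel for how small the per-packet work is, here is a
simplified illustration of one kind of check. It is not the algorithm
from the draft: it only looks at the ESP trailer, the set of six
candidate ICV/IV lengths is an assumed example, and all names are
hypothetical:

```python
KNOWN_PROTOCOLS = {1, 6, 17}   # ICMP, TCP, UDP (illustrative subset)
# Assumed candidate set of 6 (ICV length, IV length) guesses.
CANDIDATES = [(icv, iv) for icv in (12, 16, 32) for iv in (0, 8)]


def plausible_esp_null(esp: bytes, icv_len: int, iv_len: int) -> bool:
    """Check whether the trailer looks like ESP-NULL for one (ICV, IV) guess."""
    header_len = 8 + iv_len            # SPI + sequence number + optional IV
    trailer_end = len(esp) - icv_len   # the trailer sits just before the ICV
    if trailer_end - 2 < header_len:
        return False
    pad_len = esp[trailer_end - 2]
    next_header = esp[trailer_end - 1]
    if next_header not in KNOWN_PROTOCOLS:
        return False
    pad_start = trailer_end - 2 - pad_len
    if pad_start < header_len:
        return False
    # RFC 4303 default padding is the byte sequence 1, 2, 3, ...
    return esp[pad_start:trailer_end - 2] == bytes(range(1, pad_len + 1))


def guess_parameters(esp: bytes):
    """Return every candidate that is consistent with the packet's trailer."""
    return [(icv, iv) for icv, iv in CANDIDATES if plausible_esp_null(esp, icv, iv)]
```

A trailer check alone cannot tell different IV lengths apart, so a
real implementation would also have to validate the inner protocol
header before settling on one candidate.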

> [Ken] There will always be a migration path, as well as exception
> cases. It is much easier to add a static rule for a fixed printer,
> then to have dynamic rules to allowing encrypted data to pass
> through the network. Additionally, some of the legacy devices may
> not even support a secure connection, so the traffic will be in the
> clear anyway.

All of the traffic is in the clear, as we are talking about ESP-NULL
here.

My draft offers an even simpler solution if you accept that kind of
solution. It is the mandated-by-policy option: you do not need to run
heuristics at all, you simply assume all packets are ESP-NULL, as that
is mandated by policy, and you handle the exception cases where this
is not true by adding rules...

> E.g. How many printers support IPsec today?

Not sure, but I do know there are such printers out there, and there
will be even more (at least Kyocera has announced printers supporting
IPsec and IPv6:
http://www.fishers-boise.com/Information_Vault/Blogs/e_2496/News/2009/1/New_Kyocera_Printers_Combine_Quality__Low_Cost__and_Environmentally_Friendly_Design.htm).

> If they need to support this in the future, then it is just as easy
> for them to support WESP, instead of ESP-NULL...

Might be, but for PCs you normally do this by updating the operating
system; how often have you updated the OS of your printer? And note
that they already support ESP-NULL, and with heuristics they do not
need to update anything. With WESP they need to update their software.

Most of those vendors do not have their own IPsec; it either comes
with the embedded operating system or it is an OEM toolkit, which
requires them to get a new version of that, and new features usually
only come with new versions (i.e. they are not part of the bug fixes
provided for old versions), so they would need to update to a newer
version of the operating system or toolkit.

As we make an IPsec OEM toolkit, I can say that our customers have
been very reluctant to update to the latest versions after they get
their products out. There are still customers using 4-year-old
versions. Some of them are luckily now looking at updating to the
latest versions for new products. I do not expect them to ever update
their old devices to new versions.

So if we make WESP, it will most likely come out during 2010 (and we
most likely will not make it unless there is customer demand for it
(or for heuristics), which means that it might get delayed by a year
or two), which means that some of our customers might wait until 2015
before they update to that version, and after that it takes a year or
so before they get their devices out.

> [Ken] I think we need to consider all these issues in determining if
> a heuristics solution will work and scale under ALL circumstances.

I do not think heuristics need to work in ALL circumstances. I think
they need to work in the circumstances we are aiming for, and by
charter that is "an intermediary device, such as a firewall or
intrusion detection system".

Do you really consider stateless versions of those intermediary
devices something that will be common in the future?

> By contrast, WESP does not have any of these issues (bar adoption),
> as it is being designed for efficiency, cost effectiveness and
> scalability. 

Adoption is a big issue there.

As my draft says, even if we go for WESP, I think the intermediary
device vendors will still want some solution they can use now, so
they will still need to implement heuristics.
-- 
kivi...@iki.fi
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
