Thanks again for the quick turn-around on this.
Using your proposed 2**(Delay + 10) seems to strike an okay balance, if
I'm understanding the situation correctly. Double-check my thinking
here: an attacker's RAs can reach only a single local link, which
deployments typically limit to on the order of 500 clients. If all 500
are triggered at the same time and spread their requests over a
one-second window, we're looking at a load of roughly 500 TPS on a web
server. That's about 25% of the capacity of a relatively low-end web
server (e.g., Apache running on an Atom 1.66), which seems small enough
to avoid major issues.
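The back-of-the-envelope estimate above can be sketched as follows. This is only an illustration: the 2000 TPS server capacity is a hypothetical figure implied by "500 TPS is about 25%", and the function name is invented for this example. It also computes an average rate; randomly chosen delays would produce somewhat burstier traffic in practice.

```python
def average_request_rate(clients: int, window_seconds: float) -> float:
    """Average requests per second if each client sends one request,
    spread evenly over the window."""
    return clients / window_seconds


clients_on_link = 500         # link size assumed in the estimate above
window_s = 2 ** 10 / 1000     # 2**(Delay + 10) ms with Delay = 0, i.e. 1.024 s
server_capacity_tps = 2000    # hypothetical capacity implied by the 25% figure

rate = average_request_rate(clients_on_link, window_s)
share = rate / server_capacity_tps
print(f"{rate:.0f} requests/s, {share:.0%} of assumed capacity")
```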
So, unless one of my assumptions above is wrong, I think your proposal
below is a good solution to the issue. I'll clear my DISCUSS when a new
version of the draft comes out (I would propose that you wait for
instructions from your AD about when to do so).
/a
On 1/22/20 17:51, Tommy Pauly wrote:
Hi Adam,
Thanks for the feedback! The updated paragraph in the retrieval
section, to indicate a maximum failure count per attachment, is:
If the request for PvD Additional Information fails due to a TLS error,
an HTTP error, or because the retrieved file does not contain valid PvD
JSON, hosts MUST close any connection used to fetch the PvD Additional
Information, and MUST NOT request the information for that PvD ID again
for the duration of the local network attachment. If a host detects 10
or more such failures to fetch PvD Additional Information, the local
network is assumed to be misconfigured or under attack, and the host
MUST NOT make any further requests for PvD Additional Information,
belonging to any PvD ID, for the duration of the local network
attachment. For more discussion, see {{security}}.
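One way a host might track these two limits is sketched below. This is a minimal illustration, not text from the draft: the class and method names are hypothetical, and the sketch assumes the state is discarded and recreated on each new network attachment.

```python
from dataclasses import dataclass, field

MAX_FAILURES_PER_ATTACHMENT = 10  # threshold from the proposed paragraph


@dataclass
class AttachmentFetchState:
    """Hypothetical per-attachment bookkeeping for PvD fetch failures."""
    failed_pvd_ids: set = field(default_factory=set)
    failure_count: int = 0

    def may_fetch(self, pvd_id: str) -> bool:
        # Once 10 or more failures are seen, stop fetching for every PvD ID.
        if self.failure_count >= MAX_FAILURES_PER_ATTACHMENT:
            return False
        # Otherwise, only PvD IDs that already failed are blocked.
        return pvd_id not in self.failed_pvd_ids

    def record_failure(self, pvd_id: str) -> None:
        # Called on a TLS error, an HTTP error, or invalid PvD JSON.
        self.failed_pvd_ids.add(pvd_id)
        self.failure_count += 1
```

A caller would check `may_fetch()` before each request and call `record_failure()` after closing the connection on any of the listed errors.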
I've also expanded the security considerations DoS section as follows:
An attacker generating RAs on a local network can use the H-flag and the
PvD ID to cause hosts on the network to make requests for PvD Additional
Information from servers. This can become a denial-of-service attack, in
which an attacker can amplify its attack by triggering TLS connections
to arbitrary servers in response to sending UDP packets containing RA
messages. To mitigate this attack, hosts MUST:
- limit the rate at which they fetch a particular PvD's Additional
  Information;
- limit the rate at which they fetch any PvD Additional Information on a
  given local network;
- stop making requests for a PvD ID that does not respond with valid
  JSON;
- stop making requests for all PvD IDs once a certain number of failures
  is reached on a particular network.
Details are provided in {{retr}}. This attack can be targeted at generic
web servers, in which case the host behavior of stopping requesting for
any server that doesn't behave like a PvD Additional Information server
is critical. Limiting requests for a specific PvD ID might not be
sufficient if the attacker changes the PvD ID values quickly, so hosts
also need to stop requesting if they detect consistent failure when on a
network that is under attack. For cases in which an attacker is pointing
hosts at a valid PvD Additional Information server (but one that is not
actually associated with the local network), the server SHOULD reject
any requests that do not originate from the expected IPv6 prefix as
described in {{serverop}}.
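The server-side check mentioned in the last sentence could look roughly like the sketch below. It is an assumption-laden illustration: the function name and the way expected prefixes are configured are hypothetical, and the addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress


def request_allowed(client_addr: str, expected_prefixes: list) -> bool:
    """Return True if the client address falls inside any expected prefix.

    `expected_prefixes` is a hypothetical server-side setting listing the
    IPv6 prefixes of the networks this PvD is actually associated with.
    """
    addr = ipaddress.ip_address(client_addr)
    return any(addr in ipaddress.ip_network(prefix) for prefix in expected_prefixes)


expected = ["2001:db8:1::/48"]
print(request_allowed("2001:db8:1::5", expected))   # client inside the prefix
print(request_allowed("2001:db8:2::5", expected))   # client outside the prefix
```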
For the delay calculation, you make a good point that the larger
values get pretty unnecessarily large! I'm a bit concerned about
making the minimum fetch range be ~4 seconds, as that could end up
being user visible for some valid scenarios. How about making the
formula "2**(10 + Delay)":
The target time for the delay is calculated
as a random time between zero and 2**(10 + Delay) milliseconds,
where 'Delay' corresponds to the 4-bit unsigned integer in
the last received PvD Option.
This limits the fastest frequency bound that an RA can request to
about 1 second. This isn't incredibly fast, and with the overall
limits for how many requests can be made by a client (which provide
the larger portion of the DoS prevention, I'd argue), I think this
strikes a good balance between usability and precaution. Thoughts?
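For concreteness, the proposed calculation can be sketched as below. The function names are invented for this example, and a uniform distribution is an assumption; the proposed text only says "a random time between zero and 2**(10 + Delay) milliseconds".

```python
import random


def max_delay_ms(delay: int) -> int:
    """Upper bound of the fetch delay in milliseconds: 2**(10 + Delay)."""
    if not 0 <= delay <= 15:
        raise ValueError("Delay is a 4-bit unsigned integer (0..15)")
    return 2 ** (10 + delay)


def pick_fetch_delay_ms(delay: int, rng=random) -> float:
    """Pick a random delay between zero and the upper bound.

    Uniform sampling is an assumption made for this sketch.
    """
    return rng.uniform(0, max_delay_ms(delay))


print(max_delay_ms(0), "ms minimum bound")
print(max_delay_ms(15) / 3_600_000, "hours maximum bound")
```

With Delay = 0 the bound is 1024 ms, which matches the "1 second" figure above; with Delay = 15 it is 2**25 ms, a bit over 9 hours.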
I've updated the GitHub text for anyone wanting to see the full flow:
https://github.com/IPv6-mPvD/mpvd-ietf-drafts/pull/25
Thanks,
Tommy
On Jan 22, 2020, at 2:58 PM, Adam Roach wrote:
Thanks for the explanation and the further proposed mitigation.
Allowing the RA to specify an arbitrarily small "Delay" parameter
seems to still allow for a pretty big burst of traffic. If I read the
proposed interpretation of the "Delay" bits correctly (2**(Delay *
2)), the current behavior is specified to allow a delay upper bound
selected from one of the following (approximate) values:
* 1 ms
* 4 ms
* 16 ms
* 64 ms
* 256 ms
* 1 second
* 4 seconds
* 16 seconds
* 1 minute
* 4 minutes
* 17 minutes
* 70 minutes
* 4 hours, 40 minutes
* 18 hours, 38 minutes
* 3 days, 3 hours
* 1 week, 5 days
That's a pretty breathtaking scope, and it's hard to imagine that the
first six or so are strictly needed, while all six are in a range
that might overload a DDoS target. The final several seem a bit
questionable as well, given normal operational timelines for network
attachment. If the formula were revised to, e.g., "2**(Delay + 12)"
instead of the current formula, you would have an enforced lower
bound of roughly four seconds (which should be enough to blunt most
DDoS attacks), and an upper bound of roughly 37 hours (which still
seems excessive, although not quite as much as the previous upper bound).
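The bounds quoted above can be reproduced with a short sketch. The helper names are invented for this example; the two formulas are the ones discussed in this message, with Delay ranging over its 4-bit values 0..15.

```python
MS_PER_HOUR = 3_600_000


def current_formula(delay: int) -> int:
    """2**(Delay * 2), the interpretation described above."""
    return 2 ** (delay * 2)


def suggested_formula(delay: int) -> int:
    """2**(Delay + 12), the revision suggested above."""
    return 2 ** (delay + 12)


def delay_bounds_ms(formula):
    """Return the (smallest, largest) upper bound in ms over Delay = 0..15."""
    values = [formula(delay) for delay in range(16)]
    return min(values), max(values)


for name, formula in [("2**(Delay * 2)", current_formula),
                      ("2**(Delay + 12)", suggested_formula)]:
    low, high = delay_bounds_ms(formula)
    print(f"{name}: {low} ms up to {high / MS_PER_HOUR:.1f} hours")
```

The suggested formula yields a lower bound of 4096 ms and an upper bound of 2**27 ms, which is where the "roughly four seconds" and "roughly 37 hours" figures come from.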
Assuming the additional mitigation you propose below (10 maximum
failures per attachment) as well as some means of achieving a
lower-bound for "Delay" on the order of multiple seconds, I think I'm
good clearing when a new version comes out.
Thanks for your work in thinking through practical solutions to this
issue.
/a
_______________________________________________
Int-area mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/int-area