On 11/Jul/11 08:29, Hal Murray wrote:
>> I've been wondering about such a period.  If we use the record's
>> Time-To-Live (TTL) we can specify stateless reporting like so:
>
> Piggybacking on the TTL field seems like a bad idea.  A big system
> might be loafing at X reports per second while the same load could
> kill a small system or saturate a smaller link.  So you have to
> distribute something like a scale factor.  I'm assuming that would
> be done over DNS.  At that point you might as well distribute the
> real data.

I assume you mean "/receiving/ X reports per second".  I reckon X is
proportional to the size of the domain anyway: a large domain sends
much mail, part of which may cause authentication failures, and it is
more heavily phished than a small domain.

The current ri parameter provides for setting a /linear/ scale factor,
whereby a domain can say it is only interested in getting, say, 25% of
failure reports.  Such linear behavior can be achieved in a stateless
manner by drawing a random R in [0, 1] and sending only if R <= 0.25.
Linear and exponential cutoffs don't have to be mutually exclusive.
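For concreteness, the stateless linear cutoff fits in a couple of
lines of Python (the function name is made up for illustration):

    import random

    def should_report(scale):
        # Stateless linear cutoff: each reporter independently draws
        # R in [0, 1] and sends only if R <= scale, so a domain that
        # publishes scale = 0.25 receives roughly 25% of all failure
        # reports without anyone keeping state.
        return random.random() <= scale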
> --------
>
>> On diagnosing a failure, the agent generates a random number R in
>> the interval [0, 1] (or sets R=0.5).  It then computes a value P
>> [...]  If P >= R, then the agent generates and sends the report,
>> otherwise does nothing.
>
>> P may be computed so as to be near to 1 for newly retrieved
>> records and then decreasing more or less rapidly, according to
>> the value of ri.
>
> What are the goals of this section?

Since the reporter does not know which failures might be important for
a domain, varying the criteria by which reports are discarded may
improve the chances of reporting something useful.

> I assume the main idea is to avoid overloading (DoS) the receiving
> system.  There are two parts to that.  How many reports are coming
> from each system, and how many systems are contributing to the
> overall load.

Yes.  Each contributing system looks up the DNS record when a message
arrives.  Larger systems will use the cached copy of that record
repeatedly until it expires, since they receive a nearly continuous
stream of messages; a small system may use it just once.  A sharp
Gaussian probability centered on the TTL can thus increase the
relative visibility of the latter.
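As a sketch of one possible shape for P (the width and the exact curve
are guesses of mine, not anything the draft pins down), read "centered
on the TTL" as: P peaks when the cached record's remaining lifetime
still equals the full TTL, i.e. right after retrieval, and falls off
sharply as the copy ages:

    import math
    import random
    import time

    def report_probability(fetched_at, ttl, sigma=None):
        if sigma is None:
            sigma = 0.2 * ttl  # "sharp": narrow relative to the TTL
        # Remaining lifetime of the cached record, in seconds.
        remaining = ttl - (time.time() - fetched_at)
        # Gaussian centered on the full TTL: P is near 1 for a newly
        # retrieved record and drops quickly as the cached copy ages,
        # so a small system that fetches the record afresh for almost
        # every message stays visible, while a busy system reusing an
        # old cached copy is throttled.
        return math.exp(-0.5 * ((remaining - ttl) / sigma) ** 2)

    def should_report(fetched_at, ttl):
        # As in the quoted text: draw R in [0, 1] and send the report
        # only if P >= R.
        return report_probability(fetched_at, ttl) >= random.random()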
> I like the idea of an exponential backoff.  What are the appropriate
> parameters?  What data would the sending system need in order to do
> the right thing?

We should do some simulations to ascertain that.

> Should this type of reporting be moved to a separate socket or
> separate IP address?  (so a TCP level reject/timeout can be used to
> trigger the backoff)

I don't think so; that would be a rather blind trigger.

> ---------
>
> Would it help to batch the data (at the report stage)?  If you are
> the receiving system, what fraction of your CPU/whatever resources
> are spent processing the connection vs processing the data for a
> "report" transmitted over that connection?  If I have 100 reports
> per hour, would you like to get them batched in one message rather
> than 100 separate messages?

Yes, I certainly would.  Indeed, this is what is currently being
specified.  However, exactly one message has to be attached to the
report.  For the other failures the reporter can only supply a count,
implying they are "similar" in some sense.  Does that mean they all
had the same Auth-Failure type?  The same local-part?  Did each
failure occur today?  How are multiple recipients counted?  I'd never
know.

Computing a probability, however complicated it may seem, can be done
in a few lines of code and yields coherent results across most modern
systems.  Batching requires more implementation effort and more CPU
cycles for the reporter, as it implies maintaining tables indexed on
domain names.  I think that mandating such behavior would exclude more
contributing sites than the stateless approach would.
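To illustrate the kind of state batching implies, here is a rough
sketch of a per-domain table (all names are hypothetical, and the
flush policy is just one possible choice):

    import time
    from collections import defaultdict

    class BatchTable:
        """Per-domain batches: one attached sample message plus a
        count of 'similar' failures, flushed after a fixed interval."""

        def __init__(self, flush_interval=3600):
            self.flush_interval = flush_interval
            self.batches = defaultdict(
                lambda: {"sample": None, "count": 0, "first_seen": None})

        def add_failure(self, domain, message):
            batch = self.batches[domain]
            if batch["sample"] is None:
                # Only one message can be attached to the report.
                batch["sample"] = message
                batch["first_seen"] = time.time()
            # Every other failure is reduced to a bare count.
            batch["count"] += 1

        def flush_due(self):
            # Yield (domain, sample, count) for batches old enough to
            # send; the caller formats and mails the aggregated report.
            now = time.time()
            for domain, batch in list(self.batches.items()):
                if now - batch["first_seen"] >= self.flush_interval:
                    yield domain, batch["sample"], batch["count"]
                    del self.batches[domain]

Even this toy version has to keep the table in memory and decide what
"similar" means, which is the implementation effort I was referring to.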