Re: [Mimedefang] IP Reputation data collection (announcement, Internet draft)

Kevin A. McGrail Fri, 30 Apr 2010 13:17:47 -0700

Hi Dave,

Passionate technical debate follows ;-)

DFS, I believe my comments below also address your comments which I received slightly later.

In synopsis, I'd recommend you go with the broader, more flexible RFC. This is a great idea IMO either way, though!


regards,
KAM

1 - including the product / version used for auto-ham/spam and the automated score & threshold of a spam
I see some of this as best handled out of band. You already need to negotiate a username and shared secret before events can be reported to the aggregator, so that's probably the best time to communicate product and version information.

As versions are always changing, you might want to know that someone is using SpamAssassin 3.X and another person is using IHateSpam, etc.

The issue of scores is tougher, particularly in situations where end-user configuration can change the score at any time. Here, it may make sense to return the score and threshold with the event, but those two points of data may not provide enough information to be useful. For example, two users (or CanIt streams, or filtering systems, or...) could have the same threshold and arrive at the same score for a nearly identical message, but for entirely different reasons. It's probably enough for the purposes of reputation tracking to know that someone or something thought they saw a spam event from a given address.

I agree it's not a complete snapshot but the information could be invaluable. How valuable is debatable but my point is that some "extra data" per event is likely a good idea. And, for example, emails that score really high on SA are something that could be weighted. I might not even pay attention to the spam threshold as much as the spam score, for example.

From RP's perspective, knowing that 1.2.3.4 is sending a LOT of emails all marked 15 and higher by SA could give a lot more credibility than marking a bunch of emails 1% over the threshold.
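
A minimal sketch of the weighting idea, assuming events arrive as (IP, score, threshold) tuples. The function names, the ratio-based weight and the cap of 3.0 are all hypothetical and are not taken from the draft:

```python
from collections import defaultdict

def spam_weight(score, threshold, cap=3.0):
    """Weight one spam event by how far its score exceeds the threshold.

    Hypothetical scheme: the weight is score/threshold, capped so that a
    single extreme score cannot dominate an address's total.
    """
    if threshold <= 0 or score < threshold:
        return 0.0
    return min(score / threshold, cap)

def aggregate(events):
    """Sum the per-event weights for each sending IP."""
    totals = defaultdict(float)
    for ip, score, threshold in events:
        totals[ip] += spam_weight(score, threshold)
    return dict(totals)

# Two senders with two spam events each: one far above the threshold,
# the other only 1% over it.
events = [
    ("1.2.3.4", 15.0, 5.0),
    ("1.2.3.4", 18.0, 5.0),
    ("5.6.7.8", 5.05, 5.0),
    ("5.6.7.8", 5.05, 5.0),
]
weights = aggregate(events)
```

With a scheme like this, the two senders generate the same number of events but 1.2.3.4 accumulates roughly three times the weight, which is the distinction a bare "spam seen" event cannot carry.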

2 - including virii/malware as a note
Another event type for "virus or malware seen" might be a good addition, but I don't see any value in communicating back anything more detailed than that for calculating reputation. Differentiation between "virus" and other malware might be useful, too.

The virus type would be useful in identifying breakouts, etc. Again though, this isn't a debate of the value of the data because that shouldn't be a goal of the RFC. The goal is to provide something that is a framework lots of people might use both as aggregators and sensors. Towards that end, I would encourage RP to consider packaging the aggregator code as well since it's my basic belief

3 - dangerous attachments and a filename
4 - dangerous content
I guess the usefulness of this depends on the definition of "dangerous". What are you looking for here?

One example is a lot of emails that are phishing are sent with bad PDFs and EXEs.

Dangerous content could refer to phishing attacks via social engineering that don't have attachments. Perhaps something like the ClamAV Phishing signatures.

5 - reverse DNS failures
This might be good, but handling transient failures due to local or upstream DNS issues vs. failure to configure rDNS for a host might be necessary.

IMO, you are debating what an aggregator should do with the data rather than the process of sending / receiving the data. However, I think we can agree that rDNS is an important component in the email ecosystem.

From RP's perspective, tracking senders that are consistently using invalid rDNS, especially when reported by multiple sensors, would lead to valuable data, especially if it occurred over a period of time suitable to remove DNS outages from consideration.
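
One way an aggregator might separate persistent rDNS problems from transient outages is sketched below. The report tuple layout, the function name and both defaults (two sensors, a one-day span) are assumptions for illustration, not anything the draft specifies:

```python
from collections import defaultdict

def persistent_rdns_failures(reports, min_sensors=2, min_span_seconds=86400):
    """Return IPs whose rDNS failures look persistent rather than transient.

    reports: iterable of (sensor_id, ip, unix_timestamp) failure reports.
    An IP is flagged only if at least min_sensors distinct sensors reported
    it AND the reports span at least min_span_seconds, so that a short DNS
    outage at one site does not count against the sender.
    """
    by_ip = defaultdict(list)
    for sensor, ip, ts in reports:
        by_ip[ip].append((sensor, ts))

    flagged = set()
    for ip, items in by_ip.items():
        sensors = {sensor for sensor, _ in items}
        times = [ts for _, ts in items]
        if len(sensors) >= min_sensors and max(times) - min(times) >= min_span_seconds:
            flagged.add(ip)
    return flagged

reports = [
    ("sensor-a", "1.2.3.4", 0),
    ("sensor-b", "1.2.3.4", 90000),  # second sensor, more than a day later
    ("sensor-a", "5.6.7.8", 0),
    ("sensor-a", "5.6.7.8", 100),    # same sensor, 100 seconds apart
]
flagged = persistent_rdns_failures(reports)
```

Here only 1.2.3.4 is flagged. The second address was seen by a single sensor within a couple of minutes, which is what a local resolver problem would look like.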

6 - improper HELO/EHLO statements


This is probably a good one to add.

Hooray.  We'll always have Paris.

Seriously though, please realize that this was my first pass at a response to the RFC. I think you should poll for brainstorms on EVENTS to consider. There have got to be a lot more I haven't thought of/remembered/etc.

7 - invalid MX records
That's not terribly useful for a sending IP address, as there's no legitimate reason the sending IP needs to be an MX of the sender's domain.

While RP's use of the aggregated data is an IP-based index, others might use it for a sending email address index, etc. But knowing that IP 1.2.3.4 sent me an email from a from address with an invalid MX record (which includes checking A records, etc.) is quite useful in real-world anti-spam.

I liked that in #3 the REPUTATION database is not specific to indexing by IPv4 or IPv6. The system should be extensible to report more data such as the email address of the sender or recipient, the subject of the email, etc. In theory, the system could even replace Razor so it could include a hash of the email, etc. But I would likely caveat the first sentence with "index by IPv4 or IPv6 address as one example".
That's probably a bit of scope creep. The idea here is that filters can communicate IP reputation information with a low-overhead UDP protocol. Sender address reputation might be worth investigating in a future iteration (the extensibility is there), but let's concentrate on the IP reputation case for now.

Sure, replacing Razor is feature creep so that's an extreme case. But adding more data to the packet is likely necessary to make this more extensible, though I did scope it to fairly short bits of data like to/from/subject and hash values.

Plus playing devil's advocate, the RFC says specifically the IP reputation is NOT the only goal:

"Note that the exact format of the reputation database as well as what constitutes "reputation" are beyond the scope of this document. We are concerned only with a standard for reporting events."

So while I'm happy to address it more narrowly, my editorial feedback on this version would be to remove that statement if it isn't your goal to extend this beyond IP reputation.

The use of port 6568 could be expanded to state something like "unless the AGGREGATOR utilizes an alternate port" or something. I have other listeners on 6568 already, for example.
Well, it's an RFC, so "SHOULD" pretty much covers that.

Agreed and I was happy you added the RFC-eeze description but it never hurts to be explicitly flexible and even require that alternate ports be possible.
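
On the sensor side, being explicitly flexible could be as small as the sketch below. It assumes the aggregator is configured as a "host" or "host:port" string; the function name and that string format are hypothetical, and only the default port 6568 comes from the draft:

```python
DEFAULT_PORT = 6568  # the port the draft says sensors SHOULD use

def parse_aggregator(spec, default_port=DEFAULT_PORT):
    """Split a 'host' or 'host:port' string into (host, port).

    Falls back to the default port when none is given. This sketch handles
    hostnames and IPv4 addresses only; bare IPv6 literals contain colons
    and would need bracket handling.
    """
    host, sep, port = spec.rpartition(":")
    if not sep:
        return spec, default_port
    return host, int(port)

default_target = parse_aggregator("aggregator.example.org")
custom_target = parse_aggregator("aggregator.example.org:7000")
```

A site that already has another listener on 6568 would then only need to change one configuration value.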

4.2 would be best organized into 4.2.0 for reserved, 4.2.1 for GREYLISTED, etc. so that all event types have a clear report restriction. Then 4.2 should be restrictions for all events like IPv4

Makes sense, though if you end up adding a bazillion more EVENT types, grouping them could become troublesome. I was mostly looking for some semblance of a 1:1 restriction for each EVENT type to help ensure that an EVENT type isn't forgotten in the years to come.

Does "a priori knowledge" mean something or is it a grammar/spelling issue?
http://en.wikipedia.org/wiki/A_priori_and_a_posteriori#Use_of_the_terms

Thanks. I wasn't sure if there was some other meaning than how I read it originally.

So knowing that, my underlying question is: What is the reason that a sensor should only send 492 bytes? Because I read the text as "with prior knowledge", which seems a fair paraphrase, and that meant to me that the very next statement constituted prior knowledge that the aggregator has to accept larger than 492 bytes. In short, sentence one's caveat is met by sentence two's caveat that the aggregator MUST handle reports equal to or less than 65507, i.e. greater than 492 bytes. This invalidates the need for sentence 1 completely, which I imagine isn't what you want.
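
To make the two limits concrete, here is a small sketch of how a sensor might apply them. It assumes the intended reading is that 492 bytes is the sensor's default ceiling and 65507 bytes is the hard ceiling; the function and its flag are hypothetical. The 65507 figure is the largest UDP payload over IPv4: a 65535-byte datagram minus a 20-byte IPv4 header and an 8-byte UDP header.

```python
# Largest UDP payload that fits in one IPv4 datagram.
MAX_UDP_PAYLOAD = 65535 - 20 - 8  # = 65507

# Default sensor-side ceiling quoted in this thread.
SENSOR_SOFT_LIMIT = 492

def check_report_size(payload, aggregator_accepts_large=False):
    """Classify a report by size under the assumed reading of the draft.

    aggregator_accepts_large stands in for the "a priori knowledge" that a
    particular aggregator will take reports above the soft limit.
    """
    size = len(payload)
    if size > MAX_UDP_PAYLOAD:
        return "too large for a single UDP datagram"
    if size > SENSOR_SOFT_LIMIT and not aggregator_accepts_large:
        return "exceeds soft limit without prior knowledge"
    return "ok"

small = check_report_size(b"x" * 400)
medium = check_report_size(b"x" * 1000)
medium_known = check_report_size(b"x" * 1000, aggregator_accepts_large=True)
```

If every aggregator MUST accept up to 65507 bytes, then aggregator_accepts_large is effectively always true and the 492-byte branch never fires, which is the contradiction described above.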

I would include an extract definition of [GREY] in section 7 in addition to the reference. It's a term that confuses a lot of people that I discuss anti-spam with that aren't anti-spam researchers.
Possibly a good idea, though I don't expect too many people who aren't involved in anti-spam activities will be interested in this RFC.

Touche. I agree to this statement 100%. I forgot to consider the audience.


Regards,
KAM
_______________________________________________
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID.  You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list [email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
