Hi Dave,

Passionate technical debate follows ;-)

DFS, I believe my comments below also address your comments which I received slightly later.

In summary, I'd recommend you go with the broader, more flexible RFC. This is a great idea IMO either way, though!

regards,
KAM

1 - including the product / version used for auto-ham/spam and the automated score & threshold of a spam

I see some of this as best handled out of band. You already need to negotiate a username and shared secret before events can be reported to the aggregator, so that's probably the best time to communicate product and version information.
As versions are always changing, you might want to know that someone is using SpamAssassin 3.X and another person is using IHateSpam, etc.

The issue of scores is tougher, particularly in situations where end-user configuration can change the score at any time. Here, it may make sense to return the score and threshold with the event, but those two points of data may not provide enough information to be useful. For example, two users (or CanIt streams, or filtering systems, or...) could have the same threshold and arrive at the same score for a nearly identical message, but for entirely different reasons. It's probably enough for the purposes of reputation tracking to know that someone or something thought they saw a spam event from a given address.
I agree it's not a complete snapshot but the information could be invaluable. How valuable is debatable but my point is that some "extra data" per event is likely a good idea. And, for example, emails that score really high on SA are something that could be weighted. I might not even pay attention to the spam threshold as much as the spam score, for example.

From RP's perspective, knowing that 1.2.3.4 is sending a LOT of emails all marked 15 and higher by SA could carry a lot more weight than a bunch of emails marked 1% over the threshold.
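
To make the weighting idea concrete, here is a minimal sketch of how an aggregator might weight spam events by how far the score sits above the threshold. The function names, the event tuple layout, and the weighting formula are all assumptions for illustration; nothing here comes from the RFC draft.

```python
from collections import defaultdict


def spam_weight(score, threshold):
    """Hypothetical weight in [0.0, 1.0] based on the margin over the threshold."""
    if threshold <= 0:
        raise ValueError("this sketch assumes a positive threshold")
    margin = (score - threshold) / threshold
    # Clamp: below-threshold events count for nothing, large margins cap at 1.0.
    return min(1.0, max(0.0, margin))


def aggregate(events):
    """Sum the weights per sending IP.

    events -- iterable of (ip, score, threshold) tuples (assumed layout)
    """
    totals = defaultdict(float)
    for ip, score, threshold in events:
        totals[ip] += spam_weight(score, threshold)
    return dict(totals)
```

With a threshold of 5, a message scoring 15 contributes the full weight, while one scoring 5.05 (1% over) contributes almost nothing.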
2 - including virii/malware as a note

Another event type for "virus or malware seen" might be a good addition, but I don't see any value in communicating back anything more detailed than that for calculating reputation. Differentiation between "virus" and other malware might be useful, too.
The virus type would be useful in identifying breakouts, etc. Again though, this isn't a debate of the value of the data because that shouldn't be a goal of the RFC. The goal is to provide something that is a framework lots of people might use both as aggregators and sensors. Towards that end, I would encourage RP to consider packaging the aggregator code as well since it's my basic belief

3 - dangerous attachments and a filename
4 - dangerous content

I guess the usefulness of this depends on the definition of "dangerous". What are you looking for here?
One example: a lot of phishing emails are sent with bad PDFs and EXEs.

Dangerous content could refer to phishing attacks via social engineering that don't have attachments. Perhaps something like the ClamAV Phishing signatures.
5 - reverse DNS failures

This might be good, but handling transient failures due to local or upstream DNS issues vs. failure to configure rDNS for a host might be necessary.
IMO, you are debating what an aggregator should do with the data rather than the process of sending / receiving the data. However, I think we can agree that rDNS is an important component in the email ecosystem.

From RP's perspective, tracking senders that consistently use invalid rDNS, especially when reported by multiple sensors, would yield valuable data, especially if the failures occurred over a period long enough to rule out DNS outages.
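
As a rough sketch of that aggregator-side filtering, the function below flags an IP only when rDNS failures come from several sensors and span a long enough window. The report layout, the function name, and the default limits (two sensors, one day) are assumptions chosen for illustration.

```python
from collections import defaultdict


def persistent_rdns_failures(reports, min_sensors=2, min_span_seconds=86400):
    """Return the sorted IPs with rDNS failures that look persistent.

    reports -- iterable of (ip, sensor_id, unix_timestamp) tuples (assumed layout)
    """
    seen = defaultdict(lambda: {"sensors": set(), "first": None, "last": None})
    for ip, sensor, ts in reports:
        entry = seen[ip]
        entry["sensors"].add(sensor)
        entry["first"] = ts if entry["first"] is None else min(entry["first"], ts)
        entry["last"] = ts if entry["last"] is None else max(entry["last"], ts)
    return sorted(
        ip
        for ip, e in seen.items()
        # Several independent sensors AND a long span, so a short outage is ignored.
        if len(e["sensors"]) >= min_sensors
        and e["last"] - e["first"] >= min_span_seconds
    )
```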

6 - improper HELO/EHLO statements

This is probably a good one to add.
Hooray.  We'll always have Paris.

Seriously though, please realize that this was my first pass at a response to the RFC. I think you should poll for brainstorms on EVENTS to consider. There have got to be a lot more I haven't thought of/remembered/etc.

7 - invalid MX records

That's not terribly useful for a sending IP address, as there's no legitimate reason the sending IP needs to be an MX of the sender's domain.
While RP's use of the aggregated data is an IP-based index, others might use it for a sending email address index, etc. But knowing that IP 1.2.3.4 sent me an email from a from address with an invalid MX record (which includes checking A records, etc.) is quite useful in real-world anti-spam.
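
A minimal sketch of what such a check might look like on the sensor side, assuming the DNS lookups have already been done elsewhere. The function name and the `has_address` callback are hypothetical; the only rule relied on is the standard SMTP fallback to the domain's own address record when no MX exists.

```python
def has_valid_mail_route(domain, mx_targets, has_address):
    """Return True if mail for `domain` could be delivered somewhere.

    mx_targets  -- MX hostnames already looked up for `domain` (may be empty)
    has_address -- callable(hostname) -> bool, True if the name has an
                   A/AAAA record (hypothetical lookup supplied by the caller)
    """
    if mx_targets:
        # At least one MX target must itself resolve to an address.
        return any(has_address(host) for host in mx_targets)
    # No MX at all: SMTP falls back to the domain's own address record.
    return has_address(domain)
```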

I liked that in #3 the REPUTATION database is not specific to indexing by IPv4 or IPv6. The system should be extensible to report more data such as the email address of the sender or recipient, the subject of the email, etc. In theory, the system could even replace Razor so it could include a hash of the email, etc. But I would likely caveat the first sentence with "index by IPv4 or IPv6 address as one example".

That's probably a bit of scope creep.  The idea here is that filters can communicate IP reputation information with a low-overhead UDP protocol. Sender address reputation might be worth investigating in a future iteration (the extensibility is there), but let's concentrate on the IP reputation case for now.
Sure, replacing Razor is feature creep so that's an extreme case. But adding more data to the packet is likely necessary to make this more extensible though I did scope it to fairly short bits of data like to/from/subject and hash values.

Plus, playing devil's advocate, the RFC specifically says that IP reputation is NOT the only goal:

"Note that the exact format of the reputation database as well as what constitutes "reputation" are beyond the scope of this document. We are concerned only with a standard for reporting events."

So while I'm happy to address it more narrowly, my editorial feedback on this version would be to remove that statement if it isn't your goal to extend this beyond IP reputation.

The use of port 6568 could be expanded to state something like "unless the AGGREGATOR utilizes an alternate port". I have other listeners on 6568 already, for example.

Well, it's an RFC, so "SHOULD" pretty much covers that.
Agreed and I was happy you added the RFC-eeze description but it never hurts to be explicitly flexible and even require that alternate ports be possible.
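
For what it's worth, supporting an alternate port costs a sensor very little. A minimal sketch, assuming a sensor that sends one UDP datagram per report; the function names are hypothetical and the payload format is left entirely to the caller.

```python
import socket

DEFAULT_REPORT_PORT = 6568  # the port named in the draft under discussion


def aggregator_address(host, port=None):
    """Use the default port unless the aggregator advertised another one."""
    return (host, DEFAULT_REPORT_PORT if port is None else port)


def send_report(payload, host, port=None):
    """Send one report datagram; returns the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, aggregator_address(host, port))
```

The alternate port could be exchanged out of band along with the username and shared secret.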

4.2 would be best organized into 4.2.0 for reserved, 4.2.1 for GREYLISTED, etc. so that all event types have a clear report restriction. Then 4.2 should be restrictions for all events like IPv4

Makes sense though if you end up adding a bazillion more EVENT types, grouping them could become troublesome. I was mostly looking for some semblance of a 1:1 restriction for each EVENT type to help ensure that an EVENT type isn't forgotten in the years to come.

Does " a priori knowledge" mean something or is it a grammar/spelling issue?

http://en.wikipedia.org/wiki/A_priori_and_a_posteriori#Use_of_the_terms

Thanks. I wasn't sure if there was some other meaning than how I read it originally.

So knowing that, my underlying question is: what is the reason that a sensor should only send 492 bytes? I read the text as "with prior knowledge", which seems a fair paraphrase, and that meant to me that the very next statement constituted prior knowledge that the aggregator has to accept reports larger than 492 bytes. In short, sentence one's caveat is met by sentence two's requirement that the aggregator MUST handle reports equal to or less than 65507 bytes, i.e. greater than 492 bytes. This invalidates the need for sentence one completely, which I imagine isn't what you want.
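
The two limits can be sketched as below, assuming "a priori knowledge" is meant as something a sensor learns per aggregator rather than something the RFC text itself supplies. The constant and flag names are hypothetical; only the numbers 492 and 65507 come from the draft as quoted above.

```python
SENSOR_DEFAULT_MAX = 492        # sensor limit without a priori knowledge
AGGREGATOR_MUST_ACCEPT = 65507  # 65535 - 20 (IPv4 header) - 8 (UDP header)


def sensor_may_send(report, knows_aggregator_accepts_large=False):
    """Return True if a sensor may send this report under the two limits.

    knows_aggregator_accepts_large -- hypothetical per-aggregator flag standing
    in for "a priori knowledge"; if the RFC text itself counted as that
    knowledge, this flag would always be True and the 492 limit would be moot.
    """
    if knows_aggregator_accepts_large:
        limit = AGGREGATOR_MUST_ACCEPT
    else:
        limit = SENSOR_DEFAULT_MAX
    return len(report) <= limit
```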
I would include an excerpted definition of [GREY] in section 7 in addition to the reference. It's a term that confuses a lot of people I discuss anti-spam with who aren't anti-spam researchers.

Possibly a good idea, though I don't expect too many people who aren't involved in anti-spam activities will be interested in this RFC
Touche. I agree to this statement 100%. I forgot to consider the audience.

Regards,
KAM