Re: [Declude.JunkMail] Web-o-Trust

Pete McNeil Wed, 10 Dec 2003 19:33:39 -0800

At 06:27 PM 12/10/2003, you wrote:

I'm with Todd here. I see very little value here. I don't have a problem with blocking E-mail from

<snip>

I'm not against the idea of having some form of a registry, however the root of the problem is in differentiating among the gray stuff and not among the non-automated stuff. I find value in things like BONDEDSENDER, though to some purists, they view this as legitimizing large commercial spammers because their definition of spam differs from mine. Heck, Kami and I can't even agree on what spam is when it comes to this gray area stuff, and although I trust Kami's opinion on what he considers to be trusted senders, I wouldn't automatically trust his customers, or some list over which he is only in part involved in maintaining.

* See my PS for a description of how this objection might be mitigated.

<snip>

If someone can show me the value of crediting points to hosts which account for almost none of my mail volume, over which I have no familiarity with their rules and procedures, and for which I am not aware of any substantial problems, I will definitely reconsider my stance.

WOT turns out to be very similar to the COT (Circle Of Trust) features we are going to build into Message Sniffer. Since WOT seems to be getting some attention, we've decided to push forward some of that development toward building utilities that are compatible with WOT, and specifically that automate some of the admin process.

The COT systems we have planned will allow like-minded peers to share policy decisions and ratings for email sources. Our COT mechanisms will provide for a "colorful gradation"... but the first mechanisms to be implemented will establish the black and white edges of the spectrum. In the simplest terms, the black edge is where email sources only produce spam and/or malware and the white edge is where sources never produce these.

WOT offers an early opportunity to define the "white edge", so we're anxious to begin supporting it.

As you point out, the "white edge" is somewhat fuzzy depending upon your definition of spam, but this can be mitigated through some fairly simple math - and in the end the "extreme white" will generally be agreed between systems as much as the "extreme black" is often common ground.

The benefit of having a reference to the "white edge" is primarily the elimination of false positives from previously unknown sources and transient filtering errors. The value of this scheme is particularly enhanced if your definition of the "white edge" is derived from like-minded peers and in particular systems that are your common neighbors and their neighbors. (if you're likely to have common contacts)

If the generation of "white edge" information can be automated (and we think it can) then this frees your system to be more aggressive in defining what is black since the probability of false positives is reduced.

(sorry if any of this is fuzzy... it is sometimes difficult to explain the real leverage that can be attained through network effects.)

To summarize, if the generation of your WOT can be automated based on the messages that you receive which are "extremely white" then the benefit of sharing that information with other systems is that you can gain access to their information. Everyone in the group can then be more aggressive with their filtering.

--- I don't want to put too much emphasis on this, in particular because there are problems with defining a source strictly from the IP address, but you might also think of the problem in the following way:

Through the use of viruses and other means, spammers have potential access to the vast majority of the IP space for sending their content. Comparatively, there are very few points in the IP space that are legitimate sources for email - that is, sources which are at least email servers as opposed to randomly compromised equipment. There is every indication that these conditions will continue to get worse. (Early in the game it made more sense to list the bad guys than the good guys; now the numbers say those conditions have reversed.)

Strictly from a data processing perspective, it is clearly more economical to map the acceptable sources for email than it is to map the unacceptable sources.

From the perspective of automated, decentralized, trust-based systems, networks of trusted peers are a powerful mechanism - whether automated or not. Personally I think it is VITAL (sorry for the caps but I mean it) that these kinds of control systems remain completely open and decentralized in order to avoid the potential for catastrophic failure and abuse that is associated with any centralized mechanism.

WOT isn't perfect, but it is a great place to start and it's here right now. The value of WOT will increase radically as it is more widely adopted - this is true of any system that leverages network effects.

I'm sure that if WOT really takes off it will be extended naturally to overcome many of its shortcomings and that maintenance tasks will become highly automated. Given what I've learned by simulating COT scenarios for Message Sniffer, WOT has the potential to be extremely powerful in a very short time... if enough people can be convinced to give it a shot of course ;-)

I hope this helps,
_M

PS: I've just created some internal projects to develop automation tools that generate WOT data from Message Sniffer, and potentially from Declude (Scott has indicated that he may offer direct support for WOT and perhaps some additional supporting features).

These tools would allow a system to automatically generate a portion of its WOT file by doing a straightforward statistical analysis of the messages that it allows to pass through. By definition the WOT file for a given system using these tools would automatically represent IP sources that are consistently trustworthy based on that system's filtering practices.

These tools would allow each system to express their "white edge" policy decisions based on their local policies with a vanishingly small additional cost in administration and computing resources.

The resulting WOT files generated from these tools will contain IP sources from which the publishing system consistently accepts messages as a matter of practice. If the standards for generating this data are set high enough then the trust level can also be high for subscribers.
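A rough sketch of that kind of statistical pass might look like the following. The log format, the function name, and both thresholds are assumptions made for illustration; none of this is Message Sniffer or Declude code.

```python
from collections import Counter

def trusted_sources(log, min_messages=50, min_accept_ratio=0.999):
    """Return IPs whose mail is consistently accepted by local filtering.

    log: iterable of (ip, accepted) pairs, where accepted is True when the
    local filters let the message through. Both thresholds are assumed
    values; a real deployment would tune them.
    """
    total = Counter()
    accepted = Counter()
    for ip, ok in log:
        total[ip] += 1
        if ok:
            accepted[ip] += 1
    # Keep only sources with enough history and a near-perfect record.
    return sorted(
        ip for ip, n in total.items()
        if n >= min_messages and accepted[ip] / n >= min_accept_ratio
    )

log = (
    [("192.0.2.10", True)] * 200        # long, clean history
    + [("192.0.2.20", True)] * 5        # clean, but too little history
    + [("198.51.100.7", True)] * 90     # mixed record
    + [("198.51.100.7", False)] * 10
)
print(trusted_sources(log))
```

Setting the bar high (a minimum message count as well as a near-perfect acceptance ratio) is what lets subscribers place a high trust level on the published list.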

WOT has, in my opinion, a failing in that its mechanism for extending trust is binary - but it's not a bad starting point since it's easy to implement. Perhaps after a bit of adoption WOT can be extended to provide a more flexible mechanism for extending trust - such as a simple weighting system.

(Borrowing from our COT R&D work...) If WOT can be extended to include a weighting metric for included files (specifically a % Confidence) then the model can become extremely precise because it will naturally become a voting network. The result of that process is the ability to produce a reliable confidence metric from the recommendations of multiple peers. If these mechanisms are put in place it also becomes possible to automate the tuning of this voting network.
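One simple way to picture the voting step is a weighted mean: each peer's confidence in a source counts in proportion to how much the local system trusts that peer. The formula and names below are assumptions chosen for illustration, not the COT design.

```python
def combined_confidence(votes):
    """Combine peer recommendations into one confidence figure.

    votes: list of (peer_weight, peer_confidence) pairs, both in percent
    (0-100). Returns the peer-weight-weighted mean of the confidences, or
    None when no peer with a non-zero weight has an opinion.
    """
    total_weight = sum(w for w, _ in votes)
    if total_weight == 0:
        return None
    return sum(w * c for w, c in votes) / total_weight

# A well-trusted peer vouches fully; a barely-trusted peer disagrees.
print(combined_confidence([(90, 100), (10, 0)]))
```

Automated tuning would then amount to adjusting the peer weights over time, for example by lowering the weight of a peer whose recommendations keep disagreeing with local results.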

I hope to encourage the WOT folks to look in this direction.

A simple extension might be:

include[90]: http://www.my-trusted-friends.com/web-o-trust.txt
include[10]: http://www.guys-i-dont-know.com/web-o-trust.txt
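Parsing lines like these is straightforward. The sketch below handles only the proposed syntax shown above; the regular expression, the function name, and the default of 100 for an unweighted include are assumptions.

```python
import re

# Matches "include: URL" and the proposed "include[NN]: URL" form.
INCLUDE_RE = re.compile(r"^include(?:\[(\d{1,3})\])?:\s*(\S+)\s*$")

def parse_include(line, default_weight=100):
    """Return (weight, url) for an include line, or None if it is not one."""
    m = INCLUDE_RE.match(line.strip())
    if m is None:
        return None
    weight = int(m.group(1)) if m.group(1) is not None else default_weight
    if weight > 100:  # weights are percentages
        return None
    return weight, m.group(2)

print(parse_include("include[90]: http://www.my-trusted-friends.com/web-o-trust.txt"))
print(parse_include("include[10]: http://www.guys-i-dont-know.com/web-o-trust.txt"))
```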

A more complex extension might allow for changing metrics based on the distance...

include[100]: http://www.my-other-office.com/web-o-trust.txt 1[90] 2[50] 3[10] 4[5]

The extension presumes that weighting is optional.
The above would mean:

Weight my-other-office at 100% confidence.
Weight their direct extensions at 90%.
Weight the next neighbors at 50%.
Weight the next neighbors after that at 10%.
Weight all further neighbors at 5%.
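The distance table could be read and applied along these lines. The function names are hypothetical, and the rule that a distance past the last listed hop reuses the last listed weight is taken from the "all further neighbors" reading above.

```python
import re

def parse_distance_weights(tail):
    """Turn '1[90] 2[50] 3[10] 4[5]' into {1: 90, 2: 50, 3: 10, 4: 5}."""
    return {int(d): int(w) for d, w in re.findall(r"(\d+)\[(\d+)\]", tail)}

def weight_at(distance, base_weight, table):
    """Confidence for a file `distance` hops away from the included file.

    Distance 0 is the included file itself and gets base_weight. Otherwise
    the weight of the largest listed hop not exceeding `distance` applies,
    so anything past the last listed hop reuses the last listed weight.
    """
    listed = [d for d in table if d <= distance]
    return table[max(listed)] if listed else base_weight

table = parse_distance_weights("1[90] 2[50] 3[10] 4[5]")
print([weight_at(d, 100, table) for d in range(6)])
```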

The same mechanism allows for weighting IPs, such as gray sources. Consider the implications of the following construct:

ip[20]: ...

Metrics like these could also be derived automatically from local system practices... For example, consider topica IPs, where a local system might block first and then white-list the content its local users are subscribed to. The result would be a weight that indicates the percentage of messages that are considered legitimate at that system from that IP source.

Consider also the implications of allowing weighted overlaps... Perhaps providing a very low weight (perhaps 0) to a notoriously bad network's IPs in general with a large CIDR entry, and then providing a few high-confidence numbers for specific IPs on that network.
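Overlapping entries need a rule for which weight applies. The sketch below assumes the most specific (longest-prefix) entry wins; that rule, the function name, and the example addresses are all assumptions for illustration.

```python
import ipaddress

def weight_for(ip, entries, default=None):
    """Return the weight of the most specific CIDR entry containing `ip`.

    entries: list of (cidr, weight) pairs. Longest prefix wins, so a
    high-confidence /32 can override a low-weight entry for its network.
    """
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, weight) of the best match so far
    for cidr, weight in entries:
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, weight)
    return default if best is None else best[1]

entries = [
    ("203.0.113.0/24", 0),     # notoriously bad network in general
    ("203.0.113.25/32", 95),   # one known-good server on that network
]
print(weight_for("203.0.113.25", entries))
print(weight_for("203.0.113.9", entries))
```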

Note also that providing weight values for IP entries generates a white-black gradient potential for the system.

There are many ways this can go... I don't want to muddy things...

Sorry for taking up so much bandwidth, but I'm excited about the potential here.

Thanks,
_M

---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.
