On 2012-11-27 17:12, Darin Cox wrote:
Hi Pete,

Would you mind sharing your calculations of confidence and probability?

Here is a page on the math:
http://www.armresearch.com/support/articles/technology/GBUdb/learns.jsp

I'm looking at the stats for p=1.0 and am curious about the low confidence values. I would have expected high confidence where there were no good samples and a lot of bad ones... or do I have something backwards?

Confidence is a measure of the number of samples seen. So, if you have only one sample, and it was a bad message, then as far as you know the probability of getting a bad scan (spam) is 100%... BUT since you have only one sample, you don't know very much, so your confidence is low. If, on the other hand, you have seen a few dozen messages from an IP and all of them were bad, then you would have much more confidence in your probability figure.
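
To make the distinction concrete, here is a rough sketch in Python. It is not the actual GBUdb math (that is on the page linked above); the formulas and the SATURATION constant are assumptions chosen only to show that probability depends on the mix of good and bad samples while confidence depends on how many samples there are.

```python
# Illustrative only: these formulas are assumptions, not the real GBUdb code.
SATURATION = 50  # hypothetical sample count at which confidence tops out


def probability(good, bad):
    """Signed estimate: +1.0 means all bad, -1.0 means all good (assumed form)."""
    total = good + bad
    if total == 0:
        return 0.0  # no samples, so no evidence either way
    return (bad - good) / total


def confidence(good, bad, saturation=SATURATION):
    """Grows with the number of samples seen, capped at 1.0 (assumed form)."""
    return min(1.0, (good + bad) / saturation)


if __name__ == "__main__":
    # One bad message, three dozen bad messages, and a mixed history.
    for good, bad in [(0, 1), (0, 36), (5, 45)]:
        p = probability(good, bad)
        c = confidence(good, bad)
        print(f"good={good:2d} bad={bad:2d}  p={p:+.2f}  c={c:.2f}")
```

With these assumed formulas, a single bad message and three dozen bad messages both give p=1.0, but only the second case gives a confidence anywhere near 1.0.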


Also, while it's easy to parse, it might be nice if the output had one delimiter between fields instead of being both tab and comma delimited. That would make importing into a database for analysis much easier.

I will look into making a different output mode that's easier to parse. The existing one is supposed to be human friendly.
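
In the meantime, something like the following Python sketch could normalize the mixed delimiters before a database import. The sample line is made up for illustration and is not the tool's real output format; the sketch also assumes that no field contains an embedded tab or comma.

```python
import csv
import io
import re


def split_fields(line):
    """Split a line on tabs or commas, trimming whitespace and dropping empties."""
    return [f.strip() for f in re.split(r"[\t,]", line.strip()) if f.strip()]


def to_csv(lines):
    """Rewrite mixed tab/comma lines as plain comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = split_fields(line)
        if fields:  # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical record; the field names and layout are invented.
    sample = ["192.0.2.10\tgood=0, bad=36\tp=1.0, c=0.72"]
    print(to_csv(sample), end="")
```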

Thanks!

_M

--
Pete McNeil
Chief Scientist
ARM Research Labs, LLC
www.armresearch.com
866-770-1044 x7010
twitter/codedweller


#############################################################
This message is sent to you because you are subscribed to
 the mailing list <sniffer@sortmonster.com>.
This list is for discussing Message Sniffer,
Anti-spam, Anti-Malware, and related email topics.
For More information see http://www.armresearch.com