https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6061
--- Comment #9 from AXB <[email protected]> 2009-02-07 12:45:22 PST ---
> (In reply to comment #5)
> Is the idea to accept anything that begins with "http://" as a URL? I would
> like to have some idea as to how many false positives that leads to -- not
> FPs on spam detection, although that is important too, but for this, how
> many false identifications of strings as URLs and how many resulting
> unnecessary calls to URIBLs? The reason for the current URI parse code (in
> trunk -- I'm still waiting for that one more review and vote to put it in
> the 3.2 branch) is to only send to the RBL what are possibly real links.

- It's not supposed to trigger any queries.
- It's not supposed to be used to mark spam or ham, so FPs are not an issue.
- It IS supposed to check whether what the parser thinks is a tld exists in
  the tld data or not:

    if the URL is example.comm and ".comm" IS NOT in the known tld list, return 0
    if the URL is example.com and ".com" IS in the known tld list, return 1

  Make the 0 available to a rule. Nothing else.

> Which brings up another point. Is health.sharpdecimal, as opposed to
> health.sharpdecimal.com, in the RBLs anyway?

The URIBLs depend on SA's or other tld tables to list a domain. If it's an
unknown tld, it won't be listed. health.sharpdecimal won't ever be listed
unless someone starts listing these types. No sober BL op I know of would do
this :-)

> If not, what would be the point of parsing it as a URL?

- To detect whether the domain is in the known tld list.
- To create custom URI rules to detect stuff which won't ever be listed but
  needs scoring (positive or negative, whatever may apply).
- If it's a new/obscure/frequent URI ending, add a util_rb_2tld entry to
  allow SA to parse it as a known tld.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
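[Editor's note: the tld check AXB describes above (return 0 when the parsed
tld is unknown, 1 when it is known, and expose the 0 to a rule) can be
sketched roughly as follows. This is an illustrative Python sketch, not
SpamAssassin's actual Perl implementation; the function name and the tiny
stand-in tld set are invented for the example.]

```python
# Minimal sketch of the tld-membership check described in comment #9.
# KNOWN_TLDS stands in for SA's tld data (util_rb_tld / util_rb_2tld entries);
# the real list is much larger and is maintained in SpamAssassin's rules.
KNOWN_TLDS = {"com", "net", "org", "uk"}

def has_known_tld(host: str) -> int:
    """Return 1 if the host's final label is in the known tld list, else 0.

    The 0 result is what a rule would key on; no RBL query is made either way.
    """
    # Take the last dot-separated label, ignoring any trailing root dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return 1 if tld in KNOWN_TLDS else 0

print(has_known_tld("example.com"))          # 1: ".com" is in the tld list
print(has_known_tld("example.comm"))         # 0: ".comm" is unknown
print(has_known_tld("health.sharpdecimal"))  # 0: unknown tld, never listed
```

Per the comment above, a hit on the 0 case would only feed a local rule
(positive or negative score); it would never be sent to a URIBL.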
