> -----Original Message-----
> From: Will The Game [mailto:[EMAIL PROTECTED]
> Sent: Saturday, April 09, 2005 7:50 AM
> To: CF-Talk
> Subject: Built a dirty word checker, testing please?
>
> Hey,
>
> I've built a cool little tool I'm callin' nodirtywords.

It's really hard to do this (so I'm not really picking on it), but after some tests:

+) Placing spaces between the letters allows the words to pass. "F U C K" is just as legible as "FUCK".
+) Substitutions (1's for I's, 0's for O's) allow the words to pass. "SH1T" looks a lot like "SHIT" but passes the filter.
+) Inflecting most words (adding suffixes to the stem) allows them to pass. For example, it didn't catch "fucking", "fuckin'", "fucked", etc.
+) Adding symbols (even those in grammatically correct sentences) allows the word to pass. "-fuck", "fuck.", "fuck,", etc. all pass.

Like I said, this kind of stuff is REALLY hard to do with any kind of certainty. In most cases I've found the only way to really handle this is human review of the material (although a filter like this is a decent first line of defense).

Also, you may want to look at some of the heuristics and learning systems designed to deal with spam; most of these problems arise there as well.

Jim Davis
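
For illustration only, here is a rough sketch of how the four evasions listed above (spaced letters, look-alike substitutions, suffixes, and stray symbols) might be reduced by normalizing the text before it is checked. It is written in Python rather than CFML, and it is not how nodirtywords works. The blocklist entries ("darn", "heck"), the look-alike table, the suffix list, and every function name are placeholders assumed for the example.

```python
import re

# Hypothetical blocklist; "darn" and "heck" stand in for real entries.
BLOCKLIST = {"darn", "heck"}

# Assumed look-alike substitutions; a real table would be much larger.
LOOKALIKES = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

# Assumed suffixes for a very crude stemmer (longest first).
SUFFIXES = ("ing", "in", "ed", "s")


def normalize(text):
    """Lowercase, map look-alike characters, then drop remaining symbols."""
    text = text.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z\s]", "", text)


def stem(token):
    """Strip one known suffix, keeping at least three leading letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def matches(word):
    """True if the word or its crude stem is on the blocklist."""
    return word in BLOCKLIST or stem(word) in BLOCKLIST


def is_flagged(text):
    tokens = normalize(text).split()
    if any(matches(t) for t in tokens):
        return True
    # Rejoin runs of single letters, so "d a r n" is checked as "darn".
    run = []
    for token in tokens + [""]:  # empty sentinel flushes the final run
        if len(token) == 1:
            run.append(token)
        else:
            if matches("".join(run)):
                return True
            run = []
    return False
```

With these placeholder entries, `is_flagged("D A R N")`, `is_flagged("h3ck")`, `is_flagged("darnin'")`, and `is_flagged("-darn")` all return `True`, while `is_flagged("hello world")` returns `False`. A sketch like this will still miss plenty and can flag innocent text, which is why it only makes sense as a first pass before human review.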

