> -----Original Message-----
> From: Will The Game [mailto:[EMAIL PROTECTED]
> Sent: Saturday, April 09, 2005 7:50 AM
> To: CF-Talk
> Subject: Built a dirty word checker, testing please?
> 
> Hey,
> 
> I've built a cool little tool I'm callin' nodirtywords.

It's really hard to do this (so I'm not really picking on it), but after some
tests:

+) Placing spaces between the letters allows the words to pass.  "F U C K"
is just as legible as "FUCK".

+) Substitutions (1's and I's, 0's and O's) allow the words to pass.  "SH1T"
looks a lot like "SHIT" but passes the filter.

+) Inflected forms of most words pass; the filter doesn't appear to do any
stemming.  For example it didn't catch "fucking", "fuckin'", "fucked", etc.

+) Adding symbols (even those in grammatically correct sentences) allows the
word to pass.  "-fuck", "fuck.", "fuck,", etc. all pass.
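A minimal sketch (in Python rather than CFML) of a normalizing check that covers the four cases above. It lower-cases the text, undoes a few assumed look-alike substitutions, strips everything that isn't a letter, and then does a substring match against a blacklist. `BLACKLIST`, `SUBSTITUTIONS`, `normalize` and `find_dirty_words` are hypothetical names for illustration, not part of nodirtywords.

```python
import re

# Hypothetical blacklist for illustration; a real list would be much longer.
BLACKLIST = ["fuck", "shit"]

# Assumed look-alike substitutions (symbol -> letter). "1" is mapped to "i"
# here, although it could just as well stand for "l"; real data varies.
SUBSTITUTIONS = str.maketrans(
    {"1": "i", "!": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s", "5": "s"}
)


def normalize(text: str) -> str:
    """Lower-case, undo look-alike substitutions, then drop every character
    that is not a letter (spaces, punctuation, other symbols)."""
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z]", "", text)


def find_dirty_words(text: str) -> list[str]:
    """Return the blacklisted words found anywhere in the normalized text.
    Substring matching also catches inflected forms such as 'fucking'."""
    collapsed = normalize(text)
    return [word for word in BLACKLIST if word in collapsed]


for sample in ["F U C K", "SH1T", "fuckin'", "fuck.", "Have a nice day", "this hit"]:
    print(repr(sample), find_dirty_words(sample))
```

The last sample shows the cost of this approach: once spaces are removed, the innocent "this hit" collapses to "thishit" and gets flagged as "shit". Fixing one evasion tends to create false positives, which is part of why this is so hard to do with any certainty.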

Like I said - this kind of stuff is REALLY hard to do with any kind of
certainty.  In most cases I've found the only way to really handle this is
human review of material (although a filter like this is a decent first line
of defense).

Also you may want to look at some of the heuristics and learning systems
designed to deal with spam - most of these problems arise there as well.
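As a rough illustration of the learning approach many spam filters take, here is a minimal sketch of a two-class multinomial naive Bayes classifier with add-one smoothing, trained on a tiny, made-up set of labeled messages. The class name, the "ok"/"bad" labels and the training data are all hypothetical. A real system would need far more data and would still be fooled by the spacing and substitution tricks listed above unless the text is normalized first.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split lower-cased text into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing.

    Both labels need at least one training message before classify() is
    called; otherwise the class prior would be log(0).
    """

    LABELS = ("ok", "bad")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens: list[str], label: str) -> float:
        vocab_size = len(set().union(*self.word_counts.values()))
        label_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        for token in tokens:
            count = self.word_counts[label][token]
            # Smoothed log likelihood of this token under the label.
            score += math.log((count + 1) / (label_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


nb = NaiveBayesFilter()
nb.train("thanks for the help", "ok")
nb.train("great post thanks", "ok")
nb.train("you are a stupid idiot", "bad")
nb.train("shut up idiot", "bad")
print(nb.classify("what an idiot"))        # → bad
print(nb.classify("thanks for the post"))  # → ok
```

Unlike a fixed word list, a classifier like this scores whole messages and improves as reviewed material is fed back in as training data. That fits well with keeping human review in the loop.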

Jim Davis



