Russell Clemings wrote:
>
>One other question: Is there an easy way to make it fire on parts of words
>as well as whole words? For example, I might want to catch "dig," "digger,"
>"digging," etc. (Not to mention "motherdigger.")

You can do pretty much any matching you want. For example

    \b(mother)?dig(ger|ging)?\b

would match 'motherdig', 'motherdigger', 'motherdigging', 'dig', 'digger' or 'digging', but it wouldn't match 'diggery' because the \b at the end of the regexp says "there must be a word boundary here", where a word boundary is the beginning or end of the line or a transition from the set of letters, digits and underscore to something else, whereas

    \b(mother)?dig(ger\w*|ging)?\b

would also match 'diggery' and 'diggers'.

It gets somewhat tricky. You could just match 'dig' regardless of what follows or precedes it with the regexp

    dig

but then you also match 'digest', 'indigent' and so forth. I know that 'dig' isn't actually the word you're targeting, but the same problem exists with most simple words.

See <http://docs.python.org/library/re.html#regular-expression-syntax> or perhaps <http://oreilly.com/catalog/9780596528126/>.

The original expression I gave you

    BADWORDS = re.compile(r'(\W|^)word3(\W|$)|(\W|^)word6(\W|$)', re.I)

is a bit more complicated than it needs to be because (\W|^) and (\W|$) could just as well be \b.

Using the 'verbose' mode of regular expressions, which allows you to insert white space for readability, you could have something like

    BADWORDS = re.compile(r"""\bword3\b |
                              \bword6\b |
                              \b(mother)?dig(ger\w*|ging)?\b
                           """, re.IGNORECASE | re.VERBOSE)

Then later you could decide to add \b(mother)?diggingest\b with minimal editing like

    BADWORDS = re.compile(r"""\bword3\b |
                              \bword6\b |
                              \b(mother)?diggingest\b |
                              \b(mother)?dig(ger\w*|ging)?\b
                           """, re.IGNORECASE | re.VERBOSE)

Another way to do this is like

    WORDLIST = [r'\bword3\b',
                r'\bword6\b',
                r'\b(mother)?diggingest\b',
                r'\b(mother)?dig(ger\w*|ging)?\b',
               ]
    BADWORDS = re.compile('|'.join(WORDLIST), re.IGNORECASE)

This just makes a list of simple regexps and then joins them with '|' for the compiled re. In this case, re.VERBOSE isn't needed as we introduce no insignificant white space.

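If you want to see what a word list like this catches before putting it into production, a quick check along the following lines may help. This is only a sketch: is_flagged is a hypothetical helper name, the sample words are made up for illustration, and the trailing part of 'digger...' is written as \w* (zero or more word characters), which is my assumption about the intent.

```python
import re

# Placeholder entries as in the message above; 'word3' and 'word6'
# stand in for real words. \w* after 'ger' is an assumption: it lets
# 'diggers', 'diggery', etc. match up to the next word boundary.
WORDLIST = [
    r'\bword3\b',
    r'\bword6\b',
    r'\b(mother)?diggingest\b',
    r'\b(mother)?dig(ger\w*|ging)?\b',
]
BADWORDS = re.compile('|'.join(WORDLIST), re.IGNORECASE)


def is_flagged(text):
    """Return True if any pattern in WORDLIST matches somewhere in text."""
    return BADWORDS.search(text) is not None


# Hypothetical sample inputs, including words that should NOT match.
for sample in ['dig', 'Digger', 'motherdigging', 'diggery',
               'digest', 'indigent', 'a word3 here']:
    print(sample, is_flagged(sample))
```

With these patterns 'digest' and 'indigent' should come back unflagged, because there is no word boundary immediately before or after the 'dig' inside them, while the other samples are flagged.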
-- 
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
