This is part of a project called "r.gere", for "random genetic
evolution of regular expressions", which I started in September, 2001.
I clearly haven't worked on it much since.
The overall goal of the project is to allow people to write regular
expressions by example, with immediate feedback like Kodos gives you,
but going in both directions. The r.gere program stochastically
generates regular expressions and tries them out on training data you
supply. Because regular languages are closed under union, you can
test a lot of regular expressions in a single pass through the
training data.
The 2001 goal was to make it easier to pull implicit fields out of
textual data --- at the time, I was doing a lot of HTML
screen-scraping at Alpiri, and it would have been a lot faster to just
highlight the relevant parts of the HTML and have a program randomly
generate regular expressions to match them.
Today, we have a problem more painful than our inability to automate
Web browsing --- spam that circumvents filters. The current suite of
word-based naive Bayesian classifiers are vulnerable to attacks on
their word orientation --- splitting up words, using non-"word"
characters in them, etc.
So it occurred to me that a naive Bayesian classifier that used
regular expression matches would probably do a better job. But how to
know which regular expressions to try? Well, you could try a lot of
them, and see which ones relate to the classification question you're
trying to answer. You could even mutate and crossbreed them.
So I took another whack at the problem.
My initial naive attempt at generating regular expressions at random
and feeding them to Python's NFA-based regex engine was foiled by the
repeating-repeats problem. NFA-based regex engines don't do a good
job coping with Kleene closure applied to a regular expression which
can, itself, match the empty string; for example, x**, (x*)*, (x*|y)*,
or (x*y*)*. The Python regexp engine tries to detect this situation
when compiling the regular expression, but as of Python 2.3.3, it
isn't completely successful, and instead, it raises a RuntimeError
exception, "Maximum recursion depth exceeded", when you try to
.match() the compiled version of, say, '(?:x*y*)*'. That expression
(and '(?:x*|y)*') recognizes the same language as '(?:x|y)*', which
doesn't have the same problem, but it isn't obvious to me how to use
rewriting rules to do that transformation.
I spent some time hooking the regular expression random generation up
to the tree-rewriting system I posted sometime in the past, in order
to write regexp stringification rules as tree rewriting rules instead
of overloaded methods (I wrote the rewriting rules in my notebook
during a sleepless night in Bolinas while Beatrice slept, after my
laptop battery gave out), but so far my rewriting rules don't suffice
to prevent regexp application errors.
This program is far longer and more complex than it needs to be to do
what it does, but I regret that I don't have the time to make it
shorter tonight. In particular, the tree-rewriting engine is very
slow, overly complex, and clumsy.
Here are the rewrite rules for stringifying the regexps:
# this set of rewriting rules turns parse-trees of regular expressions
# into trees of string-generation and concatenation operations that
# should produce a parsable regexp.
# 'start' and 'end' mean "(?:" and ")" respectively, because this
# tree-rewriting language doesn't have string literals.
group(char($x)) -> _lit($x)
group(kleene(kleene($x))) -> group(kleene($x))
group(kleene($x)) -> _concat(group($x) _str(*))
string(concat($x $y)) -> _concat(group($x) group($y))
string(altern($x $y)) -> _concat(_concat(group($x) _str(|)) group($y))
group($x) -> _concat(_concat(start string($x)) end)
string(char($x)) -> group(char($x))
string(kleene($x)) -> group(kleene($x))
Here's the modified version of tree.py I'm using:
#!/usr/bin/python
# A tree is a possibly empty sequence of arguments, the first
# of which is an atom.
# The arguments may be trees, atoms, integers, or variables.
# Variables are "abstract"; atoms and integers are not.
# Trees are "abstract" iff they contain any abstract arguments.
# Things that are not abstract are "concrete".
# Abstract things can be "instantiated" with some mapping of
# variables to concrete things, producing a concrete thing.
# If an abstract thing can be instantiated to produce a
# certain concrete thing, it is said to unify with it.
# TODO:
# - eliminate distinctions between trees, atoms, and integers
# - remove integer type, use integer trees instead
# - add integer functions: + - * / % < = > <= >= & | # - figure out how to implement
Boolean logic completely (variables incl.)
# - add some facility for nice I/O
# - do some symbolic math with nice I/O
from __future__ import nested_scopes
import string, traceback, sys, StringIO
class thing:
def __init__(self, value): self.value = value
def abstract(self): return 0
def __str__(self): return str(self.value)
def instantiate(self, mapping): return self
def compatible_type(self, other):
return isinstance(other, self.__class__)
def unify(self, concrete, mapping):
if not self.compatible_type(concrete): return None
if self.value != concrete.value: return None
return mapping
def __cmp__(self, other):
if not self.compatible_type(other): return cmp(id(self), id(other))
return cmp(self.value, other.value)
def toplevel_result(self, ostream):
ostream.write('-> ' + str(self) + '\n')
class variable(thing):
def abstract(self): return 1
def __str__(self): return "$" + self.value
def instantiate(self, mapping): return mapping[self.value]
def unify(self, concrete, mapping):
if mapping.has_key(self.value) and mapping[self.value] != concrete:
return None
mapping[self.value] = concrete
return mapping
class atom(thing): pass
class integer(thing): pass
class tree(thing):
def __init__(self, items):
assert isinstance(items[0], atom)
self.items = items
self.traversing = 0
self.is_abstract = 0
for item in self.items:
if item.abstract(): self.is_abstract = 1
def abstract(self): return self.is_abstract
def __str__(self):
if self.traversing: return "((error))"
self.traversing = 1
try:
head, args = self.items[0], self.items[1:]
return str(head) + "(" + string.join(map(str, args), ' ') + ')'
finally:
self.traversing = 0
def instantiate(self, mapping):
if not self.abstract(): return self
return tree(map(lambda item: item.instantiate(mapping), self.items))
def unify(self, concrete, mapping):
if not self.compatible_type(concrete): return None
if len(self.items) != len(concrete.items): return None
for ii in range(len(self.items)):
mapping = self.items[ii].unify(concrete.items[ii], mapping)
if mapping is None: return None
return mapping
def __cmp__(self, other):
if not self.compatible_type(other): return cmp(id(self), id(other))
for ii in range(len(self.items)):
if ii >= len(other.items): return 1
rv = cmp(self.items[ii], other.items[ii])
if rv != 0: return rv
if len(self.items) == len(other.items): return 0
else: return -1
class toplevel_action(thing):
def __init__(self, value, mapping = None):
self.value, self.mapping = value, mapping
def instantiate(self, mapping):
return self.__class__(self.value, mapping)
def toplevel_result(self, ostream):
return apply(self.value, (ostream,), self.mapping)
def unify(abstract, concrete, mapping = None):
"""Return a mapping that instantiates the abstract as the concrete, or None
If a mapping exists, find it and return it; otherwise, return None.
"""
if mapping is None: mapping = {}
mapping = abstract.unify(concrete, mapping)
if mapping is not None:
assert abstract.instantiate(mapping) == concrete
return mapping
### parsing
def cons_string(astring):
rv = ()
for ii in range(len(astring)-1, -1, -1):
rv = (astring[ii], rv)
return rv
def append_charlists(aa, bb):
if aa == (): return bb
else:
hh, tt = aa
return (hh, append_charlists(tt, bb))
def restring(charlist):
rv = []
while charlist:
rv.append(charlist[0])
charlist = charlist[1]
return string.join(rv, '')
class parse_error(Exception): pass
class unmatched_leftparen(parse_error): pass
def parse_charlist(charlist):
if charlist == (): raise parse_error("premature eof")
hh, tt = charlist
if hh == ')': raise parse_error("unmatched )")
elif hh == '(': raise parse_error("( not following atom")
elif hh in string.whitespace: return parse_charlist(tt)
elif hh == '$': return parse_variable(tt)
elif hh in string.digits: return parse_number(tt, hh)
else: return parse_atomstuff(tt, hh)
def parse_atomstuff(charlist, name=''):
if charlist == (): return atom(name), charlist
hh, tt = charlist
if hh in string.whitespace + '()':
return parse_postatom(strip_whitespace(charlist), name)
else: return parse_atomstuff(tt, name + hh)
def parse_variable(charlist):
name, charlist = parse_atomstuff(charlist)
if name.value == '': raise parse_error("missing variable name")
return variable(name.value), charlist
def parse_number(charlist, digits):
if charlist == (): return integer(int(digits)), ()
hh, tt = charlist
if hh in string.digits: return parse_number(tt, digits + hh)
else: return integer(int(digits)), strip_whitespace(charlist)
def parse_postatom(charlist, name):
if name == '': raise parse_error("missing atom name")
if charlist == (): return atom(name), ()
hh, tt = charlist
if hh == '(': return parse_args(tt, (atom(name),))
else: return atom(name), charlist
def parse_args(charlist, args):
if charlist == (): raise unmatched_leftparen("unmatched (", args)
hh, tt = charlist
if hh == ')': return tree(args), strip_whitespace(tt)
elif hh in string.whitespace: return parse_args(tt, args)
else:
nextitem, tt = parse_charlist(charlist)
return parse_args(tt, args + (nextitem,))
def strip_whitespace(charlist):
if charlist == (): return charlist
hh, tt = charlist
if hh in string.whitespace: return strip_whitespace(tt)
else: return charlist
def parse(astring):
return parse_charlist(cons_string(astring))
def parseall(astring):
result, remainder = parse(astring)
assert remainder == ()
return result
def rewrite(atree, before, after):
mapping = unify(before, atree)
if mapping is None: return None
return after.instantiate(mapping)
def eagerly_rewrite(athing, rules):
if isinstance(athing, tree):
items = map(lambda thing: eagerly_rewrite(thing, rules), athing.items)
athing = tree(items)
for before, after in rules:
result = rewrite(athing, before, after)
if result is not None: return eagerly_rewrite(result, rules)
return athing
class dummy_writer:
def write(self, astring): pass
def flush(self): pass
class parser:
def __init__(self, infile, lineno=0, outfile=dummy_writer(), prompt=''):
self.infile, self.lineno, self.outfile, self.prompt = (
infile, lineno, outfile, prompt)
def readline(self):
self.outfile.write(self.prompt)
self.outfile.flush()
self.lineno = self.lineno + 1
return self.infile.readline()
def parseline(self):
"""Parse at least a line from infile, returning a list of trees.
When an argument list continues across lines, this function
prefers parsing multiple lines to reporting an error. """
inputline = self.readline()
startlineno = self.lineno
if inputline == '': raise EOFError
charlist = strip_whitespace(cons_string(inputline))
results = []
while charlist and charlist[0] != '#':
try: tree, charlist = parse_charlist(charlist)
except unmatched_leftparen, v:
nextline = cons_string(self.readline())
if nextline == (): # eof
raise unmatched_leftparen(*(v.args +
('line no', startlineno)))
charlist = append_charlists(charlist, nextline)
except parse_error, v:
v.args = v.args + ('line no', startlineno)
raise v
else:
results.append(tree)
return results
def test_parser():
infile = ('''
x a b
x() a() b()
x( ) a( ) b( )
x() a(
''' + # lines 6-10 follow
''') b()
# a comment
snarg fuzz # another comment
a(
) # yet another
''' + # lines 11-14 follow
'''3
)
x() a(
) b(
''')
p = parser(StringIO.StringIO(infile))
assert p.parseline() == []
assert p.lineno == 1
assert p.parseline() == map(parseall, 'x a b'.split())
assert p.lineno == 2
assert p.parseline() == map(parseall, 'x() a() b()'.split())
assert p.lineno == 3
assert p.parseline() == map(parseall, 'x() a() b()'.split())
assert p.lineno == 4
assert p.parseline() == map(parseall, 'x() a() b()'.split())
assert p.lineno == 6
assert p.parseline() == []
assert p.lineno == 7
assert p.parseline() == map(parseall, 'snarg fuzz'.split())
assert p.lineno == 8
assert p.parseline() == [parseall('a()')]
assert p.lineno == 10
assert p.parseline() == [parseall('3')]
assert p.lineno == 11
try: p.parseline()
except parse_error, pe:
assert 12 in pe.args, `pe.args`
else:
assert 0
assert_raises(lambda: p.parseline(), parse_error)
assert_raises(lambda: p.parseline(), EOFError)
class repl:
"""Play with rewriting expressions. a -> b rewrites a to b.
d(peanuts) deletes rules for expressions starting with 'peanuts'.
See the various example *.tree files packaged herewith for
examples.
"""
def __init__(self):
self.rules = []
#define_action('r($pattern $replacement)', define_rewrite_rule)
self.define_action('d($atom)', self.delete_rules)
self.define_action('import($filename)', self.import_file)
def import_files(self, files):
for f in files: self.i_file(sys.stdout, f)
def define_rewrite_rule(self, ostream, pattern, replacement):
self.rules.append((pattern, replacement))
ostream.write("will rewrite %s to %s\n" % (pattern, replacement))
def delete_rules(self, ostream, atom):
ostream.write('forgetting about %s\n' % atom)
self.rules[:] = [(pat, rep) for (pat, rep) in self.rules
if (not hasattr(pat, 'items')
or pat.items[0] != atom)]
def define_action(self, patternstr, routine):
self.rules.append((parseall(patternstr), toplevel_action(routine)))
def import_file(self, ostream, filename): i_file(ostream, filename.value)
def i_file(self, ostream, filename):
infile = open(filename)
self.readfile(parser(infile), ostream)
infile.close()
ostream.write("imported %s\n" % filename)
def readfile(self, cmdlineparser, ostream):
while 1:
try:
try: trees = cmdlineparser.parseline()
except EOFError: ostream.write("\n"); return
self.handle_trees(trees, ostream)
except:
traceback.print_exc()
def handle_trees(self, trees, ostream):
if len(trees) == 1:
eagerly_rewrite(trees[0],
self.rules[:]).toplevel_result(ostream)
elif len(trees) == 0:
pass
elif len(trees) == 3 and trees[1] == atom('->'):
self.define_rewrite_rule(ostream, trees[0], trees[2])
else:
ostream.write("Couldn't understand that: %s\n" %
(map(str, trees)))
def run_interactively(self, files):
self.import_files(files)
cmdlineparser = parser(sys.stdin, outfile=sys.stdout, prompt=' ')
self.readfile(cmdlineparser, sys.stdout)
class wrong_exception(AssertionError): pass
def assert_raises(subr, exc):
try:
subr()
except exc:
return 1
except:
exc_type, exc_value, tb = sys.exc_info()
raise wrong_exception, wrong_exception(exc_value), tb
else:
raise wrong_exception("no exception")
def test():
def raise_TypeError():
raise TypeError
assert_raises(raise_TypeError, TypeError)
def assert_wrong():
assert_raises(raise_TypeError, ValueError)
assert_raises(assert_wrong, AssertionError)
assert_raises(assert_wrong, wrong_exception)
def assert_parse_error(astr):
assert_raises(lambda: parse(astr), parse_error)
assert_parse_error('')
assert_parse_error(')')
assert_parse_error('(')
assert_parse_error(' ')
assert_parse_error(' )')
assert_parse_error(' (')
def dparse(astr): return parse(astr)[0]
def punp(astr): return str(dparse(astr))
def teq(astr):
assert punp(astr) == astr, punp(astr)
teq('asdf')
teq('asdf()')
teq('1324')
teq('asdf(a b c)')
teq('a(1 3 b() c)')
teq('[EMAIL PROTECTED]')
teq('$x')
teq('a(3 $x b($3) x)')
assert punp(' as ') == 'as'
assert punp(' x ( y 2 ) ') == 'x(y 2)'
assert_parse_error('x(y(z)')
assert_parse_error('x((y) z)')
assert_parse_error('$')
assert_parse_error('$()')
### instantiation
atree = dparse('a(b($c d)$e)')
assert str(atree) == 'a(b($c d) $e)', str(atree)
atoma = atom('a')
assert_raises(lambda: atree.instantiate({}), KeyError)
insttree = atree.instantiate({'c': atoma, 'e': atoma})
assert str(insttree) == 'a(b(a d) a)', str(insttree)
insttree = atree.instantiate({'c': atom('b'), 'e': atoma})
assert str(insttree) == 'a(b(b d) a)', str(insttree)
### comparison
assert atom('a') == atom('a')
assert atom('a') != atom('b')
assert integer(1) == integer(1)
assert integer(1) < integer(2)
assert integer(2) > integer(1)
assert variable('a') == variable('a')
assert variable('a') != variable('b')
assert atom('a') != variable('a')
assert variable('a') != atom('a')
assert atom('1') != variable('1')
assert dparse('x(a b c(d e f()) g)') == dparse('x(a b c(d e f()) g)')
assert dparse('x(a b c(d e f()) g)') != 'x(a b c(d e f()) g)'
assert dparse('x(a b c(d e f()) g)') != dparse('x()')
assert dparse('x(a)') != dparse('x()')
assert dparse('x()') != dparse('x(a)')
assert dparse('x(a)') == dparse('x(a)')
assert dparse('x(a)') != dparse('x(b)')
### unification
def sunify(stra, strb):
return unify(dparse(stra), dparse(strb))
assert sunify('a', 'a') == {}
assert sunify('a', 'b') == None
assert sunify('$a', '3') == {'a': integer(3)}
assert sunify('a(b($c d) $e)', 'a(b(a d) a)') == {
'c': atom('a'),
'e': atom('a')}
assert sunify('a(b($c d) $e)', 'a(b(b d) a)') == {
'c': atom('b'),
'e': atom('a')}
assert sunify('hi($x)', 'hi(kragen(sitaker))') == {
'x': dparse('kragen(sitaker)')}
assert sunify('a($a $a)', 'a(b b)') == {'a': atom('b')}
assert sunify('a($a $a)', 'a(b c)') == None
### rewriting
def srew(stra, strb, strc):
return str(rewrite(dparse(stra), dparse(strb), dparse(strc)))
assert srew('a', 'b', 'c') == 'None'
assert srew('a', '$b', 'c') == 'c'
assert srew('a', '$b', '$b') == 'a'
assert srew('a(b)', '$x', '$x') == 'a(b)'
assert srew('a(b)', '$x', 'y($x)') == 'y(a(b))'
assert srew('a(b)', 'a($x)', 'c($x)') == 'c(b)'
### repetitive rewriting
def rules(alist):
def parsepair(apair):
a, b = apair
return dparse(a), dparse(b)
return map(parsepair, alist)
def evalu(astr, rules):
return str(eagerly_rewrite(dparse(astr), rules))
# "laws of form" boolean logic
lofrules = rules([('not(not($x))', '$x'),
('or(1 $x)', '1'),
('or($x 1)', '1'),
('or($x $x)', '$x')])
assert evalu('1', lofrules) == '1'
assert evalu('not(1)', lofrules) == 'not(1)'
assert evalu('not(not(1))', lofrules) == '1'
assert evalu('not(not(not(1)))', lofrules) == 'not(1)'
assert evalu('not(not(not(not(1)))))', lofrules) == '1'
assert evalu('or(1 not(1))', lofrules) == '1'
assert evalu('or(not(1) 1)', lofrules) == '1'
assert evalu('or(1 1)', lofrules) == '1'
assert evalu('not(or(1 1))', lofrules) == 'not(1)'
assert evalu('not(or(not(1) not(1)))', lofrules) == '1'
a = cons_string('fibble')
b = cons_string('wur')
c = restring(append_charlists(a, b))
assert c == restring(a) + restring(b), c
test_parser()
if __name__ == '__main__':
test()
if len(sys.argv) > 1:
repl().run_interactively(sys.argv[1:])
else:
print repl.__doc__
repl().run_interactively([])
And here's 'genre.py', which generates random regular expressions.
The commented-out 'ok' lines are ones that don't yet pass.
#!/usr/bin/python
import random, string, tree, sys, traceback, re
### definitions of regex tree node types
class char:
def __init__(self, thechar): self.thechar = thechar
def render(self):
if self.thechar not in string.letters + string.digits: return '\\' +
self.thechar
else: return self.thechar
def __repr__(self): return 'char(%s)' % repr(self.thechar)
atom = tree.atom('char')
def as_tree(self): return tree.tree([self.atom, tree.atom(self.thechar)])
class kleene:
def __init__(self, subre): self.subre = subre
def render(self): return '(?:%s)*' % self.subre.render()
def __repr__(self): return 'kleene(%s)' % repr(self.subre)
atom = tree.atom('kleene')
def as_tree(self): return tree.tree([self.atom, self.subre.as_tree()])
class seq:
def __init__(self, re1, re2): (self.re1, self.re2) = (re1, re2)
def render(self): return '%s%s' % (self.re1.render(), self.re2.render())
def __repr__(self): return 'seq(%s, %s)' % (repr(self.re1), repr(self.re2))
atom = tree.atom('concat')
def as_tree(self):
return tree.tree([self.atom, self.re1.as_tree(), self.re2.as_tree()])
class choice(seq):
def render(self): return '(?:%s|%s)' % (self.re1.render(), self.re2.render())
def __repr__(self): return 'choice(%s, %s)' % (repr(self.re1), repr(self.re2))
atom = tree.atom('altern')
### random generation of regex trees
def randchar(): return char(chr(random.randrange(32, 127)))
def randkleene(): return kleene(randre())
def randseq(): return seq(randre(), randre())
def randchoice(): return choice(randre(), randre())
def randre():
# we give randchar and randkleene higher probability so this terminates
return random.choice([randchar, randchar, randchar, randchar, randchar,
randkleene, randkleene, randseq, randchoice])()
### converting regexes to strings
def stringify(atree):
mapping = tree.unify(tree.parse('_lit($x)')[0], atree)
if mapping:
# this is too much of a pain to write in the tree-rewriting language
val = mapping['x'].value
if val in string.letters + string.digits: return val
else: return '\\' + val
mapping = tree.unify(tree.parse('_str($x)')[0], atree)
if mapping: return mapping['x'].value
mapping = tree.unify(tree.parse('_concat($a $b)')[0], atree)
if mapping: return stringify(mapping['a']) + stringify(mapping['b'])
print "couldn't match %s" % atree
return atree
myrepl = tree.repl()
nullfile = file('/dev/null', 'w')
myrepl.i_file(nullfile, 're-rewrite.tree')
# these are here because the parser can't cope
myrepl.define_rewrite_rule(nullfile, tree.atom('start'),
tree.tree([tree.atom('_str'), tree.atom('(?:')]))
myrepl.define_rewrite_rule(nullfile, tree.atom('end'),
tree.tree([tree.atom('_str'), tree.atom(')')]))
def deparse(regex):
rewritten = tree.eagerly_rewrite(tree.tree([tree.atom('string'),
regex.as_tree()]), myrepl.rules[:])
return stringify(rewritten)
### tests
def ok(a, b):
if a == b: return
print "mismatch: %s != %s" % (a, b)
def test_stringify_tree():
ok(stringify(tree.tree([tree.atom('_str'), tree.atom('boo')])), 'boo')
treein = '_concat(_concat(_concat(_concat(_str(x) _str(*)) _str(|)) _str(y))
_str(*))'
atree = tree.parse(treein)[0]
ok(str(atree), treein) # make sure we're using it right
ok(stringify(atree), 'x*|y*')
ok(stringify(tree.parse('_concat(_lit(*) _lit(a))')[0]), '\\*a')
def test():
test_stringify_tree()
ok(deparse(char('a')), 'a')
ok(deparse(char('.')), '\\.')
ok(deparse(kleene(choice(char('a'), char('b')))), '(?:a|b)*')
ok(deparse(seq(choice(char('a'), char('b')), choice(char('c'), char('d')))),
'(?:a|b)(?:c|d)')
ok(deparse(kleene(seq(choice(char('a'), char('b')), choice(char('c'),
char('d'))))),
'(?:(?:a|b)(?:c|d))*')
ok(deparse(kleene(char('x'))), 'x*')
ok(deparse(seq(char('x'), char('y'))), 'xy')
#ok(deparse(seq(char('x'), seq(char('y'), char('z')))), 'xyz')
ok(deparse(choice(char('x'), char('y'))), 'x|y')
#ok(deparse(choice(char('x'), choice(char('y'), char('z')))), 'x|y|z')
#ok(deparse(choice(choice(char('x'), char('y')), char('z'))), 'x|y|z')
#ok(deparse(choice(seq(char('x'), char('y')), char('z'))), 'xy|z')
#ok(deparse(choice(char('x'), seq(char('y'), char('z')))), 'x|yz')
ok(deparse(kleene(kleene(char('x')))), 'x*')
# ok(deparse(choice(seq(char('@'), seq(char('o'), char('D'))),
# choice(char('~'), char('9')))),
# '[EMAIL PROTECTED]|\\~|9')
ok(deparse(kleene(seq(kleene(char('d')), char('S')))),
'(?:d*S)*')
#ok(deparse(kleene(choice(kleene(seq(char('G'), char('+'))),
# seq(kleene(char('0')), char('='))))),
# '(?:(?:G\\+)*|O*\\=)*')
test()
if __name__ == '__main__':
for ii in range(100):
x = randre()
xs = deparse(x)
print x, xs
try: print re.compile(xs)
except: traceback.print_exc()