Re: Regular expression generator/creator

Chris Devers Sat, 28 Aug 2004 13:28:00 -0700

On Sat, 28 Aug 2004, Philipp Traeder wrote:

I'm not sure if I understood the Halting Problem completely, but could you give a hint what brings you to the assumption that "automatically generating regular expressions from a bunch of text" is a variant of the Halting Problem?

I'm not a computer science theorist or anything, and could easily be wrong about my guess there, but it just seems like the same sort of broadly unbounded scenario that the halting problem deals with.

The halting problem, for those that haven't heard of it, basically states that it is impossible to determine whether any random program will ever finish properly, because most will succeed or fail quickly but some will legitimately need infinite time to complete. It was conceived by Alan Turing before the first electronic computers were yet built, and is itself a variant on Kurt Godel's incompleteness theorem:

    <http://en.wikipedia.org/wiki/G%F6del's_incompleteness_theorem>

I wasn't aware of this example until the thread came up, but this blurb from the above URL --

    One of the first problems suspected to be undecidable was the word
    problem for groups, first posed by Max Dehn in 1911, which states
    that there is a finitely presented group that has no algorithm to
    state whether two words are equivalent. It was not proven to be
    undecidable until 1952.

-- sounds to me like it's talking about the same kind of regex situation that the person who started this thread was asking for.

Anyway, all of this & more isdescribed at extreme length in the great book _Godel, Escher, Bach_, if you want to learn more from an author who can actually explain this stuff without mangling it, as I've probably done :-)

I'm facing a roughly similar problem at the moment, and I was planning on using String::Compare or something like it for comparing strings char by char. Taking a first glance at the code, it doesn't look too hard to modify it in a way that it returns not only the similarity between two strings, but also a string with special characters at those places that are different - something that I could call like:
 my $a = 'abcdXef';
 my $b = 'abcdYef';
 my ($similarity, $regexp) = compare_strings($a, $b);
and would return the similarity as percentile of matching chars (6/7 in this case) and a regex that looks like
 abcd.ef
Is this problem different to the one you were talking about? If not, why
shouldn't it be solvable?

As I say before, I'm not a theorist or anything, it's all just hunches here :-).

That said, the example you give is pretty well restricted, which would seem to lend to the problem being solvable. On the other hand, the example you give is so well restricted that it didn't take you any trouble at all to come up with the regex yourself.

My suspicion is that as the problem gets more and more nuanced, automatic tools will have an increasingly hard time keeping up.

This also gets into the sorts of topics that artificial intelligence researchers spend time pondering about -- how do you get a program to automatically recognize that a random image is a person (rather than, say, a statue), etc. It sounds simple, but it's really tricy stuff...


--
Chris Devers      [EMAIL PROTECTED]
http://devers.homeip.net:8080/blog/

np: 'Protect Ya Neck (The Jump Off)'
     by Wu-Tang Clan
     from 'The W'

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

Re: Regular expression generator/creator

Reply via email to