[REBOL] EBNF vs RE Re:(3)

joel . neely Sun, 30 Jan 2000 16:24:42 -0800
Sorry, Elan, to be so slow in responding...  It's been an eventful
weekend!

[EMAIL PROTECTED] wrote:
> 
> > One of the nice things about REs is that they AREN'T code, just
> > description. And, along the lines of the discussion that Bob
> > inspired, they're just data, which you can read from any old
> > place and apply.
> 
> Would you feel more comfortable when you consider that REBOL
> functions are also just data?
> 

My response would have to be "no" for a variety of reasons.


PRACTICAL LEVEL:

An RE is just a string that describes a pattern.  As such, it is
a sort of "4GL" for string matching.  It describes WHAT to do, not
HOW to do it.  This provides a number of advantages to me as the
writer of REs.


1)  I just write a (fairly) straightforward description of the
pattern I want to locate; the implementor of the match engine
(whether Perl, AWK, grep, egrep, agrep, Python, or whatever ...)
provides highly optimized code whose performance normally isn't
limited by my ingenuity or coding skills.
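To make the WHAT-versus-HOW contrast concrete, here is a minimal Python sketch (Python being one of the engines listed above). It recognizes the name="value" shape used in the REBOL example below in two ways: once as an RE, and once as a hand-coded scanner. The names `PAIR`, `pair_re` and `pair_by_hand` are hypothetical. The character classes assume that "alpha" means ASCII letters and "alphanum" means ASCII letters and digits.

```python
import re

# Declarative: describe WHAT a name="value" pair looks like and let the
# re module decide how to scan for it.
PAIR = re.compile(r'[A-Za-z]+="[A-Za-z0-9]+"')


def pair_re(s: str) -> bool:
    return PAIR.fullmatch(s) is not None


# Procedural: spell out HOW to recognise the same shape, index by index.
def pair_by_hand(s: str) -> bool:
    i, n = 0, len(s)
    start = i
    while i < n and s[i].isascii() and s[i].isalpha():   # one or more letters
        i += 1
    if i == start or s[i:i + 2] != '="':                 # then =" literally
        return False
    i += 2
    start = i
    while i < n and s[i].isascii() and s[i].isalnum():   # one or more letters/digits
        i += 1
    return i > start and i == n - 1 and s[i] == '"'      # closing quote, then end
```

The RE states the shape of the input in one line and leaves the scanning strategy to the engine. The hand-written version has to get every index and boundary check right on its own.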


2)  It's just data; the matching engine just tries to match it.
The outcome is boolean; given an RE and a string, there's a
match or there isn't.  Attempting to match a pattern doesn't
execute arbitrary code that can go wildly wrong and cause all
sorts of other breakage.  (There actually is a loophole in Perl
REs, but it's easy to spot and thwart.)
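Point 2 can be sketched in a few lines of Python, again one of the engines named above. A pattern either matches or it does not. A malformed pattern is rejected by `re.compile`, which raises `re.error`, before any matching is attempted. The helper `matches` is hypothetical, and treating an invalid pattern as "no match" is a choice made only for this sketch. Python's `re` syntax has no construct for running embedded code, although a carelessly written pattern can still be slow to match.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if all of `text` matches `pattern`, else False."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # A malformed pattern fails to compile; in this sketch that is
        # reported as "no match" rather than propagated to the caller.
        return False
    return compiled.fullmatch(text) is not None


print(matches(r"[a-z]+", "abc"))   # well-formed, matches      -> True
print(matches(r"[a-z]+", "ABC"))   # well-formed, no match     -> False
print(matches(r"[a-z", "abc"))     # unterminated character set -> False
```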


3)  REs are just strings, so I can store them externally, reading
and matching against them as needed.  We know that there are
both performance and security issues involved in simply loading a
string and using the resulting block as a parse rule in REBOL.
(There's also that pesky "what context are these words defined in"
issue, but let's not even go there...)

Consider the following hypothetical example, assuming suitable
bitsets (alpha and alphanum) are defined:

Let's pretend we're reading from some external source...

    >> rulestring: "[some alpha {=^"} some alphanum {^"}]"

... and pretend to use it in a running script...

    >> foreach str [{name="value"} {n="v"} {name=val} {name='val} ][
    [    print [str  parse str load rulestring]
    [    ]
    name="value" true
    n="v" true
    name=val false
    name='val false

Now let's read another rule from an external source ...

    >> rulestring: "[some alpha {=^"} (print {YOU'VE BEEN TRASHED!} halt)]"

... and try to use it with the same code ...

    >> foreach str [{name="value"} {n="v"} {name=val} {name='val} ][
    [    print [str  parse str load rulestring]
    [    ]
    YOU'VE BEEN TRASHED!

We can imagine worse...

    "[(foreach filename read %. [delete to-file filename])]"

or

    "[(foreach f read %/c/Quicken/ [send [EMAIL PROTECTED] read join %/c/Quicken/ f])]"

(I didn't actually do a test run with either of these. ;-)
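For comparison, here is a rough Python analogue of the two REBOL experiments above. It is a sketch, not a translation of REBOL's parse semantics. Both pattern strings are treated as if they had been read from an external source and are handed to `re.fullmatch`. The regex `[A-Za-z]+="[A-Za-z0-9]+"` is an assumed stand-in for the alpha/alphanum rule, and `run_rule` is a hypothetical helper. The hostile string is still only a pattern: its parentheses are groups, not code, so it simply fails to match.

```python
import re


def run_rule(rule_string, samples):
    """Apply an externally supplied pattern string to each sample string."""
    rule = re.compile(rule_string)
    return [(s, rule.fullmatch(s) is not None) for s in samples]


samples = ['name="value"', 'n="v"', 'name=val', "name='val"]

# Pretend both pattern strings were read from some external source.
good_rule = r'[A-Za-z]+="[A-Za-z0-9]+"'
hostile_rule = '[A-Za-z]+="(print("YOU\'VE BEEN TRASHED!") halt)'

for rule_string in (good_rule, hostile_rule):
    for s, ok in run_rule(rule_string, samples):
        print(s, ok)
```

With the first pattern this reports True, True, False, False, the same outcomes as the REBOL session. With the second, every sample reports False and nothing else happens.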


4)  By using a different matching engine with the same RE, we can
get some nice behavioral enhancements.  Take a look at agrep
(for "approximate grep") as an example.  The agrep engine has some
very sophisticated heuristics for determining when a string is a
fuzzy "good enough" match.  Now imagine trying to rewrite a working
parse rule for such an approximation.
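To illustrate "same pattern, different engine" on a small scale, here is a deliberately simple stand-in for an approximate matcher. It uses a classic dynamic-programming edit-distance scan to report whether a literal pattern occurs anywhere in a text with at most `k` insertions, deletions or substitutions. This is not agrep's algorithm, it handles only literal patterns rather than full REs, and the function name `approx_contains` is hypothetical.

```python
def approx_contains(pattern: str, text: str, k: int) -> bool:
    """True if some substring of `text` is within edit distance `k` of `pattern`."""
    m = len(pattern)
    # prev[i] = fewest edits turning pattern[:i] into some substring of the
    # text that ends just before the current position. A match may start
    # anywhere, so column 0 is always 0; an empty substring costs i deletions.
    prev = list(range(m + 1))
    if prev[m] <= k:
        return True
    for ch in text:
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitute
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing from text
        if cur[m] <= k:
            return True
        prev = cur
    return False


print(approx_contains("colour", "the color red", 0))  # False: no exact occurrence
print(approx_contains("colour", "the color red", 1))  # True: only the 'u' is missing
```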


PHILOSOPHICAL LEVEL:

The whole data vs. software vs. hardware issue is a relative one.
(I used to frustrate graduate CompSci students with this one.)  We
can certainly play some Doug-Hofstadter-like games in this area
(as in _Goedel,_Escher,_Bach_ or _Metamagical_Themas_), but we
can also turn programming into a stroll "Through The Looking Glass"
if we don't keep ourselves VERY clear on what level we're dealing
with at each moment.


5)  The statement "REBOL functions are data" is true, but the
statement "REBOL functions are JUST [emphasis mine] data" isn't
really accurate.  Their rules of construction, the behavior they
exhibit when evaluated, and their possible effects on the state of
the environment in which they are evaluated all apply to a much
higher degree to code than to data.

Certainly data values can have syntactical rules.  These are usually
much simpler than the rules for construction of valid code, however.
If someone misspells my name in a data file, I may grumble about it,
but it probably won't crash his program.


6)  A key difference between data and code is intent.  When I'm
creating a REBOL source file using vim, an observer might say
that I'm just manipulating ASCII characters.  But I'm thinking
code, not ASCII.  The fact that my code also has a representation
as a collection of data structures is not the center of attention.

(Obviously there are exceptions; however, the amount of time I
spend on such exceptional cases is fairly small.  After all, how
many versions of 'source, 'huh?, and %browse-system.r does the
world need?  ;-)



OBTW, have a look at http://www.transmeta.com for a design that
adds yet ANOTHER level to this whole discussion.  Now if Carl and
company want to implement REBOL/Caruso with the ability to rewrite
VLIW microcode on the fly, then we'll have to get into this
discussion REALLY seriously.  ;-)

-jn-