PPIG discuss: Linguistics & programming languages

Martin Sustrik Tue, 07 Mar 2006 13:35:10 -0800

Hi all,
I am very pleasantly surprised to have found people
who understand the problem and are interested in it.
To make clearer what I am interested in, I
wrote a very simple and straightforward case study:


    Probably every programmer who taught himself to
code has at some point written something like this:

    IF a > 5 AND < 10 THEN ...

    He was surprised that the line did not compile.
    He was then told that the construct should look
like this:

    IF a > 5 AND a < 10 THEN ...

    He corrected the line without giving it much
thought and forgot about the problem completely.
    If asked about the former construct once he had
grown more experienced, he would probably say that
it is not allowed because it is somewhat vague and
ambiguous, but that, if needed, it could be built into
a compiler without any problems, as it is only
'syntactic sugar': something that makes the
programmer's life easier but has no special meaning by
itself.
    However, if we asked him to implement such a
feature, he would run into really serious problems.
    We would expect the following three lines to be
semantically equivalent:

    IF a < b AND a < c THEN ...
    IF a < b AND < c THEN ...
    IF a < b AND c THEN ...

    But what should a compiler do to interpret such
expressions? Parsing them, we would get the following
parse trees:

    (a < b) AND (a < c)
    a ((< b) AND (< c))
    a < (b AND c)

    It is clear how to interpret the first parse tree,
but what about the second and the third? What is the
subexpression (< b) supposed to mean? Is it a function
that compares its argument to b? And what about (b
AND c)? Is it a way to construct a set? If so, the
operator < would have different semantics in each of
the examples. In the first it is a standard comparison
between two numbers. In the second it is a way to
create a unary function. In the third it is a
comparison between a number and a set of numbers. The
same applies to AND. In the first case it is a standard
logical AND. In the second it is a way to combine two
unary functions. In the third it creates a set of
numbers.
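
    To make the three readings above concrete, here is a
minimal Python sketch. It is only an illustration, not
part of any real compiler: the names less_than, both and
below_all are hypothetical helpers invented for this
example. Each function gives '<' and 'AND' the meaning
they would need under one of the three parse trees.

```python
from typing import Callable, Iterable

Predicate = Callable[[float], bool]


# Reading 1: (a < b) AND (a < c)
# '<' compares two numbers, 'AND' is the ordinary logical AND.
def reading_one(a: float, b: float, c: float) -> bool:
    return (a < b) and (a < c)


# Reading 2: a ((< b) AND (< c))
# '(< b)' builds a unary function, 'AND' combines two such functions.
def less_than(bound: float) -> Predicate:
    return lambda x: x < bound


def both(p: Predicate, q: Predicate) -> Predicate:
    return lambda x: p(x) and q(x)


def reading_two(a: float, b: float, c: float) -> bool:
    return both(less_than(b), less_than(c))(a)


# Reading 3: a < (b AND c)
# 'AND' builds a set of numbers, '<' compares a number with that set.
def below_all(a: float, bounds: Iterable[float]) -> bool:
    return all(a < bound for bound in bounds)


def reading_three(a: float, b: float, c: float) -> bool:
    return below_all(a, {b, c})
```

    All three functions return the same truth value for
the same a, b and c, yet '<' and 'AND' need a different
definition in each, which is exactly the difficulty
described above.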
    All these subtleties seem far too complex for
expressing such a simple thing as joining two parts of
an expression by 'and'. Are we on the wrong track?
Yet, thinking in terms of orthodox compiler
design, I see no other way we could go.
    So let's look at the problem from a different point
of view. What we are trying to model are natural
language expressions like:

    If a is less than b and a is less than c then ...
    If a is less than b and less than c then ...
    If a is less than b and c then ...

    Does linguistics have anything to say about the
issue?
    I think it does. The answer is 'transformational
grammar' as proposed by Chomsky, i.e. a basic
context-free grammar plus a set of transformational
rules. I even wrote a simple pseudo-compiler (just to
prove it's feasible) that is able to process
statements like:

    Call function 'insert' on list1 and list2 with
argument x.
    Call functions 'erase' and 'insert' on list.
    Call function insert on list twice with arguments
x and y.

    In C++ like syntax:

    object1 & object2 . insert (x);
    object . erase (x) & insert (y);
    object . insert (x) & (y);
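
    The same idea can be sketched for the earlier IF
example. The Python below is not the pseudo-compiler
mentioned above, only a rough illustration of one
transformational rule: before ordinary parsing, restore
the subject and operator that were elided after AND or
OR. The function name expand_condition and the operator
sets are assumptions made for this example.

```python
COMPARISONS = {"<", ">", "<=", ">=", "=", "<>"}
CONNECTIVES = {"AND", "OR"}


def expand_condition(tokens):
    """Rewrite elliptical clauses into full 'operand op operand' clauses."""
    # Split the token list into clauses at each connective.
    clauses, connectives, current = [], [], []
    for tok in tokens:
        if tok in CONNECTIVES:
            clauses.append(current)
            connectives.append(tok)
            current = []
        else:
            current.append(tok)
    clauses.append(current)

    subject = operator = None  # taken from the most recent comparison
    out = []
    for index, clause in enumerate(clauses):
        if len(clause) == 3 and clause[1] in COMPARISONS:
            # Full clause: remember its subject and operator.
            subject, operator = clause[0], clause[1]
        elif len(clause) == 2 and clause[0] in COMPARISONS and subject is not None:
            # '< c': the subject was elided.
            operator = clause[0]
            clause = [subject] + clause
        elif len(clause) == 1 and operator is not None:
            # 'c': both subject and operator were elided.
            clause = [subject, operator] + clause
        else:
            raise ValueError(f"cannot expand clause: {clause}")
        if index > 0:
            out.append(connectives[index - 1])
        out.extend(clause)
    return out


print(" ".join(expand_condition("a < b AND c".split())))
```

    After this rewrite all three forms become the first,
fully spelled-out one, so a conventional parser only
ever sees (a < b) AND (a < c). Note that the rule
treats a lone operand after AND as an elided
comparison, so it would misread a plain boolean
variable in that position; a real design would have to
resolve that ambiguity.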

Has anyone seen anything similar?
Thanks.
Martin

----------------------------------------------------------------------
PPIG Discuss List (discuss@ppig.org)
Discuss admin: http://limitlessmail.net/mailman/listinfo/discuss
Announce admin: http://limitlessmail.net/mailman/listinfo/announce
PPIG Discuss archive: http://www.mail-archive.com/discuss%40ppig.org/
