list comprehensions and Bicicleta syntax

Kragen Javier Sitaker Thu, 19 Apr 2007 00:37:04 -0700

I'm pleasantly surprised to see some promise in Bicicleta's current
concrete syntax.


A call to collect
-----------------

So consider this call to prog.collect:

    prog.collect(f: "(" + f.item + ")", for_each="cthulhu", 
        where=(f.item == "u").not)

This returns the list ["(c)", "(t)", "(h)", "(l)", "(h)"].

Having just explained this code to several people, I am now aware that
it is not a marvel of clarity, so I will start by explaining what this
does, and some of its internal workings.

It makes a list of "(" plus a character plus ")", for each character
of the string "cthulhu", as long as the character is not "u".

There are three arguments to 'prog.collect': 'arg1', which is "(" +
f.item + ")"; 'for_each', which is "cthulhu"; and 'where', which is
(f.item == "u").not.  'f' is a name used to refer to the collect
expression as a whole.

It's a little odd to have "arguments" whose value depends on the
function they're being passed to, so I should explain that they're not
really arguments, but methods.  Bicicleta doesn't really have
arguments.

This expression is evaluated by evaluating prog.collect (the results
of the 'collect' method on the variable 'prog', which conventionally
refers to the top level of the program), deriving a new object from it
by overriding the 'arg1', 'for_each', and 'where' methods described
above, and then calling the '()' method on the resulting object.  If
we didn't want to do this last step, we could write

    prog.collect {f: "(" + f.item + ")", for_each="cthulhu", 
        where=(f.item == "u").not }

which is just the object (the same object named by f).
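
To make the "arguments are really methods" idea concrete, here is a toy Python model of deriving an object by overriding methods. It is a sketch under assumptions, not Bicicleta's implementation. The names Obj, derive, and the "call" method (standing in for '()') are all made up for illustration. Each method is a function of the object it is looked up on, which plays the role of 'f'.

```python
class Obj:
    """Toy prototype object whose attributes are all methods of self.

    A hypothetical model of derivation-by-overriding, not Bicicleta's
    actual implementation.
    """

    def __init__(self, methods, parent=None):
        # Start from the parent's methods, then apply the overrides.
        self._methods = dict(parent._methods) if parent is not None else {}
        self._methods.update(methods)

    def derive(self, **overrides):
        # Like `prog.collect { ... }`: a new object; nothing is called yet.
        return Obj(overrides, parent=self)

    def __getattr__(self, name):
        # Reached only for names not found normally.  Each method runs
        # against the object it is looked up on, so an override changes
        # what every other method of the derived object sees.
        methods = self.__dict__.get("_methods", {})
        if name not in methods:
            raise AttributeError(name)
        return methods[name](self)

    def __call__(self):
        # The '()' step of `prog.collect(...)`: run the method that is
        # spelled "call" in this sketch.
        return self.call


greeter = Obj({
    "name": lambda f: "world",
    "call": lambda f: "hello, " + f.name,
})

# Braces form: just the derived object.  Overrides refer to it as f.
wrapped = greeter.derive(inner=lambda f: "cthulhu",
                         name=lambda f: "(" + f.inner + ")")
print(wrapped.name)  # (cthulhu)

# Parentheses form: derive, then call '()' on the result.
print(greeter.derive(name=lambda f: "bicicleta")())  # hello, bicicleta
print(wrapped())     # hello, (cthulhu)
print(greeter())     # hello, world
```

Because 'call' looks up f.name on whatever object it runs against, overriding 'name' in a derived object changes what '()' returns there without touching the original.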

Prog.collect evaluates 'arg1' and 'where' in a series of objects with
different 'item' methods, which return successive elements of the
'for_each' value, in order to construct the list.  'for_each' can be
any kind of sequence, not just a string.  
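
Stripped of the cursor and the recursion, the behavior just described amounts to something like the following Python sketch. This collect is a simplified, hypothetical stand-in: it uses a plain loop and a types.SimpleNamespace per element in place of the series of derived collect objects, each differing only in 'item'.

```python
from types import SimpleNamespace


def collect(arg1, for_each, where):
    """Simplified, hypothetical collect: no cursor, no recursion."""
    results = []
    for element in for_each:
        # One small object per element; only 'item' differs between them.
        f = SimpleNamespace(item=element)
        if where(f):
            results.append(arg1(f))
    return results


# The call from the top of this note, with the methods as Python lambdas.
strings = collect(lambda f: "(" + f.item + ")", for_each="cthulhu",
                  where=lambda f: f.item != "u")
print(strings)  # ['(c)', '(t)', '(h)', '(l)', '(h)']

# 'for_each' can be any sequence, e.g. a list of numbers.
numbers = collect(lambda f: f.item * 10, for_each=[1, 2, 3, 4, 5],
                  where=lambda f: f.item % 2 == 1)
print(numbers)  # [10, 30, 50]
```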

Mechanics of 'collect'
----------------------

Here's the full code for prog.collect:

    # Collect: map+filter, in a more listcompy shape.
    # WORDY! Uck!  Avoids prog.if because prog.if depends on collect.
    collect = {collect: arg1 = collect.item,
        cursor = collect.for_each.cursor
        item = collect.cursor.item
        where = prog.sys.bool.true
        next = collect { cursor = collect.cursor.advanced }
        '()' = collect.cursor.empty.if_true(
            then = collect.cursor
            else = collect.where.if_true(
                then = collect.arg1 @ collect.next()
                else = collect.next()))
    }

'arg1' defaults to collect.item (the same as f.item in the call
earlier); 'item' is defined as collect.cursor.item; 'cursor' defaults
to collect.for_each.cursor; 'where' defaults to true; 'next' is a
method that returns the same object, except with a new value for
'cursor' (giving it different values for 'item', 'arg1', '()', and
maybe 'where'); and '()' returns the empty cursor, which serves as an
empty list, if 'cursor' is empty.  Otherwise it returns
collect.arg1 @ collect.next() if 'where' is true, or just
collect.next() if 'where' is false.  '@' is the cons or
list-construction operator.
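
For readers who find the Bicicleta hard to follow, here is a rough Python transliteration of the code above. It is a sketch under several assumptions. Cursor, NIL, Collect and to_list are invented names, __call__ stands in for '()', and Python (head, tail) tuples stand in for '@' cons cells. It also ends the list with None rather than with the empty cursor.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    """Hypothetical cursor over any sequence (string, list, tuple...)."""
    seq: Sequence
    pos: int = 0

    @property
    def empty(self):
        return self.pos >= len(self.seq)

    @property
    def item(self):
        # Raises IndexError on an empty cursor; collect never asks then.
        return self.seq[self.pos]

    @property
    def advanced(self):
        return Cursor(self.seq, self.pos + 1)


NIL = None  # the empty list; non-empty lists are (head, tail) cons pairs


class Collect:
    """Rough transliteration of prog.collect (an assumption, not Bicicleta)."""

    def __init__(self, for_each, arg1=None, where=None, cursor=None):
        self.for_each = for_each
        # Overrides are functions of the collect object itself, like `f`.
        self._arg1 = arg1 if arg1 is not None else (lambda c: c.item)
        self._where = where if where is not None else (lambda c: True)
        self.cursor = cursor if cursor is not None else Cursor(for_each)

    @property
    def item(self):
        return self.cursor.item

    @property
    def arg1(self):
        return self._arg1(self)

    @property
    def where(self):
        return self._where(self)

    def next(self):
        # Same overrides, advanced cursor: item, arg1 and where all change.
        return Collect(self.for_each, self._arg1, self._where,
                       self.cursor.advanced)

    def __call__(self):
        if self.cursor.empty:
            return NIL                          # terminate the list
        if self.where:
            return (self.arg1, self.next()())   # arg1 @ next()
        return self.next()()                    # skip this item


def to_list(cons):
    """Flatten nested (head, tail) pairs into a Python list."""
    out = []
    while cons is not NIL:
        head, cons = cons
        out.append(head)
    return out


f = Collect("cthulhu",
            arg1=lambda f: "(" + f.item + ")",
            where=lambda f: not f.item == "u")
print(to_list(f()))  # ['(c)', '(t)', '(h)', '(l)', '(h)']
```

Like the original, this recurses once per element. In Python, that means long sequences will hit the default recursion limit.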

So, in the call above, initially 'item' is "c", the overridden 'arg1'
evaluates to "(c)", the overridden 'where' evaluates to true, and the
cursor is not empty, so '()' returns "(c)" @ collect.next().  In
collect.next, the cursor is advanced to point to the next item, so
'item' is "t", 'arg1' evaluates to "(t)", and '()' evaluates to
"(t)" @ collect.next().  So the top-level '()' evaluates to
"(c)" @ ("(t)" @ some other stuff), and so it goes on.
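
The shape of that recursion shows up in a stripped-down Python sketch that hard-codes this particular call. The function name paren (for '()') is made up. A string slice stands in for the cursor, and nested (head, tail) tuples stand in for '@'. The "u" characters take the branch where 'where' is false, so they contribute nothing.

```python
def paren(rest):
    """Hypothetical '()' for the cthulhu call, cursor at the start of `rest`."""
    if not rest:
        return None                                    # empty cursor ends the list
    if rest[0] != "u":                                 # 'where' is true
        return ("(" + rest[0] + ")", paren(rest[1:]))  # arg1 @ next()
    return paren(rest[1:])                             # 'where' is false: next()


print(paren("cthulhu"))
# ('(c)', ('(t)', ('(h)', ('(l)', ('(h)', None)))))
print(paren("ulhu"))
# ('(l)', ('(h)', None))
```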

In the case where 'item' is "u" (for example, when the cursor is
pointing to the beginning of "ulhu"), 'where' evaluates to false, so
the collect.where.if_true expression returns collect.next(), ignoring
the "(u)" that 'arg1' would compute.

Eventually, the cursor is empty, and '()' just returns that (empty)
cursor, which serves to terminate the list; probably I should return
prog.sys.nil instead.  In that case, it doesn't matter what 'item'
and 'arg1' evaluate to, even if they evaluate to errors, because
they're not being returned.  Likewise, when 'where' evaluates to
false, '()' just returns collect.next() and bypasses 'arg1' entirely.
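
The claim that unused 'item' and 'arg1' values never matter depends on evaluating them only on demand. Here is a minimal Python sketch of that, assuming thunks (zero-argument lambdas) as a stand-in for Bicicleta's on-demand method evaluation. The names collect, paren and fussy_arg1 are hypothetical. The arg1 below raises an error for "u", yet the call succeeds, because arg1 is never invoked for rejected items or for the empty cursor.

```python
def collect(for_each, arg1, where):
    """Hypothetical lazy collect: item is a thunk, arg1 runs only when kept."""

    def paren(pos):  # stands in for '()' with the cursor at `pos`
        if pos >= len(for_each):
            return None                   # empty cursor: item/arg1 never evaluated
        item = lambda: for_each[pos]      # evaluated only if someone asks
        if where(item):
            return (arg1(item), paren(pos + 1))
        return paren(pos + 1)             # rejected: arg1 is never called

    return paren(0)


def fussy_arg1(item):
    # Would fail on "u", but collect never calls it for those items.
    if item() == "u":
        raise ValueError("arg1 evaluated for a rejected item")
    return "(" + item() + ")"


result = collect("cthulhu", fussy_arg1, lambda item: item() != "u")
print(result)  # ('(c)', ('(t)', ('(h)', ('(l)', ('(h)', None)))))
```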

I anticipate that utilities like "collect" will be able to keep
explicit recursion confined to tiny corners of the system libraries
and to problems that really benefit from recursion.
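
Here is a sketch of that idea in Python. The names collect, map_ and filter_ are hypothetical, and this collect is a one-line shortcut rather than the cursor-based version. Map and filter fall out as special cases of collect simply by leaving one of the two defaults ('arg1' is the item, 'where' is true) in place, with no recursion in the utilities themselves.

```python
def collect(for_each, arg1=lambda item: item, where=lambda item: True):
    """Hypothetical Python collect with prog.collect's defaults."""
    return [arg1(item) for item in for_each if where(item)]


def map_(fn, seq):
    # 'where' keeps its default (true): transform every element.
    return collect(seq, arg1=fn)


def filter_(pred, seq):
    # 'arg1' keeps its default (the item itself): just select.
    return collect(seq, where=pred)


print(map_(str.upper, "abc"))                  # ['A', 'B', 'C']
print(filter_(lambda c: c != "u", "cthulhu"))  # ['c', 't', 'h', 'l', 'h']
```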

Why I Think This is Cool (Bicicleta, Python, OCaml, and Squeak)
---------------------------------------------------------------

Loops are confusing and complicated, especially in functional
languages that implement them by recursion.  A lot of loops can be
subsumed by simple one-variable list-comprehensions, often with
improved comprehensibility and brevity.

For this reason, Python and Haskell have list-comprehension syntax
built into the language, so that you can write (in Python):

    ["(" + item + ")" for item in "cthulhu" if item != "u"]

Which gives you the same result as the Bicicleta expression:

    prog.collect(f: "(" + f.item + ")", for_each="cthulhu", 
        where=(f.item == "u").not)

(The .not is there only because I haven't implemented != for strings
yet: right now my magic for deriving != from == is locked up in a
numeric class, from which I should factor out a "comparable" class.)
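
The refactoring described in that aside might look like the following, sketched in Python as a hypothetical analogue. The names Comparable, Number, String, eq and ne are illustrative, not Bicicleta's. The point is that ne is derived from eq once, in a shared mixin, so strings get it as well as numbers.

```python
class Comparable:
    """Hypothetical mixin: subclasses supply eq(); ne() is derived from it."""

    def eq(self, other):
        raise NotImplementedError

    def ne(self, other):
        return not self.eq(other)


class Number(Comparable):
    def __init__(self, value):
        self.value = value

    def eq(self, other):
        return self.value == other.value


class String(Comparable):
    def __init__(self, text):
        self.text = text

    def eq(self, other):
        return self.text == other.text


print(String("c").ne(String("u")))  # True
print(Number(3).ne(Number(3)))      # False
```

Python 3 itself takes this approach for its built-in operators: unless a class overrides __ne__, != is derived by inverting the result of __eq__.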

To my eyes, the Python version is more readable, but the difference is
not enormous; they are closer to one another than either is to

    rv = []
    for item in "cthulhu":
        if item != "u": rv.append("(" + item + ")")
    # now do something with rv

If I added special syntax to Bicicleta to do list-comprehensions, I
could eliminate the "prog.collect" part:

    [f: "(" + f.item + ")", for_each="cthulhu", where=(f.item == "u").not]

But even without special syntax, I think it's better already than
Smalltalk:

    'cthulhu' asArray select: [:c | c ~= $u] 
        thenCollect: [:c | '(', c asString, ')']

Or OCaml:

    let list_of_string string = 
      let rv = ref [] in 
        for i = String.length string - 1 downto 0 do 
          rv := string.[i] :: !rv 
        done ; 
        !rv
    in
    List.map (fun item -> "(" ^ String.make 1 item ^ ")") 
      (List.filter ((<>) 'u')
        (list_of_string "cthulhu")) ;;

Although, to be fair, a lot of the verbosity in the Smalltalk and
OCaml versions has to do with excessive incompatible types (lists,
strings, arrays, characters) rather than the clumsiness of the
non-list-comprehension syntax.  But consider the ideal case, where
those incompatibilities don't exist:

    ["(" + item + ")" for item in "cthulhu" if item != "u"]
    'cthulhu' select: [:c | c ~= $u] thenCollect: [:c | '(', c, ')']
    prog.collect(f: "(" + f.item + ")", for_each="cthulhu", 
        where=f.item != "u")
    List.map (fun item -> "(" ^ item ^ ")") 
      (List.filter ((<>) 'u') "cthulhu") ;;

With corresponding bits rearranged to more or less line up:

             "(" + item + ")" for item in "cthulhu"  if item != "u"
    thenCollect: [:c | '(', c, ')']       'cthulhu' select: [:c | c ~= $u] 
       "(" + f.item + ")",       for_each="cthulhu", where=f.item != "u"
   List.map (fun item -> "(" ^ item ^ ")")"cthulhu"  (List.filter ((<>) 'u')

This suggests that there is some brevity benefit that attaches
specifically to the practice of defining new methods in the form
f.item != "u", rather than creating anonymous functions, even such
lightweight functions as Squeak has [:c | c ~= $u].  Currying, such as
((<>) 'u'), is even shorter, of course.

It turns out that you can use currying in Bicicleta similarly; you can
write "u".'!=' to mean {op: '()' = "u" != op.arg1}.  In this case,
though, collect is defined to expect a method definition on itself,
not an anonymous function that it would have to pass something to
explicitly.
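
The distinction can be sketched in Python, with functools.partial over operator.ne playing the part of a curried "u".'!=' (an analogy, not Bicicleta semantics). A function-taking filter can use the curried function directly. A collect that evaluates 'where' on an object carrying 'item' needs a small adapter that pulls f.item out and passes it along explicitly. The names ne_u, where, kept and kept_too are illustrative.

```python
import operator
from functools import partial
from types import SimpleNamespace

# Curried inequality, roughly like "u".'!=' or OCaml's ((<>) 'u').
ne_u = partial(operator.ne, "u")  # ne_u(x) computes "u" != x

# Function-taking style (Smalltalk select:, OCaml List.filter):
# the caller hands each element to the function.
kept = [c for c in "cthulhu" if ne_u(c)]


# Method-on-self style, as prog.collect expects: 'where' sees an object
# with an 'item', so the curried function needs an explicit adapter.
def where(f):
    return ne_u(f.item)


kept_too = [c for c in "cthulhu" if where(SimpleNamespace(item=c))]

print(kept)              # ['c', 't', 'h', 'l', 'h']
print(kept == kept_too)  # True
```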
