APL, Python, redundancy, and polymorphism

Kragen Javier Sitaker Mon, 02 Apr 2007 00:37:08 -0700

(When reading this, please keep in mind I've never written an
interesting program in APL or PHP, so I am speaking from ignorance.)


APL goes far out of its way to generalize the meanings of things.  For
example, it includes a factorial operator (missing in dialects like A+
and K), but rather than giving an error when fed a floating-point
number, it gives the value of the gamma function at the argument plus
1 --- so it coincides with factorial for nonnegative integers, but is
more general.  The iota operator is fairly useless (I
think) applied to non-scalars, but it is defined so that its useful
scalar behavior is just a special case of its more general behavior
when fed vectors of any length, pretending that the scalar is a vector
of length 1.
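
A rough Python analogue of the factorial case, as a sketch rather than
a faithful model of APL: the helper name apl_factorial is made up for
illustration, and it only assumes the standard math module.

```python
import math

def apl_factorial(x):
    """Hypothetical stand-in for APL's monadic !: gamma(x + 1)."""
    return math.gamma(x + 1)

# Agrees with the ordinary factorial on nonnegative integers
# (but returns a float) ...
print(apl_factorial(5), math.factorial(5))

# ... and quietly extends to non-integers: 0.5! is sqrt(pi) / 2.
print(apl_factorial(0.5), math.sqrt(math.pi) / 2)

# Python's own factorial takes the strict route instead and refuses.
# (The exception type has varied between Python versions.)
try:
    math.factorial(0.5)
except (TypeError, ValueError) as e:
    print("math.factorial refuses:", e)
```

Python's math.factorial takes the opposite design choice from APL
here: it raises an error instead of generalizing.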

Along the same lines, APL's representation of booleans as 0 and 1
allows you to use +/ to count the number of times something is true;
in K, the 'and' and 'or' functions have additionally been dropped in
favor of the dyadic floor and ceiling functions (that is, minimum and
maximum), which act as 'and' and 'or' over the domain of 0 and 1.
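
Python happens to allow the same two tricks, since its bool is a
subclass of int.  A small sketch, with made-up sample data:

```python
temps = [12, 31, 28, 35, 19]   # made-up sample data

# Each comparison yields a bool, and True counts as 1 when summed,
# so this counts how many readings exceed 30 (compare +/ in APL).
hot_days = sum(t > 30 for t in temps)
print(hot_days)

# On the domain {0, 1}, min acts as 'and' and max acts as 'or'.
for a in (0, 1):
    for b in (0, 1):
        assert min(a, b) == (a and b)
        assert max(a, b) == (a or b)
```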

And then there's its infamous use of every built-in function character
to denote two different functions, depending on whether it is applied
to two arguments or one.  Some characters have even more meanings; I
think / has three (one as the "reduce" operator), and in K I think it
has four.

These aspects of its design are intended to remove redundancy from
your program, making it shorter and more likely to mean something when
you screw it up, so it gives the wrong answer rather than simply
raising an exception.  In general, I do not like this tendency to err
quietly.

I was reminded of this language strategy of nonredundancy while
reading the documentation for ocamllex.  

Ocamllex is a lexer generator similar to Mike Lesk's lex, and in the
regular expression patterns that define tokens, you can bind a
variable to a part of the pattern, and there are some rules that
define what the type of the variable is --- whether it's a char, a
string, a char option, or a string option.  (This being OCaml, you can
do different things with a string, a char, and a string option, so
it's important to know which one you have if you want your program to
compile.)

This reminded me of the big language difference between Python and
both Smalltalk and OCaml, which is that Python has relatively few
kinds of collections, while Smalltalk and OCaml have lots.  Python
doesn't even have a char type; it just uses strings of length 1.
Rather than having a fixed-size array type, a variable-size
OrderedCollection type, a linked-list type, and a special type for
fixed-size arrays of small integers, Python has a single resizable
array type that it calls "list".  (It also has an immutable tuple
type.) Rather than having a red-black tree type, a hash type, an alist
type, a compact type for many different dictionaries sharing the same
set of keys, a skip-list type, and so on, it has a single dict type,
which it implements with hashes.  (However, as in Smalltalk, the
interfaces to these collection types are accessed by sending messages
OO-style, so you can make your own collection type.)
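
As a minimal sketch of what "make your own collection type" can look
like: the class below (Countdown, a made-up name) answers the same
__len__ and __getitem__ messages a list does, so indexing, len(), and
iteration work on it.  It deliberately ignores negative indices and
slices.

```python
class Countdown:
    """Hypothetical read-only sequence: n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        # Only plain nonnegative integer indices are handled here.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.n - i

c = Countdown(3)
# Iteration falls back on __getitem__ with 0, 1, 2, ... until IndexError.
print(c[0], len(c), list(c))
```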

I like this a lot, even though it is one factor in making Python
dramatically slower than Smalltalk.  It makes it a lot easier to get
started with the language and makes programs shorter, and the speed
penalty is often acceptable.

(In cases where efficiency is really important, there are extension
libraries that provide other kinds of containers, some of which are
even in the standard distribution; but they are used many times less
frequently than lists.)

This struck me as being somehow analogous to the APL approach; where
APL uses one operator to mean many different (but related) things,
Python uses one data structure to represent many different (but
similar) kinds of collections.  It even uses the same indexing
operator to index into dicts and lists.  PHP, JavaScript, and Lua take
this approach even further, in slightly different directions.
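
To illustrate the shared indexing operator, here is a small sketch
(the function name lookup_all is made up): because it only ever uses
container[key], it runs unchanged on a list or a dict.

```python
def lookup_all(container, keys):
    # Relies on nothing but container[key].
    return [container[k] for k in keys]

as_list = ["zero", "one", "two"]
as_dict = {0: "zero", 1: "one", 2: "two"}

print(lookup_all(as_list, [2, 0]))
print(lookup_all(as_dict, [2, 0]))
```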

So why do I like Python's data structures being general in this way
and dislike APL's operators being similarly general?  I think it's
because I don't expect Python's data structure overloading to hide
bugs, but only to cost performance, while APL's operator overloading
definitely does hide bugs.

However, there are several cases where Python's data structures are
fairly strict in ways that help catch bugs, but which are irritating
coming from Perl or JavaScript, and which make your programs bigger:
- There's no implicit conversion between strings and numbers, or
  actually between strings and anything else.
- Trying to access off the end of an array gives you an error rather
  than returning null or resizing the array.  There are explicit
  'append' and 'extend' methods that make the array bigger.
- Dicts are different from lists, so indexing into a list with a
  non-integer, or trying to append to a dict, gives you an error.
  (This also helps with efficiency; I vaguely recall that Python's
  predecessor ABC used a painfully slow AVL tree as the single data
  structure that served both purposes.)
- Dicts are also different from objects, so accessing an object
  attribute with a run-time name requires hairy syntax.  (Also, this
  namespace separation provides more flexibility for emulating dicts
  with user-defined objects; because Perl doesn't separate the two, it
  requires a much hairier mechanism for the same purpose (called
  "tie"), which I consequently use much less.)
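
The four points above can be seen in a short sketch.  The helper
error_name and the class Point are made up for illustration; the
exception names are what current CPython raises.

```python
def error_name(thunk):
    """Run thunk; return the name of the exception it raises, or None."""
    try:
        thunk()
    except Exception as e:
        return type(e).__name__
    return None

class Point:
    def __init__(self):
        self.x = 3

s, nums, d, p = "1", [10, 20], {}, Point()

results = {
    "str + int": error_name(lambda: s + 1),          # no implicit conversion
    "off the end": error_name(lambda: nums[5]),      # no null, no resizing
    "list['key']": error_name(lambda: nums["key"]),  # lists are not dicts
    "dict.append": error_name(lambda: d.append(1)),  # dicts are not lists
    "object['x']": error_name(lambda: p["x"]),       # objects are not dicts
}
for what, err in results.items():
    print(what, "->", err)

# The explicit spellings all work.
nums.append(30)
print(int(s) + 1, nums, getattr(p, "x"))
```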

You could imagine that Smalltalk's approach of having different kinds
of collections for, say, fixed and variable-size arrays, might also
catch some kinds of errors, in addition to improving speed; but I
don't think it does.
