Jonathan S. Shapiro wrote:
> Something like that. It's not quite enough if blocks are expressions.
> In effect, you have several types of lexically nested braces, and you
> need to keep track of the innermost active brace type in the current
> lexical context.
>
> So there is a devil in the details, but yes, something like this ought
> to work. And then the trick would be to have the lexer synthetically
> emit OPEN/CLOSE tokens to the parser at the right points.
>   
Sorry I'm late to the indentation party.  I think an indentation syntax
is a great idea! 8^)

There are a *bunch* of different Python lexer implementations.
The one in /usr/local/lib/python../lib/tokenize.py is pretty clear. 
It's written using generators.
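For instance, you can drive that generator-based tokenizer by hand and watch the token stream it produces. A minimal sketch using only the stdlib io and tokenize modules; the sample source string is just an illustration:

```python
import io
import tokenize

# A tiny program whose block structure forces the lexer to
# synthesize indentation tokens at the block boundaries.
src = "if x:\n    y = 1\nz = 2\n"

# generate_tokens() wants a readline callable, so wrap the
# string in a StringIO and hand over its readline method.
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names)
```

Running this shows INDENT appearing when the body of the `if` begins and DEDENT when control returns to column zero, interleaved with the ordinary NAME/OP/NUMBER tokens.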

Here's the relevant snippet:

        elif parenlev == 0 and not continued:  # new statement
            if not line: break
            column = 0
            while pos < max:                   # measure leading whitespace
                if line[pos] == ' ': column = column + 1
                elif line[pos] == '\t': column = (column/tabsize + 1)*tabsize
                elif line[pos] == '\f': column = 0
                else: break
                pos = pos + 1
            if pos == max: break

            [...]

            if column > indents[-1]:           # count indents or dedents
                indents.append(column)
                yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
            while column < indents[-1]:
                if column not in indents:
                    raise IndentationError(
                        "unindent does not match any outer indentation level",
                        ("<tokenize>", lnum, pos, line))
                indents = indents[:-1]
                yield (DEDENT, '', (lnum, pos), (lnum, pos), line)

INDENT and DEDENT are 'synthesized' tokens.
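The core of it is just a stack of indentation columns. Here's a stripped-down sketch of that same logic, with hypothetical names of my own choosing; it ignores continuation lines, parenthesis nesting, and formfeeds, which the real tokenize.py handles:

```python
def indent_tokens(lines, tabsize=8):
    """Yield ('INDENT'|'DEDENT'|'LINE', text) pairs for each logical line.

    A minimal sketch of the indents-stack logic quoted above, not a
    replacement for tokenize.py.
    """
    indents = [0]                         # stack of active indent columns
    for line in lines:
        stripped = line.lstrip(' \t')
        if not stripped or stripped.startswith('#'):
            continue                      # skip blank and comment lines
        # Measure leading whitespace, expanding tabs the same way
        # tokenize.py does.
        column = 0
        for ch in line[:len(line) - len(stripped)]:
            if ch == ' ':
                column += 1
            elif ch == '\t':
                column = (column // tabsize + 1) * tabsize
        if column > indents[-1]:          # deeper: synthesize one INDENT
            indents.append(column)
            yield ('INDENT', line[:len(line) - len(stripped)])
        while column < indents[-1]:       # shallower: one DEDENT per level
            if column not in indents:
                raise IndentationError(
                    "unindent does not match any outer indentation level")
            indents.pop()
            yield ('DEDENT', '')
        yield ('LINE', stripped.rstrip('\n'))
    while len(indents) > 1:               # close blocks still open at EOF
        indents.pop()
        yield ('DEDENT', '')
```

Note that a single line can pop several stack entries at once, so one physical line may synthesize multiple DEDENTs; that's why DEDENT is emitted in a loop while INDENT never is.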

-Sam

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
