One-pass parsing and forward type references

Carl Mäsak Sun, 31 Jan 2010 09:36:00 -0800

There's one thing that bugs me ever so slightly. I'll just air it and
happily accept whatever feedback it produces.


This email is somewhat of a third-strike thing: looking back, I've
been muttering over this itch both on IRC and on Twitter during the
past year.

<masak> sometimes one-pass parsing annoys me to no end.
<moritz_> masak: multi pass parsing is even more annoying in the long run ;-)
<masak> I think it's very restrictive that you can't refer to a class
name before it's been declared.
<masak> it's unlike many other languages I'm familiar with.
<jnthn> You can always declare a stub and define it later.
<masak> true.
<masak> in other circumstances, that's called 'cruft' or 'boilerplate'.
<masak> code required to cater to a language's oddities.

<carlmasak> I sometimes run into the #perl6 restriction that you have
to declare types textually above you use them. (So no cycles.) It
feels arbitrary.
<quietfanatic> @carlmasak I think It's required to differentiate
between types and subs. I guess you could invent a way to declare a
stub type...
<quietfanatic> @carlmasak Oh, but you'd think the process that checks
ahead for sub declarations could also check for type declarations.
<carlmasak> @quietfanatic I haven't yet been seeking ways in which
classes wouldn't have to be declared beforehand; it just bothers me
that they do.

Just to be clear: I don't expect to sway anyone as to whether Perl 6
should do its parsing in more than one pass -- one-pass parsing is
here to stay in the language. I mostly want to make the case that
the one-pass restriction sometimes forces the programmer to resort
to subtle contortions or to put up with strange errors.

Before I go into specifics, here's my general opinion: historically,
it took a while before compiler writers were convinced that recursion
within and between subroutines was useful enough that they built the
requisite complexity into the compilers. (FORTRAN 77 doesn't have
recursion, for example.) Nowadays, it feels completely natural that
C<foo> may call C<bar>, which may call C<foo> again.

But on another level, the level of types, Perl 6 makes it fairly
*un*natural that the type C<Foo> refers to the type C<Bar>, which in
turn refers to the type C<Foo>.

Quick, write a program where C<A::foo> calls C<B::bar> which calls C<A::foo>!

I found two ways. Either one uses C<augment> (the language construct
formerly known as C<is also>):

  class B {}
  class A { sub foo { B::bar } }
  augment class B { sub bar { A::foo } }

...or one may use the C<::> notation to index a type using a string value:

  class A { sub foo { ::<&B::bar>() } }
  class B { sub bar { A::foo } }

In either case, one has to mentally acknowledge that there's a
dependency cycle, and manually apply a circularity saw somewhere early
in the code to fix it.
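
For comparison, the stub approach jnthn mentions in the IRC excerpt might
look like the sketch below. It is only an illustration: it assumes a
compiler that accepts a literal C<...> body as a forward declaration, it
uses methods rather than subs, and the C<$n> countdown is a hypothetical
parameter added purely so that the mutual recursion terminates.

```raku
# Forward-declare B with a stub body; the literal ... is actual syntax here.
class B { ... }

class A {
    # $n is a hypothetical countdown so the A <-> B cycle eventually stops.
    method foo(Int $n) { $n <= 0 ?? "done" !! B.bar($n - 1) }
}

# The full definition of B fills in the stub declared above.
class B {
    method bar(Int $n) { A.foo($n) }
}

say A.foo(3);
```

The cycle still has to be noticed and cut by hand; the stub line is the
same kind of boilerplate as the C<augment> variant, just shorter.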

While this in itself is not much of a problem, it becomes one as the
code base grows. The design of Perl 6 stigmatizes type cycles, and
introduces boilerplate of the above type, whereas in other languages
no special treatment at all is necessary. Also, when everything is
confined to one file, it's not so bad. The real pains start when types
in different files need to refer to each other. Do I put a stub class
definition in the 'wrong' file? Or do I turn off the compiler type
checking by putting types in strings?

I didn't see it that way a month or so ago, but now I think of
mutually defined classes as no more unusual than mutually recursive
functions. Here are two naturally-occurring examples from my current
medium-sized project GGE. If the details weigh you down rather than
inform, feel free to skip them. I just want to show that these kinds
of cycles do happen:

* The regex class R occasionally calls out to an optable parser O to parse
  a regex string into an AST. The class O can be set up in such a way as to
  call provided subroutines, including -- if one wants -- subroutines inside
  the class R. However, one of the O tests sends a whole R object into
  O, expecting it to match as an ordinary regex. Question: How should O
  detect whether an R was sent in?

* The '<before foo\dbar>' syntax in S05 allows any regular expression to
  occur after the word 'before' and a space. In this case, 'foo\dbar' would
  be sent as a string to an ordinary method C<.before> in the match class
  M. Thus, M needs to (recursively) invoke the regex class R to parse the
  string into an AST. Only... R uses M heavily, so it's an A::foo<->B::bar
  situation. Question: How should M call out to R when R already calls out
  to M?
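
To make the first question concrete, here is a minimal sketch of the
string-lookup route, in which O tests for an R without naming the type at
compile time. Everything in it is an assumption for illustration: O and R
are empty stand-ins for the GGE classes, the method name C<is-regex> is made
up, and it relies on C<::('R')> performing a runtime name lookup.

```raku
class O {
    # Look R up by name at run time, so O can be compiled before R exists.
    # Hypothetical helper; not part of GGE.
    method is-regex($candidate) { so $candidate ~~ ::('R') }
}

# R is declared textually after O, with no stub in between.
class R { }

say O.is-regex(R.new);
say O.is-regex("foo\\dbar");
```

The price is that a misspelled type name inside the string is no longer
caught when the file is compiled.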

The one-pass answer to both these questions is: "Well, you simply
need to force your types into a tree structure, and take special care
every time there's a forward reference somewhere in all your modules.
Either define a type 'too early' and re-open it when you really want
to define it, or use weaker string references to circumvent the
compiler."

The two-pass answer to both these questions is: "Huh? What's the problem?"

And that's what bothers me. There shouldn't, ideally, *be* anything
problematic about mutual type definitions. But right now, there is.
And it's subtly annoying, in a way that might very well be
proportional to the size of the project.

// Carl
