Re: AST based language: was Re: [The Java Posse] Re: JavaFX - oddities in the language? Week 2.

Reinier Zwitserloot Thu, 10 Sep 2009 08:15:32 -0700

In another thread, the idea of compiler-plugin based literals was
floated. I observed that unless that plugin is available at tokenize
time (which means, before resolving typing info, so that's annoying,
as you'd want to use that to figure out which plugin is responsible),
the compiler can't continue unless there's a global, unescapable
end-of-custom-literal marker, like XML's CDATA or Fan's <| and |>. Of
course, these tokens, by their very nature, have to look awkward, and
they prevent recursive wrapping.


In an AST based editing environment, this problem goes away. At write
time you must of course have the plugin available to you, at which
point you type something like:

lit:Patt(ctrl+space to autocomplete it to java.util.regex.Pattern)
and then you type a regular expression; the job of marking the end is
trivial where in raw character typing mode it's almost unsolvable.
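
A rough sketch of why the end marker stops mattering, assuming the editor
stores the literal as a tree node. All names here (CustomLiteral,
pluginType, payload, compileAsRegex) are made up for illustration and are
not any real compiler plugin API. The payload is just a field, so it can
hold any characters, including Fan's closing |>:

```java
import java.util.regex.Pattern;

// Hypothetical AST node for a plugin-defined literal. The node stores the
// plugin's target type and the raw payload as separate fields, so no
// end-of-literal token is needed to find where the payload stops.
final class CustomLiteral {
    final String pluginType; // e.g. "java.util.regex.Pattern"
    final String payload;    // raw text typed by the user, stored unescaped

    CustomLiteral(String pluginType, String payload) {
        this.pluginType = pluginType;
        this.payload = payload;
    }

    // Stand-in for what a regex plugin might do with the node at compile time.
    Pattern compileAsRegex() {
        if (!pluginType.equals("java.util.regex.Pattern")) {
            throw new IllegalStateException("not a regex literal: " + pluginType);
        }
        return Pattern.compile(payload);
    }
}

final class CustomLiteralDemo {
    public static void main(String[] args) {
        // The payload contains "|>", which a text literal delimited by
        // <| and |> could not hold without some escaping scheme.
        CustomLiteral lit = new CustomLiteral("java.util.regex.Pattern", "a\\|>b");
        System.out.println(lit.compileAsRegex().matcher("a|>b").matches());
    }
}
```

Nesting one such literal inside another would work the same way, since
each node's payload is delimited by the tree and not by a token.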


You can take this same idea even further and add support for macros:
foreach could have been implemented as a macro, but this time, the AST
node carries its origin with it. This way, I can switch my editor at
will, and type/read either in macro syntax, or in the desugared form.
Making the IDE extensively pluggable would be so much easier. There
would be no closure debate at all - those who like 'em use a macro
plugin that renders closures as Anonymous Inner Class constructs.
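
To make the "node carries its origin" idea a bit more concrete, here is a
minimal sketch. The ForEachNode class, its fields, and the "macro:foreach"
tag are hypothetical, and the desugaring is a simplified version of the
Iterable case (the iterator name "it" is assumed not to clash with
anything in the body):

```java
// Hypothetical AST node: a for-each loop stored together with its macro
// origin, so an editor can render it either sugared or desugared.
final class ForEachNode {
    final String origin = "macro:foreach"; // remembers which macro produced it
    final String elementType;
    final String variable;
    final String iterable;
    final String body;

    ForEachNode(String elementType, String variable, String iterable, String body) {
        this.elementType = elementType;
        this.variable = variable;
        this.iterable = iterable;
        this.body = body;
    }

    // Macro syntax, for readers who prefer the short form.
    String renderSugared() {
        return "for (" + elementType + " " + variable + " : " + iterable + ") " + body;
    }

    // Desugared form, for readers who prefer to see the explicit Iterator.
    String renderDesugared() {
        return "for (java.util.Iterator<" + elementType + "> it = " + iterable
            + ".iterator(); it.hasNext(); ) { "
            + elementType + " " + variable + " = it.next(); " + body + " }";
    }
}

final class ForEachDemo {
    public static void main(String[] args) {
        ForEachNode node = new ForEachNode("String", "s", "names", "System.out.println(s);");
        System.out.println(node.renderSugared());
        System.out.println(node.renderDesugared());
    }
}
```

Both renderings come from the same stored node, so switching views never
changes what gets checked in.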

There would be no need for import statements at all. Each and every
mention of a type is fully specified, and my preferences will decide
how they render.

Because whitespace, import statements, and other ambiguities melt
away, in certain ways the canonical textual representation (which ISN'T
how you're supposed to edit things and would be extremely unwieldy) is
actually easier to interop with version control systems. Of course,
checking diffs with a non-smart diff viewer would be a little more
awkward, but perhaps this is a good thing: You'd see every
semantically relevant change, at the point of the semantic difference.
Contrast this to, say, changing an import statement from
java.util.List to java.awt.List, which would change everything, and
yet only show up as a dinky little one-liner in your import
statements.
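
A small sketch of that contrast, assuming a made-up canonical form with
one line per declaration and every type fully qualified (the format and
the names canonical and changedLines are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class CanonicalDiffDemo {
    // Hypothetical canonical form: one line per declaration, with every
    // type written out in full instead of relying on an import statement.
    static List<String> canonical(String listType) {
        List<String> lines = new ArrayList<>();
        lines.add("field " + listType + " items");
        lines.add("method " + listType + " getItems()");
        lines.add("method void setItems(" + listType + " items)");
        lines.add("method int size()");
        return lines;
    }

    // Counts positions where two equally long line lists differ.
    static int changedLines(List<String> before, List<String> after) {
        int changed = 0;
        for (int i = 0; i < before.size(); i++) {
            if (!before.get(i).equals(after.get(i))) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Three of the four declarations mention the list type, so three
        // lines change, where an import-based source would show one.
        System.out.println(changedLines(canonical("java.util.List"),
                                        canonical("java.awt.List")));
    }
}
```

In this toy form the diff touches each declaration whose meaning actually
changed and leaves size() alone.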

The biggest issue remains that so much of the entirety of the
development ecosystem is built around the notion that source lives as
raw streams of characters. There would definitely have to be a human-
readable canonical representation so you can interop with such tools
until they also see the light. There may also be an interesting lesson
in how many typical geeks doing professional writing use something
like HTML or LaTeX, writing it essentially 'raw', instead of using
OpenOffice or Word. I think there are different reasons for that, but
it is nevertheless interesting to see that shiny, graphical tools are
losing to raw char streams in some areas.

To the galaxy, and beyond!

On Sep 10, 4:49 pm, Joshua Marinacci <[email protected]> wrote:
> I suspect you are right. I've asked this question of many people and  
> gotten a variety of reasons why it won't work. The reasons are always
> valid,  but they always boil down to the same thing: compatibility  
> with existing systems.  If we could start over fresh *for everything*,  
> then I think an AST based language would work quite well, and enable lots of
> very interesting things. I've changed the subject line since this is  
> really getting off topic now.  My goal is to just think meta for a  
> second.
>
> If we could design a language, and all of its tools, from scratch
> today; then how would we do things differently?
>
> == Proposal ==
>
> Consider a language that is defined not in terms of tokens but in  
> terms of its abstract syntax tree (I'm not a compiler guy so I hope
> I'm using the right terms here). Instead of saying:
>
>         conditional is defined by 'if' + '(' + mathematical expression + ')'  
> plus optional '{' then a clause etc.
>
> what if it was defined as:
>
>         conditional is defined by a boolean expression followed by two blocks
>
> The details such as the 'if' keyword, requiring braces, using  
> parentheses, etc. would all be up to the individual developer (or at
> least defined by their tools) rather than defined by the language.  
> Some sort of neutral binary or XML format would be the true storage  
> mechanism and everything else we think of as "the language" would be  
> defined at a higher level on a per developer basis.  The neutral  
> format would be semantically equivalent to the code that the developer  
> sees on screen, but specific entirely to them.
>
> == Advantages ==
>
> There are huge advantages to this approach.
>
> * tabs vs spaces goes away. You see whatever you wish to see, and it  
> doesn't affect other developers.
>
> * comments could be nicely formatted rich text, including lists,  
> tables, and diagrams.
>
> * Line numbers in stacktraces:  Consider the work required to turn the  
> location of the bytecode exception back into a line and column number.  
> It would be easier to map back to the AST. The compiler / runtime  
> would emit some sort of AST marker which the IDE would convert back to  
> its visualization of your line / column (assuming you are still
> editing in terms of lines and columns). Most likely it would highlight  
> the exact problematic branch of the tree, not just a line and column.
>
> * refactoring becomes far easier, and could enable far more  
> interesting refactoring changes than the simple ones we have today.
>
> * since we are using a binary / xml blob for the real storage, we  
> wouldn't have to worry about files and filenames anymore. What would  
> matter is modules and compilation units. The actual files it's stored  
> in become irrelevant.
>
> * code analysis: tools which analyze your code should be able to do a  
> better job when they work at the 'meaning' level rather than the  
> 'syntax' level.
>
> * code visualizers: It should be trivial to build things which draw  
> UML diagrams of your beans, or show nested structures with darkening  
> backgrounds. Almost all of the cool things you want to do boil down to  
> visualizing a branch of the tree, making possible all sorts of very  
> interesting visualizations.
>
> * many syntax errors go away: since the IDE knows what the valid tree  
> should look like, it can prevent anything which would create an  
> invalid tree. rather than scanning the whole file 20 times a second it  
> can look at what you've done in the last few seconds that just made  
> the tree invalid and isolate the error to that. the result is more  
> accurate error reporting, even before you get to the compiler.
>
> * never ever worry about some other developer f**king up your  
> indentation, line breaks, curly brace scheme, etc.
>
> * the potential to use different keywords, line terminators, and other  
> syntax of your choosing and have it be completely isolated to your  
> environment. No other developer is affected.
>
> == Cons ==
>
> * You've got to use an IDE. Yes, no more blindly editing text files  
> with vi and emacs. Sorry. It's the 21st century. I edit images in  
> Photoshop, not the command line. I will now edit programs in a  
> programming tool.
>
> * You've got to write IDE support for this. Building this new language
> requires also building an IDE plugin that understands it.
>
> * Text diff tools (and therefore source control systems) would have to  
> be updated to understand this binary / xml format. In theory the diffs  
> should be better since you'd have a better idea of what semantically  
> changed (tree diffing, basically), but someone's still got to write the
> tools to do it.
>
> * Two developers working on their own machines would see the code  
> views they expect. One developer trying to help a second developer on  
> his machine would see a view completely unfamiliar to what they expect.
>
> * Web based code review tools would show a normalized view that is  
> unfamiliar to all developers, or else code review tools would have to  
> be a new module inside the IDE to pick up the prefs of the developer  
> doing the reviewing.
>
> Crazy idea, but it's the 21st century. We can handle it.  Now if  
> you'll excuse me I've got to go take my flying car in for repairs  
> before my weekend trip to Mars.
>
> - j
>
> On Sep 10, 2009, at 1:28 AM, Peter Becker wrote:
>
> > And it all starts with the language specs still being written at the
> > abstraction level of a concrete syntax. Chapter 1: Tokenization.
>
> >  Peter
>
> > Joshua Marinacci wrote:
> >> RANT!
>
> >> Why, in the 21st century, are we still writing code with ascii symbols
> >> in text editors, and worried about the exact indentation and whether
> >> to use tabs, spaces, etc?!!
>
> >> Since the IDE knows the structure of our code, why aren't we just
> >> sharing ASTs directly, letting your IDE format it to your desire, and
> >> only sharing the underlying AST with your fellow developers? Encoding,
> >> spaces, braces, etc. is a detail that only matters when presented to
> >> the human.
>
> >> What we do today is like editing image files from the commandline!
>
> >> On Sep 9, 2009, at 7:32 PM, Ryan Waterer wrote:
>
> >>> While experienced programmers might not worry about the braces on a
> >>> single line, they become invaluable to any junior programmers.  I've
> >>> trained a few in which they couldn't understand why the following
> >>> code segment simply stopped working.  (Let's not even start a
> >>> discussion about System.out.println as a valid debugging tool, ok?
> >>> This is just an example of a n00blet mistake )
>
> >>> for (int y = 0; y < lines; y++)
> >>>   for (int x = 0; x < columns; x++)
> >>>      System.out.println("The sum is: " + sum);
> >>>       sum += cells[y][x];
>
> >>> I agree that the braces add a bit of "clutter" to the visual look and
> >>> feel of code.  However, I feel that it helps with the overall
> >>> maintainability of the code and therefore, I disregard the way that
> >>> it looks.
>
> >>> --Ryan
>
> >>> On Wed, Sep 9, 2009 at 8:24 PM, Jess Holle <[email protected]> wrote:
>
> >>>    I'll agree on the newlines and indents, but the braces are silly.
>
> >>>    One might debate the extra whitespace inside the ()'s, but I find
> >>>    it more readable with the whitespace -- to each his/her own in
> >>>    that regard.
>
> >>>    TorNorbye wrote:
> >>>>    On Sep 9, 5:27 pm, Reinier Zwitserloot <[email protected]> wrote:
>
> >>>>>    Here's a line from my code:
>
> >>>>>    for ( int y = 0 ; x < lines ; y++ ) for ( int x = 0 ; x < columns ; x++ ) sum += cells[y][x];
>
> >>>>    I guess that's where we disagree.
>
> >>>>    for (int y = 0; y < lines; y++) {
> >>>>        for (int x = 0; x < columns; x++) {
> >>>>            sum += cells[y][x];
> >>>>        }
> >>>>    }
>
> >>>>    is IMHO better because:
> >>>>    (a) I can see immediately that I'm dealing with a nested construct
> >>>>    here, and that it's O(n^2)
> >>>>    (b) I can more easily set breakpoints on individual statements of
> >>>>    this code while debugging - and similarly other "line oriented"
> >>>>    operations (like quickfixes etc) get more cluttery when it's all on
> >>>>    one line. Profiling data / statement counts / code coverage
> >>>>    highlighting for the line is also trickier when you mash multiple
> >>>>    statements into one line.
> >>>>    (c) I think it's less likely that I would have made the "x < lines"
> >>>>    error that was in your code when typing it this way because the
> >>>>    handling of y and x were done separately on separate lines (though
> >>>>    this is a bit speculative)
> >>>>    (d) I removed your spaces inside the parentheses, because they are
> >>>>    Bad! Bad!
>
> >>>>    (Ok c and d are padding)
>
> >>>>    I am -not- looking to minimize the number of lines needed to express
> >>>>    code.  If I wanted that, I'd be coding in Perl.  I deliberately add
> >>>>    newlines to make the code more airy and to group logical operations
> >>>>    together. I always insert a newline before the final
> >>>>    return-statement from a function etc.
>
> >>>>    I think the extra vertical space you've gained, which arguably could
> >>>>    help you orient yourself in your code by showing more of the
> >>>>    surrounding context, is lost because the code itself is denser and
> >>>>    more difficult to visually scan.
>
> >>>>    Oh no, a formatting flamewar -- what have I gotten myself into?
>
> >>>>    -- Tor
>
> >>>>    P.S. No tabs!
