[il-antlr-interest: 28179] Re: [antlr-interest] Complementing ANTLR with parboiled

Mathias Fri, 05 Mar 2010 10:24:24 -0800

Ron,

thanks for your feedback.


> OK, I was bemused by your "motivation" page. The motive is built
> around the desire to create domain-specific languages with Java.
> But then, the first disadvantage you claim for existing parser generators
> is this:
> 
>  Special, non-java grammar syntax in separate project files
> 
> Um, that's because parser generators *are* domain-specific
> languages!  So, you don't like special, non-java syntaxes,
> but your goal is to create a tool that lets people create special,
> non-java syntaxes. :-)

I understand that at first glance the point you mention might seem like a 
contradiction.
However, it's not that I don't like DSLs, au contraire!
You could say that even parboiled uses a DSL for defining a grammar; the only 
difference is that it is an internal rather than an external DSL.
I think that when developing a DSL one should take into account the environment 
in which the targeted DSL users will be using the language.
A business user of my application might be perfectly content with entering 
short snippets of a business rule DSL on a website without further support 
(apart from documentation). However, any serious present-day Java developer 
relies heavily on his/her IDE to manage large code bases and offset Java's 
relatively high level of verbosity.
When designing the underlying grammar description DSL for a parser generator 
written in Java one has two choices:
a) Choose an external DSL (like ANTLR) and gain conciseness but forgo 
automatic IDE support, which can then only be achieved by the tedious 
development of custom plugins for all major IDEs.
b) Choose an internal Java DSL (like parboiled) and trade in the compactness 
and expressive power of a custom syntax for automatic support in all IDEs.

IMHO whether a) or b) yields the better compromise depends on the size and 
complexity of the languages the parser generator is being designed for.
For large projects, where big, complicated languages have to be defined, a) 
might be the better choice, since otherwise the limitations of Java as a 
"carrier" for the grammar description DSL might be too restrictive and make the 
grammar description bloated and unmanageable.
However, when smaller, less complicated grammars are the main target of a 
parser generator I would argue that b) is the better approach.
Defining the target language grammar directly in Java instead of a special 
syntax puts it under the full power of modern IDEs. Syntax highlighting, code 
completion, code navigation, inspections, reference analysis, refactoring 
support... they all work out of the box. 
Not having to learn another syntax will speed things up, as will not having 
additional build steps for an external generator.
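
To make option b) a bit more concrete, here is a minimal sketch of what an 
internal Java grammar DSL can look like. Please note that this is NOT 
parboiled's actual API: MiniGrammar, Rule, ch, digit, sequence, firstOf, 
zeroOrMore and lazy are all hypothetical names made up for this illustration, 
and the rules only recognize input without building any parse tree. The point 
is merely that the grammar rules are ordinary Java methods, so the IDE can 
navigate, complete and refactor them like any other code.

```java
import java.util.function.Supplier;

// Hypothetical mini DSL for illustration only; this is NOT parboiled's API.
final class MiniGrammar {
    /** A rule returns the position after a successful match, or -1 on failure. */
    interface Rule { int match(String in, int pos); }

    // --- generic building blocks ---
    static Rule ch(char c) {
        return (in, pos) -> pos < in.length() && in.charAt(pos) == c ? pos + 1 : -1;
    }

    static Rule digit() {
        return (in, pos) -> pos < in.length() && Character.isDigit(in.charAt(pos)) ? pos + 1 : -1;
    }

    /** Matches all sub-rules one after another; fails if any of them fails. */
    static Rule sequence(Rule... rules) {
        return (in, pos) -> {
            int p = pos;
            for (Rule r : rules) {
                p = r.match(in, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    /** Ordered choice: the first sub-rule that matches wins. */
    static Rule firstOf(Rule... rules) {
        return (in, pos) -> {
            for (Rule r : rules) {
                int p = r.match(in, pos);
                if (p >= 0) return p;
            }
            return -1;
        };
    }

    /** Greedy repetition; stops as soon as the sub-rule fails or makes no progress. */
    static Rule zeroOrMore(Rule rule) {
        return (in, pos) -> {
            int p = pos;
            for (int next = rule.match(in, p); next > p; next = rule.match(in, p)) p = next;
            return p;
        };
    }

    /** Defers rule construction until match time so that rules can be recursive. */
    static Rule lazy(Supplier<Rule> rule) {
        return (in, pos) -> rule.get().match(in, pos);
    }

    // --- the grammar itself, written as plain Java methods ---
    // Expression <- Term (('+' / '-') Term)*
    static Rule expression() {
        return sequence(term(), zeroOrMore(sequence(firstOf(ch('+'), ch('-')), term())));
    }

    // Term <- Number / '(' Expression ')'
    static Rule term() {
        return firstOf(number(), sequence(ch('('), lazy(MiniGrammar::expression), ch(')')));
    }

    // Number <- Digit Digit*
    static Rule number() {
        return sequence(digit(), zeroOrMore(digit()));
    }

    /** True if the whole input is a valid expression. */
    static boolean matches(String input) {
        return expression().match(input, 0) == input.length();
    }
}
```

With this sketch, MiniGrammar.matches("1+(2-3)") accepts the input while 
MiniGrammar.matches("1+") rejects it. Renaming term() in the IDE updates every 
rule that uses it, which is exactly the kind of support an external grammar 
file does not get for free.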

> But seriously, much of the other motivation points also suffer
> the same self-contradictory problem vis a vis the basic nature
> of a domain-specific language. OTOH, this point deserves special
> note:
> 
>    More complicated design and maintenance through divided
>    parsing process in lexing (token generation) and token
>    parsing phases
> 
> The division of labor between lexing and parsing is more than
> half a century old, and it was arrived at (and survived) because
> it does exactly the opposite of what you say: it makes the
> code more modular and easier to maintain. When you try to
> pack the two together for any non-trivial language, you
> inevitably see the hacks multiply (e.g., something as trivial
> as white space becomes some kind of "special case").

You are right, the division into lexing and parsing is very old. And it has 
performance advantages and can make things like whitespace handling easier. 
However, it also has drawbacks. Lexing differs from parsing in the underlying 
logic and is therefore an additional concept to understand. It requires a 
separate specification. It also makes it difficult to compose grammars.
On today's hardware performance is not a problem for most applications. The 
second main reason (apart from performance) why it was introduced decades ago, 
grouping input characters to enable parsers with limited look-ahead to "see 
further", is irrelevant with Parsing Expression Grammars, which do not have 
this look-ahead problem.
So again, the decision to split the whole process into lexing and parsing or 
not depends on the application.
If performance and white-space handling are really important, using a separate 
lexing phase might make sense. Otherwise things are easier to build and 
maintain without it, IMHO.
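
As a rough illustration of what parsing without a separate lexing phase looks 
like, here is a small hand-written sketch (plain Java, no parboiled involved; 
the class ScannerlessList and all of its method names are made up for this 
example). It parses a comma-separated list of identifiers directly from the 
characters. Whitespace is not filtered out by a lexer but skipped explicitly 
after every "token-like" rule, which is the usual convention in scannerless 
grammars and also the price you pay for not having a lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical example of a parser working directly on characters (no lexer).
// Grammar:  List  <- WS Ident (',' WS Ident)* EOI
//           Ident <- Letter+ WS
final class ScannerlessList {
    private final String in;
    private int pos = 0;

    private ScannerlessList(String in) { this.in = in; }

    /** Returns the identifiers in the list, or empty if the input does not match. */
    static Optional<List<String>> parse(String input) {
        ScannerlessList p = new ScannerlessList(input);
        List<String> items = new ArrayList<>();
        p.skipWhitespace();
        if (!p.ident(items)) return Optional.empty();
        while (p.literal(',')) {
            if (!p.ident(items)) return Optional.empty();
        }
        // EOI: the whole input must have been consumed
        return p.pos == input.length() ? Optional.of(items) : Optional.empty();
    }

    /** Ident <- Letter+ WS */
    private boolean ident(List<String> out) {
        int start = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) pos++;
        if (pos == start) return false;
        out.add(in.substring(start, pos));
        skipWhitespace();
        return true;
    }

    /** Matches a single character followed by optional whitespace. */
    private boolean literal(char c) {
        if (pos < in.length() && in.charAt(pos) == c) {
            pos++;
            skipWhitespace();
            return true;
        }
        return false;
    }

    /** WS <- whitespace*; this is the job a lexer would normally do implicitly. */
    private void skipWhitespace() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }
}
```

ScannerlessList.parse(" foo , bar,baz ") yields the three identifiers, while 
"foo,,bar" and "foo bar" are rejected. There is only one specification to 
read and maintain, but every rule has to remember its trailing whitespace.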

> Finally, as we live in an age where CPU speed has peaked
> and even begun to decline, there is increasing pressure to
> parallelize code to take advantage of the only remaining
> practical advantage of Moore's law -- increasing numbers
> of CPUs. For a language processor, one of the few neat
> and modular divisions of labor that can easily be put
> in parallel is the division between lexing and parsing.
> Often, efficiency doesn't matter for parsing, but since
> you list ANTLR's footprint as a disadvantage, it becomes
> contradictory to claim that combining lexing and parsing
> so they can't be parallelized is an unvarnished advantage.

Yes, ANTLR's footprint in KB certainly isn't the main point.
But the general size and complexity of all its subparts can make it hard to get 
started with.

> None of this is by way of criticism of the project, which
> I find interesting reading (thanks for the pointer!).

parboiled's raison d'être is not trying to replace ANTLR, JavaCC or any other 
traditional parser generator.
All it would like to offer is an alternative for applications where ANTLR & Co. 
are currently used outside of their primary target areas.

Cheers,
Mathias

---
[email protected]
http://www.parboiled.org


