Re: [Emacs-ada-mode] new emacs mode blocking problems : performance, indentation while code is incomplete

Stephen Leake Wed, 11 Jun 2014 12:01:50 -0700

WAROQUIERS Philippe writes:

> Here at Eurocontrol, several developers tried the new Ada mode.


Good to hear feedback.

> Performance problem:
> On big files, the performance is unacceptably slow.
> On one of our biggest files, the initial loading/parsing of the file blocks
> Emacs for 10 seconds.

That is very slow.

If you can send me the file, I might be able to speed things up; there
are tradeoffs in the current parser.

Is it possible to simply split the file? I understand letting the editor
dictate file size is awkward, but it is an option.

> After that, typing some characters blocks Emacs for 7 seconds,
> after which the typed characters become visible. Further typing
> exhibits the same delay, each time the idle timer kicks in.

The entire file is reparsed for any change. Actually, the parser must
parse one compilation unit (see below). If there is more than one
compilation unit in the file, Ada mode currently only parses enough
compilation units to include point. But it still starts over from the
beginning of the file if anything changes.

> Indentation when editing code:
> Interactive editing is painful, as there is no
> immediate indentation as long as the code is not syntactically
> complete.

Yes. Georg pointed out using C-c C-e to expand skeletons; that is the
recommended approach to this issue.

It is a change in editing behavior; I am getting used to it (but I am
_very_ biased :).

> I understand that Ada mode 5 is based on a new parser technique, which
> has a lot of advantages.

I hope you have tried some of the new navigation features enabled by the
parser, and will find they outweigh the indentation
issue. I agree the speed issue is much harder to ignore.

For me as the maintainer, the overwhelming advantage is that the Elisp
code now matches the Ada syntax, so it is _much_ easier to maintain, in
particular to add new features for new versions of the language. It was
the Ada 2012 if and case expressions that triggered the new parser; Ada
mode 4.0 just could not cope with that syntax, and I could not cope with
trying to fix it.

> Without knowing anything about the implementation, 

It uses a generalized LALR parser. That means it must parse one complete
compilation_unit, as defined by the Ada grammar (actually a superset
of that grammar, to simplify the skeletons).

> For performance problem: maybe the parser code (or the code that
> starts the parser) should avoid parsing the complete file. 

That requires a completely different parser design.

SMIE (Simple Minded Indentation Engine) is an Emacs package that
provides such a parser; it supports parsing the minimal amount of text
around point to compute indentation.

When I started the Ada mode 5.0 effort, I used SMIE, because I assumed
an LALR parser would be too slow. But I could never get the indentation
and navigation completely correct. It turns out there are things in Ada
that simply require parsing the whole compilation unit. I'd have to dig
through the monotone commit logs to remember exactly what the problems
were; something about nested packages and subprograms, I think. It
ended up parsing the entire compilation unit for some things, which is
why I switched to the LALR parser; I noticed that parsing the entire
compilation unit was _not_ "too slow".

You could try out that parser, by checking out the monotone branch
org.emacs.ada-mode.smie from the ada-france server.

You might find that trading interactive indenting for accurate
indenting/navigating is acceptable. But you will lose other features.
And it would take a lot of convincing for me to maintain both (money
would certainly help; I've retired from NASA, and I'm looking for other
jobs). The SMIE grammar mostly matches the Ada grammar, but not as
nicely as the generalized LALR grammar does.

> It could maybe use a heuristic to start parsing from some number of lines
> before Point. E.g. the heuristic could be: search for a procedure or
> package or ... (whatever looks reasonable) up to xx lines before
> Point, and then start parsing from there.

That seems tempting, but when you consider all of the Ada syntax, an
LALR parser requires a compilation unit.
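To make the problem with the backward search concrete, here is a crude Python sketch (not Ada mode code). The keyword regex, the 50-line limit, and the rule "a line ending in `is` opens a block, a line starting with `end` closes one" are all hypothetical simplifications standing in for a real parser. Starting at the nearest `procedure` inside a package body loses the enclosing context, so the trailing `end Outer;` has nothing to match:

```python
import re
from typing import List

# Hypothetical restart points for the backward search.
START_RE = re.compile(r"^\s*(procedure|function|package)\b", re.IGNORECASE)

def heuristic_start(lines: List[str], point_line: int, max_back: int = 50) -> int:
    """Index of the nearest line at or before point_line (at most max_back
    lines back) that starts with a unit keyword; 0 (file start) if none."""
    lowest = max(0, point_line - max_back)
    for i in range(point_line, lowest - 1, -1):
        if START_RE.match(lines[i]):
            return i
    return 0

def unmatched_ends(lines: List[str]) -> int:
    """Crude nesting check: count 'end' lines that close nothing."""
    depth = 0
    stray = 0
    for line in lines:
        s = line.strip().lower()
        if s.endswith(" is"):          # toy opener: "... is" at end of line
            depth += 1
        elif re.match(r"end\b", s):    # toy closer
            if depth == 0:
                stray += 1
            else:
                depth -= 1
    return stray

source = """\
package body Outer is
   procedure Inner is
   begin
      Do_Something;
   end Inner;
end Outer;
""".splitlines()
```

With point on `Do_Something;` (index 3), `heuristic_start(source, 3)` picks the `procedure Inner` line, and `unmatched_ends(source[1:])` reports one stray `end`, while the whole file has none. A real LALR parser would simply reject the fragment.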

The SMIE parser could ignore lots of Ada syntax, which is partly why the
indentation/navigation of that syntax was not very nice.

There are a couple options for speeding up the first parse of a
compilation unit:

1) Rewrite the parser in Ada (or C if necessary). I'm not sure how much
   this would gain; the Lisp compiler is supposed to be pretty good.

2) Reduce some of the redundancy in the grammar. I'm using a generalized
   LALR parser, which spawns parallel parsers whenever it finds a
   conflict in the grammar. That makes it easier to use Annex P as the
   reference for the grammar. A pure LALR parser requires lots of
   contortions of the grammar, making it harder to understand and
   maintain.

   The downside of the generalized approach is parsing speed; every
   spawned parser repeats the work on each token, so two parsers run at
   half speed. In most cases, one of the
   parallel parsers reaches an error state and stops, so there is
   normally only one parser going. But sometimes we get "parallel
   explosion"; I recently changed the case statement grammar to avoid
   that for nested case statements. Your files may have a similar
   problem. 
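A toy model of the fork/prune bookkeeping, in Python. This is a deliberately simplified sketch, not the Ada mode parser: each parser is reduced to a single state with no stack and no reductions, and the action table is hypothetical (a `name (` conflict between a procedure call and an indexed component, resolved by the `;`). It shows how the number of live parsers, and therefore the work, grows at a conflict and shrinks when one branch reaches an error state:

```python
from typing import Dict, List, Tuple

# Hypothetical action table: (state, token) -> successor states.
# Two successors = a grammar conflict (fork); no entry = error (parser dies).
ACTIONS: Dict[Tuple[str, str], List[str]] = {
    ("start", "id"): ["name"],
    ("name", "("): ["call_args", "index_args"],  # call or indexed component?
    ("call_args", "id"): ["call_args"],
    ("index_args", "id"): ["index_args"],
    ("call_args", ")"): ["call_done"],
    ("index_args", ")"): ["index_done"],
    ("call_done", ";"): ["statement"],  # only a call can stand alone here
}

def run(tokens: List[str]) -> Tuple[List[str], List[int], int]:
    """Advance all live parsers in lockstep. Returns the final states, the
    live-parser count after each token, and the total transitions tried."""
    active = ["start"]
    counts: List[int] = []
    work = 0
    for tok in tokens:
        work += len(active)  # every live parser must process every token
        successors: List[str] = []
        for state in active:
            successors.extend(ACTIONS.get((state, tok), []))
        if not successors:
            raise SyntaxError(f"all parsers failed at token {tok!r}")
        active = successors
        counts.append(len(active))
    return active, counts, work
```

`run(["id", "(", "id", ")", ";"])` returns `(["statement"], [1, 2, 2, 2, 1], 8)`: eight transitions where a single deterministic parser would need five. A real generalized LR parser keeps a full stack per parser (and may merge parsers with identical stacks); nested conflicts that do not resolve quickly multiply the live parsers, which is the "parallel explosion" described above.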

For speeding up parsing after minor editing, we could cache the parser
state every so often; parsing could start from there. And if the change
is truly local (i.e. not introducing a "begin"), then the parse state of
the new code would match the cached parse state after the edit, and
parsing could stop.
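A Python sketch of that checkpoint idea, under heavy assumptions: the "parse state" is just a stack of open `begin`/`loop`/`if` keywords, every `end` closes the innermost one, and checkpoints are taken every few tokens. Because stepping is deterministic, once the state after the edited region equals the cached state at the same (shifted) position, the rest of the file must parse exactly as before, so we can stop. A real LALR parse state would be the whole parser stack, and the cache would also need updating after the edit; this sketch skips both.

```python
from typing import Dict, List, Tuple

State = Tuple[str, ...]  # toy parse state: stack of open block keywords

def step(state: State, tok: str) -> State:
    """Advance the toy parser by one token."""
    if tok in ("begin", "loop", "if"):
        return state + (tok,)
    if tok == "end":
        return state[:-1]
    return state

def full_parse(tokens: List[str], every: int) -> Tuple[State, Dict[int, State]]:
    """Parse from the start, caching the state *before* every `every`-th token."""
    cache: Dict[int, State] = {}
    state: State = ()
    for i, tok in enumerate(tokens):
        if i % every == 0:
            cache[i] = state
        state = step(state, tok)
    return state, cache

def reparse(old_tokens: List[str], old_cache: Dict[int, State], old_final: State,
            start: int, end: int, new_text: List[str]) -> Tuple[State, int]:
    """Replace old_tokens[start:end] with new_text and reparse incrementally.
    Returns (final_state, number_of_tokens_actually_stepped)."""
    new_tokens = old_tokens[:start] + new_text + old_tokens[end:]
    shift = len(new_text) - (end - start)
    resume = max(i for i in old_cache if i <= start)  # nearest checkpoint
    state = old_cache[resume]
    stepped = 0
    for i in range(resume, len(new_tokens)):
        old_i = i - shift  # same position in the old token list
        if i >= start + len(new_text) and old_cache.get(old_i) == state:
            return old_final, stepped  # converged: remainder parses as before
        state = step(state, new_tokens[i])
        stepped += 1
    return state, stepped
```

For `tokens = ["begin", "x", ";", "y", ";", "end", "begin", "z", ";", "end"]` with checkpoints every 2 tokens, replacing `y` with `w ; v` converges after stepping 4 tokens, while replacing it with `begin` never converges and walks to the end of the file, which matches the "truly local" condition above.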

That's complex, and does not solve the first parse problem, so I have
not worked on it yet.

So far, my files are small enough that the parsing delay is acceptable;
that's due to my coding style.

> As for interactive editing: when the parser encounters an
> error, rather than doing no indentation at all, the parsing could
> be relaunched from the starting parsing position (cf. the above
> performance problem), but assuming that just after Point, we have an
> artificial "grammar token" that "properly completes" all the grammar
> elements in the parsed zone before Point.

This is a possibility that I have not investigated at all. Some of the
compiler textbooks discuss error recovery techniques for LALR parsers,
so there is precedent for this.

> An example: imagine I am typing:
>
>    procedure A is
>    begin
>       Another_Procedure_Call;

The parser actually notices the problem on the next token, so the full
context is required. It may be several tokens before the error is
detected.

To a human reading this, clearly "end;" would solve the problem. Getting
a parser to understand that is the trick.
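As a rough illustration of the "artificial completing token" idea, this Python sketch (hypothetical, not part of Ada mode) tracks open `procedure`/`function`, `loop`, and `if` constructs with a stack and proposes the closers that would complete them at point, innermost first. It ignores a lot of Ada syntax (subprogram declarations ending in `;`, `if` expressions, `case`, `declare`, packages), which is exactly where the general problem gets hard:

```python
import re
from typing import List

def missing_closers(text: str) -> List[str]:
    """Guess the closing syntax needed to complete `text` at its end."""
    words = re.findall(r"[A-Za-z_]\w*|;", text)
    stack: List[str] = []  # closers for open constructs, innermost last
    i = 0
    while i < len(words):
        w = words[i].lower()
        if w == "end":
            if stack:
                stack.pop()
            # skip the optional "loop", "if", or name up to the ';'
            i += 1
            while i < len(words) and words[i] != ";":
                i += 1
        elif w in ("procedure", "function") and i + 1 < len(words):
            stack.append(f"end {words[i + 1]};")
            i += 2
        elif w == "loop":
            stack.append("end loop;")
            i += 1
        elif w == "if":
            stack.append("end if;")
            i += 1
        else:
            i += 1
    return list(reversed(stack))
```

For the example above, `missing_closers("procedure A is begin Another_Procedure_Call;")` returns `["end A;"]`; a parser could be rerun on the text with those tokens appended after point.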

If you use skeletons for new code, the skeleton inserts "end;" for you.
So this is only a problem when editing existing code.

One approach to editing existing code is to type the new syntax skeleton
first, then modify the bodies. I am getting used to that process.

But it would be worth exploring a heuristic guess for completing the
syntax. There is no way to insert the correct missing syntax in general,
so I need some examples to work from.

If you can post some real-life editing tasks that you find awkward, I
can look into it. I'll try to notice that situation myself. So far, it
has not been bad enough to motivate me to do anything about it :). But I
like writing good tools, especially when I get feedback, so fixing this
problem would be fun.


Another option is to use AdaCore's GPS; it has a very fast parser in the
editor (completely separate from the compiler; it's recursive descent,
implemented in Ada). Obviously Emacs is better in general :). I've toyed
with trying to implement some of the features I like from Emacs in GPS,
but I always give up; the Elisp environment is much more friendly to
this sort of development, and the list of missing features in GPS is
_so_ long (monotone integration is the first on the list; Gnus for email
and news is second :). I've suggested that AdaCore pay me to make GPS
better, but so far they have refused :(.

-- 
-- Stephe

_______________________________________________
Emacs-ada-mode mailing list
[email protected]
http://host114.hostmonster.com/mailman/listinfo/emacs-ada-mode_stephe-leake.org
