Fairly recently I've been interested in using decompilation as a means of 
getting more exact information about where a program is when it is 
stopped or hits an error. See http://rocky.github.io/NYC-Hackntell/#/ for 
a 5-minute talk I gave to a very general audience. (I bet you could get 
through it in 3 minutes, though.) 

Somewhat mirroring your experience with Perl (vs. Python), I figured this 
out with the help of Perl Monks and did it first in Perl: B::DeparseTree 
<https://metacpan.org/pod/B::DeparseTree> interprets via an AST that it 
creates.

Having mastered this aspect in Perl, I went on to Python, and that's where 
I discovered the Aycock/Horspool Earley-algorithm parser from John Aycock. 
As that was a bit old and unmaintained, I packaged it for Python 
<https://pypi.python.org/pypi/spark_parser>.

To get broader interest in this, I am in the process of submitting papers 
on this aspect to conferences. 

*Some questions*

First there is the question of whether I could use Marpa instead of the 
Aycock/Horspool SPARK parser. Well, one handicap for me here is that I know 
the Python parser and have mucked with it to give me the debug and trace 
information that I so sorely need when debugging grammars. 

What I'll probably do here is use Marpa in my Perl Debugger to parse the 
various command arguments of the "list", "disassemble", and various "break" 
commands. I currently have this done as custom code but I had promised 
users that I would follow gdb conventions 
<http://kirste.userpage.fu-berlin.de/chemnet/use/info/gdb/gdb_8.html> unless 
there was good reason not to. (Complicated code is no longer a good reason, 
since this can be easily expressed in a grammar.)

So back to the broader question of Marpa for Python. When I installed 
Marpa, it looked like it uses Bison/YACC and does a lot of the work in C. 
Porting that kind of thing does not seem like it will be easy going. 

Then there is the broader question of how the Python decompiler currently 
works. At runtime, for each method, custom rules are created for 
instructions that have variable length.

That is, instead of

   call_expr ::= call | expr CALL | expr expr CALL | expr expr expr CALL |
                 expr expr expr expr CALL

the operand of the CALL opcode gives how many parameters the call takes. So 
a pair of rules like:

  call_expr ::= call_4
  call_4 ::= expr expr expr expr CALL

is created when we see there is a call with 4 arguments. This cute idea 
goes back to the earliest versions of the decompiler by John Aycock. I 
can't find it mentioned in the literature though. 
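Here is a minimal Python sketch of that per-arity trick; the instruction
representation and function name are invented for illustration and are not
SPARK's or the decompiler's actual API:

```python
# Hypothetical sketch: while scanning a method's bytecode, each CALL
# instruction's operand tells us its argument count, so we synthesize
# exactly the rules needed for the arities that actually occur, instead
# of enumerating every possible arity up front.

def rules_for_method(instructions):
    """Return dynamically generated rules for one method's bytecode.

    `instructions` is assumed to be a list of (opcode, operand) pairs.
    """
    rules = set()
    for opcode, operand in instructions:
        if opcode == "CALL":
            n = operand  # number of arguments at this call site
            rhs = " ".join(["expr"] * n) + " CALL"
            rules.add(f"call_expr ::= call_{n}")
            rules.add(f"call_{n} ::= {rhs}")
    return sorted(rules)

# A method containing a single 4-argument call yields just two rules:
rules = rules_for_method([("LOAD", 0), ("CALL", 4)])
```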

And I think somewhere I read that Marpa prefers not to have rules added at 
runtime, per method. I suspect that if the precompile phase of Marpa is 
fast enough, this is moot. 

In SPARK, I also added the ability to *remove* grammar rules, since there 
is a lot of grammar sharing going on between different Python versions. 
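To illustrate the add-then-prune idea, here is a toy sketch (rule strings
and function names made up, not SPARK's real mechanism) of deriving a
version's grammar from a shared base by removing rules:

```python
# A shared base grammar, with one rule that only makes sense for
# Python 2.x bytecode: the PRINT_ITEM opcode was dropped in Python 3.

BASE_RULES = {
    "call_expr ::= expr CALL_FUNCTION",
    "print_stmt ::= expr PRINT_ITEM",  # 2.x-only opcode
}

def grammar_for(version, base=BASE_RULES):
    """Derive one version's grammar by pruning rules from the base."""
    rules = set(base)
    if version >= (3, 0):
        # Python 3 has no PRINT_ITEM opcode, so its rule must go
        rules.discard("print_stmt ::= expr PRINT_ITEM")
    return rules
```

The point is that versions share one rule set and differ only by small
additions and removals, rather than each carrying a full grammar copy.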

Somewhere I read that the Earley parser is cubic. However, ambiguous 
grammars can have an exponential number of derivations. 

In particular, consider this: 

  E ::= E E
  E ::= a 

Doesn't this amount to the number of ways you can parenthesize an 
expression of length n? That is the Catalan number C(n-1), which grows 
exponentially. Perhaps what they mean by cubic is the time to recognize 
the string and build a shared representation of all the parses, even 
though the number of derivations for a string of length n may be 
exponential? 
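To make the growth concrete, here is a short Python check that counts the
parse trees of a^n under that grammar; the count is the number of full
binary trees with n leaves, i.e. the Catalan number C(n-1):

```python
from math import comb

def derivations(n):
    """Number of distinct parse trees of a^n under E ::= E E | a.

    This is the number of full binary trees with n leaves, the
    Catalan number C(n-1) = comb(2(n-1), n-1) / n.
    """
    return comb(2 * (n - 1), n - 1) // n

# Catalan numbers grow roughly like 4^n / n^1.5, so no cubic-time
# algorithm can enumerate every derivation; a cubic bound can only
# apply to recognition or to building a packed forest of the parses.
counts = [derivations(n) for n in range(1, 8)]
```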

One other technique of late in SPARK that I've been using is doing 
additional checks at grammar-reduction time. For example, two expressions 
might form some special kind of tuple if they are expressions of 
constants of some kind. That kind of check might be too cumbersome to 
encode in the grammar by changing opcodes. 
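Here is a minimal sketch of what I mean by a reduction-time check; the
parser plumbing, rule keys, and node shapes are invented for illustration
and differ from SPARK's real hook:

```python
# Before a rule is allowed to reduce, a predicate inspects the matched
# children and can veto the reduction, sending the parser off to look
# for a different parse.

def is_constant(node):
    return node[0] == "const"

def check_const_tuple(children):
    """Only let `tuple ::= expr expr` reduce when both exprs are constants."""
    return all(is_constant(child) for child in children)

REDUCE_CHECKS = {"tuple ::= expr expr": check_const_tuple}

def try_reduce(rule, children):
    check = REDUCE_CHECKS.get(rule)
    if check is not None and not check(children):
        return None          # veto: this reduction does not apply
    return (rule, children)  # otherwise build the tree node as usual
```

Expressing "both children are constants" directly in the grammar would
require distinguishing constant-producing opcodes from the rest, which is
exactly the cumbersome opcode-changing this avoids.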

Does Marpa allow for a programmable check to be made before performing a 
reduction? 

Going the other way may be beneficial too: adding some Marpa goodness into 
the SPARK parser. Thoughts about that? 
