I had looked at the link in the earlier reply to the first post, but I don't see how it addresses any of the questions.
With respect to incomplete parses, duplicate ambiguous parses, and so on: in my specific situation I am not sure that matters. In decompilation, ambiguity is pretty much a given. Perhaps it helps to think of this as akin to human translation: the translation doesn't have to be exact or elegant to be useful.

With respect to time complexity, the question is: the parse time is O(n**2.4) or cubic with respect to exactly what? I am pretty sure that there can be an exponential number of valid derivations or parse trees for E ::= E E | a with respect to the number of tokens in the input. Although this isn't rigorous, here is the general idea, by induction. Suppose I have a list of all the parses for a**n. When adding another "a", for each existing parse i I can get a new derivation by tacking the "a" onto the beginning as (a i) or onto the end as (i a). There may be duplicate parenthesizations between the two sets, but I suspect the list of distinct derivations still at least roughly doubles with each added token. And when I run some examples by hand, the number of valid parses does indeed seem to grow exponentially:

-------------
input aa
1. (E E)

-------------
input aaa
1. (E (E E))
2. ((E E) E)

-------------
input aaaa
1. (E (E (E E)))
2. (E ((E E) E))
3. ((E E) (E E))
4. ((E (E E)) E)
5. (((E E) E) E)

(The counts 1, 2, 5, ... are the Catalan numbers, which do grow exponentially.)

I looked again at libmarpa_build (the engine) and, although there's no Bison used (parsing of Marpa's own grammar rules seems to be described in lib/Marpa/R2/meta/metag.bnf), there does seem to be a lot of C code one way or another. Not necessarily bad, but something to be aware of when porting to another programming language.

So in the back of my mind I am still thinking about what good stuff in Marpa I could bring to the Python SPARK, perhaps the Ruby Slippers error handling. And what would it be like to change the decompiler uncompyle6 <https://github.com/rocky/python-uncompyle6> to use Marpa? I'll get it sorted out somehow.
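The hand enumeration above is easy to double-check mechanically. Here is a short, standalone brute-force script (not SPARK or Marpa code, just a sanity check) that builds every distinct parse tree for E ::= E E | a over n tokens and counts them:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parses(n):
    """All distinct binary parse trees over n 'a' tokens, as nested tuples."""
    if n == 1:
        return ("a",)
    trees = []
    # Split the n tokens at every point k; combine each left and right subtree.
    for k in range(1, n):
        for left in parses(k):
            for right in parses(n - k):
                trees.append((left, right))
    return tuple(trees)

for n in range(2, 6):
    print(n, len(parses(n)))
# prints:
# 2 1
# 3 2
# 4 5
# 5 14
```

The counts 1, 2, 5, 14, ... are the Catalan numbers, which grow roughly as 4**n, so any parser that enumerated every derivation explicitly would be exponential; a cubic bound can only be about recognition, or about building some shared representation of the parses.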
Thanks for the information.

On Saturday, October 28, 2017 at 6:30:38 PM UTC-4, Jeffrey Kegler wrote:
>
> Re Marpa and AH, you might want to look at this earlier reply:
> https://groups.google.com/d/msg/marpa-parser/eKSSioagswU/uKm_AgxGAQAJ
>
> In creating Marpa, I may have acquired as much experience with the AH
> algorithm as anybody, possibly even Aycock/Horspool themselves. I found
> the special LR(0) items they create to be counter-productive, and ripped
> them out of the Marpa algorithm, converting back to a more conventional
> Earley's parser. But I did keep some other ideas, and I still consider the
> AH article to be one of the sources of Marpa. For more, see that link
> above.
>
> I did not work from SPARK, and I've heard it was buggy, though I don't
> know. I do know the algorithm in the original article has a problem in
> that it can produce two sets of AH items for the same parse, so that
> unless you go to some trouble to look for this situation, you get
> duplicate parses of ambiguous grammars. I successfully worked around this
> (I don't think SPARK does), but by that time I had enough experience to
> get numbers on the benefit I was getting for all this additional
> complexity, and realized the AH LR(0) states, in practice, just don't
> produce enough of a payoff.
>
> Marpa does *NOT* use Bison or Yacc or LALR parsing. You might have gotten
> that impression if, when installing Marpa, you also had to compile Perl
> from source. Perl does use Bison for its parsing.
>
> Marpa is cubic in the general case for BNF, but so is every other general
> BNF parser ever implemented. (Valiant's is O(n**2.4), but has never had,
> IIRC, even a toy implementation.) If you compare apples-to-apples, if
> another parser parses a well-defined grammar class in linear time, Marpa
> also parses that grammar class in linear time.
>
> Wrt "checks at reduction time": Marpa::R2's ASF facility might allow you
> to do what you have in mind.
>
> I don't think this addressed everything, but I hope it's a helpful start
> -- jeffrey
>
> On Sat, Oct 28, 2017 at 2:26 PM, Rocky <[email protected]> wrote:
>
>> Fairly recently I've been interested in using decompilation as a means
>> for getting more exact location information about where a program is
>> when it is stopped or hits an error. See
>> http://rocky.github.io/NYC-Hackntell/#/ for a 5-minute talk I gave to a
>> very general audience. (I bet you could get it in 3 minutes though.)
>>
>> Somewhat mirroring your experience with Perl (vs Python), I figured this
>> out with the help of Perl Monks, and did this first in Perl
>> <https://metacpan.org/pod/B::DeparseTree>, which interprets via an AST
>> that it creates.
>>
>> Having mastered this aspect in Perl, I went on to Python, and that's
>> where I discovered the AH Earley-algorithm parser from John Aycock. As
>> that was a bit old and unmaintained, I packaged it for Python
>> <https://pypi.python.org/pypi/spark_parser>.
>>
>> To get broader interest in this I am in the process of submitting papers
>> on this aspect to conferences.
>>
>> *Some questions*
>>
>> First there is the issue of whether I could use Marpa instead of the
>> Aycock/Horspool SPARK. Well, one handicap for me here is that I know the
>> Python parser and have mucked with it to give me the debug and trace
>> information that I so sorely need in debugging grammars.
>>
>> What I'll probably do here is use Marpa in my Perl debugger to parse the
>> various command arguments of the "list", "disassemble", and various
>> "break" commands. I currently have this done as custom code, but I had
>> promised users that I would follow gdb conventions
>> <http://kirste.userpage.fu-berlin.de/chemnet/use/info/gdb/gdb_8.html>
>> unless there was good reason not to. (Complicated code is no longer a
>> good reason, since this can easily be expressed in a grammar.)
>>
>> So back to the broader question of Marpa for Python.
>> When I installed Marpa, it looks like it uses Bison/YACC and does a lot
>> of work in C. This kind of thing seems like it is not going to be easy
>> going.
>>
>> Then there is the broader question of how the Python decompiler
>> currently works. At runtime, for each method, custom rules are created
>> for instructions that have variable length.
>>
>> That is, instead of:
>>
>>   call_expr ::= call | expr CALL | expr expr CALL | expr expr expr CALL
>>               | expr expr expr expr CALL
>>
>> the operand of the CALL opcode gives how many parameters it takes. So a
>> rule like:
>>
>>   call_expr ::= call_4
>>   call_4 ::= expr expr expr expr CALL
>>
>> is created when we see there is a call with 4 arguments. This cute idea
>> goes back to the earliest versions of the decompiler by John Aycock. I
>> can't find it mentioned in the literature, though.
>>
>> And I think somewhere I read that Marpa prefers not to have rules added
>> at runtime per method. I suspect that if the precompile phase of Marpa
>> is fast enough this is moot.
>>
>> In SPARK, I also added the ability to *remove* grammar rules, since
>> there is a lot of sharing of grammars going on between different Python
>> versions.
>>
>> Somewhere I read that the Earley parser is cubic. However, ambiguous
>> grammars can have an exponential number of derivations. In particular,
>> consider this:
>>
>>   E ::= E E
>>   E ::= a
>>
>> Doesn't this amount to the number of ways you can parenthesize an
>> expression of length n, which is on the order of the Catalan numbers and
>> so exponential? Perhaps what they mean is that the time is cubic even
>> though the number of derivations for a string of length n may be
>> exponential?
>>
>> One other technique of late in SPARK that I've been using is doing
>> additional checks at grammar reduction time.
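The rule-per-arity idea described just above can be sketched in a few lines. The helper below is purely illustrative -- the function name and rule spelling are mine, not uncompyle6's actual code, whose opcodes and rule names vary by Python version:

```python
def call_rule_for(argc):
    """Build the specialized grammar-rule text for a CALL opcode whose
    operand says the call takes argc arguments.  Illustrative only."""
    lhs = "call_%d" % argc
    rhs = " ".join(["expr"] * argc)
    return "%s ::= %s CALL" % (lhs, rhs)

# On seeing a call with 4 arguments, the decompiler would add a rule like:
print(call_rule_for(4))   # call_4 ::= expr expr expr expr CALL
```

The point of generating rules this way is that the grammar never needs a rule for every possible arity up front; only the arities actually seen in the method's bytecode get rules.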
>> For example, two expressions might form some special kind of tuple if
>> they are expressions of constants of some kind. That kind of check might
>> be too cumbersome to code in the grammar by changing opcodes.
>>
>> Does Marpa allow for a programmable check to be made before performing a
>> reduction?
>>
>> Going the other way may be beneficial: adding some Marpa goodness into
>> the SPARK parser? Thoughts about that?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
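Coming back to the reduction-time checks described in the quoted post, here is a minimal sketch of the kind of veto hook I have in mind. The hook name and the node representation are hypothetical -- this is not SPARK's or Marpa's actual API:

```python
def is_constant(node):
    """A node here is a (kind, value) pair; 'kind' says what produced it."""
    return node[0] == "LOAD_CONST"

def invalid_reduction(lhs, children):
    """Return True to veto reducing children to the nonterminal lhs.
    For a hypothetical 'const_tuple' rule, require all-constant children."""
    if lhs == "const_tuple":
        return not all(is_constant(c) for c in children)
    return False

# Two constants may reduce to const_tuple; a name expression may not.
print(invalid_reduction("const_tuple",
                        [("LOAD_CONST", 1), ("LOAD_CONST", 2)]))  # False
print(invalid_reduction("const_tuple",
                        [("LOAD_NAME", "x"), ("LOAD_CONST", 2)]))  # True
```

The parser would consult such a hook whenever a candidate reduction matches a rule that carries a check, keeping the predicate logic out of the grammar itself.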
