Hello everyone,

I am trying to use Marpa to parse mathematical formulas (MathML) and
extract the relevant arguments. To do that, I trigger an event whenever
an argument is matched. This works, with one problem: I get too many
matches. The argument rule is completed many times, but only some of
those completions actually end up as part of a formula. This is exactly
where I am stuck: I need a way to discard the argument matches that
never become part of an actual formula match.

Here is a simplified version of what I'm working on
(Notation is a mathematical formula; Presentation is MathML content):
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#HERE THE STRUCTURE OF THE GRAMMAR IS DEFINED
#:default ::= action => getString
lexeme default = latm => 1
:start ::= Expression 
ExpressionList ::= Expression ExpressionList 
| Expression 
Expression ::= Notation 
 || Presentation 
 #Presentation MathML
Presentation ::= mrowB ExpressionList mrowE
 | moB '(' moE ExpressionList moB ')' moE 
 | moB text moE 
 | miB text miE 
 | mnB text mnE 
 | mtextB text mtextE 
 | msB text msE 
 | mfracB ExpressionList mfracE
 | msqrtB Expression msqrtE
 | msupB ExpressionList msupE
 | msubB ExpressionList msubE
 | msubsupB ExpressionList msubsupE
 | munderB ExpressionList munderE 
 | moverB ExpressionList moverE 
 | munderoverB ExpressionList munderoverE 
 | mtdB ExpressionList mtdE
 | mtrB ExpressionList mtrE
 | mtableB ExpressionList mtableE 
 | mathB ExpressionList mathE 
 | emB ExpressionList emE 
 | mstyleB ExpressionList mstyleE 
 | mspaceB mspaceE 
 | miSingle 
mfracB ::= ws '<mfrac' attribs '>' ws
mfracE ::= ws '</mfrac>' ws
msqrtB ::= ws '<msqrt' attribs '>' ws
msqrtE ::= ws '</msqrt>' ws
msupB ::= ws '<msup' attribs '>' ws
msupE ::= ws '</msup>' ws
msubB ::= ws '<msub' attribs '>' ws
msubE ::= ws '</msub>' ws
munderB ::= ws '<munder' attribs '>' ws
munderE ::= ws '</munder>' ws
moverB ::= ws '<mover' attribs '>' ws
moverE ::= ws '</mover>' ws
mnB ::= ws '<mn' attribs '>' ws
mnE ::= ws '</mn>' ws
miB ::= ws '<mi' attribs '>' ws
miE ::= ws '</mi>' ws
msB ::= ws '<ms' attribs '>' ws
msE ::= ws '</ms>' ws
mspaceB ::= ws '<mspace' attribs '>' ws
mspaceE ::= ws '</mspace>' ws
moB ::= ws '<mo' attribs '>' ws
moE ::= ws '</mo>' ws
mstyleB ::= ws '<mstyle' attribs '>' ws
mstyleE ::= ws '</mstyle>' ws
mtextB ::= ws '<mtext' attribs '>' ws
mtextE ::= ws '</mtext>' ws
emB ::= ws '<em' attribs '>' ws
emE ::= ws '</em>' ws
mtdB ::= ws '<mtd' attribs '>' ws
mtdE ::= ws '</mtd>' ws
mtrB ::= ws '<mtr' attribs '>' ws
mtrE ::= ws '</mtr>' ws
mtableB ::= ws '<mtable' attribs '>' ws
mtableE ::= ws '</mtable>' ws
msubsupB ::= ws '<msubsup' attribs '>' ws
msubsupE ::= ws '</msubsup>' ws
munderoverB ::= ws '<munderover' attribs '>' ws
munderoverE ::= ws '</munderover>' ws
mrowB ::= ws '<mrow' attribs '>' ws
mrowE ::= ws '</mrow>' ws
mathB ::= ws '<math' attribs '>' ws action => getString
mathE ::= ws '</math>' ws
miSingle ::= ws '<mi' attribs '/>' ws 
ws ::= spaces action => getNothing
ws ::= action => getNothing # empty
spaces ~ space+
space ~ [\s] 
attribs ::= ws || attrib || attrib attribs 
attrib ::= ws notEqSignS '=' ws '"' notQuoteS '"' ws
notEqSignS ~ notEqSign+ 
notEqSign ~ [^=<>/]
notQuoteS ~ notQuote+
notQuote ~ [^"]
text ~ char+
char ~ [^<>]
#ARGUMENT RULE
argRule::= Expression
Notation::=_equal_eqN210

#HERE WE HAVE AN ACTUAL FORMULA - EQUAL WITH ARGUMENTS 'A=B=C=D'
_equal_eqN210::= rule252
# I had to give 3 different names to the argument rules because Marpa
# doesn't handle consecutive matches of the same rule as I want
# (returns just the longest match when ->last_completed() is called)
rule252::= argRuleN210A1Seq1 rule53 rule252 
 | argRuleN210A1Seq2 rule53 argRuleN210A1Seq3 
argRuleN210A1Seq1::= argRule 
argRuleN210A1Seq2::= argRule 
argRuleN210A1Seq3::= argRule 
rule53::= moB '=' moE 
event 'rule252' = completed rule252
event 'rule53_C' = completed rule53
event 'argRuleN210A1Seq1' = completed argRuleN210A1Seq1
event 'argRuleN210A1Seq2' = completed argRuleN210A1Seq2
event 'argRuleN210A1Seq3' = completed argRuleN210A1Seq3
event '_equal_eqN210_C' = completed _equal_eqN210
# event '_equal_eqN210_P' = predicted _equal_eqN210 # not sure how to use it, or whether I need it at all
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
EXAMPLE OF RESULTS:
On the input '<math><mn>1</mn><mo>=</mo><mn>2</mn></math>' the following 
events are triggered:

Event                Begin  End  MathML content                               Verdict
'argRuleN210A1Seq1'      7   17  <mn>1</mn>                                   USELESS
'argRuleN210A1Seq2'      7   17  <mn>1</mn>                                   PERFECT
'argRuleN210A1Seq1'     17   27  <mo>=</mo>                                   USELESS
'rule53_C'              17   27  <mo>=</mo>                                   PERFECT
'argRuleN210A1Seq2'     17   27  <mo>=</mo>                                   USELESS
'_equal_eqN210_C'        7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               PERFECT
'rule252'               27   37  <mn>2</mn>                                   USELESS
'argRuleN210A1Seq1'      7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               USELESS
'argRuleN210A1Seq2'      7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               USELESS
'argRuleN210A1Seq3'     27   37  <mn>2</mn>                                   PERFECT
'argRuleN210A1Seq1'      1   44  <math><mn>1</mn><mo>=</mo><mn>2</mn></math>  USELESS (but out of range anyway)
'argRuleN210A1Seq2'      1   44  <math><mn>1</mn><mo>=</mo><mn>2</mn></math>  USELESS (but out of range anyway)

To sum up: how can I get only the 'good' argument matches (those that
eventually become part of a notation match) and discard the others?
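
To make "good" concrete, here is a rough sketch of a post-filter over the
recorded events. It is written in Python purely for illustration (the real
driver would be Perl with Marpa::R2) and it is not a Marpa feature. The
function name arguments_in_notation, the (begin, end) span tuples, and the
rule "a good argument belongs to a chain argument (operator argument)+
that exactly covers a completed notation" are all assumptions made for
this example. It also assumes every span has positive length.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (begin, end) positions, as in the table above


def arguments_in_notation(notation: Span,
                          arg_spans: Set[Span],
                          op_spans: Set[Span]) -> List[Span]:
    """Return the argument spans of one chain 'arg (op arg)+' that exactly
    covers `notation`, or [] if no such chain exists (hypothetical helper)."""
    start, end = notation

    def extend(pos: int, chain: List[Span]) -> List[Span]:
        # `chain` ends with an argument that stops at `pos`.
        if pos == end and len(chain) >= 2:
            return chain
        for op in sorted(op_spans):
            if op[0] != pos:
                continue
            for arg in sorted(arg_spans):
                # The next argument must start where the operator ends,
                # stay inside the notation, and make progress.
                if arg[0] == op[1] and pos < arg[1] <= end:
                    found = extend(arg[1], chain + [arg])
                    if found:
                        return found
        return []

    for first in sorted(arg_spans):
        if first[0] == start and first[1] < end:
            found = extend(first[1], [first])
            if found:
                return found
    return []


# Events copied from the example above: (event name, begin, end).
events = [
    ("argRuleN210A1Seq1", 7, 17), ("argRuleN210A1Seq2", 7, 17),
    ("argRuleN210A1Seq1", 17, 27), ("rule53_C", 17, 27),
    ("argRuleN210A1Seq2", 17, 27), ("_equal_eqN210_C", 7, 37),
    ("rule252", 27, 37), ("argRuleN210A1Seq1", 7, 37),
    ("argRuleN210A1Seq2", 7, 37), ("argRuleN210A1Seq3", 27, 37),
    ("argRuleN210A1Seq1", 1, 44), ("argRuleN210A1Seq2", 1, 44),
]
ARG_EVENTS = {"argRuleN210A1Seq1", "argRuleN210A1Seq2", "argRuleN210A1Seq3"}

arg_spans = {(b, e) for name, b, e in events if name in ARG_EVENTS}
op_spans = {(b, e) for name, b, e in events if name == "rule53_C"}
notations = [(b, e) for name, b, e in events if name == "_equal_eqN210_C"]

for notation in notations:
    print(notation, arguments_in_notation(notation, arg_spans, op_spans))
    # prints: (7, 37) [(7, 17), (27, 37)]
```

On the example this keeps only <mn>1</mn> and <mn>2</mn>, the two matches
marked PERFECT above. It works on spans only, so the three differently
named argument rules are treated as one, and if several chains fit it
returns just the first one found.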

Thank you in advance for your help.

Best regards,
Toloaca Ion
          
       
    
