Hello everyone,
I am trying to use Marpa to parse mathematical formulas (MathML) and
extract the relevant arguments.
For that, I trigger an event whenever an argument gets matched. It actually
works fine, but there is one problem: I get too many matches.
The argument rule gets completed many times, but only some of those
completions actually end up as part of a formula match, and this is exactly
where I'm stuck. I need a way to discard the argument-rule matches that
never become part of an actual formula match.
Here is a simplified version of what I'm working on
(Notation means a mathematical formula, Presentation means MathML content):
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#HERE THE STRUCTURE OF THE GRAMMAR IS DEFINED
#:default ::= action => getString
lexeme default = latm => 1
:start ::= Expression
ExpressionList ::= Expression ExpressionList
| Expression
Expression ::= Notation
|| Presentation
#Presentation MathML
Presentation ::= mrowB ExpressionList mrowE
| moB '(' moE ExpressionList moB ')' moE
| moB text moE
| miB text miE
| mnB text mnE
| mtextB text mtextE
| msB text msE
| mfracB ExpressionList mfracE
| msqrtB Expression msqrtE
| msupB ExpressionList msupE
| msubB ExpressionList msubE
| msubsupB ExpressionList msubsupE
| munderB ExpressionList munderE
| moverB ExpressionList moverE
| munderoverB ExpressionList munderoverE
| mtdB ExpressionList mtdE
| mtrB ExpressionList mtrE
| mtableB ExpressionList mtableE
| mathB ExpressionList mathE
| emB ExpressionList emE
| mstyleB ExpressionList mstyleE
| mspaceB mspaceE
| miSingle
mfracB ::= ws '<mfrac' attribs '>' ws
mfracE ::= ws '</mfrac>' ws
msqrtB ::= ws '<msqrt' attribs '>' ws
msqrtE ::= ws '</msqrt>' ws
msupB ::= ws '<msup' attribs '>' ws
msupE ::= ws '</msup>' ws
msubB ::= ws '<msub' attribs '>' ws
msubE ::= ws '</msub>' ws
munderB ::= ws '<munder' attribs '>' ws
munderE ::= ws '</munder>' ws
moverB ::= ws '<mover' attribs '>' ws
moverE ::= ws '</mover>' ws
mnB ::= ws '<mn' attribs '>' ws
mnE ::= ws '</mn>' ws
miB ::= ws '<mi' attribs '>' ws
miE ::= ws '</mi>' ws
msB ::= ws '<ms' attribs '>' ws
msE ::= ws '</ms>' ws
mspaceB ::= ws '<mspace' attribs '>' ws
mspaceE ::= ws '</mspace>' ws
moB ::= ws '<mo' attribs '>' ws
moE ::= ws '</mo>' ws
mstyleB ::= ws '<mstyle' attribs '>' ws
mstyleE ::= ws '</mstyle>' ws
mtextB ::= ws '<mtext' attribs '>' ws
mtextE ::= ws '</mtext>' ws
emB ::= ws '<em' attribs '>' ws
emE ::= ws '</em>' ws
mtdB ::= ws '<mtd' attribs '>' ws
mtdE ::= ws '</mtd>' ws
mtrB ::= ws '<mtr' attribs '>' ws
mtrE ::= ws '</mtr>' ws
mtableB ::= ws '<mtable' attribs '>' ws
mtableE ::= ws '</mtable>' ws
msubsupB ::= ws '<msubsup' attribs '>' ws
msubsupE ::= ws '</msubsup>' ws
munderoverB ::= ws '<munderover' attribs '>' ws
munderoverE ::= ws '</munderover>' ws
mrowB ::= ws '<mrow' attribs '>' ws
mrowE ::= ws '</mrow>' ws
mathB ::= ws '<math' attribs '>' ws action => getString
mathE ::= ws '</math>' ws
miSingle ::= ws '<mi' attribs '/>' ws
ws ::= spaces action => getNothing
ws ::= action => getNothing # empty
spaces ~ space+
space ~ [\s]
attribs ::= ws || attrib || attrib attribs
attrib ::= ws notEqSignS '=' ws '"' notQuoteS '"' ws
notEqSignS ~ notEqSign+
notEqSign ~ [^=<>/]
notQuoteS ~ notQuote+
notQuote ~ [^"]
text ~ char+
char ~ [^<>]
#ARGUMENT RULE
argRule::= Expression
Notation::=_equal_eqN210
#HERE WE HAVE AN ACTUAL FORMULA - EQUAL WITH ARGUMENTS 'A=B=C=D'
_equal_eqN210::= rule252
# I had to give 3 different names to the argument rules because Marpa
# doesn't handle consecutive matches of the same rule the way I want
# (it returns just the longest match when ->last_completed() is called)
rule252::= argRuleN210A1Seq1 rule53 rule252
| argRuleN210A1Seq2 rule53 argRuleN210A1Seq3
argRuleN210A1Seq1::= argRule
argRuleN210A1Seq2::= argRule
argRuleN210A1Seq3::= argRule
rule53::= moB '=' moE
event 'rule252' = completed rule252
event 'rule53_C' = completed rule53
event 'argRuleN210A1Seq1' = completed argRuleN210A1Seq1
event 'argRuleN210A1Seq2' = completed argRuleN210A1Seq2
event 'argRuleN210A1Seq3' = completed argRuleN210A1Seq3
event '_equal_eqN210_C' = completed _equal_eqN210
# event '_equal_eqN210_P' = predicted _equal_eqN210 #not sure how to use it/whether I need it at all
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
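Here is how the two alternatives of rule252 divide up the names. In a chain A=B=...=Z, every argument except the last two is completed as argRuleN210A1Seq1. The second-to-last argument is completed as argRuleN210A1Seq2, and the last one as argRuleN210A1Seq3. The small Python sketch below spells this out for a chain of n arguments. `argument_roles` is a hypothetical helper for illustration, not part of Marpa.

```python
def argument_roles(n: int) -> list:
    """Return the argument-rule name rule252 assigns to each of n arguments.

    rule252 ::= Seq1 rule53 rule252 | Seq2 rule53 Seq3, so a chain
    A = B = ... has (n - 2) Seq1 arguments, then one Seq2 and one Seq3.
    (Hypothetical helper, only mirrors the grammar above.)
    """
    if n < 2:
        raise ValueError("rule252 needs at least two arguments")
    return ["argRuleN210A1Seq1"] * (n - 2) + ["argRuleN210A1Seq2", "argRuleN210A1Seq3"]


if __name__ == "__main__":
    # 'A=B=C=D' has four arguments
    print(argument_roles(4))
```

So for 'A=B=C=D', A and B should be reported as Seq1, C as Seq2 and D as Seq3. Any other combination of names and positions comes from a completion that is not part of this chain.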
EXAMPLE OF RESULTS:
On the input '<math><mn>1</mn><mo>=</mo><mn>2</mn></math>' the following events are triggered:
Event               Begin  End  MathML content                               Verdict
'argRuleN210A1Seq1'     7   17  <mn>1</mn>                                   USELESS
'argRuleN210A1Seq2'     7   17  <mn>1</mn>                                   PERFECT
'argRuleN210A1Seq1'    17   27  <mo>=</mo>                                   USELESS
'rule53_C'             17   27  <mo>=</mo>                                   PERFECT
'argRuleN210A1Seq2'    17   27  <mo>=</mo>                                   USELESS
'_equal_eqN210_C'       7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               PERFECT
'rule252'              27   37  <mn>2</mn>                                   USELESS
'argRuleN210A1Seq1'     7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               USELESS
'argRuleN210A1Seq2'     7   37  <mn>1</mn><mo>=</mo><mn>2</mn>               USELESS
'argRuleN210A1Seq3'    27   37  <mn>2</mn>                                   PERFECT
'argRuleN210A1Seq1'     1   44  <math><mn>1</mn><mo>=</mo><mn>2</mn></math>  USELESS (but out of range anyway)
'argRuleN210A1Seq2'     1   44  <math><mn>1</mn><mo>=</mo><mn>2</mn></math>  USELESS (but out of range anyway)
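The PERFECT/USELESS split above can be reproduced as a post-filter over the recorded events. The filter keeps an argument event only if it belongs to a sequence of argument and rule53 events that covers a completed _equal_eqN210 span exactly the way rule252 does. This is a rough Python sketch, not Perl. It assumes every event was recorded with its begin/end positions, as in the list. `Event`, `EVENTS`, `rule252_chains` and `good_arguments` are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    start: int
    end: int


# The events listed above for '<math><mn>1</mn><mo>=</mo><mn>2</mn></math>'.
EVENTS = [
    Event("argRuleN210A1Seq1", 7, 17), Event("argRuleN210A1Seq2", 7, 17),
    Event("argRuleN210A1Seq1", 17, 27), Event("rule53_C", 17, 27),
    Event("argRuleN210A1Seq2", 17, 27), Event("_equal_eqN210_C", 7, 37),
    Event("rule252", 27, 37), Event("argRuleN210A1Seq1", 7, 37),
    Event("argRuleN210A1Seq2", 7, 37), Event("argRuleN210A1Seq3", 27, 37),
    Event("argRuleN210A1Seq1", 1, 44), Event("argRuleN210A1Seq2", 1, 44),
]


def rule252_chains(index, pos, end):
    """Return every list of argument events covering [pos, end) exactly like
    rule252 ::= Seq1 rule53 rule252 | Seq2 rule53 Seq3.
    Recursion terminates because a rule53 span is never empty."""
    chains = []
    # Second alternative: Seq2 rule53 Seq3 closes the chain at `end`.
    for a in index[("argRuleN210A1Seq2", pos)]:
        for op in index[("rule53_C", a.end)]:
            for b in index[("argRuleN210A1Seq3", op.end)]:
                if b.end == end:
                    chains.append([a, b])
    # First alternative: Seq1 rule53 rule252, recursing on the remainder.
    for a in index[("argRuleN210A1Seq1", pos)]:
        for op in index[("rule53_C", a.end)]:
            for rest in rule252_chains(index, op.end, end):
                chains.append([a] + rest)
    return chains


def good_arguments(events):
    """Keep the argument events that fit into some completed _equal_eqN210."""
    index = defaultdict(list)  # (event name, start position) -> events
    for ev in events:
        index[(ev.name, ev.start)].append(ev)
    good = set()
    for ev in events:
        if ev.name == "_equal_eqN210_C":
            for chain in rule252_chains(index, ev.start, ev.end):
                good.update(chain)
    return good


if __name__ == "__main__":
    for ev in sorted(good_arguments(EVENTS), key=lambda e: e.start):
        print(ev.name, ev.start, ev.end)
```

On the events above it keeps only argRuleN210A1Seq2 7 17 and argRuleN210A1Seq3 27 37, which are the two arguments marked PERFECT. It is only a workaround. It shows that an argument fits into some complete chain inside a completed notation. It does not show which parse Marpa would finally choose if several candidates overlap.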
To sum up: how can I get only the 'good' argument matches (those that
eventually become part of a Notation match) and discard the others?
Thank you in advance for your help.
Best regards,
Toloaca Ion
--
You received this message because you are subscribed to the Google Groups
"marpa parser" group.