I would certainly address the problem using Marpa, but I'm kinda biased. :-)
From what I read, you do *not* need to capture any nesting of blocks. RNS's parser doesn't do that, so it might make a fine start. (I've never used it.)

You don't need to produce a full AST, it seems, so the following is not directly relevant to your question. But let me talk about how I might go about parsing Markdown for display purposes, where the block nesting is essential. I would probably try a two-layer approach: one level of parser to capture the line-by-line directives and individual pieces, and an upper level that captures the block structure. The upper level would certainly be in Marpa, and probably the lower level as well.

I hope this is helpful.

On Wed, Apr 22, 2020 at 8:38 PM Martin Quinson <[email protected]> wrote:

> Hello,
>
> I'm one of the authors of the po4a project (https://po4a.org), which
> helps with translating documentation.
>
> The idea is to extract the translatable content of the documents into
> PO files, which are commonly used by the translators of open-source
> programs, get the translators to do their job, and then reinject the
> translated content into the structure of the original document.
>
> We have parsers for many formats, such as POD, manpages, asciidoc,
> markdown, xml, and some others. The project has existed for almost two
> decades, and we are now used in production for the translation of many
> manpages in all major distributions, for the translation of the
> manpages documenting the git project, for the f-droid web pages, for
> the whole fedora documentation, etc.
>
> My problem is that our parsers are currently written as an ugly bunch
> of regexps that are hard to work with, and I am considering converting
> to something more robust.
>
> Our parsers don't really need to access the AST; they are more of a
> filter calling the translate() function on the parts that need it.
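For that filter style, the two-layer idea above carries over without any AST. Here is a minimal sketch of the shape I mean. It is written in Python rather than Marpa, so treat it as an illustration under assumptions and not as a recommended parser. The lower layer classifies each physical line on its own. The upper layer, which I'd express as a Marpa grammar over those line tokens, is a plain hand-written loop here. All names (`classify`, `filter_markdown`, the line kinds) are hypothetical. `translate` and `pushline` are passed in as stand-ins for the po4a calls, and only setext headings, paragraphs and flat list items are recognized.

```python
import re
from typing import Callable, Iterator, List, NamedTuple


class Line(NamedTuple):
    kind: str    # "blank", "underline", "item" or "text" (kinds made up for this sketch)
    lineno: int  # 1-based line number in the input
    prefix: str  # untranslatable lead-in, e.g. the "* " list marker
    text: str    # translatable part, or the whole line for blank/underline


def classify(source: str) -> Iterator[Line]:
    """Lower layer: look at one physical line at a time, with no nesting."""
    for lineno, line in enumerate(source.split("\n"), start=1):
        if not line.strip():
            yield Line("blank", lineno, "", line)
        elif re.fullmatch(r"\s*(=+|-+)\s*", line):
            yield Line("underline", lineno, "", line)
        elif m := re.match(r"(\s*[*+-]\s+)(.*)", line):
            yield Line("item", lineno, m.group(1), m.group(2))
        else:
            yield Line("text", lineno, "", line)


def filter_markdown(source: str,
                    translate: Callable[[str, str], str],
                    pushline: Callable[[str], None],
                    name: str = "input") -> None:
    """Upper layer: group classified lines into blocks and emit output lines."""
    lines: List[Line] = list(classify(source))
    i = 0
    while i < len(lines):
        cur = lines[i]
        if cur.kind == "text":
            # A paragraph is a run of consecutive "text" lines, joined by spaces.
            j = i
            while j < len(lines) and lines[j].kind == "text":
                j += 1
            msgid = " ".join(ln.text.strip() for ln in lines[i:j])
            pushline(translate(msgid, f"{name}:{cur.lineno}"))
            # Setext heading: an underline right after the run is copied verbatim.
            if j < len(lines) and lines[j].kind == "underline":
                pushline(lines[j].text)
                j += 1
            i = j
        elif cur.kind == "item":
            pushline(cur.prefix + translate(cur.text, f"{name}:{cur.lineno}"))
            i += 1
        else:
            # Blank lines and stray underlines pass through untouched.
            pushline(cur.text)
            i += 1
```

Call it with an identity `translate` and `pushline=out.append` on your example below, read with its trailing newline. It yields the same call sequence as your listing, with references input:1, input:4, input:6 and input:7. The one difference is that the list marker is copied verbatim from the input instead of being rewritten as " * ".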
>
> Every parser takes a document to analyse plus a translation catalog
> (called a PO file) associating a set of strings with their translation
> in a given language.
>
> This produces an output document where the content of the input doc is
> replaced by the translations found in the catalog, plus a list of the
> strings that the input doc contains. This list is used to update the
> translation catalogs when the input document changes.
>
>     Input document --\                         /---> Output document
>                       \      TransTractor::     /       (translated)
>                        +-->--   parse()  --------+
>                       /                           \
>     Input PO --------/                             \---> Output PO
>                                                          (extracted)
>
> Let's take a little Markdown example:
> | A nice title
> | ============
> |
> | The first paragraph.
> |
> | * Item 1
> | * Item 2
>
> I need the following calls to be issued during the parsing:
> | pushline( translate ("A nice title", "input:1") );
> | pushline("============");
> | pushline("");
> | pushline( translate("The first paragraph.", "input:4") );
> | pushline("");
> | pushline(" * " . translate("Item 1", "input:6") );
> | pushline(" * " . translate("Item 2", "input:7") );
> | pushline("");
>
> All the po4a magic lies in the translate() function, which adds its
> parameters to the output PO file while returning the translation found
> in the input PO file for that string (or the string itself if no
> translation was found). The second parameter of translate() is the
> location in the input file.
>
> So, after this long context, I guess that my question would simply be:
> how would you address this problem with Marpa?
>
> I found [1], which provides a Marpa parser for the Markdown format.
> First subquestion: does this parser follow the latest recommendations
> for Marpa (right input language and such), as I think?
>
> [1] https://github.com/rns/MarpaX-Languages-CommonMark-AST/
>
> In some sense I feel that this example is too complex for what I need,
> because it seems difficult to dump a Markdown file from the AST. Am I
> wrong here?
> If I'm correct so far, what would be the easiest way to dump the
> parsed file with no modification, e.g. using actions? Or maybe I'm
> misled and Marpa is not exactly the tool I'm looking for?
>
> I have the feeling that what I need is very simple, but I fail to nail
> it down, so I'd really appreciate any idea or insight that you could
> provide.
>
> Thanks in advance,
> Mt.
>
> --
> Fear is no philosophy of life. -- Kurt Von Hammerstein.
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/marpa-parser/20200423003604.GS8215%40cafuron
> .
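On the translate() side, the bookkeeping described above is small enough to sketch in isolation. This is hypothetical Python, not po4a's Perl API; `TransTractorSketch`, `translate` and `output_po` here are made-up names. It records each string with its location reference and returns the catalog's translation or the string itself. It then writes the collected strings as a gettext-style PO template with empty `msgstr` entries. Merging that template with existing translations is left to a tool such as msgmerge.

```python
from typing import Dict, List


class TransTractorSketch:
    """Hypothetical stand-in for po4a's translate() bookkeeping (not its real API)."""

    def __init__(self, catalog: Dict[str, str]) -> None:
        # Input PO, reduced to msgid -> msgstr.
        self.catalog = catalog
        # Output PO: msgid -> list of location references, in first-seen order.
        self.refs: Dict[str, List[str]] = {}

    def translate(self, msgid: str, ref: str) -> str:
        self.refs.setdefault(msgid, []).append(ref)
        # An empty msgstr means "untranslated" in gettext, so fall back to msgid.
        return self.catalog.get(msgid) or msgid

    @staticmethod
    def _quote(s: str) -> str:
        # Minimal PO string escaping: backslash first, then quotes and newlines.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'

    def output_po(self) -> str:
        # One template entry per msgid: reference comment, msgid, empty msgstr.
        entries = []
        for msgid, refs in self.refs.items():
            entries.append(
                f"#: {' '.join(refs)}\n"
                f"msgid {self._quote(msgid)}\n"
                'msgstr ""\n'
            )
        return "\n".join(entries)
```

For instance, translating "Item 1" at input:6 and again at input:9 gives a single entry whose `#:` line lists both references.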
