I would certainly address the problem using Marpa, but I'm kinda biased. :-)

From what I read, you do *not* need to capture any nesting of blocks.
RNS's parser doesn't do that so it might make a fine start.  (I've never
used it.)
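
If the flat, line-by-line view really is enough, the filter you describe
might look roughly like the sketch below.  It is in Python only to keep it
short and self-contained (po4a itself is Perl), and it uses no Marpa at
all.  The class name LineFilter and the regexes are made up for
illustration; only translate() and pushline() come from your message, and
only the constructs in your example are handled.

```python
import re


class LineFilter:
    """Flat, line-by-line filter: no block nesting is tracked."""

    def __init__(self, catalog):
        self.catalog = catalog   # input PO: source string -> translation
        self.extracted = []      # output PO entries: (string, location)
        self.output = []         # lines of the translated document

    def translate(self, text, location):
        # Record the string for the output PO and return its translation,
        # or the string itself when the catalog has no entry for it.
        self.extracted.append((text, location))
        return self.catalog.get(text, text)

    def pushline(self, line):
        self.output.append(line)

    def parse(self, lines, name="input"):
        bullet = re.compile(r"^(\s*[*+-]\s+)(.*)$")
        underline = re.compile(r"^(=+|-+)\s*$")
        for number, line in enumerate(lines, start=1):
            where = f"{name}:{number}"
            item = bullet.match(line)
            if line.strip() == "" or underline.match(line):
                # Pure structure: copied to the output verbatim.
                self.pushline(line)
            elif item:
                # Keep the bullet marker, translate only the item text.
                self.pushline(item.group(1) + self.translate(item.group(2), where))
            else:
                self.pushline(self.translate(line, where))
        return self.output


lines = [
    "A nice title",
    "============",
    "",
    "The first paragraph.",
    "",
    " * Item 1",
    " * Item 2",
]
catalog = {"A nice title": "Un joli titre", "Item 1": "Point 1"}
flt = LineFilter(catalog)
translated = flt.parse(lines)
```

Afterwards `translated` holds the output document and `flt.extracted`
holds the (string, location) pairs for the output PO.  Anything that spans
several lines, such as a wrapped paragraph, would need more than this.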

You don't need to produce a full AST, it seems, so the following is not
directly relevant to your question, but let me talk about how I might go
about parsing Markdown for display purposes, where the block nesting is
essential.  I would probably try a two-layer approach -- a lower-level
parser to capture the line-by-line directives and individual pieces, and
an upper level which captures the block structure.  The upper level would
certainly be in Marpa, and probably the lower level as well.
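
As a rough illustration of the two layers, here is a small sketch.  It is
in Python and uses no Marpa: the hand-written loop in blocks() merely
stands in for what would be a Marpa grammar over the line tokens.  The
names classify() and blocks() and the token kinds are hypothetical, and
only flat headings, paragraphs and lists are handled, with no nesting.

```python
import re


def classify(line):
    """Lower layer: turn one line into a (kind, text) token."""
    if not line.strip():
        return ("blank", line)
    if re.fullmatch(r"(=+|-+)\s*", line):
        return ("underline", line)
    m = re.match(r"\s*[*+-]\s+(.*)", line)
    if m:
        return ("item", m.group(1))
    return ("text", line)


def blocks(lines):
    """Upper layer: group the token stream into (block_kind, texts) pairs."""
    tokens = [classify(line) for line in lines]
    result = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind == "blank":
            i += 1
        elif kind == "text" and next_kind == "underline":
            # A text line followed by an underline is a heading.
            result.append(("heading", [text]))
            i += 2
        elif kind == "item":
            # Consecutive item lines form one list block.
            items = []
            while i < len(tokens) and tokens[i][0] == "item":
                items.append(tokens[i][1])
                i += 1
            result.append(("list", items))
        elif kind == "underline":
            # An underline with no text line before it.
            result.append(("rule", [text]))
            i += 1
        else:
            # Consecutive text lines form a paragraph, stopping before a
            # line that is itself underlined (that one starts a heading).
            para = [text]
            i += 1
            while (i < len(tokens) and tokens[i][0] == "text"
                   and not (i + 1 < len(tokens)
                            and tokens[i + 1][0] == "underline")):
                para.append(tokens[i][1])
                i += 1
            result.append(("paragraph", para))
    return result


example = [
    "A nice title",
    "============",
    "",
    "The first paragraph.",
    "",
    " * Item 1",
    " * Item 2",
]
structure = blocks(example)
```

The point of the split is that the lower layer only ever looks at one
line, while every decision that depends on neighbouring lines lives in
the upper layer.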

I hope this is helpful.



On Wed, Apr 22, 2020 at 8:38 PM Martin Quinson <martin.quin...@ens-rennes.fr>
wrote:

> Hello,
>
> I'm one of the authors of the po4a project (https://po4a.org), which
> helps with translating documentation.
>
> The idea is to extract the translatable content of the documents into
> PO files that are commonly used by the translators of open-source
> programs, let the translators do their job, and then reinject the
> translated content into the structure of the original document.
>
> We have parsers for many formats, such as POD, manpages, asciidoc,
> markdown, xml, and some others. The project has existed for almost two
> decades and is now used in production for the translation of many
> manpages in all major distributions, for the translation of the
> manpages documenting the git project, for the f-droid web pages, for
> the whole fedora documentation, etc.
>
> My problem is that our parsers are currently written as an ugly bunch
> of regexps that are hard to work with, and I am considering converting
> to something more robust.
>
> Our parsers don't really need to access the AST; they are more of
> a filter calling the translate() function on the parts that need it.
>
> Every parser takes a document to analyse + a translation catalog
> (called PO file) associating a set of strings with their translation in
> a given language.
>
> This produces an output document where the content of the input doc is
> replaced by the translations found in the catalog + a list of strings
> that the input doc contains. This list is used to update the
> translation catalogs when the input document changes.
>
>  Input document --\                             /---> Output document
>                    \      TransTractor::       /       (translated)
>                     +-->--   parse()  --------+
>                    /                           \
>  Input PO --------/                             \---> Output PO
>                                                       (extracted)
>
>
> Let's take a little Markdown example:
> | A nice title
> | ============
> |
> | The first paragraph.
> |
> |  * Item 1
> |  * Item 2
>
> I need the following calls to be issued during the parsing:
> |  pushline( translate ("A nice title", "input:1") );
> |  pushline("============");
> |  pushline("");
> |  pushline( translate("The first paragraph.", "input:4") );
> |  pushline("");
> |  pushline(" * " . translate("Item 1", "input:6") );
> |  pushline(" * " . translate("Item 2", "input:7") );
> |  pushline("");
>
> All the po4a magic lies in the translate() function, which adds its
> parameters to the output PO file while returning the translation found
> in the input PO file for that string (or the string itself if no
> translation was found). The second parameter of translate() is the
> location in the input file.
>
>
> So, after this long context, I guess that my question would simply be:
> how would you address this problem with Marpa?
>
> I found [1], which provides a Marpa parser for the Markdown format.
> First subquestion: does this parser follow the latest Marpa
> recommendations (right input language and such), as I think it does?
>
>   [1] https://github.com/rns/MarpaX-Languages-CommonMark-AST/
>
> In some sense I feel that this example is too complex for what I need
> because it seems difficult to dump a Markdown file from the AST. Am I
> wrong here? If I'm correct so far, what would be the easiest way to
> dump the parsed file with no modification, e.g. using actions? Or maybe
> I'm misled and Marpa is not exactly the tool I'm looking for?
>
> I have the feeling that what I need is very simple, but I fail to nail
> it down, so I'd really appreciate any idea or insight that you could
> provide.
>
> Thanks in advance,
> Mt.
>
> --
> Fear is no philosophy of life.          -- Kurt Von Hammerstein.
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to marpa-parser+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/marpa-parser/20200423003604.GS8215%40cafuron
> .
>

