On 3 January 2015 at 13:02, Peter Kelly <[email protected]> wrote:

> Inspired by Jan’s excellent idea of posting what we each plan to work on,
> I thought I’d chip in with my intentions:
>
> - Complete development of a generic parser library based on Parsing
> Expression Grammars [1,2], which will serve as a basis for parsing non-XML
> based file formats like Markdown, AsciiDoc, reStructuredText, and RTF. This
> is something I’ve been dabbling with on and off for about a year now, and
> have recently done a complete rewrite of. I also foresee potential in
> extending this into a high-level programming language for expressing
> transformations similar to XSLT or Stratego/XT [3], but that’s something
> for a little further down the track.
>

I like the idea, especially after having read up on Stratego/XT. However, at
some point we still need to discuss how we store information internally and
how filters can access it.
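
For anyone on the list who has not worked with PEGs before, here is a minimal sketch of the idea in Python. It is not Peter's library and says nothing about its design; every name in it (`literal`, `sequence`, `choice`, `many`, `apply`, `memoize`, `parse_all`) is made up for illustration. It shows the two points that matter in [1,2]: ordered choice (the first alternative that matches wins) and packrat-style memoization of results per input position.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Result = Optional[Tuple[object, int]]
Parser = Callable[[str, int], Result]

def literal(s: str) -> Parser:
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def sequence(*parsers: Parser) -> Parser:
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None          # one failure fails the whole sequence
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def choice(*parsers: Parser) -> Parser:
    def parse(text, pos):
        for p in parsers:            # ordered: try alternatives left to right
            r = p(text, pos)
            if r is not None:
                return r             # first match wins, later ones are ignored
        return None
    return parse

def many(p: Parser) -> Parser:
    def parse(text, pos):
        values = []
        while True:
            r = p(text, pos)
            if r is None or r[1] == pos:   # stop on failure or no progress
                return values, pos
            value, pos = r
            values.append(value)
    return parse

def apply(p: Parser, f) -> Parser:
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

def memoize(p: Parser) -> Parser:
    cache = {}                       # packrat-style cache keyed by (input, position)
    def parse(text, pos):
        key = (text, pos)
        if key not in cache:
            cache[key] = p(text, pos)
        return cache[key]
    return parse

# Toy grammar (an assumption for this example only):
#   Expr   <- Term ('+' Term)*
#   Term   <- Number / '(' Expr ')'
#   Number <- [0-9]+
digit = choice(*(literal(c) for c in "0123456789"))
number = apply(sequence(digit, many(digit)),
               lambda v: int(v[0] + "".join(v[1])))

def expr(text, pos):                 # forward reference for the recursive rule
    return _expr(text, pos)

term = memoize(choice(number,
                      apply(sequence(literal("("), expr, literal(")")),
                            lambda v: v[1])))
_expr = memoize(apply(sequence(term, many(sequence(literal("+"), term))),
                      lambda v: v[0] + sum(t[1] for t in v[1])))

def parse_all(text: str) -> int:
    r = expr(text, 0)
    if r is None or r[1] != len(text):
        raise ValueError("input does not match the grammar")
    return r[0]
```

Calling `parse_all("1+(2+30)")` evaluates the sum while parsing. Note that `choice(literal("a"), literal("ab"))` applied to `"ab"` consumes only `"a"`: unlike alternation in a context-free grammar, the order of alternatives is significant.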
>
> I’ll put this code in a separate, experimental branch once it’s in a
> vaguely reasonable state - Real Soon Now (TM).
>
> - Implement parsers for XML and HTML. Theoretically this could be done
> with the PEG-based parser above, but will be quicker and easier to do
> “manually”, as neither is very complicated to do. This will allow us to
> remove the external dependencies on libxml2, iconv, and htmltidy. I’ll
> likely actually do this first, given that it’s the easiest.
>

+1, I would really like to see those go away.

>
> Note that given these dependencies will shortly be going away, I recommend
> against trying to isolate them in platform, as doing so will likely be more
> effort than writing the parsers themselves due to the dependencies on data
> structures used in core (specifically the DOM classes), which aren’t
> accessible from platform.
>

Agreed, not in my current plans anyhow.

>
> - Document more of the code base. This will include coding conventions -
> how things like error handling, memory management, and string
> representation/manipulation are carried out by the library. It will also
> cover the core classes and parts of the existing Word filter.
>

Coding conventions would be really nice to have as a policy web page. I am
working with Dorte on a couple of extensions to our website, so if you can
write the raw text, Dorte can convert drawings etc. into the responsive
design.

>
> For those of you interested in formal language theory and parsing
> techniques, I recommend reading [4] which describes some of the history and
> recent developments such as packrat parsing which make for practical and
> simpler implementations of parsers for a more general range of languages
> than handled by LL/LR grammars of old. Flex and Bison users in particular
> should find this a relieving read :)
>
> [1] Bryan Ford: Parsing expression grammars: a recognition-based syntactic
> foundation. POPL 2004: 111-122.
> http://bford.info/pub/lang/peg.pdf
>
> [2] Bryan Ford: Packrat parsing: simple, powerful, lazy, linear time,
> functional pearl. ICFP 2002: 36-47.
> http://bford.info/pub/lang/packrat-icfp02.pdf
>
> [3] http://strategoxt.org
>
> [4] Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth: Pure and
> declarative syntax definition: paradise lost and regained. OOPSLA 2010:
> 918-932.
> http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2010-019.pdf
>

rgds
jan i.

> --
> Dr Peter M. Kelly
> [email protected]
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
