Another good reason to use the MediaWiki parser is to support all the custom tag extensions and parser function extensions written for the MediaWiki engine. For example, the last time I checked, there was a Trac issue for supporting the "Poem" tag and another for allowing developers to write custom extensions to the mwlib parser.
So, as a corollary to the comment that "the only complete specification of wikitext syntax is Parser.php [and] only makes this worse," MediaWiki also allows the syntax to be extended through user-written extensions, which makes the difficulty worse still. For myself, I have a number of proprietary MediaWiki parser function extensions and tag extensions that mwlib's parser cannot support. My personal solution was to create a MediaWiki API extension that intercepts the "query" action and calls the MediaWiki parser to expand my parser function and tag extensions before returning the wikitext to mwlib. (As an aside, I'm willing to share that with anyone who may be interested.)

Even with my solution, some items do not show up in the generated PDF, tables of contents being one example. I don't know whether using a DOM tree from MediaWiki would make that any easier to resolve than parsing the wikitext, but it's conceivable that it might. However, I do think using a DOM tree from the MediaWiki parser would help with supporting the entire ecosystem of extensions that are out there.

Does the parse action support transcluded pages? I had problems when I tried to modify mwlib to use the parse action instead of query. My memory is a little hazy, but I think the problem had to do with parse having a different return format than the query action.

On Sep 11, 4:13 pm, [EMAIL PROTECTED] wrote:
> On Sep 10, 5:42 pm, "Joel Nothman" <[EMAIL PROTECTED]> wrote:
> > While the parser is primarily used to convert to PDF, many of us use it
> > for other purposes entirely. Getting back a structured parse tree, rather
> > than HTML formatting, can be useful.
>
> I've filed a bug [1] for it. I'll probably add the feature next week,
> if someone else doesn't beat me to it.
>
> > If nothing else, is the action=parse feature faster than the mwlib parser?
> >
> > Other Wikipedia processors that I played around with that utilised the
> > default MediaWiki parser did not do so at an impressive pace.
>
> As fast or slow as our parser is, I guess. But speed wasn't really why
> I'm suggesting this. Implementing your own parser shouldn't be
> necessary (as it duplicates code) and is very error-prone: there's
> bound to be some corner case your parser handles differently. The fact
> that the only complete specification of wikitext syntax is Parser.php
> only makes this worse.
>
> And, of course, wikitext to PDF parsers (or wikitext to anything else,
> for that matter) become considerably simpler (and therefore easier to
> write and maintain) if there's a pre-built parser tree they can use.
>
> Roan Kattouw (Catrope)
>
> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=15567
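
For anyone comparing the two actions, here is a minimal sketch of the return-format difference mentioned above. It is not mwlib code. The endpoint URL, the function names, and the abbreviated sample responses are all hypothetical, and the response shapes assume the API's legacy JSON layout (content stored under a "*" key), so check them against your own wiki before relying on them.

```python
from urllib.parse import urlencode

API = "https://example.org/w/api.php"  # hypothetical wiki endpoint


def query_url(title):
    """Build a URL asking action=query for a page's raw wikitext."""
    params = {"action": "query", "prop": "revisions", "rvprop": "content",
              "titles": title, "format": "json"}
    return API + "?" + urlencode(params)


def parse_url(title):
    """Build a URL asking action=parse for a page's rendered HTML."""
    params = {"action": "parse", "page": title, "prop": "text",
              "format": "json"}
    return API + "?" + urlencode(params)


def wikitext_from_query(response):
    """query nests content under query -> pages -> <page id> -> revisions."""
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def html_from_parse(response):
    """parse returns a single object: parse -> text."""
    return response["parse"]["text"]["*"]


# Hand-written, abbreviated sample responses (assumed shapes, not real output).
SAMPLE_QUERY = {"query": {"pages": {"1": {
    "pageid": 1, "title": "Main Page",
    "revisions": [{"*": "'''Hello'''"}]}}}}
SAMPLE_PARSE = {"parse": {"title": "Main Page",
                          "text": {"*": "<p><b>Hello</b></p>"}}}

print(wikitext_from_query(SAMPLE_QUERY))
print(html_from_parse(SAMPLE_PARSE))
```

The point of the sketch is only that a client written against the query layout (a map of pages, each with a list of revisions holding wikitext) cannot read a parse response unchanged, since parse returns one object holding rendered HTML.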
