Another good reason to use the MediaWiki parser is to support all the
custom tag extensions and parser function extensions written for the
MediaWiki engine.  For example, the last time I checked, there were
Trac issues submitted for supporting the "Poem" tag and for allowing
developers to write custom extensions to the mwlib parser.

As a corollary to the comment that "the only complete specification
of wikitext syntax is Parser.php only makes this worse": MediaWiki
also allows the syntax to be extended through user-written extensions,
which makes the difficulty worse still.

For myself, I have a number of proprietary MediaWiki parser function
extensions and tag extensions that mwlib's parser cannot support.  My
personal solution was to create a MediaWiki API extension that
intercepts the "query" action and calls the MediaWiki parser to expand
my parser functions and tag extensions before returning the wikitext
to mwlib.  (As an aside, I'm willing to share that with whoever may be
interested.)

Even with my solution there are items that do not show up in the
generated PDF, tables of contents in particular.  I don't know whether
using a DOM tree from MediaWiki would make that any easier to resolve
than parsing the wikitext, but it's conceivable that it might.

However, I do think using a DOM tree from the MediaWiki parser would
help with supporting the entire ecosystem of extensions that are out
there.

Does the parse action support transcluded pages?  As I recall, I had
problems when I tried to modify mwlib to use the parse action instead
of query.  My memory is a little hazy, but I think the problem was
that parse has a different return format than the query action.
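
To make the format difference concrete, here is a minimal sketch,
assuming the API's default (legacy) JSON layout: action=query with
prop=revisions and rvprop=content returns wikitext nested under
query.pages.<pageid>.revisions, while action=parse returns rendered
HTML under parse.text.  The sample payloads and the extract_content
helper are hand-written illustrations, not mwlib code, and other API
format versions lay the data out differently.

```python
# Hand-written sample responses in the legacy JSON shape (assumption:
# default format version; these are not captured from a real wiki).
QUERY_SAMPLE = {
    "query": {
        "pages": {
            "42": {
                "pageid": 42,
                "title": "Example",
                "revisions": [{"*": "'''Hello'''"}],
            }
        }
    }
}

PARSE_SAMPLE = {
    "parse": {
        "title": "Example",
        "text": {"*": "<p><b>Hello</b></p>"},
    }
}


def extract_content(response):
    """Return (kind, text) from a decoded API response.

    Hypothetical helper: shows that a client written for action=query
    cannot read an action=parse response without a separate code path.
    """
    if "parse" in response:
        # action=parse: rendered HTML lives under parse.text["*"]
        return "html", response["parse"]["text"]["*"]
    if "query" in response:
        # action=query: pages are keyed by page id; take the first one
        page = next(iter(response["query"]["pages"].values()))
        # wikitext of the requested revision lives under revisions[0]["*"]
        return "wikitext", page["revisions"][0]["*"]
    raise ValueError("unrecognised API response")
```

Note that the two actions also differ in what they return, not just
where: query hands back raw wikitext, parse hands back HTML, so code
that expects wikitext needs more than a change of lookup path.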



On Sep 11, 4:13 pm, [EMAIL PROTECTED] wrote:
> On Sep 10, 5:42 pm, "Joel Nothman" <[EMAIL PROTECTED]> wrote:
> > While the parser is primarily used to convert to PDF, many of us use it
> > for other purposes entirely. Getting back a structured parse tree, rather
> > than HTML formatting, can be useful.
>
> I've filed a bug [1] for it. I'll probably add the feature next week,
> if someone else doesn't beat me to it.
>
> > If nothing else, is the action=parse feature faster than the mwlib parser?
>
> > Other Wikipedia processors that I played around with that utilised the  
> > default MediaWiki parser did not do so at an impressive pace.
>
> As fast or slow as our parser is, I guess. But speed wasn't really why
> I'm suggesting this. Implementing your own parser shouldn't be
> necessary (as it duplicates code) and is very error-prone: there's
> bound to be some corner case your parser handles differently. The fact
> that the only complete specification of wikitext syntax is Parser.php
> only makes this worse.
>
> And, of course, wikitext to PDF parsers (or wikitext to anything else,
> for that matter) become considerably simpler (and therefore easier to
> write and maintain) if there's a pre-built parser tree they can use.
>
> Roan Kattouw (Catrope)
>
> [1]https://bugzilla.wikimedia.org/show_bug.cgi?id=15567