What about having mwlib use the MediaWiki API's parse action to retrieve a parse tree from a page?
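A minimal sketch of such a request, for illustration only. The endpoint URL and the helper name `build_parse_url` are assumptions, not part of mwlib. The sketch asks for rendered HTML via `prop=text`; whether a real parse tree can be requested instead depends on the MediaWiki version in use.

```python
from urllib.parse import urlencode


def build_parse_url(api_base, page_title):
    """Build an action=parse request URL (hypothetical helper, not mwlib API)."""
    params = {
        "action": "parse",   # ask MediaWiki to run its own parser
        "page": page_title,  # title of the page to parse
        "prop": "text",      # limit the reply to the rendered HTML
        "format": "json",    # machine-readable response
    }
    return api_base + "?" + urlencode(params)


# The host below is a placeholder, not a real wiki.
url = build_parse_url("https://wiki.example.org/w/api.php", "Main Page")
print(url)
```

Fetching that URL and decoding the JSON reply is left out here, since it needs network access and a real wiki.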
Why does this project insist on having its own parser? You will struggle to keep up with the features of MediaWiki's native parser, and it will cause more headaches when it comes to supporting extensions that already work with the native parser. It seems to me that, from a features standpoint, you could simply parse the printable-page output or the plain HTML output that comes from the MediaWiki API.

On Nov 18, 2009 4:25am, Ralf Schmitt <[email protected]> wrote:
> Gero [email protected]> writes:
>
> > I think that the expansion of the DPL tag (parser function and parser
> > tag, both are possible!)
> > should be handled by the PHP code of the DPL extension.
> >
> > DPL has quite a lot of features and it would be very hard work to
> > re-implement all or even part of its functionality
> > in another language.
> >
> > Also, within DPL some MW parser functions are called to process
> > transcluded content.
> >
> > So I think Jeremy is absolutely right: The collection extension should
> > process a page after
> > DPL has done its job. If mwlib can live with that sequence of
> > processing, it should
> > be possible to enable DPL with reasonable effort. Otherwise I see no
> > chance.
> >
> > DPL's output is in principle standard wiki syntax; in some cases
> > (~10%) HTML tags will be found in the
> > output. These text portions are wrapped within HTML .. /HTML tags,
> > however, so they are also wiki-compatible.
>
> Only doing the DPL processing and not running the template expansion
> is not possible.
>
> We currently only handle the unprocessed wiki source and can't work with
> MediaWiki's HTML output. We do have plans to parse MediaWiki's
> output. However, nothing has been coded so far.
>
> If I had to implement this now, I would use lxml's HTML parser and
> convert its parse tree to our internal format as generated by
> mwlib.refine.
>
> - Ralf
>
> --
> You received this message because you are subscribed to the Google
> Groups "mwlib" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/mwlib?hl=.
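
The HTML route described in the quoted reply (parse MediaWiki's HTML output, then convert the result into an internal tree) might look roughly like the sketch below. It is only an illustration under stated assumptions: it swaps lxml for Python's standard-library `html.parser` so that it runs without extra packages, and the nested-dict node layout is hypothetical and is not the format produced by mwlib.refine.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Collect HTML into nested dicts: {"tag", "attrs", "children"}."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray close tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep text nodes, skipping whitespace-only runs.
        if data.strip():
            self.stack[-1]["children"].append(data)


def html_to_tree(html):
    """Parse an HTML fragment into the hypothetical nested-dict tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


tree = html_to_tree("<p>Hello <b>wiki</b><br>world</p>")
```

A real converter would then map these generic nodes onto whatever node classes the target format uses, which is the part that depends on mwlib internals and is not shown here.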
