So why not use the "real" parser?

* Get rendered HTML page
* Extract <div id="bodyContent">
* Take the first <p> element in there
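
A minimal sketch of those steps, assuming Python and nothing beyond the standard library. The names FirstParagraphParser, first_paragraph and fetch_html are made up for illustration. The "bodyContent" id is the one mentioned above, but whether your wiki's skin emits it is an assumption you should check against the actual HTML.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import Request, urlopen


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> found inside <div id="bodyContent">."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.div_depth = 0         # > 0 while we are inside div#bodyContent
        self.in_paragraph = False  # True while inside the first <p>
        self.done = False          # True once the first <p> has ended
        self.chunks = []           # text fragments of that paragraph

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the target
            elif dict(attrs).get("id") == "bodyContent":
                self.div_depth = 1           # entered the target div
        elif tag == "p" and self.div_depth > 0:
            if self.in_paragraph:
                self.done = True             # unclosed <p> ended by a new <p>
            else:
                self.in_paragraph = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            self.done = True
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.in_paragraph and not self.done:
            self.chunks.append(data)


def first_paragraph(html: str) -> Optional[str]:
    """Return the plain text of the first paragraph, or None if there is none."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    if not parser.done:
        return None
    # Collapse runs of whitespace left over from inline markup.
    return " ".join("".join(parser.chunks).split())


def fetch_html(url: str) -> str:
    """Download a rendered page. The User-Agent string is a placeholder."""
    request = Request(url, headers={"User-Agent": "first-paragraph-sketch/0.1"})
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be something like first_paragraph(fetch_html(page_url)), where page_url is the address of the rendered article on your own wiki. Because the page has already gone through MediaWiki, templates such as {{lang}} and {{formatnum:1401}} are expanded before this code ever sees them.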
Profit!

Magnus

On Sat, Aug 7, 2010 at 6:19 PM, Brian J Mingus <brian.min...@colorado.edu> wrote:
> On Sat, Aug 7, 2010 at 10:54 AM, lmhelp <lm...@wanadoo.fr> wrote:
>
>> Hi,
>>
>> Thank you for your answer.
>>
>> > mwlib is the best parser available for folks who want to do a quick job
>> > such as yours.
>>
>> Maybe it is, I don't know...
>> I know (since recently) it is not an easy task constructing a parser for
>> "Wikitext"...
>> but, fairly, it is not really satisfactory to have {{lang}},
>> {{formatnum:1401}} left in the generated "HTML" code, is it (I mean...
>> given the fact that it never happens with "Wikipedia").
>>
>
> mwlib was written in conjunction with the WMF, and IIRC had at least some
> input from Brion Vibber. It's high quality and works well. There is a 2-3
> hour learning curve for navigating the python modules and methods using dir
> and help.
>
>> > You can use the dumpHTML maintenance script to convert wikitext to html
>>
>> Would "dumpHTML" work with only one "Wikitext" sentence
>> having to be translated to "HTML"?
>>
>> Actually, on: http://www.mediawiki.org/wiki/Extension:DumpHTML
>> one can read:
>> "dumpHTML is an extension for generating a simple HTML
>> dump, including images and media files, of a MediaWiki
>> installation".
>> It looks a bit oversized in my case... doesn't it?
>>
>
> IIRC dumpHTML is a maintenance script that is included with mediawiki. I
> don't believe that it requires you to have images. I have used both of the
> approaches I described to you in the past, and found them both to be
> straightforward.
>
>> All the best,
>> --
>> Lmhelp
>> --
>> View this message in context:
>> http://old.nabble.com/Wikitext-grammar-tp29350471p29375714.html
>> Sent from the WikiMedia General mailing list archive at Nabble.com.
>> _______________________________________________
>> MediaWiki-l mailing list
>> MediaWiki-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l