On Fri, Aug 6, 2010 at 10:18 AM, Léa Massiot <[email protected]> wrote:

> Are you sure this will be able to extract the
> introductory paragraph (only) which is not in any
> section... (because it is not trivial).
>
> There is only one example I could find at
> http://code.pediapress.com/wiki/wiki/mwlib
> ... which is not so easy to understand by the way...
>
> Cheers,
> --
> Lmhelp
>

mwlib is a fairly full-featured parser for wikitext. It is not documented,
but with Python's built-in dir() and help() functions you can easily
navigate its classes and methods. It creates a parse tree from which you can
reconstruct the plain text. Once you have the plain text, extracting
paragraphs is straightforward.
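
To make the idea concrete, here is a rough sketch of the approach. It does not
use mwlib itself: the Node class, its "kind" labels, and the helper names are
all hypothetical stand-ins I made up for illustration, so check the real node
types with dir() and help() before adapting it. It assumes the parse tree puts
introductory paragraphs as top-level children ahead of the first section node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (NOT mwlib's real class)."""
    kind: str                      # e.g. "article", "paragraph", "section", "text"
    text: str = ""                 # only meaningful for "text" leaves
    children: List["Node"] = field(default_factory=list)


def public_names(obj):
    """List non-underscore attributes, as you would when exploring with dir()."""
    return [name for name in dir(obj) if not name.startswith("_")]


def plain_text(node):
    """Reconstruct plain text by concatenating every text leaf under node."""
    if node.kind == "text":
        return node.text
    return "".join(plain_text(child) for child in node.children)


def intro_paragraphs(article):
    """Return the text of top-level paragraphs that precede the first section."""
    paragraphs = []
    for child in article.children:
        if child.kind == "section":
            break                  # everything from here on belongs to a section
        if child.kind == "paragraph":
            text = plain_text(child).strip()
            if text:
                paragraphs.append(text)
    return paragraphs


# A tiny hand-built tree standing in for a parsed article (assumed shape).
article = Node("article", children=[
    Node("paragraph", children=[
        Node("text", "Python is a "),
        Node("link", children=[Node("text", "programming language")]),
        Node("text", "."),
    ]),
    Node("section", children=[
        Node("paragraph", children=[Node("text", "History goes here.")]),
    ]),
])

print(public_names(article))
print(intro_paragraphs(article))
```

Stopping at the first section node is what keeps the result limited to the
introductory paragraph; markup such as links is flattened to its text along
the way.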
_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
