I am writing a PyGTK application. I would like to be able to download text only (with formatting) from Wikipedia and display it in my application. I think that I am close to a solution, but I have reached an impasse because I am unfamiliar with most of the MediaWiki API.
My plan has been to use GtkMozembed in my application to render the page, so I need to retrieve HTML. What is close to working is to use the index.php API with action=render and title=<search string for the Wikipedia page>. The data that I retrieve does display in my browser, but it has the following undesired characteristics:

1. All images appear (I want none).
2. There are sections at the end that I don't want (Further reading, External links, Notes, See also, References).
3. Some characters are not rendered correctly (e.g., IPA: [ˈvÉ”lfgaÅ‹ amaˈdeus ˈmoËtsart]).

To fix 1 and 2, I could perhaps use an HTML parser and delete the offending items, but I wonder whether there is a proper solution using the MediaWiki API (such as a prop parameter with which I could at least specify that I don't want any images). I assume that 3 is a Unicode problem, but I don't know what to do to fix it.

-- 
Jeffrey Barish
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
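
P.S. For reference, the request described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the application's actual code: it uses Python 3's urllib (a PyGTK program on Python 2 would use urllib/urllib2 instead), it assumes the English Wikipedia endpoint https://en.wikipedia.org/w/index.php, and the names build_render_url, decode_body and fetch_rendered_html are hypothetical. It also shows one possible cause of item 3: the garbled IPA text looks like UTF-8 bytes being interpreted as a single-byte encoding such as Windows-1252, so decoding the response explicitly as UTF-8 may help.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia index.php entry point.
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def build_render_url(title):
    """Build the index.php URL with action=render for a page title."""
    return INDEX_URL + "?" + urlencode({"title": title, "action": "render"})


def decode_body(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 if no charset is given."""
    return raw.decode(charset or "utf-8")


def fetch_rendered_html(title):
    """Fetch the rendered HTML for a page and return it as a unicode string."""
    # The User-Agent value is a placeholder for illustration only.
    request = Request(build_render_url(title),
                      headers={"User-Agent": "render-sketch/0.1"})
    with urlopen(request) as response:
        # Use the charset from the Content-Type header when the server sends one.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# The same bytes decoded with the wrong codec reproduce the kind of
# garbling seen in item 3.
ipa_bytes = "ɔ".encode("utf-8")
garbled = ipa_bytes.decode("cp1252")   # wrong codec
correct = decode_body(ipa_bytes)       # UTF-8
```

Calling fetch_rendered_html("Wolfgang Amadeus Mozart") would return the HTML as a string. If the widget that displays it is handed raw bytes without a declared charset, it may still guess the wrong encoding, so that is worth checking separately.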
