Hi,

At http://download.wikimedia.org/enwiki/latest/ you can find the file
"enwiki-latest-abstract.xml"
(http://download.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml),
which contains the abstracts of all English Wikipedia articles.
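Because the dump is very large, it is best read as a stream rather than loaded into memory. Below is a minimal Java sketch that does this with the JDK's StAX parser (javax.xml.stream). It assumes each article is a <doc> element with <title> and <abstract> children, and that titles carry a "Wikipedia: " prefix. Check both assumptions against the real file. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AbstractDumpLookup {

    // Assumed prefix on <title> values in the abstract dump; verify against the file.
    static final String TITLE_PREFIX = "Wikipedia: ";

    /** Streams the dump and returns the abstract for pageTitle, or null if not found. */
    static String findAbstract(InputStream dump, String pageTitle) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(dump);
        String wantedTitle = TITLE_PREFIX + pageTitle;
        String currentTitle = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    // getElementText() returns the element's text and advances to its END_ELEMENT
                    currentTitle = reader.getElementText();
                } else if ("abstract".equals(name) && wantedTitle.equals(currentTitle)) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close(); // closes the parser, not the underlying stream
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny inline sample in the assumed dump layout, for illustration only
        String sample = "<feed><doc><title>Wikipedia: Eiffel Tower</title>"
                + "<abstract>An iron tower in Paris.</abstract></doc></feed>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(findAbstract(in, "Eiffel Tower"));
    }
}
```

For the real dump, pass a FileInputStream (wrapped in a BufferedInputStream) instead of the inline sample.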


You can find more information about this at
http://en.wikipedia.org/wiki/Wikipedia_database

Regards,

Daniel Hasan Dalip

On Wed, Jan 27, 2010 at 3:34 PM, aditya srinivas wrote:

> Hello,
>
> I am writing a Java program to extract the abstract of a Wikipedia page
> given the page's title. I have done some research and found out that the
> abstract is in section 0 (rvsection=0).
>
> So, for example, if I want the abstract of the "Eiffel Tower" page, I
> query the API as follows:
>
>
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel%20Tower&rvprop=content&rvsection=0
>
> I then parse the returned XML and take the wikitext inside the <rev
> xml:space="preserve"> tag, which represents the abstract of the Wikipedia
> page. But this wikitext also contains the infobox data, which I do not need.
> Is there any way to remove the infobox data and get only the wikitext of
> the page's abstract? Or is there an alternative method to get the abstract
> of the page directly?
>
> Looking forward to your help.
>
> Thanks in advance,
> Aditya Uppu
>
> _______________________________________________
> Mediawiki-api mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
>
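For the API route described in the question, here is a minimal Java sketch. It builds the same query URL, adding format=xml (an assumption about what the asker wants) so the reply is plain XML rather than a pretty-printed HTML page. It then strips every {{...}} template from the section-0 wikitext by counting nested braces. Fetching the URL and pulling out the <rev> text are left out. This is a heuristic, not a wikitext parser. It also removes inline templates such as unit conversions, and it leaves images, tables, and other markup in place. The class and method names are hypothetical. URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LeadSectionCleaner {

    /** Builds the section-0 revisions query from the question for the given page title. */
    static String buildQueryUrl(String title) {
        return "http://en.wikipedia.org/w/api.php?action=query&prop=revisions"
                + "&rvprop=content&rvsection=0&format=xml"
                + "&titles=" + URLEncoder.encode(title, StandardCharsets.UTF_8);
    }

    /**
     * Drops everything inside {{...}}, including nested templates, by tracking brace depth.
     * Heuristic: inline templates in the prose are removed too.
     */
    static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        int i = 0;
        while (i < wikitext.length()) {
            if (wikitext.startsWith("{{", i)) {
                depth++;            // entering a (possibly nested) template
                i += 2;
            } else if (depth > 0 && wikitext.startsWith("}}", i)) {
                depth--;            // leaving the innermost open template
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(wikitext.charAt(i)); // keep text outside templates only
                }
                i++;
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("Eiffel Tower"));
        String sample = "{{Infobox tower\n| name = {{nowrap|Eiffel Tower}}\n}}\n"
                + "The '''Eiffel Tower''' is an iron tower.";
        System.out.println(stripTemplates(sample));
    }
}
```

Counting braces, rather than using a single regular expression, is what lets nested templates inside the infobox (such as the {{nowrap|...}} above) be removed together with it.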