Hi,

At http://download.wikimedia.org/enwiki/latest/ you can find the file "enwiki-latest-abstract.xml" <http://download.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml>, which contains the abstracts of all articles in the English Wikipedia.
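Since the dump is one large XML file, a streaming parser is probably the easiest way to look up a single title without loading everything into memory. Below is a minimal sketch using the JDK's StAX API. It assumes the dump is a sequence of <doc> elements, each with a text-only <title> (prefixed with "Wikipedia: ") followed by a text-only <abstract>; please check that against the file you actually download. The class and method names are made up for illustration.

```java
import java.io.InputStream;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any Wikimedia tooling.
final class AbstractDumpLookup {
    // Assumption: titles in the dump look like "Wikipedia: Eiffel Tower".
    private static final String TITLE_PREFIX = "Wikipedia: ";

    /** Streams through the dump and returns the abstract of the first matching title. */
    static Optional<String> findAbstract(InputStream dump, String pageTitle) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The dump needs no DTD processing, so switch it off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(dump);
            try {
                boolean matched = false;
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (name.equals("title")) {
                        // Assumption: <title> holds text only.
                        String title = reader.getElementText().trim();
                        if (title.startsWith(TITLE_PREFIX)) {
                            title = title.substring(TITLE_PREFIX.length());
                        }
                        matched = title.equals(pageTitle);
                    } else if (matched && name.equals("abstract")) {
                        // Assumption: <abstract> follows its <title> and holds text only.
                        return Optional.of(reader.getElementText().trim());
                    }
                }
                return Optional.empty();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse abstract dump", e);
        }
    }
}
```

A caller would pass an InputStream over the downloaded file (for example one obtained from java.nio.file.Files.newInputStream) together with the page title, such as "Eiffel Tower". For many lookups it would likely be better to build an index once rather than rescan the file each time.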
More information about this can be found at http://en.wikipedia.org/wiki/Wikipedia_database

[]'s
Daniel Hasan Dalip

On Wed, Jan 27, 2010 at 3:34 PM, aditya srinivas <[email protected]> wrote:

> Hello,
>
> I am writing a Java program to extract the abstract of a Wikipedia page
> given the title of the page. I have done some research and found
> out that the abstract will be in rvsection=0.
>
> So, for example, if I want the abstract of the "Eiffel Tower" wiki page, I
> query the API in the following way:
>
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel%20Tower&rvprop=content&rvsection=0
>
> I then parse the XML data that comes back and take the wikitext in the tag
> <rev xml:space="preserve">, which represents the abstract of the Wikipedia
> page. But this wikitext also contains the infobox data, which I do not need.
> I would like to know if there is any way to remove the infobox
> data and get only the wikitext related to the page's abstract, or if there is
> any alternative method by which I can get the abstract of the page directly.
>
> Looking forward to your help.
>
> Thanks in advance,
> Aditya Uppu
>
> _______________________________________________
> Mediawiki-api mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
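
On the infobox part of the quoted question: infoboxes are written as templates in double curly braces ({{Infobox ...}}), so one rough option is to drop every {{...}} block from the rvsection=0 wikitext by tracking brace depth. The sketch below is only a heuristic and not how MediaWiki itself parses wikitext. It also removes inline templates such as {{convert|...}}, so some sentences may lose words, and it does not handle triple-brace parameters. The class and method names are hypothetical.

```java
// Hypothetical helper: strips {{...}} templates (including nested ones) from wikitext.
final class LeadSectionCleaner {

    static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of currently open "{{" blocks
        int i = 0;
        while (i < wikitext.length()) {
            if (wikitext.startsWith("{{", i)) {
                depth++;          // entering a template (possibly nested)
                i += 2;
            } else if (depth > 0 && wikitext.startsWith("}}", i)) {
                depth--;          // leaving the innermost open template
                i += 2;
            } else {
                if (depth == 0) {
                    // Only text outside every template is kept.
                    out.append(wikitext.charAt(i));
                }
                i++;
            }
        }
        return out.toString().trim();
    }
}
```

The result is still wikitext, so markup such as '''bold''' and [[links]] remains and would need its own cleanup if plain text is wanted.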
