2010-08-07 20:24, lmhelp wrote:
>
>> So why not use the "real" parser?
>
> Exactly. Where can it be found, please?
>
> Thanks and all the best,
> --
> Lmhelp

Fetch the HTML from wikipedia.org with something like wget (playing nicely and using delays!) and then extract the first <p> element with something that parses the HTML into a tree. I've done that using Perl with HTML::Tree.

If you're just after the first <p> element, a regular expression like /<p\b.+?<\/p>/ might do the extraction just as well, and it is cheaper and faster (add the /s modifier if the paragraph can span several lines). Really cheap, I know!

/BP
_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
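
For anyone who wants to try both routes from the reply side by side, here is a rough sketch in Python rather than the Perl/HTML::Tree setup mentioned above. It is an illustration only: the names FirstParagraph, first_paragraph_text, first_paragraph_regex and fetch are made up for this example, urllib stands in for wget, the User-Agent string and the one-second delay are arbitrary assumptions, and the standard-library html.parser is an event-based parser, not a tree builder like HTML::Tree.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text content of the first <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.inside = False   # True while we are inside the first <p>
        self.done = False     # True once the first <p> has ended
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "p":
            return
        if self.inside:
            # A new <p> implicitly closes the one we are collecting.
            self.inside, self.done = False, True
        else:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside, self.done = False, True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_paragraph_text(html):
    """Parser route: return the text of the first <p>, or '' if there is none."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Same pattern as in the post; DOTALL plays the role of Perl's /s modifier.
FIRST_P = re.compile(r"<p\b.+?</p>", re.DOTALL | re.IGNORECASE)


def first_paragraph_regex(html):
    """Regex route: return the raw markup of the first <p>...</p>, or None."""
    match = FIRST_P.search(html)
    return match.group(0) if match else None


def fetch(url, delay=1.0):
    """Download a page after sleeping `delay` seconds, to play nicely."""
    time.sleep(delay)
    request = urllib.request.Request(
        url, headers={"User-Agent": "first-p-sketch/0.1"}  # assumed value
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like first_paragraph_text(fetch(url)) would then give the plain text, while first_paragraph_regex returns the markup including the tags. Note that on a real article page the first <p> in the source is not necessarily the first paragraph of the article body, so either route may need to be pointed at the content area first.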
