Hi all, does anybody have experience with extracting high-quality abstracts from Wikipedia?
We're currently working on improving the abstract extractor because we need industry-quality abstracts, but we're kind of stuck. There are so many exceptions in the way people use wiki markup in the first paragraph; the pronunciations in other languages are especially painful. And the MediaWiki parsing code seems to be unusable.

So if you have any experience with this, if you know a good open source parser for wikitext, or if you're willing to contribute in any other way, please let me know.

Thanks,
Georgi

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
