Hi all,

Does anybody have experience with extracting high-quality abstracts from
Wikipedia?

We're currently working on improving the abstract extractor because we
need industry-quality abstracts, but we're kind of stuck. There are so
many exceptions in the way people use wiki markup in the first
paragraph; the pronunciations in other languages are especially painful.
And the MediaWiki parsing code seems to be unusable.

So if you have any experience with that, if you know a good open-source
parser for wikitext, or if you're willing to contribute in any other
way, please let me know.

Thanks,
Georgi

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion