Forwarding to the Discovery list, since this project seems like it might be of interest even outside the Wikidata context. Blame me if you've already seen this elsewhere. :)
Kevin Smith
Agile Coach, Wikimedia Foundation

---------- Forwarded message ----------
From: Marco Fossati <[email protected]>
Date: Wed, Jun 15, 2016 at 9:06 AM
Subject: [Wikimedia-l] [ANNOUNCEMENT] StrepHit 1.0 Beta Release
To: "Discussion list for the Wikidata project." <[email protected]>
Cc: [email protected], [email protected]

[Feel free to blame me if you read this more than once]

To whom it may interest,

Full of delight, I would like to announce the first beta release of *StrepHit*:
https://github.com/Wikidata/StrepHit

TL;DR: StrepHit is an intelligent reading agent that understands text and translates it into *referenced* Wikidata statements.
It is an IEG project funded by the Wikimedia Foundation.

Key features:
- Web spiders to harvest a collection of documents (corpus) from reliable sources
- automatic corpus analysis to understand the most meaningful verbs
- sentences and semi-structured data extraction
- train a machine learning classifier via crowdsourcing
- *supervised and rule-based fact extraction from text*
- Natural Language Processing utilities
- parallel processing

You can find all the details here:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint

If you like it, star it on GitHub!

Best,

Marco
