Dear DBpedia community,

I am pleased to announce the results of my GSoC work: the Table Extractor.

Extended report and progress page: [1]
Repository with code, sample extracted data sets and logs, and a readme file: [2]

Short report:

The aim of the project was to extract useful RDF data sets from the tables spread all over wiki pages. Many tables are used to store electoral, sports and competition results [3]. I started the extraction by focusing on electoral results, first on USA presidential elections and then on elections in general (it chapter).

I had a hard time finding the right solution because of an unfortunate design I initially chose (involving JSONpedia). Please refer to the progress page [1] for more information about that. The final solution works on the HTML representation of a page instead of the JSON one. It consists of a Python package, which copes well with different table structures.

It works on every wiki chapter, but data extraction strictly depends on rules created on a per-topic, per-language basis. In fact, tables differ completely in data and structure depending on the wiki chapter. Compare [4] (it chapter), [5] (en chapter) and [6] (de chapter). Good results have been achieved on the electoral topic (it chapter).

I hope to get feedback from the whole community.

In a nutshell:

USA election results (it wiki pages):
- 1.8% of tables lost due to structure problems.
- 87.3% of cells correctly extracted and correctly mapped.
- 1.9% of cells lost due to a lack of mapping rules.

General election results (it wiki pages):
- 8% of tables lost due to structure problems.
- 43% of cells extracted and correctly mapped.
- 42.6% of cells lost due to a lack of mapping rules.
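
To make the idea of per-topic, per-language mapping rules more concrete, here is a minimal sketch in Python. It is an illustration only and not the Table Extractor's actual code: the rule dictionary, the property names, the class and function names, and the sample table are all hypothetical. It reads an HTML table with the standard library parser, maps each header to a property through the rules, and counts the cells that are lost because no rule exists for their column.

```python
from html.parser import HTMLParser

# Hypothetical rules for one topic/language pair (electoral results, it chapter).
# Keys are table headers as they appear on the page; values are made-up property names.
MAPPING_RULES = {
    "Candidato": "candidate",
    "Partito": "party",
    "Voti": "votes",
}


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html, rules):
    """Return (records, lost_cells) for a simple table whose first row is the header."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return [], 0
    headers, *data_rows = parser.rows
    records, lost_cells = [], 0
    for row in data_rows:
        record = {}
        for header, value in zip(headers, row):
            prop = rules.get(header)
            if prop is None:
                lost_cells += 1  # no mapping rule for this column
            else:
                record[prop] = value
        records.append(record)
    return records, lost_cells


# Made-up sample data, not taken from any wiki page.
SAMPLE = """
<table>
<tr><th>Candidato</th><th>Partito</th><th>Voti</th><th>Note</th></tr>
<tr><td>Candidate A</td><td>Party X</td><td>1000</td><td>eletto</td></tr>
<tr><td>Candidate B</td><td>Party Y</td><td>900</td><td></td></tr>
</table>
"""

records, lost_cells = extract(SAMPLE, MAPPING_RULES)
print(records)
print("cells lost for lack of mapping rules:", lost_cells)
```

In this sketch the "Note" column has no rule, so its cells are counted as lost. Real wiki tables also use merged cells, nested headers and other irregular structures, which this simplified parser does not handle.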

Student: Simone Papalini
Mentors: Marco Fossati (DBpedia), Claudia Diamantini, Domenico Potena, Emanuele Storti

[1] https://github.com/dbpedia/extraction-framework/wiki/GSoC_2016_Progress_Simone
[2] https://github.com/dbpedia/table-extractor
[3] https://it.wikipedia.org/wiki/Elezioni_amministrative_italiane_del_2016
[4] https://it.wikipedia.org/wiki/Elezioni_presidenziali_negli_Stati_Uniti_d'America_del_2000
[5] https://en.wikipedia.org/wiki/United_States_presidential_election,_2000
[6] https://de.wikipedia.org/wiki/Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten_2000
------------------------------------------------------------------------------
_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion