Dear DBpedia community,

I am pleased to announce the results of my GSoC work: the Table Extractor.

Extended REPORT and PROGRESS page [1]

REPOSITORY with code, extracted data sets, log examples, and a readme file
[2]


Short report:

The aim of the project was to extract useful RDF data sets from tables
spread across wiki pages.

Many tables are used to store the results of elections, sports events, and
other competitions [3].

I started the extraction by focusing on electoral results, first those of
USA presidential elections and then elections in general (it chapter). I had
a hard time finding the right approach because of an unfortunate design I
initially chose (involving JSONpedia). Please refer to the progress page
[1] for more information about that.

The final solution works on the HTML representation of pages instead of the
JSON one. It consists of a Python package, which works well on different
table structures.
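
To give an idea of what working on the HTML representation involves, here is
a minimal sketch that reads the cells of each table row by row. It is not the
code of the package: the names TableCellParser and extract_tables are
hypothetical, only the Python standard library is used, and nested tables,
rowspan and colspan are deliberately ignored.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every cell, grouped by row, for each <table>.

    Illustrative only: nested tables and row/column spans are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows, each row a list of str
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table found in `html` as a list of rows of cell text."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling extract_tables on the HTML of a page returns one list of rows per
table, with header cells (th) and data cells (td) treated alike.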

It works on every wiki chapter, but data extraction strictly depends on
rules created per topic and language. In fact, tables differ completely in
data and structure depending on the wiki chapter.
Compare [4] (it chapter), [5] (en chapter) and [6] (de chapter).
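
The idea of rules defined per topic and language can be pictured with the
small sketch below. It is only an assumption about the general shape of such
rules, not the format used by the package: the MAPPING_RULES table, the header
strings, the property names and the map_row function are all hypothetical.

```python
# Hypothetical rule table: (topic, language) -> {table header: property name}.
# Header strings and property names are made up for illustration.
MAPPING_RULES = {
    ("elections", "it"): {"Candidato": "candidate", "Voti": "popularVote"},
    ("elections", "en"): {"Candidate": "candidate", "Votes": "popularVote"},
}


def map_row(topic, language, headers, cells):
    """Map one table row to properties using the rules for (topic, language).

    Returns a pair: the mapped {property: value} dict and the list of
    headers that have no rule, whose cells are therefore lost.
    """
    rules = MAPPING_RULES.get((topic, language), {})
    mapped, unmapped = {}, []
    for header, cell in zip(headers, cells):
        prop = rules.get(header)
        if prop is None:
            unmapped.append(header)
        else:
            mapped[prop] = cell
    return mapped, unmapped
```

A header without a rule ends up in the second list, which corresponds to the
"cells lost due to a lack of mapping rules" case; a chapter with no rules at
all maps nothing.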

Good results have been achieved on the electoral topic (it chapter). I hope
to get feedback from the whole community.


In brief:

USA election results (it wiki pages):

1.8% of tables lost due to structural problems.

87.3% of cells correctly extracted and correctly mapped.

1.9% of cells lost due to a lack of mapping rules.


General election results (it wiki pages):

8% of tables lost due to structural problems.

43% of cells extracted and correctly mapped.

42.6% of cells lost due to a lack of mapping rules.


Student:

Simone Papalini

Mentors:

Marco Fossati (DBpedia), Claudia Diamantini, Domenico Potena, Emanuele
Storti


[1]
https://github.com/dbpedia/extraction-framework/wiki/GSoC_2016_Progress_Simone

[2] https://github.com/dbpedia/table-extractor

[3] https://it.wikipedia.org/wiki/Elezioni_amministrative_italiane_del_2016

[4]
https://it.wikipedia.org/wiki/Elezioni_presidenziali_negli_Stati_Uniti_d'America_del_2000

[5] https://en.wikipedia.org/wiki/United_States_presidential_election,_2000

[6]
https://de.wikipedia.org/wiki/Pr%C3%A4sidentschaftswahl_in_den_Vereinigten_Staaten_2000