Student: Wouter Maroy
Mentors: Anastasia Dimou, Dimitris Kontokostas

TL;DR: The goal of this GSoC project was to start the integration of RML (http://rml.io) with the DBpedia mappings wiki. The project had 2+1 goals, all of which were completed successfully.

To read this in a nicely formatted way, click here:
https://docs.google.com/document/d/1BwSG6Rg-tPZlaATIGvsnLkOSnU7wmRg2Gy57dSGhBrU/edit#

Introduction

DBpedia uses its own mappings for extracting triples from Wikipedia. The goal of this project was to integrate RML, a general mapping language for generating triples, and to replace the original mappings with RML mapping documents. The project had two main goals and one optional goal.

Main goals:
- Translate the DBpedia mappings to RML mapping documents
- Import RML documents into the extraction framework and convert them to the existing DBpedia mapping data structures

Optional goal:
- Create a prototype of an integrated RML processor in the DBpedia extraction framework

The project was a success. All goals of the project, including the optional goal, were completed and produced good results.

First goal: translating the DBpedia mappings to RML mappings

DBpedia uses different types of custom mappings (e.g. simple property mappings, date interval mappings) for extracting triples from Wikipedia infoboxes. These are in general quite complex, so creating one-to-one mappings from DBpedia mappings to RML mappings was no easy task. Designing these mappings took a considerable part of the project. We wanted them to be very accurate, because the better the translations are, the better the results at the end of the process will be.

To create the alignment, it was necessary to dive into the exact details of how the DBpedia mappings are used in the extraction framework. Conversely, it was necessary to fully understand how an RML mapping could produce the same results. Eventually, all the DBpedia mappings got an RML version.
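
As a rough illustration of the kind of translation described above, the sketch below renders a simple property mapping as an RML-style triples map in Turtle. It is not the translation code from the extraction framework. The function names, the input shape and the simplified output (no logical source, no subject template) are assumptions made for this example. Only the rr:/rml: vocabulary terms and the templateProperty/ontologyProperty pairing of a DBpedia simple property mapping are taken as given.

```python
# Namespace prefixes used by the generated Turtle.
PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix rml: <http://semweb.mmlab.be/ns/rml#> .\n"
    "@prefix dbo: <http://dbpedia.org/ontology/> .\n"
)


def simple_property_to_rml(template_property, ontology_property):
    """Hypothetical helper: one infobox property becomes one
    predicate-object map whose object is read via rml:reference."""
    return (
        "  rr:predicateObjectMap [\n"
        f"    rr:predicate {ontology_property} ;\n"
        f'    rr:objectMap [ rml:reference "{template_property}" ]\n'
        "  ]"
    )


def template_to_rml(map_name, ontology_class, properties):
    """Hypothetical helper: build a simplified triples map for one
    infobox template. `properties` is a list of
    (templateProperty, ontologyProperty) pairs."""
    parts = [f"  rr:subjectMap [ rr:class {ontology_class} ]"]
    parts += [simple_property_to_rml(t, o) for t, o in properties]
    body = " ;\n".join(parts)
    return f"{PREFIXES}\n<#{map_name}> a rr:TriplesMap ;\n{body} .\n"


if __name__ == "__main__":
    print(template_to_rml(
        "PersonMapping",
        "dbo:Person",
        [("birth_place", "dbo:birthPlace")],
    ))
```

A simple property mapping fits this one-rule-per-property pattern. The more specific mapping types, such as date interval mappings, are the cases that needed custom solutions and are not covered by this sketch.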

Some mappings were straightforward, but most cases were very specific and needed a custom solution.

The next step was to automate the translation from the original DBpedia mapping files that are stored on GitHub to their corresponding RML version. This has also been done and was implemented in the server module of the extraction framework. Through this functionality it is now possible to access the RML version of every DBpedia mapping that is present on the running server.

Second goal: importing and converting RML

A first step towards integrating the execution of RML mapping documents is adding a parser that understands RML documents and converts them into a structure the extraction framework understands. To be specific, the extraction framework uses mapping data structures to store its loaded mappings. This parser loads the RML mapping documents and converts them to these mapping data structures.

The advantage of this method is that RML documents can be run and generate triples just as if the old mapping documents were being used. No big changes are needed in the extraction framework itself to make this work. The drawback is that not all functionality of RML is available: only the specific mappings designed for each DBpedia mapping can be understood and executed by this parser. For all functionality to be available, an RML processor needs to be fully integrated.

An implementation of this parser was added to the extraction framework. It can read all the custom-designed mappings that were created, and the framework can load and run them. The results are very good: the generated triples are the same as those produced when the original DBpedia mappings are loaded.

Optional goal: prototyping an integrated RML processor

To make all functionality of RML available, a real RML processor is needed.
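
To make the term concrete: a processor takes mapping rules plus source data and produces triples. The toy sketch below shows only that core loop. It is not the prototype built in this project. The function name, the rule format and the example infobox values are all made up for illustration, and a real RML processor handles far more (logical sources, term types, joins, and so on).

```python
def apply_mapping(subject, ontology_class, rules, record):
    """Hypothetical toy processor: apply (predicate, reference) rules
    to one source record and return the resulting triples."""
    # Every mapped resource gets its class from the subject map.
    triples = [(subject, "rdf:type", ontology_class)]
    for predicate, reference in rules:
        value = record.get(reference, "").strip()
        if value:  # skip fields that are missing or empty in the record
            triples.append((subject, predicate, value))
    return triples


# Made-up infobox record and rules, for illustration only.
infobox = {"name": "Example Person", "birth_place": "Example City", "spouse": ""}
rules = [
    ("foaf:name", "name"),
    ("dbo:birthPlace", "birth_place"),
    ("dbo:spouse", "spouse"),
]

triples = apply_mapping("dbr:Example_Person", "dbo:Person", rules, infobox)
for triple in triples:
    print(triple)
```

The difference from the parser approach of the second goal is where the rules are interpreted: the parser only recognises the mapping shapes designed for DBpedia and hands them to the existing data structures, while a processor interprets the rules directly.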

With an integrated RML processor it would be possible to test the mapping documents that were designed during the first part of the project. Within the scope of this project, an optional goal was to create a prototype to give an idea of what is possible. There were some discussions on how this could be implemented, and a solution was picked.

A prototype was implemented and produced positive results. Not all of the generated triples were complete, but the prototype served its purpose as a proof of concept. It showed that this workflow for integrating the processor is a viable solution if fully implemented.

It was not certain that the prototype could be built within the scope of this project, since that depended on how long it would take to finalize the main goals. Luckily everything went as planned and the optional goal was completed successfully.

Links

Commits:
https://github.com/dbpedia/extraction-framework/commits?author=wmaroy
https://github.com/wmaroy/extraction-framework/commits?author=wmaroy (unmerged)

GSoC Project:
https://summerofcode.withgoogle.com/projects/#6213126861094912