Hi Fabian,
On 02/09/14 14:09, Fabian Cretton wrote:
So that would be the goal of the "External data sources" module, which
was originally called "overLOD Referencer" in the document [1]:
- define precisely the RDF data to be cached in the server: that could
be an RDF file, a SPARQL CONSTRUCT on an endpoint, etc.
- find a way to validate the content of that data -> here we might not
want to reason under the open world assumption; if a property is defined
with a certain range, we would want to check that the objects in the
file ARE effectively instances of that class (for instance using SPARQL
queries to validate the content, instead of a reasoner).
- find a way to manage the updates automatically: it could be a 'pull'
from Marmotta based on some VoID data provided by the source, or the
source could 'ping' Marmotta with RSS-like features, as was done by
Ping-The-Semantic-Web or Sindice.
All that infrastructure is provided by the current LDCache module. If I
got it right, the place where you actually need to plug into this
infrastructure is at the LDClient level:
* you can define new LDClient Data Providers for your specific sources
* which can wrap all the validation logic you need
* then LDCache will transparently make use of your LDClient provider
* to avoid conflicts with the default providers, they can be disabled
If that setup fits your ideas and needs, I'd recommend you take a look
at the current providers:
https://github.com/apache/marmotta/tree/master/libraries/ldclient
Some of them just do data lifting from other formats (e.g., XML), some
wrap APIs to get RDF out of them (e.g., Facebook), and some do other
kinds of validation and fixes (e.g., the Freebase provider does RDF
syntax fixing before parsing).
Hope that helps. I guess we have to provide better documentation and
diagrams to explain the infrastructure that LDClient+LDCache provides.
Cheers,
--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: [email protected]
w: http://redlink.co