Sergio,

I hope you are well. I am coming back to you as I will start these days to get a better understanding of Marmotta and see (hopefully with your help) how the features we have in mind for OverLOD could be implemented. There are quite a few questions in this email; I hope you can answer them so that I can move on more efficiently. Thank you in advance.

In a former email you pointed me towards Fusepool, saying: "Actually looking to the idea, technologically talking it does not look so different to what the Fusepool P3 FP7 project tries to do." I thus had a look at Fusepool, but from what I saw, Fusepool is about "creating" RDF from non-RDF resources, whereas OverLOD is mainly about consuming existing RDF data and having it at the disposal of a specific platform and use case. So one goal of OverLOD is more about the "next steps" of the Semantic Web: how to consume RDF efficiently, and then how to make it easier for non-RDF developers to use the data.

For instance, one use case could be to develop software for engineers, based on construction material data from different providers. The providers publish their catalogs in RDF, and this instance of OverLOD keeps a read-only copy of all the catalogs, managing updates efficiently and automatically, possibly handling data validation, and so on. The software might need other information as well, such as regulations or weather data, and all those data are maintained in the OverLOD triple store for the applications built on that instance of OverLOD. This feature is called "OverLOD Referencer" in the document "OverLOD Surfer – Marmotta discussion":
https://dl.dropboxusercontent.com/u/852552/Marmotta_OverLOD%20Surfer%20presentation_0.2.pdf

The OverLOD triple store (i.e. Marmotta) should be able to handle its own RDF data, but also to include RDF data from other sources on the web: RDF files, RDF from SPARQL CONSTRUCT queries on different endpoints, possibly RDFa and Microformats/Microdata with RDFization. There is quite some work to do here, and we haven't decided yet how far we will get, as there are other features we need to implement. If I am not mistaken, neither Marmotta, nor Fusepool, nor LDP has this feature, right? (I am also reading the LDP specification right now.)

At the end of the document "OverLOD Surfer – Marmotta discussion" I came up with some questions which, if I am not mistaken, haven't been answered yet, so I copy them here:

· Both Marmotta and OverLOD handle LOD data with a local copy of the data. How does Marmotta plan to put automatic updates in place once the original data is modified?
· Data validation: does Marmotta plan to validate the local data (in a CWA manner, à la SPIN maybe)? See the validation sketch after this list.
· Data "chunks": does Marmotta provide ways to import only a part of a data source, for instance by running SPARQL CONSTRUCT queries on an endpoint, or on the content of an imported RDF file? See the chunk-import sketch after this list.

It should be noted that OverLOD was designed before the W3C advancements on LDP and JSON-LD, but we now want to make good use of those specifications.
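To make the validation question more concrete, here is a minimal sketch of the kind of closed-world check I have in mind, written against the Sesame/OpenRDF API that Marmotta builds on. The ex:Material vocabulary and the constraint itself are just invented examples for illustration:

    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    public class ValidationSketch {
        public static void main(String[] args) throws Exception {
            // In-memory store standing in for the local copy of a catalog
            Repository repo = new SailRepository(new MemoryStore());
            repo.initialize();
            RepositoryConnection con = repo.getConnection();
            try {
                // ... load the imported catalog data here ...

                // SPIN-style constraint (CWA): flag materials without a label
                String constraint =
                    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                    "PREFIX ex: <http://example.org/ns#> " +  // invented vocabulary
                    "ASK { ?m a ex:Material . " +
                    "      FILTER NOT EXISTS { ?m rdfs:label ?l } }";
                boolean violated =
                    con.prepareBooleanQuery(QueryLanguage.SPARQL, constraint).evaluate();
                if (violated) {
                    System.err.println("Validation failed: a material has no rdfs:label");
                }
            } finally {
                con.close();
            }
        }
    }

The question is whether Marmotta plans to run such constraints itself (e.g. on import, rejecting or flagging invalid data), or whether this would be entirely up to OverLOD.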
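Similarly, for the "chunks" question, this is the kind of selective import I mean: run a SPARQL CONSTRUCT against a provider's endpoint and keep only the resulting triples. Again a minimal sketch with the Sesame API; the endpoint URL and the vocabulary are invented:

    import org.openrdf.query.GraphQueryResult;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sparql.SPARQLRepository;

    public class ChunkImportSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical provider endpoint
            SPARQLRepository endpoint =
                new SPARQLRepository("http://provider.example.org/sparql");
            endpoint.initialize();
            RepositoryConnection con = endpoint.getConnection();
            try {
                // Import only the chunk we need: materials and their labels
                String query =
                    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                    "PREFIX ex: <http://example.org/ns#> " +
                    "CONSTRUCT { ?m a ex:Material ; rdfs:label ?label } " +
                    "WHERE     { ?m a ex:Material ; rdfs:label ?label }";
                GraphQueryResult chunk =
                    con.prepareGraphQuery(QueryLanguage.SPARQL, query).evaluate();
                while (chunk.hasNext()) {
                    // In OverLOD these statements would go into the local store
                    System.out.println(chunk.next());
                }
                chunk.close();
            } finally {
                con.close();
            }
        }
    }

The open point for us is whether Marmotta could schedule such a query and refresh the local copy automatically, rather than doing a one-shot import.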
About development: Apache development is a new world for me, but some colleagues here might help me. I will also follow the instructions you gave to QiHong earlier this year, so, if I am not mistaken, I will start by forking Marmotta (Git and GitHub are also new to me). You told him:

· Fork our mirror there [1] and give me (wikier) admin permissions.
· Create, at least, a branch from 'develop' for your project; according our development guidelines [2], I'd recommend you to use the issue [3] as name for the branch: MARMOTTA-444.
· I'll closely follow your development there, using the comments on the code committed to provide you early feedback.
· Create issues there for internal issues of the project.

So I will do the same and, in my fork, create a branch from 'develop'. Is there a name you would recommend? Do I need to create 'Issues' for the OverLOD features?

As you pointed out: "For me the "OverLOD Referencer" has a big potential of reusing the infrastructure provided by LDClient [2 (http://marmotta.apache.org/ldclient/)] and LDCache [3 (http://marmotta.apache.org/ldcache/)]."

LDClient seems to be the way to import external data into Marmotta, although I don't see LDClient on the "Platform Architecture Overview". This external data could be RDF, or data that needs to be RDFized, right? My first question is the one already expressed above: does LDClient already handle the automatic update of the data once the data source has been modified? I have only read about some time-out features, but I don't know yet what they mean; see the sketch below. My second question is about the RDFizers: there is a nice list of RDFizers, but nothing about Microdata/Microformats; would that be something to implement if needed?

Then, LDCache handles where the incoming data from LDClient is stored. But is this storage different from the main Marmotta storage? Are the imported data part of the default graph, queryable transparently together with the other data from the LDP? I guess so, but from what I read I had some doubts.
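To check my understanding of LDClient, here is roughly how I imagine using it. The class and method names below (ClientConfiguration, LDClient, retrieveResource, getExpires) are taken from my reading of the LDClient pages, so they are assumptions on my side; please correct me if they do not match the actual API:

    import org.apache.marmotta.ldclient.model.ClientConfiguration;
    import org.apache.marmotta.ldclient.model.ClientResponse;
    import org.apache.marmotta.ldclient.services.ldclient.LDClient;

    public class LDClientSketch {
        public static void main(String[] args) throws Exception {
            // Default configuration; I assume the time-outs are set here
            ClientConfiguration config = new ClientConfiguration();
            LDClient client = new LDClient(config);

            // Fetch any dereferenceable Linked Data resource
            ClientResponse response =
                client.retrieveResource("http://dbpedia.org/resource/Valais");

            // My assumption: the response carries an expiry date, and this is
            // the "time-out" LDCache uses to decide when to re-fetch the
            // resource -- i.e. refreshing on access, not a push notification
            // when the source changes?
            System.out.println("Expires: " + response.getExpires());

            client.shutdown();
        }
    }

If that reading is right, then "automatic update" here means the cached copy is refreshed when it is accessed after its expiry date, which is close to, but not quite, what OverLOD needs (we would also like scheduled or triggered refreshes).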
That's all for now. Thank you again,

Fabian

>>> Sergio Fernández <[email protected]> 18.07.2014 10:53 >>>
Hi Fabian.

On 13/07/14 07:00, Fabian Cretton wrote:
> As I said, we are not really working on that until mid-august. However,
> what document would you recommand me to read until then, in order to
> really understand Marmotta ?

The platform description is a good starting point:
http://marmotta.apache.org/platform

Just let us know whatever we can help.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: [email protected]
w: http://redlink.co