Question on moving linked data sets
Dear all,

We have a question on what to do when a linked data set is moved from one namespace to another. We searched for recipes to apply, but did not really find anything 'official' around...

VU University Amsterdam published a Linked Data SKOS representation of RAMEAU [1] as a prototype several years ago. For example, we have
http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb14521343b

Recently, BnF implemented its own production service for RAMEAU. The concept above is at:
http://data.bnf.fr/ark:/12148/cb14521343b
(see RDF at http://data.bnf.fr/14521343/web_semantique/rdf.xml)

The production service makes the prototype obsolete. Our issue is how to properly transition from one to the other. Several services are using the URIs of the prototype, for example at the Library of Congress:
http://id.loc.gov/authorities/subjects/sh2002000569

We can ask the people we know to change their links. But identifying the users of the URIs seems too manual and error-prone a process. And of course, in general, we do not want links to be broken.

Currently we have done the following:
- a 301 (Moved Permanently) redirection from the stitch.cs.vu.nl/rameau prototype to data.bnf.fr.
- an owl:sameAs statement between the prototype URIs and the production ones, so that a client searching for data on the old URI gets data that enables it to make the connection with the original resource (URI) it was seeking data about.

Does that seem ok? What should we do otherwise?

Thanks for any feedback you could have,

Antoine Isaac (VU Amsterdam side)
Romain Wenz (BnF side)

[1] RAMEAU is a vocabulary (thesaurus) used by the National Library of France (BnF) for describing books.
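As a rough illustration of the two measures listed in the message, the sketch below maps a prototype URI to its production counterpart by swapping the namespace prefix (the same rewrite a 301 rule would perform) and builds the matching owl:sameAs statement as an N-Triples line. The function names are hypothetical, and the sketch assumes that every prototype URI keeps its ARK path unchanged under data.bnf.fr, as the example pair in the message suggests.

```python
OLD_NS = "http://stitch.cs.vu.nl/vocabularies/rameau/"
NEW_NS = "http://data.bnf.fr/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_production_uri(prototype_uri: str) -> str:
    """Swap the prototype namespace for the production one (assumed 1:1 mapping)."""
    if not prototype_uri.startswith(OLD_NS):
        raise ValueError(f"not a prototype RAMEAU URI: {prototype_uri}")
    return NEW_NS + prototype_uri[len(OLD_NS):]


def same_as_triple(prototype_uri: str) -> str:
    """Return an N-Triples line linking the prototype URI to the production URI."""
    return f"<{prototype_uri}> <{OWL_SAME_AS}> <{to_production_uri(prototype_uri)}> ."


if __name__ == "__main__":
    print(same_as_triple(OLD_NS + "ark:/12148/cb14521343b"))
```

Keeping the mapping a pure prefix swap means the same rule can drive both the redirect configuration and the generated statements.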
Re: Question on moving linked data sets
On 4/19/12 10:23 AM, Antoine Isaac wrote:
> Does that seem ok? What should we do, otherwise?

Seems OK to me :-)

Kingsley
--
Regards,
Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: Question on moving linked data sets
On 4/19/12 10:23 AM, Antoine Isaac wrote:
> Does that seem ok? What should we do, otherwise?

Forgot to paste this into my prior response. Here's why it's OK:

1. http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fdata.bnf.fr%2Fark%3A%2F12148%2Fcb14521343b -- new URI
2. http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fstitch.cs.vu.nl%2Fvocabularies%2Frameau%2Fark%3A%2F12148%2Fcb14521343b -- old (prototype) URI
3. http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fdata.bnf.fr%2Fark%3A%2F12148%2Fcb14521343b&sas=yes -- new URI and the effects of enabling owl:sameAs inference.

--
Regards,
Kingsley Idehen
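To make the third link more concrete, here is a toy sketch of what owl:sameAs inference does to a description: statements made about any URI in the same sameAs cluster are returned together. This is only an illustration of the idea, not how URIBurner implements it; the `describe` function is hypothetical, and the sample triples passed to it would be invented placeholders rather than real RAMEAU data.

```python
OWL_SAME_AS = "owl:sameAs"


def describe(uri, triples, same_as=False):
    """Return the (predicate, object) pairs known about uri.

    With same_as=True, statements about every URI connected to uri through
    owl:sameAs links (in either direction) are included as well.
    """
    aliases = {uri}
    if same_as:
        changed = True
        while changed:  # grow the alias set until no new sameAs link is found
            changed = False
            for s, p, o in triples:
                if p != OWL_SAME_AS:
                    continue
                if s in aliases and o not in aliases:
                    aliases.add(o)
                    changed = True
                elif o in aliases and s not in aliases:
                    aliases.add(s)
                    changed = True
    # The sameAs links themselves are left out of the returned description.
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}
```

Without inference, a client asking about the production URI sees only statements made with that URI; with inference, statements still attached to the prototype URI are returned too.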
RE: Question on moving linked data sets
This sounds like the best way to manage this transition. I believe the 301 redirection is precisely what Ed (CC'ed on the mail) did when directing traffic from the lcsh.info URIs to the new (at the time) id.loc URIs.

(Speaking of which, this email answers a question we asked ourselves here at LC about the relationship between RAMEAU at VU and RAMEAU at data.bnf, after Romain announced the addition of RAMEAU at data.bnf. I'll try to move a little faster updating the URIs in ID.)

> - an owl:sameAs statement between the prototype URIs and the production ones, so that a client searching for data on the old URI gets data that enables it to make the connection with the original resource (URI) it was seeking data about.

The answer seems obvious, but this would be expressed in the data available from data.bnf, correct?

Yours,
Kevin

--
Kevin Ford
Network Development and MARC Standards Office
Library of Congress
Washington, DC

-----Original Message-----
From: Antoine Isaac [mailto:ais...@few.vu.nl]
Sent: Thursday, April 19, 2012 10:23 AM
To: public-lod@w3.org
Cc: romain.wenz; Antoine Isaac
Subject: Question on moving linked data sets
Re: Question on moving linked data sets
Hello Antoine,

My take on this would be to use dcterms:isReplacedBy links rather than owl:sameAs. The description of the concepts by BnF might change in the future, and although the original identifier is the same, the descriptions might be out of sync at some point.

Bernard

On 19 April 2012 16:23, Antoine Isaac <ais...@few.vu.nl> wrote:
> Does that seem ok? What should we do, otherwise?

--
Bernard Vatant
Vocabularies & Data Engineering
Tel: +33 (0)9 71 48 84 59
Skype: bernard.vatant
Linked Open Vocabularies: http://labs.mondeca.com/dataset/lov

Mondeca
3 cité Nollez, 75018 Paris, France
www.mondeca.com
Follow us on Twitter: @mondecanews http://twitter.com/#%21/mondecanews
Re: Question on moving linked data sets
Hi Antoine,

First, congratulations on http://data.bnf.fr/, that is a major milestone!

On Thu, Apr 19, 2012 at 10:23 AM, Antoine Isaac <ais...@few.vu.nl> wrote:
> We can ask for the people we know to change their links. But identifying the users of URIs seems too manual, error-prone a process. And of course in general we do not want links to be broken.
>
> Currently we have done the following:
> - a 301 moved permanently redirection from the stitch.cs.vu.nl/rameau prototype to data.bnf.fr.
> - an owl:sameAs statement between the prototype URIs and the production ones, so that a client searching for data on the old URI gets data that enables it to make the connection with the original resource (URI) it was seeking data about.
>
> Does that seem ok? What should we do, otherwise?

As you know, when the SKOS concepts published at lcsh.info moved to id.loc.gov I had a similar situation :-) Like you, I chose to do a mixture of technical and social things:

- publish information about the move to relevant discussion lists
- put some information up at lcsh.info about the move
- permanently redirect (301) all resources to their new location (easy since it was the same app, and mod_rewrite could do it)
- after a year of redirects I shut down lcsh.info and did not renew the domain (interestingly someone is squatting on it right now attempting to sell it, I think)

Generally I think the linked data community should be encouraged to check their links, respect 301 redirects, and update their own link database appropriately. This is what Google and other major search engines do [1], and it's how the Web was designed to work, and continues to grow.

While it's certainly cool when URIs don't change [1], I think it is somewhat irrational to expect URIs to be permanent. I hear people gripe about broken URLs in the digital preservation community quite a bit and it is pretty irritating, since any data that isn't actively used tends to rot...URLs really aren't that different.

Of course there is certainly value in stable identifiers [1], but I think there is an opportunity for documenting and encouraging best practices on how to manage change (especially with respect to identifiers) in Linked Data. URNs, Namespaces and Registries [2] is partly helpful here, but a more succinct and URI-focused presentation is needed. Or maybe a best practice document like this already exists and I haven't seen it yet. If that is the case I trust someone will let me know :-)

//Ed

[1] http://www.w3.org/Provider/Style/URI.html
[2] http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
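A minimal sketch of the "check your links, respect 301 redirects, update your link database" practice follows. It is an illustration only: the function names are hypothetical, the link database is assumed to be a plain list of URIs, and only 301 is treated as a signal to rewrite a stored link. The `fetch` parameter is injectable so the logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(uri):
    """Issue a HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_permanent(uri, fetch=head, max_hops=5):
    """Follow a chain of 301 responses and return the final URI.

    Only 301 (Moved Permanently) changes the result; any other status
    leaves the URI as it is. max_hops guards against redirect loops.
    """
    for _ in range(max_hops):
        status, location = fetch(uri)
        if status != 301 or not location:
            return uri
        uri = urljoin(uri, location)  # Location may be relative
    return uri


def refresh_links(links, fetch=head):
    """Map each stored URI to the URI it should be replaced with."""
    return {uri: follow_permanent(uri, fetch) for uri in links}
```

Run periodically over stored links, something like `refresh_links` would let a consumer pick up a move such as lcsh.info to id.loc.gov while the redirects are still being served.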
Re: Question on moving linked data sets
On Thu, Apr 19, 2012 at 11:02 AM, Bernard Vatant <bernard.vat...@mondeca.com> wrote:
> My take on this would be to use dcterms:isReplacedBy links rather than owl:sameAs
> Description of the concepts by BNF might change in the future and although the original identifier is the same, the description might be out of sync at some point.

I really like the idea of avoiding the owl:sameAs quagmire with an assertion that has fewer ontological consequences. When publishing data about data.bnf resources perhaps it might be a bit simpler to use dcterms:replaces instead, e.g.

<http://data.bnf.fr/ark:/12148/cb14521343b> dcterns:replaces <http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb14521343b> .

//Ed
Re: Question on moving linked data sets
On Thu, Apr 19, 2012 at 1:23 PM, Ed Summers <e...@pobox.com> wrote:
> <http://data.bnf.fr/ark:/12148/cb14521343b> dcterns:replaces <http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb14521343b> .

s/dcterns/dcterms/

//Ed
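For reference, the two Dublin Core properties discussed in this sub-thread are inverses of each other, so the same fact can be stated from either side. The sketch below writes both directions as N-Triples lines using the full DCMI Terms namespace; the helper name is hypothetical and the output is only an illustration of the suggestion, not what data.bnf.fr actually publishes.

```python
DCTERMS = "http://purl.org/dc/terms/"


def replacement_triples(old_uri, new_uri):
    """Return N-Triples lines stating the replacement in both directions."""
    return [
        f"<{old_uri}> <{DCTERMS}isReplacedBy> <{new_uri}> .",  # served with the old URI
        f"<{new_uri}> <{DCTERMS}replaces> <{old_uri}> .",      # served with the new URI
    ]


if __name__ == "__main__":
    for line in replacement_triples(
        "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb14521343b",
        "http://data.bnf.fr/ark:/12148/cb14521343b",
    ):
        print(line)
```

Since the prototype server only issues redirects, the dcterms:replaces direction is the one that the production service can publish on its own.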
Re: Question on moving linked data sets
DC:Terns... http://blog.terrain.org/wp-content/uploads/2011/04/LeastTern_Tom-Grey_U.jpg

Jon

On Thu, Apr 19, 2012 at 1:23 PM, Ed Summers <e...@pobox.com> wrote:
> <http://data.bnf.fr/ark:/12148/cb14521343b> dcterns:replaces <http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb14521343b> .
Re: Question on moving linked data sets
Hi All,

I do not think that there is much else you could do. I presume the server doing the 301 is going to stay around for a while.

~Richard.

On 19 April 2012 15:23, Antoine Isaac <ais...@few.vu.nl> wrote:
> Does that seem ok? What should we do, otherwise?

--
Richard Wallis
Technology Evangelist, OCLC: richard.wal...@oclc.org
Founder, Data Liberate: richard.wal...@dataliberate.com
http://dataliberate.com
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com
Announcing OWLIM 5.0 - with new transaction mechanism, performance improvements, SPARQL 1.1 graph store protocol and more
Ontotext are pleased to announce the release of OWLIM version 5.0 (http://www.ontotext.com/owlim), featuring a new transaction mechanism, performance improvements, SPARQL 1.1 graph store protocol, integration with TopBraid Composer/Live (http://www.topquadrant.com/products/TB_Suite.html) and many other improvements.

The single most important new feature is the new transaction management mechanism, which allows for much *more reliable and efficient handling of workloads where queries from multiple clients are combined with frequent updates* of the data. As the benchmark results (http://www.ontotext.com/owlim/benchmark-results/owlim-5) demonstrate, OWLIM 5.0 is *43% faster* than v.4.3 on the BSBM Explore and Update scenario (http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/). As a result of several changes in the index structures, OWLIM now requires *between 25% and 70% less storage space*.

Some of the most important improvements are listed below:

* *Transaction management and isolation mechanisms* have been completely refactored. The previous strategy used lazy writing of modified database pages, such that dirty pages were only flushed to disk when further updates occurred and no more memory was available. While extremely fast, the problem with this approach is that there is a considerable recovery time associated with replaying the transaction log after an abnormal termination. The new mechanism uses two modes: 'bulk-loading' (fast), with similar behaviour to previous versions, and 'normal' (safe), where database modifications are flushed to disk as part of the commit operation. When running in safe mode, *database recovery is instant* and there is a *significant improvement in concurrency between updates and queries*.
* *New context indices* can be used to improve query performance when data is modelled using many named graphs. These are switched on and off using a single configuration parameter, enable-context-index.
* The *SPARQL 1.1 Graph Store HTTP Protocol* is now supported according to the W3C Working Draft (http://www.w3.org/TR/sparql11-http-rdf-update/) from the 12th May 2011. This provides a REST interface for managing collections of graphs, using either directly or indirectly named graphs.
* *Sesame 2.6.5* (http://www.openrdf.org) with many bug-fixes and updates to bring SPARQL 1.1 Query (http://www.w3.org/TR/2012/WD-sparql11-query-20120105/) support up to the latest W3C Working Draft from the 5th January 2012.
* *Significant reduction in disk-space requirements* is achieved with the following modifications:
  o *Index compression* can now be used to reduce disk storage requirements by using zip compression on database pages. This feature is off by default, but can be switched on when creating a new repository. The configuration parameter index-compression-ratio can be set to -1 (the default value, indicating no compression) or a value in the range 10-50, indicating the desired percentage reduction in page sizes. Any pages that cannot be compressed by the specified amount are stored uncompressed. Therefore a compression ratio that is too aggressive will not bring many benefits. Experiments have shown that for large datasets a value of about 30% is close to optimal and leads to a total disk space saving of around 50%.
  o *Restructuring of the triple indices* has also led to a reduction in disk-space requirements of around 18%, independent of the compression functionality.
  o *Entity compression* is a modification that reduces the storage requirements for the lookup table that maps between internal identifiers and resources. This is transparent to the user and happens automatically. More disk space reductions are apparent using this version.
* A new *literal index* is created automatically for numeric and date/time data-types. The index is used during query evaluation if a query or a sub-query (e.g. union) has a filter that is comprised of a conjunction of literal constraints, e.g. FILTER(?x = 3 && ?y = 5 && ?start > "2001-01-01"^^xsd:date). Other patterns, including those that use negation, will not use the index for this version of OWLIM.
* Tighter integration with TopQuadrant's (http://www.topquadrant.com/) TopBraid Composer (http://www.topquadrant.com/products/TB_Composer.html), a graphical development environment for modelling data, and TopBraid Live (http://www.topquadrant.com/products/TB_Live.html), an enterprise SOA-capable Semantic Web application platform. Contact the OWLIM team directly (mailto:owlim-i...@ontotext.com) for details of how to obtain the OWLIM plug-in.
* All *control queries now use SPARQL Update syntax* (used mostly