Re: Get your dataset on the next LOD cloud diagram
On 7/13/11 12:04 AM, Giovanni Tummarello wrote:

If you are seeking stats re. what I mean re. inertia, just keep track of what's happening on the schema.org front re. adoption curve.

Here are 100+ datasets:
http://sindice.com/search?q=schemanq=fq=class%3Ahttp%3A%2F%2Fschema.org%2F*sortbydate=1facet.field=domaininterface=guru

We started collecting 2 weeks ago, and we did NOT reanalyze/recrawl previously known sites ATM. How fair it is to call them datasets rather than marked-up pages is up for discussion, possibly a reasonably interesting one.

Gio

Re. Linked Data: a dataset has to be a collection of data objects endowed with URIs that resolve to human- and machine-decipherable representations of their referents. Representation takes the form of an EAV/SPO triples-based directed graph pictorial.

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
If a dataset has not been available in any Linked Data way for 3+ months it should not appear on the cloud we show people, coloured or not. It is no more. It has ceased to be. It has expired and gone to meet its maker. If it had not been nailed to the cloud it would have fallen off. It is an ex-dataset.

We could have another diagram of datasets whose metadata processes are now history and have shuffled off this mortal coil, run down the SPARQL endpoint and joined the bleeding choir invisible, but that would only be useful as an historic document.
http://dbpedia.org/resource/Dead_Parrot_sketch

At this stage it is all about quality, not quantity. In fact 3+ months seems a long time - I would go for 1 month.

Best
Hugh

On 12 Jul 2011, at 23:38, Kingsley Idehen wrote:

On 7/12/11 11:21 PM, Pablo Mendes wrote:

Thanks, Thomas. Giovanni, it was a coreference resolution problem from my side. You meant 'they'=datasets and I read 'they'=people. It was anyhow a possible question to come by and it's (hopefully) clearer now. Sorry for the confusion.

Now to the intended question. I will discuss the issue of availability with my colleagues. But my personal opinion is that availability is an important quality indicator, and should be incorporated if feasible wrt time and resource availability. Could we perhaps have others (e.g. Sindice, Openlink cloud cache, etc.) also providing their assessment of this specific indicator? It sounds like it's of shared interest and could benefit from multiple independent assessments. What do you think?

Cheers,
Pablo

On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:

Datasets that are inaccessible for large amounts of time (e.g., 3+ months) ultimately undermine the LOD cloud.

Rather than removing a dataset, why not color code LOD cloud bubbles using the same color scheme from http://labs.mondeca.com/sparqlEndpointsStatus/index.html, if possible? For better or for worse, the LOD cloud pictorial is now a staple re. Linked Data marketing comms. collateral.

--
Hugh Glaser
Chief Architect, Seme4 Limited
18 Soho Square, LONDON W1D 3QL
Mobile: +44 7595334155
Main: +44 2070601590
hugh.gla...@seme4.com
www.seme4.com
Seme4 - the experts in semantic web and linked data applications

--
Hugh Glaser, Intelligence, Agents, Multimedia
School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
Re: Get your dataset on the next LOD cloud diagram
On 7/12/11 21:33, Giovanni Tummarello wrote:

Hi, out of curiosity: will you be taking off the diagram those that are NOT online regularly?

How about marking them as having one or more of the following:

1. A dump is available upon request to email
2. A dump is online at URL
3. A SPARQL endpoint is available at URL
4. A sitemap is available at URL

Of course one might qualify availability/reliability as attributes of 2. - 4., but the existence of a linked dataset shouldn't imply it being available online on a 24/7/36[45] basis.

Yrjänä

Gio

On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote:

Dear fellow Linked Open Data publishers and consumers,

We are in the process of regenerating the next LOD cloud diagram and associated statistics [1].

We would like to invite those of you who publish data sets as Linked Data to join the other ~2000 data sets already in CKAN ( http://ckan.net ) to help us extend the list of ~300 candidates to the LOD cloud diagram. For those of you that already have entries on CKAN, we ask you to please review and update your entries accordingly. Please finalize your dataset descriptions by the end of this week to ensure that your entry will be considered for this round of the diagram.

We will be analyzing all data sets tagged with lod in CKAN from the perspective of a data consumer, looking for best practices that make it easier to access, understand and use your data. The compliance with the best practices will be checked manually and with scripts that download and analyze data from the data sources. Therefore it is important that you provide as much information as possible in your CKAN entry. You can use the CKAN entry for DBpedia as one example:
http://ckan.net/package/dbpedia

In order to aid you in this quest, we have provided a validation page for your CKAN entry with step-by-step guidance for the information that we will be looking for:
http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

After you have completed the description of your data sets, we invite you to fill out this 5-minute survey about your experience. This will help us to make the process easier, more complete and exciting for the next time around.
http://www.surveymonkey.com/s/TDS3TML

Thank you and happy dataset description!

Cheers,
Pablo, Anja, Richard and Chris

[1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/

--
Mr. Yrjana Rankka | gh...@openlinksw.com
Developer, Virtuoso Team | http://www.openlinksw.com
| Making Technology Work For You
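
Yrjänä's four markers could be recorded per dataset roughly as follows. This is only a sketch: the class name, the field names and the example.org URLs are hypothetical and do not correspond to any CKAN field or tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessModes:
    """Hypothetical record of how a dataset can be obtained (names are illustrative)."""
    dump_request_email: Optional[str] = None  # 1. dump available upon request to email
    dump_url: Optional[str] = None            # 2. dump online at URL
    sparql_endpoint: Optional[str] = None     # 3. SPARQL endpoint available at URL
    sitemap_url: Optional[str] = None         # 4. sitemap available at URL

    def markers(self) -> List[int]:
        """Return the numbers (1-4) of the markers that apply to this dataset."""
        values = [self.dump_request_email, self.dump_url,
                  self.sparql_endpoint, self.sitemap_url]
        return [number for number, value in enumerate(values, start=1) if value]


# Invented example: a dataset with an online dump and a SPARQL endpoint.
example = AccessModes(dump_url="http://example.org/dump.nt.gz",
                      sparql_endpoint="http://example.org/sparql")
print(example.markers())
```

Availability or reliability figures, as suggested for markers 2 to 4, would be separate attributes attached to those three fields; they are left out here.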
Re: Get your dataset on the next LOD cloud diagram
Hi,

On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:

Dear fellow Linked Open Data publishers and consumers, We are in the process of regenerating the next LOD cloud diagram and associated statistics [1]. ...

This email prompted a discussion about how the data collection or diagram could be improved or updated. As CKAN is an open platform and anyone can add additional tags to datasets, why doesn't everyone who is interested in seeing a particular improvement or alternate view of the data just go ahead and do it? There's no need to require all this to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would be worthwhile, but it wouldn't be hard to, e.g., tag datasets based on whether their Linked Data or SPARQL endpoint is available regularly, whether they're currently maintained, or (my current bugbear) whether the data dumps they publish parse with more than one tool chain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,
L.

--
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com
Talis Systems Ltd, 43 Temple Row, Birmingham B2 5LS
Re: Get your dataset on the next LOD cloud diagram
Hi LODers,

The Web of Data is by definition an uncontrolled environment, and by nature constantly evolving. In this respect the cloud diagram is, in my opinion, a snapshot of the LOD at a particular moment. The last version is almost unreadable on A4 paper, and we have passed the era of "the more datasets we have, the better". After the *expansion era*, now it's time for the *quality and reliability era* :)

In this context, a dead dataset has no place.

(i) By dead dataset I also mean a dataset which is not maintained anymore.
(ii) By dead dataset I mean a dataset which is accessible neither via a dump nor via an endpoint.

(i) may be solved by asking data providers, just like for a paper submission, to update their CKAN dataset profile page for the new cloud diagram release...
(ii) may be solved by filtering out, from the CKAN dataset collection, those which have not been available (neither dump nor endpoint) for the last month.

If this suggestion makes sense, I could help you on the last point by giving you SPARQL endpoint availability since last month: http://bit.ly/dVztWw

Additionally, some cloud variants could be generated, or the SVG file could be released so that anyone may contribute a particular view of the cloud...

Pierre-Yves Vandenbussche.
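
Pierre-Yves's criterion (ii), a dataset reachable through neither dump nor endpoint for the last month, can be sketched as a small filter. The function name, the idea of keeping a "last successful check" date per access route, and the 30-day default are assumptions made for illustration; nothing here reflects how the Mondeca service or CKAN actually store availability.

```python
from datetime import date, timedelta
from typing import Optional


def is_dead(last_dump_ok: Optional[date],
            last_endpoint_ok: Optional[date],
            today: date,
            window_days: int = 30) -> bool:
    """True when neither the dump nor the endpoint was reachable within the window.

    Each "last ok" argument is the date of the most recent successful check,
    or None if the dataset never had that access route or it never worked.
    """
    cutoff = today - timedelta(days=window_days)

    def seen_recently(last_ok: Optional[date]) -> bool:
        return last_ok is not None and last_ok >= cutoff

    return not (seen_recently(last_dump_ok) or seen_recently(last_endpoint_ok))


today = date(2011, 7, 13)
print(is_dead(None, date(2011, 7, 1), today))              # endpoint seen recently
print(is_dead(date(2011, 5, 1), date(2011, 6, 1), today))  # nothing seen in the window
```

Setting `window_days` to roughly 90 gives the 3+ months threshold also mentioned in the thread. Criterion (i), whether a dataset is still maintained, is not something this check can detect.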
Re: Get your dataset on the next LOD cloud diagram
Re. availability, just a reminder of the SPARQL Endpoints Status service:
http://labs.mondeca.com/sparqlEndpointsStatus/index.html

As of today 80% (192/240) of the endpoints registered at CKAN are up and running. Monitor the grey dots (still alive?) for datasets that are candidates for having passed away ...

Bernard

--
Bernard Vatant
Senior Consultant, Vocabulary & Data Integration
Tel: +33 (0) 971 488 459
Mail: bernard.vat...@mondeca.com
Mondeca, 3, cité Nollez, 75018 Paris, France
Web: http://www.mondeca.com
Blog: http://mondeca.wordpress.com
Re: Get your dataset on the next LOD cloud diagram
Hi,

On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote:

Re. availability, just a reminder of SPARQL Endpoints Status service http://labs.mondeca.com/sparqlEndpointsStatus/index.html As of today 80% (192/240) endpoints registered at CKAN are up and running. Monitor grey dots (still alive?) for candidate passed out datasets ...

Well, as Kingsley pointed out, SPARQL is only one metric. Whether the URIs still resolve is arguably most important for the Linked Data diagram, but service availability is a good thing to monitor.

However, it's also worth noting that there are mirrors of a number of datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud, some of which are mirrors. Not all (any?) of those SPARQL endpoints are on your list.

Cheers,
L.

--
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com
Talis Systems Ltd, 43 Temple Row, Birmingham B2 5LS
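
A check of "whether the URIs still resolve" might look like the sketch below, which separates building the request from judging the response so that the logic can be read without network access. The `Accept` value and the rule that any 2xx or 3xx status counts as resolving are assumptions for illustration, not a definition anyone in this thread agreed on.

```python
import urllib.request

# Assumed preference order for RDF serialisations; adjust to taste.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"


def build_probe(uri: str) -> urllib.request.Request:
    """Prepare a GET request that asks for an RDF representation of the URI."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def resolves(status: int) -> bool:
    """Treat 2xx and 3xx (for example a 303 See Other redirect) as 'still resolves'."""
    return 200 <= status < 400


probe = build_probe("http://example.org/resource/thing")  # hypothetical URI
print(probe.get_method(), probe.get_header("Accept"))
print(resolves(303), resolves(404), resolves(500))
```

Sending the request (for instance with `urllib.request.urlopen`) and sampling several URIs per dataset are left out; a single URI is a weak indicator for a whole dataset.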
Re: Get your dataset on the next LOD cloud diagram
On Wed, Jul 13, 2011 at 1:05 PM, Bernard Vatant bernard.vat...@mondeca.com wrote:

Re. availability, just a reminder of SPARQL Endpoints Status service http://labs.mondeca.com/sparqlEndpointsStatus/index.html As of today 80% (192/240) endpoints registered at CKAN are up and running. Monitor grey dots (still alive?) for candidate passed out datasets ...

Just a small note on that: it looks like the SWI-Prolog SPARQL endpoints show up as grey dots because SWI-Prolog, by default, returns a 500 on the endpoint URI if the query parameter is not set. So, for example, the John Peel DBTune dataset *is* actually alive and well, e.g.
http://dbtune.org/bbc/peel/producer/e5826379ace5151894a6456d69fd1e41

Best,
y
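
The SWI-Prolog observation suggests that an availability monitor should send an actual query rather than fetch the bare endpoint URI. A minimal sketch of building such a probe URL follows; the `ASK` query and the example.org endpoint are assumptions chosen for illustration, and this is not how the Mondeca service is known to work.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# A cheap query that any non-empty store should be able to answer.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def probe_url(endpoint: str, query: str = PROBE_QUERY) -> str:
    """Append a 'query' parameter to a SPARQL endpoint URL, keeping existing parameters."""
    parts = urlsplit(endpoint)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("query", query))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(probe_url("http://example.org/sparql"))  # hypothetical endpoint
```

An endpoint that answers this URL with a 2xx status would then be counted as alive even if the bare endpoint URI returns an error.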
Re: Get your dataset on the next LOD cloud diagram
On 7/13/11 11:36 AM, Hugh Glaser wrote:

If a dataset has not been available in any Linked Data way for 3+ months it should not appear on the cloud we show people, coloured or not. It is no more. It has ceased to be. It has expired and gone to meet its maker. If it had not been nailed to the cloud it would have fallen off. It is an ex-dataset.

We could have another diagram of datasets whose metadata processes are now history and have shuffled off this mortal coil, run down the SPARQL endpoint and joined the bleeding choir invisible, but that would only be useful as an historic document.
http://dbpedia.org/resource/Dead_Parrot_sketch

At this stage it is all about quality, not quantity. In fact 3+ months seems a long time - I would go for 1 month.

Okay! +1

Kingsley

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
Hi all,

On 13 July 2011 14:11, Leigh Dodds leigh.do...@talis.com wrote:

Hi, On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote: Re. availability, just a reminder of SPARQL Endpoints Status service http://labs.mondeca.com/sparqlEndpointsStatus/index.html As of today 80% (192/240) endpoints registered at CKAN are up and running. Monitor grey dots (still alive?) for candidate passed out datasets ...

Well as Kingsley pointed out SPARQL is only one metric. Whether the URIs still resolve is arguably most important for the Linked Data diagram, but service availability is a good thing to monitor.

+1 to Kingsley and Leigh's comments about the (questionable) value of the existence/availability of SPARQL endpoints as a measure of the aliveness of a data set.

Availability of dumps is also a questionable metric, as many data sets will never have one. This may be because the data set is implemented as a wrapper around another API (so never fully materialised), or because the licensing terms of the data are not amenable to sharing dumps.

On that note, many people talk about the LOD Cloud (rather than the more general Linked Data Cloud), but as Leigh demonstrated back in 2009 there is little clarity about the licensing terms of many of the data sets in the cloud, and I doubt the situation has changed much in the last two years; i.e. there's no guarantee of the O in LOD. More licensing clarity is important if we're expecting people to reuse our data, but universal openness of Linked Data in the Web is unrealistic for the foreseeable future, likewise universal availability of dumps.

Cheers,
Tom.

--
Dr Tom Heath
Lead Researcher, Talis Systems Ltd
W: http://www.talis.com/
W: http://tomheath.com/id/me
Talis Systems Ltd is a company registered in England and Wales. Registered number: 07196440. Registered office: 43 Temple Row, Birmingham, B2 5LS, United Kingdom.
Re: Get your dataset on the next LOD cloud diagram
On 7/13/11 12:00 PM, Leigh Dodds wrote:

This email prompted a discussion about how the data collection or diagram could be improved or updated. As CKAN is an open platform and anyone can add additional tags to datasets, why doesn't everyone who is interested in seeing a particular improvement or alternate view of the data just go ahead and do it? There's no need to require all this to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would be worthwhile, but it wouldn't be hard to, e.g., tag datasets based on whether their Linked Data or SPARQL endpoint is available regularly, whether they're currently maintained, or (my current bugbear) whether the data dumps they publish parse with more than one tool chain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,
L.

+1

There should be multiple clouds by now. Linked Data isn't monolithic or centralized. I encourage others to make other clouds, especially a dynamic cloud that reflects the state of play in close to real time :-)

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
On 7/13/11 1:11 PM, Leigh Dodds wrote:

Well as Kingsley pointed out SPARQL is only one metric. Whether the URIs still resolve is arguably most important for the Linked Data diagram, but service availability is a good thing to monitor.

However it's also worth noting that there are mirrors of a number of datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud, some of which are mirrors. Not all (any?) of those SPARQL endpoints are on your list.

Cheers,
L.

Leigh,

Can you ping me or reply to this list with a list of the missing SPARQL endpoints? Alternatively, you can bookmark them on del.icio.us using the tag sparql_endpoint. Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
Hi,

On 13 July 2011 14:30, Kingsley Idehen kide...@openlinksw.com wrote:

Can you ping me or reply to this list with a list of missing SPARQL endpoints. Alternatively, you bookmark them on del.icio.us using tag: sparql_endpoint. Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

The data is all in a machine-readable form. See:
http://data.kasabi.com/datasets

The URI supports conneg, so you can follow rdfs:seeAlso links to all of the VoID descriptions and hence to the SPARQL endpoints, plus all of the other APIs.

It'd be nice if the LD cloud diagram used other machine-readable sources where possible. I know CKAN is a good focal point for helping curate activity, but it is also frustrating to have to copy data around, whether manually or otherwise.

Cheers,
L.

--
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com
Talis Systems Ltd, 43 Temple Row, Birmingham B2 5LS
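
Once a VoID description has been fetched, pulling out the SPARQL endpoint URLs is a matter of looking for the `void:sparqlEndpoint` property. The sketch below does this over N-Triples text with a simplified regular expression that only handles triples whose three terms are all URIs. The sample data is invented, and whether Kasabi serves N-Triples or uses exactly this shape is an assumption; a real consumer would use a proper RDF parser.

```python
import re
from typing import List

# Property defined by the VoID vocabulary for a dataset's SPARQL endpoint.
VOID_SPARQL_ENDPOINT = "http://rdfs.org/ns/void#sparqlEndpoint"

# Simplified N-Triples pattern: subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def sparql_endpoints(ntriples: str) -> List[str]:
    """Return the objects of all void:sparqlEndpoint triples, in document order."""
    endpoints = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == VOID_SPARQL_ENDPOINT:
            endpoints.append(match.group(3))
    return endpoints


# Invented sample description, not real Kasabi data.
SAMPLE = """\
<http://example.org/dataset/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<http://example.org/dataset/a> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/dataset/a/sparql> .
"""
print(sparql_endpoints(SAMPLE))
```

Running this over every description linked via rdfs:seeAlso would produce a flat list of endpoint URLs without anyone having to drill into each dataset by hand.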
Re: Get your dataset on the next LOD cloud diagram
On 7/13/11 2:34 PM, Leigh Dodds wrote:

The data is all in a machine-readable form. See: http://data.kasabi.com/datasets

The URI supports conneg so you can follow rdfs:seeAlso links to all of the VoID descriptions and hence to the sparql endpoints, plus all of the other APIs.

It'd be nice if the LD cloud diagram used other machine-readable sources where possible. I know CKAN is a good focal point for helping curate activity, but also frustrating to have to copy data around whether manually or otherwise.

Cheers,
L.

Leigh,

I am seeking SPARQL endpoint URLs. Save me drilling down to each endpoint for each dataset :-)

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
Hi, out of curiosity: will you be taking off the diagram those that are NOT online regularly?

Gio

On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote:

Dear fellow Linked Open Data publishers and consumers, We are in the process of regenerating the next LOD cloud diagram and associated statistics [1]. ...
Re: Get your dataset on the next LOD cloud diagram
Giovanni,

Thanks for helping to build a preemptive QA for dataset providers.

First, there is no 'taking off'. The 2007...2010 versions will remain online forever with the help of the web archive.

Second, I assume you refer to the relatively short time span between my message to the list and the desired date for finishing the entries for the new release. As you know, CKAN has been used for a while as a catalog for keeping updated entries, as well as the source for generating the diagram. The "state of the lod" page has instructions for adding yourself to the cloud, and they are online 24x7. So, in comparison to all the time that people have had to update their entries, my request to complete the updates within this week may sound a bit anxious.

Dataset providers that cannot meet the 'deadline' for this release can update their entries in their own time frame and ensure their appearance in the next release. Other providers that feel they should appear in this release but cannot meet the deadline can protest directly to my mailbox, and I will do my best to accommodate everybody's needs.

Cheers,
Pablo

On Jul 12, 2011 9:33 PM, Giovanni Tummarello giovanni.tummare...@deri.org wrote:
Re: Get your dataset on the next LOD cloud diagram
I meant a much simpler and more significant thing. Go to CKAN, click on the LOD tag, then start clicking around the datasets. Many don't work, are offline, etc. They have been for weeks or months.

Are you checking these and removing them from the new LOD diagram, or will the LOD diagram just grow regardless of reality?

Thanks,
Gio

Second, I assume you refer to the relatively short time span between my message to the list and the desired date for finishing the entries for the new release.
Re: Get your dataset on the next LOD cloud diagram
Many don't work, are offline, etc. They have been for weeks or months. Are you checking these and removing them from the new LOD diagram, or will the LOD diagram just grow regardless of reality?

Fair point, Giovanni.

Pablo, I /believe/ Giovanni is referring to a recent related experiment [1] by Mondeca on SPARQL endpoint availability for CKAN resources. The result was that some SPARQL endpoints were down for a considerable period of time, which effectively makes their usage unreliable.

Best,
Tom

[1] http://labs.mondeca.com/sparqlEndpointsStatus/index.html

--
Thomas Steiner, Research Scientist, Google Inc.
http://blog.tomayac.com, http://twitter.com/tomayac
Re: Get your dataset on the next LOD cloud diagram
Hi Giovanni, Will you be taking off the diagram those that are NOT online regularly? could you please be a bit more precise and clearly say which datasets you are talking about. Which datasets do not provide dereferencable URIs anymore? (Linked Data and the LOD diagram is not about SPARQL endpoints) A constructive approach, which I guess would be highly appreciated by the community, would be that you directly mark these datasets on CKAN using the tags that are proposed at the end of this page http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation Cheers, Chris Hi out of curiousity Will you be taking off the diagram those that are NOT online regularly? Gio On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote: Dear fellow Linked Open Data publishers and consumers, We are in the process of regenerating the next LOD cloud diagram and associated statistics [1]. We would like to invite those of you who publish data sets as Linked Data to join the other ~2000 data sets already in CKAN ( http://ckan.net ) to help us extend the list of ~300 candidates to the LOD cloud diagram. For those of you that already have entries on CKAN, we ask you to please review and update your entries accordingly. Please finalize your dataset descriptions until the end of this week to ensure that your entry will be considered for this round of the diagram. We will be analyzing all data sets tagged with lod in CKAN from the perspective of a data consumer, looking for best practices that make it easier to access, understand and use your data. The compliance with the best practices will be checked manually and with scripts that download and analyze data from the data sources. Therefore it is important that you provide as much information as possible in your CKAN entry. 
You can use the CKAN entry for DBpedia as one example: http://ckan.net/package/dbpedia In order to aid you in this quest, we have provided a validation page for your CKAN entry with step-by-step guidance on the information that we will be looking for: http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/ After you have completed the description of your data sets, we invite you to fill out this 5-minute survey about your experience. This will help us to make the process easier, more complete and more exciting the next time around. http://www.surveymonkey.com/s/TDS3TML Thank you and happy dataset description! Cheers, Pablo, Anja, Richard and Chris [1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/
Re: Get your dataset on the next LOD cloud diagram
I would think that Giovanni is referring to the public CKAN SPARQL endpoint tracker: http://labs.mondeca.com/sparqlEndpointsStatus/ I would also recommend removing the endpoints that do show significant downtime, such as 100% downtime over a consecutive period of more than 3 months. Marco -- Marco Neumann KONA --- Join us at the Semantic Web Media Summit in New York City for an exciting event on 14 September 2011 http://www.lotico.com/evt/swmsNYC2011/ On Tue, Jul 12, 2011 at 6:05 PM, bi...@zedat.fu-berlin.de wrote: Hi Giovanni, Will you be taking off the diagram those that are NOT online regularly? Could you please be a bit more precise and clearly say which datasets you are talking about? Which datasets do not provide dereferenceable URIs anymore? (Linked Data and the LOD diagram are not about SPARQL endpoints.) A constructive approach, which I guess would be highly appreciated by the community, would be for you to mark these datasets directly on CKAN using the tags that are proposed at the end of this page: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation Cheers, Chris
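The removal rule suggested in this message (100% downtime over a consecutive period of more than three months) could be checked mechanically against a log of availability probes. What follows is only a minimal sketch under stated assumptions: the log format of `(date, is_up)` pairs, the 90-day threshold, and the function names `longest_outage_days` and `should_drop` are all hypothetical and are not taken from the Mondeca tracker or the LOD cloud scripts.

```python
from datetime import date, timedelta


def longest_outage_days(probes):
    """Return the length in days of the longest uninterrupted outage.

    `probes` is an iterable of (date, is_up) pairs (a hypothetical log
    format). An outage runs from the first failed probe to the last failed
    probe with no successful probe in between.
    """
    longest = 0
    run_start = None
    for day, is_up in sorted(probes):
        if is_up:
            # A successful probe ends the current outage, if any.
            run_start = None
        else:
            if run_start is None:
                run_start = day
            longest = max(longest, (day - run_start).days + 1)
    return longest


def should_drop(probes, max_outage_days=90):
    """Apply the suggested rule: drop after more than ~3 months of downtime."""
    return longest_outage_days(probes) > max_outage_days


# Illustrative data only: one probe per day for 100 days.
start = date(2011, 4, 1)
always_down = [(start + timedelta(days=i), False) for i in range(100)]
flaky = [(start + timedelta(days=i), i % 10 != 0) for i in range(100)]

print(should_drop(always_down), should_drop(flaky))
```

Counting calendar days between the first and last failed probe, rather than counting probes, keeps the rule meaningful even if the probing frequency changes.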
Re: Get your dataset on the next LOD cloud diagram
Thanks, Thomas. Giovanni, it was a coreference resolution problem on my side. You meant 'they'=datasets and I read 'they'=people. It was in any case a question that could have come up, and it's (hopefully) clearer now. Sorry for the confusion. Now to the intended question. I will discuss the issue of availability with my colleagues. But my personal opinion is that availability is an important quality indicator, and should be incorporated if feasible with respect to time and resource availability. Could we perhaps have others (e.g. Sindice, Openlink cloud cache, etc.) also providing their assessment of this specific indicator? It sounds like it's of shared interest and could benefit from multiple independent assessments. What do you think? Cheers, Pablo On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:
Re: Get your dataset on the next LOD cloud diagram
Hi Chris, Could you please be a bit more precise and clearly say which datasets you are talking about? One example is Semantic Crunchbase (http://cb.semsol.org/). Which datasets do not provide dereferenceable URIs anymore? None of those seem to work any longer (try any of http://www.google.com/?q=site:http://cb.semsol.org/company%20filetype:rdf). (Linked Data and the LOD diagram are not about SPARQL endpoints.) Fair enough. It is related though, according to TimBL's Linked Data principle #3 (http://www.w3.org/DesignIssues/LinkedData.html). A constructive approach, which I guess would be highly appreciated by the community, would be for you to mark these datasets directly on CKAN using the tags that are proposed at the end of this page: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation I have added the lodcloud.needsfixing tag on the Semantic Crunchbase CKAN page (http://ckan.net/package/semsol-crunchbase) plus a (maybe) helpful note with a statement from the dataset's maintainer. Best, Tom -- Thomas Steiner, Research Scientist, Google Inc. http://blog.tomayac.com, http://twitter.com/tomayac
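The manual check described in this message (does a dataset URI still return RDF?) can be approximated with an HTTP request that asks for RDF through content negotiation. This is a minimal sketch using only the Python standard library; the list of accepted media types and the function names `looks_like_rdf` and `dereferences_to_rdf` are assumptions made for illustration, and data embedded in HTML (e.g. RDFa) is deliberately not counted here.

```python
import urllib.error
import urllib.request

# Media types treated as "RDF" in this sketch (an assumption, not a standard list).
RDF_TYPES = (
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "text/n3",
)


def looks_like_rdf(status, content_type):
    """Return True for a 2xx response whose media type is in RDF_TYPES."""
    media_type = (content_type or "").split(";")[0].strip().lower()
    return 200 <= status < 300 and media_type in RDF_TYPES


def dereferences_to_rdf(uri, timeout=10):
    """Fetch `uri` asking for RDF; urllib follows redirects by default."""
    request = urllib.request.Request(uri, headers={"Accept": ", ".join(RDF_TYPES)})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_rdf(
                response.status, response.headers.get("Content-Type")
            )
    except (urllib.error.URLError, OSError):
        # DNS failures, timeouts, and HTTP errors such as 404 or 500 all
        # count as "not dereferenceable" for this purpose.
        return False
```

Keeping the classification in `looks_like_rdf` separate from the network call makes the rule easy to adjust, for instance to accept further serializations, without touching the fetching code.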
Re: Get your dataset on the next LOD cloud diagram
On 7/12/11 11:21 PM, Pablo Mendes wrote: Thanks, Thomas. Giovanni, it was a coreference resolution problem on my side. You meant 'they'=datasets and I read 'they'=people. It was in any case a question that could have come up, and it's (hopefully) clearer now. Sorry for the confusion. Now to the intended question. I will discuss the issue of availability with my colleagues. But my personal opinion is that availability is an important quality indicator, and should be incorporated if feasible with respect to time and resource availability. Could we perhaps have others (e.g. Sindice, Openlink cloud cache, etc.) also providing their assessment of this specific indicator? It sounds like it's of shared interest and could benefit from multiple independent assessments. What do you think? Cheers, Pablo On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote: Datasets that are inaccessible for large amounts of time (e.g., 3+ months) ultimately undermine the LOD cloud. Rather than removing a dataset, why not color code LOD cloud bubbles using the same color scheme from: http://labs.mondeca.com/sparqlEndpointsStatus/index.html, if possible? For better or for worse, the LOD cloud pictorial is now a staple re. Linked Data marketing comms. collateral. -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
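The color-coding idea above amounts to mapping each dataset's measured availability to a bubble color. A minimal sketch follows; the thresholds and color names are hypothetical placeholders and do not reproduce the color scheme actually used on the Mondeca status page.

```python
def bubble_color(uptime_ratio):
    """Map an uptime ratio in [0, 1] to a color name (hypothetical thresholds)."""
    if not 0.0 <= uptime_ratio <= 1.0:
        raise ValueError("uptime_ratio must be between 0 and 1")
    if uptime_ratio >= 0.95:
        return "green"   # reliably available
    if uptime_ratio >= 0.50:
        return "orange"  # frequently unavailable
    if uptime_ratio > 0.0:
        return "red"     # mostly unavailable
    return "gray"        # never reachable during the observation window


for ratio in (1.0, 0.7, 0.1, 0.0):
    print(ratio, bubble_color(ratio))
```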
Re: Get your dataset on the next LOD cloud diagram
Chris, I am not interested in the specific content of the diagram; rather, I am interested in understanding its value, which depends on the method you are going to follow in the update. You are answering this by saying, basically, that there won't be a check for old, dead datasets. I admit I have never looked at this closely, but I think I can't be the only one thinking it's a bit of a joke if we're telling people to publish data in a way that doesn't even have a way to know whether the data is there or not. Please note that I am trying to be constructive by suggesting that the diagram be made to mean something one can rely on, e.g. "let me go see the latest diagram so that I can ...". A suggestion in this sense could be to require that Linked Data entries in CKAN give URIs with sample data, and that sites expose either dumps or a sitemap (so that they can be collected), etc. Cheers, Gio and uselessness of the initiative, of the diagram of ckan and more. On Wed, Jul 13, 2011 at 12:05 AM, bi...@zedat.fu-berlin.de wrote: Hi Giovanni, Will you be taking off the diagram those that are NOT online regularly? Could you please be a bit more precise and clearly say which datasets you are talking about? Which datasets do not provide dereferenceable URIs anymore? (Linked Data and the LOD diagram are not about SPARQL endpoints.) A constructive approach, which I guess would be highly appreciated by the community, would be for you to mark these datasets directly on CKAN using the tags that are proposed at the end of this page: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation Cheers, Chris
Re: Get your dataset on the next LOD cloud diagram
On 7/12/11 11:33 PM, Thomas Steiner wrote: (Linked Data and the LOD diagram are not about SPARQL endpoints.) Fair enough. It is related though, according to TimBL's Linked Data principle #3 (http://www.w3.org/DesignIssues/LinkedData.html). Really got to be careful there. SPARQL and RDF are implementation details re. Linked Data. Note, in the original Linked Data meme, point #3 read: When someone looks up a URI, provide useful information. As exemplified by this post and in many other conversations, the addition of: using the standards (RDF*, SPARQL) is a regressive update of a GOLDEN meme. Net effect has been to inject inertia into Linked Data's adoption curve. If you are seeking stats re. what I mean re. inertia, just keep track of what's happening on the schema.org front re. adoption curve. -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Get your dataset on the next LOD cloud diagram
If you are seeking stats re. what I mean re. inertia, just keep track of what's happening on the schema.org front re. adoption curve. Here are 100+ datasets: http://sindice.com/search?q=schemanq=fq=class%3Ahttp%3A%2F%2Fschema.org%2F*sortbydate=1facet.field=domaininterface=guru We started collecting 2 weeks ago, and we did NOT reanalyze/recrawl previously known sites ATM. How fair it is to call them datasets rather than marked-up pages is up for discussion, possibly a reasonably interesting one. Gio