Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Kingsley Idehen

On 7/13/11 12:04 AM, Giovanni Tummarello wrote:

If you are seeking stats on what I mean by inertia, just keep track of
what's happening on the schema.org front re. the adoption curve.


  here are 100+ datasets

http://sindice.com/search?q=schemanq=fq=class%3Ahttp%3A%2F%2Fschema.org%2F*sortbydate=1facet.field=domaininterface=guru

We started collecting 2 weeks ago and have NOT reanalyzed/recrawled
previously known sites so far. How fair it is to call them datasets
rather than marked-up pages is up for discussion, possibly a
reasonably interesting one.

Gio




Re. Linked Data: a dataset has to be a collection of data objects
endowed with URIs that resolve to human- and machine-decipherable
representations of their referents. Each representation takes the form of an
EAV/SPO triples-based directed graph pictorial.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Hugh Glaser
If a dataset has not been available in any Linked Data way for 3+ months it 
should not appear on the cloud we show people, coloured or not.

It is no more. It has ceased to be. It has expired and gone to meet its maker. 
If it had not been nailed to the cloud it would have fallen off. It is an 
ex-dataset.
We could have another diagram of datasets whose metadata processes are now 
history and have shuffled off this mortal coil, run down the SPARQL endpoint 
and joined the bleeding choir invisible, but that would only be useful as an 
historic document.
http://dbpedia.org/resource/Dead_Parrot_sketch

At this stage it is all about quality, not quantity.
In fact 3+ months seems a long time - I would go for 1 month.

Best
Hugh

On 12 Jul 2011, at 23:38, Kingsley Idehen wrote:

 On 7/12/11 11:21 PM, Pablo Mendes wrote:
 Thanks, Thomas.
 
 Giovanni, it was a coreference resolution problem from my side. You meant 
 'they'=datasets and I read 'they'=people. It was anyhow a possible question 
 to come by and it's (hopefully) clearer now. Sorry for the confusion.
 
 Now to the intended question.
 I will discuss the issue of availability with my colleagues. But my personal 
 opinion is that availability is an important quality indicator, and should 
 be incorporated if feasible wrt time and resource availability. Could we 
 perhaps have others (e.g. Sindice, Openlink cloud cache, etc.) also 
 providing their assessment of this specific indicator? It sounds like it's 
 of shared interest and could benefit from multiple independent assessments.
 
 What do you think?
 
 Cheers,
 Pablo
 
 On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:
 
 Datasets that are inaccessible for large amounts of time (e.g., 3+ months) 
 ultimately undermine the LOD cloud. Rather than removing a dataset, why not 
 color code LOD cloud bubbles using the same color scheme from: 
 http://labs.mondeca.com/sparqlEndpointsStatus/index.html, if possible?
 
 For better or for worse, the LOD cloud pictorial is now a staple re. Linked 
 Data marketing comms. collateral. 
 -- 
 
 Regards,
 
 Kingsley Idehen 
 President & CEO 
 OpenLink Software 
 Web: 
 http://www.openlinksw.com
 
 Weblog: 
 http://www.openlinksw.com/blog/~kidehen
 
 Twitter/Identi.ca: kidehen 
 
 
 
 
 


--
Hugh Glaser
Chief Architect
Seme4 Limited
18 Soho Square
LONDON
W1D 3QL
Mobile: +44 7595334155
Main: +44 2070601590

hugh.gla...@seme4.com
www.seme4.com

Seme4 - the experts in semantic web and linked data applications

Notice of Confidentiality. This e-mail message (including any attached
documents) is proprietary and confidential to Seme4 Limited and/or its
affiliates and may contain legally privileged information. It is intended
for the named recipient(s) only. If you are not the intended recipient,
you may not review, retain, copy or distribute this message and we ask you
to notify the sender immediately, then delete this message from your
system. Thank you for your cooperation.

-- 
Hugh Glaser,  
  Intelligence, Agents, Multimedia
  School of Electronics and Computer Science,
  University of Southampton,
  Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/





Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Yrjana Rankka

On 7/12/11 21:33 , Giovanni Tummarello wrote:

Hi, out of curiosity:
Will you be taking off the diagram those that are NOT online regularly?

How about marking them as having one or more of the following:

1. A dump is available upon request to email
2. A dump is online at URL
3. A SPARQL endpoint available at URL
4. Sitemap available at URL

Of course one might qualify availability/reliability as attributes to 2. 
- 4. but existence of a linked dataset shouldn't imply it being 
available online on a 24/7/36[45] basis.
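
A minimal sketch of how the four markers above could be recorded per dataset. The `Access` enum, the `DatasetEntry` class and the example.org values are hypothetical names chosen for illustration; nothing here reflects how CKAN actually stores this information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Access(Enum):
    DUMP_ON_REQUEST = 1   # 1. dump available on request by email
    DUMP_ONLINE = 2       # 2. dump online at a URL
    SPARQL_ENDPOINT = 3   # 3. SPARQL endpoint at a URL
    SITEMAP = 4           # 4. sitemap at a URL


@dataclass
class DatasetEntry:
    name: str
    # Maps each access method to where it can be reached (email or URL).
    access: Dict[Access, str] = field(default_factory=dict)

    def online_methods(self):
        """Access methods that point at a URL (items 2 to 4), in list order."""
        online = (m for m in self.access if m is not Access.DUMP_ON_REQUEST)
        return sorted(online, key=lambda m: m.value)


# Hypothetical entry: a dataset with an email contact and a sitemap only.
entry = DatasetEntry("example-dataset", {
    Access.DUMP_ON_REQUEST: "mailto:owner@example.org",
    Access.SITEMAP: "http://example.org/sitemap.xml",
})
print([m.name for m in entry.online_methods()])
```

Availability or reliability figures would then attach only to the methods returned by `online_methods()`, which matches the point that a dataset can exist without being online around the clock.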


Yrjänä


Gio

On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote:

Dear fellow Linked Open Data publishers and consumers,
We are in the process of regenerating the next LOD cloud diagram and
associated statistics [1]. We would like to invite those of you who publish
data sets as Linked Data to join the other ~2000 data sets already in CKAN (
http://ckan.net ) to help us extend the list of ~300 candidates to the LOD
cloud diagram. For those of you that already have entries on CKAN, we ask
you to please review and update your entries accordingly. Please finalize
your dataset descriptions by the end of this week to ensure that your
entry will be considered for this round of the diagram.

We will be analyzing all data sets tagged with lod in CKAN from the
perspective of a data consumer, looking for best practices that make it
easier to access, understand and use your data. The compliance with the best
practices will be checked manually and with scripts that download and
analyze data from the data sources. Therefore it is important that you
provide as much information as possible in your CKAN entry.

You can use the CKAN entry for DBpedia as one example:
http://ckan.net/package/dbpedia

In order to aid you in this quest, we have provided a validation page for
your CKAN entry with step-by-step guidance for the information that we will
be looking for:
http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

After you have completed the description of your data sets, we invite you to
fill in this 5-minute survey about your experience. This will help us to
make the process easier, more complete and exciting for the next time
around.
http://www.surveymonkey.com/s/TDS3TML

Thank you and happy dataset description!

Cheers,
Pablo, Anja, Richard and Chris
[1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/



--
Mr. Yrjana Rankka| gh...@openlinksw.com
Developer, Virtuoso Team | http://www.openlinksw.com
 | Making Technology Work For You




Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1].
 ...

This email prompted a discussion about how the data collection or
diagram could be improved or updated. As CKAN is an open platform and
anyone can add additional tags to datasets, why doesn't everyone who
is interested in seeing a particular improvement or alternate view of
the data just go ahead and do it? There's no need to require all this
to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would
be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on
whether their Linked Data or SPARQL endpoint is available regularly,
whether they're currently maintained, or (my current bug bear) whether
the data dumps they publish parse with more than one tool chain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Pierre-Yves Vandenbussche
Hi LODers,

The Web of Data is by definition an uncontrolled environment, and by nature
constantly evolving. In this respect the cloud diagram is, in my opinion, a
snapshot of the LOD at a particular moment. The last version is almost
unreadable on A4 paper, and we are past the era of "the more datasets we have,
the better". After the *expansion era*, now it's time for the *quality and
reliability era* :) In this context, a dead dataset has no place. (i) By
dead dataset I mean a dataset which is not maintained anymore. (ii) By
dead dataset I also mean a dataset which is accessible neither via a dump nor
via an endpoint.

(i) may be solved by asking, just like a paper submission, data providers to
update their CKAN dataset profile page for the new cloud diagram release...

(ii) may be solved by filtering out, from the CKAN dataset collection, those which
have not been available (via either dump or endpoint) during the last month.
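
The filter in (ii) can be sketched as follows. This is an illustration under stated assumptions: the `stale_datasets` function, the 30-day default and the sample observations are all hypothetical, and the "last seen" dates would have to come from some monitoring source that this thread does not specify.

```python
from datetime import date, timedelta


def stale_datasets(last_seen, today, max_age_days=30):
    """Return names of datasets not reachable within the last max_age_days.

    last_seen maps a dataset name to the most recent date on which either its
    dump or its endpoint answered, or None if neither has ever answered.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, seen in last_seen.items()
        if seen is None or seen < cutoff
    )


# Hypothetical observations, not real monitoring data.
observations = {
    "example-alive": date(2011, 7, 12),
    "example-flaky": date(2011, 6, 20),
    "example-dead": date(2011, 3, 1),
    "example-never": None,
}
print(stale_datasets(observations, today=date(2011, 7, 13)))
```

Keeping the threshold as a parameter makes it easy to compare the 1-month and 3-month cut-offs proposed elsewhere in this thread.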

If this suggestion makes sense, I could help with the last point by providing
SPARQL endpoint availability data for the last month: http://bit.ly/dVztWw.

Additionally, some cloud variants could be generated, or the SVG file could be
made available so that anyone may contribute a particular view of the cloud...



Pierre-Yves Vandenbussche.


On Wed, Jul 13, 2011 at 12:52 PM, Yrjana Rankka gh...@openlinksw.com wrote:

 On 7/12/11 21:33 , Giovanni Tummarello wrote:

 Hi, out of curiosity:
 Will you be taking off the diagram those that are NOT online regularly?

 How about marking them as having one or more of the following:

 1. A dump is available upon request to email
 2. A dump is online at URL
 3. A SPARQL endpoint available at URL
 4. Sitemap available at URL

 Of course one might qualify availability/reliability as attributes to 2. -
 4. but existence of a linked dataset shouldn't imply it being available
 online on a 24/7/36[45] basis.

 Yrjänä

  Gio

 On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com
  wrote:

 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1]. We would like to invite those of you who
 publish
 data sets as Linked Data to join the other ~2000 data sets already in
 CKAN (
 http://ckan.net ) to help us extend the list of ~300 candidates to the
 LOD
 cloud diagram. For those of you that already have entries on CKAN, we ask
 you to please review and update your entries accordingly. Please finalize
 your dataset descriptions by the end of this week to ensure that your
 entry will be considered for this round of the diagram.

 We will be analyzing all data sets tagged with lod in CKAN from the
 perspective of a data consumer, looking for best practices that make it
 easier to access, understand and use your data. The compliance with the
 best
 practices will be checked manually and with scripts that download and
 analyze data from the data sources. Therefore it is important that you
 provide as much information as possible in your CKAN entry.

 You can use the CKAN entry for DBpedia as one example:
 http://ckan.net/package/dbpedia

 In order to aid you in this quest, we have provided a validation page for
 your CKAN entry with step-by-step guidance for the information that we
 will
 be looking for:
 http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

 After you have completed the description of your data sets, we invite you
 to
 fill in this 5-minute survey about your experience. This will help us to
 make the process easier, more complete and exciting for the next time
 around.
 http://www.surveymonkey.com/s/TDS3TML

 Thank you and happy dataset description!

 Cheers,
 Pablo, Anja, Richard and Chris
 [1] 
 http://www4.wiwiss.fu-berlin.de/lodcloud/state/



 --
 Mr. Yrjana Rankka| gh...@openlinksw.com
 Developer, Virtuoso Team | http://www.openlinksw.com
 | Making Technology Work For You





Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Bernard Vatant
Re. availability, just a reminder of SPARQL Endpoints Status service
http://labs.mondeca.com/sparqlEndpointsStatus/index.html
As of today 80% (192/240) endpoints registered at CKAN are up and running.
Monitor grey dots (still alive?) for candidate passed out datasets ...

Bernard

2011/7/13 Leigh Dodds leigh.do...@talis.com:
 Hi,

 On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1].
 ...

 This email prompted a discussion about how the data collection or
 diagram could be improved or updated. As CKAN is an open platform and
 anyone can add additional tags to datasets, why doesn't everyone who
 is interested in seeing a particular improvement or alternate view of
 the data just go ahead and do it? There's no need to require all this
 to be done by one team on a fixed schedule.

 Some light co-ordination between people doing similar analyses would
 be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on
 whether their Linked Data or SPARQL endpoint is available regularly,
 whether they're currently maintained, or (my current bug bear) whether
 the data dumps they publish parse with more than one tool chain.

 It'd be nice to see many different aspects of the cloud being explored.

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Mobile: 07850 928381
 http://kasabi.com
 http://talis.com

 Talis Systems Ltd
 43 Temple Row
 Birmingham
 B2 5LS





-- 
Bernard Vatant
Senior Consultant
Vocabulary & Data Integration
Tel:       +33 (0) 971 488 459
Mail:     bernard.vat...@mondeca.com

Mondeca
3, cité Nollez 75018 Paris France
Web:    http://www.mondeca.com
Blog:    http://mondeca.wordpress.com




Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote:
 Re. availability, just a reminder of SPARQL Endpoints Status service
 http://labs.mondeca.com/sparqlEndpointsStatus/index.html
 As of today 80% (192/240) endpoints registered at CKAN are up and running.
 Monitor grey dots (still alive?) for candidate passed out datasets ...

Well, as Kingsley pointed out, SPARQL is only one metric. Whether the
URIs still resolve is arguably most important for the Linked Data
diagram, but service availability is a good thing to monitor.
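
A check of whether a URI still resolves can be sketched along these lines. This is a minimal illustration, not anyone's actual monitoring code: the `Accept` header value, the timeout and the function names are assumptions, and a 2xx answer after redirects is only one possible definition of "resolves".

```python
import urllib.error
import urllib.request

# Assumed preference order; real clients may ask for other RDF syntaxes.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"


def build_request(uri):
    """Prepare a GET request that asks for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def resolves(uri, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URI answers with a 2xx status after redirects.

    urlopen follows redirects by default, so a 303 See Other from a resource
    URI to its description document is handled transparently.
    """
    try:
        with opener(build_request(uri), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


# Usage (performs a network call, so it is left commented out):
# print(resolves("http://dbpedia.org/resource/Dead_Parrot_sketch"))
```

The `opener` argument exists so the check can be exercised without network access by passing in a stand-in for `urlopen`.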

However, it's also worth noting that there are mirrors of a number of
datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud,
some of which are mirrors. Not all (any?) of those SPARQL endpoints
are on your list.

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Yves Raimond
On Wed, Jul 13, 2011 at 1:05 PM, Bernard Vatant
bernard.vat...@mondeca.com wrote:
 Re. availability, just a reminder of SPARQL Endpoints Status service
 http://labs.mondeca.com/sparqlEndpointsStatus/index.html
 As of today 80% (192/240) endpoints registered at CKAN are up and running.
 Monitor grey dots (still alive?) for candidate passed out datasets ...

Just a small note on that - it looks like the SWI-Prolog SPARQL
endpoints show up as gray dots, because SWI-Prolog, by default, responds
with an HTTP 500 on the endpoint URI if the query parameter is not set.

So for example, the John Peel DBTune dataset *is* actually alive and
well, e.g. http://dbtune.org/bbc/peel/producer/e5826379ace5151894a6456d69fd1e41
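
One way around this would be to probe an endpoint with an explicit `query` parameter rather than the bare endpoint URI. The sketch below only builds such a probe URL; the `probe_url` helper, the trivial `ASK` query and the example.org endpoint are assumptions for illustration and say nothing about how the status service actually works.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def probe_url(endpoint, query="ASK { ?s ?p ?o }"):
    """Return the endpoint URL with a SPARQL `query` parameter attached.

    Existing query-string parameters on the endpoint URL are preserved.
    """
    parts = urlsplit(endpoint)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("query", query))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical endpoint address.
print(probe_url("http://example.org/sparql"))
```

A monitor that fetches the resulting URL is asking the endpoint to do what it was built for, so a server that rejects parameterless requests is not misreported as dead.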

Best,
y


 Bernard

 2011/7/13 Leigh Dodds leigh.do...@talis.com:
 Hi,

 On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1].
 ...

 This email prompted a discussion about how the data collection or
 diagram could be improved or updated. As CKAN is an open platform and
 anyone can add additional tags to datasets, why doesn't everyone who
 is interested in seeing a particular improvement or alternate view of
 the data just go ahead and do it? There's no need to require all this
 to be done by one team on a fixed schedule.

 Some light co-ordination between people doing similar analyses would
 be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on
 whether their Linked Data or SPARQL endpoint is available regularly,
 whether they're currently maintained, or (my current bug bear) whether
 the data dumps they publish parse with more than one tool chain.

 It'd be nice to see many different aspects of the cloud being explored.

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Mobile: 07850 928381
 http://kasabi.com
 http://talis.com

 Talis Systems Ltd
 43 Temple Row
 Birmingham
 B2 5LS





 --
 Bernard Vatant
 Senior Consultant
 Vocabulary & Data Integration
 Tel:       +33 (0) 971 488 459
 Mail:     bernard.vat...@mondeca.com
 
 Mondeca
 3, cité Nollez 75018 Paris France
 Web:    http://www.mondeca.com
 Blog:    http://mondeca.wordpress.com
 





Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Kingsley Idehen

On 7/13/11 11:36 AM, Hugh Glaser wrote:

If a dataset has not been available in any Linked Data way for 3+ months it 
should not appear on the cloud we show people, coloured or not.

It is no more. It has ceased to be. It has expired and gone to meet its maker. 
If it had not been nailed to the cloud it would have fallen off. It is an 
ex-dataset.
We could have another diagram of datasets whose metadata processes are now 
history and have shuffled off this mortal coil, run down the SPARQL endpoint 
and joined the bleeding choir invisible, but that would only be useful as an 
historic document.
http://dbpedia.org/resource/Dead_Parrot_sketch

At this stage it is all about quality, not quantity.
In fact 3+ months seems a long time - I would go for 1 month.


Okay!
+1

Kingsley

Best
Hugh

On 12 Jul 2011, at 23:38, Kingsley Idehen wrote:


On 7/12/11 11:21 PM, Pablo Mendes wrote:

Thanks, Thomas.

Giovanni, it was a coreference resolution problem from my side. You meant 
'they'=datasets and I read 'they'=people. It was anyhow a possible question to 
come by and it's (hopefully) clearer now. Sorry for the confusion.

Now to the intended question.
I will discuss the issue of availability with my colleagues. But my personal 
opinion is that availability is an important quality indicator, and should be 
incorporated if feasible wrt to time and resource availability. Could we 
perhaps have others (e.g. Sindice, Openlink cloud cache, etc.) also providing 
their assessment of this specific indicator? It sounds like it's of shared 
interest and could benefit from multiple independent assessments.

What do you think?

Cheers,
Pablo

On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:

Datasets that are inaccessible for large amounts of time (e.g., 3+ months) 
ultimately undermine the LOD cloud. Rather than removing a dataset, why not 
color code LOD cloud bubbles using the same color scheme from: 
http://labs.mondeca.com/sparqlEndpointsStatus/index.html, if possible?

For better or for worse, the LOD cloud pictorial is now a staple re. Linked 
Data marketing comms. collateral.
--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web:
http://www.openlinksw.com

Weblog:
http://www.openlinksw.com/blog/~kidehen

Twitter/Identi.ca: kidehen







--
Hugh Glaser
Chief Architect
Seme4 Limited
18 Soho Square
LONDON
W1D 3QL
Mobile: +44 7595334155
Main: +44 2070601590

hugh.gla...@seme4.com
www.seme4.com

Seme4 - the experts in semantic web and linked data applications






--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Tom Heath
Hi all,

On 13 July 2011 14:11, Leigh Dodds leigh.do...@talis.com wrote:
 Hi,

 On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote:
 Re. availability, just a reminder of SPARQL Endpoints Status service
 http://labs.mondeca.com/sparqlEndpointsStatus/index.html
 As of today 80% (192/240) endpoints registered at CKAN are up and running.
 Monitor grey dots (still alive?) for candidate passed out datasets ...

 Well as Kingsley pointed out SPARQL is only one metric. Whether the
 URIs still resolve is arguably most important for the Linked Data
 diagram, but service availability is a good thing to monitor.

+1 to Kingsley and Leigh's comments about the (questionable) value of
the existence/availability of SPARQL endpoints as a measure of the
aliveness of a data set.

Availability of dumps is also a questionable metric, as many data sets
will never have one. This may be because the data set is implemented
as a wrapper around another API (so never fully materialised), or
because the licensing terms of the data are not amenable to sharing
dumps.

On that note, many people talk about the LOD Cloud (rather than the
more general Linked Data Cloud), but as Leigh demonstrated back in
2009 there is little clarity about the licensing terms of many of the
data sets in the cloud, and I doubt the situation has changed much in
the last two years; i.e. there's no guarantee of the O in LOD. More
licensing clarity is important if we're expecting people to reuse our
data, but universal openness of Linked Data in the Web is unrealistic
for the foreseeable future, likewise universal availability of dumps.

Cheers,

Tom.

-- 
Dr Tom Heath
Lead Researcher
Talis Systems Ltd
W: http://www.talis.com/
W: http://tomheath.com/id/me

Talis Systems Ltd is a company registered in England and Wales.
Registered number: 07196440. Registered office: 43 Temple Row,
Birmingham, B2 5LS, United Kingdom.



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Kingsley Idehen

On 7/13/11 12:00 PM, Leigh Dodds wrote:

Hi,

On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:

Dear fellow Linked Open Data publishers and consumers,
We are in the process of regenerating the next LOD cloud diagram and
associated statistics [1].
...

This email prompted a discussion about how the data collection or
diagram could be improved or updated. As CKAN is an open platform and
anyone can add additional tags to datasets, why doesn't everyone who
is interested in seeing a particular improvement or alternate view of
the data just go ahead and do it? There's no need to require all this
to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would
be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on
whether their Linked Data or SPARQL endpoint is available regularly,
whether they're currently maintained, or (my current bug bear) whether
the data dumps they publish parse with more than one tool chain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,

L.


+1

There should be multiple clouds by now. Linked Data isn't monolithic or 
centralized.


I encourage others to make other clouds. Especially a dynamic cloud that 
reflects the state of play in close to real-time :-)


--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Kingsley Idehen

On 7/13/11 1:11 PM, Leigh Dodds wrote:

Hi,

On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote:

Re. availability, just a reminder of SPARQL Endpoints Status service
http://labs.mondeca.com/sparqlEndpointsStatus/index.html
As of today 80% (192/240) endpoints registered at CKAN are up and running.
Monitor grey dots (still alive?) for candidate passed out datasets ...

Well as Kingsley pointed out SPARQL is only one metric. Whether the
URIs still resolve is arguably most important for the Linked Data
diagram, but service availability is a good thing to monitor.

However, it's also worth noting that there are mirrors of a number of
datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud,
some of which are mirrors. Not all (any?) of those SPARQL endpoints
are on your list.

Cheers,

L.

Leigh,

Can you ping me or reply to this list with a list of the missing SPARQL 
endpoints? Alternatively, you could bookmark them on del.icio.us using the tag: 
sparql_endpoint.


Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 14:30, Kingsley Idehen kide...@openlinksw.com wrote:
 Can you ping me or reply to this list with a list of the missing SPARQL
 endpoints? Alternatively, you could bookmark them on del.icio.us using the tag:
 sparql_endpoint.

 Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

The data is all in a machine-readable form. See:

http://data.kasabi.com/datasets

The URI supports conneg so you can follow rdfs:seeAlso links to all of
the VoiD descriptions and hence to the sparql endpoints, plus all of
the other APIs.
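
Once the VoID descriptions have been fetched, pulling out the endpoint URLs is mechanical. The sketch below assumes the descriptions are available as N-Triples and only handles statements made up of three IRIs; the `sparql_endpoints` helper and the example.org data are hypothetical, and a real consumer would more likely use a proper RDF parser.

```python
import re

VOID_SPARQL_ENDPOINT = "http://rdfs.org/ns/void#sparqlEndpoint"

# Matches N-Triples statements whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def sparql_endpoints(ntriples):
    """Map each dataset IRI to the SPARQL endpoints its VoID description lists."""
    endpoints = {}
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line)
        if match and match.group(2) == VOID_SPARQL_ENDPOINT:
            endpoints.setdefault(match.group(1), []).append(match.group(3))
    return endpoints


# Hypothetical VoID description; the example.org IRIs are placeholders.
sample = """
<http://example.org/dataset/a> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/a/sparql> .
<http://example.org/dataset/a> <http://purl.org/dc/terms/title> "Dataset A" .
<http://example.org/dataset/b> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/b/sparql> .
"""
print(sparql_endpoints(sample))
```

Statements with literal objects, such as the title above, are skipped by the pattern, which is fine here because `void:sparqlEndpoint` values are IRIs.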

It'd be nice if the LD cloud diagram used other machine-readable
sources where possible. I know CKAN is a good focal point for helping
curate activity, but it's also frustrating to have to copy data around
whether manually or otherwise.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Kingsley Idehen

On 7/13/11 2:34 PM, Leigh Dodds wrote:

Hi,

On 13 July 2011 14:30, Kingsley Idehen kide...@openlinksw.com wrote:

Can you ping me or reply to this list with a list of the missing SPARQL
endpoints? Alternatively, you could bookmark them on del.icio.us using the tag:
sparql_endpoint.

Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

The data is all in a machine-readable form. See:

http://data.kasabi.com/datasets

The URI supports conneg so you can follow rdfs:seeAlso links to all of
the VoiD descriptions and hence to the sparql endpoints, plus all of
the other APIs.

It'd be nice if the LD cloud diagram used other machine-readable
sources where possible. I know CKAN is a good focal point for helping
curate activity, but it's also frustrating to have to copy data around
whether manually or otherwise.

Cheers,

L.


Leigh,

I am seeking SPARQL endpoint URLs. Save me drilling down to each 
endpoint for each dataset  :-)



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
Hi, out of curiosity:
Will you be taking off the diagram those that are NOT online regularly?
Gio

On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1]. We would like to invite those of you who publish
 data sets as Linked Data to join the other ~2000 data sets already in CKAN (
 http://ckan.net ) to help us extend the list of ~300 candidates to the LOD
 cloud diagram. For those of you that already have entries on CKAN, we ask
 you to please review and update your entries accordingly. Please finalize
 your dataset descriptions by the end of this week to ensure that your
 entry will be considered for this round of the diagram.

 We will be analyzing all data sets tagged with lod in CKAN from the
 perspective of a data consumer, looking for best practices that make it
 easier to access, understand and use your data. The compliance with the best
 practices will be checked manually and with scripts that download and
 analyze data from the data sources. Therefore it is important that you
 provide as much information as possible in your CKAN entry.

 You can use the CKAN entry for DBpedia as one example:
 http://ckan.net/package/dbpedia

 In order to aid you in this quest, we have provided a validation page for
 your CKAN entry with step-by-step guidance for the information that we will
 be looking for:
 http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

 After you have completed the description of your data sets, we invite you to
 fill in this 5-minute survey about your experience. This will help us to
 make the process easier, more complete and exciting for the next time
 around.
 http://www.surveymonkey.com/s/TDS3TML

 Thank you and happy dataset description!

 Cheers,
 Pablo, Anja, Richard and Chris
 [1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Pablo Mendes
Giovanni,
Thanks for helping to build a preemptive QA for dataset providers. First,
there is no 'taking off'. The 2007...2010 versions will remain online
forever with the help of web archive.
Second, I assume you refer to the relatively short time span between my
message to the list and the desired date for finishing the entries for the
new release. As you know, CKAN has been used for a while as a catalog for
keeping updated entries, as well as the source for generating the diagram.
The state of the lod page has instructions to add yourself to the cloud that
are online 24x7. So, in comparison to all the time that people have been
effectively updating their entries, my request to complete the updates still
this week may sound a bit anxious.
Dataset providers that cannot meet the 'deadline' for this release can
update their entries in their own time frame and ensure their appearance in
the next release.
Other providers that feel they should appear in this release, but cannot
meet the deadline can protest directly to my mailbox and I will do my best
to accommodate everybody's needs.

Cheers,
Pablo
On Jul 12, 2011 9:33 PM, Giovanni Tummarello giovanni.tummare...@deri.org
wrote:


Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
I meant a much simpler and more significant thing. Go into CKAN, click on the
LOD tag, then start clicking around the datasets.
Many don't work, are offline, etc. They have been for weeks or months.
Are you checking these and removing them from the new LOD diagram, or
will the LOD diagram just grow regardless of reality?
thanks
Gio

 Second, I assume you refer to the relatively short time span between my
 message to the list and the desired date for finishing the entries for the
 new release.



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Thomas Steiner
 Many dont work, are offline etc. They have been for weeks or months.
 Are you checking these and removing them from the new lod diagram or
 will the lod diagram just grow regardless reality?

Fair point, Giovanni. Pablo, I /believe/ Giovanni is referring to a
recent related experiment [1] by Mondeca on SPARQL endpoint
availability for CKAN resources. The result was that some SPARQL
endpoints were down for a considerable period of time, which
effectively makes their usage unreliable.

Best,
Tom

[1] http://labs.mondeca.com/sparqlEndpointsStatus/index.html

-- 
Thomas Steiner, Research Scientist, Google Inc.
http://blog.tomayac.com, http://twitter.com/tomayac



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread bizer
Hi Giovanni,

 Will you be taking off the diagram those that are NOT online regularly?

Could you please be a bit more precise and clearly say which datasets you
are talking about?

Which datasets do not provide dereferenceable URIs anymore?

(Linked Data and the LOD diagram are not about SPARQL endpoints)

A constructive approach, which I guess would be highly appreciated by the
community, would be that you directly mark these datasets on CKAN using
the tags that are proposed at the end of this page

http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

Cheers,

Chris


 Hi out of curiousity
 Will you be taking off the diagram those that are NOT online regularly?
 Gio

 On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com
 wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1]. We would like to invite those of you who publish
 data sets as Linked Data to join the other ~2000 data sets already in CKAN
 ( http://ckan.net ) to help us extend the list of ~300 candidates to the LOD
 cloud diagram. For those of you that already have entries on CKAN, we ask
 you to please review and update your entries accordingly. Please finalize
 your dataset descriptions until the end of this week to ensure that your
 entry will be considered for this round of the diagram.

 We will be analyzing all data sets tagged with lod in CKAN from the
 perspective of a data consumer, looking for best practices that make it
 easier to access, understand and use your data. The compliance with the best
 practices will be checked manually and with scripts that download and
 analyze data from the data sources. Therefore it is important that you
 provide as much information as possible in your CKAN entry.

 You can use the CKAN entry for DBpedia as one example:
 http://ckan.net/package/dbpedia

 In order to aid you in this quest, we have provided a validation page for
 your CKAN entry with step-by-step guidance for the information that we will
 be looking for:
 http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

 After you have completed the description of your data sets, we invite you to
 fill up this 5 minutes survey about your experience. This will help us to
 make the process easier, more complete and exciting for the next time
 around.
 http://www.surveymonkey.com/s/TDS3TML

 Thank you and happy dataset description!

 Cheers,
 Pablo, Anja, Richard and Chris
 [1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/







Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Marco Neumann
I would think that Giovanni is referring to the public status tracker for
CKAN SPARQL endpoints:

http://labs.mondeca.com/sparqlEndpointsStatus/

I would also recommend removing the endpoints that do show significant
downtime, such as 100% downtime for a consecutive period of more than
3 months.
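
A rule like this could be sketched roughly as follows. This is only an illustration under stated assumptions: the probe log format (a list of (date, is_up) pairs), the function names, and the 90-day reading of "3 months" are all hypothetical and not taken from the Mondeca tracker.

```python
from datetime import date


def down_since(probes):
    """Return the date on which the current run of failed probes began.

    `probes` is a hypothetical log: a list of (date, is_up) pairs.
    Returns None if there are no probes or the latest probe succeeded.
    """
    start = None
    for day, is_up in sorted(probes):
        if is_up:
            start = None          # any success resets the failure run
        elif start is None:
            start = day           # first failure of a new run
    return start


def should_drop(probes, today, min_days=90):
    """True if the endpoint has failed every probe for more than min_days."""
    start = down_since(probes)
    return start is not None and (today - start).days > min_days


probes = [
    (date(2011, 3, 1), True),
    (date(2011, 3, 15), False),
    (date(2011, 5, 1), False),
    (date(2011, 7, 1), False),
]
print(should_drop(probes, today=date(2011, 7, 13)))
```

A single successful probe resets the count, so an endpoint that is merely flaky would not be dropped by this rule.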

Marco

-- 
Marco Neumann
KONA

---
Join us at the Semantic Web Media Summit in New York City for an
exciting event on 14 September 2011
http://www.lotico.com/evt/swmsNYC2011/












Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Pablo Mendes
Thanks, Thomas.

Giovanni, it was a coreference resolution problem on my side. You meant
'they'=datasets and I read 'they'=people. It was in any case a question that
was likely to come up, and it's (hopefully) clearer now. Sorry for the
confusion.

Now to the intended question.
I will discuss the issue of availability with my colleagues. But my personal
opinion is that availability is an important quality indicator, and should
be incorporated if feasible with respect to time and resource availability.
Could we perhaps have others (e.g. Sindice, OpenLink cloud cache, etc.) also
provide their assessment of this specific indicator? It sounds like it's
of shared interest and could benefit from multiple independent assessments.

What do you think?

Cheers,
Pablo
On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:


Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Thomas Steiner
Hi Chris,

 could you please be a bit more precise and clearly say which datasets you
 are talking about.
One example is Semantic Crunchbase (http://cb.semsol.org/).

 Which datasets do not provide dereferencable URIs anymore?
None of those seem to work any longer (try any of
http://www.google.com/?q=site:http://cb.semsol.org/company%20filetype:rdf).
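
For anyone who wants to test this kind of thing by script rather than by hand, here is a minimal sketch of a dereferenceability check. It is an assumption-laden illustration, not the method used for the diagram: the function names and the list of accepted media types are made up for this example, and it only checks that a URI answers an RDF-flavoured Accept header with an RDF content type.

```python
import urllib.request

# Media types treated as "RDF" in this sketch (an assumption, not a standard list).
RDF_TYPES = (
    "application/rdf+xml",
    "text/turtle",
    "text/n3",
    "application/n-triples",
)


def looks_like_rdf(content_type):
    """True if a Content-Type header value names one of RDF_TYPES.

    Parameters such as "; charset=utf-8" and letter case are ignored.
    """
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in RDF_TYPES


def dereferences_to_rdf(uri, timeout=10):
    """Request `uri` asking for RDF and report whether RDF came back."""
    request = urllib.request.Request(uri, headers={"Accept": ", ".join(RDF_TYPES)})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_rdf(response.headers.get("Content-Type", ""))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling dereferences_to_rdf() on a handful of sample URIs per dataset would give a rough yes/no signal; it says nothing about whether the returned data is correct or complete.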

 (Linked Data and the LOD diagram is not about SPARQL endpoints)
Fair enough. It is related though, according to TimBL's Linked Data
principle #3 (http://www.w3.org/DesignIssues/LinkedData.html).

 A constructive approach, which I guess would be highly appreciated by the
 community, would be that you directly mark these datasets on CKAN using
 the tags that are proposed at the end of this page

 http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
I have added the lodcloud.needsfixing tag on the Semantic Crunchbase
CKAN page (http://ckan.net/package/semsol-crunchbase) plus a (maybe)
helpful note with a statement from the dataset's maintainer.

Best,
Tom

-- 
Thomas Steiner, Research Scientist, Google Inc.
http://blog.tomayac.com, http://twitter.com/tomayac



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Kingsley Idehen

On 7/12/11 11:21 PM, Pablo Mendes wrote:


Thanks, Thomas.

Giovanni, it was a coreference resolution problem from my side. You 
meant 'they'=datasets and I read 'they'=people. It was anyhow a 
possible question to come by and it's (hopefully) clearer now. Sorry 
for the confusion.


Now to the intended question.
I will discuss the issue of availability with my colleagues. But my 
personal opinion is that availability is an important quality 
indicator, and should be incorporated if feasible wrt to time and 
resource availability. Could we perhaps have others (e.g. Sindice, 
Openlink cloud cache, etc.) also providing their assessment of this 
specific indicator? It sounds like it's of shared interest and could 
benefit from multiple independent assessments.


What do you think?

Cheers,
Pablo

On Jul 12, 2011 10:54 PM, Thomas Steiner to...@google.com wrote:


Datasets that are inaccessible for large amounts of time (e.g., 3+ 
months) ultimately undermine the LOD cloud. Rather than removing a 
dataset, why not color code LOD cloud bubbles using the same color 
scheme from: http://labs.mondeca.com/sparqlEndpointsStatus/index.html, 
if possible?
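
As a rough sketch of what such color coding could look like in a script: the thresholds and color names below are purely hypothetical placeholders and are NOT the scheme used on the Mondeca status page; the input is assumed to be a list of booleans, one per availability probe.

```python
def bubble_colour(checks):
    """Map a list of probe results (True = reachable) to a bubble colour.

    The 95% and 50% thresholds and the colour names are assumptions
    chosen for illustration only.
    """
    if not checks:
        return "grey"                      # never probed
    uptime = sum(checks) / len(checks)     # fraction of successful probes
    if uptime >= 0.95:
        return "green"
    if uptime >= 0.5:
        return "orange"
    return "red"


print(bubble_colour([True] * 19 + [False]))
```

Keeping the mapping in one small function would make it easy to swap in whatever scheme is eventually agreed on.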


For better or for worse, the LOD cloud pictorial is now a staple re. 
Linked Data marketing comms. collateral.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen







Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
Chris,

I am not interested in the specific content of the diagram; rather, I
am interested in understanding what its value is, which depends on
the method you're going to follow in the update. You're answering this
by saying, basically, that there won't be a check for old, dead datasets.

I admit I have never looked at this closely, but I think I can't be the
only one who thinks it's a bit of a joke if we're telling people to
publish data in a way that doesn't even offer a means of knowing whether
the data is there or not.

Please note that I am trying to be constructive by suggesting that the
diagram be made to mean something that one can rely on, e.g. let me go
see the latest diagram so that I can.. . A suggestion in this sense
could be to require that Linked Data entries in CKAN give URIs with
sample data, that sites expose either dumps or a sitemap (so that they
can be collected), etc.
cheers
Gio











 and uselessness of the initiative, of the diagram of ckan and more.











Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Kingsley Idehen

On 7/12/11 11:33 PM, Thomas Steiner wrote:

  (Linked Data and the LOD diagram is not about SPARQL endpoints)

Fair enough. It is related though, according to TimBL's Linked Data
principle #3 (http://www.w3.org/DesignIssues/LinkedData.html).

Really got to be careful there. SPARQL and RDF are implementation 
details re. Linked Data.

Note, in the original Linked Data meme, point #3 read:
"When someone looks up a URI, provide useful information."

As exemplified by this post and in many other conversations, the
addition of "using the standards (RDF*, SPARQL)" is a regressive update
of a GOLDEN meme. The net effect has been to inject inertia into Linked
Data's adoption curve.


If you are seeking stats re. what I mean re. inertia, just keep track
of what's happening on the schema.org front re. adoption curve.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
 If you are seeking stats re. what I mean re. intertia, just keep track of
 what's happening on the schema.org front re. adoption curve.


Here are 100+ datasets:

http://sindice.com/search?q=schemanq=fq=class%3Ahttp%3A%2F%2Fschema.org%2F*sortbydate=1facet.field=domaininterface=guru

We started collecting 2 weeks ago and have NOT reanalyzed/recrawled
previously known sites ATM. How fair it is to call them datasets
rather than marked-up pages is up for discussion, possibly a
reasonably interesting one.

Gio