Metadata for Bio SPARQL endpoints ?

2011-03-04 Thread Andrea Splendiani
Hi,

I've recently seen a post on the W3C mailing list that points to a very
useful resource:

http://labs.mondeca.com/sparqlEndpointsStatus/index.html

(It's a bit incomplete on the bio side; perhaps not many people in this
area have used CKAN?)

I was thinking that, for a resource like this to be really useful in
research, we would need one extra piece of information: how fresh the data is.

Do you know if there is any standard metadata to indicate when the endpoint
content was last refreshed?
Technically speaking, this kind of information should be attached to the
data as provenance. In practice, however, 90% of the utility could be
achieved by having some state information for each large graph in the
endpoint, corresponding to the major data sources.
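A minimal sketch of the "which big graphs are in this endpoint?" step, in Python using only the standard library. The SPARQL query, the SPARQL 1.1 Protocol GET request and the JSON results format are standard; the function names are hypothetical, and whether each major source actually lives in its own named graph depends on how a given endpoint was loaded.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Standard SPARQL 1.1 query: every named graph that holds at least one triple.
LIST_GRAPHS_QUERY = """
SELECT DISTINCT ?g
WHERE { GRAPH ?g { ?s ?p ?o } }
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    separator = "&" if "?" in endpoint else "?"
    url = endpoint + separator + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def graph_iris(results: dict) -> List[str]:
    """Pull the ?g values out of a SPARQL 1.1 Query Results JSON document."""
    return [
        row["g"]["value"]
        for row in results["results"]["bindings"]
        if "g" in row
    ]


def list_graphs(endpoint: str, timeout: float = 30.0) -> List[str]:
    """Run LIST_GRAPHS_QUERY against a live endpoint (needs network access)."""
    request = build_request(endpoint, LIST_GRAPHS_QUERY)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return graph_iris(json.load(response))
```

On a very large store this query has to touch every triple, so it may be slow or hit a timeout; treat it as a starting point rather than a recommended production query.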

In practice, it would be nice to have a standard dictionary so that we can
ask the triplestore for the list of graphs/datasets and, for each of these
(or for the endpoint itself, if it holds information that is coherent
source-wise):
- update frequency
- last update
- data source (type and, where available, a link).
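One way the three per-dataset fields could be queried, assuming the endpoint publishes a VoID description using Dublin Core terms (`dcterms:modified` for last update, `dcterms:accrualPeriodicity` for update frequency, `dcterms:source` for the data source). That choice of vocabulary is an assumption; nothing here is an agreed standard. The sketch also shows how a client could combine last update and frequency into a freshness check; the `FREQUENCY_DAYS` table and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Assumed vocabulary (VoID + Dublin Core terms); not an agreed standard.
DATASET_METADATA_QUERY = """
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?modified ?frequency ?source
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset dcterms:modified ?modified }
  OPTIONAL { ?dataset dcterms:accrualPeriodicity ?frequency }
  OPTIONAL { ?dataset dcterms:source ?source }
}
"""

# Hypothetical mapping from the local name of a frequency IRI to the
# longest gap (in days) tolerated before the data counts as stale.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "quarterly": 92, "annual": 366}


def dataset_records(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON result rows into {variable: value} dicts.

    Variables left unbound by an OPTIONAL are simply absent from the dict.
    """
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def is_stale(record: Dict[str, str], today: date) -> Optional[bool]:
    """True if the last update is older than the declared frequency allows.

    Returns None when either field is missing or the frequency is unknown.
    """
    modified = record.get("modified")
    frequency = record.get("frequency")
    if modified is None or frequency is None:
        return None
    # Take the last path or fragment segment, e.g. ".../weekly" -> "weekly".
    local_name = frequency.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    max_gap = FREQUENCY_DAYS.get(local_name)
    if max_gap is None:
        return None
    # Keeps the YYYY-MM-DD prefix, so ISO xsd:date and xsd:dateTime both work.
    last_update = date.fromisoformat(modified[:10])
    return today - last_update > timedelta(days=max_gap)
```

Returning `None` instead of guessing keeps "no metadata published" distinct from "fresh", which matters for a status page that compares many endpoints.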

Does anybody have this already? Opinions?

best,
Andrea

Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004
andrea.splendi...@bbsrc.ac.uk


Re: Metadata for Bio SPARQL endpoints ?

2011-03-04 Thread Matthew Gamble
The issue of dataset dynamics for Linked Data datasets was discussed
at the LDOW workshop at last year's WWW conference. My impression was
that it is obviously needed, but the problem is still being defined,
along with initial solutions.

As one might expect, there are a myriad of proposals for capturing this
metadata:

http://www.w3.org/wiki/DatasetDynamics

Best,
Matthew

Matthew Gamble
School of Computer Science
University of Manchester
Kilburn Building
Oxford Road
Manchester
M13 9PL
United Kingdom


On 4 Mar 2011, at 17:17, Andrea Splendiani wrote: