Hi there,

I like this feature list. Actually, while reading it I was thinking of
a "(Web 3.0 | Linked Data | GGG | call-it-whatever) validation
service" similar to [1]. Maybe another Google SoC project ;-) It would
basically mean writing several more or less sophisticated checkers
whose scores are summed up into a highscore. Then I would create a
cool-looking highscore website that could motivate web developers and
site maintainers (although each of them hosts only a small set of data,
the community is big -> the Long Tail ;-), and this could get people to
jump on the bandwagon and expose their data. I've added my thoughts below...

[1] http://web2.0validator.com/
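The checkers-summed-into-a-highscore idea could be sketched roughly like
this (a minimal sketch; the checker names, weights, and scoring scale
below are all made up for illustration):

```python
# Minimal sketch of the proposed validation service: each checker
# returns a score in [0, 1]; the service sums weighted scores into
# a single "highscore" for a dataset. Checker names are hypothetical.
from typing import Callable, Dict

Checker = Callable[[str], float]  # dataset URL -> score in [0, 1]

def run_validation(url: str, checkers: Dict[str, Checker],
                   weights: Dict[str, float]) -> float:
    """Run all checkers against a dataset URL and sum weighted scores."""
    total = 0.0
    for name, check in checkers.items():
        score = max(0.0, min(1.0, check(url)))  # clamp to [0, 1]
        total += weights.get(name, 1.0) * score
    return total

# Example with two dummy checkers:
checkers = {
    "dereferenceable_uris": lambda url: 1.0,
    "rdfa_markup": lambda url: 0.5,
}
weights = {"dereferenceable_uris": 2.0, "rdfa_markup": 1.0}
print(run_validation("http://example.org/dataset", checkers, weights))  # 2.5
```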

> - There's a well-thought-out ontology for the dataset with smart
> mappings to existing popular ontologies and vocabularies.

Score based on the popularity of the vocab and the vocabs it refers to
(which could be evaluated via SWSE/Sindice/Swoogle/DBpedia/etc.), on
consistency (consistency checking of DL ontologies), and on quality
measures for ontologies in general; I'm sure some work has already been
done in this direction.
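A popularity checker along these lines might normalize usage counts per
vocabulary (a sketch; the counts would come from a Semantic Web search
engine such as Swoogle or Sindice, and the numbers below are invented):

```python
# Sketch of a popularity score for the vocabularies a dataset uses.
# Real usage counts would be fetched from a service like Swoogle;
# here they are passed in as a plain dict of invented numbers.
def vocab_popularity_score(used_vocabs, usage_counts, cap=10000):
    """Average capped, normalized usage count over the vocabs used."""
    if not used_vocabs:
        return 0.0
    scores = [min(usage_counts.get(v, 0), cap) / cap for v in used_vocabs]
    return sum(scores) / len(scores)

counts = {"http://xmlns.com/foaf/0.1/": 10000,   # very popular
          "http://example.org/myvocab#": 50}     # obscure home-grown vocab
print(vocab_popularity_score(
    ["http://xmlns.com/foaf/0.1/", "http://example.org/myvocab#"], counts))
```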

> - All items of interest in the dataset have been assigned unique URIs.
> - All URIs are dereferenceable, according to the recommendations given
> in [1].

Checking the first requirement is hardly possible without knowledge of
the underlying dataset. The second is easy to check but could take a
long time (still, a "Semantic Link" checker would probably be very
useful anyway, both to keep the GGG consistent and to check your own
site).
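Such a "Semantic Link" checker could be structured like this (a sketch,
not a finished tool; the HTTP layer is injected so it can be stubbed,
and the stub URIs below are made up):

```python
# Sketch of a dereferenceability checker: try to resolve each URI
# asking for RDF, and report the fraction that answer with HTTP 2xx.
from urllib.request import Request, urlopen

def http_status(uri: str) -> int:
    """Dereference a URI asking for RDF; return the HTTP status code."""
    req = Request(uri, headers={"Accept": "application/rdf+xml"})
    with urlopen(req, timeout=10) as resp:
        return resp.getcode()

def dereferenceable_ratio(uris, fetch=http_status) -> float:
    """Fraction of URIs that dereference successfully (HTTP 2xx)."""
    if not uris:
        return 0.0
    ok = 0
    for uri in uris:
        try:
            if 200 <= fetch(uri) < 300:
                ok += 1
        except OSError:
            pass  # unreachable host or broken link
    return ok / len(uris)

# Stubbed example: two of three (invented) URIs resolve.
stub = {"http://a.example/1": 200, "http://a.example/2": 404,
        "http://a.example/3": 200}
print(dereferenceable_ratio(list(stub), fetch=stub.__getitem__))
```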

> - Resolving the URIs returns information about the resource, ideally
> in RDF/XML and N3 and HTML, based on content negotiation
Can be checked; support for each format adds to the score.
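The content-negotiation check might look like this (a sketch under
assumptions: the media types, the per-format points, and the stubbed
fetch function are all illustrative, and a real checker would issue
actual HTTP GETs with these Accept headers):

```python
# Sketch: score content-negotiation support by requesting the same URI
# with different Accept headers and inspecting the returned Content-Type.
FORMATS = {
    "application/rdf+xml": 1,  # one point per supported format
    "text/n3": 1,
    "text/html": 1,
}

def conneg_score(uri, fetch):
    """fetch(uri, accept) -> returned Content-Type (or raises OSError)."""
    score = 0
    for accept, points in FORMATS.items():
        try:
            if fetch(uri, accept).startswith(accept):
                score += points
        except OSError:
            pass  # format not served
    return score

# Stub: the resource serves RDF/XML and HTML, but no N3.
def stub(uri, accept):
    supported = {"application/rdf+xml", "text/html"}
    if accept in supported:
        return accept
    raise OSError("406 Not Acceptable")

print(conneg_score("http://example.org/resource", stub))  # 2
```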

> - The HTML pages where the data shows up are marked up with RDFa
Can be checked; score ~ ratio of RDFa-attributed tags to all HTML tags.
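That ratio can be computed with the stdlib HTML parser alone (a sketch;
the attribute list is a rough approximation of RDFa's attributes, and
some of them — e.g. rel, content — also occur in plain HTML, so this
over-counts slightly):

```python
# Sketch of the proposed RDFa density score: the ratio of tags carrying
# RDFa attributes to all start tags on the page.
from html.parser import HTMLParser

RDFA_ATTRS = {"about", "property", "typeof", "resource", "rel", "rev",
              "datatype", "content"}

class RdfaCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.total = 0  # all start tags
        self.rdfa = 0   # start tags with at least one RDFa attribute

    def handle_starttag(self, tag, attrs):
        self.total += 1
        if any(name in RDFA_ATTRS for name, _ in attrs):
            self.rdfa += 1

def rdfa_ratio(html: str) -> float:
    counter = RdfaCounter()
    counter.feed(html)
    return counter.rdfa / counter.total if counter.total else 0.0

page = ('<html><body><p about="#me" property="foaf:name">Andy</p>'
        '<p>plain</p></body></html>')
print(rdfa_ratio(page))  # 1 RDFa-bearing tag out of 4 -> 0.25
```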

> - There's a SPARQL endpoint that makes all the RDF data available
Look for the sitemap extension; do a simple test for the request path
"/sparql"; or follow internal links looking for a SPARQL endpoint.
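The path-probing part could be sketched as follows (the candidate paths
and the ASK probe query are guesses about common conventions; the fetch
function is stubbed so the example is self-contained):

```python
# Sketch of a SPARQL-endpoint probe: try common request paths and see
# whether any of them answers a trivial ASK query with HTTP 200.
from urllib.parse import urljoin, urlencode

CANDIDATE_PATHS = ["/sparql", "/sparql/", "/query"]
PROBE = urlencode({"query": "ASK { ?s ?p ?o }"})

def find_sparql_endpoint(base_url, fetch):
    """fetch(url) -> HTTP status; return the first endpoint that answers."""
    for path in CANDIDATE_PATHS:
        url = urljoin(base_url, path) + "?" + PROBE
        try:
            if fetch(url) == 200:
                return urljoin(base_url, path)
        except OSError:
            continue  # path not served at all
    return None

# Stub: only /sparql answers.
def stub(url):
    return 200 if url.startswith("http://example.org/sparql?") else 404

print(find_sparql_endpoint("http://example.org/", stub))
```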

> - There's an RDF data dump that contains all the data
Look for the sitemap extension: are dumps available?
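Detecting advertised dumps could mean parsing the Semantic Sitemaps
markup in sitemap.xml (a sketch; the namespace URI below is the one the
Semantic Sitemaps proposal used — verify it against the current spec
before relying on it, and the dump URL is invented):

```python
# Sketch: collect RDF dump locations advertised via the Semantic
# Sitemaps extension (<sc:dataDumpLocation> inside sitemap.xml).
import xml.etree.ElementTree as ET

SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema#"

def dump_locations(sitemap_xml: str):
    """Return all data-dump URLs declared in a semantic sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip()
            for el in root.iter("{%s}dataDumpLocation" % SC_NS)]

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                     xmlns:sc="%s">
  <sc:dataset>
    <sc:dataDumpLocation>http://example.org/dump.rdf.gz</sc:dataDumpLocation>
  </sc:dataset>
</urlset>""" % SC_NS

print(dump_locations(sitemap))  # ['http://example.org/dump.rdf.gz']
```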

> - The dataset is richly interlinked internally, so you can use e.g.
> Tabulator to browse through the dataset, jumping from one node to the
> next
Score ~ ratio of triples with an IRI as object to all triples.
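Computed over a loaded dump, that ratio is a one-liner (a sketch:
triples are plain (s, p, o) tuples here, with a crude "starts with
http" test for IRI-ness; a real checker would load the data with an
RDF library such as rdflib and test the object's node type):

```python
# Sketch of the internal-interlinking score: the fraction of all
# triples whose object is an IRI rather than a literal.
def iri_object_ratio(triples):
    if not triples:
        return 0.0
    iri_objects = sum(
        1 for _, _, o in triples
        if isinstance(o, str) and o.startswith(("http://", "https://")))
    return iri_objects / len(triples)

triples = [
    ("http://ex.org/a", "http://ex.org/knows", "http://ex.org/b"),  # IRI object
    ("http://ex.org/a", "http://ex.org/name", '"Andy"'),            # literal
]
print(iri_object_ratio(triples))  # 0.5
```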

> - The project team engages with other dataset maintainers to create
> RDF links between resources that are described in multiple datasets,
> or that are related
Score ~ ratio of triples with an outgoing (external) IRI as object to
all triples with an IRI as object.
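The same triple representation works for the outgoing-link ratio (a
sketch; treating "internal" as "shares the dataset's namespace prefix"
is an assumption about how one would define it):

```python
# Sketch of the outgoing-link score: among triples whose object is an
# IRI, the fraction pointing outside the dataset's own namespace.
def outgoing_ratio(triples, internal_prefix):
    iri_objs = [o for _, _, o in triples if o.startswith("http")]
    if not iri_objs:
        return 0.0
    external = sum(1 for o in iri_objs if not o.startswith(internal_prefix))
    return external / len(iri_objs)

triples = [
    ("http://ex.org/a", "p", "http://ex.org/b"),                     # internal
    ("http://ex.org/a", "p", "http://dbpedia.org/resource/Berlin"),  # outgoing
]
print(outgoing_ratio(triples, "http://ex.org/"))  # 0.5
```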

> - The URIs and what kind of data is available is all clearly
> documented, to make it easy for people to e.g. link from their FOAF
> files into the dataset
Difficult; probably not possible to include in the score.

regards
Andy
_______________________________________________
Linking-open-data mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/linking-open-data