----- "Yves Raimond" <[EMAIL PROTECTED]> wrote:

> Hello!
> 
> >
> > Is this really a problem? Why not just keep in mind that triple numbers are
> > a purely mechanical measure and are no indication of quality or usefulness?
> >
> > A raw triple count is just that, a raw triple count. It doesn't mean
> > anything else. And it is useful for anyone who wants to
> > store/index/postprocess a dataset/linkset, because for storage and querying
> > the number of triples matters.
> >
> 
> I wasn't talking about triple counts actually (as you said, a triple
> count is just that). But about quantifying the number of interlinks in
> a way that is consistent across datasets (just that, no notion of
> usefulness:-) ) - eg. in a way where you can say "oh, this dataset
> indeed has more interlinks than this one". I was arguing that the
> current way of doing it (counting triples that mention an `external'
> resource) is not consistent, as you can easily make that number higher
> or lower by applying simple transformations to your data.
> 
> I think a consistent way of measuring the interlinking is to just
> count the number of distinct `external' resources in the dataset
> (which will give the lowest number you get by applying such
> transformations).
> 
> See for example http://dbtune.org/musicbrainz/,
> http://dbtune.org/bbc/peel/ or http://dbtune.org/jamendo/
> 

It depends on whether you can tell that the external references are distinct
just from the URI string. If someone links out to multiple formats of the same
thing using external resource links, they would have to be counted as multiple
links, as you have no way of knowing that they refer to the same resource,
except in the case where you keep these types of links as RDF literals.
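
To make the difference between the two measures concrete, here is a rough
sketch in Python. The triples, the example.org namespace and the function
names are all made up for illustration, and it assumes that "external" simply
means a URI whose host is not the dataset's own.

```python
from urllib.parse import urlparse

LOCAL_HOST = "example.org"  # hypothetical namespace of the dataset being measured

# Hypothetical (subject, predicate, object) triples; every object is a URI string.
triples = [
    ("http://example.org/artist/1", "owl:sameAs", "http://dbpedia.org/resource/Foo"),
    ("http://example.org/artist/1", "rdfs:seeAlso", "http://dbpedia.org/resource/Foo"),
    ("http://example.org/artist/1", "rdfs:seeAlso", "http://dbpedia.org/data/Foo.rdf"),
    ("http://example.org/artist/1", "foaf:made", "http://example.org/record/9"),
]

def is_external(uri):
    """True if the URI's host differs from the dataset's own host."""
    return urlparse(uri).netloc != LOCAL_HOST

def count_external_triples(triples):
    """Count every triple whose object is an external URI."""
    return sum(1 for _, _, o in triples if is_external(o))

def count_distinct_external(triples):
    """Count distinct external URIs, compared purely as strings."""
    return len({o for _, _, o in triples if is_external(o)})

print(count_external_triples(triples))   # 3
print(count_distinct_external(triples))  # 2
```

The distinct count is lower than the triple count, but the two dbpedia.org
URIs still count as two links even though they are different formats of the
same thing, because only the URI strings are compared.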

I think a more effective way is to measure both the number of outbound and
inbound links, which is only possible if you have a way of determining who in
the rest of the web links back to a particular resource in your own scheme. If
you count inbound links, as Google does with PageRank, then you get a better
idea of who is actually being used, as opposed to who is just linking to all
the others it can find. See [1] for a description of this in the Bio2RDF
project.
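
As a rough illustration of the idea (plain inbound/outbound counting, not
PageRank itself, and not the Bio2RDF implementation), assuming you already
have the cross-dataset links as (source, target) pairs. The dataset names
below are placeholders.

```python
from collections import Counter

# Hypothetical cross-dataset links as (source dataset, target dataset) pairs.
links = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"),
    ("d", "c"),
]

def link_degrees(links):
    """Return (outbound, inbound) link counts per dataset."""
    outbound = Counter(src for src, _ in links)
    inbound = Counter(dst for _, dst in links)
    return outbound, inbound

outbound, inbound = link_degrees(links)
print(outbound["a"], inbound["a"])  # 3 0
print(outbound["c"], inbound["c"])  # 0 3
```

Here "a" links out to everything but nobody links back to it, while "c" links
to nothing and is the one actually being referenced.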

Cheers,

Peter

[1] http://dx.doi.org/10.1007/978-3-540-69828-9_15
