Re: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Bernard Vatant

Ed, Michael

> Hi Michael,
>
> This is really exciting news, especially for those of us Linked Data
> enthusiasts in the world of libraries and archives. Congratulations!

+1

> I haven't fully read the wiki page yet, so I apologize if this
> question is already answered there. I was wondering why you chose to
> mint multiple URIs for the same concept in different languages.

I had exactly the same question, you stole it ...


...

> I kind of expected the assertions to hang off of a language and
> version agnostic URI, with perhaps dct:hasVersion links to previous
> versions.

Indeed.

> <http://dewey.info/class/641/>
>     cc:attributionName "OCLC Online Computer Library Center, Inc." ;
>     cc:attributionURL <http://www.oclc.org/dewey/> ;
>     cc:morePermissions <http://www.oclc.org/dewey/about/licensing/> ;
>     dct:hasVersion <http://dewey.info/class/641/2009/08/> ;
>     dct:language "de"^^dct:RFC4646 ;
>     a skos:Concept ;
>     xhtml:license <http://creativecommons.org/licenses/by-nc-nd/3.0/> ;
>     skos:broader <http://dewey.info/class/64/2003/08/about.de> ;
>     skos:inScheme <http://dewey.info/scheme/2003/08/about.de> ;
>     skos:notation "641"^^<http://dewey.info/schema-terms/Notation> ;
>     skos:prefLabel "Food & drink"@en, "Essen und Trinken"@de .

Maybe rather :

    skos:broader <http://dewey.info/class/64/2003/> ;
    skos:inScheme <http://dewey.info/scheme/2003/08/> ;

Although I don't understand why some classes have year information
in the URI (2003) and some have none.


Bernard

--

Bernard Vatant
Senior Consultant
Vocabulary & Data Engineering
Tel:   +33 (0) 971 488 459
Mail:  bernard.vat...@mondeca.com

Mondeca
3, cité Nollez 75018 Paris France
Web:   http://www.mondeca.com
Blog:  Leçons de Choses http://mondeca.wordpress.com/




Re: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Ed Summers
On Thu, Aug 20, 2009 at 3:53 AM, Bernard Vatant <bernard.vat...@mondeca.com> wrote:
> <http://dewey.info/class/641/>
>     cc:attributionName "OCLC Online Computer Library Center, Inc." ;
>     cc:attributionURL <http://www.oclc.org/dewey/> ;
>     cc:morePermissions <http://www.oclc.org/dewey/about/licensing/> ;
>     dct:hasVersion <http://dewey.info/class/641/2009/08/> ;
>     dct:language "de"^^dct:RFC4646 ;
>     a skos:Concept ;
>     xhtml:license <http://creativecommons.org/licenses/by-nc-nd/3.0/> ;
>     skos:broader <http://dewey.info/class/64/2003/08/about.de> ;
>     skos:inScheme <http://dewey.info/scheme/2003/08/about.de> ;
>     skos:notation "641"^^<http://dewey.info/schema-terms/Notation> ;
>     skos:prefLabel "Food & drink"@en, "Essen und Trinken"@de .
>
> Maybe rather :
>
> skos:broader <http://dewey.info/class/64/2003/> ;
> skos:inScheme <http://dewey.info/scheme/2003/08/> ;

Yes, thanks Bernard. In my haste I forgot to remove the language
specific-ness from the skos:broader assertions in my email to Michael.
I might actually recommend that the version information is removed
from the object in the broader relation too:

<http://dewey.info/class/641/> skos:broader <http://dewey.info/class/64/> .

As it stands now a document URI is being used in the object position,
instead of the URI for the concept.

uqbar:~ ed$ curl -I http://dewey.info/class/64/2003/08/about.de
HTTP/1.1 200 OK
Date: Thu, 20 Aug 2009 09:27:32 GMT
Server: Apache/2.2.11 (Unix) PHP/5.2.10
X-Powered-By: PHP/5.2.10
Content-Location: http://dewey.info/class/64/2003/08/about.de.rdf
Vary: Accept
Content-Length: 2115
Content-Type: application/rdf+xml
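
The same check can be sketched in Java. This is only an illustration: the class name UriKind and its methods are hypothetical, and the rule it encodes (a plain 200 suggests the URI names a document, a 303 suggests it names something else, such as a concept, that is described elsewhere) is the usual Linked Data convention rather than anything dewey.info itself documents.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; not part of dewey.info or of any library.
final class UriKind {

    /** Interprets the status code of a request that did not follow redirects. */
    static String classify(int status) {
        if (status == 200) {
            return "document";      // the URI itself returned a representation
        }
        if (status == 303) {
            return "non-document";  // e.g. a concept, described at another URI
        }
        return "unknown";
    }

    /** Sends a HEAD request without following redirects (needs network access). */
    static int headStatus(String uri) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(uri).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        conn.setInstanceFollowRedirects(false);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(classify(headStatus(args[0])));
    }
}
```

Run against the about.de URI above, a 200 response would be classified as a document, which is the problem with using it as the object of skos:broader.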

//Ed



Re: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Toby Inkster
On Wed, 2009-08-19 at 14:27 -0400, Panzer,Michael wrote:
> I would like to announce the availability of the DDC Summaries as a
> linked data service that uses SKOS and other vocabularies for
> representation [1]. Please take a look if you like. Comments,
> suggestions, and advice are really appreciated!

Hurrah!

I was looking for linked data URIs for the Dewey Decimal Classification
a few months ago, and while I eventually found info:ddc/22/eng//641,
these aren't very linked data as they're not dereferenceable via HTTP.
So I set up my own, in the format:

http://purl.org/NET/decimalised#c470

The data is available in RDF/XML and N-Triples, with a subset in
XHTML+RDFa, all content negotiated.

I've called my data the Decimalised Database of Concepts to reflect
the fact that the information shouldn't be treated as necessarily
exactly the same as official Dewey Decimal Classification concepts.

It includes SKOS relatedMatch mappings to DBpedia and closeMatch
mappings to LCSH and LCCO in many cases. In light of your announcement
yesterday, I've added exactMatch mappings to dewey.info's URIs too.

I don't know if this information or the mappings are any use to you.
It's available under CC-BY-SA, which is not especially compatible with
the CC-BY-NC-ND license used by dewey.info, but given that the vast
majority of my data comes from Wikipedia, I don't really have the choice
of publishing it under a different license.

-- 
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk




Re: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Ian Davis
On Wed, Aug 19, 2009 at 7:27 PM, Panzer,Michael <panz...@oclc.org> wrote:
> Hi all,
>
> I would like to announce the availability of the DDC Summaries as a linked
> data service that uses SKOS and other vocabularies for representation [1].
> Please take a look if you like. Comments, suggestions, and advice are really
> appreciated!


Very pleased to see this happen at OCLC and I hope there's more to come!

Ian



Re: data.gov now live with RDFa

2009-08-20 Thread Richard Cyganiak

Rick,

Toby already pointed out the issue with the dc: namespace.

There's also this section:

<div class="hometext" about="data.gov program"
     property="dc:description">
  The purpose of Data.gov is to increase public access ...
</div>

The value of the @about attribute has to be a URI, and "data.gov
program" is not a URI. Fixing it is easy, but depends on what exactly
you want the data to say.


1. Maybe you want the text to be a description of the web page
http://www.data.gov/, similar to the dc:title, dc:creator and
dc:publisher triples that are already present. In that case, you would
want to use about="", which is a shortcut for
about="http://www.data.gov/", because an empty URI will be treated as a
relative URI, and therefore expands to the URI of the current document.
But since about="" is the default anyway, you could just drop the
attribute completely and still get the same triple.


2. Maybe you want the text to be a description of the data.gov
program, the government activity/entity, and not just a description
of the particular web page. In that case, you would need to decide on
a URI that you want to use for this entity. One easy option would be
about="#program", which again is a relative URI and would expand to
http://www.data.gov/#program. In that case, it would be nice to add
some more triples about the program.


The first option might be a bit more straightforward.
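
To see how the two @about values expand, here is a small Java sketch. The class AboutResolver is made up for illustration, and it assumes the page URI is http://www.data.gov/ as above; it uses only java.net.URI from the standard library to resolve the relative reference against the page URI.

```java
import java.net.URI;

// Hypothetical helper illustrating relative-URI expansion of @about values.
final class AboutResolver {

    /** Resolves an @about value against the URI of the containing page. */
    static String expand(String pageUri, String about) {
        return URI.create(pageUri).resolve(about).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.data.gov/";
        System.out.println(expand(page, ""));          // option 1: the page itself
        System.out.println(expand(page, "#program"));  // option 2: the program
    }
}
```

Note that java.net.URI follows the older RFC 2396 resolution rules, so an empty reference may resolve differently when the base URI ends in a file name rather than a slash; an RDFa parser's own resolution is what counts in practice.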

Best,
Richard


On 19 Aug 2009, at 11:50, rick wrote:


Hello All:

Just to give you a heads up the data.gov home page is now live with  
RDFa tags.


http://www.data.gov/

The page has two triples. One about the page, one about the program.  
We used this very simplistic approach to overcome any operational  
barriers and develop experience in publishing RDFa tags.


See below for source code that parses the triples. I used the INRIA  
GRDDL transform.


Have fun and I'll watch this list for feedback and such.

--
Rick

cell: 703-201-9129
web:  http://www.rickmurphy.org
blog: http://phaneron.rickmurphy.org

 more RDFaConsumer.java
/**
* @(#)RDFaConsumer.java
* @author r...@rickmurphy.org
*/

package gov.data;

/**
*/

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

import java.text.ParseException;

import java.io.UnsupportedEncodingException;
import java.io.IOException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;


import java.net.URL;

public class RDFaConsumer{

   private static String namespace;

   /**
*/
   public RDFaConsumer(String uri){

      try {

         DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();

         DocumentBuilder builder = factory.newDocumentBuilder();
         Document input = builder.parse(new URL(uri).toString());
         input.normalize();

         DOMSource source = new DOMSource(input);
         StreamResult result = new StreamResult(System.out);
         StreamSource stylesheet = new StreamSource(new URL(
            "http://ns.inria.fr/grddl/rdfa/2008/09/03/RDFa2RDFXML.xsl").toString());

         // Use a Transformer to turn the DOM into RDF/XML on standard output
         TransformerFactory tFactory = TransformerFactory.newInstance();
         Transformer transformer = tFactory.newTransformer(stylesheet);
         transformer.transform(source, result);

         // walk the parsed DOM
         traverse(input);

      } catch (SAXParseException spe) {
         System.out.println("sax parse ex: " + spe);
      } catch (SAXException sxe) {
         System.out.println("sax ex: " + sxe);
      } catch (ParserConfigurationException pce) {
         System.out.println("parser config ex: " + pce);
      } catch (UnsupportedEncodingException ude) {
         System.out.println("unsupported encoding ex: " + ude);
      } catch (IOException ioe) {
         System.out.println("io ex: " + ioe);
      } catch (TransformerConfigurationException tc) {
         System.out.println("transformer config ex: " + tc);
      } catch (TransformerException te) {
         System.out.println("transformer ex: " + te);
      }

   }//RDFaConsumer

   /**
*/
   public static void main(String[] args){

new RDFaConsumer(args[0]);

   }//main

   /**
*/
   public void traverse(Node node){

   // is there anything to do?
   if (node == null) {
   return;
   }

   int type = node.getNodeType();
   

Re: [Dbpedia-discussion] Dbpedia-Freebase raw dump of conditional probabilities

2009-08-20 Thread Kavitha Srinivas

Tim (and anyone else who is interested)

   I put the raw dump of conditional probabilities on an external
website (http://domino.research.ibm.com/comm/research_projects.nsf/pages/iaa.index.html).
Go to the section on LinkedOpenData and Extraction of Vocabularies on
this page and click on the link to the datafile.


Kavitha

On Aug 10, 2009, at 4:42 PM, Tim Finin wrote:


> Kavitha Srinivas wrote:
>> I understand what you are saying -- but some of this reflects the
>> way types are associated with freebase instances.  The types are
>> more like 'tags' in the sense that there is no hierarchy, but each
>> instance is annotated with multiple types.  So an artist would in
>> fact be annotated with person reliably (and probably less
>> consistently with /music/artist).  Similar issues with Uyghurs,
>> murdered children etc.  The issue is differences in modeling
>> granularity as well.  Perhaps a better thing to look at are types
>> where the YAGO types map to Wordnet (this is usually at a coarser
>> level of granularity).
>
> One way to approach this problem is to use a framework to mix logical
> constraints with probabilistic ones.  My colleague Yun Peng has been
> exploring integrating data backed by OWL ontologies with Bayesian
> information, with applications for ontology mapping.  See [1] for
> recent papers on this as well as a recent PhD thesis [2] that I think
> also may be relevant.
>
> [1] http://ebiquity.umbc.edu/papers/select/search/html/613a353a7b693a303b643a37383b693a313b643a303b693a323b733a303a3b693a333b733a303a3b693a343b643a303b7d/
> [2] http://ebiquity.umbc.edu/paper/html/id/427/Constraint-Generation-and-Reasoning-in-OWL





RE: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Panzer,Michael
Hi Ed 

> I haven't fully read the wiki page yet, so I apologize if
> this question is already answered there. I was wondering why
> you chose to mint multiple URIs for the same concept in
> different languages.

[...]

> I kind of expected the assertions to hang off of a language
> and version agnostic URI, with perhaps dct:hasVersion links
> to previous versions.
>
> <http://dewey.info/class/641/>
>     cc:attributionName "OCLC Online Computer Library Center, Inc." ;
>     cc:attributionURL <http://www.oclc.org/dewey/> ;
>     cc:morePermissions <http://www.oclc.org/dewey/about/licensing/> ;
>     dct:hasVersion <http://dewey.info/class/641/2009/08/> ;
>     dct:language "de"^^dct:RFC4646 ;
>     a skos:Concept ;
>     xhtml:license <http://creativecommons.org/licenses/by-nc-nd/3.0/> ;
>     skos:broader <http://dewey.info/class/64/2003/08/about.de> ;
>     skos:inScheme <http://dewey.info/scheme/2003/08/about.de> ;
>     skos:notation "641"^^<http://dewey.info/schema-terms/Notation> ;
>     skos:prefLabel "Food & drink"@en, "Essen und Trinken"@de .

A very good question (and not an easy one to answer). The short answer
would be: Language _is_ an element of the domain to be described (Dewey
concepts), so a different language should generate a different URI,
because it describes a separate instance of a concept. A longer answer: 

My basic premise here was that a URI like http://dewey.info/class/641/
should identify class 641 across all versions/languages of the DDC, not
just the most current version or a multilingual version. Why?

1. Labels can change over time for a given class, which could lead to
inconsistencies with the SKOS model. For example, at one point in time
class 210 had the prefLabel "Natural theology"; now it has the prefLabel
"Philosophy & theory of religion" (which reflects changes to its
semantics as well). This would lead to problems when hanging them from
one concept:

<http://dewey.info/class/210/>
    skos:prefLabel "Natural theology"@en ;
    skos:prefLabel "Philosophy & theory of religion"@en ;
    ...
    skos:prefLabel "Religionsphilosophie, Religionstheorie"@de .

2. The prefLabel is not the only relationship that might be dependent on
the concept language. And many of these relationships are not used with
plain literals in object position that can be disambiguated with a
language tag. Semantic relationships may be different for a concept in
the German version of the DDC. Example: 220.5312 "Luther-Bibel und
Revisionen" is a concept that, because of an expansion, only exists in
the German edition. There has to be a way to identify this class as a
German edition concept, not only as a Dewey concept that happens to only
have a German caption.

3. One could argue that this would not be a relevant counter-argument if
translations were 100% interoperable (i.e., 220.5312 might one day be
included in the English edition as "Luther Bible and revisions").
(Interoperability in this case would mean that no other translation or
the English edition could have claimed this number to coin a different
concept.) But translations are not always perfectly synchronized. So
http://dewey.info/class/641/2008/01/03/ in Portuguese could in fact be a
translation of an earlier version of the English concept, e.g.
http://dewey.info/class/641/2005/08/09/about.en (that might have been
updated with different index terms, different semantic relationships and
so on in the meantime).

4. Finally, having different identifiers for the same concept in a
different language, or, more precisely, for a concept with the same
Dewey number, makes it possible to answer very useful questions like: 

Which concepts exist in the German but not in the English edition?
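
As a rough sketch of how such a question could be answered once each edition's class numbers are available as a set: the Java below is illustrative only, the class EditionDiff is hypothetical, and the sample notations are made up except for 220.5312, which is the German-only example from point 2.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the edition contents used in main are sample data.
final class EditionDiff {

    /** Returns the notations present in one edition but absent from the other. */
    static Set<String> onlyIn(Set<String> edition, Set<String> other) {
        Set<String> result = new TreeSet<>(edition);
        result.removeAll(other);
        return result;
    }

    public static void main(String[] args) {
        Set<String> german = Set.of("220", "220.5312", "641");
        Set<String> english = Set.of("220", "641");
        System.out.println(onlyIn(german, english));
    }
}
```

With language-specific identifiers, the two input sets can be built per edition; with a single language-agnostic identifier per Dewey number, the edition membership would have to be recorded some other way.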

---

The compromise here is to use language as part of the document URIs (as
a dimension of the representation), not as part of the abstract URIs.
So, if you want to refer to a class, or timestamped version, you can do
so in an abstract way:

http://dewey.info/class/641/      -> 303 See Other:
    http://dewey.info/class/641/about
http://dewey.info/class/641/2003/ -> 303 See Other:
    http://dewey.info/class/641/2003/about

A 303 points a user agent to the generic document which is /about.

Whereas when you want to refer to a specific language or format, you
have to use a specific URI for an information resource, e.g.,
http://dewey.info/class/641/about.de or
http://dewey.info/class/641/2003/about.de. So language _is_ recognized
as part of the domain, but only as part of representations, not of
concepts.
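
The URI pattern described above can be summarized in a few lines of Java. This is a sketch of the naming scheme only, based on the example URIs in this message; the class DeweyUris and its method names are invented for illustration and do not describe how the dewey.info server is implemented.

```java
// Hypothetical illustration of the abstract-URI / document-URI pattern.
final class DeweyUris {

    /** Generic document that a 303 from the abstract URI points to. */
    static String genericDocument(String abstractUri) {
        requireTrailingSlash(abstractUri);
        return abstractUri + "about";
    }

    /** Language-specific document, e.g. ".../about.de" for German. */
    static String languageDocument(String abstractUri, String lang) {
        requireTrailingSlash(abstractUri);
        return abstractUri + "about." + lang;
    }

    private static void requireTrailingSlash(String uri) {
        if (!uri.endsWith("/")) {
            throw new IllegalArgumentException("abstract URI must end with '/': " + uri);
        }
    }

    public static void main(String[] args) {
        String concept = "http://dewey.info/class/641/";
        System.out.println(genericDocument(concept));
        System.out.println(languageDocument(concept, "de"));
    }
}
```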

> See how multiple skos:prefLabel assertions can be made using
> the same subject? To illustrate why I think this is important
> consider how someone may use the resources at dewey.info:
>
> <http://openlibrary.org/b/OL11604988M> dct:subject
> <http://dewey.info/class/641/> .
>
> If we follow our nose to http://dewey.info/class/641 we will
> get back some RDF, but the description we get isn't about the
> subject http://dewey.info/class/641/ so what are we to make
> of the above assertion?

Going back to the premise, I

Re: [Dbpedia-discussion] Dbpedia-Freebase raw dump of conditional probabilities

2009-08-20 Thread Ryan Shaw
On Thu, Aug 20, 2009 at 12:57 PM, Kavitha Srinivas <ksrin...@gmail.com> wrote:

> I put the raw dump of conditional probabilities on an external
> website (http://domino.research.ibm.com/comm/research_projects.nsf/pages/iaa.index.html).
> Go to the section on LinkedOpenData and Extraction of Vocabularies on
> this page and click on the link to the datafile.

It strikes me that this is the kind of thing it would be useful to
publish as Linked Data. In other words, rather than analyzing
instances, calculating a bunch of conditional probabilities, and then
publishing a bunch of [ sameAs | equivalentClass | seeAlso | whatever
] assertions, one could publish a bunch of conditional probabilities
or other similarity values, with  some indication of the type of
similarity measure used and links to the specific instance sets used
to calculate the values. Others could then use these measures as they
wished, setting their own thresholds for when to consider something an
equivalence relation or not.
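
A consumer-side sketch of what I mean, in Java. Everything here is hypothetical: the class SimilarityFilter, its fields, and the sample type names and values are made up, and a real data set would also carry the similarity measure and instance sets mentioned above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical consumer of published similarity values.
final class SimilarityFilter {

    /** One published similarity value between two types (illustrative fields). */
    static final class Similarity {
        final String source;
        final String target;
        final double probability;

        Similarity(String source, String target, double probability) {
            this.source = source;
            this.target = target;
            this.probability = probability;
        }
    }

    /** Keeps only the mappings whose value meets the consumer's own threshold. */
    static List<Similarity> atLeast(List<Similarity> all, double threshold) {
        return all.stream()
                  .filter(s -> s.probability >= threshold)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Similarity> published = List.of(
            new Similarity("ex:TypeA", "ex:TypeB", 0.95),   // made-up values
            new Similarity("ex:TypeA", "ex:TypeC", 0.40));
        // A cautious consumer might only accept very strong evidence.
        for (Similarity s : atLeast(published, 0.9)) {
            System.out.println(s.source + " ~ " + s.target);
        }
    }
}
```

The point is that the threshold lives with the consumer, not the publisher.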

Are there any vocabularies that might be used to publish such a data
set as Linked Data?



Re: Top three levels of Dewey Decimal Classification published as linked data

2009-08-20 Thread Ryan Shaw
On Thu, Aug 20, 2009 at 1:50 PM, Panzer,Michael <panz...@oclc.org> wrote:

> Whereas when you want to refer to a specific language or format, you
> have to use a specific URI for an information resource, e.g.,
> http://dewey.info/class/641/about.de or
> http://dewey.info/class/641/2003/about.de. So language _is_ recognized
> as part of the domain, but only as part of representations, not of
> concepts.

I don't understand this part. I understand why you ought to have
language-specific (and versioned) concepts for the DDC. But then it
seems to me that you should have language-specific (and versioned)
URIs for the *concepts*, not just for the information resources that
represent them. As it stands (and as Ed pointed out), you seem to be
asserting that your information resources are SKOS Concepts.

(For what it's worth, I agree with Pat Hayes that one ought to be able
to have a single URI that provides access to the information resource
and at the same time refers to the abstract concept. But that's not
how everyone else is doing it.)



Re: [Dbpedia-discussion] Dbpedia-Freebase raw dump of conditional probabilities

2009-08-20 Thread Mike Bergman

Hi Ryan,

Ryan Shaw wrote:

> On Thu, Aug 20, 2009 at 12:57 PM, Kavitha Srinivas <ksrin...@gmail.com> wrote:
>
>> I put the raw dump of conditional probabilities on an external
>> website (http://domino.research.ibm.com/comm/research_projects.nsf/pages/iaa.index.html).
>> Go to the section on LinkedOpenData and Extraction of Vocabularies on
>> this page and click on the link to the datafile.
>
> It strikes me that this is the kind of thing it would be useful to
> publish as Linked Data. In other words, rather than analyzing
> instances, calculating a bunch of conditional probabilities, and then
> publishing a bunch of [ sameAs | equivalentClass | seeAlso | whatever
> ] assertions, one could publish a bunch of conditional probabilities
> or other similarity values, with some indication of the type of
> similarity measure used and links to the specific instance sets used
> to calculate the values. Others could then use these measures as they
> wished, setting their own thresholds for when to consider something an
> equivalence relation or not.
>
> Are there any vocabularies that might be used to publish such a data
> set as Linked Data?


UMBEL has a specific vocabulary and set of properties for this. 
See the umbel:withAlignment and umbel:withLikelihood properties:


http://www.umbel.org/technical_documentation.html#vocabulary

Thanks, Mike