CoIN: Composition of Identifier Names

2010-04-13 Thread Niklas Lindström
Hi all!

I'd like to point you to a vocabulary I've made for describing how to
mint (or validate) URI:s from RDF properties of a resource: CoIN -
Composition of Identifier Names [1].

It's completely based on needs we have in my current work, and may
still evolve a bit. Therefore this is both an early announcement and
an inquiry to see if this thing is of general interest.

I've found it very valuable to formally declare the pieces from which
an URI is to be composed of. Especially in our environment where we
have a central design of the URI:s, but decentralized publishing of
data (which is of a somewhat rich and varied nature). Currently we use
the CoIN scheme for our domain to:

* Formally express our URI compositions, thereby concretizing our
needs and potential complexities.
* Generate structured documentation about which properties (and lists
of tokens for resources such as publication series) the URI:s are
composed of (using XSLT on a Grit [2] serialization of it plus the
relevant vocabularies).
* Verify the published RDF descriptions by minting URI:s from this
data and comparing them to the supplied subjects (currently with
SPARQL+Groovy; the next step is to see whether Grit+EXSLT may be a
cleaner approach, given SPARQL 1.0:s inability to do recursion).
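As a toy illustration of that mint-and-verify step (CoIN's actual vocabulary and rules are richer; the template and property names below are invented and are not CoIN syntax):

```python
# Hypothetical sketch of minting a URI from a resource's properties and
# validating the stated subject against it. Not actual CoIN syntax.

def mint_uri(base, template, properties):
    """Fill a simple {name}-style template from a property dict."""
    path = template
    for key, value in properties.items():
        path = path.replace("{%s}" % key, value)
    return base + path

def validate(subject, base, template, properties):
    """True if the supplied subject equals the URI minted from its data."""
    return subject == mint_uri(base, template, properties)

props = {"publisher": "example-press", "document": "report-42"}
minted = mint_uri("http://example.org", "/publ/{publisher}/{document}", props)
assert validate(minted, "http://example.org", "/publ/{publisher}/{document}", props)
```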

I'd love to hear any thoughts on whether you'd find this approach
useful in general.

Best regards,
Niklas

[1]: http://code.google.com/p/court/wiki/COIN
[2]: http://code.google.com/p/oort/wiki/Grit



Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Richard Cyganiak

Niklas,

On 13 Apr 2010, at 10:06, Niklas Lindström wrote:

I'd like to point you to a vocabulary I've made for describing how to
mint (or validate) URI:s from RDF properties of a resource: CoIN -
Composition of Identifier Names [1].


Nice. Creating URIs from descriptions of resources is a recurrent  
problem, so it's great to see a proposal in this space!


I had a look at the documentation and didn't quite manage to grasp how  
it works in detail. The documentation is mostly just a usage example,  
which is a nice start but doesn't quite do it for me. Looking at the  
N3 for rdfs:comments also didn't help much.


I think that URI Templates [3] might be a handy companion syntax for  
CoIN and I wonder if they could be integrated into CoIN. I'm thinking  
more about the general curly-brace-syntax rather than the fancy  
details. So perhaps you could start with something like


http://example.org/publ/{publisher}/{document}
http://example.org/publ/{publisher}/{document}/rev/{date}
http://example.org/profiles/{name}

and then attach further information to those {foo} parts, e.g. a  
TokenSet and the represented property.
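A minimal sketch of that combination: expand a curly-brace template and enforce a TokenSet attached to one of its parts (the TokenSet contents here are invented for illustration):

```python
# Expand a curly-brace URI Template and enforce an attached TokenSet
# constraint on one of its {foo} parts. TokenSet values are invented.

TEMPLATE = "http://example.org/publ/{publisher}/{document}"
TOKEN_SETS = {"publisher": {"acme", "example-press"}}  # hypothetical TokenSet

def expand(template, values):
    for name, allowed in TOKEN_SETS.items():
        if name in values and values[name] not in allowed:
            raise ValueError("token %r not allowed for {%s}" % (values[name], name))
    return template.format(**values)

print(expand(TEMPLATE, {"publisher": "acme", "document": "report-42"}))
# http://example.org/publ/acme/report-42
```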


Anyway, nice work.

Best,
Richard


[3] http://bitworking.org/projects/URI-Templates/










Re: Natural Keys and Patterned URIs

2010-04-13 Thread Leigh Dodds
Hi Patrick,

On 10 April 2010 17:44:06 UTC+1, Patrick Logan patrickdlo...@gmail.com wrote:
 Ah, never mind. I think I found the answer... Literal Key. Perhaps the
 other patterns should mention this and include Literal Key in the
 Related section?

I'll make sure there are some extra cross-references.

The discovery aspects are interesting here as ideally you want to look
them up based on a known identifier property that stores the Literal
Key.

OWL 2 has some support for defining keys and I ought to reference this
from the pattern.

There also needs to be some discussion around using dc:identifier or
sub-properties. The former makes it easier to discover whether there is
any resource with X as an identifier, while the latter can carry more
semantics. An intermediate position is to use dc:identifier with a
Custom Datatype. SKOS encourages the latter via skos:notation, which
always has to have a datatype associated with it.
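The three options could be sketched in Turtle roughly as follows (prefixes, resource URIs, and the custom datatype are invented for illustration):

```turtle
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/ns#> .

# 1. Plain dc:identifier: easy to query for, little semantics.
ex:doc1 dc:identifier "X123" .

# 2. A sub-property of dc:identifier: carries more semantics.
ex:isbn rdfs:subPropertyOf dc:identifier .
ex:doc2 ex:isbn "963-606-169-6" .

# 3. dc:identifier with a custom datatype; skos:notation always
#    requires a datatype.
ex:doc3 dc:identifier "X123"^^ex:localId ;
        skos:notation "X123"^^ex:localId .
```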

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Announce: Linked Data Patterns book

2010-04-13 Thread Pierre-Antoine Champin
Wonderful.

Any PDF version available?

  pa

On 06/04/2010 16:10, Leigh Dodds wrote:
 Hi folks,
 
 Ian Davis and I have been working on a catalogue of Linked Data
 patterns which we've put on-line as a free book. The work is licensed
 under a Creative Commons attribution license.
 
 This is still a very early draft but already contains 30 patterns
 covering identifiers, modelling, publishing and consuming Linked Data.
 
 http://patterns.dataincubator.org
 
 More background at [1]. We'd be interested to hear your comments, and
 hope that it can become a useful resource for the growing community of
 practitioners.
 
 Cheers,
 
 L.
 
 [1]. 
 http://www.ldodds.com/blog/2010/04/linked-data-patterns-a-free-book-for-practitioners/
 




XMP RDF extractors?

2010-04-13 Thread Dan Brickley
On Tue, Apr 13, 2010 at 3:56 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi,

 Yes.

 PDF: http://patterns.dataincubator.org/book/linked-data-patterns.pdf
 EPUB: http://patterns.dataincubator.org/book/linked-data-patterns.epub

Something of a tangent but this reminds me, what's the latest on RDF
extractors for Adobe XMP? I always used to use 'strings' and a regex
but I haven't tracked the spec and have found this trick working
*less* well over time, not better.

strings linked-data-patterns.pdf | grep -i xmp
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
<xmp:CreateDate>2010-04-12T23:01:36+01:00</xmp:CreateDate>
</x:xmpmeta><?xpacket end="r"?>
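The strings-and-regex trick can be made a bit more robust by scanning for the xpacket markers directly, since XMP packets are delimited by `<?xpacket begin=...?>` and `<?xpacket end=...?>`. A minimal sketch, not a full XMP parser (real PDFs may contain several packets):

```python
# Slice XMP packets out of raw bytes by looking for the xpacket
# delimiters. Minimal sketch; no XML parsing is attempted here.

def extract_xmp(data: bytes):
    packets = []
    pos = 0
    while True:
        start = data.find(b"<?xpacket begin=", pos)
        if start == -1:
            break
        end = data.find(b"<?xpacket end=", start)
        if end == -1:
            break
        end = data.find(b"?>", end)          # include the closing marker
        packets.append(data[start:end + 2])
        pos = end + 2
    return packets

# with open("linked-data-patterns.pdf", "rb") as f:
#     for p in extract_xmp(f.read()):
#         print(p.decode("utf-8", "replace"))
```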

By contrast, downloading the .epub file and unzipping you find this in
content.opf:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
unique-identifier="bookid">
  <metadata>
    <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/"
id="bookid">_id2880071</dc:identifier>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Linked Data
Patterns</dc:title>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Dodds,
Leigh">Leigh Dodds</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Davis, Ian">Ian
Davis</dc:creator>
    <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This
book lives at http://patterns.dataincubator.org. Check that website
for the latest version. This work is licenced under the Creative
Commons Attribution 2.0 UK: England &amp; Wales License. To view a
copy of this licence, visit
http://creativecommons.org/licenses/by/2.0/uk/. Thanks to members of
the Linked Data mailing list for their feedback and input, and Sean
Hannan for contributing some CSS to style the online
book.</dc:description>
    <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
  </metadata>
  <manifest>
    <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="htmltoc" media-type="application/xhtml+xml" href="bk01-toc.html"/>
    <item id="id2880071" href="index.html" media-type="application/xhtml+xml"/>

Wouldn't it be nice if there were easy conventions for books about RDF
to have Webby linked RDF bundled in the files? Both seem nearly there
but not quite... (this is not a complaint, Leigh; I love this work btw!)

cheers,

Dan


ps. re epub see also
http://lists.w3.org/Archives/Public/public-lod/2010Jan/0121.html



Re: What would you build with a web of data? Decision support

2010-04-13 Thread Wolfgang Orthuber

Hi Georgi,

First let me underline that the following is not a detached theory, it 
is very practical:


The web of data can support the clinician in his cycle of decision:

(a) The clinician makes measurements (in the broadest sense; speaking
with the patient or looking at a picture is also a measurement).
(b) The clinician focuses on those measurement results which are
relevant to his therapeutic decisions (feature extraction).
(c) The clinician compares these measurement results with experience.
In doing so he may use rules or models derived from common experience.
(d) The clinician decides on a therapy and measures the effect of his
decision, i.e. the cycle starts again at (a).


Broad, high-quality experience is very important for step (c).

The cycle of decision (measurements - feature extraction - comparison 
with experience - decision) is also effective outside medicine: before 
every conscious decision we *compare* decision-relevant data with 
experience (or a model derived from common experience). Experience says 
that in *similar* situations possibility X yields better results than 
other possibilities, so we decide for possibility X. Even if we try to 
decide optimally, our decisions are suboptimal due to limited 
experience.


The web of data can be designed in such a way that it collects 
experiences (including decision-relevant measurements from machines) in 
a precise and *comparable* form (much more precise and comparable than 
text). The web of data can thus summarize experiences in a well-defined, 
comparable way for decision support.


For this a clear similarity relation is necessary. The natural way to 
achieve it is a vectorial description of resources, i.e. quantifying the 
resource's properties and regarding the result (a sequence of numbers) 
as a vector. After defining an appropriate metric (distance function) we 
can calculate the similarity of vectors by calculating the distance 
between them: the smaller the distance, the more similar the vectors 
and (given good quantification) the original resources. Using HTTP URIs 
allows all domain name owners to define these vectors and 
optimized distance functions.
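The vector-and-distance idea can be sketched in a few lines of Python; the features, values, and weights below are invented for illustration and are not taken from the paper:

```python
import math

# Resources described as feature vectors; similarity = small distance.
# Feature choice and weights are hypothetical examples.

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted(a, b, w):
    """Weighted variant: domain owners could publish optimized weights."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

patient_a = [1.70, 72.0, 36.6]  # e.g. height (m), weight (kg), temperature (C)
patient_b = [1.68, 75.0, 38.2]

# The smaller the distance, the more similar the described resources.
d = euclidean(patient_a, patient_b)
```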


Therefore I suggest introducing standardized Vectorial Resource 
Descriptors (VRDs) on the Web, and the best option seems to be to 
integrate these into Linked Data. The paper 
http://www.orthuber.com/wp1.pdf describes the details. It is not completely 
up to date, and though the basic content of the VRDs (and Vector Space 
Descriptors, VSDs) is clear, I have not been sure about the syntax of 
the RDF examples (currently chapters 2.2.2 and 2.2.3), and I would like 
to adapt the syntax to suggestions from the community.


So comments and suggestions are very welcome!

Best

Wolfgang


Georgi Kobilarov schrieb:

Yesterday I issued a challenge on my blog for ideas for concrete linked open
data applications, because talking about concrete apps helps shape the
roadmap for the technical questions ahead of the linked data community. The
real questions, not the theoretical ones...

Richard MacManus of ReadWriteWeb picked up the challenge:
http://www.readwriteweb.com/archives/web_of_data_what_would_you_build.php

Let's be creative about stuff we'd build with the web of data. Assume the
Linked Data Web were already there: what would you build?

Cheers,
Georgi

--
Georgi Kobilarov
Uberblic Labs Berlin
http://blog.georgikobilarov.com




  




Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Robert Sanderson
A quick question...

2010/4/13 Niklas Lindström lindstr...@gmail.com:

 I've found it very valuable to formally declare the pieces from which
 an URI is to be composed of. Especially in our environment where we
 have a central design of the URI:s, but decentralized publishing of
 data (which is of a somewhat rich and varied nature).


How does this mesh with URIs being opaque?  If the URIs were actually
opaque and treated as such, then formally declaring the parts would be
a non-issue.  It seems that this ideal is being increasingly watered
down or ignored... is that intentional, and is it a good or bad thing?

Thoughts?

Rob Sanderson



Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Pierre-Antoine Champin
Here are my 2¢ about the opacity of resources.

First, let me point out that, contrary to what is often believed/claimed
(and I plead guilty of having done so), URI opacity is *not* a
constraint of the REST architectural style, at least as defined by
Fielding in his thesis [1].

Then, AFAIK, the main reference for URI opacity is [2]. The axiom states
that you should not look at the contents of the URI string to gain
other information. If you read what follows, you see that "you" mainly
means "your software". From this, I personally draw two conclusions:


1/ URI opacity is a desirable feature of software handling URIs, not of
URIs themselves.

A hacker trying to get familiar with a source of URIs/linked data
should, on the other hand, be able to easily understand what is going
on... This is a good property, and does not contradict the opacity axiom
as long as that hacker does not make his/her software *relying* on such
an implicit understanding.



2/ Given a URI, software should not try to reverse-engineer it.
However, the axiom does not prevent software from being given a *rule* to
*produce* new URIs.

As a matter of fact, I would be surprised if TBL discouraged this
very mechanism, which underlies all HTML-based forms (at least those
using the GET method). A form is nothing else than the specification of
a *whole set* of URIs, plus the technical tool to produce them easily in
your browser.


  pa

[1] http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
[2] http://www.w3.org/DesignIssues/Axioms.html#opaque

On 13/04/2010 17:11, Robert Sanderson wrote:
 A quick question...
 
 2010/4/13 Niklas Lindström lindstr...@gmail.com:
 
 I've found it very valuable to formally declare the pieces from which
 an URI is to be composed of. Especially in our environment where we
 have a central design of the URI:s, but decentralized publishing of
 data (which is of a somewhat rich and varied nature).
 
 
 How does this mesh with URIs being opaque?  If the URIs were actually
 opaque and treated as such, then formally declaring the parts would be
 a non-issue.  It seems that this ideal is being increasingly watered
 down or ignored... is that intentional, and is it a good or bad thing?
 
 Thoughts?
 
 Rob Sanderson
 




Re: XMP RDF extractors?

2010-04-13 Thread Dan Brickley
On Tue, Apr 13, 2010 at 6:31 PM, Pierre-Antoine Champin
swlists-040...@champin.net wrote:
 Even more tangent, but when I read in detail the XMP spec last year (in
 relation to the Media Annotation WG), I came to two conclusions:

 - XMP specifies RDF at the level of the XML serialization, which is
 *ugly* (emphasis on *ugly*). Furthermore, it makes it unsafe to use
 standard RDF/XML serializers, as those may not enforce those syntactic
 constraints.

 - XMP interprets RDF/XML in a non-standard way, considering the two
 following tags as non equivalent
 <ns1:bar    xmlns:ns1="http://example.com/foo"> ...
 <ns2:foobar xmlns:ns2="http://example.com/"> ...
 (which is again, a syntax-only perspective). So it is not safe to use
 standard RDF/XML parsers, as they will produce a model which may be
 inconsistent with other XMP parsers.

 So you can neither use standard serializers nor standard parsers to
 handle XMP's RDF safely, so as far as I'm concerned, XMP is not really
 RDF -- and Dan's problems to extract it strengthen this opinion of mine...

 That being said, the risks of inconsistency are minimal, especially for
 parsing. So I guess there is some value in pretending XMP is RDF ;)
 and using an RDF parser to extract it...

I think we can and should be generous to Adobe here; they have been
supportive of RDF since the late '90s - e.g. Walter Chang's work on UML
and RDF http://www.w3.org/TR/NOTE-rdf-uml/ - and committing to
something that is embedded within files that will mostly *never* be
re-generated (PDFs, JPEGs etc. in the wild) makes for naturally
conservative design. There are probably many kinds of improvement they
could make, but being back-compatible with the large bulk of deployed
XMP must be a major concern. Pushing out revisions to tools on the
scale of Photoshop etc isn't easy, especially when the new stuff will
also have to read/write properly in older deployed tools for unknown
years to come.

That said I think we would do well to look around more actively at
what's out there via XMP, and see how it hangs together when
re-aggregated into a common SPARQL environment. In particular XMP
pre-dates SKOS, and I imagine many of the environments where XMP
matters would benefit from the kinds of integration SKOS can bring. So
I'd love to see some exploration of that...

cheers,

Dan



Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Richard Cyganiak

Hi Robert,

On 13 Apr 2010, at 17:11, Robert Sanderson wrote:

I've found it very valuable to formally declare the pieces from which
an URI is to be composed of. Especially in our environment where we
have a central design of the URI:s, but decentralized publishing of
data (which is of a somewhat rich and varied nature).


How does this mesh with URIs being opaque?


The “URI opacity” axiom does not say that URIs should be opaque.

It says that clients should *treat them* as opaque.

This is because URI owners should have full authority about what their  
URIs identify and resolve to, and if clients make assumptions about  
what a URI will resolve to, then they are contesting this authority.


URI opacity in no way precludes URI owners from telling the world  
about the structure of their URI space.


In some sense, CoIN is no different from an HTML form that has  
@method=GET -- it specifies a mapping from some data (in the one  
case RDF resource descriptions, in the other key-value pairs of form  
fields and form values) to URIs.
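The form analogy can be made concrete in a couple of lines: a GET form is nothing but a rule for producing URIs from key-value pairs (the action URL and field names below are invented for illustration):

```python
from urllib.parse import urlencode

# What a GET form does: compose a URI from a fixed action URL plus
# url-encoded key-value pairs. Action and fields are hypothetical.

action = "http://example.org/search"
fields = {"q": "linked data", "page": "2"}

uri = action + "?" + urlencode(fields)
# uri == "http://example.org/search?q=linked+data&page=2"
```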


It is true that link following should be preferred over URI  
construction, but this is not always possible, as shown by the example  
of, say, HTML search forms.


(Examples for violations of URI opacity: the /favicon.ico convention  
-- suddenly, server operators don't “own” that URI any more, because  
browsers will try to fetch a web site icon from there, no matter what  
the server operator wants that URI to denote. Or assuming that all  
URIs that end in .png must be rendered as image files -- the publisher  
might have a web page at that URI, and the assumption conflicts with  
that.)


Best,
Richard









Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Richard Cyganiak

On 13 Apr 2010, at 18:04, Pierre-Antoine Champin wrote:

2/ Given a URI, a software should not try to reverse-engineer it.
However, the axiom does not prevent that a software be given a *rule* to
*produce* new URIs.

As a matter of fact, I would be surprised that TBL would discourage this
very mechanism which underlies all HTML-based forms (at least those
using the GET method). A form is nothing else than the specification of
a *whole set* of URIs, plus the technical tool to produce them easily in
your browser.


Didn't read this before writing my own response -- well said!

Cheers,
Richard







RE: [semanticweb] ANN: DBpedia 3.5 released

2010-04-13 Thread Michael Schneider
From: semantic-web-requ...@w3.org [mailto:semantic-web-requ...@w3.org] On
Behalf Of ba...@goldmail.de
Sent: Tuesday, April 13, 2010 1:13 PM
To: dbpedia-discuss...@lists.sourceforge.net;
dbpedia-announceme...@lists.sourceforge.net; Chris Bizer
Cc: public-lod@w3.org; 'SW-forum'; semantic...@yahoogroups.com
Subject: Re: [semanticweb] ANN: DBpedia 3.5 released

A fact of my experience since many years:

The homepage of my grandma is better accessible than the flagship(!) of
'linked data' dbpedia.org...

Let's do the test!

In Firefox, using best guesses:

http://dbpedia.org

Works!

http://barans-grandma.org

Does not work! 

How many years experience do I need to be able to access your grandma's
homepage?

Michael

--
Dipl.-Inform. Michael Schneider
Research Scientist, Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: michael.schnei...@fzi.de
WWW  : http://www.fzi.de/michael.schneider
===
FZI Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
Tel.: +49-721-9654-0, Fax: +49-721-9654-959
Stiftung des bürgerlichen Rechts, Az 14-0563.1, RP Karlsruhe
Vorstand: Prof. Dr.-Ing. Rüdiger Dillmann, Dipl. Wi.-Ing. Michael Flor,
Prof. Dr. Dr. h.c. Wolffried Stucky, Prof. Dr. Rudi Studer
Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
===




Re: Hungarian National Library published its entire OPAC and Digital Library as Linked Data

2010-04-13 Thread Ed Summers
Sorry I had a small typo in that oai-ore example I included. I meant
to type ore:aggregates instead of ore-aggregates. I also meant to
include assertions about the format of the files, which can be handy
to have. //Ed

<http://oszkdk.oszk.hu/resource/DRJ/404>
    dc:creator <http://nektar.oszk.hu/resource/auth/33589>, "Jókai
Mór,, (1825-1904.)" ;
    dc:date "cop. 2006" ;
    dc:description "Működési követelmények: Adobe Reader / MS Reader",
        "Főcím a címképernyőről", "Szöveg (pdf : 1.2 MB) (lit : 546 KB)" ;
    dc:identifier "963-606-169-6 (pdf)", "963-606-170-X (lit)" ;
    dc:language "hun" ;
    dc:publisher "Szentendre : Mercator Stúdió" ;
    dc:subject <http://nektar.oszk.hu/resource/auth/magyar_irodalom>,
        "magyar irodalom." ;
    dc:title "Dekameron" ;
    dc:type "book", "elbeszélés.", "elektronikus dokumentum.", "no
type provided" ;
    ore:aggregates
        <http://oszkdk.oszk.hu/storage/00/00/04/04/dd/1/dekameron.pdf>,
        <http://oszkdk.oszk.hu/storage/00/00/04/04/dd/2/dekameron.lit> .

<http://oszkdk.oszk.hu/storage/00/00/04/04/dd/1/dekameron.pdf>
    dc:format "application/pdf" .
<http://oszkdk.oszk.hu/storage/00/00/04/04/dd/2/dekameron.lit>
    dc:format "application/x-ms-reader" .



Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Ivan Žužak
Hi all,

Here's the juice on URI opacity, right from Roy [1].

The important bits:

Opacity of URI only applies to clients and, even then, only to
those parts of the URI that are not defined by relevant standards.
Origin servers, for example, have the choice of interpreting a
URI as being opaque or as a structure that defines how the server
maps the URI to a representation of the resource. Cool URIs will
often make a transition from being originally interpreted as
structure by the server and then later treated as an opaque
string (perhaps because the server implementation has changed
and the owner wants the old URI to persist). The server can make
that transition because clients are required to act like they
are ignorant of the server-private structure.

Clients are allowed to treat a URI as being structured
if that structure is defined by standard (e.g., scheme and
authority in http) or if the server tells the client how its
URI is structured. For example, both GET-based FORM actions and
server-side image map processing compose the URI from a
server-provided base and a user-supplied suffix constructed
according to an algorithm defined by a standard media type.

Ivan

[1] http://tech.groups.yahoo.com/group/rest-discuss/message/5369


On Tue, Apr 13, 2010 at 19:30, Richard Cyganiak rich...@cyganiak.de wrote:
 On 13 Apr 2010, at 18:04, Pierre-Antoine Champin wrote:

 2/ Given a URI, a software should not try to reverse-engineer it.
 However, the axiom does not prevent that a software be given a *rule* to
 *produce* new URIs.

 As a matter of fact, I would be surprised that TBL would discourage this
 very mechanism which underlies all HTML-based forms (at least those
 using the GET method). A form is nothing else than the specification of
 a *whole set* of URIs, plus the technical tool to produce them easily in
 your browser.

 Didn't read this before writing my own response -- well said!

 Cheers,
 Richard





Re: XMP RDF extractors?

2010-04-13 Thread Leigh Dodds
Hi,

On Tuesday, April 13, 2010, Dan Brickley dan...@danbri.org wrote:
 ...snip!...

 Wouldn't it be nice if there were easy conventions for books about RDF
 to have Webby linked RDF bundled in the files? Both seem nearly there
 but not quite... (this not a complaint Leigh, I love this work btw!)

Thanks for the feedback, glad you like it.

The epub and PDF formats are just generated with the docbook-xsl
stylesheets. I'm really pleased that there's any machine-readable data
in there at all! It's on my TODO list to get some RDFa into the
HTML output.

I guess if folk wanted to improve the quality of metadata for ebooks
and PDFs then exploring how to enhance docbook conversions would be a
good start.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: [semanticweb] ANN: DBpedia 3.5 released

2010-04-13 Thread baran

A fact of my experience since many years:

The homepage of my grandma is better accessible than the flagship(!) of
'linked data' dbpedia.org...


Let's do the test!

In Firefox, using best guesses:

http://dbpedia.org

Works!

http://barans-grandma.org

Does not work!
How many years experience do I need to be able to access your grandma's
homepage?

Dipl.-Inform. Michael Schneider
Research Scientist, Information Process Engineering (IPE)
FZI Forschungszentrum Informatik an der Universität Karlsruhe


Someone who has used the endpoint dbpedia.org/sparql intensively
knows what I mean:

After an hour or two it hangs; I try dbpedia.org with Firefox,
Opera and IE, and it hangs too. After 5 minutes I try dbpedia.org again
and see the page; for dbpedia.org/sparql I submit my simple query again
and it is OK.

For years it has been the same story, in the same rhythm.

But someone who clicks dbpedia.org only once of course also has
the time to write such nonsense as you did above from:

'FZI Forschungszentrum Informatik an der Universität Karlsruhe'

And if I send a mail saying the server isn't working properly,
I might get a reply from Chris Bizer suggesting

'maintenance work on the DBpedia server'...

I think there has been so much maintenance work there that even
a simple click on dbpedia.org hangs too often compared to
the homepage of my grandma...

baran.
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ [2]




Re: [Dbpedia-discussion] ANN: DBpedia 3.5 released

2010-04-13 Thread Nicolas Torzec
Dear DBpedia workers,
First of all, many thanks for this new release :)

Then, I have a quick question regarding the difference between the dbpedia
3.4 raw infobox data set and the dbpedia 3.5 raw infobox data set.
- http://downloads.dbpedia.org/3.5/en/infobox_properties_en.nt.bz2
- http://downloads.dbpedia.org/3.4/en/infobox_en.nt.bz2

Comparing the two, it appears that the dbpedia 3.5 infobox data set (4.7G)
is actually much smaller than the dbpedia 3.4 infobox data set (5.7G).

Do you know why the size went down rather than up?
Did you change anything in the way the raw infobox data sets are extracted?


Cheers,
Nicolas.

--
Nicolas Torzec
Yahoo! Labs.



On 4/12/10 2:06 AM, Chris Bizer ch...@bizer.de wrote:

 Hi all,
 
 we are happy to announce the release of DBpedia 3.5.
 
 The new release is based on Wikipedia dumps dating from March 2010. Compared
 to the 3.4 release, we were able to increase the quality of the DBpedia
 knowledge base by employing a new data extraction framework which applies
 various data cleansing heuristics as well as by extending the
 infobox-to-ontology mappings that guide the data extraction process.
 
 The new DBpedia knowledge base describes more than 3.4 million things, out
 of which 1.47 million are classified in a consistent ontology, including
 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000
 video games, 140,000 organizations, 146,000 species and 4,600 diseases. The
 DBpedia data set features labels and abstracts for these 3.2 million things
 in up to 92 different languages; 1,460,000 links to images and 5,543,000
 links to external web pages; 4,887,000 external links into other RDF
 datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories. The
 DBpedia knowledge base altogether consists of over 1 billion pieces of
 information (RDF triples) out of which 257 million were extracted from the
 English edition of Wikipedia and 766 million were extracted from other
 language editions.
 
 The new release provides the following improvements and changes compared to
 the DBpedia 3.4 release:
 
 1. The DBpedia extraction framework has been completely rewritten in Scala.
 The new framework dramatically reduces the extraction time of a single
 Wikipedia article from over 200 to about 13 milliseconds. All features of
 the previous PHP framework have been ported. In addition, the new framework
 can extract data from Wikipedia tables based on table-to-ontology mappings
 and is able to extract multiple infoboxes out of a single Wikipedia article.
 The data from each infobox is represented as a separate RDF resource. All
 resources that are extracted from a single page can be connected using
 custom RDF properties which are also defined in the mappings. A lot of work
 also went into the value parsers and the DBpedia 3.5 dataset should
 therefore be much cleaner than its predecessors. In addition, units of
 measurement are normalized to their respective SI unit, which makes querying
 DBpedia easier. 
 
 2. The mapping language that is used to map Wikipedia infoboxes to the
 DBpedia Ontology has been redesigned. The documentation of the new mapping
language is found at
http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/
 
3. In order to enable the DBpedia user community to extend and refine the
infobox-to-ontology mappings, the mappings can be edited on the newly
created wiki hosted at http://mappings.dbpedia.org. At the moment, 303
 template mappings are defined, which cover (including redirects) 1055
 templates. On the wiki, the DBpedia Ontology can be edited by the community
 as well. At the moment, the ontology consists of 259 classes and about 1,200
 properties.
  
 4. The ontology properties extracted from infoboxes are now split into two
 data sets: 1. The Ontology Infobox Properties dataset contains the
 properties as they are defined in the ontology (e.g. length). The range of a
 property is either an xsd schema type or a dimension of measurement, in
 which case the value is normalized to the respective SI unit. 2. The
Ontology Infobox Properties (Specific) dataset contains properties which
have been specialized for a specific class using a specific unit; e.g., the
property height is specialized on the class Person using the unit
centimeters instead of meters. For further details please refer to
 http://wiki.dbpedia.org/Datasets#h18-11.
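The normalize-to-SI / specialize-per-class split described in point 4 can be
sketched roughly as follows. This is a hypothetical illustration, not the
DBpedia extraction code; the function names, unit table and factors are
invented here:

```python
# Conversion factors to the SI base unit for length (metres).
# Illustrative subset only.
TO_METRES = {
    "centimetre": 0.01,
    "metre": 1.0,
    "kilometre": 1000.0,
}

def normalize_to_si(value, unit):
    """Express a parsed infobox value in the SI base unit (metres),
    as in the generic Ontology Infobox Properties dataset."""
    return value * TO_METRES[unit]

def specialize(value_si, target_unit):
    """Re-express an SI value in a class-specific unit, as in the
    Ontology Infobox Properties (Specific) dataset (e.g. a Person's
    height in centimetres)."""
    return value_si / TO_METRES[target_unit]

# A height parsed from an infobox as "180 cm":
height_si = normalize_to_si(180, "centimetre")   # ~1.8 (metres)
height_cm = specialize(height_si, "centimetre")  # back to ~180.0
```

The point of the split is that generic queries can always compare values in
one base unit, while class-specific properties keep the unit readers expect.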
  
 5. The framework now resolves template redirects, making it possible to
 cover all redirects to an infobox on Wikipedia with a single mapping.
 
6. Three new extractors have been implemented: 1. PageIdExtractor, which
extracts the Wikipedia page ID of each page. 2. RevisionExtractor, which
extracts the latest revision of a page. 3. PNDExtractor, which extracts PND
(Personennamendatei) identifiers.
 
7. The data set now provides labels, abstracts, page links and infobox data
in 92 different languages, which have been extracted from the respective
language editions of Wikipedia.

Re: CoIN: Composition of Identifier Names

2010-04-13 Thread Ed Summers
2010/4/13 Richard Cyganiak rich...@cyganiak.de:
 I think that URI Templates [3] might be a handy companion syntax for CoIN
 and I wonder if they could be integrated into CoIN. I'm thinking more about
 the general curly-brace-syntax rather than the fancy details. So perhaps you
 could start with something like

 http://example.org/publ/{publisher}/{document}
 http://example.org/publ/{publisher}/{document}/rev/{date}
 http://example.org/profiles/{name}

I second the idea of exploring the use of URI Templates for
documenting how to construct a URL from other data. I'm not sure if
it's part of the latest URI Templates draft [1], but OpenSearch allows
parameter names to be defined with namespaces [2]. For example:

 <?xml version="1.0" encoding="UTF-8"?>
 <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
     xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/">
   <Url type="application/vnd.google-earth.kml+xml"
        template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;bbox={geo:box?}&amp;format=kml"/>
 </OpenSearchDescription>

Note the use of the geo namespace in the geo:box parameter name. So
you could imagine a URL template that referenced names from an RDF
vocabulary:

<Url type="application/rdf+xml"
     template="http://example.com/user/{foaf:mbox_sha1sum}"/>

OpenSearch was an incubator for the ideas that led to the URI
Templates draft, and is built into many modern web browsers (IE,
Firefox, Chrome).
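The namespaced-parameter idea above can be sketched in a few lines. This is
a hypothetical illustration (plain regex substitution, not an implementation
of the URI Templates draft); expand() and the example values are invented
here:

```python
import re

def expand(template, values):
    """Replace each {name} placeholder with its value. Names may be
    namespaced, e.g. foaf:mbox_sha1sum; a trailing '?' marks the
    parameter optional, and missing optional parameters expand to ''."""
    def repl(match):
        name = match.group(1)
        optional = name.endswith("?")
        key = name.rstrip("?")
        if key in values:
            return values[key]
        if optional:
            return ""
        raise KeyError(key)
    return re.sub(r"\{([^{}]+)\}", repl, template)

print(expand("http://example.com/user/{foaf:mbox_sha1sum}",
             {"foaf:mbox_sha1sum": "a9fd4b2e"}))
# http://example.com/user/a9fd4b2e
```

For a URL like the OpenSearch one, an unsupplied optional parameter such as
{geo:box?} simply disappears from the expansion.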

//Ed

[1] http://tools.ietf.org/html/draft-gregorio-uritemplate-04
[2] http://www.opensearch.org/Specifications/OpenSearch/1.1#Parameter_names



Re: Natural Keys and Patterned URIs

2010-04-13 Thread Peter Ansell
On 13 April 2010 22:00, Leigh Dodds leigh.do...@talis.com wrote:
 Hi Patrick,

 On 10 April 2010 17:44:06 UTC+1, Patrick Logan patrickdlo...@gmail.com 
 wrote:
 Ah, never mind. I think I found the answer... Literal Key. Perhaps the
 other patterns should mention this and include Literal Key in the
 Related section?

 I'll make sure there are some extra cross-references.

 The discovery aspects are interesting here as ideally you want to look
 them up based on a known identifier property that stores the Literal
 Key.

 OWL 2 has some support for defining keys and I ought to reference this
 from the pattern.

 There also needs to be some discussion around using dc:identifier or
 sub-properties. The former can make it easier to discover "is there any
 resource with X as an identifier?", while the latter can carry more
 semantics. An intermediary position is to use dc:identifier with a
 Custom Datatype. SKOS encourages the latter via skos:notation, which
 always has to have a datatype associated with it.

skos:notation with a custom datatype is just as hard to find as
dc:identifier with a custom datatype, or merely a custom predicate
with a simple plain literal string. Either way, you have to know
exactly which URIs people are using for the datatype or predicate, so
the discovery or semantics of each scheme are exactly the same. There
is no discovery advantage in my opinion to using custom datatypes
where predicates are equally suitable, as there is no ability to match
?uri skos:notation "123.23" . if the data is actually ?uri
skos:notation "123.23"^^my:datatype . without treating all of the
objects as plain string literals. There doesn't seem to be any
advantage to using the datatype if people have to go through ?uri
skos:notation ?object . FILTER(str(?object) = "123.23") to get there,
and then they could have overlaps with other schemes anyway.
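The matching problem described here can be shown with a toy sketch. This is
an illustration only: plain Python tuples stand in for triples, and the
subjects, predicates and datatype names are made up:

```python
# Each object literal is (lexical form, datatype) with None for a
# plain literal.
triples = [
    ("ex:doc1", "skos:notation", ("123.23", "my:datatype")),
    ("ex:doc2", "dc:identifier", ("123.23", None)),
]

def exact_match(triples, predicate, lexical, datatype=None):
    """Exact triple-pattern matching: a plain-literal query does not
    find the typed literal, and vice versa."""
    return [s for s, p, (lex, dt) in triples
            if p == predicate and lex == lexical and dt == datatype]

def str_match(triples, lexical):
    """The FILTER(str(?o) = "...") workaround: compare lexical forms
    only, so plain and typed literals both match - along with any
    other scheme that happens to reuse the same string."""
    return [s for s, p, (lex, dt) in triples if lex == lexical]

exact_match(triples, "skos:notation", "123.23")  # [] - typed literal missed
str_match(triples, "123.23")                     # both subjects match
```

The first query comes back empty unless the asker already knows the datatype
URI, which is exactly the discovery symmetry being argued above.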

If you are looking for a predicate that is defined as a key, then you
could still have overlaps between schemes, as you are not recognising
the predicate explicitly. In all of the methods, one needs to know
they are looking for an identifier, and know what scheme the
identifier is defined in to get exact access, and in all of the cases
one may have overlaps if they only know they are looking for an
identifier without knowing which scheme it is defined in, so there is
no semantic difference.

It all comes down to accessibility, I think. If you want the scheme to be
more accessible to people who know the identifier but not the scheme,
then plain string literals with a custom predicate are more useful. If
you want the scheme to be more accessible to people who know the
scheme, but want to know the identifier, then the standard predicates,
i.e. dc:identifier or skos:notation (with a custom datatype), are more
useful. If the patterns document wants to portray the advantages of
different methods, rather than just giving best practices, then the
advantages of both methods could be explained.

Cheers,

Peter