CoIN: Composition of Identifier Names
Hi all!

I'd like to point you to a vocabulary I've made for describing how to mint (or validate) URIs from the RDF properties of a resource: CoIN - Composition of Identifier Names [1]. It's based entirely on needs we have in my current work, and may still evolve a bit, so this is both an early announcement and an inquiry to see whether it is of general interest.

I've found it very valuable to formally declare the pieces from which a URI is to be composed, especially in our environment, where we have a central design of the URIs but decentralized publishing of data (which is of a somewhat rich and varied nature). Currently we use the CoIN scheme for our domain to:

* Formally express our URI compositions, thereby concretizing our needs and potential complexities.
* Generate structured documentation about which properties (and lists of tokens, for resources such as publication series) the URIs are composed of (using XSLT on a Grit [2] serialization of it plus the relevant vocabularies).
* Verify the published RDF descriptions by minting URIs from this data and comparing them to the supplied subjects (currently with SPARQL+Groovy; the next step is to see whether Grit+EXSLT may be a cleaner approach, given SPARQL 1.0's inability to do recursion).

I'd love to hear any thoughts on whether you'd find this approach useful in general.

Best regards,
Niklas

[1]: http://code.google.com/p/court/wiki/COIN
[2]: http://code.google.com/p/oort/wiki/Grit
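The verification step Niklas describes can be sketched in a few lines: mint a URI from a resource's properties according to a declared composition rule, then check it against the subject URI supplied in the published data. The rule syntax and property names below are purely illustrative, not CoIN's actual vocabulary.

```python
def mint_uri(base, rule, props):
    # Fill each {name} slot in the rule with the matching property value.
    uri = base + rule
    for name, value in props.items():
        uri = uri.replace("{" + name + "}", value)
    return uri

def verify(subject, base, rule, props):
    # A published description passes if its subject equals the minted URI.
    return subject == mint_uri(base, rule, props)

# Hypothetical publisher/document values for illustration.
props = {"publisher": "acme", "document": "report-42"}
print(verify("http://example.org/publ/acme/report-42",
             "http://example.org", "/publ/{publisher}/{document}", props))
# → True
```

A real implementation would also need to handle slug transformation of literal values and the token-set lookups the announcement mentions; this only shows the mint-and-compare shape.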
Re: CoIN: Composition of Identifier Names
Niklas,

On 13 Apr 2010, at 10:06, Niklas Lindström wrote:
> I'd like to point you to a vocabulary I've made for describing how to mint (or validate) URIs from RDF properties of a resource: CoIN - Composition of Identifier Names [1].

Nice. Creating URIs from descriptions of resources is a recurrent problem, so it's great to see a proposal in this space!

I had a look at the documentation and didn't quite manage to grasp how it works in detail. The documentation is mostly just a usage example, which is a nice start but doesn't quite do it for me. Looking at the N3 for the rdfs:comments also didn't help much.

I think that URI Templates [3] might be a handy companion syntax for CoIN, and I wonder if they could be integrated into it. I'm thinking more of the general curly-brace syntax than the fancy details. So perhaps you could start with something like

  http://example.org/publ/{publisher}/{document}
  http://example.org/publ/{publisher}/{document}/rev/{date}
  http://example.org/profiles/{name}

and then attach further information to those {foo} parts, e.g. a TokenSet and the represented property.

Anyway, nice work.

Best,
Richard

[3] http://bitworking.org/projects/URI-Templates/
Re: Natural Keys and Patterned URIs
Hi Patrick,

On 10 April 2010 17:44:06 UTC+1, Patrick Logan patrickdlo...@gmail.com wrote:
> Ah, never mind. I think I found the answer... Literal Key. Perhaps the other patterns should mention this and include Literal Key in the Related section?

I'll make sure there are some extra cross-references.

The discovery aspects are interesting here, as ideally you want to look resources up based on a known identifier property that stores the Literal Key. OWL 2 has some support for defining keys, and I ought to reference this from the pattern.

There also needs to be some discussion around using dc:identifier versus sub-properties of it. The former can make it easier to discover whether there is any resource with X as an identifier, while the latter can carry more semantics. An intermediate position is to use dc:identifier with a Custom Datatype. SKOS encourages the datatype approach: skos:notation always has to have a datatype associated with it.

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com
Re: Announce: Linked Data Patterns book
Wonderful. Any PDF version available?

  pa

On 06/04/2010 16:10, Leigh Dodds wrote:
> Hi folks,
>
> Ian Davis and I have been working on a catalogue of Linked Data patterns which we've put on-line as a free book. The work is licensed under a Creative Commons attribution license.
>
> This is still a very early draft but already contains 30 patterns covering identifiers, modelling, publishing and consuming Linked Data.
>
> http://patterns.dataincubator.org
>
> More background at [1]. We'd be interested to hear your comments, and hope that it can become a useful resource for the growing community of practitioners.
>
> Cheers,
>
> L.
>
> [1]. http://www.ldodds.com/blog/2010/04/linked-data-patterns-a-free-book-for-practitioners/
XMP RDF extractors?
On Tue, Apr 13, 2010 at 3:56 PM, Leigh Dodds leigh.do...@talis.com wrote:
> Hi,
>
> Yes.
>
> PDF: http://patterns.dataincubator.org/book/linked-data-patterns.pdf
> EPUB: http://patterns.dataincubator.org/book/linked-data-patterns.epub

Something of a tangent, but this reminds me: what's the latest on RDF extractors for Adobe XMP? I always used to use 'strings' and a regex, but I haven't tracked the spec and have found this trick working *less* well over time, not better.

strings linked-data-patterns.pdf | grep -i xmp

<?xpacket id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
    <xmp:CreateDate>2010-04-12T23:01:36+01:00</xmp:CreateDate>
  </rdf:Description>
</x:xmpmeta>
<?xpacket end="r"?>

By contrast, downloading the .epub file and unzipping it, you find this in content.opf:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" id="bookid">_id2880071</dc:identifier>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Linked Data Patterns</dc:title>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Dodds, Leigh">Leigh Dodds</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Davis, Ian">Ian Davis</dc:creator>
    <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This book lives at http://patterns.dataincubator.org. Check that website for the latest version. This work is licenced under the Creative Commons Attribution 2.0 UK: England &amp; Wales License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/2.0/uk/. Thanks to members of the Linked Data mailing list for their feedback and input, and Sean Hannan for contributing some CSS to style the online book.</dc:description>
    <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
  </metadata>
  <manifest>
    <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="htmltoc" media-type="application/xhtml+xml" href="bk01-toc.html"/>
    <item id="id2880071" href="index.html" media-type="application/xhtml+xml"/>

Wouldn't it be nice if there were easy conventions for books about RDF to have Webby linked RDF bundled in the files? Both seem nearly there but not quite... (this is not a complaint Leigh, I love this work btw!)

cheers,

Dan

ps. re epub see also http://lists.w3.org/Archives/Public/public-lod/2010Jan/0121.html
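The strings-and-regex trick Dan describes can be sketched as a scan of the file's raw bytes for the `<?xpacket ... ?>` envelope that delimits an XMP packet. This is an assumption-laden simplification: real PDFs may store the metadata stream compressed, which is presumably why the trick works less well over time.

```python
import re

# Non-greedy match between the xpacket begin and end processing instructions.
XMP_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
                    re.DOTALL)

def find_xmp(data):
    # Return the decoded packet body, or None if no packet is found.
    m = XMP_RE.search(data)
    return m.group(1).decode("utf-8", "replace") if m else None

def extract_xmp(path):
    # Thin file wrapper; path is illustrative.
    with open(path, "rb") as f:
        return find_xmp(f.read())
```

For files where the packet is stored uncompressed, `extract_xmp("linked-data-patterns.pdf")` would return the RDF/XML fragment shown above; for compressed metadata streams it returns None.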
Re: What would you build with a web of data? Decision support
Hi Georgi,

First let me underline that the following is not detached theory; it is very practical. The web of data can support the clinician in his cycle of decision:

(a) The clinician makes measurements (in the broadest sense; speaking with the patient and looking at a picture are also measurements).
(b) The clinician focuses on those measurement results which are interesting for his therapeutic decisions (feature extraction).
(c) The clinician compares these measurement results with experience. For this he may use rules or models which are derived from common experience.
(d) The clinician decides on a therapy, and measures the effect of his decision, i.e. the cycle starts again with (a).

Good and large experience is very important for step (c). The cycle of decision (measurements - feature extraction - comparison with experience - decision) is also effective outside medicine: before every conscious decision we *compare* decision-relevant data with experience (or a model which is derived from common experience). Experience says that in *similar* situations possibility X yields better results than other possibilities, so we decide for possibility X. Even if we try to decide as well as possible, our decisions are suboptimal due to limited experience.

The web of data can be designed in a way that it collects experiences (also decision-relevant measurements of machines) in a precise and *comparable* way (much more precise and better comparable than text). So the web of data can summarize experiences in a well-defined, comparable way for decision support. For this a clear similarity relation is necessary. The natural way to achieve this is a vectorial description of resources, i.e. quantification of the resource's properties, regarding the result (a sequence of numbers) as a vector.

After defining an appropriate metric (distance function) we can calculate the similarity of vectors by calculating the distance between them: the smaller the distance, the more similar the vectors and (in case of good quantification) the original resources. Using HTTP URIs allows all domain name owners to define these vectors and optimized distance functions. Therefore I suggest introducing standardized Vectorial Resource Descriptors (VRDs) on the Web, and it seems the best possibility is to integrate these into Linked Data.

The paper http://www.orthuber.com/wp1.pdf describes the details. It is not completely up to date, and though the basic content of the VRDs (and Vector Space Descriptors - VSDs) is clear, I have not been sure about the syntax of the RDF examples (currently chapters 2.2.2 and 2.2.3), and I would like to adapt the syntax to suggestions from the community. So comments and suggestions are very welcome!

Best,
Wolfgang

Georgi Kobilarov schrieb:
> Yesterday I issued a challenge on my blog for ideas for concrete linked open data applications, because talking about concrete apps helps shape the roadmap for the technical questions ahead for the linked data community. The real questions, not the theoretical ones...
>
> Richard MacManus of ReadWriteWeb picked up the challenge:
> http://www.readwriteweb.com/archives/web_of_data_what_would_you_build.php
>
> Let's be creative about stuff we'd build with the web of data. Assume the Linked Data Web were already there; what would you build?
>
> Cheers,
> Georgi
>
> --
> Georgi Kobilarov
> Uberblic Labs Berlin
> http://blog.georgikobilarov.com
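Wolfgang's distance-based similarity can be sketched concretely: describe each resource as a vector, and take the candidate at the smallest distance from the query vector as the most similar. The metric here is plain Euclidean distance and the case names and numbers are invented for illustration; a real VRD would declare which vector space and distance function to use.

```python
import math

def euclidean(u, v):
    # Standard Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar(query, candidates):
    # The candidate with the smallest distance is the most similar.
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

# Hypothetical quantified case descriptions.
cases = {"case-a": [1.0, 0.0, 2.0], "case-b": [0.9, 0.1, 2.1]}
print(most_similar([1.0, 0.0, 2.0], cases))
# → case-a
```

The interesting design question the proposal raises is not this computation but who publishes the vectors and the optimized metrics; with HTTP URIs, each domain owner could.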
Re: CoIN: Composition of Identifier Names
A quick question...

2010/4/13 Niklas Lindström lindstr...@gmail.com:
> I've found it very valuable to formally declare the pieces from which a URI is to be composed. Especially in our environment, where we have a central design of the URIs but decentralized publishing of data (which is of a somewhat rich and varied nature).

How does this mesh with URIs being opaque?

If the URIs were actually opaque and treated as such, then formally declaring the parts would be a non-issue. It seems that this ideal is being increasingly watered down or ignored... is that intentional, and is it a good or bad thing?

Thoughts?

Rob Sanderson
Re: CoIN: Composition of Identifier Names
Here are my 2¢ about the opacity of resources.

First, let me point out that, contrary to what is often believed/claimed (and I plead guilty of having done so), URI opacity is *not* a constraint of the REST architectural style, at least as defined by Fielding in his thesis [1].

Then, AFAIK, the main reference for URI opacity is [2]. The axiom states that "you should not look at the contents of the URI string to gain other information". If you read what follows, you see that "you" mainly means "your software". From this, I personally draw two conclusions:

1/ URI opacity is a desirable feature of software handling URIs, not of URIs themselves. A hacker trying to get familiar with a source of URIs/linked data should, on the other hand, be able to easily understand what is going on... This is a good property, and does not contradict the opacity axiom as long as that hacker does not make his/her software *rely* on such an implicit understanding.

2/ Given a URI, a piece of software should not try to reverse-engineer it. However, the axiom does not prevent software from being given a *rule* to *produce* new URIs. As a matter of fact, I would be surprised if TBL discouraged this very mechanism, which underlies all HTML-based forms (at least those using the GET method). A form is nothing else than the specification of a *whole set* of URIs, plus the technical tool to produce them easily in your browser.

  pa

[1] http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
[2] http://www.w3.org/DesignIssues/Axioms.html#opaque

On 13/04/2010 17:11, Robert Sanderson wrote:
> How does this mesh with URIs being opaque? If the URIs were actually opaque and treated as such, then formally declaring the parts would be a non-issue. It seems that this ideal is being increasingly watered down or ignored... is that intentional, and is it a good or bad thing?
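Pierre-Antoine's form analogy can be made concrete: a GET form is exactly a server-provided rule for producing a whole set of URIs. Filling in the fields and serializing them as a query string is URI construction sanctioned by the server. The form action and field names below are made up for illustration.

```python
from urllib.parse import urlencode

def submit_get_form(action, fields):
    # What a browser does on GET submission: action + "?" + encoded fields.
    return action + "?" + urlencode(fields)

print(submit_get_form("http://example.org/search",
                      {"q": "linked data", "page": "2"}))
# → http://example.org/search?q=linked+data&page=2
```

The client never reverse-engineers an existing URI here; it only applies a production rule the server handed it, which is precisely the distinction the opacity axiom draws.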
Re: XMP RDF extractors?
On Tue, Apr 13, 2010 at 6:31 PM, Pierre-Antoine Champin swlists-040...@champin.net wrote:
> Even more tangent, but when I read the XMP spec in detail last year (in relation to the Media Annotation WG), I came to two conclusions:
>
> - XMP specifies RDF at the level of the XML serialization, which is *ugly* (emphasis on *ugly*). Furthermore, it makes it unsafe to use standard RDF/XML serializers, as those may not enforce those syntactic constraints.
>
> - XMP interprets RDF/XML in a non-standard way, considering the two following tags as non-equivalent:
>
>   <ns1:bar xmlns:ns1="http://example.com/foo">...
>   <ns2:foobar xmlns:ns2="http://example.com/">...
>
> (which is, again, a syntax-only perspective). So it is not safe to use standard RDF/XML parsers, as they will produce a model which may be inconsistent with other XMP parsers.
>
> So you can neither use standard serializers nor standard parsers to handle XMP's RDF safely. As far as I'm concerned, XMP is not really RDF -- and Dan's problems extracting it strengthen this opinion of mine... That being said, the risks of inconsistency are minimal, especially for parsing. So I guess there is some value in pretending XMP is RDF ;) and using an RDF parser to extract it...

I think we can and should be generous to Adobe here; they have been supportive of RDF since the late '90s - e.g. Walter Chang's work on UML and RDF, http://www.w3.org/TR/NOTE-rdf-uml/ - and committing to something that is embedded within files that will mostly *never* be re-generated (PDFs, JPEGs etc. in the wild) makes for naturally conservative design. There are probably many kinds of improvement they could make, but being backwards-compatible with the large bulk of deployed XMP must be a major concern. Pushing out revisions to tools on the scale of Photoshop etc. isn't easy, especially when the new stuff will also have to read/write properly in older deployed tools for unknown years to come.

That said, I think we would do well to look around more actively at what's out there via XMP, and see how it hangs together when re-aggregated into a common SPARQL environment. In particular, XMP pre-dates SKOS, and I imagine many of the environments where XMP matters would benefit from the kinds of integration SKOS can bring. So I'd love to see some exploration of that...

cheers,

Dan
Re: CoIN: Composition of Identifier Names
Hi Robert,

On 13 Apr 2010, at 17:11, Robert Sanderson wrote:
> How does this mesh with URIs being opaque?

The "URI opacity" axiom does not say that URIs should be opaque. It says that clients should *treat them* as opaque. This is because URI owners should have full authority over what their URIs identify and resolve to, and if clients make assumptions about what a URI will resolve to, then they are contesting this authority.

URI opacity in no way precludes URI owners from telling the world about the structure of their URI space. In some sense, CoIN is no different from an HTML form that has @method="GET" -- it specifies a mapping from some data (in the one case RDF resource descriptions, in the other key-value pairs of form fields and form values) to URIs. It is true that link following should be preferred over URI construction, but this is not always possible, as shown by the example of, say, HTML search forms.

(Examples of violations of URI opacity: the /favicon.ico convention -- suddenly, server operators don't "own" that URI any more, because browsers will try to fetch a web site icon from there, no matter what the server operator wants that URI to denote. Or assuming that all URIs that end in .png must be rendered as image files -- the publisher might have a web page at that URI, and the assumption conflicts with that.)

Best,
Richard
Re: CoIN: Composition of Identifier Names
On 13 Apr 2010, at 18:04, Pierre-Antoine Champin wrote:
> 2/ Given a URI, a software should not try to reverse-engineer it. However, the axiom does not prevent that a software be given a *rule* to *produce* new URIs. As a matter of fact, I would be surprised that TBL would discourage this very mechanism which underlies all HTML-based forms (at least those using the GET method). A form is nothing else than the specification of a *whole set* of URIs, plus the technical tool to produce them easily in your browser.

Didn't read this before writing my own response -- well said!

Cheers,
Richard
RE: [semanticweb] ANN: DBpedia 3.5 released
From: semantic-web-requ...@w3.org [mailto:semantic-web-requ...@w3.org] On Behalf Of ba...@goldmail.de
Sent: Tuesday, April 13, 2010 1:13 PM
To: dbpedia-discuss...@lists.sourceforge.net; dbpedia-announceme...@lists.sourceforge.net; Chris Bizer
Cc: public-lod@w3.org; 'SW-forum'; semantic...@yahoogroups.com
Subject: Re: [semanticweb] ANN: DBpedia 3.5 released

> A fact of my experience since many years: The homepage of my grandma is better accessible than the flagship(!) of 'linked data' dbpedia.org...

Let's do the test! In Firefox, using best guesses:

http://dbpedia.org
Works!

http://barans-grandma.org
Does not work!

How many years of experience do I need to be able to access your grandma's homepage?

Michael

--
Dipl.-Inform. Michael Schneider
Research Scientist, Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: michael.schnei...@fzi.de
WWW  : http://www.fzi.de/michael.schneider
===
FZI Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
Tel.: +49-721-9654-0, Fax: +49-721-9654-959
Stiftung des bürgerlichen Rechts, Az 14-0563.1, RP Karlsruhe
Vorstand: Prof. Dr.-Ing. Rüdiger Dillmann, Dipl. Wi.-Ing. Michael Flor, Prof. Dr. Dr. h.c. Wolffried Stucky, Prof. Dr. Rudi Studer
Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
===
Re: Hungarian National Library published its entire OPAC and Digital Library as Linked Data
Sorry, I had a small typo in that OAI-ORE example I included: I meant to type ore:aggregates instead of ore-aggregates. I also meant to include assertions about the format of the files, which can be handy to have.

//Ed

<http://oszkdk.oszk.hu/resource/DRJ/404>
    dc:creator <http://nektar.oszk.hu/resource/auth/33589>, "Jókai Mór,, (1825-1904.)" ;
    dc:date "cop. 2006" ;
    dc:description "Működési követelmények: Adobe Reader / MS Reader", "Főcím a címképernyőről", "Szöveg (pdf : 1.2 MB) (lit : 546 KB)" ;
    dc:identifier "963-606-169-6 (pdf)", "963-606-170-X (lit)" ;
    dc:language "hun" ;
    dc:publisher "Szentendre : Mercator Stúdió" ;
    dc:subject <http://nektar.oszk.hu/resource/auth/magyar_irodalom>, "magyar irodalom." ;
    dc:title "Dekameron" ;
    dc:type "book", "elbeszélés.", "elektronikus dokumentum.", "no type provided" ;
    ore:aggregates <http://oszkdk.oszk.hu/storage/00/00/04/04/dd/1/dekameron.pdf>, <http://oszkdk.oszk.hu/storage/00/00/04/04/dd/2/dekameron.lit> .

<http://oszkdk.oszk.hu/storage/00/00/04/04/dd/1/dekameron.pdf> dc:format "application/pdf" .
<http://oszkdk.oszk.hu/storage/00/00/04/04/dd/2/dekameron.lit> dc:format "application/x-ms-reader" .
Re: CoIN: Composition of Identifier Names
Hi all,

Here's the juice on URI opacity, right from Roy [1]. The important bits:

> Opacity of URI only applies to clients and, even then, only to those parts of the URI that are not defined by relevant standards. Origin servers, for example, have the choice of interpreting a URI as being opaque or as a structure that defines how the server maps the URI to a representation of the resource.
>
> Cool URIs will often make a transition from being originally interpreted as structure by the server and then later treated as an opaque string (perhaps because the server implementation has changed and the owner wants the old URI to persist). The server can make that transition because clients are required to act like they are ignorant of the server-private structure.
>
> Clients are allowed to treat a URI as being structured if that structure is defined by standard (e.g., scheme and authority in http) or if the server tells the client how its URI is structured. For example, both GET-based FORM actions and server-side image map processing compose the URI from a server-provided base and a user-supplied suffix constructed according to an algorithm defined by a standard media type.

Ivan

[1] http://tech.groups.yahoo.com/group/rest-discuss/message/5369
Re: XMP RDF extractors?
Hi,

On Tuesday, April 13, 2010, Dan Brickley dan...@danbri.org wrote:
> ...snip!...
> Wouldn't it be nice if there were easy conventions for books about RDF to have Webby linked RDF bundled in the files? Both seem nearly there but not quite... (this is not a complaint Leigh, I love this work btw!)

Thanks for the feedback, glad you like it. The epub and PDF formats are just generated with the docbook-xsl stylesheets. I'm really pleased that there's any machine-readable data in there at all! It's on my TODO list to get some RDFa into the HTML output.

I guess if folk wanted to improve the quality of metadata for ebooks and PDFs, then exploring how to enhance docbook conversions would be a good start.

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com
Re: [semanticweb] ANN: DBpedia 3.5 released
> > A fact of my experience since many years: The homepage of my grandma is better accessible than the flagship(!) of 'linked data' dbpedia.org...
>
> Let's do the test! In Firefox, using best guesses:
>
> http://dbpedia.org
> Works!
>
> http://barans-grandma.org
> Does not work!
>
> How many years of experience do I need to be able to access your grandma's homepage?
>
> Dipl.-Inform. Michael Schneider
> Research Scientist, Information Process Engineering (IPE)
> FZI Forschungszentrum Informatik an der Universität Karlsruhe

Someone who has used the endpoint dbpedia.org/sparql intensively knows what I mean: after one or two hours or so, it hangs. I try dbpedia.org with Firefox, Opera, IE; it hangs also. After 5 minutes I try dbpedia.org again and I see the page; for dbpedia.org/sparql I put my simple query again, and it is OK. For years it has been the same story in the same rhythm.

But if someone clicks dbpedia.org only once, then he of course also has the time to write such nonsense as you did above, from 'FZI Forschungszentrum Informatik an der Universität Karlsruhe'.

And if I send a mail when the server doesn't work properly, I may get a reply from Chris Bizer suggesting 'maintenance work on the DBpedia server'... I think there has been so much maintenance work there that even a simple click on dbpedia.org was hanging too often, compared to the homepage of my grandma...

baran.

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Re: [Dbpedia-discussion] ANN: DBpedia 3.5 released
Dear DBpedia workers,

First of all, many thanks for this new release :)

Then, I have a quick question regarding the difference between the DBpedia 3.4 raw infobox data set and the DBpedia 3.5 raw infobox data set:

- http://downloads.dbpedia.org/3.5/en/infobox_properties_en.nt.bz2
- http://downloads.dbpedia.org/3.4/en/infobox_en.nt.bz2

Comparing the two, it appears that the DBpedia 3.5 infobox data set (4.7G) is actually much smaller than the DBpedia 3.4 infobox data set (5.7G). Do you know why the trend is not size increase, but size reduction? Did you change anything in the way that raw infobox data sets are extracted?

Cheers,
Nicolas.

--
Nicolas Torzec
Yahoo! Labs.

On 4/12/10 2:06 AM, Chris Bizer ch...@bizer.de wrote:
> Hi all,
>
> we are happy to announce the release of DBpedia 3.5. The new release is based on Wikipedia dumps dating from March 2010.
>
> Compared to the 3.4 release, we were able to increase the quality of the DBpedia knowledge base by employing a new data extraction framework which applies various data-cleansing heuristics, as well as by extending the infobox-to-ontology mappings that guide the data extraction process.
>
> The new DBpedia knowledge base describes more than 3.4 million things, out of which 1.47 million are classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species and 4,600 diseases. The DBpedia data set features labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets; 565,000 Wikipedia categories; and 75,000 YAGO categories. The DBpedia knowledge base altogether consists of over 1 billion pieces of information (RDF triples), out of which 257 million were extracted from the English edition of Wikipedia and 766 million were extracted from other language editions.
>
> The new release provides the following improvements and changes compared to the DBpedia 3.4 release:
>
> 1. The DBpedia extraction framework has been completely rewritten in Scala. The new framework dramatically reduces the extraction time of a single Wikipedia article from over 200 to about 13 milliseconds. All features of the previous PHP framework have been ported. In addition, the new framework can extract data from Wikipedia tables based on table-to-ontology mappings and is able to extract multiple infoboxes out of a single Wikipedia article. The data from each infobox is represented as a separate RDF resource. All resources that are extracted from a single page can be connected using custom RDF properties which are also defined in the mappings. A lot of work also went into the value parsers, and the DBpedia 3.5 dataset should therefore be much cleaner than its predecessors. In addition, units of measurement are normalized to their respective SI unit, which makes querying DBpedia easier.
>
> 2. The mapping language that is used to map Wikipedia infoboxes to the DBpedia Ontology has been redesigned. The documentation of the new mapping language is found at http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/
>
> 3. In order to enable the DBpedia user community to extend and refine the infobox-to-ontology mappings, the mappings can be edited on the newly created wiki hosted at http://mappings.dbpedia.org. At the moment, 303 template mappings are defined, which cover (including redirects) 1055 templates. On the wiki, the DBpedia Ontology can be edited by the community as well. At the moment, the ontology consists of 259 classes and about 1,200 properties.
>
> 4. The ontology properties extracted from infoboxes are now split into two data sets:
>    1. The Ontology Infobox Properties dataset contains the properties as they are defined in the ontology (e.g. length). The range of a property is either an XSD schema type or a dimension of measurement, in which case the value is normalized to the respective SI unit.
>    2. The Ontology Infobox Properties (Specific) dataset contains properties which have been specialized for a specific class using a specific unit, e.g. the property height is specialized on the class Person using the unit centimetres instead of metres.
>    For further details please refer to http://wiki.dbpedia.org/Datasets#h18-11.
>
> 5. The framework now resolves template redirects, making it possible to cover all redirects to an infobox on Wikipedia with a single mapping.
>
> 6. Three new extractors have been implemented:
>    1. PageIdExtractor, extracting the Wikipedia page ID of each page.
>    2. RevisionExtractor, extracting the latest revision of a page.
>    3. PNDExtractor, extracting PND (Personennamendatei) identifiers.
>
> 7. The data set now provides labels, abstracts, page links and infobox data in 92 different languages, which have been extracted from
Re: CoIN: Composition of Identifier Names
2010/4/13 Richard Cyganiak rich...@cyganiak.de: I think that URI Templates [3] might be a handy companion syntax for CoIN and I wonder if they could be integrated into CoIN. I'm thinking more about the general curly-brace-syntax rather than the fancy details. So perhaps you could start with something like http://example.org/publ/{publisher}/{document} http://example.org/publ/{publisher}/{document}/rev/{date} http://example.org/profiles/{name}

I second the idea of exploring the use of URI Templates for documenting how to construct a URL from other data. I'm not sure if it's part of the latest URI Templates draft [1], but OpenSearch allows parameter names to be defined with namespaces [2]. For example:

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/">
  <Url type="application/vnd.google-earth.kml+xml"
       template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;bbox={geo:box?}&amp;format=kml"/>
</OpenSearchDescription>

Note the use of the geo namespace in the geo:box parameter name. So you could imagine a URL template that referenced names from an RDF vocabulary:

<Url type="application/rdf+xml"
     template="http://example.com/user/{foaf:mbox_sha1sum}" />

OpenSearch was an incubator for the ideas that led to the URI Templates draft, and is built into many modern web browsers (IE, Firefox, Chrome).

//Ed

[1] http://tools.ietf.org/html/draft-gregorio-uritemplate-04
[2] http://www.opensearch.org/Specifications/OpenSearch/1.1#Parameter_names
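To make the curly-brace idea concrete, here is a minimal sketch of expanding such a template, where parameter names are CURIEs referring to RDF properties and a trailing "?" marks a parameter as optional (as in the OpenSearch example above). This is an illustrative toy, not part of OpenSearch, CoIN, or the URI Templates draft; the function name and the property dict are assumptions.

```python
import re

def expand(template, properties):
    """Replace each {name} or {name?} in a curly-brace URI template.

    'properties' maps namespaced parameter names (e.g. a CURIE such as
    'foaf:mbox_sha1sum') to string values. Optional parameters (trailing
    '?') expand to the empty string when absent; required ones raise.
    """
    def substitute(match):
        name = match.group(1)
        optional = name.endswith("?")
        if optional:
            name = name[:-1]
        if name in properties:
            return properties[name]
        if optional:
            return ""
        raise KeyError("missing required parameter: %s" % name)

    return re.sub(r"\{([^{}]+)\}", substitute, template)

# A template naming an RDF property, as in the second example above:
uri = expand("http://example.com/user/{foaf:mbox_sha1sum}",
             {"foaf:mbox_sha1sum": "6e80d02d"})
# -> http://example.com/user/6e80d02d
```

The same expand() call run in reverse (matching a known URI against the template) is essentially the validation direction that CoIN cares about.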
Re: Natural Keys and Patterned URIs
On 13 April 2010 22:00, Leigh Dodds leigh.do...@talis.com wrote: Hi Patrick, On 10 April 2010 17:44:06 UTC+1, Patrick Logan patrickdlo...@gmail.com wrote: Ah, never mind. I think I found the answer... Literal Key. Perhaps the other patterns should mention this and include Literal Key in the Related section? I'll make sure there are some extra cross-references. The discovery aspects are interesting here as ideally you want to look them up based on a known identifier property that stores the Literal Key. OWL 2 has some support for defining keys and I ought to reference this from the pattern. There also needs to be some discussion around using dc:identifier or sub-properties. The former can be easier to discover ("is there any resource with X as an identifier?"), while the latter can carry more semantics. An intermediary position is to use dc:identifier with a Custom Datatype. SKOS encourages the latter via skos:notation, which always has to have a datatype associated with it.

skos:notation with a custom datatype is just as hard to find as dc:identifier with a custom datatype, or merely a custom predicate with a simple plain literal string. Either way, you have to know exactly which URIs people are using for the datatype or predicate, so the discovery or semantics of each scheme are exactly the same. There is no discovery advantage in my opinion to using custom datatypes where predicates are equally suitable, as there is no ability to match ?uri skos:notation "123.23" . if the data is actually ?uri skos:notation "123.23"^^mydatatype . without treating all of the objects as plain string literals. There doesn't seem to be any advantage to using the datatype if people have to go through ?uri skos:notation ?object . FILTER(str(?object) = "123.23") to get there, and then they could have overlaps with other schemes anyway. If you are looking for a predicate that is defined as a key, then you could still have overlaps between schemes, as you are not recognising the predicate explicitly.
In all of the methods, one needs to know they are looking for an identifier, and know what scheme the identifier is defined in, to get exact access; and in all of the cases one may have overlaps if they only know they are looking for an identifier without knowing which scheme it is defined in, so there is no semantic difference. It all comes down to accessibility, I think. If you want the scheme to be more accessible to people who know the identifier but not the scheme, then plain string literals with a custom predicate are more useful. If you want the scheme to be more accessible to people who know the scheme but want to discover the identifier, then the standard predicates, i.e. dc:identifier or skos:notation (w/ custom datatype), are more useful. If the patterns document wants to portray the advantages of different methods, rather than just giving best practices, then the advantages of both methods could be explained.

Cheers,
Peter
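The trade-off Peter describes can be sketched outside of SPARQL as well. Below is a small, purely illustrative Python model (not any RDF library's API): literals are (lexical form, datatype) pairs, an exact match requires knowing the datatype URI in advance, and a str()-style comparison ignores the datatype and therefore cannot distinguish two identifier schemes that share a lexical form. All predicate, datatype, and subject names here are made up for the example.

```python
# Toy model of RDF literals as (lexical form, datatype) pairs;
# a plain literal has datatype None.
triples = [
    ("urn:doc:1", "skos:notation", ("123.23", "ex:mySchemeDatatype")),
    ("urn:doc:2", "skos:notation", ("123.23", "ex:otherSchemeDatatype")),
    ("urn:doc:3", "dc:identifier", ("123.23", None)),  # plain literal
]

def exact_match(pred, literal):
    """Like matching ?uri pred "lex"^^dt - you must know the datatype."""
    return [s for s, p, o in triples if p == pred and o == literal]

def str_match(pred, lex):
    """Like FILTER(str(?o) = "lex") - ignores the datatype, so distinct
    identifier schemes with the same lexical form collide."""
    return [s for s, p, o in triples if p == pred and o[0] == lex]

# Knowing only the lexical form, a plain-literal lookup misses typed data:
exact_match("skos:notation", ("123.23", None))   # finds nothing
# The str()-based filter finds both, but cannot tell the schemes apart:
str_match("skos:notation", "123.23")             # finds urn:doc:1 and urn:doc:2
```

The point of the sketch is that neither lookup style gives scheme-safe discovery for free: the exact match needs out-of-band knowledge of the datatype URI, and the string filter reintroduces the cross-scheme overlap.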