Re: How "valid" is to use a term marked as "unstable" for a data publisher/consumer?
+Cc: Leigh Dodds, for old time's sake

On 20 July 2016 at 09:45, Ghislain Atemezing wrote:
> Hi all,
>
> [ Apologies if this question has been answered before in this group. ]
>
> Recently, I was working on a project where we were just reusing existing
> terms for building a knowledge base for a private company. When we were
> considering using for example foaf:birthday, I was told by someone that it
> was marked “unstable” in the vocabulary file. The normal reaction would have
> been “so what?” ;)
>
> However, I found the question somehow interesting in the sense that the vocab
> defining the term status of the vocabulary [1] uses “unstable” for all the
> properties in the vocabulary, and of course it is reused by many
> vocabularies [2]. Meanwhile, FOAF is one of the most popular
> vocabularies used in the LOD cloud (stats for 2014 here [3]) and I guess
> there is much data modeled with some of the terms flagged as “unstable”. I
> found an example dataset here for the Nobel Prize [4].
>
> Is there any risk for data publishers or consumers (e.g., visual
> applications) in reusing “safely” terms flagged as “unstable”?
> Do you know of any study on this type of question?
>
> Any experience or thought is more than welcome, to help propose a more
> rational answer to my project partner.

On some level this is my fault :)

The vocabulary at [1] bubbled out of FOAF collaborations many years ago, where we were keen to explore more fine-grained mechanisms for term evolution than the previously dominant notion that versioning happened at the vocabulary/namespace level. We had seen efforts like Dublin Core get stuck because of a sense that changing any term's documentation necessitated a revision to the schema's version number (DC 1.0 -> DC 1.1), and I had also been responsible for somewhat naive language in the 1998/1999 working drafts of the initial RDFS spec which encouraged the notion that any changes to a schema should require a new URL.
See http://lists.foaf-project.org/pipermail/foaf-dev/2003-July/005462.html for the initial design discussions in the FOAF project, ~2003.

The reason that the vocab status vocabulary is itself marked as unstable is that we hoped to refine it in the light of experience, and in particular to consider using URLs instead of well-known strings, to better support i18n/l10n and SKOS-style refinement. We did make a sketch of a sketch of a W3C Note on this at https://www.w3.org/2003/06/sw-vocab-status/note but didn't complete the work. There may also be things we can reflect from the schema.org experience, as well as mechanisms in OWL and SKOS, that ought to be incorporated. On the schema.org side, for example, we recently added a "pending" area of the vocabulary (see http://pending.schema.org/) where drafts are shared; this is roughly like "unstable", but the word "pending" is slightly less intimidating to potential users.

The main point of marking a term 'unstable' is that if the term maintainer does change it in the light of experience, they have an excuse and can say "hey, don't blame us, we said there was some chance we might change the definitions in light of experience". Beyond that, I doubt there is much that can be formally encoded, and potential users are probably best advised to read actual human-oriented text and discussions to understand any remaining open issues. For example, http://pending.schema.org/ClaimReview describes the status ('pending') of the schema.org term ClaimReview. Probably the most important thing that page does is point to the corresponding issue tracker entry at https://github.com/schemaorg/schemaorg/issues/1061 where you can read anything that is known in that vocabulary community about the maturity (or otherwise) of the relevant term.
So if I were revisiting the vocabulary status vocabulary in 2016, my advice would be that it should be re-oriented towards discovery of such human-oriented documentation, rather than trying to over-formalize codes like 'unstable' vs 'testing' whose nuanced meaning will naturally vary by context and project. If you dig around http://lists.foaf-project.org/pipermail/foaf-dev/2003-July/005462.html you'll see that was pretty much what we had in mind originally...

cheers,

Dan

> Best,
> Ghislain
>
> [1] http://www.w3.org/2003/06/sw-vocab-status/ns#
> [2] http://lov.okfn.org/dataset/lov/vocabs/vs
> [3] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
> [4] http://data.nobelprize.org/
> ---
> Ghislain A. Atemezing, Ph.D
> Mail: ghislain.atemez...@gmail.com
> Web: https://w3id.org/people/gatemezing
> Twitter: @gatemezing
> About Me: https://about.me/ghislain.atemezing
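For concreteness, the status annotations under discussion look roughly like this in Turtle (a sketch; vs: is the term-status vocabulary at [1], and FOAF really does annotate foaf:birthday this way, as the original question notes):

```turtle
@prefix vs:   <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Each term carries a plain-string maturity flag; the vocabulary
# suggests the values "unstable", "testing" and "stable".
foaf:birthday a rdf:Property ;
    vs:term_status "unstable" .

# A consumer can read the flag, but as argued above the real signal
# is in the human-oriented documentation the term links to.
```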
new W3C CSV on the Web specs, now at Candidate Recommendation stage - please implement!
Hi!

Short version: Please see http://www.w3.org/blog/news/archives/4830 for the Candidate Recommendation specs from W3C's CSV on the Web group - https://www.w3.org/2013/csvw/wiki/Main_Page

Long version: These are the four docs:

* Model for Tabular Data and Metadata on the Web - an abstract model for tabular data, and how to locate metadata that enables users to better understand what the data holds; this specification also contains non-normative guidance on how to parse CSV files
  http://www.w3.org/TR/2015/CR-tabular-data-model-20150716/

* Metadata Vocabulary for Tabular Data - a JSON-based format for expressing metadata about tabular data to inform validation, conversion, display and data entry for tabular data
  http://www.w3.org/TR/2015/CR-tabular-metadata-20150716/

* Generating JSON from Tabular Data on the Web - how to convert tabular data into JSON
  http://www.w3.org/TR/2015/CR-csv2json-20150716/

* Generating RDF from Tabular Data on the Web - how to convert tabular data into RDF
  http://www.w3.org/TR/2015/CR-csv2rdf-20150716/

See the blog post for more links, including an extensive set of test cases, our GitHub repo and the mailing list for feedback.

Also note that the approach takes CSV as its central stereotypical use case, but should apply to many other tabular data-sharing approaches too (most obviously tab-separated). So if you prefer tab-separated files to comma-separated, do please take a look! The Model spec defines that common model, the metadata document defines terminology for talking about instances of that model, and the last two specs apply this approach to the problem of mapping tables into JSON and/or RDF.

The group expects to satisfy the implementation goals (i.e., at least two independent implementations for each of the test cases) by October 30, 2015. Please take a look, and pass this along to other groups who may be interested.

cheers,

Dan, for the CSVW WG

p.s.
since I'm writing, I'll indulge myself and share my personal favourite part, which is the ability (in the csv2rdf doc) to map from rows in a table, via templates, into RDF triples. This is a particularly interesting/important facility and worth some attention. Normally I wouldn't enthuse over (yet another) new RDF syntax, but the ability to map tabular data into triples via out-of-band mappings is very powerful. BTW the group gave some serious consideration to applying R2RML here (see docs and github/wiki for details); however, given the subtle differences between SQL and CSV environments, we have taken a different approach. Anyway, please take a look!
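To give a flavour of that templating facility, here is a minimal metadata sketch (the file name, column names and example.org URIs are made up; aboutUrl, propertyUrl and the JSON-LD @context are from the Metadata Vocabulary spec):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "aboutUrl": "http://example.org/country/{code}",
    "columns": [
      { "name": "code", "titles": "code" },
      { "name": "name", "titles": "name",
        "propertyUrl": "http://schema.org/name" }
    ]
  }
}
```

Each row then yields triples whose subject URI is minted from that row's "code" cell via the URI template, with the "name" cell becoming a schema:name value - out-of-band mapping from table to graph, with no RDF in the CSV itself.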
Spec review request: CSV on the Web
The CSV on the Web Working Group [1] has just published a new set of Working Drafts, which we consider feature-complete and implementable. We particularly seek reviews from Web Security, Privacy, Internationalization and Accessibility perspectives at this time. A request has also been sent to the TAG [7]. We request review now rather than later since we are following W3C's revised Process, in which there is no distinct Last Call; we prefer to invite reviews now rather than wait for a formal Candidate Recommendation.

The drafts are:

* Model for Tabular Data and Metadata on the Web [2] - an abstract model for tabular data, and how to locate metadata that enables users to better understand what the data holds; this specification also contains non-normative guidance on how to parse CSV files.

* Metadata Vocabulary for Tabular Data [3] - a JSON-based format for expressing metadata about tabular data to inform validation, conversion, display and data entry for tabular data.

* Generating JSON from Tabular Data on the Web [4] - how to convert tabular data into JSON.

* Generating RDF from Tabular Data on the Web [5] - how to convert tabular data into RDF.

We are keen to get comments on these specifications, either as issues on our GitHub repository [6] or by posting to public-csv-wg-comme...@w3.org. We would also like to invite people to start implementing these specifications and to donate their test cases to our test suite. Building this test suite, as well as responding to comments, will be our focus over the next couple of months.

Dan

[1] http://www.w3.org/2013/csvw/wiki/Main_Page
[2] http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/
[3] http://www.w3.org/TR/2015/WD-tabular-metadata-20150416/
[4] http://www.w3.org/TR/2015/WD-csv2json-20150416/
[5] http://www.w3.org/TR/2015/WD-csv2rdf-20150416/
[6] https://github.com/w3c/csvw/issues
[7] https://lists.w3.org/Archives/Public/www-tag/2015Apr/0028.html
Re: How to avoid that collections break relationships
On 25 March 2014 15:52, Markus Lanthaler markus.lantha...@gmx.net wrote:
> please let's not talk about hash URLs etc. here, ok? So, please. Let's try
> to focus on the problem at hand.

As an online discussion grows longer, the probability of a comparison involving http-range-14 or URNs approaches 1.

Dan
Re: How to avoid that collections break relationships
On 26 March 2014 04:26, Pat Hayes pha...@ihmc.us wrote:
> On Mar 25, 2014, at 11:29 AM, Markus Lanthaler markus.lantha...@gmx.net wrote:
>> On Tuesday, March 25, 2014 5:00 PM, Pat Hayes wrote:
>>> Seems to me that the, um, mistake that is made here is to use the same
>>> property schema:knows for both the individual case and the list case.
>>
>> Exactly... it is especially problematic if rdfs:range is involved.
>>
>>> Why not invent a new property for the list case, say :knowsList, and add
>>> a relationship between them as an RDF triple:
>>>
>>>   :knowsList :listPropertyOf schema:knows .
>>>
>>> where :listPropertyOf has the semantic condition that
>>>
>>>   aaa :listPropertyOf bbb .
>>>   xxx aaa ddd .
>>>   ddd schema:itemListElement yyy .
>>>
>>> imply
>>>
>>>   xxx bbb yyy .
>>
>> Yeah, that's very similar to an idea I had (but it wasn't so elegant). The
>> issue is that you won't discover :knowsList if you look for schema:knows
>> unless you infer the xxx bbb yyy triples. In other words, if you don't
>> know :knowsList and thus ignore it, you would neither find the collection
>> nor the schema:knows relationships.
>
> Hmm. I would be inclined to violate IRI opacity at this point and have a
> convention that says that any schema.org property schema:ppp can have a
> sister property called schema:pppList, for any character string ppp. So you
> ought to check schema:knowsList when you are asked to look for
> schema:knows. Then although there isn't a link in the conventional sense,
> there is a computable route from schema:knows to schema:knowsList, which as
> far as I am concerned amounts to a link.

In fact something very close to this was considered for the roles proposal I circulated yesterday, i.e. http://lists.w3.org/Archives/Public/public-vocabs/2014Mar/0111.html

The idea was to define a URI template pattern, e.g. http://schema.org/role/{propertyname}, so that '/actor' would be shadowed by '/role/actor', and the latter used when describing a situation involving 3 entities (movie, role, person) rather than a binary relationship between movie and person.
In this case, so far, we decided against introducing the complexity, but similar designs might prove appropriate for related problems.

Dan
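A sketch of the pattern under discussion in Turtle (:knowsList, :listPropertyOf and the ex: entities are hypothetical, taken from the thread; schema:ItemList and schema:itemListElement are real schema.org terms):

```turtle
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

# Hypothetical sister property for the list case, linked to the
# binary property it shadows:
ex:knowsList ex:listPropertyOf schema:knows .

# Alice knows three people, stated once via an ItemList:
ex:alice ex:knowsList ex:friends .
ex:friends a schema:ItemList ;
    schema:itemListElement ex:bob, ex:carol, ex:dave .

# Under Pat's proposed semantic condition, these triples entail:
#   ex:alice schema:knows ex:bob, ex:carol, ex:dave .
```

The discovery problem Markus raises is visible here: a consumer querying only for schema:knows finds nothing unless it either performs that inference or knows the knows -> knowsList naming convention.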
Re: Schema.org v1.0e published: Order schema, Accessibility properties
On 4 December 2013 23:07, Aaron Bradley aaran...@gmail.com wrote:
> Swell stuff!
>
> Are there plans to bring the previously published draft specification on
> the Google Developer site [1] in line with this new specification on
> schema.org?
>
> Properties only on schema.org:
> confirmationNumber
> discount / discountCode / discountCurrency
> isGift
> orderedItem
> paymentDue / PaymentMethod / PaymentMethodID / paymentUrl
>
> Properties only on Google Developer:
> price / priceCurrency / priceSpecification
>
> As well - and probably the most noticeable difference - the Google version
> uses the property seller instead of the schema.org property merchant.
>
> Because the earlier version exists on Google Developer I know this is
> chiefly a Google-esque issue, but insofar as there's now a published
> version of schema.org/Order *on* schema.org, it would obviously be mutually
> advantageous if the seller/merchant property nomenclature was normalized -
> perhaps it's in the works.

As a first step we'll get a link from the Google work-in-progress docs to the final finished thing. I can't say for sure how long until various Google products understand the new vocabulary, although Google's Structured Data Testing Tool should at least already not complain when it sees new (v1.0d, v1.0e) terms. Work in progress!

Dan

[1] https://developers.google.com/gmail/actions/reference/order#specification

On Wed, Dec 4, 2013 at 8:57 AM, Kingsley Idehen kide...@openlinksw.com wrote:
> On 12/4/13 10:54 AM, Pierre-Yves Vandenbussche wrote:
>> Hi all, you can find more information on what has changed since the last
>> version here: http://lov.okfn.org/dataset/lov/dif/dif_schema_1.0d-1.0e.html
>> The Schema.org entry on LOV is updated as well (versions file and
>> difference can be found on the timeline):
>> http://lov.okfn.org/dataset/lov/details/vocabulary_schema.html
>> Regards, Pierre-Yves.
>
> Awesome on both fronts re. schema.org version 1.0e and LOV's cool delta
> page!
>
> Kingsley
On Wed, Dec 4, 2013 at 3:30 PM, Dan Brickley dan...@danbri.org wrote:
> Schema.org version 1.0e has been published. This release includes a schema
> for describing Orders, see http://schema.org/Order as well as the
> Accessibility properties for http://schema.org/CreativeWork pre-announced
> recently, http://lists.w3.org/Archives/Public/public-vocabs/2013Nov/0190.html
> (blog post on its way). It also fixes a small bug with
> http://schema.org/validFrom (1.0d made the text overly focussed on Civic
> Actions; we revert the expected type back to DateTime). As always, a
> machine-readable RDFa dump of the entire schema is available at
> http://schema.org/docs/schema_org_rdfa.html and bugfixes, discussion etc.
> are welcomed here. Many thanks to everyone who was involved!
>
> Dan (trying to get in first with an announcement for a change ;)

--
Regards,
Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: List membership - more women
On 24 June 2013 10:34, Isabelle Augenstein i.augenst...@sheffield.ac.uk wrote:
> Hi Dominic,
>
> I only joined the list a few months ago, so my observations might be
> inaccurate, but
> - Overall, most discussions on the list seem to be rather philosophical
> (What is Linked Data? Does Linked Data require RDF?), which are not the
> kind of discussions I was hoping for when I joined the list in the first
> place

Quite. A lot of the initial enthusiasm about Linked Data was associated with a despair some felt about the Semantic Web slogan, which had got itself associated with overly-academic, complex-KR-obsessed and other unworldly concerns. I suspect this sort of churn is a natural part of the lifecycle of standards work; some are starting to feel about public-lod the same way.

> - My guess would be that the ratio between subscribers and people posting
> on the list is rather low in general, in addition to few women being
> subscribed to the list (But I bet we can get some statistics for that?)

There are just over 1000 subscribers to the list (no gender figures available for those). You can see from http://lists.w3.org/Archives/Public/public-lod/2013Jun/author.html who the most vocal participants are.

Dan
Re: The Great Public Linked Data Use Case Register for Non-Technical End User Applications
On 24 June 2013 14:31, Kingsley Idehen kide...@openlinksw.com wrote:
> On 6/24/13 2:14 AM, Michael Brunnbauer wrote:
>> Hello Kingsley Idehen,
>> On Sun, Jun 23, 2013 at 05:32:00PM -0400, Kingsley Idehen wrote:
>
> We don't need a central repository of anything. Linked Data is supposed to
> be about enhancing serendipitous discovery of relevant things.

You appear to be arguing against the simple, useful practice of communally collecting information. Just because we can scatter information around the Web and subsequently aggregate it doesn't mean that such fragmentation is always productive. I don't see anyone arguing that the only option is to monolithically centralise everything forever; just that a communal effort on cataloguing things might be worth the time.

> Google already demonstrates some of this, in the most obvious sense via its
> search engine, and not so obviously via its crawling of Linked Data which
> then makes its way into the Google Knowledge Graph and G+ etc..

- http://en.wikipedia.org/wiki/Citation_needed

You've sometimes said that all Web pages are already Linked Data with boring link-types. Are you talking about something more RDFish in this case?

Dan
Are Topic Maps Linked Data?
Just wondering, Dan
Re: The Great Public Linked Data Use Case Register for Non-Technical End User Applications
On 23 June 2013 23:46, Kingsley Idehen kide...@openlinksw.com wrote:
> On 6/23/13 5:36 PM, Barry Norton wrote:
>> Are you confusing Linked Data and Linked Open Data?
>
> Of course not! Web-like structured data enhanced with explicit entity
> relationship semantics enables serendipitous discovery at the public or
> private level. "Open" has nothing to do with "Public". "Open" is about
> standards and the interoperability they accord.

What part of http://www.w3.org/wiki/index.php?title=SweoIG/TaskForces/CommunityProjects/LinkingOpenData&oldid=35551 am I misunderstanding? The early LOD collaborations had a clear emphasis on "open" in the sense of freely available data. I can see merit in broadening that, but to say it "has nothing to do with" seems at odds with how a lot of people appeared to be understanding the initiative.

Dan

Interlinking Open Data on the Semantic Web
Chris Bizer, Richard Cyganiak

*1. Please provide a brief description of your proposed project.*

The Open Data Movement (http://en.wikipedia.org/wiki/Open_Data) aims at making data freely available to everyone. There are already various interesting open data sources available on the Web. Examples include Wikipedia (http://www.wikipedia.org/), Wikibooks, Geonames (http://www.geonames.org/), MusicBrainz (http://musicbrainz.org/), WordNet (http://wordnet.princeton.edu/online/), the DBLP bibliography (http://www.informatik.uni-trier.de/~ley/db/) and many more which are published under Creative Commons (http://creativecommons.org/) or Talis (http://www.talis.com/tdn/tcl) licenses.

The goal of the proposed project is to make various open data sources available on the Web as RDF and to set RDF links between data items from different data sources. There are already some data publishing efforts. Examples include the dbpedia.org project (http://dbpedia.org/docs/), the Geonames Ontology (http://www.geonames.org/ontology/) and a D2R Server publishing the DBLP bibliography (http://www4.wiwiss.fu-berlin.de/dblp/).
There are also initial efforts to interlink these data sources. For instance, the dbpedia RDF descriptions of cities include owl:sameAs links to the Geonames data about the city (1) (http://dbpedia.org/docs/#link). Another example is the RDF Book Mashup (http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/), which links book authors to paper authors within the DBLP bibliography (2) (http://lists.w3.org/Archives/Public/semantic-web/2006Dec/0022).

*2. Why did you select this particular project?*

For demonstrating the value of the Semantic Web it is essential to have more real-world data online. RDF is also the obvious technology to interlink open data from various sources.

*3. Why do you think this project will have a wide impact?*

A huge inter-linked data set would be beneficial for various Semantic Web development areas, including Semantic Web browsers and other user interfaces, Semantic Web crawlers, RDF repositories and reasoning engines. Having a variety of useful data online would encourage people to link to it and could help bootstrap the Semantic Web as a whole.

Dan
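The interlinking style described above looks like this in Turtle (a sketch; the particular city and Geonames identifier are illustrative, though DBpedia and Geonames do publish URIs of this shape):

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# Assert that DBpedia's and Geonames' identifiers for a city
# name the same real-world thing:
dbpedia:Berlin owl:sameAs <http://sws.geonames.org/2950159/> .
```

A single triple like this is what lets a crawler or browser hop from one dataset's description of the city to another's.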
Re: Linked Data discussions require better communication
On 20 June 2013 18:54, Giovanni Tummarello giovanni.tummare...@deri.org wrote:
> My 2c is .. i agree with kingsley diagram, linked data should be possible
> without RDF (no matter serialization) :) however this is different from
> previous definitions i think its a step forward.. but it is different from
> previously. Do we want to call it Linked Data 2.0? under this definition
> also schema.org marked up pages would be linked data .. and i agree plenty
> with this .

Schema.org pages are already RDF and, imho, Linked Data, as was FOAF even when (shock horror!) the graph contains bNodes. Nothing in TimBL's original note _forces_ you to always use URIs for every node in the graph. It does advocate strongly for lots of URIs and for machine-friendly data available from using them.

To be clear, Schema.org is based on RDF. We just choose our moments for when to emphasize this, and when to focus on other practicalities. I'd draw an analogy with Unicode. It's there in the background and helps tie things together, even if you don't always need to be emphasizing it when talking about things that use it.

Dan
Re: Monitoring subscribers on the list
On 18 June 2013 15:43, Barry Norton barry.nor...@ontotext.com wrote:
> Does anyone know if the number of subscribers on the list can be monitored?
>
> I have a limited degree of monitoring, for the EUCLID project, through the
> RSS feed and Web scraping, but I'm struggling to measure:
>
> 1) what fraction of subscribers the vocal minority of posters are;
> 2) how unsubscriptions correlate with the length of current threads.

I have access to a list admin tool that gives me the current count. I don't believe time-series data is easily available. The list has 1063 subscribers/survivors currently. Semantic-Web@ has 1344; the defunct www-rdf-logic has 433. I've no idea how many bounce (I believe some bouncing can cause auto-unsubscription).

You can approximate frequent-poster stats manually from e.g. http://lists.w3.org/Archives/Public/public-lod/2013Jun/author.html ... I'm not aware of a machine-friendly version.

I don't unsubscribe from lists any more, I just pipe them into folders. I guess others do the same?

Dan
Re: CFP: Data In Web Search (DISH) Workshop - 13th May 2013, Rio de Janeiro, Brazil
Just to let you know, the Workshop papers deadline is extended until March 4th 2013. Please don't ask me what time of day on March 4th!

--Dan

On 9 January 2013 18:22, Dan Brickley dan...@danbri.org wrote:

[I don't often crosspost to 3 W3C lists, but I think this will be an important event and hope to see some of you there... --Dan]

CFP: Data In Web Search (DISH) Workshop - 13th May 2013, Rio de Janeiro, Brazil

Workshop: http://dish2013.foaf-project.org/
Conference: http://www2013.org/

This WWW2013 Workshop focuses on new approaches to using structured data for improving Web search. Most Web documents and queries are about entities and the relationships between them, i.e., structured data with documented semantics. However, popular search engines have historically ignored structured data, instead relying on techniques that model documents and queries as bags of words. Recent developments, most notably the dramatic increase in the use of structured data markup on web pages, have led to substantial interest from mainstream search engines. However, we are still in the very early stages of the evolution of how search engines use this structured data. Most of the current work is focussed on searching databases of facts about entities and presenting them either alongside the search results, or on annotating search results with additional data. The core problems of utilizing knowledge about entities for improving the ranking of documents, helping set the user context, etc. are still largely unexplored territory.

While the use of structured data is still limited in Web search engines, active research in this direction can be observed in many communities. Most notably, there is a broad range of solutions proposed by IR, database, and Semantic Web researchers for exploiting structured data for various search tasks.
The goal of this Workshop is to bring these communities together to focus on the central question of how to make these solutions applicable to Web search engines. The central theme of the workshop is to explore new and novel ways of exploiting explicit representations of entities, and the relationships between them, to improve Web search.

Important Dates

Workshop proceedings will be published through the ACM Digital Library, with associated tight production deadlines:

* February 23rd 2013: Workshop paper deadline
* March 13th 2013: Workshop paper notifications
* April 2nd 2013: Workshop paper final copy
* WWW2013 Conference: May 13-17th 2013, Rio de Janeiro, Brazil
* Workshop day: May 13th 2013

Topics

Three main directions of semantic search have emerged. The first is the use of structured data to augment traditional web search results and the search results page. The second is to use the structured data to directly deliver results to search requests and to answer questions. The third is to use knowledge about a domain to affect the ranking of results. This workshop targets all these directions of semantic document retrieval and semantic data retrieval, but puts special emphasis on the web search context.

Possible topics for submission include, but are not limited to:

* Structured data for Web document retrieval
* Entity/relation aware document and query models
* Entity/relation aware matching and ranking
* Use of structured data for building vertical search engines
* Web data retrieval
* Searching structured data with textual queries
* Novel applications of structured data to augment search results
* Evaluation methodologies

Submissions

Workshop papers should be submitted by Feb 23rd 2013 using EasyChair (we are 'dish2013' there), see https://www.easychair.org/conferences/?conf=dish2013

The organizers can be contacted at dish-workshop-organiz...@googlegroups.com in case of technical issues with the submission process.
Due to the tight schedule, please don't ask for extensions!

Organization

We invite posters and papers of max. 6 pages presenting new ideas for how structured data can be used in search, preferably with working demos. Accepted papers will be published as part of the International Conference Proceedings Series (ICPS) of the ACM Digital Library. We plan to accept 8-12 papers and organize a full-day event, roughly half devoted to each of the two approaches. Each session will have significant time set aside for discussion. There will also be a poster session. Attendance will be open to the public (via WWW2013 Workshops registration, http://www2013.org/registration/). We plan to have one or two invited talks.

Advisory Board

* Krisztian Balog, NTNU, Norway
* Charlie Jiang, Bing, USA
* Steve Macbeth, Bing, USA
* Pavel Serdyukov, Yandex, Russia
* Alexander Shubin, Yandex, Russia
* Arjen P. de Vries, Delft University of Technology, Holland

Program Committee (in progress - awaiting confirmations)

* Vineet Gupta, Google, USA
* Alon Halevy, Google, USA
Linked Data RDFa
With RDFa maturing (RDFa 1.1, particularly Lite), I wanted to ask here about attitudes to RDFa. I have somehow acquired the impression that in the Linked Data scene, people lean more towards the classic 'a doc for the humans, another for the machines' partitioning model. Perhaps this is just a consequence of history; digging around some old rdfweb/foaf discussions [1], I realise just how far we've come. RDFa wasn't an option for a long time; but it is now.

So - questions. How much of the linked data cloud is expressed in some variant of HTML+RDFa, alongside RDF/XML, Turtle etc.? When/if you do so, are you holding some data back and keeping it only in the machine-oriented dumps, or including it in the RDFa? Are you finding it hard to generate RDFa from triple datasets because it's 'supposed' to be intermingled with human text? What identifiers (if any) are you assigning to real-world entities? Dataset maintainers... as you look to the future, is RDFa in your planning? Did/does Microdata confuse the picture?

I'm curious where we are with this...

Dan

[1] http://lists.foaf-project.org/pipermail/foaf-dev/2000-September/004222.html
http://web.archive.org/web/20011123075822/http://rdfwebring.org/2000/09/rdfweblog/example.html
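For anyone unfamiliar with the single-document alternative being asked about, it looks roughly like this with RDFa Lite (a sketch; vocab/typeof/property/resource are the real RDFa Lite 1.1 attributes, the person and URIs are made up):

```html
<!-- One page serves both audiences: humans read the prose,
     machines extract FOAF triples from the attributes. -->
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person"
     resource="http://example.org/people/alice#me">
  <p>Hi, I'm <span property="name">Alice</span>.
     My homepage is
     <a property="homepage" href="http://example.org/alice/">here</a>.</p>
</div>
```

The question above is essentially whether publishers prefer this intermingled style over serving a human HTML page and a separate Turtle/RDF/XML document for the same entity.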
Re: Breaking news: GoodRelations now fully integrated with schema.org!
On 8 November 2012 22:43, Guha g...@google.com wrote:
> Thank you Martin for the great collaboration. Look forward to more. And on
> our side, it was really Dan Brickley who did the work. Thank you Dan.

Well, in fact it was Cenk Gazen who did the hard and interesting work on the schema.org side (and Martin, of course, for the epic editorial work around GR). But a few words on the site-internal RDFa system now in passing, as it is also progress in its own right:

This latest build of schema.org uses a different approach from previous updates. Earlier versions (apart from health/medicine) were relatively small, and could be hand-coded. With GoodRelations, the approach we took was to use an import system that reads schema definitions expressed in HTML+RDFa/RDFS and generates the site as an aggregation of these 'layers'. In other words, schema.org is built by a system that reads a collection of schema definitions expressed using W3C standards. The public site is also now more standards-friendly, aiming for 'polyglot' HTML that works as both HTML5 and XHTML, and you can find an RDFa view of the overall schema at http://schema.org/docs/schema_org_rdfa.html

I'm really happy to see GoodRelations go live, and look forward to catching up on the other contributions that are in the queue. The approach will be to express each of these in HTML/RDFa/RDFS and make some test sites on Appspot that show each proposal 'in place', and in combination with other proposals. Since schemas tend to overlap in coverage, this is really important for improving the quality and integration of schema.org as we grow. While it took us a little while to get this mechanism in place, I'm glad we now have this standards-based machinery that will help us scale up the collaboration around schema.org.

Thanks again to all involved,

Dan
Re: ANN: WebDataCommons.org - Offering 3.2 billion quads of current RDFa, Microdata and Microformat data extracted from 65.4 million websites
On 17 April 2012 18:56, Peter Mika pm...@yahoo-inc.com wrote:
> Hi Martin,
>
> It's not as simple as that, because PageRank is a probabilistic algorithm
> (it includes random jumps between pages), and I wouldn't expect that
> wayfair.com would include 2M links on a single page (that would be one very
> long webpage). But again, to reiterate the point, search engines would want
> to make sure that they index the main page more than they would want to
> index the detail pages.
>
> You can do a site query to get a rough estimate of the ranking without a
> query string: search.yahoo.com/search?p=site%3Awayfair.com
> You will see that most of the pages are category pages. If you go to the
> 2nd page and onward you will see an estimate of 1900 pages indexed.
>
> Of course, I agree with you that a search engine focused on structured
> data, especially if domain-specific, might want to reach all the pages and
> index all the data. I'm just saying that current search engines don't, and
> CommonCrawl is mostly trying to approximate them (if I understand correctly
> what they are trying to do).

According to http://commoncrawl.org/faq/

"What do you intend to do with the crawled content?

Our mission is to democratize access to web information by producing and maintaining an open repository of web crawl data that is universally accessible. We store the crawl data on Amazon's S3 service, allowing it to be bulk downloaded as well as directly accessed for map-reduce processing in EC2."

No mention of search as such. I'd imagine they're open to suggestions, and that the project (and crawl) could take various paths as it evolves (with corresponding influence on the stats...). Our problem here is in figuring out what can be taken from such stats to help guide linked data vocabulary creation and management. Maybe others will do deeper, focussed crawls, who knows? But it's great to see this focus on stats lately; I hope others have more to share.

Dan
Re: ANN: WebDataCommons.org - Offering 3.2 billion quads current RDFa, Microdata and Microformat data extracted from 65.4 million websites
How about adding a disclaimer line to the webdatacommons.org site like: "Note that many database-backed sites contain a huge long tail of rarely-visited, rarely-linked pages (e.g. product catalogues) which increasingly contain useful structured data. It is best not to assume that this collection contains a complete, deep crawl of every site it touches." Dan
Re: See Other
On 28 March 2012 14:24, David Wood da...@3roundstones.com wrote: Hi Dan, On Mar 27, 2012, at 21:30, Dan Brickley wrote: On 27 March 2012 20:23, Melvin Carvalho melvincarva...@gmail.com wrote: I'm curious as to why this is difficult to explain. Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important.

Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look... [data stick changes hands]

Alice: Cool! And... yup, it's well-formed XML, and here, see, I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok, so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human-facing pages as well as machine-oriented RDF ones too.

Bob and Alice can stop at this point, throw the RDF/XML at Callimachus, write some templates in XHTML/RDFa and be done. They would get themeable human-oriented HTML, conneg for RDF/XML and Turtle, one URI per term, REST API CRUD, management with user accounts...

Ok, ... up for a simple challenge then? In http://schema.org/JobPosting we say that a job posting (likely expressed in HTML + microdata, or for that matter HTML + RDFa) can have an occupationalCategory property, whose values are drawn from an existing scheme: "Category or categories describing the job. Use BLS O*NET-SOC taxonomy: http://www.onetcenter.org/taxonomy.html. Ideally includes textual label and formal code, with the property repeated for each applicable value."
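For concreteness, here is one way such a posting might carry repeated occupationalCategory values in microdata, sketched as a small generator. The O*NET-SOC codes and labels below are illustrative examples; a real publisher should check the current taxonomy listing:

```python
# Sketch: emitting a schema.org/JobPosting occupationalCategory in microdata,
# with the property repeated once per applicable value, as the schema.org
# description suggests. Codes/labels here are illustrative examples.
def occupational_category(code, label):
    return ('<span itemprop="occupationalCategory">'
            f'{code} {label}</span>')

categories = [("15-1131.00", "Computer Programmers"),
              ("15-1132.00", "Software Developers, Applications")]

html = ['<div itemscope itemtype="http://schema.org/JobPosting">']
html += ['  ' + occupational_category(c, l) for c, l in categories]
html.append('</div>')
snippet = "\n".join(html)
```

Note this answers only the easy half of the challenge: it embeds "textual label and formal code" as a string, but makes no real hyperlink into the taxonomy site, which is exactly the gap discussed below.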
If you dig around on that link you can find PDF and XLS files at http://www.onetcenter.org/reports/Taxonomy2010.html So let's take http://www.onetcenter.org/dl_files/Taxonomy2010_AppA.xls ... it shows a table with pairs of codes and labels, and a kind of implied hierarchy. Say we wanted those in linked data (SKOS, most likely), ... how should the pages and URIs look? Can we do something better than point to .xls and .pdf files? What advice would we give the administrators of that site for publishing (annual versions of...) their job taxonomy codes? How would/could/should an actual job listing on a jobs site look? Would it have a real hyperlink into the taxonomy site? Or just a textual property? What kind of standard templates can be offered to make such things less choice-filled? How would we do the same with, say, country codes? cheers, Dan
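A minimal sketch of turning such (code, label) rows into SKOS Turtle. The example.org URI pattern is hypothetical, since choosing the real one is exactly the open question posed above:

```python
# Sketch: (code, label) rows from the taxonomy spreadsheet -> SKOS Turtle.
# The URI pattern is hypothetical (NOT onetcenter.org policy); the two
# sample rows are illustrative.
def concept_uri(code):
    return f"<http://example.org/onet-soc/2010/{code}>"

def to_skos(rows):
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for code, label in rows:
        lines.append(f"{concept_uri(code)} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{label}"@en ;')
        lines.append(f'    skos:notation "{code}" .')
    return "\n".join(lines)

rows = [("11-0000", "Management Occupations"),
        ("11-1011.00", "Chief Executives")]
turtle = to_skos(rows)
```

The harder decisions (hash vs. slash URIs, recovering the implied hierarchy as skos:broader, annual versioning) are exactly what the questions above leave open.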
Re: {Disarmed} Re: See Other
On 28 March 2012 14:28, Hugh Glaser h...@ecs.soton.ac.uk wrote: I can't find any apps (other than mine) that actually use this. Searching: Sindice: http://sindice.com/search?q=http://graph.facebook.com 40 (forty) results. Bing: http://www.bing.com/search?q=%22http://graph.facebook.com/%22 8400 results. I don't think this activity has actually set the world alight yet - people are quite excited from what you call the Structured Data point of view, but little or no Linked Data. And it has been around for a little while now. And my (unproven) hypothesis is that Sindice would be finding these links all over the place if Facebook had been encouraged to do it differently. I'm not knocking it - you are right - it is really great they have done it. But I think we could have helped them do it better.

I doubt the issue is 'help'. A structured data description of a network of hundreds of millions of people, ... but without the links, ... is kinda missing something. At which point we're deep in privacy and OAuth etc. territory; it wouldn't be proper, appropriate or polite to dump the social graph fully public anyhow. But a social network dataset without the network isn't going to set the world afire with excitement. Even with FOAF, where we got pretty substantial social graph datasets (LiveJournal, My Opera etc.) in public since 2004 or so, ... frankly very few managed to find interesting uses of that huge bulk of data. And not because it was in RDF/XML or because there were bnodes. It's much much harder to make compelling, useful apps with this stuff than it is to make proof-of-concept demos. Dan
See Other
On 27 March 2012 20:23, Melvin Carvalho melvincarva...@gmail.com wrote: I'm curious as to why this is difficult to explain. Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important.

Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look... [data stick changes hands]

Alice: Cool! And... yup, it's well-formed XML, and here, see, I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok, so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human-facing pages as well as machine-oriented RDF ones too.

Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa, right? Which version?

Alice: Well, RDFa yes; microdata is a kind of cousin, a mix of thinking from the RDFa and microformats communities. But I meant that you'd make a version of each page for computers to use (RDF/XML like your test export here), ... and you'd make some kind of HTML page for more human readers also. The stuff you mention is more about doing both within the same format...

Bob: Great. Which one's the most standard? What should I use?

Alice: Well, I guess it depends what you mean by standard.
[skips digression about whatwg and w3c etc. notions of standards process] [skips digression about XHTML vs XML-ish polyglot HTML vs resolutely non-XML HTML5 flavours] [skips digression about qnames in HTML and RDFa 1.1 versus 1.0] ...you might care to look at using a basic HTML5 document with, say, the Lite version of RDFa 1.1 (which is pretty much finished but not an official stable standard yet at W3C).

Bob: [makes a note]. Ok, but that's just the human-facing page, anyway. We'd put up RDF/XML for machines too, right? Well, maybe that's not necessary, I guess. I was reading something about GRDDL and XSLT that automates the conversion, ... should we maybe generate the RDF/XML from the HTML+RDFa or vice versa? Or just have some PHP hack generate both from MySQL, since that's where the stuff ultimately lives right now anyway...?

Alice: Um, well, it's pretty much your choice. Do you need RDF/XML too? Well... maybe, not sure... it depends. There are more RDF/XML parsers around, they're more mature, ... but increasingly tools will consume all kinds of data as RDF. So it might not matter. Depends why you're doing this, really.

Bob: Er, ok, maybe we ought to do both for now, ... belt-and-braces, ... maybe watch the stats and see what's being picked up? I'm doing this because of the promise of interestingly unexpected re-use and so on, which makes details hard to predict by definition.

Alice: Sounds like a plan. Ok, so each node in your RDF graph, ... we'll need to give it a URI. You know, that's like the new word for URL, but one that includes identifiers for real-world things too.

Bob: Sure sure, I read that. Makes sense. And I can have a URI, my homepage can have a URI, I'm not my home page, blah-de-blah?

Alice: You got it.

Bob: Ok, so what URLs should I give the concepts in this thesaurus? They've got all kinds of strings attached, but we've also got nicely managed numeric IDs too.

Alice: Right, so maybe something short (URIs can never be too short...), ...
so maybe if you host at your example.org server, http://example.org/demothes/c1 then same but /c2, /c3, etc. ... or, well, you could use #c1 or #c2 etc. That's pretty much up to you. There are pros and cons in both directions.

Bob: Whatever's easiest. It's a pretty plain apache2 setup, with PHP if we want it, or we can batch-create files if that makes more sense; this data doesn't change much.

Alice: Well, how big is the thesaurus...?

Bob: A couple thousand terms, each with a few relations and bits of text; maybe more if we dig out the translations (hmm, should we language-negotiate those somehow?)

Alice: Let's talk about that another day, maybe?

Bob: And hmm, the translations are versioned a bit differently? Should we put version numbers in somewhere so it's unambiguous which version of the translation we're using?

Alice: Let's talk about that another day, too.

Bob: OK, where were we? http://example.org/demothes/c1 ... sure, that sounds fine. ... we'd put some content-negotiated Apache thing there, and make c1 send HTML if there's a browser, or RDF/XML if they want that stuff instead? Default to the browser / HTML version maybe?

Alice: Something like that could work. There are some howtos around. Oh, but if c1 isn't an information resource, you'll need to redirect with a 303 HTTP code. It's like you said with people and homepages, to make clear which is which.

Bob:
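The arrangement Alice and Bob are converging on can be sketched as a toy decision function. This is not real Apache config; the /demothes/c1 paths follow Bob's hypothetical scheme, and the "extension picks the format" convention is just one possible design:

```python
# Toy sketch of the conneg + 303 pattern from the dialogue (illustrative,
# not a drop-in server config): a concept URI like /demothes/c1 names a
# thing, not a document, so it 303-redirects to a document URI; the
# document URIs then serve HTML or RDF/XML.
def respond(path, accept):
    if path.startswith("/demothes/c") and not path.endswith(".html") \
            and not path.endswith(".rdf"):
        # c1 isn't an information resource: 303 See Other to a document,
        # choosing the variant from the Accept header (default: HTML).
        ext = ".rdf" if "application/rdf+xml" in accept else ".html"
        return 303, {"Location": path + ext}
    ctype = "application/rdf+xml" if path.endswith(".rdf") else "text/html"
    return 200, {"Content-Type": ctype}

respond("/demothes/c1", "application/rdf+xml")
# a 303 redirect pointing at the RDF/XML document for concept c1
```

Had Bob chosen the #c1 hash-URI style instead, no redirect machinery would be needed at all, which is one of the pros-and-cons Alice alludes to.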
Re: Annotating IR of any relevance? (httpRange-14)
On 26 March 2012 08:51, Giovanni Tummarello giovanni.tummare...@deri.org wrote: Is annotating IRs of *any practical value and role today*? Anything of value and core interest to wikipedia, imdb, bestbuy, bbc, geonames, rottentomatoes, lastfm, facebook, whatever, is a NIR. We are talking people, products. Everything on the LOD cloud (for what it matters) is all NIR. Even pictures, comments, and text are easily seen and BEST INTERPRETED as NIR: they're not just the bytes they're composed of, they're the full record of their creation, the concept of a message. A facebook picture is a full record of content, comments, tags, multiple resolutions etc. The mere stream OF BYTES (the IR) IS JUST A DETAIL that, if it REALLY needs to be annotated, ... it can be, no problem, with proper attributes hasResolution, hasCopyright; ok, I guess that refers to an IR then.

I see where you're coming from here, but will be agnostic for now on that point. Instead, I'd like to draw attention to the distressing fact that we don't even seem as a community to be clear on what is meant by IR. Is IR the mere stream of BYTES, ... or some (slightly) higher abstraction? The OO picture of HTTP/REST I mentioned here recently, for example, has the IR be the hidden-behind-service object whose state we get authoritative samples of via HTTP messages. Making a new http-range-14 agreement without having a common terminology doesn't fill me with hope. Quite different notions of IR are bouncing around here. I tend to read 'IR' as something approximating 'Web-serializable networked entity'; sounds like you're equating it more directly with the content that is sent over the wire? Dan
Re: Annotating IR of any relevance? (httpRange-14)
On 26 March 2012 13:06, Michael Hopwood mich...@editeur.org wrote: Hi Dan, Giovanni, Thank you for this dialogue - I've been following this thread (or trying to!) for some days now and wondering "where is the data model in all this?". At the point where "Quite different notions of IR are bouncing around...", would it not make sense to focus on the fact that there are actually several well-established, intricately worked-out and *open* standard models that overlap at this domain, coming from different ends of the commerciality spectrum, and themselves based on consensus, pre-existing (for example, largely ISO) standards and solid database theory? I'm talking about CIDOC-CRM and Indecs, of course: www.cidoc-crm.org/ http://www.doi.org/topics/indecs/indecs_framework_2000.pdf The fact that these 2 models, apparently quite different in domain, converge on the event-based modelling approach, and both describe information resources and other types of real-world (it's fairly safe to say, all types of) resource in detail but without too much term bloat, would make them strong contenders for a consensus definition - or at the very least, to point towards the shape a consensus should take.

So I've been trying to drag FRBR into this conversation for some years now, http://www.frbr.org/2005/07/05/dan-brickley-and-the-w3c ... but not because it (or Indecs, CRM etc., which also have their charm) is good/better/best, ... rather to assert that different models, and levels of detail, make sense in different contexts. Simple flat records have their place; richer multi-entity structures have their place. If we can avoid the Web architecture itself picking a winner amongst these different ways of thinking about the results of content creation and publication activities, so much the better. The beauty of the Web architecture is its minimalism and pluralism; the challenge here is to bring more clarity to our discussion while preserving that.
But I quite agree that the terminologies from those models may help improve the quality of debate here... cheers, Dan
Re: The Battle for Linked Data
On 26 March 2012 16:49, Hugh Glaser h...@ecs.soton.ac.uk wrote: So What is Linked Data? I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model.

Considerations --- It's important to be open and inclusive. It's important to mention the webby graph data model without getting bogged down in the detail (RDF? which version? which format? OWL too?). It's important to mention standards. Sharing (intranets included!) is more important than 'publishing', or 'public', though the latter should be alluded to. If we stray too far from the graph data model and Web standards like URIs, we lose interop; if we stray too far into nerdy semweb RDF detail, we lose the mainstream audience. It's a balance, and it's for the market, not us, to say where the sweet spot lies. And if we start religiously forcing modeling idioms on the world, we lose credibility; no anti-bnode laws, or strictures about http-range-14. Some things are best left unspoken! Fashions will come and go; look at HTML frames and Flash splash screens. Good taste will triumph, without the Linked Data slogan needing to encode all its aspects.

'Linked Information', from a FOAFy perspective, is then the larger "let's share what we know" perspective (http://www.flickr.com/photos/danbri/4030764915/ etc.) in which we apply equal passion to the sharing of information that is in non-graph data formats, or in people's heads. Doing so brings the graph data model into a distinctively central role, since it can describe other data formats (GML files, spreadsheets, MP3s, videos, mysql dumps... RDF's original use case as metadata), and it can describe people and their characteristics. So we can be pro-RDF here without forcing it down people's throats... and we can be pro-data while admitting that there's vastly more to Web-based information sharing than triples, and more to 'sharing what we know' than sharing data.
cheers, Dan
Re: The Battle for Linked Data
On 26 March 2012 19:16, Dan Brickley dan...@danbri.org wrote: On 26 March 2012 16:49, Hugh Glaser h...@ecs.soton.ac.uk wrote: So What is Linked Data? I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model. Sorry, lost a bit. 'Linked Data' is the use of the Web and its standards to share documents that encode structured data, typically but not necessarily using a graph data model.
Re: The Battle for Linked Data
On 26 March 2012 20:13, Kingsley Idehen kide...@openlinksw.com wrote: On 3/26/12 2:16 PM, Dan Brickley wrote: I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model. TimBL's Linked Data meme isn't about sharing, solely. What about whole data representation and the URI de-reference requirements? Ditto unambiguous URI-based naming etc.

Sure, but we don't need to pack our entire shopping list into one slogan. What's at the heart of it that gives it a distinctive character? The Web, ... Web-like data models (graphs), and the pragmatic use of standards to allow decentralised data to still be recombinable. I guess the problem is that Linked Data is quite generic when taken literally, and likewise in broader computer science discourse. Yup, taking just the words alone, all kinds of things could fit. We have to find the middle ground between overly specific and pointlessly vague. For me, that's something around the creative re-use of the standard Web infrastructure to exchange and interlink simple factual data expressed as graphs. Some might insist they're not just graphs, but RDF graphs. Others that CSV and random XML are fine (not least because they can be RDFized by consumers). But this is the territory we're marching up and down on. Thus, we have to deal with the question of what moniker best applies to the title of TimBL's Linked Data meme [1] and the best practices that it espouses. Maybe we'll end up referring to fine-grained structured data that adheres to said meme as *Hyperdata*. At the end of the day, that's a cleaner moniker anyway :-) That's a good one too, yep. Dan

Links: 1. http://www.w3.org/DesignIssues/LinkedData.html - original Linked Data meme. 2. http://en.wikipedia.org/wiki/Hyperdata - Wikipedia entry exists (it needs some cleaning up though).
-- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web:http://www.openlinksw.com Personal Weblog:http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile:https://plus.google.com/112399767740508618350/about LinkedIn Profile:http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
On 25 March 2012 11:03, Michael Brunnbauer bru...@netestate.de wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. Or do you think some purists have been actually blaming publishers ? [...] http://go-to-hellman.blogspot.co.uk/2009/10/new-york-times-blunders-into-linked.html comes close to doing so... though more around semantics of 'sameas' than IR/NIR. Dan
Re: Change Proposal for HttpRange-14
On 25 March 2012 20:26, Tim Berners-Lee ti...@w3.org wrote: On 2012-03-24, at 00:47, Pat Hayes wrote: I am sympathetic, but... On Mar 23, 2012, at 9:59 AM, Dave Reynolds wrote: The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. And what if the publisher simply does not say anything about what the URI denotes? After all, something like 99.999% of the URIs on the planet lack this information. What, if anything, can be concluded about what they denote? The http-range-14 rule provides an answer to this which seems reasonably intuitive. What would be your answer? Or do you think there should not be any 'default' rule in such cases? Exactly. For example, to take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? Pat

Just to follow up on this specific example with the current actual details: (aside: in my mailer I'm replying to TimBL but all the most recent text seems attributed to Pat; maybe some mangling occurred?)
I can't see a mechanical way to find this, but I happened to know about http://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Project_Gutenberg_Catalog_in_RDF.2FXML_Format ...which guides us to http://www.gutenberg.org/ebooks/2701.rdf and via HTTP 302 from there to

  <p>The document has moved <a href="http://www.gutenberg.org/cache/epub/2701/pg2701.rdf">here</a>.</p>

It uses xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/" and other vocabs to say, amongst other things:

  <pgterms:ebook rdf:about="ebooks/2701">
    <dcterms:creator rdf:resource="2009/agents/9"/>
    <dcterms:description>See also Etext #2489, Etext #15, and a computer-generated audio file, Etext #9147.</dcterms:description>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.epub.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.kindle.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.plucker"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.qioo"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.txt.utf8"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h.zip"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h/2701-h.htm"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.txt"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.zip"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2001-07-01</dcterms:issued>
    <dcterms:language rdf:datatype="http://purl.org/dc/terms/RFC4646">en</dcterms:language>
    <dcterms:license rdf:resource="license"/>
    <dcterms:publisher>Project Gutenberg</dcterms:publisher>
    <dcterms:rights>Public domain in the USA.</dcterms:rights>
    <dcterms:subject>
      <rdf:Description>
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Adventure stories</rdf:value>
        <rdf:value>Ahab, Captain (Fictitious character) -- Fiction</rdf:value>
        <rdf:value>Allegories</rdf:value>
        <rdf:value>Epic literature</rdf:value>
        <rdf:value>Sea stories</rdf:value>
        <rdf:value>Whales -- Fiction</rdf:value>
        <rdf:value>Whaling -- Fiction</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>

  <pgterms:agent rdf:about="2009/agents/9">
    <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1819</pgterms:birthdate>
    <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1891</pgterms:deathdate>
    <pgterms:name>Melville, Herman</pgterms:name>
    <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/Herman_Melville"/>
  </pgterms:agent>

I found this by finding the item number 2701 from inspection of the original link, and plugging it into the metadata template from their human-oriented documentation. The RDF I found makes assertions about various related URLs and things, but nothing that ties directly back to the initial URL. Worse, we've not even any evidence that the RDF doc and the other docs are in the same voice, same publisher, or author etc. Seems a great shame they went to the trouble of publishing quite a rich description of this fine work, and yet it's not easy to find by the machines that could make use of it. Dan
Re: Change Proposal for HttpRange-14
2012/3/23 Melvin Carvalho melvincarva...@gmail.com: 2012/3/23 Giovanni Tummarello giovanni.tummare...@deri.org 2012/3/23 Sergio Fernández sergio.fernan...@fundacionctic.org: Do you really think that basing your proposal on the usage of a Powder annotation is a good idea? Sorry, but IMHO HttpRange-14 is a good enough agreement. yup, performed brilliantly so far, nothing to say. Industry is flocking to adoption, and what a consensus. +1 'Brilliantly' is an understatement :) And we're probably still only towards the beginning of the adoption cycle! I don't think even the wildest optimist could have predicted the success of the current architecture (both pre and post HR14).

Oh dear, so now I don't know any more if Gio was being sarcastic! Linked Data is a brilliant success, despite the burden of http-range-14. Is a SKOS Concept an Information Resource? Must its URIs 303 redirect? Is a # pointing into an RDFa page OK? We don't make this stuff easy. http-range-14 has long been an embarrassment. Just now all the critics get invited to try to do a better job, which isn't as easy as it looks :) Dan
Re: Change Proposal for HttpRange-14
On 23 March 2012 14:33, Pat Hayes pha...@ihmc.us wrote: On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote: I am a bit dismayed that nobody seems to be picking up on the point I've been hammering on (TimBL and others have also pointed it out), that, as shown by the Flickr and Jamendo examples, the real issue is not an IR/NIR type distinction, but rather a distinction in the *manner* in which a URI gets its meaning, via instantiation (of some generic IR) on the one hand, vs. description (of *any* resource, perhaps even an IR) on the other. The whole information-resource-as-type issue is a total red herring, perhaps the most destructive mistake made by the httpRange-14 resolution. +1000. There is no need for anyone to even talk about information resources. The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response.

Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived ('ebooks' will soon just be 'books', just as 'email' became 'mail'), and slipping into general opinion here that's not particularly directed at Pat: I assume you're emphasising the physical notion of book. Perhaps 'person' is even more obviously physical (though heavily tattoo'd people have some commonalities with books). The Web architecture that I first learned was explained to me (HTTP-NG WG era) in terms familiar from the Object Oriented style of thinking about computing (and a minor religion at the time too). The idea is that the Web interface is a kind of encapsulation.
External parties don't get direct access to the insides; it's always mediated by HTTP GET and other requests. Just as in Java you can expose an object's data internals directly, or hide them behind getters and setters, same with Web content. So a Web site might encapsulate a coffee machine, teapot or toaster; a CSV file, SGML repository, perl script or whatever. That pattern allowed the Web to get very big, very fast; you could wrap it around anything. In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on this view described, in which the hidden innards of a Web object are constrained to be 'data': "When we think of the Web today, the idea of a 'resource' comes to mind. In general, a resource is an Object that has some methods (e.g. in HTTP, Get, Head and Post) that can be invoked on it. Objects may be stateful in that they have some sort of opaque 'native data' that influences their behavior. The nature of this native data is unknown to the outside, unless the object explicitly makes it known somehow." (Note, this is from the failed HTTP-NG initiative, not the HTTP/webarch we currently enjoy.) So on this thinking, Dan's homepage is an item of Web content that is encapsulated inside the standard Web interface. It has HTTP-based getters and (potentially) setters, so you can ask for the default bytestream rendering of it, or perhaps content-negotiate with a different getter and get a PDF, or a version in another language. But on this OO style of thinking about Web content, you *never get the thing itself*. Only (possibly lossy, possibly on-the-fly generated) serializations of it. The notion of 'serialization' (also familiar to many coders) doesn't get used much in discussing http-range-14, yet it seems to be very close to our concerns here. Perhaps all the different public serializations of my homepage are so rich that they constitute full (potentially round-trippable) serializations of the secret internal state.
Or perhaps they're all lossy, because enough internals are never actually sent out over the wire. The Web design (as I understand/understood it) means that you'll never 100% know what's on the inside. My homepage might be generated by 1000 typing monkeys; or by pulling zeros and ones from the filesystem, or composed from a bunch of SQL database lookups. It might be generated by different methods from 2010 to 2012; or from minute to minute. All of this is my private webmasterly business: as far as the rest of the world is concerned, it's all the same thing, ... my homepage. I can move the internals from filesystem-based to WordPress to MediaWiki, and from provider to provider. I can choose to serve US IP addresses from a MediaWiki in Boston, and Japanese IP addresses from a customised MoinMoin wiki in Tokyo. Why? That's my business! But it's still my homepage. And you - the outside world - don't get to know how it's made. On that thinking, it might be sometimes useful to have clues as to whether sufficient of the secret internals of some Web page could be fully
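The OO/encapsulation picture above can be caricatured in a few lines; the class and field names here are illustrative only, not any real framework's API:

```python
# Illustrative sketch of the OO view of a Web resource: internal state
# stays private; the outside world only ever sees serializations via get().
class WebResource:
    def __init__(self, internal_state):
        self._state = internal_state        # never exposed directly

    def get(self, accept="text/html"):
        # a possibly lossy, on-the-fly serialization of the hidden state
        if accept == "application/rdf+xml":
            return f"<rdf:RDF><!-- about: {self._state['title']} --></rdf:RDF>"
        return f"<html><h1>{self._state['title']}</h1></html>"

homepage = WebResource({"title": "Dan's homepage", "secret_notes": "..."})
homepage.get()   # an HTML rendering; 'secret_notes' never leaves the object
```

Whether the set of all such get() outputs could ever round-trip the hidden state is exactly the lossy-vs-full-serialization question raised above.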
Re: Change Proposal for HttpRange-14
On 24 March 2012 17:36, Dave Reynolds dave.e.reyno...@gmail.com wrote: However, the data is not always under our complete control and there is no universal agreement on what default fragment to use. Leaving us either having to maintain mapping tables or try multiple probes (when asked for U, try U, then try U#id, then try ...). Not a fatal problem but certainly an inconvenience when managing large and complex stores.

Maybe we can come up with such a string? Something that isn't in current use, yet isn't too ugly? Maybe something that looks nice in UTF8 but obscure in ascii-fied form? I know well-known strings are frowned upon, but ... it's tempting. Are there values that would be legitimate as URI/IRI references, yet impossible to be HTML anchor targets? (and therefore avoid clashes?)

Problem 2: serialization. With a convention of a single standard fragment, prefix notation in Turtle and qname notation in RDF/XML become unusable. You would have to have a separate prefix/namespace for each resource. In Turtle you can just write out all URIs in full, inconvenient but not fatal. In RDF/XML you can do that for subjects/objects but not for properties (and not for classes if you want to use abbreviated syntax). Having to declare a new prefix for every property, and maybe every class, in a large ontology just so it can be serialized is a non-starter.

Good point. I'm mostly concerned with entity identification (people, movies etc.) rather than vocabulary, since the publishers are typically a bit less semweb-engaged. For entities, there's a bit less need to use a prefixing notation afaik. cheers, Dan
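The multiple-probe workaround Dave describes might look like this; "#id" and "#this" are placeholder fragments, since (as he notes) there is no agreed default:

```python
# Sketch of the probing Dave describes: given a URI that may name either a
# document or a thing, generate candidate forms to try in order. The
# "#id" / "#this" fragments are placeholder conventions, not a standard.
def probe_candidates(uri):
    if "#" in uri:
        return [uri]                        # already a fragment URI
    return [uri, uri + "#id", uri + "#this"]

probe_candidates("http://example.com/people/alice")
```

The inconvenience is plain even in the sketch: every lookup against a large store potentially triples into several lookups, which is exactly why a single well-known string is tempting.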
Re: ANN: Sudoc bibliographic and authority data
On 7 July 2011 23:17, Yann NICOLAS nico...@abes.fr wrote: Bonjour, Sudoc [1], the French academic union catalogue maintained by ABES [2], has just been released as linked open data. 10 million bibliographic records are now available as RDF/XML. Examples for the Sudoc record whose internal id is 132133520:

. Resource URI: http://www.sudoc.fr/132133520/id
. Generic document: http://www.sudoc.fr/132133520 (content negotiation is supported)
. RDF/XML page: http://www.sudoc.fr/132133520.rdf
. HTML pages with schema.org microdata [3] for search engines: http://www.sudoc.fr/132133520.html
. The users are not supposed to visit these microdata pages: they are redirected to the standard UI: http://www.sudoc.abes.fr/xslt/DB=2.1/SRCH?IKT=12&TRM=132133520

Sudoc RDF data are linked to http://lexvo.org and http://dewey.info/ . They are also linked to IdRef [4], i.e. the Sudoc authority file that ABES considers as a separate and open application. 2 million IdRef records are also available as RDF data (since October 2010). The links between Sudoc and IdRef are bidirectional. For example, http://www.sudoc.fr/110404416/id ("Rethinking symbolism" by Dan Sperber) links to D. Sperber's IdRef URI: http://www.idref.fr/027146030/id . But, in the other direction, http://www.idref.fr/027146030/id links to *all* the Sudoc documents that are linked to this authority. In the next months, we hope to add more links to our data, to OCLC and BnF resources among others. More info (in French) here: http://punktokomo.abes.fr/

Congratulations, this is fantastic news. And I think also a very timely test-case for how community-maintained and consortium-based standards (schema.org) can be deployed alongside each other. Could you say a little more about the subject classification aspects of this data? I don't know a lot about French cataloguing. In the sample URIs you give above, I find only Rameau. You mention also Dewey.info, so I guess there's Dewey in there. And Rameau also has some mappings to LCSH.
Are there other schemes? e.g. I'm interested in particular to find instance data for UDC and for Library of Congress Classification (LCC), but also anything else that has a SKOS expression. Thanks for any more info, cheers, Dan

ps. some Gremlin examples follow (see http://danbri.org/words/2011/05/10/675 ) ... it uses the Linked Data Sail to pull in pages on demand from the Web, as you explore into the graph.

g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
i1 = g.v('http://www.sudoc.fr/132133520/id')

gremlin> i1.out('dcterms:subject').out('skos:inScheme')
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]

gremlin> i1.out('dcterms:subject').out('skos:prefLabel')
==>v[Télécommunications@fr]
==>v[Thèses et écrits académiques@fr]
==>v[Nouvelles technologies de l'information et de la communication@fr]
==>v[Internet@fr]
==>v[]

gremlin> i2=g.v('http://www.sudoc.fr/110404416/id')

gremlin> i2.out('dcterms:subject').out('skos:inScheme')
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]

gremlin> i2.out('dcterms:subject').out('skos:prefLabel')
==>v[Signes et symboles@fr]
==>v[Anthropologie@fr]

[1] http://www.sudoc.abes.fr
[2] http://www.abes.fr
[3] Shame on us ;) (twice)
[4] http://www.idref.fr
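The content-negotiation pattern in the announcement above (one generic document URI resolving to .rdf or .html variants) can be sketched roughly as follows. This is an illustrative sketch only, not ABES's actual server logic; the function name is mine.

```python
# Hypothetical sketch of the conneg pattern behind the Sudoc URIs above:
# the generic document URI is resolved to an RDF/XML or HTML variant
# depending on the client's Accept header.

def negotiate(record_id, accept_header):
    """Map a Sudoc record id plus an Accept header to a representation URL."""
    base = "http://www.sudoc.fr/" + record_id
    if "application/rdf+xml" in accept_header:
        return base + ".rdf"
    # default: the HTML page carrying schema.org microdata
    return base + ".html"
```

A client asking for `application/rdf+xml` would be steered to the .rdf page, and everything else to the HTML variant.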
survey: who uses the triple foaf:name rdfs:subPropertyOf rdfs:label?
Dear all, The FOAF RDFS/OWL document currently includes the triple foaf:name rdfs:subPropertyOf rdfs:label . This is one of several things that OWL DL oriented tools (eg. http://www.mygrid.org.uk/OWL/Validator) don't seem to like, since it mixes application schemas with the W3C builtins. So for now, pure fact-finding. I would like to know if anyone is actively using this triple, eg. for Linked Data browsers. If we can avoid this degenerating into a thread about the merits or otherwise of description logic, I would be hugely grateful. So -

1. do you have code / applications that check to see if a property is rdfs:subPropertyOf rdfs:label ?
2. do you have any scope to change this behaviour (eg. it's a web service under your control, rather than shipping desktop software)?
3. would you consider checking for ?x rdf:type foaf:LabelProperty or other idioms instead (or rather, as well)?
4. would you object if the triple foaf:name rdfs:subPropertyOf rdfs:label is removed from a future version of the main FOAF RDFS/OWL schema? (it could be linked elsewhere, mind)

Thanks in advance, Dan
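For anyone wanting to try the checks from questions 1 and 3 concretely, here is a rough, library-independent sketch, with triples represented as plain (s, p, o) tuples of URI strings. The function name is mine; it combines the transitive rdfs:subPropertyOf walk with the foaf:LabelProperty typing idiom.

```python
# Illustrative sketch of the two "is this a labelling property?" checks
# discussed above. Triples are plain (subject, predicate, object) tuples.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_LABELPROP = "http://xmlns.com/foaf/0.1/LabelProperty"

def is_labelling_property(prop, triples):
    """True if prop reaches rdfs:label via rdfs:subPropertyOf links
    (transitively), or is explicitly typed foaf:LabelProperty."""
    if (prop, RDF_TYPE, FOAF_LABELPROP) in triples:
        return True
    seen, queue = set(), [prop]
    while queue:
        p = queue.pop()
        if p == RDFS_LABEL:
            return True
        if p in seen:
            continue
        seen.add(p)
        # follow declared super-properties
        queue.extend(o for s, pred, o in triples
                     if s == p and pred == RDFS_SUBPROP)
    return False
```

With the schema triple in place, foaf:name passes the test; if the triple were removed and replaced by a foaf:LabelProperty typing, the same function would still recognise it.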
Re: Correct Usage of rdfs:isDefinedBy in Vocabulary Specifications with a Hash-based URI Pattern
On Thu, Sep 30, 2010 at 9:06 AM, Martin Hepp martin.h...@ebusiness-unibw.org wrote:

Dear all: We use rdfs:isDefinedBy in all of our vocabularies (*) for linking between the conceptual elements and their specification. Now, there is a subtle question: Let's assume we have an ontology with the main URI http://purl.org/vso/ns . All conceptual elements are defined as hash fragment URIs (URI references), e.g. http://purl.org/vso/ns#Bike . The ontology itself (the instance of owl:Ontology) has the URI http://purl.org/vso/ns#

<http://purl.org/vso/ns#> a owl:Ontology ;
    owl:imports <http://purl.org/goodrelations/v1> ;
    dc:title "VSO: The Vehicle Sales Ontology for Semantic Web-based E-Commerce"@en .

So we have two URIs for the ontology:
1. http://purl.org/vso/ns# for the ontology as an abstract artefact
2. http://purl.org/vso/ns for the syntactical representation of the ontology (its serialization)

Shall the rdfs:isDefinedBy statements refer to #1 or #2 ?

#1
vso:Vehicle a owl:Class ;
    rdfs:subClassOf gr:ProductOrService ;
    rdfs:label "Vehicle (gr:ProductOrService)"@en ;
    rdfs:isDefinedBy <http://purl.org/vso/ns#> .

=== #1 gets my vote... (The isDefinedBy property originally had use cases in mind for situations where the URI of the vocab couldn't be discovered in Webby fashion through dereferencing, eg. uuid: or urn: -based identifiers for the terms or vocab). As it turned out, the world learned to live with using http: everywhere, so that particular need faded somewhat :) cheers, Dan

#2
vso:Vehicle a owl:Class ;
    rdfs:subClassOf gr:ProductOrService ;
    rdfs:label "Vehicle (gr:ProductOrService)"@en ;
    rdfs:isDefinedBy <http://purl.org/vso/ns> .

=== I had assumed they shall refer to #1, but that caused some debate within our group ;-) Opinions? Best Martin
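To make the two candidate values concrete, here is a tiny illustrative helper (the function name is mine, not part of any spec or the VSO toolchain) that derives each option from a hash-namespace term URI:

```python
# Hypothetical sketch: given a hash-fragment term URI, option #1 keeps the
# trailing '#' (the ontology as abstract artefact), option #2 drops it
# (the serialization of the ontology).

def ontology_uri(term_uri, keep_hash=True):
    """Derive the vocabulary URI from a hash-fragment term URI."""
    base, sep, _frag = term_uri.partition("#")
    if not sep:
        return term_uri  # not a hash namespace; leave untouched
    return base + "#" if keep_hash else base

# e.g. http://purl.org/vso/ns#Bike
#   option #1 -> http://purl.org/vso/ns#
#   option #2 -> http://purl.org/vso/ns
```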
Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.
On Thu, Sep 2, 2010 at 8:10 PM, Anja Jentzsch a...@anjeve.de wrote:

Hi all, we are in the process of drawing the next version of the LOD cloud diagram. This time it is likely to contain around 180 datasets altogether, having a size of around 20 billion RDF triples. For drawing the next version of the LOD cloud, we have started to collect meta-information about the datasets to be included on CKAN, a registry of open data and content packages provided by the Open Knowledge Foundation. The list of datasets about which we have already collected information can be found here: http://www4.wiwiss.fu-berlin.de/lodcloud/ In addition to basic meta-information about a dataset such as its size and the number of links pointing at other datasets, we also collect additional meta-information about the license of the dataset, alternative access options like SPARQL endpoints or dataset dumps, and whether there exists a voiD description of the dataset or a Semantic Web Sitemap. So if your dataset is not listed yet and you want to have it included in the next version of the LOD cloud, please add it to CKAN by next Wednesday (September 8th, 2010). Also, if we have collected wrong information about your dataset, or if your dataset is only partially described up till now, it would be great if you could add the missing information. Guidelines about how to add datasets to CKAN, as well as about the tags that we are using to annotate the datasets, are found here: http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation We thank all contributors in advance for their input and help, which hopefully will allow us to draw the next version of the LOD cloud as accurately as possible.

This is great! Glad to see this being updated :) One thing I would love in the next revision is for FOAF to also be presented as a vocabulary, rather than as if it were itself a distinct dataset.
While there are databases that expose as FOAF (LiveJournal etc.), and also a reasonable number of independently published 'FOAF files', the technical core of FOAF is really the vocabulary and the habit of linking things together. Having a FOAF 'blob' is great and all, but it doesn't help people understand that FOAF is used as a vocabulary by various of the other blobs too. And beyond FOAF, I'm wondering how we can visually represent the use of eg. Music Ontology, or Dublin Core, or Creative Commons vocabularies across different regions of the cloud. Maybe (later :) someone could make a view where each blob is a pie-chart showing which vocabularies it uses? As a vocabulary manager, it is pretty hard to understand the costs and benefits of possible changes to a widely deployed RDF vocabulary. I'm sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same regarding the Dublin Core terms. So if there could be some view of the new cloud diagram that showed us which blobs (er, datasets) used which vocabulary (and which terms), that would be really wonderful. On the Dublin Core side, it would be fascinating to see which datasets are using http://purl.org/dc/elements/1.1/ and which are using http://purl.org/dc/terms/ (and which are using both). Similarly with FOAF, I'd like to understand common deployment patterns better. I expect other vocab managers and dataset publishers are in a similar situation, and would appreciate a map of the wider territory, so they know how to fit in with trends and conventions, or what missing pieces of vocabulary might need more work... Thanks for any thoughts, Dan
Re: Predicate for external links on dbpedialite.org?
On Thu, Jul 15, 2010 at 6:09 PM, Nicholas Humfrey nicholas.humf...@bbc.co.uk wrote: Hello, I have added external links to dbpedialite, for example see Berlin: http://dbpedialite.org/things/3354 Is there a better predicate to use than rdfs:seeAlso? I am not sure if it is correct, because the link is just a random webpage rather than an rdfs:Resource, but I have not found anything better. Perhaps an openvocab subclass of rdfs:seeAlso?

If you're pointing at documents, you could use foaf:page (inverse of foaf:topic) to say that those pages have (the city) Berlin as a topic. Or if you're more confident, foaf:isPrimaryTopicOf (inverse of foaf:primaryTopic). Oh hey, in the 1/2 hour since I started drafting this reply, I see the conversation has gone in this direction. Yeah, it sounds like foaf:topic or foaf:page fit. I don't particularly enjoy RDF vocabs having inverses in them, but for that matter we do have both directions named in FOAF, so pick whatever suits your markup best. I lean towards 'topic' as the most intuitively named, but DBpedia uses 'page', which might be worth bearing in mind... Dan
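Because the FOAF properties mentioned above come in declared inverse pairs, a consumer can normalise data to whichever direction it prefers by materialising the missing direction. A minimal sketch (names of the helper are mine), with triples as plain (s, p, o) tuples:

```python
# Illustrative sketch: materialise the inverse form of the FOAF
# inverse-property pairs discussed above (page/topic,
# isPrimaryTopicOf/primaryTopic).

FOAF = "http://xmlns.com/foaf/0.1/"
INVERSES = {
    FOAF + "page": FOAF + "topic",
    FOAF + "topic": FOAF + "page",
    FOAF + "isPrimaryTopicOf": FOAF + "primaryTopic",
    FOAF + "primaryTopic": FOAF + "isPrimaryTopicOf",
}

def with_inverses(triples):
    """Return the input triples plus the inverse form of any known pair."""
    out = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            out.add((o, INVERSES[p], s))
    return out
```

This way an application can query for just one direction (say, foaf:topic) regardless of which direction the publisher chose.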
Re: Solving Real Problems with Linked Data: Verifiable Network Identity Single Sign On
On Sun, Jul 11, 2010 at 7:05 PM, Kingsley Idehen kide...@openlinksw.com wrote: Q: What about OpenID? A: The WebID Protocol embraces and extends OpenID via the WebID + OpenID That's an unfortunate turn of phrase. The intent I assume is to suggest that there are ways in which the two approaches can be used together, and ways in which they quite reasonably take differing approaches. When they differ, it's through genuine and transparent differences rather than industry mischief. The embrace and extend phrase is rather too closely associated with cynical manipulation of partial compatibility for commercial advantage. I suggest avoiding it here! From http://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish Embrace, extend and extinguish,[1] also known as Embrace, extend, and exterminate,[2] is a phrase that the U.S. Department of Justice found[3] was used internally by Microsoft[4] to describe its strategy for entering product categories involving widely used standards, extending those standards with proprietary capabilities, and then using those differences to disadvantage its competitors. [...] The strategy and phrase embrace and extend were first described outside Microsoft in a 1996 New York Times article entitled Microsoft Trying to Dominate the Internet,[5] in which writer John Markoff said, Rather than merely embrace and extend the Internet, the company's critics now fear, Microsoft intends to engulf it. The phrase embrace and extend also appears in a facetious motivational song by Microsoft employee Dean Ballard,[6] and in an interview of Steve Ballmer by the New York Times. I think we're doing something quite different here! cheers, Dan
Re: Subjects as Literals
On Tue, Jul 6, 2010 at 12:40 AM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi Sampo. I venture in again... I have much enjoyed the interchanges, and they have illuminated a number of cultural differences for me, which have helped me understand why some people disagree with things that seem clear to me. A particular problem in this realm has been characterised as S-P-O v. O-R-O, and I suspect that this reflects a Semantic Web/Linked Data cultural difference, although the alignment will not be perfect. I see I am clearly in the latter camp. Some responses below.

imho RDF processing requires both perspectives, and neither is more semwebby or linky than the other. On a good day, we can believe what an RDF doc tells us. It does so in terms of objects/things and their properties and relationships (o-r-o i guess). On another day, we have larger collections of RDF to curate, and need to keep track more carefully of who is claiming what about these object properties; that's the provenance and quads perspective, s-p-o. Note that the subject/predicate/object terminology comes from the old M&S spec, which introduced reification in a ham-fisted attempt to handle some of this trust-ish stuff, and that most simple 'data'-oriented stuff uses SPARQL, the only W3C formal spec that covers quads rather than triples. So I don't think the community splits neatly into two on this, and that's probably for the best! RDF processing, specs and tooling are about being able to jump in a fluid and natural way between these two views of data; dipping down into the 'view from one graph', or zooming out to see the bigger picture of who says what. Neither is correct, and it is natural for the terminology to change to capture the shifting emphasis. But until we make this landscape clearer, people will be confused -- when is it an attribute or property, and when is it a predicate?
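The two perspectives above can coexist in one store: quads for curation and provenance, with the "view from one graph" recovered by projection. A minimal sketch (the helper names are mine, not from any particular quad-store API):

```python
# Illustrative sketch: quads as (s, p, o, source) tuples; "zooming in"
# projects one source's plain triples, "zooming out" asks who says what.

def triples_from(quads, source):
    """Zoom in: the plain s-p-o view of what one document claims."""
    return {(s, p, o) for s, p, o, g in quads if g == source}

def who_says(quads, s, p, o):
    """Zoom out: which sources claim this particular triple?"""
    return {g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o)}
```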
cheers, Dan -- There are two kinds of people in the world, those who believe there are two kinds of people in the world and those who don't. --Benchley
Re: RDF Extensibility
2010/7/6 Jiří Procházka oji...@gmail.com: On 07/06/2010 03:35 PM, Toby Inkster wrote: On Tue, 6 Jul 2010 14:03:19 +0200 Michael Schneider schn...@fzi.de wrote: So, if :s lit :o . must not have a semantic meaning, what about lit rdf:type rdf:Property . ? As, according to what you say above, you are willing to allow for literals in subject position, this triple is fine for you syntactically. But what about its meaning? Would this also be officially defined to have no meaning? It would have a meaning. It would just be a false statement. The same as the following is a false statement: foaf:Person a rdf:Property . Why do you think so? I believe it is valid RDF and even valid under RDFS semantic extension. Maybe OWL says something about disjointness of RDF properties and classes.

A URI can be many things. It just so happens, as a fact in the world, that the thing called foaf:Person isn't a property. It's a class. Some might argue that there are no things that are simultaneously RDF classes and properties, but that doesn't matter for the FOAF case. The RSS1 vocabulary btw tried to define something that was both, rss1:image I think; but this was a backwards-compatibility hack. cheers, Dan
Re: Subjects as Literals
On Tue, Jul 6, 2010 at 11:17 PM, Pat Hayes pha...@ihmc.us wrote: [...] This is the canonical way to find its meaning, and is the initial procedure we should use to arbitrate between competing understandings of its meaning. Whoo, I doubt if that idea is going to fly. I sincerely hope not. Using that, how would you determine the meaning of the DC vocabulary? It's also worth bearing in mind that Web sites get hacked from time to time. W3C gets attacked regularly (but is pretty robust). The FOAF servers were compromised a year or two back (but the xmlns.com site was untouched). For a while, foaf-project.org was serving evil PHP and ugly links, as was my own home page. This kind of mischief should be kept in mind by anyone building a system that assumes you'll get canonical meaning from an HTTP GET... cheers, Dan
Re: PRISM data on the LOD cloud?
On Fri, Jul 2, 2010 at 3:19 PM, Hammond, Tony t.hamm...@nature.com wrote: Hi Kingsley: Kill me with the PDF URL :-( I think we could have been a tad more gracious here. This kind of remark only serves to alienate the well intentioned. You know, it's not actually (yet) a crime to put out a PDF on the open Web. Yes, it may not be the most webby of document formats, but it does have certain viabilities. Re your question: Where can I see / GET the RDF/XML resource? There's RDF/XML XMP hidden inside the file, talking of XMP. Presumably Virtuoso has a sponger for it. Copied below as a reminder that rdf:Seq will be very hard to delete from the Web, since most files that pass through the Adobe toolchain have it stuffed inside... Dan

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.1-c036 46.277092, Fri Feb 23 2007 14:16:18">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:format>application/postscript</dc:format>
   <dc:title>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">Print</rdf:li>
    </rdf:Alt>
   </dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xap="http://ns.adobe.com/xap/1.0/" xmlns:xapGImg="http://ns.adobe.com/xap/1.0/g/img/">
   <xap:CreatorTool>Adobe Illustrator CS3</xap:CreatorTool>
   <xap:CreateDate>2008-10-10T11:07:02-04:00</xap:CreateDate>
   <xap:ModifyDate>2008-10-10T11:07:02-04:00</xap:ModifyDate>
   <xap:MetadataDate>2008-10-10T11:07:02-04:00</xap:MetadataDate>
   <xap:Thumbnails>
    <rdf:Alt>
     <rdf:li rdf:parseType="Resource">
      <xapGImg:width>256</xapGImg:width>
      <xapGImg:height>96</xapGImg:height>
      <xapGImg:format>JPEG</xapGImg:format>
      <xapGImg:image>[ big pile of hex snipped ]</xapGImg:image>
     </rdf:li>
    </rdf:Alt>
   </xap:Thumbnails>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#">
   <xapMM:DocumentID>uuid:ED37D99F4D98DD11B2AD92E8487485F8</xapMM:DocumentID>
   <xapMM:InstanceID>uuid:EE37D99F4D98DD11B2AD92E8487485F8</xapMM:InstanceID>
   <xapMM:DerivedFrom rdf:parseType="Resource">
    <stRef:instanceID>uuid:EC37D99F4D98DD11B2AD92E8487485F8</stRef:instanceID>
    <stRef:documentID>uuid:EB37D99F4D98DD11B2AD92E8487485F8</stRef:documentID>
   </xapMM:DerivedFrom>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:illustrator="http://ns.adobe.com/illustrator/1.0/">
   <illustrator:StartupProfile>Print</illustrator:StartupProfile>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xapTPg="http://ns.adobe.com/xap/1.0/t/pg/" xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#" xmlns:xapG="http://ns.adobe.com/xap/1.0/g/">
   <xapTPg:MaxPageSize rdf:parseType="Resource">
    <stDim:w>11.00</stDim:w>
    <stDim:h>8.50</stDim:h>
    <stDim:unit>Inches</stDim:unit>
   </xapTPg:MaxPageSize>
   <xapTPg:NPages>1</xapTPg:NPages>
   <xapTPg:HasVisibleTransparency>False</xapTPg:HasVisibleTransparency>
   <xapTPg:HasVisibleOverprint>False</xapTPg:HasVisibleOverprint>
   <xapTPg:PlateNames>
    <rdf:Seq>
     <rdf:li>Cyan</rdf:li>
     <rdf:li>Magenta</rdf:li>
     <rdf:li>Yellow</rdf:li>
     <rdf:li>Black</rdf:li>
     <rdf:li>C=100 M=10 Y=0 K=0 1</rdf:li>
    </rdf:Seq>
   </xapTPg:PlateNames>
   <xapTPg:SwatchGroups>
    <rdf:Seq>
     <rdf:li rdf:parseType="Resource">
      <xapG:groupName>Default Swatch Group</xapG:groupName>
      <xapG:groupType>0</xapG:groupType>
      <xapG:Colorants>
       <rdf:Seq>
        <rdf:li rdf:parseType="Resource">
         <xapG:swatchName>White</xapG:swatchName>
         <xapG:mode>CMYK</xapG:mode>
...etc etc (big file...)
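Since XMP is stored as a plain XML packet in the file's byte stream, a "sponger" can often locate it with a simple byte scan, without a PDF parser at all. A rough sketch of that idea (not Virtuoso's actual mechanism; the function name is mine):

```python
# Illustrative sketch: pull the first embedded XMP packet out of a PDF
# (or any Adobe-produced file) by scanning for the x:xmpmeta element.

def extract_xmp(data: bytes):
    """Return the first embedded XMP packet as a string, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end = data.find(b"</x:xmpmeta>", start)
    if end == -1:
        return None
    return data[start:end + len(b"</x:xmpmeta>")].decode("utf-8", "replace")
```

The returned packet is ordinary RDF/XML and can be handed to any RDF parser.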
Re: Subjects as Literals, [was Re: The Ordered List Ontology]
[snip] This is the second time in a few hours that a thread has degenerated into talk of accusations and insults. I don't care who started it. Sometimes email just isn't the best way to communicate. If people are feeling this way about an email discussion, it might be worth the respective parties spending a few minutes on the phone to try to smooth things over. Or not. I don't care, really. But each of these mail messages is getting distributed to several hundred readers. It would be good if we can find ways of using that bandwidth to solve problems rather than get into fights. Or maybe we should all just take a weekend break, mull things over for a couple of days, and start fresh on monday? That's my plan anyhow... cheers, Dan
An RDF wishlist
(rejigged subject line) On Thu, Jul 1, 2010 at 4:35 AM, Pat Hayes pha...@ihmc.us wrote: Pat, I wish you had been there. ;) I have very mixed views on this, I have to say. Part of me wanted badly to be present. But after reading the results of the straw poll, part of me wants to completely forget about RDF, never think about an ontology or a logic ever again, and go off and do something completely different, like art or philosophy. I have mixed feelings about missing the workshop too. Having been pushing this wheelbarrow uphill for far too long, it does seem a shame to have missed such an event. On the other hand, it is hard to know what to make of the workshop outcomes, since the participants form an unusually specialist subset of humanity, and the problem of what W3C next does with its RDF standard is such a small part of the larger problem. It's clear that many workshop participants were aware of the risk of destabilizing the core technologies just as we are gaining some very promising real-world traction. That was a relief to read. For those who have invested time and money in helping us get this far, and who had the resources to participate, this concern was probably enough to motivate participation. It's clear also that participants were aware of many of the little annoyances that bring friction and frustration to those working with RDF. What I'm less sure of is how to represent the perspective of those who have explored RDF and walked away. Over the years, many bright people have investigated RDF enthusiastically, and left disappointed. Those folk didn't come to the workshop, they didn't write a position paper, and they probably don't particularly care about its outcomes. But they're just the kind of people who will need to enjoy using RDF if we are to succeed. Is RDF hard to work with? I think the answer remains 'yes', but we lack consensus on why. And it seems even somehow disloyal to admit it.
If I had to list reasons, I'd leave nits like 'subjects as literals' pretty low down. Many of the reasons I think are unavoidable, and intrinsic to the kind of technology and problems we're dealing with. But there are also lots of areas for improvement. Most of these are nothing to do with fixups to W3C standards documentation. And finally, we can lessen the perception of pain by improving the other side: getting more decent linked data out there, so the suffering people go through is worth it. Some reasons why RDF is annoying and hard (a mildly ordered list):

* RDF data is gappy, chaotic, full of unexpected extensions and omissions - BY DESIGN

* RDF toolkits each offer different items from a large menu (syntaxes, storage, inference facilities), so even when you're getting a lot, you probably don't appreciate what you're getting, and we have no common checklist that helps non-guru developers understand this.

* RDF toolkit / library immaturity; eg1. I wasted half a weekend recently trying to find a decent Javascript system. eg2. I work in Python using the popular rdflib library, whose half-finished SPARQL support was recently removed and put into an 'extras' package; nobody seems quite sure how well it works. The Ruby landscape remains messy, although the public-rdf-ruby list have recently been collaborating actively to improve things. Broken, old and abandoned code litters the Web; good stuff remains on the bleeding edge and unpackaged. Great ideas, code and algorithms remain trapped in a single implementation language rather than transliterated to other widely deployed languages. Almost every toolkit's SQL backend is represented differently. Only a few serializers bother to prettify RDF/XML nicely, despite there being opensource code out there that could easily be copied.
* RDF is good for aggregation of externally managed data; managing data *as* RDF comes with certain complexities, since edit/delete operations on a connected graph aren't as intuitive as on a closed tree structure. If I delete a certain node from the graph, which others should be cleaned up too? Named graphs help somewhat there, but good habits aren't yet understood, much less documented.

* As a community, we have some standards for documenting the atomic terms in our vocabularies (ie. RDFS/OWL) but we tend to stop there, and not to document the larger graph patterns that are needed to really communicate using these structures, or the underlying use cases that motivated them in the first place. We also don't do nearly enough analytics and stats over the actual data out there to make it easier to consume, and for publishers to gravitate towards existing idioms rather than make up similar-but-different graph patterns that'll confuse the landscape further.

* Our small community (we are outnumbered by Visual Basic enthusiasts, let alone Javascripters) is fragmented and grumpy. OWL and Linked Data enthusiasts too often talk and think disparagingly about each others' work, or not-so-secretly wish the others would just go away and stop
Re: destabilizing core technologies: was Re: An RDF wishlist
Hi Patrick, On Thu, Jul 1, 2010 at 11:39 AM, Patrick Durusau patr...@durusau.net wrote: Dan, Just a quick response to only one of the interesting points you raise: It's clear that many workshop participants were aware of the risk of destabilizing the core technologies just as we are gaining some very promising real-world traction. That was a relief to read. For those who have invested time and money in helping us get this far, and who had the resources to participate, this concern was probably enough to motivate participation. It might be helpful to recall that destabilizing the core technologies was exactly the approach that SGML took when its little annoyances [brought] friction and frustration to those working with [SGML]... There was ...promising real-world traction. I don't know what else to call the US Department of Defense mandating the use of SGML for defense contracts. That is certainly real-world, and it seems hard to step on an economic map of the US without stepping in defense contracts of one sort or another. Yes, you are right. It is fair and interesting to bring up this analogy and associated history. SGML even got a namecheck in the original announcement of the Web, see http://groups.google.com/group/alt.hypertext/msg/395f282a67a1916c and even today HTML is not yet re-cast in terms of XML, much less SGML. Many today are looking to JSON rather than XML, perhaps because of a lack of courage/optimism amongst XML's creators that saddled it with more SGML heritage than it should now be carrying. These are all reasons for chopping away more bravely at things we might otherwise be afraid of breaking. But what if we chop so much the original is unrecognisable? Is that so wrong? What if RDF's biggest adoption burden is the open-world triples model? Clinging to decisions that seemed right at the time they were made is a real problem. It is only because we make decisions that we have the opportunity to look back and wish we had decided differently.
That is called experience. If we don't learn from experience, well, there are other words to describe that. :) So, I wouldn't object to a new RDF Core WG, to cleanups including eg. 'literals as subjects' in the core data model, or to see the formal semantics modernised/simplified according to the latest wisdom of the gurus. I do object to the idea that proposed changes are the kinds of thing that will make RDF significantly easier to deploy. The RDF family of specs is already pretty layered. You can do a lot without ever using or encountering rdf:Alt, or reification, or OWL DL reasoning, or RIF. Or reading a W3C spec. The basic idea of triples is pretty simple and even sometimes strangely attractive, however many things have been piled on top. But simplicity is a complex thing! Having a simple data model, even simple, easy to read specs, won't save RDF from being a complex-to-use technology. We have, I think, a reasonably simple data model. You can't take much away from the triples story and be left with anything sharing RDF's most attractive properties. The specs could be cleaner and more accessible. But I know plenty of former RDF enthusiasts who knew the specs and the tech inside out, and still ultimately abandoned it all. Making RDF simpler to use can't come just from simplifying the specs; when you look at the core, and it's the core that's the problem, there just isn't much left to throw out. Some of the audience for these postings will remember that the result of intransigence on the part of the SGML community was XML. XML was a giant gamble. It's instructive to look back at what happened, and to realise that we don't need a single answer (a single gamble) here. Part of the problem I was getting at earlier was one of dangerously elevated expectations... the argument that *all* data in the Web must be in RDF. We can remain fans of the triple model for simple factual data, even while acknowledging there will be other useful formats (XMLs, JSONs).
Some of us can gamble on "let's use RDF for everything". Some can retreat to the original, noble and neglected metadata use case, and use RDF to describe information, but leave the payload in other formats; others (myself at least) might spend their time trying to use triples as a way of getting people to share the information that's inside their heads rather than inside their computers. I am not advocating in favor of any specific changes. I am suggesting that clinging to prior decisions simply because they are prior decisions doesn't have a good track record. Learning from prior decisions, on the other hand, such as the reduced (in my opinion) feature set of XML, seems to have a better one. (Other examples left as an exercise for the reader.) So, I think I'm holding an awkward position here:

* massive feature change (ie. not using triples, URIs etc); or rather focus change: become a 'data sharing in the Web' community, not a 'doing stuff with triples' community
* cautious
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 5:38 PM, Jeremy Carroll jer...@topquadrant.com wrote: I am still not hearing any argument to justify the costs of literals as subjects. I have loads and loads of code, both open source and commercial, that assumes throughout that a node in a subject position is not a literal, and a node in a predicate position is a URI node. Of course, the correct thing to do is to allow all three node types in all three positions. (Well, four if we take the graph name as well!) But if we make a change, all of my code base will need to be checked for this issue. This costs my company maybe $100K (very roughly). No one has even showed me $1K of advantage for this change. It is a no-brainer not to do the fix, even if it is technically correct. Well said. Spend the money on a W3C-licensed javascript SPARQL engine, or on fixing and documenting and test-suiting what's out there already. And whatever's left on rewriting it in Ruby, Scala, Lua ... Better still, put the money up as a prize; then you only have to give it to one party, while dozens of others will slave away for free in pursuit of said loot ;) Dan
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 6:29 PM, Sandro Hawke san...@w3.org wrote: On Thu, 2010-07-01 at 17:10 +0100, Nathan wrote: In all honesty, if this doesn't happen, I personally will have no choice but to move to N3 for the bulk of things, and hope for other serializations of N3 to come along. RIF (which became a W3C Recommendation last week) is N3, mutated (in some good ways and some bad ways, I suppose) by the community consensus process. RIF is simultaneously the heir to N3 and a standard business rules format. RIF's central syntax is XML-based, but there's room for a presentation syntax that looks like N3. RIF includes triples which can have literals as subject, of course. (In RIF, these triples are called frames. Well, sets of triples with a shared subject are called frames, technically. But they are defined by the spec to be an extension of RDF triples.) Excellent, so there's no need to mess with RDF itself for a while? We can let RIF settle in for a couple years and see how it shapes up against people's RDFCore 2.0 aspirations? Dan
Re: Show me the money - (was Subjects as Literals)
(cc: list trimmed to LOD list.) On Thu, Jul 1, 2010 at 7:05 PM, Kingsley Idehen kide...@openlinksw.com wrote: Cut long story short. [-cut-] We have an EAV graph model, URIs, triples and a variety of data representation mechanisms. N3 is one of those, and it's basically the foundation that bootstrapped the House of HTTP based Linked Data. I have trouble believing that last point, so hopefully I am misunderstanding your point. Linked data in the public Web was bootstrapped using standard RDF, serialized primarily in RDF/XML, and initially deployed mostly by virtue of people enthusiastically publishing 'FOAF files' in the (RDF)Web. These files, for better or worse, were overwhelmingly in RDF/XML. When TimBL wrote http://www.w3.org/DesignIssues/LinkedData.html in 2006 he used what is retrospectively known as Notation 2, not its successor Notation 3. Notation 2 [*] was an unstriped XML syntax (see original in http://web.archive.org/web/20061115043657/http://www.w3.org/DesignIssues/LinkedData.html ). That DesignIssues note was largely a response to the FOAF deployment. This linking system was very successful, forming a growing social network, and dominating, in 2006, the linked data available on the web. The LinkedData design note argued that (post RDFCore cleanup and http-range discussions) we could now use URIs for non-Web things, and that this would be easier than dealing with bNode-heavy data. Much of the subsequent successes come from following that advice. Perhaps N3 played an educational role in showing that RDF had other representations; but by then, SPARQL, NTriples etc. were also around. As was RDFa, http://xtech06.usefulinc.com/schedule/paper/58 ... I have a hard time seeing N3 as the foundation that bootstrapped things. Most of the substantial linked RDF in the Web by 2006 was written in RDF/XML, and by then the substantive issues around linking, reference, aggregation, identification and linking etc. were pretty well understood.
I don't dislike N3; it was a good technology testbed and gave us the foundation for SPARQL's syntax, and for the Turtle subset. But its role outside our immediate community has been pretty limited in my experience. cheers, Dan [*] http://www.w3.org/DesignIssues/Syntax.html
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 11:35 PM, Kingsley Idehen kide...@openlinksw.com wrote: The sequence went something like this. TimBL Design Issues Note, and SPARQL emergence. Before that, RDF was simply in the dark ages. It's only simple if you weren't there :) You mean you didn't see me lurking in the dark? :-) Humor aside, pre-Linked Data meme, RDF just wasn't making any tangible progress (adoption- or comprehension-wise) beyond the inner sanctums of the Semantic Web community, you know what I mean when I say that, right? And all I'm saying is that it took a lot of work from a lot of people (most of whom are on these lists) to get to that stage where it was capable of breaking out. The state of RDF deployment, tooling, concepts, specs and community in 2006 was a significant improvement on what we had in, say, 1999. The Linked Data push was a breakthrough, but it didn't happen in a vacuum or overnight; neither did SPARQL... cheers, Dan
Re: The Ordered List Ontology
On Wed, Jun 30, 2010 at 6:34 PM, Pat Hayes pha...@ihmc.us wrote: On Jun 30, 2010, at 6:45 AM, Toby Inkster wrote: On Wed, 30 Jun 2010 10:54:20 +0100 Dan Brickley dan...@danbri.org wrote: That said, i'm sure sameAs and differentIndividual (or however it is called) claims could probably make a mess, if added or removed... You can create some pretty awesome messes even without OWL:

# An rdf:List that loops around...
<#mylist> a rdf:List ;
    rdf:first <#Alice> ;
    rdf:rest <#mylist> .

# A looping, branching mess...
<#anotherlist> a rdf:List ;
    rdf:first <#anotherlist> ;
    rdf:rest <#anotherlist> .

They might be messy, but they are *possible* structures using pointers, which is what the RDF vocabulary describes. It's just about impossible to guarantee that messes can't happen when all you are doing is describing structures in an open-world setting. But I think the cure is to stop thinking that possible-messes are a problem to be solved. So, there is dung in the road. Walk round it. Yes. So this is a point that probably needs careful presentation to new users of this technology. Educating people that they shouldn't believe any random RDF they find in the Web, ... now that is pretty easy. Still needs doing, but it shadows real-world intuitions pretty well. If in real life you think the Daily Mail is full of nonsense, then it isn't a huge leap to treat RDFized representations of their claims with similar skepticism (eg. see http://data.totl.net/cancer_causes.rdf for a great list of Things The Daily Mail Say Might Cause Cancer). *However* it is going to be tough to persuade developers to treat a basic data structure like List in the same way. Lists are the kinds of thing we expect to be communicated perfectly or to get some low-level error. A lot of developers will write RDF-consuming code that won't anticipate errors. Hopefully supporting software libraries can take some of the strain here... cheers, Dan
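[Editorial sketch, not part of the original thread.] The defensive list-consuming code Dan hopes libraries will supply is easy to illustrate. Here triples are modelled as a plain dict mapping each list node to its (rdf:first, rdf:rest) pair; all names and the `collect_list` helper are invented for the example, not any real library's API.

```python
# Editorial sketch: defensive traversal of an rdf:List-style structure.
# A dict maps each list node to its (rdf:first, rdf:rest) pair.
RDF_NIL = "rdf:nil"

def collect_list(node, cells, max_len=10_000):
    """Walk first/rest cells, refusing to loop forever on cyclic data."""
    seen, items = set(), []
    while node != RDF_NIL:
        if node in seen or len(items) >= max_len:
            raise ValueError(f"malformed rdf:List at {node!r}: cycle or runaway length")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items

# A well-formed two-element list...
ok = {"_:l1": ("#Alice", "_:l2"), "_:l2": ("#Bob", RDF_NIL)}
# ...and a list that loops around, like the mess above.
looped = {"#mylist": ("#Alice", "#mylist")}
```

A consumer calling `collect_list` gets either the items or a clear error, rather than an infinite loop, which is the kind of strain-taking a supporting library can do on the application's behalf.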
Re: ANNOUNCE: lod-announce list
On Sun, Jun 13, 2010 at 7:44 PM, Angelo Veltens angelo.velt...@online.de wrote: Hi, Ian Davis schrieb: Hi all, Now we are getting a steady growth in the number of Linked Data sites, products and services I thought it was time to create a low-volume announce list for Linked Data related announcements so people can keep up to date without needing to wade through the LOD discussion. You can join the list at http://groups.google.com/group/lod-announce Sounds fine, but is it possible to subscribe to the list without a google account? Yes. The Google Groups site doesn't make it particularly easy to find from the lod-announce group homepage, but see http://groups.google.com/support/bin/answer.py?answer=46606&cbid=-o2vzb2h0iyxw&src=cb&lev=index Q: How do I subscribe to a group? A: You can subscribe to a group through our web interface or via email. To subscribe to a group through our web interface, simply log in to your Google Account and visit the group of your choice. Then click the Join this group link on the right-hand side of the page under About this group. To subscribe to a group via email, send an email to [groupname]+subscr...@googlegroups.com. For example, if you wanted to join a group called google-friends, you'd send an email to google-friends+subscr...@googlegroups.com cheers, Dan
Re: Organizations changing status
On Tue, Jun 8, 2010 at 12:17 PM, William Waites william.wai...@okfn.org wrote: On 10-06-07 23:03, Emmanouil Batsis (Manos) wrote: b) what happens when organizations change legal status? I'm not certain but I don't think this ever really happens. In practice the old organisation ceases to exist and a new one comes into being, possibly with a period of overlap. They may share the same name and informally be referred to as the same, but technically they are different organisations. I think this suggests two predicates that are not present in the ontology -- org:successor and org:predecessor Here's a nice practical example: the Dublin Core Metadata Initiative. http://purl.org/dc/aboutdcmi - http://dublincore.org/DCMI.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Organization rdf:about="http://purl.org/dc/aboutdcmi#DCMI">
    <foaf:name>Dublin Core Metadata Initiative</foaf:name>
    <foaf:nick>DCMI</foaf:nick>
    <foaf:homepage rdf:resource="http://dublincore.org/" />
    <rdfs:seeAlso rdf:resource="http://purl.org/dc/aboutdcmi" />
    <dct:description>The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global conferences and workshops, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.</dct:description>
    <dct:created>1995-01-03</dct:created>
    <dct:subject rdf:resource="http://id.loc.gov/authorities/sh96000740#concept" />
    <dct:subject rdf:resource="http://id.loc.gov/authorities/sh98002267#concept" />
  </foaf:Organization>
</rdf:RDF>

There was a little discussion on this point: when was the Dublin Core created as an organization? It began in 1995 but as an informal internet-mediated community.
In recent years this has increasingly solidified until now there is a legal entity; http://dublincore.org/about-us/ The Dublin Core Metadata Initiative (DCMI) is an open organization, incorporated in Singapore as a public, not-for-profit Company limited by Guarantee (registration number 200823602C), engaged in the development of interoperable metadata standards that support a broad range of purposes and business models. RDF doesn't natively handle the representation of changes over time. In some contexts we'll want to talk as if there is a single thing that existed since 1995. In some other contexts we'll want to be precise, and talk of the legal entity in Singapore. RDF has the basics to allow this kind of separation and folding together of perspectives, but in everyday practice we don't yet do it very well, to be honest. I'd be interested to see proposals for refining the Dublin Core's self-description to include a more detailed picture using the Org: vocab... cheers, Dan
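[Editorial sketch, not part of the original thread.] The two-perspectives idea can be mocked up with triples as plain tuples: one resource for the informal 1995 community, another for the Singapore legal entity, linked by the org:successor / org:predecessor predicates Waites proposes above. Note those predicates are a proposal from this thread, not published org: terms, and the URIs below are hypothetical placeholders.

```python
# Editorial sketch: two distinct resources for DCMI linked by the
# org:successor predicate proposed upthread (NOT a published org: term).
# All URIs are hypothetical placeholders.
COMMUNITY = "http://example.org/dcmi#community1995"
COMPANY = "http://example.org/dcmi#singaporeCompany"

graph = [
    (COMMUNITY, "rdf:type", "foaf:Organization"),
    (COMMUNITY, "dct:created", "1995-01-03"),
    (COMPANY, "rdf:type", "org:FormalOrganization"),
    (COMMUNITY, "org:successor", COMPANY),
    (COMPANY, "org:predecessor", COMMUNITY),
]

def objects(graph, subject, predicate):
    """All objects of matching triples - a poor man's SPARQL pattern."""
    return [o for s, p, o in graph if s == subject and p == predicate]
```

Contexts that want the single folded-together "DCMI since 1995" view can follow the successor link; contexts that need legal precision can talk only about the second resource.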
Re: Organization types predicates vs classes
On Tue, Jun 8, 2010 at 12:21 PM, William Waites william.wai...@okfn.org wrote: On 10-06-08 04:27, Todd Vincent wrote: By adding OrganizationType to the Organization data model, you provide the ability to modify the type of organization and can then represent both (legal) entities and (legally unrecognized) organizations. :foo rdf:type SomeKindOfOrganisation . vs. :foo org:organisationType SomeKindOfOrganisation . I don't really see the need for an extra predicate with almost identical semantics to rdf:type. There is nothing stopping a subject from having more than one type. Yes, exactly. The schema guarantees things will have multiple types. The art is to know when to bother mentioning each type. Saying things are an rdfs:Resource is rarely interesting. Saying they're a foaf:Agent is also pretty bland and uninformative. The mid-level classes around Organization are generally more interesting, and folk using local / community-extended classes (foo:CultLikeOrganization, bar:SomePreciseSubClassOrg etc.) probably ought to mention mid-level classes too. Some day we'll get support for these distinctions from the big RDF aggregators and from analysis of code, SPARQL queries etc, so we know which terms are most likely to be understood. BTW the syntax of RDFa (compared to RDF/XML) makes it easy and much less ugly to mention extra types and relations. Mentioning a second relationship in the original RDF/XML syntax is particularly verbose. In RDFa we have space-separated lists of qualified names, which significantly reduces the cost of mixing general (widely understood) classes with precise (but more obscure) community extensions. This is a pretty good thing :) cheers, Dan
Re: Organization ontology
On Tue, Jun 8, 2010 at 12:54 PM, Kingsley Idehen kide...@openlinksw.com wrote: Peristeras, Vassilios wrote: Hello all, I have the feeling that we are (at least partly) reinventing the wheel here. There have been several initiatives drafting generic models and representations for organizations. Just two examples below [1][2] which go back to the 90s. More generally, an in-depth look at design and data patterns literature could also help a lot. I have the feeling that others before this group have defined concepts like organization, legal entity etc... We could re-use their conceptual (or data or formal) models, instead of starting the discussion from scratch. Best regards, Vassilios [1] http://www.aiai.ed.ac.uk/project/enterprise/enterprise/ontology.html [2] http://www.eil.utoronto.ca/enterprise-modelling/tove/ Both of your links point to PDFs or Postscript docs. Are there any actual ontology doc URLs? The enterprise ontology page is HTML and describes availability as The formal Ontolingua encoding of the Enterprise Ontology is held in the Library of Ontologies maintained by Stanford University's Knowledge Systems Lab (KSL). http://www-ksl-svc.stanford.edu:5915/FRAME-EDITOR/UID-15908?sid=ANONYMOUS&user-id=ALIEN Last modified: Monday, 31 May 2010 sounds fresher than I expected. There's LISP here: http://www-ksl-svc.stanford.edu:5915/FRAME-EDITOR/UID-15901?sid=ANONYMOUS&user-id=ALIEN#ENTERPRISE-ONTOLOGY I guess there must be an OWL conversion tool around somewhere. I've copied Mike Uschold who may have more to say on this... cheers, Dan
Re: Slideshare.net as Linked Data
On Mon, Jun 7, 2010 at 8:18 PM, Paul Groth pgr...@gmail.com wrote: Hi All, I've wrapped the Slideshare.net API to expose it as RDF. You can find a blog post about the service at [1] and the service itself at [2]. An interesting bit is how we deal with Slideshare's API limits by letting you use your own API key. It still needs to be properly linked (i.e. point to other resources on the WoD) but we're working on it. [1] http://thinklinks.wordpress.com/2010/06/07/linking-slideshare-data/ [2] http://linkeddata.few.vu.nl/slideshare/ Cool :) How does it relate to the RDFa they're embedding? (There's definitely a role for value-adding, even for sites that embed per-page RDF already...) cheers, Dan Let me know what you think, Thanks, Paul -- Dr. Paul Groth (pgr...@few.vu.nl) http://www.few.vu.nl/~pgroth/ Postdoc Knowledge Representation Reasoning Group Artificial Intelligence Section Department of Computer Science VU University Amsterdam
Re: Why should we publish ordered collections or indexes as RDF?
2010/6/3 Haijie.Peng haijie.p...@gmail.com: [Apologies for cross-posting] Why should we publish ordered collections or indexes as RDF? is it necessary? On the Web, very little is 'necessary'. But some things can be useful. Indexes and summaries can help software prioritise, and allow larger files to be loaded only when needed. It depends what you mean by 'ordered collections' and 'indexes'. But the reason for sitemap-style summaries is usually to help external sites monitor the content of the Web better. At http://www.sitemaps.org/ there is an explanation of the sitemaps format which several crawlers use. I believe the Google crawler will use it to help schedule activity on a site, and that -for example- it can help if you want your RDF/FOAF or XFN documents to be indexed by Google's Social Graph API - http://code.google.com/apis/socialgraph/ There is also a version of this format called Semantic Sitemaps, but http://sw.deri.org/2007/07/sitemapextension/ is offline right now. In other cases, RSS feeds (also Atom) do the same thing, and provide a 'What's new' feed for a site, letting everyone know which documents are new or updated, so that they can be (re-)indexed. For large collections of documents, it is useful sometimes to have smaller summary documents so that the bigger files can be fetched only when they are needed. Mobile apps that care about bandwidth are an example scenario there. Regarding Linked Data, what we do there is to link descriptions together. Each partial description often links to other documents that are about the same real-world thing. This addresses some of the same needs as a top level index or catalogue, because you can retrieve different levels of detail from different sites. So my small FOAF file is in some ways a top level entry (index?) for me, and it might point to larger files (eg. twitter or flickr datasets) that are maintained separately.
RDF aggregator sites like sindice.com can be used to link these together, even if the top level file does not contain links to every other file that mentions me. So in that scenario, it is not 100% necessary for the small file to be an index to the large files. The data can be linked together later if common identifiers are used in each data set. Hope this helps. Can you say more about the specific situation you have in mind? cheers, Dan
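[Editorial sketch, not part of the original thread.] A minimal sitemaps.org-style summary of the kind Dan describes can be produced with nothing but the standard library; the URLs and lastmod dates below are invented for illustration.

```python
# Editorial sketch: building a minimal sitemaps.org urlset document.
# URLs and lastmod dates are invented examples.
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs -> sitemap XML string."""
    ET.register_namespace("", SM)
    urlset = ET.Element(f"{{{SM}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

xml_doc = build_sitemap([("http://example.org/foaf.rdf", "2010-06-03")])
```

Serving such a file lets a crawler fetch one small summary and then schedule fetches of the larger RDF documents only when their lastmod changes.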
Re: Organization ontology
On Thu, Jun 3, 2010 at 8:47 AM, Stuart A. Yeates syea...@gmail.com wrote: On Wed, Jun 2, 2010 at 8:09 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: On Wed, 2010-06-02 at 17:06 +1200, Stuart A. Yeates wrote: On Tue, Jun 1, 2010 at 7:50 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: We would like to announce the availability of an ontology for description of organizational structures including government organizations. This was motivated by the needs of the data.gov.uk project. After some checking we were unable to find an existing ontology that precisely met our needs and so developed this generic core, intended to be extensible to particular domains of use. [1] http://www.epimorphics.com/public/vocabulary/org.html I think this is great, but I'm a little worried that a number of Western (and specifically Westminster) assumptions may have been built into it. Interesting. We tried to keep the ontology reasonably neutral, that's why, for example, there is no notion of a Government or Corporation. Could you say a little more about the specific Western Westminster assumptions that you feel are built into it? (*) that structure is relatively static with sharp transitions between states. This simplification pretty much comes 'out of the box' with the use of RDF or other simple logics (SQL too). Nothing we do here deals in a very fluid manner with an ever-changing, subtle and complex world. But still SQL and increasingly RDF can be useful tools, and used carefully I don't think they're instruments of western cultural imperialism. I don't find anything particularly troublesome about the org: vocab on this front. If you really want to critique culturally-loaded ontologies, I'd go find one that declares class hierarchies with terms like 'Terrorist' without giving any operational definitions...
(*) that an organisation has a single structure rather than a set of structures depending on the operations you are concerned with (finance, governance, authority, criminal justice, ...) Couldn't the subOrganizationOf construct be used to allow these different aspects to be described and then grouped loosely together? (*) that the structures are intended to be as they are, rather than being steps towards some kind of Platonic ideal I'm not getting that from the docs. For example, We felt that the best approach was to develop a small, generic, reusable core ontology for organizational information and then let developers extend and specialize it to particular domains. ...suggests a hope for incremental refinement / improvement, but also a hope that the basic pieces are likely to map onto multiple parties' situations at a higher level. Bit of both there, but no Plato. ... Modelling the crime organisations (the mafia, drug runners, Enron, identity crime syndicates) may also be helpful in exposing assumptions, particularly those in mapping the real-world to legal entities. I agree these are interesting areas to attempt to describe, but dealing with situations where obfuscation, secrecy and complexity are core business is a tough stress-test of any model. Ontology-style modeling works best when there is a shared conceptualisation of what's going on; even many direct participants in these complex crime situations lack that. So I'd suggest for those situations taking a more evidence-based social networks approach; instead of saying here's their org chart, build things up from raw data of who emails who, who knows who, who met who, where and when (or who claimed that they did), etc. RDF is ok for that task too. Those techniques are also useful when understanding how more legitimate organizations really function, but (as mentioned w.r.t. accountability) it can largely be broken out as a separate descriptive problem.
Alternatively, this may help in defining the subset of organisations that you're trying to model. Yup Control is a different issue from organizational structure. This ontology is not designed to support reasoning about authority and governance models. There are Enterprise Ontologies that explicitly model authority, accountability and empowerment flows and it would be possible to create a generic one which bolted alongside org but org is not such a beast :) I suspect I may have mis-understood the subset of problems you're trying to solve. A statement such as the above in the ontology document might save others making the same mistake. Perhaps the scope is organizations in which there is some ideal that all participants can share a common explicit understanding of (the basics of) how things work - who does roughly what, and what the main aggregations of activity are. Companies, clubs, societies, public sector bodies etc. Sure there will be old-boy networks, secret handshakes and all kinds of undocumented channels, but those are understood as routing-around the main transparent shared picture of how the organization works (or should work).
Re: Organization ontology
On Thu, Jun 3, 2010 at 3:07 PM, William Waites william.wai...@okfn.org wrote: On 10-06-03 09:01, Dan Brickley wrote: I don't find anything particularly troublesome about the org: vocab on this front. If you really want to critique culturally-loaded ontologies, I'd go find one that declares class hierarchies with terms like 'Terrorist' without giving any operational definitions... I must admit when I looked at the org vocabulary I had a feeling that there were some assumptions buried in it but discarded a couple of draft emails trying to articulate it. I think it stems from org:FormalOrganization being a thing that is legally recognized and org:OrganizationalUnit (btw, any particular reason for using the North American spelling here?) Re spelling - fair question. I think there are good reasons. British spelling accepts both. FOAF, which was made largely in Bristol UK but with international participants, has used 'Z' spelling for nearly a decade, http://xmlns.com/foaf/spec/#term_Organization ... as far as I know without any complaints. I'm really happy to see this detailed work happen and hope to nudge FOAF a little too, perhaps finding a common form of words to define the shared general Org class. It would be pretty unfortunate to have foaf:Organization and org:Organisation; much worse imho than the camel-case vs underscore differences that show up within and between vocabularies. Z seems the pragmatic choice. I don't know much about English usage outside the UK and the northern Americas, but I find 'z' is generally accepted in the UK, whereas in the US, 's' is seen as a mistake. 
This seems supported by whoever wrote this bit of wikipedia, http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#-ise.2C_-ize_.28-isation.2C_-ization.29 American spelling accepts only -ize endings in most cases, such as organize, realize, and recognize.[53] British usage accepts both -ize and -ise (organize/organise, realize/realise, recognize/recognise).[53] British English using -ize is known as Oxford spelling, and is used in publications of the Oxford University Press, most notably the Oxford English Dictionary, as well as other authoritative British sources. being an entity that is not recognised outside of the FormalOrg Organisations can become recognised in some circumstances despite never having solicited outside recognition from a state -- this might happen in a court proceeding after some collective wrongdoing. Conversely you might have something that can behave like a kind of organisation, e.g. a class in a class-action lawsuit without the internal structure present in most organisations. Yes. In FOAF we have a class foaf:Project but it is not quite clear how best to characteri[sz]e it. In purely FOAF oriented scenarios, I believe it is hardly ever used (although humm stats below seem to contradict that). However, the pretty successful DOAP project ('description of a project') has made extensive use of a subclass, doap:Project in describing open source collaborative projects. These have something of the character of an organization, but are usually on the bazaar end of the cathedral/bazaar spectrum. Are some but not all projects also organizations? etc.
discuss :) See also http://xmlns.com/foaf/spec/#term_Project http://trac.usefulinc.com/doap http://sindice.com/search?q=foaf:project+qt=term Search results for terms “foaf:project ”, found about 13.0 thousand (sindice seems to require downcasing for some reason) http://sindice.com/search?q=doap:project+qt=term Search results for terms “doap:project ”, found about 8.41 thousand (I haven't time to dig into those results, probably the queries could be tuned better to filter out some misleading matches) Is a state an Organisation? It would be great to link if possible to FAO's Geopolitical ontology here, see http://en.wikipedia.org/wiki/Geopolitical_ontology ... this for example has a model for groupings that geo-political entities belong to (I'm handwaving a bit here on the detail). It also has a class Organization btw, as well as extensive mappings to different coding systems. Organisational units can often be semi-autonomous (e.g. legally recognised) subsidiaries of a parent or holding company. What about quangos or crown-corporations (e.g. corporations owned by the state). They have legal recognition but are really like subsidiaries or units. As an aside, I would like to have a way of representing boards of directors, to update the old (theyrule-derived) FOAFCorp data and schema. Ancient page here: http://rdfweb.org/foafcorp/intro.html schema http://xmlns.com/foaf/corp/ Some types of legally recognised organisations don't have a distinct legal personality, e.g. a partnership or unincorporated association so they cannot be said to have rights and responsibilities, rather the members have joint (or joint and several) rights and responsibilities. This may seem like splitting hairs but from
Re: UK Govt RDF Data Sets
On Sun, Apr 25, 2010 at 8:02 PM, Kingsley Idehen kide...@openlinksw.com wrote: Jeni Tennison wrote: Kingsley, On 15 Apr 2010, at 23:19, Kingsley Idehen wrote: Do you have any idea as to the whereabouts of RDF data sets for the SPARQL endpoints associated with data.gov.uk? [...] One thing I haven't been able to reconcile (in my head repeatedly) re. the above. If data provenance is the key concern behind the RDF dump releases, doesn't the same issue apply to CONSTRUCTs or DESCRIBE style crawls against the published endpoints? Basically, the very pattern exhibited by some user agents that hit the DBpedia endpoint (as per the DBpedia Endpoint Burden post). What makes a SPARQL endpoint safer than an RDF dump in this regard? For what it's worth, I've encountered very similar attitudes over the years in other environments. A good example is the digital library world; both regarding access to digital collections and online access to OPAC data, it was quite common to see Z39.50 search protocol access to the full collection, but accompanied by a rather cautious reluctance to also offer a simple data dump of the entire thing. Pointing out that you could do this via repeated Z39.50 searches was rarely helpful, and seemed more likely to encourage the search interface to be restricted than for data dumps to be made available. But hey, times are changing! I think it's just a matter of time... cheers, Dan
Fwd: backronym proposal: Universal Resource Linker
So - I'm serious. The term 'URI' has never really worked as something most Web users encounter and understand. For RDF, SemWeb and linked data efforts, this is a problem as our data model is built around URIs. If 'URL' can be brought back from limbo as a credible technical term, and rebranded around the concept of 'linkage', I think it'll go a long way towards explaining what we're up to with RDF. Thoughts? Dan -- Forwarded message -- From: Dan Brickley dan...@danbri.org Date: Sun, Apr 18, 2010 at 11:52 AM Subject: backronym proposal: Universal Resource Linker To: u...@w3.org Cc: Tim Berners-Lee ti...@w3.org I'll keep this short. The official term for Web identifiers, URI, isn't widely known or understood. The I18N-friendly variant IRI confuses many (are we all supposed to migrate to use it; or just in our specs?), while the most widely used, understood and (for many) easiest to pronounce, 'URL' (for Uniform Resource Locator) has been relegated to 'archaic form' status. At the slightest provocation this community disappears down the rathole of URI-versus-URN, and until this all settles down we are left with an uncomfortable disconnect between how those in-the-know talk about Web identifiers, and those many others who merely use them. As of yesterday, I've been asked but what is a URI? one too many times. I propose a simple-minded fix: restore 'URL' as the most general term for Web identifiers, and re-interpret 'URL' as Universal Resource Linker. Most people won't care, but if they investigate, they'll find out about the re-naming. This approach avoids URN vs URI kinds of distinction, scores 2 out of 3 for use of intelligible words, and is equally appropriate to classic browser/HTML, SemWeb and other technical uses. What's not to like? The Web is all about links, and urls are how we make them... cheers, Dan
Re: Fwd: backronym proposal: Universal Resource Linker
On Sun, Apr 18, 2010 at 3:42 PM, Nathan nat...@webr3.org wrote: Wonder what would happen if we just called them Links? I think that would confuse people. And would put stress just on the point where SemWeb and HTML notions of link diverge. An HTML page can have two (hyper-)links, <a href="/contactus/">contact us</a> in the header, and <a href="/contactus/">contacts</a> in the footer. Each of those chunks of markup is what we informally call a link; the relative URI reference inside the href attribute in both cases is what makes it possible for the link to be useful. I'm saying that http://example.com/contactus/ should be called a 'universal resource linker' instead of 'uniform resource locator'. Using 'universal resource link' for that instead has a different grammatical role and could confuse, since the page has two links (the bits that go blue in your browser usually), but they both point to the same URI/URL. Seems to be pretty unambiguous; if I say Link to TimBL or my Mum they both know what I mean, and it appears to produce the desired mental picture when used. There are two usages at least with link; 'pass me the link' versus 'click on the link'; the latter emphasises the occurrence as being the link. Link, short for HyperLink - Link as in Linked Data. Keep the URI/URL/IRI for those who need to know the exact syntax of a Link. So when the RDF perspective comes in, so do subtly different notions of link. This is why I think framing 'link' as a countable thing will lead to confusion. RDF links are a bit like relationships; so <a href="http://bob.example.com/" rel="xfn:coworker xfn:buddy">Bob</a> is a link expressing two relationships, er, links. If you poke too hard at the magic word link it kinda crumbles a bit. But it remains incredibly evocative and at the heart of both the Web and the SemWeb. Linker is non-committal enough that it allows a family of related readings; where the markup describes a pre-existing link/relationship (eg.
co-worker), and where the markup itself is the link we're interested in. If you check back to TimBL's original diagram in http://www.w3.org/History/1989/proposal.html the different flavours of 'link' were in there from the start; 'wrote' and 'refers to' for example; the former links a person to a document; the latter connects documents. So the linking story here is that identifiers for people and documents can share a notation, and become linkable. What exactly a link is, on the other hand, I think will always be a little bit slippery. cheers, Dan
Re: backronym proposal: Universal Resource Linker
On Sun, Apr 18, 2010 at 7:40 PM, Ian Davis m...@iandavis.com wrote: When talking to people who aren't semweb engineers then I use URL/URI/link interchangeably. I don't think it matters because the 1% that care will look it all up and get the distinction and the rest will just get on and use RDF as shown. Yeah, I find myself slipping between the two in the same sentence sometimes, even written or spoken. I don't think it really super matters which we use, but the confusion is costly and pointless. At the Augmented Reality Dev Camp here in Amsterdam yesterday, one of the comments was http://twitter.com/garciacity/status/12339906312 So what is an URI? mentioned by steven pemberton and hans overbeek #ardevcamp This is a perfectly reasonable question from an educated and technical audience member, and a perfectly avoidable one. I mean no disrespect to either of the fine speakers, or the audience member; the mess is not of their making. RDFa and Linked Data were presented to a mixed audience, some coders, some artists, game designers, augmented reality, mapping folk etc... a real big mix; and I think it went over well, but this silly issue of URI/URL is a bug worth fixing. We should be able to say URL unapologetically, correctly and without fear of contradiction. It's a fine acronym; it just has the wrong expansion. Easily fixed, since most people (as you say) won't even bother to look it up. My suggestion is that we flip things upside down. Too often URL comes across as being a kind of double-taboo (it's the old, incorrect name and it's (to URN-advocates) the crappy, lower quality form of linking, prone to breakage, 404 etc). People who use URL often do it in a sort of self-deprecating way; they know they should probably say URI or perhaps IRI; or maybe they really mean URI Reference or is that IRI Reference to be really inclusive and modern? [And are they called URI schemes now, or IRI schemes? I truly have no idea.]
So let's pull URL out from the bottom of the pile, reinstate it at the top, and rework the acronym to remove the most troublesome part: Locator. By flipping that to something link-centric, we re-emphasise the core value of the Web, and turn the conversation away from pointless ratholes like names/IDs vs addresses/locations to something potentially *much* more productive: different types of URL-based linking. * the whole mess around 'UR*' makes it hard for even technically aware observers to talk clearly * we don't have an actively used top term in the tech scene for all those identifying strings (URIs, URI Refs, IRIs, IRI Refs) * the deprecated nature of 'URL' means we don't reward people for using it; we make them feel dumber instead of smarter. We say 'URL? yeah kinda, you probably really ought to say URI but don't worry, you nearly got it' instead of 'Yeah, URLs - universal resource linkers - it's all about linking; if you understand URLs you understand the core idea behind the Web (and the Web of data, ... and the Web of things, ...)' There was a fuss a while back when the HTML5 spec was using URL instead of URI; however that was without the proposed reconceptualisation here. I'd hate to stir up a fuss, but I think we have a lot of interesting ingredients: * the term 'URL' isn't being used in a technical sense currently - I consider it available for careful redeployment * many of us are already using it informally as an overarching umbrella term ('cos we know it works) * it has massive market-presence and is understood pretty well by the public * we really badly need an umbrella term that hides the URI vs IRI vs *RI-Ref distinction from normal humans * 'universal resource linker' is loose and evocative enough to do the job, and makes people feel smarter not dumber... cheers, Dan
Re: UK Govt RDF Data Sets
On Fri, Apr 16, 2010 at 12:53 AM, Ian Davis li...@iandavis.com wrote: Kingsley, You should address your question directly to the project organisers, we're a technology provider and host some of the data but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Re their provenance requirements, do you know if the right people are already engaged with the W3C Incubator on this topic; see various links fwd'd in http://lists.foaf-project.org/pipermail/foaf-dev/2010-April/010164.html It would be very interesting if someone were prepared to digitally sign the files; or at least to publish checksums on a trusted Web page in RDFa. Lots of options that could be explored. BTW I get the impression that similar concerns can be found in the library community too, when publishing SKOS and wanting to make sure that extensions and addons mixed into the data later are not mis-attributed to the original source. cheers, Dan
Re: twitter's annotation and metadata
+cc: Ed Summers On Fri, Apr 16, 2010 at 11:37 AM, Chris Sizemore chris.sizem...@bbc.co.uk wrote: the main problem is gonna be the cognitive dissonance over whether a tweet is an information or non-information resource and how many URIs are needed to fully rep a tweet... so, who's gonna volunteer to publish the linked data version of Twitter data, a la db/wiki[pedia] ... Based on http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/ it looks like the Library of Congress might be taking on that job. And on the strength of the LCSH RDF work, it might even be feasible... Dan
Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 11:50 PM, Daniel Koller dakol...@googlemail.com wrote: Dan, ...I just setup some torrent files containing the current english and german dbpedia content: (.. as a test/proof of concept, was just curious to see how fast a network effect via p2p networks). To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html. I presume to get it working you need just the first people downloading (and keep spreading it around w/ their Torrent-Clients)... as long as the *.torrent-files are consistent. (layout of the link page courtesy of the dbpedia-people) Thanks! OK, let's see if my laptop has enough disk space left ;) could you post an 'ls -l' too, so we have an idea of the file sizes? Transmission.app on OSX says Downloading from 1 of 1 peers now (for a few of them), and from 0 of 0 peers for others. Perhaps you have some limits/queue in place? Now this is where my grip on the protocol is weak --- I'm behind NAT currently, and I forget how this works - can other peers find my machine via your public seeder? I'll try this on an ubuntu box too. Would be nice if someone could join with a single simple script... cheers, Dan I was working my way down the list in http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html although when I got to Raw Infobox Property Definitions the first two links 404'd...
Re: DBpedia hosting burden
On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: When you use the term: SPARQL Mirror (note: Leigh's comments yesterday re. not orienting towards this), you open up a different set of issues. I don't want to revisit SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's SPARQL extensions are integral part of what makes the DBpedia SPARQL endpoint viable, amongst other things. Having the same dataset available via different implementations of SPARQL can only be healthy. If certain extensions are necessary, this will only highlight their importance. If there are public services offering SPARQL-based access to the DBpedia datasets (or subsets) out there on the Web, it would be rather useful if we could have them linked from a single easy to find page, along with information about any restrictions, quirks, subsetting, or value-adding features special to that service. I suggest using a section in http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to handle that on dbpedia.org. The burden issue is basically veering away from the key points, which are: 1. Use the DBpedia instance properly 2. When the instance enforces restrictions, understand that this is a Virtuoso *feature* not a bug or server shortcoming. Yes, the showcase implementation needs to be used properly if it is going to survive the increasing developer attention LOD is getting. It is perfectly reasonable of you to make clear, when there are limits, that they are for everyone's benefit. Beyond the dbpedia.org instance, there are other locations for: 1. Data Sets 2. SPARQL endpoints (like yours and a few others, where functionality mirroring isn't an expectation). Is there a list somewhere of related SPARQL endpoints? (also other Wikipedia-derived datasets in RDF) Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies, Cache directives, and some 303 heuristics etc. are the real issues of interest.
(am chatting with Daniel Koller in Skype now re the BitTorrent experiments...) Note: I can send wild SPARQL CONSTRUCTs, DESCRIBES, and HTTP GETs for Resource Descriptors to a zillion mirrors (maybe next year's April Fool's joke re. beauty of Linked Data crawling) and it will only broaden the scope of my dysfunctional behavior. The behavior itself has to be handled (one or a zillion mirrors). Sure. But on balance, more mirrors rather than fewer should benefit everyone, particularly if 'good behaviour' is documented and enforced... Anyway, we will publish our guide for working with DBpedia very soon. I believe this will add immense clarity to this matter. Great! cheers, Dan
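To make the 'list of related SPARQL endpoints' idea concrete: any endpoint speaking the standard SPARQL protocol can be queried with a plain HTTP GET carrying a query parameter. A minimal sketch, assuming the dbpedia.org endpoint URL and a made-up example query (no network call is made here):

```python
from urllib.parse import urlencode

# Illustrative endpoint; any SPARQL-protocol endpoint is queried the same way.
endpoint = "http://dbpedia.org/sparql"

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Amsterdam> rdfs:label ?label .
} LIMIT 5
"""

# The SPARQL protocol sends the query text as a 'query' parameter on a GET
# request; many endpoints also accept a hint for the results serialization.
url = endpoint + "?" + urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})
# 'url' can now be fetched with urllib.request.urlopen(url), curl, etc.
```

Documenting per-endpoint quirks (extensions, result limits, subsets loaded) alongside that base URL would be exactly the kind of thing the single easy-to-find page could carry.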
Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 8:11 PM, Kingsley Idehen kide...@openlinksw.com wrote: Some have cleaned up their act for sure. Problem is, there are others doing the same thing, who then complain about the instance in very generic fashion. They're lucky it exists at all. I'd refer them to this Louis CK sketch - http://videosift.com/video/Louie-CK-on-Conan-Oct-1st-2008?fromdupe=We-live-in-an-amazing-amazing-world-and-we-complain (if it stays online...). While it is a shame to say 'no' to people trying to use linked data, this would be more saying 'yes, but not like that...'. I think we have an outstanding blog post / technical note about the DBpedia instance that hasn't been published (possibly due to the 3.5 and DBpedia-Live work we are doing), said note will cover how to work with the instance etc.. [..] We do have a solution in mind, basically, we are going to have a different place for the descriptor resources and redirect crawlers there via 303's etc.. [...] We'll get the guide out. That sounds useful. As you mention, DBpedia is an important and central resource, thanks both to the work of the Wikipedia community, and those in the DBpedia project who enrich and make available all that information. It's therefore important that the SemWeb / Linked Data community takes care to remember that these things don't come for free, that bills need paying and that de-referencing is a privilege not a right. 'Bills' is the major operative word in a world where the Bill Payer and Database Maintainer is a footnote (at best) re. perception of what constitutes the DBpedia Project. Yes, I'm sure some are thoughtless and take it for granted; but also that others are well aware of the burdens. (For that matter, I'm not myself so sure how Wikipedia covers its costs or what their longer-term plan is...). For us, the most important thing is perspective.
DBpedia is another space on a public network, thus it can't magically rewrite the underlying physics of wide area networking where access is open to the world. Thus, we can make a note about proper behavior and explain how we protect the instance such that everyone has a chance of using it (rather than a select few resource guzzlers). This I think is something others can help with, when presenting LOD and related concepts: to encourage good habits that spread the cost of keeping this great dataset globally available. So all those making slides, tutorials, blog posts or software tools have a role to play here. Are there any scenarios around eg. BitTorrent that could be explored? What if each of the static files in http://dbpedia.org/sitemap.xml were available as torrents (or magnet: URIs)? When we set up the Descriptor Resource host, these would certainly be considered. Ok, let's take care to explore that then; it would probably help others too. There must be dozens of companies and research organizations who could put some bandwidth resources into this, if only there was a short guide to setting up a GUI-less bittorrent tool and configuring it appropriately. Are there any bittorrent experts on these mailing lists who could suggest next practical steps here (not necessarily dbpedia-specific)? (ah I see a reply from Ivan; copying it in here...) If I were The Emperor of LOD I'd ask all grand dukes of datasources to put fresh dumps at some torrent with control of UL/DL ratio :) For reasons I can't understand this idea is proposed a few times per year but never tried. I suspect BitTorrent is in some ways 'taboo' technology, since it is most famous for being used to distribute materials that copyright-owners often don't want distributed. I have no detailed idea how torrent files are made, how trackers work, etc. I started poking around magnet: a bit recently but haven't got a sense for how solid that work is yet.
Could a simple Wiki page be used for sharing torrents? (plus published hash of files elsewhere for integrity checks). What would it take to get started? Perhaps if http://wiki.dbpedia.org/Downloads35 had the sha1 for each download published (rdfa?), then others could experiment with torrents and downloaders could cross-check against an authoritative description of the file from dbpedia? I realise that would only address part of the problem/cost, but it's a widely used technology for distributing large files; can we bend it to our needs? Also, we encourage use of gzip over HTTP :-) Are there any RDF toolkits in need of a patch to their default setup in this regard? Tutorials that need fixing, etc? cheers, Dan ps. re big datasets, Library of Congress apparently are going to have complete twitter archive - see http://twitter.com/librarycongress/status/12172217971 - http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/
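On the checksum idea above: publishing a digest per dump file is trivial for a publisher, and lets torrent downloaders cross-check what they received against an authoritative description. A minimal sketch in Python (the filename in the usage comment is hypothetical; SHA-1 is used only because that's the digest suggested above):

```python
import hashlib

def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 hex digest of a dump file, reading in chunks
    so multi-gigabyte dumps never need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. print(file_sha1("dbpedia_infobox_properties.nt.bz2"))  # hypothetical file
```

The publisher would post the resulting hex string next to each download link (in RDFa or plain HTML); a downloader runs the same function over the file fetched via torrent and compares.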
XMP RDF extractors?
On Tue, Apr 13, 2010 at 3:56 PM, Leigh Dodds leigh.do...@talis.com wrote: Hi, Yes. PDF: http://patterns.dataincubator.org/book/linked-data-patterns.pdf EPUB: http://patterns.dataincubator.org/book/linked-data-patterns.epub Something of a tangent but this reminds me, what's the latest on RDF extractors for Adobe XMP? I always used to use 'strings' and a regex but I haven't tracked the spec and have found this trick working *less* well over time, not better.

strings linked-data-patterns.pdf | grep -i xmp

<?xpacket id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
    <xmp:CreateDate>2010-04-12T23:01:36+01:00</xmp:CreateDate>
  </rdf:Description>
</x:xmpmeta>
<?xpacket end="r"?>

By contrast, downloading the .epub file and unzipping you find this in content.opf:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" id="bookid">_id2880071</dc:identifier>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Linked Data Patterns</dc:title>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Dodds, Leigh">Leigh Dodds</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Davis, Ian">Ian Davis</dc:creator>
    <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This book lives at http://patterns.dataincubator.org. Check that website for the latest version. This work is licenced under the Creative Commons Attribution 2.0 UK: England & Wales License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/2.0/uk/. Thanks to members of the Linked Data mailing list for their feedback and input, and Sean Hannan for contributing some CSS to style the online book.</dc:description>
    <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
  </metadata>
  <manifest>
    <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="htmltoc" media-type="application/xhtml+xml" href="bk01-toc.html"/>
    <item id="id2880071" href="index.html" media-type="application/xhtml+xml"/>

Wouldn't it be nice if there were easy conventions for books about RDF to have Webby linked RDF bundled in the files? Both seem nearly there but not quite... (this not a complaint Leigh, I love this work btw!) cheers, Dan ps. re epub see also http://lists.w3.org/Archives/Public/public-lod/2010Jan/0121.html
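The 'strings plus a regex' trick can be done slightly more robustly by scanning the raw bytes for the xpacket processing instructions that frame each XMP packet. A rough sketch (my own approximation of the packet framing, not a substitute for a real XMP parser; real packets typically carry a begin= attribute in the opening PI):

```python
import re

# XMP packets are framed by <?xpacket begin=...?> ... <?xpacket end=...?>
# processing instructions embedded directly in the file's raw bytes.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=[^>]*\?>(.*?)<\?xpacket end=[^>]*\?>",
    re.DOTALL,
)

def extract_xmp(data: bytes) -> list[str]:
    """Return decoded XMP payloads found anywhere in raw file bytes."""
    return [
        m.group(1).decode("utf-8", errors="replace")
        for m in XMP_PACKET.finditer(data)
    ]

# e.g. packets = extract_xmp(open("linked-data-patterns.pdf", "rb").read())
```

This at least survives the payload being split across binary junk before and after it, though it will still miss packets in compressed PDF streams, which may be why the trick works less well over time.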
Re: XMP RDF extractors?
On Tue, Apr 13, 2010 at 6:31 PM, Pierre-Antoine Champin swlists-040...@champin.net wrote: Even more tangent, but when I read in detail the XMP spec last year (in relation to the Media Annotation WG), I came to two conclusions: - XMP specifies RDF at the level of the XML serialization, which is *ugly* (emphasis on *ugly*). Furthermore, it makes it unsafe to use standard RDF/XML serializers, as those may not enforce those syntactic constraints. - XMP interprets RDF/XML in a non-standard way, considering the two following tags as non equivalent: <ns1:bar xmlns:ns1="http://example.com/foo">... <ns2:foobar xmlns:ns2="http://example.com/">... (which is again, a syntax-only perspective). So it is not safe to use standard RDF/XML parsers, as they will produce a model which may be inconsistent with other XMP parsers. So you can neither use standard serializers nor standard parsers to handle XMP's RDF safely; as far as I'm concerned, XMP is not really RDF -- and Dan's problems to extract it strengthen this opinion of mine... That being said, the risks of inconsistency are minimal, especially for parsing. So I guess there is some value in pretending XMP is RDF ;) and using an RDF parser to extract it... I think we can and should be generous to Adobe here; they have been supportive of RDF since the late '90s - eg. Walter Chang's work on UML and RDF http://www.w3.org/TR/NOTE-rdf-uml/ - and committing to something that is embedded within files that will mostly *never* be re-generated (PDFs, JPEGs etc in the wild) makes for naturally conservative design. There are probably many kinds of improvement they could make, but being back-compatible with the large bulk of deployed XMP must be a major concern. Pushing out revisions to tools on the scale of Photoshop etc isn't easy, especially when the new stuff will also have to read/write properly in older deployed tools for unknown years to come.
That said I think we would do well to look around more actively at what's out there via XMP, and see how it hangs together when re-aggregated into a common SPARQL environment. In particular XMP pre-dates SKOS, and I imagine many of the environments where XMP matters would benefit from the kinds of integration SKOS can bring. So I'd love to see some exploration of that... cheers, Dan
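As a footnote to Pierre-Antoine's parsing point: the equivalence that standard RDF/XML sees (and a syntax-level XMP processor reportedly doesn't) is just namespace URI plus local name concatenation, which a few lines make concrete (the URIs are the example ones from his message):

```python
def rdfxml_property_uri(namespace_uri: str, local_name: str) -> str:
    # In standard RDF/XML, an element's property URI is the namespace URI
    # concatenated with the element's local name; prefixes are irrelevant.
    return namespace_uri + local_name

# Two syntactically different spellings of the same property:
a = rdfxml_property_uri("http://example.com/foo", "bar")   # ns1:bar
b = rdfxml_property_uri("http://example.com/", "foobar")   # ns2:foobar
# a and b are both "http://example.com/foobar": identical to an RDF
# parser, but distinct to a processor comparing at the tag level.
```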
Re: KIT releases 14 billion triples to the Linked Open Data cloud
But I love it :) Do the numbers include dates? Dan On Thu, Apr 1, 2010 at 12:30 PM, Matthias Samwald samw...@gmx.at wrote: Hi Denny, I am sorry, but I have to voice some criticism of this project. Over the past two years, I have become increasingly wary of the excitement over large numbers of triples in the LOD community. Large numbers of triples don't necessarily mean that a dataset enables us to do anything novel or significantly useful. I think there should be a shift from focusing on quantity to focusing on quality and usefulness. Now the project you describe seems to be well-made, but it also exemplifies this problem to a degree that I have not seen before. You basically published a huge dataset of numbers, for the sake of producing a large number of triples. Your announcement mainly emphasizes how huge the dataset is, and the corresponding paper does the same. The paper gives a few application scenarios; I quote: The added value of the paradigm shift initiated by our work cannot be underestimated. By endowing numbers with an own identity, the linked open data cloud will become a treasure trove for a variety of disciplines. By using elaborate data mining techniques, groundbreaking insights about deep mathematical correspondences can be obtained. As an example, using our sample dataset, we were able to discover that there are significantly more odd primes than even ones, and even more excitingly a number contains 2 as a prime factor exactly if its successor does not. I am sorry, but this sounds a bit overenthusiastic. I see no paradigm shift, and I also don't see why your findings about prime numbers required you to publish the dataset as linked data. I also have trouble seeing the practical value of looking at the resource pages for each number with a linked data browser, but I am also not a mathematician.
I am sorry for being a bit antagonistic, but we as a community should really try not to be seduced too easily by publishing ever-larger numbers of triples. Cheers, Matthias Samwald -- From: Denny Vrandecic denny.vrande...@kit.edu Sent: Thursday, April 01, 2010 12:01 PM To: public-lod@w3.org Subject: KIT releases 14 billion triples to the Linked Open Data cloud We are happy to announce that the Institute AIFB at the KIT is releasing the biggest dataset until now to the Linked Open Data cloud. The Linked Open Numbers project offers billions of facts about natural numbers, all readily available as Linked Data. Our accompanying peer-reviewed paper [1] gives further details on the background and implementation. We have integrated with external data sources (linking DBpedia to all their 335 number entities) and also directly link to the best-known linked open data browsers from the page. You can visit the Linked Open Numbers project at: http://km.aifb.kit.edu/projects/numbers/ Or point your linked open data browser directly at: http://km.aifb.kit.edu/projects/numbers/n1 We are happy to have increased the amount of triples on the Web by more than 14 billion triples, roughly 87.5% of the size of linked data web before this release (see paper for details). We hope that the data set will find its serendipitous use. The data set and the publication mechanism was checked pedantically, and we expect no errors in the triples. If you do find some, please let us know. We intend to be compatible with all major linked open data publication standards. About the AIFB The Institute AIFB (Applied Informatics and Formal Description Methods) at KIT is one of the world-leading institutions in Semantic Web technology. 
Approximately 20 researchers of the knowledge management research group are establishing theoretical results and scalable implementations for the field, closely collaborating with the sister institute KSRI (Karlsruhe Service Research Institute), the start-up company ontoprise GmbH, and the Knowledge Management group at the FZI Research Center for Information Technologies. Particular emphasis is given to areas such as logical foundations, Semantic Web mining, ontology creation engineering and management, RDF data management, semantic web search, and the implementation of interfaces and tools. The institute is involved in many industry-university co-operations, both on a European and a national level, including a number of intelligent Web systems case studies. Website: http://www.aifb.kit.edu About KIT The Karlsruhe Institute of Technology (KIT) is the merger of the former Universität Karlsruhe (TH) and the former Forschungszentrum Karlsruhe. With about 8000 employees and an annual budget of 700 million Euros, KIT is the largest technical research institution within Germany. KIT is both, a state university with research and teaching and, at the same time, a large-scale research institution of the Helmholtz Association. KIT has a strong reputation as
Re: KIT releases 14 billion triples to the Linked Open Data cloud
On Thu, Apr 1, 2010 at 6:25 PM, Martin Hepp (UniBW) martin.h...@ebusiness-unibw.org wrote: Hi Denny: Without spoiling your All Fools' Day joke: I think it is a dangerous one, because there is obviously a true core in the expected criticism. I think that without any need, you give outsiders additional ammunition to confirm other outsiders' prejudices against the value of linked data. I bet you will find lots of triples in the current LOD cloud that have information value close to the triples in your experiment. And many people communicating over the potential of the Web of Linked Data, and maybe deciding about business investments, will not see the joke in your page. On the contrary, I think it was both funny and healthy for the semweb community. My thought process when I carelessly saw the original blurb go past was as follows: * oh dear, more overblown hype for some semweb thing, that's not good * oh, it's quite stupid in fact * ah it's Denny, and I like everything he makes ... and ah yeah 2010-04-01, phew. The fact that I was even for a second prepared to entertain the idea that this was serious, worries me. And clearly a few others on the list went further before realising. Which is why I say this was a healthy exercise. If we as a community are so used to over-hyped folly that we could consider that this might have been a serious offering, then we ought to take more care of our habits during the other 364 days of the year. If I hadn't seen Denny's name against the project or actually read the paper, I'd probably have been taken in too... If we can't laugh at ourselves, we'll be ill prepared to deal with criticism. And criticism is healthy for any technology community, but especially one whose ambitions are as large as ours. We are trying to build a global, integrated system for planet-wide sharing of descriptions of all things and their interconnections. Described like that, it sounds like drug-addled idiocy, but that's what we're doing.
And the only way we'll manage it is if we do it in good humour. This means acting gracefully when fans of other technologies offer criticism, whether or not they are gentle in their words. And it means taking care to balance enthusiasm for the potential of this technology with a realisation that there's still a long way to go in making these tools and techniques a joy for non-enthusiasts to use... cheers, Dan
Re: Should dbpedia have stuff in that is not from wikipedia - was: Re: A URI(Web ID) for the semantic web community as a foaf:Group
[snip] Couple of almost-independent points - Re DBpedia, I share a concern that the Wikipedia turned into a database product remain fairly clearly defined, even though the RDFization naturally includes a bit of creativity. However even that has subtleties - there are the different language variants for example, plus outlying members of the Wikipedia family (wiktionary etc.). However I think we as a community should be prepared for an interesting trend, hopefully one that'll move faster with things like openid and RDF helping: I believe Wiki federation and cross-referencing will become a major trend over next few years. The stress and trauma that the Wikipedia community are currently feeling re scoping, ie. the Deletionism debate - http://meta.wikimedia.org/wiki/Deletionism - can only really be resolved by accepting that we'll have a Web of useful and overlapping wikis, treating various topics in more or less detail. Using common URIs (grounded in the central Wikipedia) makes this possible. And this means - by combining dbpedia's extraction technology, or the Semantic MediaWiki addons - that we can expect a lot more RDF data from other wikis over the coming years. It wouldn't be unreasonable for the DBpedia project to offer some aggregate of all this, if they chose to... Also re SWIG, considered as an entity in the W3C world and as a larger vaguer community. Some W3C Interest Groups have enumerated memberships; traditionally RDF IG and its successor, this SemWeb IG, didn't. There is no master list, just a collection of SWIG-related mailing lists and other channels. I wonder sometimes about changing that, so we had a stronger sense of who the members of W3C SWIG actually are (ie. who has committed to the group's charter; also db-backed profile pages at w3.org, etc.). There are also data sources like the mail archives and #swig IRC logs (see http://swig.xmlhack.com/), Twitter/Identi.ca etc that offer some sense of who the active members of the community are.
Also I made some experiments in http://danbri.org/words/2009/10/25/504 with exposing lists of OpenIDs from Wordpress, MediaWiki etc to show who is actively participating at some site. I think this evidence-driven approach is a stronger way of defining a network of overlapping foaf:Group descriptions, rather than having a single central list. I might for example want to see who was on the www-rdf-logic or www-rdf-rules lists and via their microblog posts, which amongst them were in the Netherlands. Or find microblog posts from the people who are actively contributing to the FOAF or ESW wikis. There are lots of overlapping communities; being 'in the Semantic Web community' isn't a simple boolean flag. So I'd rather surface the underlying data and allow people to compose views into it that suit particular use cases - find me things bookmarked by ontologists; what have members of public-lod been saying on Twitter this week?; find me DOAP descriptions of software associated with members of the #swig IRC channel, conferences with 2 or more editors of W3C SemWeb specs on the steering committee, etc etc... To relate these two points, I have started documenting bits of SemWeb history in the FOAF Wiki, since I really can't be bothered to fight deletionism wars on Wikipedia's main site. For example http://wiki.foaf-project.org/w/MCF describes Meta Content Format (and yep the CSS image right alignment has gone wrong there - help welcomed!). The FOAF wiki has OpenID support, and Semantic Media Wiki installed, so edits can be associated with OpenIDs. I would love to know how best to configure SMW so that we could figure out that http://wiki.foaf-project.org/w/MCF is talking about the same thing as http://en.wikipedia.org/wiki/Meta_Content_Framework so that folk who express their interest in the topic using either URI can be linked. What's the markup to put into the FOAF wiki entry which would express the appropriate sameAs?
Also of note, the FOAF Wiki is currently configured to consume a list of OpenIDs and add them to a MediaWiki trust group, Bureaucrat. http://wiki.foaf-project.org/w/FOAF_Wiki:Bureaucrats ... it currently gets this list just from my blog, ie. anyone who I have trusted enough to comment in my blog, gets added to this group. In future I would like to tune this to use more sources and more subtlety. Getting this kind of trust syndication in place I think will be a big part of helping smaller Wikis flourish, to connect back to the original point... cheers, Dan
Re: SKOS, owl:sameAs and DBpedia
On Wed, Mar 24, 2010 at 4:57 PM, Yves Raimond yves.raim...@gmail.com wrote: Hello! We are in the process of rolling out some links to DBpedia over in BBC Programmes. However, we are facing a small issue. We use our own categorisation scheme based on SKOS, and then want to add some sameAs links to DBpedia. For example, we currently publish the following statements: http://www.bbc.co.uk/programmes/places/france#place a skos:Concept ; a po:Place . And we want to add an extra statement: http://www.bbc.co.uk/programmes/places/france#place owl:sameAs http://dbpedia.org/resource/France. Is that an issue? Should we drop SKOS altogether if we go on with that, or should we use skos:exactMatch instead of owl:sameAs? see also http://wiki.foaf-project.org/w/term_focus I'm running out of excuses for not having added this already... Dan
Re: SKOS, owl:sameAs and DBpedia
On Wed, Mar 24, 2010 at 5:09 PM, Yves Raimond yves.raim...@gmail.com wrote: Is that an issue? Should we drop SKOS altogether if we go on with that, or should we use skos:exactMatch instead of owl:sameAs? see also http://wiki.foaf-project.org/w/term_focus I'm running out of excuses for not having added this already... Great, thanks for the link! However, I'd like to understand why a sameAs would be bad here, I have the intuition it might be, but am really not sure. It looks to me like there's no resource out there that couldn't be a SKOS concept as well (you may want to use anything for categorisation purpose --- the loose categorisation relationship being encoded in the predicate, not the type). If it can't be, then I am beginning to feel slightly uncomfortable about SKOS :-) Because conceptualisations of things as SKOS concept are distinct from the things themselves. If this weren't the case, we couldn't have diverse treatment of common people/places/artifacts in multiple SKOS thesauri, since sameAs merging would mangle the data. SKOS has lots of local administrative info attached to each concept which doesn't make sense when considered to be properties of the thing the concept is a conceptualization of. I am sure this problem must have been looked at before, e.g. within LCSH? Yes, this has been discussed since we brought SKOS into W3C from the SWAD-Europe project ~2004. There is some discussion in this old guide - http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmodellingrdf 'There is a subtle difference between SKOS Core and other RDF applications like FOAF [FOAF], in terms of what they allow you to model. SKOS Core allows you to model a set of concepts (essentially a set of meanings) as an RDF graph. Other RDF applications, such as FOAF, allow you to model things like people, organisations, places etc. as an RDF graph. Technically, SKOS Core introduces a layer of indirection into the modelling.' 
'The above graph describes a relationship between a concept, and the person who is the creator of that concept. This graph should be interpreted as saying, the person named 'Alistair Miles' is the creator of the concept denoted by the URI http://www.example.com/concepts#henry8. This concept was modified on 2005-02-06. This graph should probably not be interpreted as saying, the person named 'Alistair Miles' is the creator of King Henry VIII, or that, King Henry VIII was modified on 2005-02-06. 'This second graph should probably be interpreted as saying, the persons named 'King Henry VII' and 'Elizabeth of York' are the creators of the person named 'King Henry VIII'. So, for a resource of type skos:Concept, any properties of that resource (such as creator, date of modification, source etc.) should be interpreted as properties of a concept, and not as properties of some 'real world thing' that that resource may be a conceptualisation of. This layer of indirection allows thesaurus-like data to be expressed as an RDF graph. The conceptual content of any thesaurus can of course be remodelled as an RDFS/OWL ontology. However, this remodelling work can be a major undertaking, particularly for large and/or informal thesauri. A SKOS Core representation of a thesaurus maps fairly directly onto the original data structures, and can therefore be created without expensive remodelling and analysis. SKOS Core is intended to provide both a stable encoding of thesaurus-like data within the RDF graph formalism, as well as a migration path for exploring the costs and benefits of moving from thesaurus-like to RDFS/OWL-like modelling formalisms.' http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secidentity 'Concept Identity and Mapping The property owl:sameAs should not be used to express the fact that two conceptual resources (i.e. resources of type skos:Concept) share the same meaning. 
The property owl:sameAs implies that two resources are identical in every way (they are in fact the same resource). Although two conceptual resources may have the same meaning, they may have different owners, different labels, different documentation, different history, and of course a different future.' Hope this helps, Dan
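To make the contrast concrete, a sketch in Turtle (the BBC and DBpedia URIs are from Yves' example; the dct:modified date is invented for illustration):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://www.bbc.co.uk/programmes/places/france#place>
    a skos:Concept ;
    dct:modified "2010-03-24" ;
    # Safe: asserts only that the two concepts share a meaning.
    skos:exactMatch <http://dbpedia.org/resource/France> .

# owl:sameAs, by contrast, would make the BBC concept's administrative
# properties (dct:modified, labels, history...) properties of the DBpedia
# resource too, once data from the two sites gets merged:
# <http://www.bbc.co.uk/programmes/places/france#place>
#     owl:sameAs <http://dbpedia.org/resource/France> .
```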
Re: Improving Organization of Govt. based Linked Data Projects
On 21 Mar 2010, at 12:47, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi Kingsley, I am right with you - finding stuff is hard. But I do think we could make it easier for all of us. Just the esw wiki alone requires me to put every set I create into a bunch of places. 10 years ago, looking for RDF on the public Web was like looking for a needle in a haystack. There wasn't much out there and it was poorly linked. So a big part of the thinking that led to the foaf/rdfweb design was to make discovery easier: if you find one rdf doc, you should be able to find most of the rest by following seeAlso and other kinds of links. Why isn't this enough? Perhaps because many of the datasets are huge db exports, crawlers are often overwhelmed and disappear into depth-first holes? Or because we don't publish triples about doc- and dataset-types in a crawler-discoverable way? A wiki page is ok for initial bootstrap but we ought to outgrow that soon... Dan
Re: head/@profile needed in HTML 5? GRDDL in Linked Data community?
On Wed, Feb 24, 2010 at 5:55 PM, Dan Connolly conno...@w3.org wrote: The proposal from the editors and chairs is that it is not needed; i.e. not cost-effective. http://lists.w3.org/Archives/Public/public-html/2010Feb/0794.html Dan B., your message suggests (without actually saying so) that Dublin Core doesn't need it. Have you heard back from the Dublin Core decision-making authorities? http://lists.w3.org/Archives/Public/public-html/2010Jan/0576.html There was a little discussion on the Dublin Core Advisory Board list (not a public forum; sorry, no links). I don't believe we considered explicitly the scenario in which profile= gets lost, but something like RDFa is not permitted for HTML5. Maybe Pete or Tom (cc:'d) can comment further? My personal guess at a DC view would be something like well, if we don't get RDFa, then don't take @profile away!, the assumption being that RDFa would come with some namespace abbreviation mechanism, whether xmlns:-based or otherwise. I doubt the DC community would be satisfied by the current Microdata design in which each use of a DC property would be identified by its full URI. If you like, I can ask explicitly. The microformats community seems happy to explore alternatives. http://lists.w3.org/Archives/Public/public-html/2010Feb/0690.html I'm considering pushing back on the 0794 proposal, but it's only worth my time if somebody actually needs head/@profile to survive into HTML 5. Does anybody need it? That's a little like asking if someone needs the emergency life-raft before telling them whether they get to keep using the boat or not. Without RDFa, DC would have to use it. On a somewhat related topic... as RDFa matures, the need for GRDDL somewhat fades. I wonder, though... to what extent is GRDDL used in the linked data community? What tools consume it? What content providers produce it? I've never used GRDDL, and I don't know of anyone actively using it. That said, there are many things I don't know! 
I have tried to get Redland/Raptor working with it to consume POWDER a couple of times, but with no success. When I think about running GRDDL against wild Web content, I have some vague worry about whether untrusted XSLTs are sufficiently sandboxed, but I haven't investigated the risks very carefully. I remember Bijan raising similar concerns a while back. cheers, Dan See also: The details of data in documents: GRDDL, profiles, and HTML5 By Dan Connolly in HTML, Semantic Web, Web Architecture, XML on August 22, 2008 7:45 PM http://www.w3.org/QA/2008/08/the_details_of_data_in_documen.html -- Dan Connolly, W3C http://www.w3.org/People/Connolly/ gpg D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Re: Colors
On Wed, Feb 24, 2010 at 8:31 AM, Pat Hayes pha...@ihmc.us wrote: Does anyone know of URIs which identify colors? Umbel has the general notion of Color, but I want the actual colors, like, you know, red, white, blue and yellow. I can make up my own, but would rather use some already out there, if they exist. Many thanks for any pointers. How scruffy are you feeling? http://en.wikipedia.org/wiki/List_of_colors suggests you'll find a lot in Wikipedia / dbpedia... Dan
Re: Terminology when talking about Linked Data
On Wed, Feb 17, 2010 at 12:51 PM, Damian Steer d.st...@bristol.ac.uk wrote: Historical aside: On 17/02/10 11:20, Hugh Glaser wrote: More recently I have also badged as Web of Data; See [1], since 1998 :-) It's been used fairly regularly since then, although I'd highlight [2] as a particularly significant use of the term. Damian [1] http://www.w3.org/DesignIssues/Semantic.html [2] http://www.plasticbag.org/archives/2006/02/my_future_of_web_apps_slides/ Yes, any use of the phrase Web of data that excludes or sidelines work like Tom Coates' here ([2]) would be ... regrettable. There have already been unfortunate run-ins in blog land about whether you can do 'linked data' without using RDF in some LOD-approved manner. There is much much more to 'data' than RDF (or OWL, or triples, or W3C SemWeb). The Web's a big place and we have to be inclusive. RDF was originally standardised as a metadata system, a mechanism for finding stuff ... whether that stuff was photos, videos, HTML pages, excel spreadsheets, SQL databases, 3d models. It can also be used to provide summaries or normalisation of some of the information held in those data objects too. But we shouldn't forget the original use case, nor sideline it. Metadata about non-RDF documents is still linked data imho: all of those forms of Web information are 'linked data' if we use W3C information-linking technology to increase their findability. There's more information out there than fits comfortably in triples or quads; some of the best information is still in people's heads, after all. FOAF was always blurbed as an experimental linked information system; we should have been clearer that some of that info was in triples, some in human-oriented documents, and some ... critically ... was still in people's heads. The richness comes from the interplay between those three forms of information. But I guess that's why I still cling nostalgically to the word 'information' here, rather than just 'data'. 
BTW an early and important paper in the 'web of data' line, which tried to bring RDF and XML together as components of a larger ('Semantic Web') story is http://www.w3.org/1999/04/WebData ... it doesn't use the phrase explicitly (except in the url path maybe) but it is clear on the need for an inclusive approach. cheers, Dan
Re: Terminology when talking about Linked Data
On 17 Feb 2010, at 18:14, Pat Hayes pha...@ihmc.us wrote: On Feb 17, 2010, at 6:37 AM, Dan Brickley wrote: ... . RDF was originally standardised as a metadata system, a mechanism for finding stuff ... whether that stuff was photos, videos, HTML pages, excel spreadsheets, SQL databases, 3d models. ... Really? That was not the impression I got when I first got involved with it. In fact, I asked explicitly for clarification, at the first F2F in Sebastopol: is RDF intended to be metadata for Web 'objects', or is it supposed to be a notation for describing **things in general**? And the resounding chorus from the WG was the latter, most definitely not the former. (Which is also what Guha told me right after the very first RDF speclet was first released.) And that is why I designed the semantics based on a logical model theory rather than a computational annotation system. If RDF was supposed to be primarily a mechanism for finding stuff, then we designed it wrong. The original use cases were various flavours of 'metadata'; however that concept melts on closer inspection. We did the right thing by going with a general system; but we did lose touch a bit with some of the original scenarios which motivated W3C to standardise RDF in '97. MCF and RDF were never themselves technologies with a built-in scope of 'describing only data', and that was all fine and good. Whenever you dig into 'metadata' requirements you soon find that the whole world is soon in-scope. The gamble of course with a highly general standard is that it can be used in-principle for *everything* but risks in practice being used for nothing. It took us a while to find that niche... Dan Pat IHMC (850)434 8903 or (650)494 3973 40 South Alcaniz St. (850)202 4416 office Pensacola(850)202 4440 fax FL 32502 (850)291 0667 mobile phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Re: The status of Semantic Web community- perspective from Scopus and Web Of Science (WOS)
On Fri, Feb 12, 2010 at 8:22 PM, Ying Ding dingy...@indiana.edu wrote: Hi, If you are interested to know the Semantic Web: Who is who from the perspective of Scopus and Web Of Science, recently we conduct a bibliometric analysis in this field (http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf), which might be interesting to you. It's interesting to see what a traditional - ie. essentially pre-Web - citation analysis comes up with; however I wouldn't leap so quickly to claim that this results in 'identifying the most productive players'. A lot of key SemWeb infrastructure came about through non-academic collaboration; either industrial or what we might call collaborations conducted online informally, 'Internet-style'. In fact I'd argue that the needs of the academic publication process have often been a retarding factor on this collaborative work. The traditionally-published academic literature is of course a key part of the story, but if you look at it alone you will end up with both a misleading sense of how things got this way, and -worse- misleading intuitions about how to get more involved and help further the project. This is why I bother to make a little fuss here. The phrase 'Semantic Web' from ~2000 was essentially a rebranding of the then-unfashionable RDF technology. Prior to calling it RDF, the project was called PICS-NG. These days many call it 'Linked Data' instead. From http://lists.w3.org/Archives/Public/sw99/ - http://www.w3.org/1999/11/SW/Overview.html (Member-only link) 'We propose to continue the W3C Metadata Activity as a Semantic Web Development Initiative'. But by this point, the base technology was already out there, both as a W3C Recommendation and as something in use: Netscape - the Google of its time - was using RDF already. 
For example back in November 1998 http://web.archive.org/web/19991002043750/www.mailbase.ac.uk/lists/rdf-dev/1998-11/0004.html R.V. Guha, then at Netscape, wrote: 'I still see this as a big and important use of RDF. This server answers over 2 million requests in RDF every day. ... I do plan to fix the RDF, but thats with the next version of the browser (I have about 6M browsers out there which are depending on this older format).' Any narrative that puts the start of Semantic Web history in 2000/2001 will confuse people as to where it came from: we had major browser buy-in 2-3 years previously, after all. And any narrative that omits the role of MCF - simply because it didn't come through the academic publication process - risks misleading 'emerging stars' about how to make an impact on the world rather than just on the citation databases. Netscape bought into RDF because it grew from MCF, acquired from Apple with Guha. A reformulation of MCF to use an XML notation was one of the key inputs into the RDF design; see http://www.w3.org/TR/NOTE-MCF-XML/ and the earlier MCF White Paper http://www.guha.com/mcf/wp.html Now MCF had significant mind-share and presence in the tech world back in 1996 - http://web.archive.org/web/2815212707/http://www.xspace.net/hotsauce/ - and even grassroots adoption on sites that wanted to have a '3d fly thru' using Apple's then-cool visualization plugin. MCF was a direct ancestor to RSS (also originally an RDF-based Netscape product); it was triples-based, written in XML, and quite recognisable as RDF's precursor to anyone who reads the spec. The grassroots, information linking style of MCF was one of the inspirations behind FOAF too. However it did not leave any footprint in the academic literature. We might ask why. Like much of the work around W3C and tech industry standards, the artifacts it left behind don't often show up in the citation databases. A white paper here, a Web-based specification there, ... 
its influence cannot easily be measured through academic citation patterns, despite the fact that without it, the vast majority of papers mentioned in http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf would never have existed. In my experience, many of the discussions that shaped the early RDF and Semantic Web efforts were conducted online, using email, often also IRC chat, and as the years went by, increasingly in blogs and now microblogs. And many of the people who got a lot done were not employed in an academic setting where there was an institutionalised pressure to publish in certain kinds of places. This is not to belittle the critically important contributions that came from those employed in academia, just to note that the wave of interest and research funding that followed 2000/1 served largely to polish and promote ideas (and tools, specs) that had already reached prominence via Internet/Web/industry means. Without that academic buy-in and associated research funding, the Semantic Web project would surely be dead by now. However, there is a continuing danger of confusing the real project --- a global collaboration to improve the Web's information-linking facilities ---
Re: DBpedia-based entity recognition service / tool?
On Tue, Feb 2, 2010 at 4:47 PM, Georgi Kobilarov georgi.kobila...@gmx.de wrote: Hi Matthias, So you're asking for the perfect entity recognition service, applicable to the easy domain of scientific texts? Sure, I developed one in my spare time, it's much better than OpenCalais, I was just too lazy to publish it yet... ;-) Yes please, I'll take two :) Seriously, I think it might be time to look at having common REST APIs for these things, so we have a more fluid marketplace where servers can be swapped and composed. How similar are the existing interfaces? I have no idea... One idea I had on NoTube that is implemented experimentally in http://lupedia.ontotext.com/ is to use RDFa as an interop point. So one of the interfaces from the Ontotext demo there is to return RDFa markup - http://lupedia.ontotext.com/test-page4rdfa.html ... however this doesn't leave much scope for including confidence measures etc in the output. cheers, Dan
Can anyone help with an XSLT GRDDL conversion of Open Packaging Format (OPF) into RDF/XML Dublin Core
Hi all http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#AppendixA defines a Dublin Core-based XML metadata format used for ebooks. This is very nice but a little disconnected from other Dublin Core data in RDF. It would be great to have some XSLT to explore closer integration and use of newer Dublin Core idioms (including http://purl.org/dc/terms/). Anyone got the time / expertise to explore this? A related task would be to track down some actual OPF data to convert. You don't need to be an XSLT guru to do this :) There's a forum at http://www.idpf.org/forums/viewforum.php?f=5&sid=4b4d5b89baf1300bd0f258e0715610e5 with some pointers to data. For example: 'I am pleased to announce that Adobe InDesign CS3 now supports the direct generation of OCF-packaged OPS content. A sample generated directly from InDesign CS3 can be found at: http://www.idpf.org/2007/ops/samples/TwoYearsBeforeTheMast.epub' ...which is a .zip package containing a file content.opf, the beginning of which I'll excerpt below. Thanks for any help exploring this. I found 3 examples in the forum; the metadata sections of the .opf files are extracted below. As we think about RDFizing these, I think there are two aspects: firstly, getting modern RDF triples from the data as-is. This might take some care to figure out what role= should be, etc. But also secondly, thinking how the format could be enriched in future iterations, so that linked data URIs are used, eg. for those LCSH headings. At the moment they have <dc:subject>lcsh: Czech Americans—Fiction.</dc:subject> but it would be nice if http://id.loc.gov/authorities/sh2009122741#concept was in there somewhere (instead, as well?). I'm sure any help working through these practicalities would be appreciated both by the OPF folk and by Dublin Core... cheers, Dan

example 1: http://www.idpf.org/2007/ops/samples/TwoYearsBeforeTheMast.epub

<?xml version="1.1"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Two Years Before the Mast</dc:title>
    <dc:creator>Richard H. Dana Jr.</dc:creator>
    <dc:subject>19th Century</dc:subject>
    <dc:subject>California</dc:subject>
    <dc:subject>Sailors' life</dc:subject>
    <dc:subject>fur trade</dc:subject>
    <dc:description>Two years at sea on the coast of California</dc:description>
    <dc:identifier id="bookid">urn:uuid:4618c86c-f508-11db-8314-0800200c9a66</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="text/xml"/>
    <item id="introduction" href="Introduction.html" media-type="application/xhtml+xml"/>
    <item id="chapteri" href="ChapterI.html" media-type="application/xhtml+xml"/>
    ...

example 2: http://www.idpf.org/2007/ops/samples/hauy.epub

<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Valentin Haüy - the father of the education for the blind</dc:title>
    <dc:creator>Beatrice Christensen Sköld</dc:creator>
    <dc:publisher>TPB</dc:publisher>
    <dc:date opf:event="publication">2006-03-23</dc:date>
    <dc:date opf:event="creation">2007-08-09</dc:date>
    <dc:identifier id="uid">C0</dc:identifier>
    <dc:language>en</dc:language>
    <meta name="generator" content="Daisy Pipeline OPS Creator" />
  </metadata>

example 3: http://www.idpf.org/2007/ops/samples/myantonia.epub

<package version="2.0" unique-identifier="PrimaryID" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>My Ántonia</dc:title>
    <dc:identifier id="PrimaryID" opf:scheme="URN">urn:uuid:14c77a9a-e849-11db-8314-0800200c9a66</dc:identifier>
    <dc:language>en-US</dc:language>
    <dc:creator opf:role="aut" opf:file-as="Cather, Willa Sibert">Willa Cather</dc:creator>
    <dc:creator opf:role="ill" opf:file-as="Benda, Wladyslaw Theodor">W. T. Benda</dc:creator>
    <dc:contributor opf:role="edt" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:contributor opf:role="edt" opf:file-as="Menéndez, José">José Menéndez</dc:contributor>
    <dc:contributor opf:role="mdc" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:contributor opf:role="trc" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:publisher>DigitalPulp Publishing</dc:publisher>
    <dc:description>My Ántonia is considered to be Willa S. Cather’s best work, first published in 1918. It is a fictional account (inspired by Cather’s childhood years) of the pioneer prairie settlers in late 19th century Nebraska. This version, intended for general readers, is a faithful, highly-proofed, and modestly modernized transcription of the First Edition, with text corrections by José Menéndez.</dc:description>
    <dc:coverage>Nebraska prairie, late 19th and early 20th Centuries C.E.</dc:coverage>
    <dc:source>First Edition of My Ántonia, published by the Riverside Press Cambridge, Houghton
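On the first aspect - getting modern RDF triples from the data as-is - here is a rough sketch of the extraction involved. This is illustrative Python rather than the XSLT asked for; the flat mapping of dc:* element names onto http://purl.org/dc/terms/ is a naive assumption, and opf:role / opf:event handling is deliberately left out:

```python
# Sketch: pull the dc:* children out of an OPF <metadata> block and emit
# N-Triples-style lines against http://purl.org/dc/terms/ (naive mapping).
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
DCTERMS = "http://purl.org/dc/terms/"
OPF_METADATA = "{http://www.idpf.org/2007/opf}metadata"

def opf_to_triples(opf_xml, subject):
    """Return (subject, predicate, literal) tuples for each dc:* element."""
    root = ET.fromstring(opf_xml)
    triples = []
    for md in root.iter(OPF_METADATA):
        for el in md:
            if el.tag.startswith(DC) and el.text:
                prop = DCTERMS + el.tag[len(DC):]
                triples.append((subject, prop, el.text.strip()))
    return triples

sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
  unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Two Years Before the Mast</dc:title>
    <dc:creator>Richard H. Dana Jr.</dc:creator>
  </metadata>
</package>"""

for s, p, o in opf_to_triples(sample, "urn:uuid:4618c86c-f508-11db-8314-0800200c9a66"):
    print(f'<{s}> <{p}> "{o}" .')
```

A real converter would also have to decide what opf:role="aut" etc. should become (dcterms:creator vs MARC relator properties), which is exactly the "some care" mentioned above.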
Re: Question about paths as URIs in the BBC RDF
On Thu, Jan 28, 2010 at 7:56 PM, Ross Singer rossfsin...@gmail.com wrote: Hi, I have a question about something I've run across when trying to parse the RDF coming from the BBC. If you take a document like: http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf notice how all of the URIs are paths, but there's no xml:base to declare where these actual paths may reside. If I point rapper at that URI, it brings me back fully qualified URIs: http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist but the only way I can figure it's able to do that is for the parser and the HTTP agent to be in cahoots somehow, which seems like a breakdown in the separation of concerns -- this document is useless, except in the context of living on www.bbc.co.uk. The moment I cache it to my local system, if I'm understanding it correctly, it's now asserting these things about my filesystem (effectively). Rapper now says: file:///music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist So my questions would be: 1) Is this valid? 2) If so, is there an expectation of the parser being aware of the URI of retrieval? (I have written my own set of parsers, so I'd need to rethink this assumption, if so) 3) How do other client libraries handle this? Hi Ross, The relevant specs are http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#section-Syntax-ID-xml-base 'The XML Infoset provides a base URI attribute xml:base that sets the base URI for resolving relative RDF URI references, otherwise the base URI is that of the document. The base URI applies to all RDF/XML attributes that deal with RDF URI references which are rdf:about, rdf:resource, rdf:ID and rdf:datatype.' and http://www.faqs.org/rfcs/rfc2396.html which specifies relative URI processing given a base URI. I think most of what you need is in section 5.1, 'Establishing a Base URI', there. cheers, Dan
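The behaviour Ross observed can be reproduced with nothing more than standard base-URI resolution; a minimal sketch using Python's stdlib urljoin (which follows the same relative-reference rules the RDF/XML spec defers to):

```python
# The parser isn't "in cahoots" with the HTTP agent in any special way:
# absent xml:base, the base URI is simply the URI the document was
# retrieved from, and relative references are resolved against it.
from urllib.parse import urljoin

retrieval_uri = "http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf"
relative_ref = "/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist"

resolved = urljoin(retrieval_uri, relative_ref)
print(resolved)
# http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist

# Cache the same document to disk and re-parse against a file: base, and the
# identical relative reference resolves somewhere else entirely -- which is
# exactly rapper's file:///music/... output.
print(urljoin("file:///tmp/artist.rdf", relative_ref))
# file:///music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist
```

So yes, it's valid, and yes, a conforming parser is expected to know the retrieval URI (or be told a base URI explicitly, which is how most toolkits' APIs handle the cached-copy case).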
Re: ISBNs, owl:sameAs, etc
On Tue, Dec 29, 2009 at 4:47 AM, Daniel O'Connor daniel.ocon...@gmail.com wrote: Psst, Chris, Tobias - any chance of RDFBookMashup rendering 'owl:sameAs urn:isbn:12434567' ? I might see if I can glue freebase's 1.8 million or so ISBNs onto rdfbookmashup. It's probably common knowledge, but there's a few scripts here - http://wiki.foaf-project.org/w/DanBri/WikipediaISBNs - for extracting isbns from wikipedia dumps. It found about half a million last time I tried. Dan -- Forwarded message -- From: Daniel O'Connor daniel.ocon...@gmail.com Date: Tue, Dec 29, 2009 at 2:12 PM Subject: ISBNs, owl:sameAs, etc To: Discussion list for Freebase Experts freebase-expe...@freebase.com I don't suppose anyone wants to mint a whole bunch of URNs for ISBNs via a quick acre application? I'm upset that http://sameas.org/html?uri=urn%3Aisbn%3A9780670063260%0D%0Ax=0y=0 Doesn't give me http://www.freebase.com/view/soft/isbn/9780670063260/best (or its RDF friends) :( WOE.
Re: Creating JSON from RDF
On Mon, Dec 14, 2009 at 10:23 AM, Richard Light rich...@light.demon.co.uk wrote: In message c74badc3.20683%t.hamm...@nature.com, Hammond, Tony t.hamm...@nature.com writes: Normal developers will always want simple. Surely what normal developers actually want are simple commands whereby data can be streamed in, and become available programmatically within their chosen development environment, without any further effort on their part? Personally I don't see how providing a format which is easier for humans to read helps to achieve this. Do normal developers like writing text parsers so much? Give 'em RDF and tell them to develop better toolsets ... RDF tooling still has some rough edges, it must be said. I am as enthusiastic about RDF as anyone (having been involved since 1997) but I've also seen the predictable results where on occasion people (eg. standards groups) have been 'arm twisted' into using the technology against their judgement and preferences. We don't have a solid, well-packaged and tested RDF/XML parser for the Ruby language yet, for example. And while we do have librdfa integration into the Redland/Raptor C toolkit, it hasn't yet propagated into all the easy-install settings where we'll eventually find it - like my Amazon EC2 Ubuntu box, or the copy of Fink I installed recently on my MacBook Pro. And in PHP we have a fantastic RDF toolkit in ARC2, but it relies on MySQL for all complex querying. Plenty of scope for toolkit polish and improvement, nothing to worry massively about, but also lots of things that will cause pain if we take a stubborn 'RDF or nothing' approach. I wholeheartedly applaud the pragmatic approach from Jeni and others. Come to that, RDF-to-JSON conversion could be a downstream service that someone else offers. You don't have to do it all. That could be useful for some, and inappropriate for others. Every new step in the chain introduces potential problems with latency, bugs, security and so on... cheers, Dan
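For what it's worth, the core of such an RDF-to-JSON conversion needn't be complicated. A minimal sketch (the output shape is invented for illustration; a real design would also distinguish URIs from literals and handle datatypes, languages and bnodes):

```python
# Sketch: group subject-predicate-object triples into the kind of plain
# JSON structure a developer can consume without any RDF tooling.
import json
from collections import defaultdict

def triples_to_json(triples):
    """Map each subject to {predicate: [objects...]} and serialise as JSON."""
    out = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        out[s][p].append(o)
    return json.dumps(out, indent=2, sort_keys=True)

triples = [
    ("http://example.org/doc", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/doc", "http://purl.org/dc/terms/creator", "Someone"),
]
print(triples_to_json(triples))
```

The hard part, as the thread suggests, is agreeing the conventions (prefixes, multi-valued properties, typed values) so different publishers' JSON looks alike.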
Re: Creating JSON from RDF
On Mon, Dec 14, 2009 at 10:37 AM, Jeni Tennison j...@jenitennison.com wrote: Richard, My opinion, based on the reactions that I've seen from enthusiastic, hard-working developers who just want to get things done, is that we (the data.gov.uk project in particular, linked data in general) are not providing them what they need. We can sit around and wait for other people to provide the simple, light-weight interfaces that those developers demand, or we can do it ourselves. I can predict with near certainty that if we do not do it ourselves, these developers will not use the linked data that we produce: they will download the original source data which is also being made available to them, and use that. We, here, on this list, understand the potential power of using linked data. The developers who want to use the data don't. (And the publishers producing the data don't.) We simply can't say but they can just build tools, they can just use SPARQL. They are not going to build bridges to us. We have to build bridges to them. My opinion. Opinion, sure. But absolutely correct, also! (Excuse me if a small rant is triggered by all this...) Why, twelve years, two months and twelve days after http://www.w3.org/TR/WD-rdf-syntax-971002/ was first published, do we not have well packaged, maintained and fully compliant RDF parsers available in every major programming language? And that is for just the smallest critical piece of software needed to do anything useful. Short answer: because people from these mailing lists didn't sit down and do the work. We waited for someone else to do it. Some of us did bits of it, but ... taken as a whole, there are still plenty of basic pieces unfinished, in various languages. 
Millions upon millions of euros and dollars have been spent on Semantic this and Semantic that, and now Linked this and Linked that; countless conferences, workshops and seminars, PDFs, PPTs and so on; but still such basic software components haven't been finished, polished, tested and distributed. I'm not speaking ill of anyone in particular here. Countless folk have worked hard and tirelessly to progress the state of the art, get tools matured and deployed. But there is plenty plenty more to do. I do fear that the structure of both academic and research (eg. EU) funding doesn't favour the kind of work and workplan we need. In the SWAD-Europe EU project we were very unusual to have explicit funding and plans that allowed - for example - Dave Beckett to work not only on the RDF Core standards, but on their opensource implementation in C; or Jan Grant and Dave to work on the RDF Test Cases, or Alistair Miles to take SKOS from a rough idea to something that's shaking up the whole library world. I wish that kind of funding was easy to come by, but it's not. A lot of the work we need to get done around here to speed up progress is pretty boring stuff. It's not cutting edge research, nor the core of a world-changing startup, nor a good topic for a phd. With every passing year the RDF tools do get a bit better, but also the old ones code rot a bit, or new things come along that need supporting (GRDDL, RDFa etc.). What can be done in the SemWeb and Linked Data scene so that it becomes a bigger part of people's real dayjobs to improve our core tooling? Are the resources already out there but poorly coordinated? Would some lightweight collective project management help? Are there things (eg. finalising a ruby parser toolkit) that are weekend-sized jobs, month sized jobs; do they look more like msc student summer projects or EU STREP / IP projects in scale? Could we do more by simply transliterating code between languages? ie. 
if something exists in Python it can be converted to Ruby or vice-versa...? Are funded grants available (eg. JISC in UK?) that would help polish, package, test and integrate basic entry-level RDF / linked data software tools? Back on the original thread, I am talking here so far only about core RDF tools, eg. having basic RDF -to- triples facility available reliably in some language of choice. As Jeni emphasises, there are lots of other pieces of bridging technology needed (eg. into modern JSON idioms). But when we are hoping to convert folk to use pure generic RDF tools, we better make sure they're in good shape. Some are, some aren't, and that lumpy experience can easily turn people away... cheers, Dan
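As a hedged illustration of how small the "basic RDF-to-triples facility" Dan mentions can be at its core, here is a toy parser for a tiny subset of N-Triples (URI nodes and plain literals only) in pure Python. It is a sketch for discussion, not a compliant parser: it ignores blank nodes, language tags, datatypes, and escape sequences, which is exactly the gap between a weekend hack and the finished, tested, packaged tools the post is asking for. The example data is invented.

```python
import re

# Toy parser for a *subset* of N-Triples: <uri> terms and "plain" literals.
# Deliberately non-compliant: no blank nodes, language tags, datatypes,
# or escape handling. Real applications should use a maintained parser.
TERM = re.compile(r'<([^>]*)>|"([^"]*)"')

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples from N-Triples-like lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        # Each match is (uri_group, literal_group); exactly one is non-empty.
        terms = [uri if uri else lit for uri, lit in TERM.findall(line)]
        if len(terms) == 3:
            yield tuple(terms)

data = '''
# hypothetical example data
<http://example.org/dan> <http://xmlns.com/foaf/0.1/name> "Dan" .
<http://example.org/dan> <http://xmlns.com/foaf/0.1/knows> <http://example.org/libby> .
'''
triples = list(parse_ntriples(data))
```

The point is not that such a toy is adequate, but that the distance between this and a "finished, polished, tested and distributed" component is where the unglamorous work lies.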
Re: Creating JSON from RDF
On Sun, Dec 13, 2009 at 8:03 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: Hi Jeni, [Rest of post snipped for now, I'll respond properly later. Seems like we are on sufficiently similar wavelengths that it is just a matter of working the details.] I don't know where the best place is to work on this: I guess at some point it would be good to set up a Wiki page or something that we could use as a hub for discussion? I'd suggest setting up a Google Code area and making anyone who is interested a committer. That gives us a Wiki but also hosting for associated code for generating/navigating the format. I'd be happy to set one up. An alternative is the ESW Wiki but (a) that doesn't have an associated code area, (b) I don't personally have access right now (though I believe that is easily fixable) and (c) it might be presumptuous to associate it with W3C at this stage of baking. Ivan Herman (cc:'d) has been looking into a modernised general 'Semantic Web' wiki area on w3.org, ie. using (Semantic?) MediaWiki, rather than the old MoinMoin (for now and the foreseeable future ESW will remain using MoinMoin, since migration is non-trivial). There was also some recent discussion at W3C about opening up Git or Mercurial distributed versioning systems for the standards community, which sounds like it could be a good fit for SemWeb IG-and-nearby collaborations. However that is at an early stage. Google Code might be easiest for now... Ivan - care to comment? Dan
Re: Need help mapping two letter country code to URI
On Mon, Nov 9, 2009 at 10:47 PM, Aldo Bucchi aldo.buc...@gmail.com wrote: Hi, I found a dataset that represents countries as two letter country codes: DK, FI, NO, SE, UK. I would like to turn these into URIs of the actual countries they represent. ( I have no idea on whether this follows an ISO standard or is just some private key in this system ). Any ideas on a set of candidate URIs? I would like to run a complete coverage test and take care I don't introduce distortion ( that is pretty easy by doing some heuristic tests against labels, etc ). There are some border cases that suggest this isn't ISO3166-1, but I am not sure yet. ( and if it were, which widely used URIs are based on this standard? ). http://www.fao.org/countryprofiles/geoinfo.asp might have something useful for you? Dan
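One border case worth noting: ISO 3166-1 alpha-2 uses GB, not UK, for the United Kingdom, so the presence of "UK" in the dataset is itself a hint that this is a private code list rather than pure ISO. The complete-coverage test Aldo describes can be sketched in a few lines; the lookup table below is a hypothetical hand-built mapping (DBpedia URIs chosen purely for illustration), not a recommendation of any particular URI set.

```python
# Hypothetical hand-built lookup from the dataset's codes to candidate
# URIs. DBpedia resource URIs are used here only as an illustration.
CODE_TO_URI = {
    "DK": "http://dbpedia.org/resource/Denmark",
    "FI": "http://dbpedia.org/resource/Finland",
    "NO": "http://dbpedia.org/resource/Norway",
    "SE": "http://dbpedia.org/resource/Sweden",
    # Border case: ISO 3166-1 alpha-2 says "GB", so "UK" suggests the
    # source system uses a private code list, not strict ISO codes.
    "UK": "http://dbpedia.org/resource/United_Kingdom",
}

def coverage_report(codes_in_data):
    """Return (mapped, unmapped) so no code is silently dropped."""
    mapped = {c: CODE_TO_URI[c] for c in codes_in_data if c in CODE_TO_URI}
    unmapped = sorted(set(codes_in_data) - set(CODE_TO_URI))
    return mapped, unmapped

# "XX" stands in for any unexpected code the real data might contain.
mapped, unmapped = coverage_report(["DK", "FI", "NO", "SE", "UK", "XX"])
```

Reporting the unmapped remainder, rather than skipping it, is what makes this a coverage test instead of a silent lossy conversion.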
Re: temporary URLs on Second Life
On 20/7/09 11:01, Danny Ayers wrote: Second Life objects to become HTTP-aware : http://www.massively.com/2009/07/08/second-life-objects-to-become-http-aware/ cool, right? well not exactly, it uses shortlived-by-design URIs: http://wiki.secondlife.com/wiki/LSL_http_server Well, we can't have it both ways. Either we want everything of interest to have HTTP URIs. Or we want all HTTP URIs to de-reference usefully forever. But we won't easily get eternally-useful http URIs for everything useful that has ever been plugged into the 'net. Anyone building systems that assume otherwise is building something rather fragile. There are a *lot* of data objects in secondlife... cheers, Dan
Re: Dons flame resistant (3 hours) interface about Linked Data URIs
On 10/7/09 12:23, Juan Sequeda wrote: Steve is right. If I am not wrong, when TBL gave his talk at CERN for the 20th anniversary of the web, he said that he was amazed that people were hacking HTML by hand. He never expected it. Now... we are the geeks doing RDF, conneg, linked data by hand... In a couple of years we will create tools for the non-geeks. We have to learn from our history and not get ahead of ourselves. RDF has been a W3C Recommendation since February, 1999. The RDF work went public in Oct 1997. A lot has happened since then... Definitely we've done a lot of hacker-grade stuff in the meantime. But tools for going mainstream are getting overdue! Even tools for developers: eg. regular Redland builds on Windows; a solid packaged Ruby library, etc. Re tools for publishing, given the fiddliness of doing RDF right, my vote is for everything that allows tools on one site to post RDF into another. I've suggested before that AtomPub + OAuth would be a plausible starting point, but I'm open to suggestions. Re non-geeks, http://www.youtube.com/watch?v=o4MwTvtyrUQ is a must-watch... cheers, Dan
[Fwd: 2nd CFP: ISWC'09 workshop on Ontology Matching (OM-2009)]
I don't normally forward conference CFPs, but it seems it would be useful to build some links with this community. Aw crap, can't believe I typed that. But you know what I mean... Dan Original Message Subject:2nd CFP: ISWC'09 workshop on Ontology Matching (OM-2009) Date: Wed, 8 Jul 2009 09:28:34 +0200 From: Pavel Shvaiko pa...@dit.unitn.it To: pavel.shva...@infotn.it Apologies for cross-postings -- CALL FOR PAPERS -- The Fourth International Workshop on ONTOLOGY MATCHING (OM-2009) http://om2009.ontologymatching.org/ October 25, 2009, ISWC'09 Workshop Program, Fairfax, near Washington DC., USA BRIEF DESCRIPTION AND OBJECTIVES Ontology matching is a key interoperability enabler for the Semantic Web, as well as a useful tactic in some classical data integration tasks. It takes the ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging and data translation. Thus, matching ontologies enables the knowledge and data expressed in the matched ontologies to interoperate. The workshop has three goals: 1. To bring together leaders from academia, industry and user institutions to assess how academic advances are addressing real-world requirements. The workshop will strive to improve academic awareness of industrial and final user needs, and therefore, direct research towards those needs. Simultaneously, the workshop will serve to inform industry and user representatives about existing research efforts that may meet their requirements. The workshop will also investigate how the ontology matching technology is going to evolve. 2. 
To conduct an extensive and rigorous evaluation of ontology matching approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2009 campaign: http://oaei.ontologymatching.org/2009/ This year's OAEI campaign introduces two new tracks about oriented alignments and about instance matching (a timely topic for the linked data community). Therefore, the ontology matching evaluation initiative itself will provide a solid ground for discussion of how well the current approaches are meeting business needs. 3. To examine similarities and differences from database schema matching, which has received decades of attention but is just beginning to transition to mainstream tools. TOPICS of interest include but are not limited to: Business cases for matching; Requirements to matching from specific domains; Application of matching techniques in real-world scenarios; Formal foundations and frameworks for ontology matching; Large-scale ontology matching evaluation; Performance of matching techniques; Matcher selection and self-configuration; Uncertainty in ontology matching; User involvement (including both technical and organizational aspects); Explanations in matching; Social and collaborative matching; Alignment management; Reasoning with alignments; Matching for traditional applications (e.g., information integration); Matching for dynamic applications (e.g., peer-to-peer, web-services). SUBMISSIONS Contributions to the workshop can be made in terms of technical papers and posters/statements of interest addressing different issues of ontology matching as well as participating in the OAEI 2009 campaign. Technical papers should be not longer than 12 pages using the LNCS Style: http://www.springeronline.com/sgw/cda/frontpage/0,11855,5-164-2-72376-0,00.html Posters/statements of interest should not exceed 2 pages and should be handled according to the guidelines for technical papers. 
All contributions should be prepared in PDF format and should be submitted through the workshop submission site at: http://www.easychair.org/conferences/?conf=om20090 Contributors to the OAEI 2009 campaign have to follow the campaign conditions and schedule at http://oaei.ontologymatching.org/2009/. IMPORTANT DATES FOR TECHNICAL PAPERS: August 11, 2009: Deadline for the submission of papers. September 6, 2009: Deadline for the notification of acceptance/rejection. October 2, 2009: Workshop camera ready copy submission. October 25, 2009: OM-2009, Westfields Conference Center, Fairfax, near Washington DC., USA. ORGANIZING COMMITTEE 1. Pavel Shvaiko (Main contact) TasLab, Informatica Trentina SpA, Italy 2. Jérôme Euzenat INRIA LIG, France 3. Fausto Giunchiglia University of Trento, Italy 4. Heiner Stuckenschmidt University of Mannheim, Germany 5. Natasha Noy Stanford Center for Biomedical Informatics Research, USA 6. Arnon Rosenthal The MITRE Corporation, USA PROGRAM COMMITTEE Yuan An, Drexel University, USA Zohra Bellahsene, LIRMM, France Paolo Besana, University of Edinburgh, UK Olivier Bodenreider, National Library of Medicine, USA
Re: tutorial on Music and the Web of Data
On 1/7/09 17:51, Kingsley Idehen wrote: Linked Music Data or Linked Open Music Data, either provides a clear moniker for a music oriented Linked Data Space on the Web :-) It does rather suggest the music files are up there too. And I wouldn't complain if they were... :) Dan
Re: how do I report bad sameAs links? (dbpedia - Cyc)
On 30/6/09 13:33, Kingsley Idehen wrote: Dan Brickley wrote: (I was reminded about the SW bug tracker after posting this; good idea) http://sw.opencyc.org/2008/06/10/concept/Mx4rv8L0_JwpEbGdrcN5Y29ycA says it is owl:sameAs dbpedia:Spaced And DBpedia reports the same. They're both wrong! The DBpedia page is about a television situation comedy show; the Cyc page is about a freeware computer game. This is a problem in the OpenCyc data space (and the datasets generated from it). DBpedia doesn't reciprocate that claim :-) Yes it does! That's how I found the Cyc entry in the first place. Use case blogged here - http://danbri.org/words/2009/06/30/418 http://dbpedia.org/page/Spaced says owl:sameAs * fbase:Spaced * opencyc:en/Spaced_TheGame btw - we are on the verge of releasing DBpedia 3.3 (sometime today). Congratulations! :) Dan
how do I report bad sameAs links? (dbpedia - Cyc)
http://sw.opencyc.org/2008/06/10/concept/Mx4rv8L0_JwpEbGdrcN5Y29ycA says it is owl:sameAs dbpedia:Spaced And DBpedia reports the same. They're both wrong! The DBpedia page is about a television situation comedy show; the Cyc page is about a freeware computer game. cheers, Dan
Re: Visualization of domain and range
Interesting discussion! On 25/6/09 14:15, Simon Reinhardt wrote: Hi Bernhard Schandl wrote: [1] http://www.ifs.univie.ac.at/schandl/2009/06/domain+range_bad.png [2] http://www.ifs.univie.ac.at/schandl/2009/06/domain+range_better.png I like this. The former has several problems anyway: you have to repeat properties if they can hold between several classes [3] and you have to draw lines connecting lines for expressing sub-properties or inverse properties [4] which looks rather confusing and is not supported by many visual modelling tools. Yeah, my [4] is at my threshold of tolerance for chaos in a diagram. I wanted a way to show the core of the FOAF spec in a picture, so tried (despite similar concerns to those mentioned in this thread) the style of putting domain/range directly in an instance-like style. In http://www.flickr.com/photos/danbri/1856478164/ ([4]) I try to do too many things at once: * show the classes that each property is used with * show sub-property relationships * show sub-class relationships * show some typical properties * show attachment points for friends of FOAF namespaces (DOAP, SIOC, DC, Geo etc), with classes and with sample properties This is a lot of information. I did try to make a gradual reveal slideshow version, building up from something simple. It wasn't great. The layout was done by hand to minimise crossovers, and looking at it, I think the whole structure could be twisted/stretched to be more evenly presented. It was fiddly to do though. A sample of instance-data would probably convey most of the same information about domain/range, and would allow subclasses reasonably too. Sub-property would remain hard... 
If anyone wants to mess around with the FOAF example, source data in OmniGraffle format is here and also in SVG: just do svn co http://svn.foaf-project.org/foaf/trunk/xmlns.com/htdocs/foaf/spec/images/; [3] also shows a combination of the two problems: if you draw several lines for one property, you have to connect sub-properties to each of them or to an arbitrarily selected one. The only downside I see here is that adding ellipses for properties makes the diagram a bit more bloated. I don't find [3] very readable. There was another Harmony ABC diagram (I think from Carl Lagoze) in http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html#Simple%20Rules that uses dotted lines for implied types, I think this can work well in instance level presentations. cheers, Dan Regards, Simon [3] http://metadata.net/harmony/ABC_Class_Hierarchy_with_Properties.gif [4] http://www.flickr.com/photos/danbri/1856478164/ (sorry Dan!)
Re: .htaccess a major bottleneck to Semantic Web adoption / Was: Re: RDFa vs RDF/XML and content negotiation
On 26/6/09 10:51, Toby Inkster wrote: On Fri, 2009-06-26 at 09:35 +0200, Dan Brickley wrote: Does every major RDF toolkit have an integrated RDFa parser already? No - and even for those that do, it's often rather flaky. Sesame/Rio doesn't have one in its stable release, though I believe one is in development for 3.0. Redland/Raptor often (for me at least) seems to crash on RDFa. It also complains a lot when named entities are used (e.g. &nbsp;) even though the XHTML+RDFa 1.0 DTD does allow them. Jena (just testing on sparql.org) doesn't seem to handle RDFa at all. Not really toolkits per se, but cwm and the current release of Tabulator don't seem to have RDFa support. (Though I think support for the latter is being worked on.) For application developers who are specifically trying to support RDFa, none of this is a major problem - it's pretty easy to include a little content-type detection and pass the XHTML through an RDFa-XML converter prior to the rest of your code getting its hands on it - but this does require specific handling, which must be an obstacle to adoption. Yep, pretty much as I feared. Also the Google SGAPI currently only reads FOAF in RDF/XML form, not yet updated to use the RDFa support in Rapper. Re app developers, it depends a lot. If your app is built inside some framework - eg. Protege - RDFa might be quite hard to integrate. Some apps also store to local disk rather than HTTP space, and so using content-negotiation is tricky. RDFa files don't have any well known file-suffix patterns either. cheers, Dan
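The "little content-type detection" Toby describes amounts to a small dispatch table keyed on the response's media type. A minimal sketch follows; the parse_* functions are hypothetical placeholders standing in for whatever RDF/XML parser and RDFa-to-triples converter a given stack actually provides, and the media-type-to-parser assignments are assumptions for illustration.

```python
# Sketch of content-type based dispatch in front of an RDF toolkit.
# The parse_* functions are hypothetical placeholders; a real setup
# would call an actual RDF/XML parser and an RDFa extractor here.
def parse_rdfxml(body):
    return ("rdfxml", body)

def parse_rdfa(body):
    # In practice: run an RDFa-to-RDF/XML (or RDFa-to-triples) converter
    # before the rest of the pipeline sees the data, as Toby describes.
    return ("rdfa", body)

DISPATCH = {
    "application/rdf+xml": parse_rdfxml,
    "application/xhtml+xml": parse_rdfa,  # assumption: XHTML carries RDFa
    "text/html": parse_rdfa,
}

def handle(content_type, body):
    # Strip parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    parser = DISPATCH.get(media_type, parse_rdfxml)  # arbitrary fallback
    return parser(body)
```

The awkwardness Dan notes remains: this works for HTTP responses, but files saved to local disk have no Content-Type header and RDFa no conventional file suffix, so suffix- or sniffing-based fallbacks end up being needed too.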
Re: http://ld2sd.deri.org/lod-ng-tutorial/
On 22/6/09 23:16, Martin Hepp (UniBW) wrote: Yves Raimond wrote: Ontology modularization is a pretty difficult task, and people use various heuristics for deciding what to put in the subset being served for an element. There is no guarantee that the fragment you get contains everything that you need. There is no safe way of importing only parts of an ontology, unless you know that its modularization is 100% reliable. Serving fragments of likely relevant parts of an ontology for reducing the network overhead is not the same as proper modularization of the ontology. Can you give a concrete example of the danger described here? ie. the pair of a complete (safe) ontology file and a non-safe subset, and an explanation of the problems caused. I can understand there is no guarantee that the fragment you get contains everything you need, and I also remind everyone that dereferencing is a privilege not a right: sometimes the network won't give you what you want, when you want it. But I've yet to hear of anyone who has suffered due to term-oriented ontology fragment downloads. I guess medical ontologies would be the natural place for horror stories? cheers, Dan
Re: http://ld2sd.deri.org/lod-ng-tutorial/
On 23/6/09 09:33, Martin Hepp (UniBW) wrote: Hi Dan: I think Alan already gave examples this morning. An ontology can contain statements about the relationship between conceptual elements - classes, properties, individuals - that (1) influence the result to queries but (2) are not likely retrieved when you just dereference an element from that ontology. The more complex an ontology is, the more difficult it is to properly modularize it. Indeed, I missed his mail until after I'd sent mine. And the examples are helpful. However they are - for the non-SemWeb enthusiast - incredibly abstract: FunctionalObjectProperty(p) InverseFunctionalObjectProperty(p) ObjectPropertyDomain(:a) ObjectPropertyRange(:b) etc. What I'd love to see is some flesh on these bones: a wiki page that works through these cases in terms of a recognisable example. Products, people, documents, employees, access control, diseases, music, whatever. I want something I can point to that says "this is why it is important to take care of the formalisms...", "this is what we can do so that simple-minded but predictable machines do the hard work instead of us". But basically my main point is that the use of owl:imports is defined pretty well in http://www.w3.org/TR/owl-ref/#imports-def and there is no need to deviate from the spec just for the matter of gut feeling and annoyance about the past dominance of DL research in the field. And as the spec says - with a proper owl:imports statement, any application can decide if and what part of the imported ontologies are being included to the local model for the task at hand. +1 on respecting the specs, but we also all know that not every piece of specification finds itself useful in practice. Having a worked-through, instance-level account of why owl:imports is useful would help. There is no compulsion re standards here: if someone is happy publishing RDFS, we can't make them use OWL. If they're happy using OWL we can't make them use RIF.
If they're happy with RIF 1, we can't make them use RIF 2 etc. Or any particular chapter or verse of those specs. What we can do is ground our evangelism in practical examples. And for those to be compelling, they can't solely be at the level of properties of properties; we need an account in terms of instance level use cases too. cheers, Dan
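As a sketch of the instance-level walkthrough being asked for: owl:InverseFunctionalObjectProperty says that two individuals sharing a value for the property must denote the same thing; FOAF applies this to foaf:mbox so that agents can be merged on mailbox. The toy rule below (pure Python, not an OWL reasoner; individual names are invented) shows the concrete consequence of that one abstract axiom on recognisable data.

```python
# Toy illustration of OWL's InverseFunctionalObjectProperty at the
# instance level: equal values of an inverse-functional property
# (here foaf:mbox) entail that the subjects are the same individual.
# This is a hand-rolled sketch of one rule, not a general reasoner.
MBOX = "http://xmlns.com/foaf/0.1/mbox"

triples = [
    ("ex:dan1", MBOX, "mailto:danbri@example.org"),
    ("ex:dan2", MBOX, "mailto:danbri@example.org"),
    ("ex:libby", MBOX, "mailto:libby@example.org"),
]

def infer_same_as(triples, ifp):
    """Return owl:sameAs pairs implied by shared values of an IFP."""
    by_value = {}
    for s, p, o in triples:
        if p == ifp:
            by_value.setdefault(o, []).append(s)
    pairs = set()
    for subjects in by_value.values():
        for a in subjects:
            for b in subjects:
                if a < b:  # emit each unordered pair once
                    pairs.add((a, b))
    return pairs

same = infer_same_as(triples, MBOX)  # ex:dan1 and ex:dan2 get merged
```

The point for the owl:imports debate: this entailment only fires if the consuming application actually retrieved the axiom declaring the property inverse-functional, which is exactly what a partial, term-by-term fragment download might miss.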
Re: http://ld2sd.deri.org/lod-ng-tutorial/
[snip] Yup, re owl:imports, I enthusiastically added it to the FOAF spec when some OWL WG insider suggested it was the right thing to use, and dutifully removed it when someone (I forget who in both cases - quite possibly same person!) a few years later told me it had fallen from fashion within the OWL scene. Re attitudes to OWL ... I do agree there have in the distant past (ie. last year!) been a few casually dismissive remarks around here regarding OWL. It's all too easy for a healthy enthusiasm for practical tools to trick us into seeing tools that we're not so familiar with as impractical. I'm happy to have read plenty of useful discussion here and nearby about how best to use or augment owl:sameAs. FOAF is described using OWL. I expect some day in the not too distant future, Dublin Core Terms will be described in OWL too. And the community on public-lod@w3.org have been excellent champions of both. Things aren't too polarised, despite the occasional lapses into them and us-ism... Optimistically, Dan
Re: Common Tag, FOAF and Dublin Core Re: Common Tag - semantic tagging convention
On 18/6/09 13:31, Bernard Vatant wrote: Rob, Danny (and Dan) ... why not use simply dc:creator and dc:date to this effect? Right. dc:date would seem a good choice, though I reckon foaf:maker might be a better option than dc:creator as the object is a resource (a foaf:Agent) rather than a literal. While it's likely to mean an extra node in many current scenarios, it offers significantly more prospect for linking data (and less ambiguity). dcterms:creator would also allow for use of a resource. Bibliontology uses dcterms over dc. Well I actually meant dcterms:creator when I wrote dc:creator, sorry. So you can link your personal tags to your foaf profile, for example. And it's consistent even for tag:AutoTag, since the range of dcterms:creator is dcterms:Agent, including person, organisation and software agent as well. Unless I miss some subtle distinguo dcterms:Agent is equivalent to foaf:Agent, and dcterms:creator equivalent to foaf:maker. BTW, with due respect to danbri, I wish FOAF would be revised to align whenever possible on dcterms vocabulary, now that it has clean declarations of classes, domains and ranges ... http://dublincore.org/documents/dcmi-terms is worth (re)visiting :-) Completely agree. I'm very happy with the direction of DC terms. The foaf:maker property was essential for a while, until DC was cleaned up. I'll mark it as a sub-property of dcterms:creator. I hope we'll get reciprocal claims into the Dublin Core RDF files some day too... Copying Tom Baker here. Tom - what would the best process be for adding in mapping claims to the DC Terms RDF? Maybe we could draft some RDF, put it onto dublincore.org or elsewhere, and for now add a seeAlso from the namespace RDF? cheers, Dan
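Marking foaf:maker as an rdfs:subPropertyOf of dcterms:creator means every foaf:maker triple entails a corresponding dcterms:creator triple (the RDFS entailment rule usually labelled rdfs7). A minimal pure-Python sketch of that one rule, with invented example resources, shows what consumers gain from the mapping; this is a one-step illustration, not a full RDFS reasoner (it does not chase chains of sub-properties).

```python
# Minimal sketch of the RDFS sub-property entailment rule (rdfs7):
#   (p rdfs:subPropertyOf q) and (s p o)  =>  (s q o)
# One-step only: sub-property chains are not followed in this toy.
MAKER = "http://xmlns.com/foaf/0.1/maker"
CREATOR = "http://purl.org/dc/terms/creator"

# The proposed mapping: foaf:maker rdfs:subPropertyOf dcterms:creator.
SUBPROP = {MAKER: CREATOR}

def rdfs7(triples, subprop):
    """Close a triple set under one application of the rdfs7 rule."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in subprop:
            inferred.add((s, subprop[p], o))
    return inferred

data = {("ex:doc", MAKER, "ex:dan")}  # invented example triple
closed = rdfs7(data, SUBPROP)
```

In other words, once the mapping is published, a DC-only consumer querying for dcterms:creator can (with RDFS inference) pick up data that was published using only foaf:maker.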
Re: Common Tag, FOAF and Dublin Core Re: Common Tag - semantic tagging convention
On 18/6/09 15:07, Thomas Baker wrote: On Thu, Jun 18, 2009 at 01:49:56PM +0200, Dan Brickley wrote: Well I actually meant dcterms:creator when I wrote dc:creator, sorry. So you can link your personal tags to your foaf profile, for example. And it's consistent even for tag:AutoTag, since the range of dcterms:creator is dcterms:Agent, including person, organisation and software agent as well. Unless I miss some subtle distinguo dcterms:Agent is equivalent to foaf:Agent, and dcterms:creator equivalent to foaf:maker. BTW, with due respect to danbri, I wish FOAF would be revised to align whenever possible on dcterms vocabulary, now that it has clean declarations of classes, domains and ranges ... http://dublincore.org/documents/dcmi-terms is worth (re)visiting :-) Completely agree. I'm very happy with the direction of DC terms. The foaf:maker property was essential for a while, until DC was cleaned up. I'll mark it as a sub-property of dcterms:creator. I hope we'll get reciprocal claims into the Dublin Core RDF files some day too... Copying Tom Baker here. Tom - what would the best process be for adding in mapping claims to the DC Terms RDF? Maybe we could draft some RDF, put it onto dublincore.org or elsewhere, and for now add a seeAlso from the namespace RDF? Hi Dan, If you could write up a short proposal -- how the properties are defined, with a proposed mapping claim -- we could discuss this in the DCMI Usage Board and take a decision. We associate changes in the namespace RDF (and related namespace documentation) with formal decisions so we would need to follow a process. Sounds like a plan! Thanks. I'll take it to DC lists and report back here as things progress. cheers, Dan