Re: How "valid" is to use a term marked as "unstable" for a data publisher/consumer?
+Cc: Leigh Dodds, for old time's sake

On 20 July 2016 at 09:45, Ghislain Atemezing wrote:
> Hi all,
>
> [ Apologies if this question has been answered before in this group. ]
>
> Recently, I was working on a project where we were just reusing existing
> terms for building a knowledge base for a private company. When we were
> considering using for example foaf:birthday, I was told by someone that it
> was marked “unstable” in the vocabulary file. The normal reaction would have
> been “so what?” ;)
>
> However, I found the question somehow interesting in the sense that the vocab
> defining the term status of the vocabulary [1] uses “unstable” for all the
> properties in the vocabulary, and of course it is reused by many
> vocabularies [2]. Meanwhile, FOAF is one of the most popular
> vocabularies used in the LOD cloud (stats for 2014 here [3]) and I guess
> there is much data modeled with some of the terms flagged as “unstable”. I
> found an example dataset here for the Nobel Prize [4].
>
> Is there any risk for data publishers or consumers (e.g., visual
> applications) in reusing “safely” terms flagged as “unstable”?
> Do you know of any study on this type of question?
>
> Any experience or thought is more than welcome, to help propose a more
> rational answer to my project partner.

On some level this is my fault :)

The vocabulary at [1] bubbled out of FOAF collaborations many years ago, where we were keen to explore more fine-grained mechanisms for term evolution than the previously dominant notion that versioning happened at the vocabulary/namespace level. We had seen efforts like Dublin Core get stuck because of a sense that changing any term's documentation necessitated a revision to the schema's version number (DC 1.0 -> DC 1.1), and I had also been responsible for somewhat naive language in the 1998/1999 working drafts of the initial RDFS spec which encouraged the notion that any changes to a schema should require a new URL.
See http://lists.foaf-project.org/pipermail/foaf-dev/2003-July/005462.html for the initial design discussions in the FOAF project, ~2003.

The reason that the vocab status vocabulary is itself marked as unstable is that we hoped to refine it in the light of experience, and in particular to consider using URLs instead of well-known strings, to better support i18n/l10n and SKOS-style refinement. We did make a sketch of a sketch of a W3C Note on this at https://www.w3.org/2003/06/sw-vocab-status/note but didn't complete the work. There may also be things we can reflect from the schema.org experience, as well as mechanisms in OWL and SKOS, that ought to be incorporated. On the schema.org side, for example, we recently added a "pending" area of the vocabulary (see http://pending.schema.org/) where drafts are shared; this is roughly like "unstable", but the word "pending" is slightly less intimidating to potential users.

The main point of marking a term 'unstable' is that if the term maintainer does change it in the light of experience, they have an excuse and can say "hey, don't blame us, we said there was some chance we might change the definitions in light of experience". Beyond that, I doubt there is much that can be formally encoded, and potential users are probably best advised to read actual human-oriented text and discussions to understand any remaining open issues. For example, http://pending.schema.org/ClaimReview describes the status ('pending') of the schema.org term ClaimReview. Probably the most important thing that page does is point to the corresponding issue tracker entry at https://github.com/schemaorg/schemaorg/issues/1061 where you can read anything that is known in that vocabulary community about the maturity (or otherwise) of the relevant term.
So if I were revisiting the vocabulary status vocabulary in 2016, my advice would be that it should be re-oriented towards discovery of such human-oriented documentation, rather than trying to over-formalize codes like 'unstable' vs 'testing' whose nuanced meaning will naturally vary by context and project. If you dig around http://lists.foaf-project.org/pipermail/foaf-dev/2003-July/005462.html you'll see that was pretty much what we had in mind originally...

cheers,

Dan

> Best,
> Ghislain
>
> [1] http://www.w3.org/2003/06/sw-vocab-status/ns#
> [2] http://lov.okfn.org/dataset/lov/vocabs/vs
> [3] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
> [4] http://data.nobelprize.org/
> ---
> Ghislain A. Atemezing, Ph.D
> Mail: ghislain.atemez...@gmail.com
> Web: https://w3id.org/people/gatemezing
> Twitter: @gatemezing
> About Me: https://about.me/ghislain.atemezing
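For concreteness, the status annotations under discussion look roughly like this in Turtle (a sketch; vs: is the term-status vocabulary at [1], and FOAF really does annotate foaf:birthday this way, as the original question notes):

```turtle
@prefix vs:   <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Each term carries a plain-string maturity flag; the vocabulary
# suggests the values "unstable", "testing" and "stable".
foaf:birthday a rdf:Property ;
    vs:term_status "unstable" .

# A consumer can read the flag, but as argued above the real signal
# is in the human-oriented documentation the term links to.
```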
new W3C CSV on the Web specs, now at Candidate Recommendation stage - please implement!
Hi!

Short version: Please see http://www.w3.org/blog/news/archives/4830 for the Candidate Recommendation specs from W3C's CSV on the Web group - https://www.w3.org/2013/csvw/wiki/Main_Page

Long version: These are the four docs:

* Model for Tabular Data and Metadata on the Web - an abstract model for tabular data, and how to locate metadata that enables users to better understand what the data holds; this specification also contains non-normative guidance on how to parse CSV files
  http://www.w3.org/TR/2015/CR-tabular-data-model-20150716/

* Metadata Vocabulary for Tabular Data - a JSON-based format for expressing metadata about tabular data to inform validation, conversion, display and data entry for tabular data
  http://www.w3.org/TR/2015/CR-tabular-metadata-20150716/

* Generating JSON from Tabular Data on the Web - how to convert tabular data into JSON
  http://www.w3.org/TR/2015/CR-csv2json-20150716/

* Generating RDF from Tabular Data on the Web - how to convert tabular data into RDF
  http://www.w3.org/TR/2015/CR-csv2rdf-20150716/

See the blog post for more links, including an extensive set of test cases, our GitHub repo and the mailing list for feedback.

Also note that the approach takes CSV as its central stereotypical use case, but should apply to many other tabular data-sharing approaches too (most obviously tab-separated). So if you prefer tab-separated files to comma-separated, do please take a look! The Model spec defines that common model, the metadata document defines terminology for talking about instances of that model, and the last two specs apply this approach to the problem of mapping tables into JSON and/or RDF.

The group expects to satisfy the implementation goals (i.e., at least two independent implementations for each of the test cases) by October 30, 2015. Please take a look, and pass this along to other groups who may be interested.

cheers,

Dan, for the CSVW WG

p.s.
since I'm writing, I'll indulge myself and share my personal favourite part, which is the ability (in the csv2rdf doc) to map from rows in a table, via templates, into RDF triples. This is a particularly interesting/important facility and worth some attention. Normally I wouldn't enthuse over (yet another) new RDF syntax, but the ability to map tabular data into triples via out-of-band mappings is very powerful. BTW the group gave some serious consideration to applying R2RML here (see docs and github/wiki for details); however, given the subtle differences between SQL and CSV environments, we have taken a different approach. Anyway, please take a look!
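To give a flavour of that templating facility, here is a minimal metadata sketch (the file name, column names and example.org URIs are made up; aboutUrl, propertyUrl and the JSON-LD @context are from the Metadata Vocabulary spec):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "aboutUrl": "http://example.org/country/{code}",
    "columns": [
      { "name": "code", "titles": "code" },
      { "name": "name", "titles": "name",
        "propertyUrl": "http://schema.org/name" }
    ]
  }
}
```

Each row then yields triples whose subject URI is minted from that row's "code" cell via the URI template, with the "name" cell becoming a schema:name value - out-of-band mapping from table to graph, with no RDF in the CSV itself.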
Spec review request: CSV on the Web
The CSV on the Web Working Group [1] has just published a new set of Working Drafts, which we consider feature-complete and implementable. We particularly seek reviews from Web Security, Privacy, Internationalization and Accessibility perspectives at this time. A request has also been sent to the TAG [7]. We request review now rather than later since we are following W3C's revised Process, in which there is no distinct Last Call; we prefer to invite reviews now rather than wait for a formal Candidate Recommendation.

The drafts are:

* Model for Tabular Data and Metadata on the Web [2] - an abstract model for tabular data, and how to locate metadata that enables users to better understand what the data holds; this specification also contains non-normative guidance on how to parse CSV files.

* Metadata Vocabulary for Tabular Data [3] - a JSON-based format for expressing metadata about tabular data to inform validation, conversion, display and data entry for tabular data.

* Generating JSON from Tabular Data on the Web [4] - how to convert tabular data into JSON.

* Generating RDF from Tabular Data on the Web [5] - how to convert tabular data into RDF.

We are keen to get comments on these specifications, either as issues on our GitHub repository [6] or by posting to public-csv-wg-comme...@w3.org. We would also like to invite people to start implementing these specifications and to donate their test cases to our test suite. Building this test suite, as well as responding to comments, will be our focus over the next couple of months.

Dan

[1] http://www.w3.org/2013/csvw/wiki/Main_Page
[2] http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/
[3] http://www.w3.org/TR/2015/WD-tabular-metadata-20150416/
[4] http://www.w3.org/TR/2015/WD-csv2json-20150416/
[5] http://www.w3.org/TR/2015/WD-csv2rdf-20150416/
[6] https://github.com/w3c/csvw/issues
[7] https://lists.w3.org/Archives/Public/www-tag/2015Apr/0028.html
Re: How to avoid that collections break relationships
On 25 March 2014 15:52, Markus Lanthaler markus.lantha...@gmx.net wrote:
> please let's not talk about hash URLs etc. here, ok? So, please. Let's try
> to focus on the problem at hand.

As an online discussion grows longer, the probability of a comparison involving http-range-14 or URNs approaches 1.

Dan
Re: How to avoid that collections break relationships
On 26 March 2014 04:26, Pat Hayes pha...@ihmc.us wrote:
> On Mar 25, 2014, at 11:29 AM, Markus Lanthaler markus.lantha...@gmx.net wrote:
>> On Tuesday, March 25, 2014 5:00 PM, Pat Hayes wrote:
>>> Seems to me that the, um, mistake that is made here is to use the same
>>> property schema:knows for both the individual case and the list case.
>>
>> Exactly... it is especially problematic if rdfs:range is involved.
>>
>>> Why not invent a new property for the list case, say :knowsList, and add
>>> a relationship between them as an RDF triple:
>>>
>>>   :knowsList :listPropertyOf schema:knows .
>>>
>>> where :listPropertyOf has the semantic condition that
>>>
>>>   aaa :listPropertyOf bbb .
>>>   xxx aaa ddd .
>>>   ddd schema:itemListElement yyy .
>>>
>>> imply
>>>
>>>   xxx bbb yyy .
>>
>> Yeah, that's very similar to an idea I had (but it wasn't so elegant). The
>> issue is that you won't discover :knowsList if you look for schema:knows
>> unless you infer the xxx bbb yyy triples. In other words, if you don't
>> know :knowsList and thus ignore it, you would neither find the collection
>> nor the schema:knows relationships.
>
> Hmm. I would be inclined to violate IRI opacity at this point and have a
> convention that says that any schema.org property schema:ppp can have a
> sister property called schema:pppList, for any character string ppp. So you
> ought to check schema:knowsList when you are asked to look for
> schema:knows. Then although there isn't a link in the conventional sense,
> there is a computable route from schema:knows to schema:knowsList, which as
> far as I am concerned amounts to a link.

In fact something very close to this was considered for the roles proposal I circulated yesterday, i.e. http://lists.w3.org/Archives/Public/public-vocabs/2014Mar/0111.html

The idea was to define a URI template pattern, e.g. http://schema.org/role/{propertyname}, so that '/actor' would be shadowed by '/role/actor', and the latter used when describing a situation involving 3 entities (movie, role, person) rather than a binary relationship between movie and person.
In this case, so far, we decided against introducing the complexity, but similar designs might prove appropriate for related problems.

Dan
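A sketch of the pattern under discussion in Turtle (:knowsList, :listPropertyOf and the ex: entities are hypothetical, taken from the thread; schema:ItemList and schema:itemListElement are real schema.org terms):

```turtle
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

# Hypothetical sister property for the list case, linked to the
# binary property it shadows:
ex:knowsList ex:listPropertyOf schema:knows .

# Alice knows three people, stated once via an ItemList:
ex:alice ex:knowsList ex:friends .
ex:friends a schema:ItemList ;
    schema:itemListElement ex:bob, ex:carol, ex:dave .

# Under Pat's proposed semantic condition, these triples entail:
#   ex:alice schema:knows ex:bob, ex:carol, ex:dave .
```

The discovery problem Markus raises is visible here: a consumer querying only for schema:knows finds nothing unless it either performs that inference or knows the knows -> knowsList naming convention.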
Re: Schema.org v1.0e published: Order schema, Accessibility properties
On 4 December 2013 23:07, Aaron Bradley aaran...@gmail.com wrote:
> Swell stuff!
>
> Are there plans to bring the previously published draft specification on
> the Google Developer site [1] in line with this new specification on
> schema.org?
>
> Properties only on schema.org:
> confirmationNumber
> discount / discountCode / discountCurrency
> isGift
> orderedItem
> paymentDue / PaymentMethod / PaymentMethodID / paymentUrl
>
> Properties only on Google Developer:
> price / priceCurrency / priceSpecification
>
> As well - and probably the most noticeable difference - the Google version
> uses the property seller instead of the schema.org property merchant.
>
> Because the earlier version exists on Google Developer I know this is
> chiefly a Google-esque issue, but insofar as there's now a published
> version of schema.org/Order *on* schema.org, it would obviously be mutually
> advantageous if the seller/merchant property nomenclature was normalized -
> perhaps it's in the works.

As a first step we'll get a link from the Google work-in-progress docs to the final finished thing. I can't say for sure how long until various Google products understand the new vocabulary, although Google's Structured Data Testing Tool should at least already not complain when it sees new (v1.0d, v1.0e) terms. Work in progress!

Dan

[1] https://developers.google.com/gmail/actions/reference/order#specification

On Wed, Dec 4, 2013 at 8:57 AM, Kingsley Idehen kide...@openlinksw.com wrote:
> On 12/4/13 10:54 AM, Pierre-Yves Vandenbussche wrote:
>> Hi all, you can find more information on what has changed since the last
>> version here: http://lov.okfn.org/dataset/lov/dif/dif_schema_1.0d-1.0e.html
>> The Schema.org entry on LOV is updated as well (versions file and
>> difference can be found on the timeline):
>> http://lov.okfn.org/dataset/lov/details/vocabulary_schema.html
>> Regards, Pierre-Yves.
>
> Awesome on both fronts re. schema.org version 1.0e and LOV's cool delta
> page!
>
> Kingsley
On Wed, Dec 4, 2013 at 3:30 PM, Dan Brickley dan...@danbri.org wrote:
> Schema.org version 1.0e has been published. This release includes a schema
> for describing Orders, see http://schema.org/Order as well as the
> Accessibility properties for http://schema.org/CreativeWork pre-announced
> recently, http://lists.w3.org/Archives/Public/public-vocabs/2013Nov/0190.html
> (blog post on its way). It also fixes a small bug with
> http://schema.org/validFrom (1.0d made the text overly focussed on Civic
> Actions; we revert the expected type back to DateTime). As always, a
> machine-readable RDFa dump of the entire schema is available at
> http://schema.org/docs/schema_org_rdfa.html and bugfixes, discussion etc.
> are welcomed here. Many thanks to everyone who was involved!
>
> Dan (trying to get in first with an announcement for a change ;)

--
Regards,
Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Re: List membership - more women
On 24 June 2013 10:34, Isabelle Augenstein i.augenst...@sheffield.ac.uk wrote:
> Hi Dominic,
>
> I only joined the list a few months ago, so my observations might be
> inaccurate, but
> - Overall, most discussions on the list seem to be rather philosophical
> (What is Linked Data? Does Linked Data require RDF?), which are not the
> kind of discussions I was hoping for when I joined the list in the first
> place

Quite. A lot of the initial enthusiasm about Linked Data was associated with a despair some felt about the Semantic Web slogan, which had got itself associated with overly-academic, complex-KR-obsessed and other unworldly concerns. I suspect this sort of churn is a natural part of the lifecycle of standards work; some are starting to feel about public-lod the same way.

> - My guess would be that the ratio between subscribers and people posting
> on the list is rather low in general, in addition to few women being
> subscribed to the list (But I bet we can get some statistics for that?)

There are just over 1000 subscribers to the list (no gender figures available for those). You can see from http://lists.w3.org/Archives/Public/public-lod/2013Jun/author.html who the most vocal participants are.

Dan
Re: The Great Public Linked Data Use Case Register for Non-Technical End User Applications
On 24 June 2013 14:31, Kingsley Idehen kide...@openlinksw.com wrote:
> On 6/24/13 2:14 AM, Michael Brunnbauer wrote:
>> Hello Kingsley Idehen,
>> On Sun, Jun 23, 2013 at 05:32:00PM -0400, Kingsley Idehen wrote:
>
> We don't need a central repository of anything. Linked Data is supposed to
> be about enhancing serendipitous discovery of relevant things.

You appear to be arguing against the simple, useful practice of communally collecting information. Just because we can scatter information around the Web and subsequently aggregate it doesn't mean that such fragmentation is always productive. I don't see anyone arguing that the only option is to monolithically centralise everything forever; just that a communal effort on cataloguing things might be worth the time.

> Google already demonstrates some of this, in the most obvious sense via its
> search engine, and not so obviously via its crawling of Linked Data which
> then makes its way into the Google Knowledge Graph and G+ etc..

- http://en.wikipedia.org/wiki/Citation_needed

You've sometimes said that all Web pages are already Linked Data with boring link-types. Are you talking about something more RDFish in this case?

Dan
Are Topic Maps Linked Data?
Just wondering, Dan
Re: The Great Public Linked Data Use Case Register for Non-Technical End User Applications
On 23 June 2013 23:46, Kingsley Idehen kide...@openlinksw.com wrote:
> On 6/23/13 5:36 PM, Barry Norton wrote:
>> Are you confusing Linked Data and Linked Open Data?
>
> Of course not! Web-like structured data enhanced with explicit entity
> relationship semantics enables serendipitous discovery at the public or
> private level. "Open" has nothing to do with "Public". "Open" is about
> standards and the interoperability they accord.

What part of http://www.w3.org/wiki/index.php?title=SweoIG/TaskForces/CommunityProjects/LinkingOpenData&oldid=35551 am I misunderstanding? The early LOD collaborations had a clear emphasis on "open" in the sense of freely available data. I can see merit in broadening that, but to say it "has nothing to do with" seems at odds with how a lot of people appeared to be understanding the initiative.

Dan

Interlinking Open Data on the Semantic Web
Chris Bizer, Richard Cyganiak

*1. Please provide a brief description of your proposed project.*

The Open Data Movement (http://en.wikipedia.org/wiki/Open_Data) aims at making data freely available to everyone. There are already various interesting open data sources available on the Web. Examples include Wikipedia (http://www.wikipedia.org/), Wikibooks, Geonames (http://www.geonames.org/), MusicBrainz (http://musicbrainz.org/), WordNet (http://wordnet.princeton.edu/online/), the DBLP bibliography (http://www.informatik.uni-trier.de/~ley/db/) and many more which are published under Creative Commons (http://creativecommons.org/) or Talis (http://www.talis.com/tdn/tcl) licenses.

The goal of the proposed project is to make various open data sources available on the Web as RDF and to set RDF links between data items from different data sources. There are already some data publishing efforts. Examples include the dbpedia.org project (http://dbpedia.org/docs/), the Geonames Ontology (http://www.geonames.org/ontology/) and a D2R Server publishing the DBLP bibliography (http://www4.wiwiss.fu-berlin.de/dblp/).
There are also initial efforts to interlink these data sources. For instance, the dbpedia RDF descriptions of cities include owl:sameAs links to the Geonames data about the city (1) (http://dbpedia.org/docs/#link). Another example is the RDF Book Mashup (http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/), which links book authors to paper authors within the DBLP bibliography (2) (http://lists.w3.org/Archives/Public/semantic-web/2006Dec/0022).

*2. Why did you select this particular project?*

For demonstrating the value of the Semantic Web it is essential to have more real-world data online. RDF is also the obvious technology to interlink open data from various sources.

*3. Why do you think this project will have a wide impact?*

A huge inter-linked data set would be beneficial for various Semantic Web development areas, including Semantic Web browsers and other user interfaces, Semantic Web crawlers, RDF repositories and reasoning engines. Having a variety of useful data online would encourage people to link to it and could help bootstrap the Semantic Web as a whole.

Dan
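The interlinking style described above looks like this in Turtle (a sketch; the particular city and Geonames identifier are illustrative, though DBpedia and Geonames do publish URIs of this shape):

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

# Assert that DBpedia's and Geonames' identifiers for a city
# name the same real-world thing:
dbpedia:Berlin owl:sameAs <http://sws.geonames.org/2950159/> .
```

A single triple like this is what lets a crawler or browser hop from one dataset's description of the city to another's.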
Re: Linked Data discussions require better communication
On 20 June 2013 18:54, Giovanni Tummarello giovanni.tummare...@deri.org wrote:
> My 2c is .. i agree with kingsley diagram, linked data should be possible
> without RDF (no matter serialization) :) however this is different from
> previous definitions i think its a step forward.. but it is different from
> previously. Do we want to call it Linked Data 2.0? under this definition
> also schema.org marked up pages would be linked data .. and i agree plenty
> with this .

Schema.org pages are already RDF and, imho, Linked Data, as was FOAF even when (shock horror!) the graph contains bNodes. Nothing in TimBL's original note _forces_ you to always use URIs for every node in the graph. It does advocate strongly for lots of URIs and for machine-friendly data available from using them.

To be clear, Schema.org is based on RDF. We just choose our moments for when to emphasize this, and when to focus on other practicalities. I'd draw an analogy with Unicode. It's there in the background and helps tie things together, even if you don't always need to be emphasizing it when talking about things that use it.

Dan
Re: Monitoring subscribers on the list
On 18 June 2013 15:43, Barry Norton barry.nor...@ontotext.com wrote:
> Does anyone know if the number of subscribers on the list can be monitored?
>
> I have a limited degree of monitoring, for the EUCLID project, through the
> RSS feed and Web scraping, but I'm struggling to measure:
>
> 1) what fraction of subscribers the vocal minority of posters are;
> 2) how unsubscriptions correlate with the length of current threads.

I have access to a list admin tool that gives me the current count. I don't believe time-series data is easily available. The list has 1063 subscribers/survivors currently. Semantic-Web@ has 1344; the defunct www-rdf-logic has 433. I've no idea how many bounce (I believe some bouncing can cause auto-unsubscription).

You can approximate frequent-poster stats manually from e.g. http://lists.w3.org/Archives/Public/public-lod/2013Jun/author.html ... I'm not aware of a machine-friendly version.

I don't unsubscribe from lists any more, I just pipe them into folders. I guess others do the same?

Dan
Re: CFP: Data In Web Search (DISH) Workshop - 13th May 2013, Rio de Janeiro, Brazil
Just to let you know, the Workshop papers deadline is extended until March 4th 2013. Please don't ask me what time of day on March 4th!

--Dan

On 9 January 2013 18:22, Dan Brickley dan...@danbri.org wrote:

[I don't often crosspost to 3 W3C lists, but I think this will be an important event and hope to see some of you there... --Dan]

CFP: Data In Web Search (DISH) Workshop - 13th May 2013, Rio de Janeiro, Brazil

Workshop: http://dish2013.foaf-project.org/
Conference: http://www2013.org/

This WWW2013 Workshop focuses on new approaches to using structured data for improving Web search. Most Web documents and queries are about entities and the relationships between them, i.e., structured data with documented semantics. However, popular search engines have historically ignored structured data, instead relying on techniques that model documents and queries as bags of words. Recent developments, most notably the dramatic increase in the use of structured data markup on web pages, have led to substantial interest from mainstream search engines. However, we are still in the very early stages of the evolution of how search engines use this structured data. Most of the current work is focussed on searching databases of facts about entities and presenting them either alongside the search results, or on annotating search results with additional data. The core problems of utilizing knowledge about entities for improving the ranking of documents, helping set the user context, etc. are still largely unexplored territory.

While the use of structured data is still limited in Web search engines, active research in this direction can be observed in many communities. Most notably, there is a broad range of solutions proposed by IR, database, and Semantic Web researchers for exploiting structured data for various search tasks.
The goal of this Workshop is to bring these communities together to focus on the central question of how to make these solutions applicable to Web search engines. The central theme of the workshop is to explore new and novel ways of exploiting explicit representations of entities, and the relationships between them, to improve Web search.

Important Dates

Workshop proceedings will be published through the ACM Digital Library, with associated tight production deadlines:

* February 23rd 2013: Workshop paper deadline
* March 13th 2013: Workshop paper notifications
* April 2nd 2013: Workshop paper final copy
* WWW2013 Conference: May 13-17th 2013, Rio de Janeiro, Brazil
* Workshop day: May 13th 2013

Topics

Three main directions of semantic search have emerged. The first is the use of structured data to augment traditional web search results and the search results page. The second is to use the structured data to directly deliver results to search requests and to answer questions. The third is to use knowledge about a domain to affect the ranking of results. This workshop targets all these directions of semantic document retrieval and semantic data retrieval, but puts special emphasis on the web search context.

Possible topics for submission include, but are not limited to:

* Structured data for Web document retrieval
* Entity/relation aware document and query models
* Entity/relation aware matching and ranking
* Use of structured data for building vertical search engines
* Web data retrieval
* Searching structured data with textual queries
* Novel applications of structured data to augment search results
* Evaluation methodologies

Submissions

Workshop papers should be submitted by Feb 23rd 2013 using EasyChair (we are 'dish2013' there), see https://www.easychair.org/conferences/?conf=dish2013

The organizers can be contacted at dish-workshop-organiz...@googlegroups.com in case of technical issues with the submission process.
Due to the tight schedule, please don't ask for extensions!

Organization

We invite posters and papers of max. 6 pages presenting new ideas for how structured data can be used in search, preferably with working demos. Accepted papers will be published as part of the International Conference Proceedings Series (ICPS) of the ACM Digital Library. We plan to accept 8-12 papers and organize a full-day event, roughly half devoted to each of the two approaches. Each session will have significant time set aside for discussion. There will also be a poster session. Attendance will be open to the public (via WWW2013 Workshops registration, http://www2013.org/registration/). We plan to have one or two invited talks.

Advisory Board

* Krisztian Balog, NTNU, Norway
* Charlie Jiang, Bing, USA
* Steve Macbeth, Bing, USA
* Pavel Serdyukov, Yandex, Russia
* Alexander Shubin, Yandex, Russia
* Arjen P. de Vries, Delft University of Technology, Holland

Program Committee (in progress - awaiting confirmations)

* Vineet Gupta, Google, USA
* Alon Halevy, Google, USA
Linked Data RDFa
With RDFa maturing (RDFa 1.1, particularly Lite), I wanted to ask here about attitudes to RDFa. I have somehow acquired the impression that in the Linked Data scene, people lean more towards the classic 'a doc for the humans, another for the machines' partitioning model. Perhaps this is just a consequence of history; digging around some old rdfweb/foaf discussions [1], I realise just how far we've come. RDFa wasn't an option for a long time; but it is now.

So - questions. How much of the linked data cloud is expressed in some variant of HTML+RDFa, alongside RDF/XML, Turtle etc.? When/if you do so, are you holding some data back and keeping it only in the machine-oriented dumps, or including it in the RDFa? Are you finding it hard to generate RDFa from triple datasets because it's 'supposed' to be intermingled with human text? What identifiers (if any) are you assigning to real-world entities? Dataset maintainers... as you look to the future, is RDFa in your planning? Did/does Microdata confuse the picture?

I'm curious where we are with this...

Dan

[1] http://lists.foaf-project.org/pipermail/foaf-dev/2000-September/004222.html
http://web.archive.org/web/20011123075822/http://rdfwebring.org/2000/09/rdfweblog/example.html
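For anyone unfamiliar with the single-document alternative being asked about, it looks roughly like this with RDFa Lite (a sketch; vocab/typeof/property/resource are the real RDFa Lite 1.1 attributes, the person and URIs are made up):

```html
<!-- One page serves both audiences: humans read the prose,
     machines extract FOAF triples from the attributes. -->
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person"
     resource="http://example.org/people/alice#me">
  <p>Hi, I'm <span property="name">Alice</span>.
     My homepage is
     <a property="homepage" href="http://example.org/alice/">here</a>.</p>
</div>
```

The question above is essentially whether publishers prefer this intermingled style over serving a human HTML page and a separate Turtle/RDF/XML document for the same entity.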
Re: Breaking news: GoodRelations now fully integrated with schema.org!
On 8 November 2012 22:43, Guha g...@google.com wrote:
> Thank you Martin for the great collaboration. Look forward to more. And on
> our side, it was really Dan Brickley who did the work. Thank you Dan.

Well, in fact it was Cenk Gazen who did the hard and interesting work on the schema.org side (and Martin, of course, for the epic editorial work around GR). But a few words on the site-internal RDFa system now in passing, as it is also progress in its own right:

This latest build of schema.org uses a different approach from previous updates. Earlier versions (apart from health/medicine) were relatively small, and could be hand-coded. With GoodRelations, the approach we took was to use an import system that reads schema definitions expressed in HTML+RDFa/RDFS and generates the site as an aggregation of these 'layers'. In other words, schema.org is built by a system that reads a collection of schema definitions expressed using W3C standards. The public site is also now more standards-friendly, aiming for 'polyglot' HTML that works as both HTML5 and XHTML, and you can find an RDFa view of the overall schema at http://schema.org/docs/schema_org_rdfa.html

I'm really happy to see GoodRelations go live, and look forward to catching up on the other contributions that are in the queue. The approach will be to express each of these in HTML/RDFa/RDFS and make some test sites on Appspot that show each proposal 'in place', and in combination with other proposals. Since schemas tend to overlap in coverage, this is really important for improving the quality and integration of schema.org as we grow. While it took us a little while to get this mechanism in place, I'm glad we now have this standards-based machinery that will help us scale up the collaboration around schema.org.

Thanks again to all involved,

Dan
Re: ANN: WebDataCommons.org - Offering 3.2 billion quads of current RDFa, Microdata and Microformat data extracted from 65.4 million websites
On 17 April 2012 18:56, Peter Mika pm...@yahoo-inc.com wrote:
> Hi Martin,
>
> It's not as simple as that, because PageRank is a probabilistic algorithm
> (it includes random jumps between pages), and I wouldn't expect that
> wayfair.com would include 2M links on a single page (that would be one very
> long webpage). But again, to reiterate the point, search engines would want
> to make sure that they index the main page more than they would want to
> index the detail pages.
>
> You can do a site query to get a rough estimate of the ranking without a
> query string: search.yahoo.com/search?p=site%3Awayfair.com
> You will see that most of the pages are category pages. If you go to the
> 2nd page and onward you will see an estimate of 1900 pages indexed.
>
> Of course, I agree with you that a search engine focused on structured
> data, especially if domain-specific, might want to reach all the pages and
> index all the data. I'm just saying that current search engines don't, and
> CommonCrawl is mostly trying to approximate them (if I understand correctly
> what they are trying to do).

According to http://commoncrawl.org/faq/

"What do you intend to do with the crawled content?

Our mission is to democratize access to web information by producing and maintaining an open repository of web crawl data that is universally accessible. We store the crawl data on Amazon's S3 service, allowing it to be bulk downloaded as well as directly accessed for map-reduce processing in EC2."

No mention of search as such. I'd imagine they're open to suggestions, and that the project (and crawl) could take various paths as it evolves (with corresponding influence on the stats...). Our problem here is in figuring out what can be taken from such stats to help guide linked data vocabulary creation and management. Maybe others will do deeper, focussed crawls, who knows? But it's great to see this focus on stats lately; I hope others have more to share.

Dan
Re: ANN: WebDataCommons.org - Offering 3.2 billion quads current RDFa, Microdata and Microformat data extracted from 65.4 million websites
How about adding a disclaimer line to the webdatacommons.org site like: "Note that many database-backed sites contain a huge long tail of rarely-visited, rarely-linked pages (e.g. product catalogues) which increasingly contain useful structured data. It is best not to assume that this collection contains a complete, deep crawl of every site it touches." Dan
Re: See Other
On 28 March 2012 14:24, David Wood da...@3roundstones.com wrote: Hi Dan, On Mar 27, 2012, at 21:30, Dan Brickley wrote: On 27 March 2012 20:23, Melvin Carvalho melvincarva...@gmail.com wrote: I'm curious as to why this is difficult to explain. Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important.

Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look... [data stick changes hands]

Alice: Cool! And... yup, it's well-formed XML, and here, see, I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok, so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human-facing pages as well as machine-oriented RDF ones too.

Bob and Alice can stop at this point, throw the RDF/XML at Callimachus, write some templates in XHTML/RDFa and be done. They would get themeable human-oriented HTML, conneg for RDF/XML and Turtle, one URI per term, REST API CRUD, management with user accounts...

Ok, ... up for a simple challenge then? In http://schema.org/JobPosting we say that a job posting (likely expressed in HTML + microdata, or for that matter HTML + RDFa) can have an occupationalCategory property, whose values are drawn from an existing scheme: "Category or categories describing the job. Use BLS O*NET-SOC taxonomy: http://www.onetcenter.org/taxonomy.html. Ideally includes textual label and formal code, with the property repeated for each applicable value."
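For concreteness, here is one way such a posting might carry repeated occupationalCategory values in microdata, sketched as a small generator. The O*NET-SOC codes and labels below are illustrative examples; a real publisher should check the current taxonomy listing:

```python
# Sketch: emitting a schema.org/JobPosting occupationalCategory in microdata,
# with the property repeated once per applicable value, as the schema.org
# description suggests. Codes/labels here are illustrative examples.
def occupational_category(code, label):
    return ('<span itemprop="occupationalCategory">'
            f'{code} {label}</span>')

categories = [("15-1131.00", "Computer Programmers"),
              ("15-1132.00", "Software Developers, Applications")]

html = ['<div itemscope itemtype="http://schema.org/JobPosting">']
html += ['  ' + occupational_category(c, l) for c, l in categories]
html.append('</div>')
snippet = "\n".join(html)
```

Note this answers only the easy half of the challenge: it embeds "textual label and formal code" as a string, but makes no real hyperlink into the taxonomy site, which is exactly the gap discussed below.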
If you dig around on that link you can find PDF and XLS files at http://www.onetcenter.org/reports/Taxonomy2010.html So let's take http://www.onetcenter.org/dl_files/Taxonomy2010_AppA.xls ... it shows a table with pairs of codes and labels, and a kind of implied hierarchy. Say we wanted those in linked data (SKOS, most likely), ... how should the pages and URIs look? Can we do something better than point to .xls and .pdf files? What advice would we give the administrators of that site for publishing (annual versions of...) their job taxonomy codes? How would/could/should an actual job listing on a jobs site look? Would it have a real hyperlink into the taxonomy site? Or just a textual property? What kind of standard templates can be offered to make such things less choice-filled? How would we do the same with, say, country codes? cheers, Dan
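A minimal sketch of turning such (code, label) rows into SKOS Turtle. The example.org URI pattern is hypothetical, since choosing the real one is exactly the open question posed above:

```python
# Sketch: (code, label) rows from the taxonomy spreadsheet -> SKOS Turtle.
# The URI pattern is hypothetical (NOT onetcenter.org policy); the two
# sample rows are illustrative.
def concept_uri(code):
    return f"<http://example.org/onet-soc/2010/{code}>"

def to_skos(rows):
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for code, label in rows:
        lines.append(f"{concept_uri(code)} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{label}"@en ;')
        lines.append(f'    skos:notation "{code}" .')
    return "\n".join(lines)

rows = [("11-0000", "Management Occupations"),
        ("11-1011.00", "Chief Executives")]
turtle = to_skos(rows)
```

The harder decisions (hash vs. slash URIs, recovering the implied hierarchy as skos:broader, annual versioning) are exactly what the questions above leave open.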
Re: {Disarmed} Re: See Other
On 28 March 2012 14:28, Hugh Glaser h...@ecs.soton.ac.uk wrote: I can't find any apps (other than mine) that actually use this. Searching: Sindice: http://sindice.com/search?q=http://graph.facebook.com 40 (forty) results. Bing: http://www.bing.com/search?q=%22http://graph.facebook.com/%22 8400 results. I don't think this activity has actually set the world alight yet - people are quite excited from what you call the Structured Data point of view, but little or no Linked Data. And it has been around for a little while now. And my (unproven) hypothesis is that Sindice would be finding these links all over the place if Facebook had been encouraged to do it differently. I'm not knocking it - you are right - it is really great they have done it. But I think we could have helped them do it better.

I doubt the issue is 'help'. A structured data description of a network of hundreds of millions of people, ... but without the links, ... is kinda missing something. At which point we're deep in privacy and OAuth etc. territory; it wouldn't be proper, appropriate or polite to dump the social graph fully public anyhow. But a social network dataset without the network isn't going to set the world afire with excitement. Even with FOAF, where we got pretty substantial social graph datasets (LiveJournal, My Opera etc.) in public since 2004 or so, ... frankly very few managed to find interesting uses of that huge bulk of data. And not because it was in RDF/XML or because there were bnodes. It's much much harder to make compelling, useful apps with this stuff than it is to make proof-of-concept demos. Dan
See Other
On 27 March 2012 20:23, Melvin Carvalho melvincarva...@gmail.com wrote: I'm curious as to why this is difficult to explain. Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important.

Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look... [data stick changes hands]

Alice: Cool! And... yup, it's well-formed XML, and here, see, I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok, so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human-facing pages as well as machine-oriented RDF ones too.

Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa, right? Which version?

Alice: Well, RDFa yes; microdata is a kind of cousin, a mix of thinking from the RDFa and microformats communities. But I meant that you'd make a version of each page for computers to use (RDF/XML like your test export here), ... and you'd make some kind of HTML page for more human readers also. The stuff you mention is more about doing both within the same format...

Bob: Great. Which one's the most standard? What should I use?

Alice: Well, I guess it depends what you mean by standard.
[skips digression about whatwg and w3c etc. notions of standards process] [skips digression about XHTML vs XML-ish polyglot HTML vs resolutely non-XML HTML5 flavours] [skips digression about qnames in HTML and RDFa 1.1 versus 1.0] ...you might care to look at using a basic HTML5 document with, say, the Lite version of RDFa 1.1 (which is pretty much finished but not an official stable standard yet at W3C).

Bob: [makes a note]. Ok, but that's just the human-facing page, anyway. We'd put up RDF/XML for machines too, right? Well, maybe that's not necessary, I guess. I was reading something about GRDDL and XSLT that automates the conversion, ... should we maybe generate the RDF/XML from the HTML+RDFa or vice versa? Or just have some PHP hack generate both from MySQL, since that's where the stuff ultimately lives right now anyway...?

Alice: Um, well, it's pretty much your choice. Do you need RDF/XML too? Well... maybe, not sure... it depends. There are more RDF/XML parsers around, they're more mature, ... but increasingly tools will consume all kinds of data as RDF. So it might not matter. Depends why you're doing this, really.

Bob: Er, ok, maybe we ought to do both for now, ... belt-and-braces, ... maybe watch the stats and see what's being picked up? I'm doing this because of the promise of interestingly unexpected re-use and so on, which makes details hard to predict by definition.

Alice: Sounds like a plan. Ok, so each node in your RDF graph, ... we'll need to give it a URI. You know, that's like the new word for URL, but one that includes identifiers for real-world things too.

Bob: Sure sure, I read that. Makes sense. And I can have a URI, my homepage can have a URI, I'm not my home page, blah-de-blah?

Alice: You got it.

Bob: Ok, so what URLs should I give the concepts in this thesaurus? They've got all kinds of strings attached, but we've also got nicely managed numeric IDs too.

Alice: Right, so maybe something short (URIs can never be too short...), ...
so maybe if you host at your example.org server, http://example.org/demothes/c1 then same but /c2, /c3, etc. ... or, well, you could use #c1 or #c2 etc. That's pretty much up to you. There are pros and cons in both directions.

Bob: Whatever's easiest. It's a pretty plain apache2 setup, with PHP if we want it, or we can batch-create files if that makes more sense; this data doesn't change much.

Alice: Well, how big is the thesaurus...?

Bob: A couple thousand terms, each with a few relations and bits of text; maybe more if we dig out the translations (hmm, should we language-negotiate those somehow?)

Alice: Let's talk about that another day, maybe?

Bob: And hmm, the translations are versioned a bit differently? Should we put version numbers in somewhere so it's unambiguous which version of the translation we're using?

Alice: Let's talk about that another day, too.

Bob: OK, where were we? http://example.org/demothes/c1 ... sure, that sounds fine. ... we'd put some content-negotiated Apache thing there, and make c1 send HTML if there's a browser, or RDF/XML if they want that stuff instead? Default to the browser / HTML version maybe?

Alice: Something like that could work. There are some howtos around. Oh, but if c1 isn't an information resource, you'll need to redirect with a 303 HTTP code. It's like you said with people and homepages, to make clear which is which.

Bob:
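The arrangement Alice and Bob are converging on can be sketched as a toy decision function. This is not real Apache config; the /demothes/c1 paths follow Bob's hypothetical scheme, and the "extension picks the format" convention is just one possible design:

```python
# Toy sketch of the conneg + 303 pattern from the dialogue (illustrative,
# not a drop-in server config): a concept URI like /demothes/c1 names a
# thing, not a document, so it 303-redirects to a document URI; the
# document URIs then serve HTML or RDF/XML.
def respond(path, accept):
    if path.startswith("/demothes/c") and not path.endswith(".html") \
            and not path.endswith(".rdf"):
        # c1 isn't an information resource: 303 See Other to a document,
        # choosing the variant from the Accept header (default: HTML).
        ext = ".rdf" if "application/rdf+xml" in accept else ".html"
        return 303, {"Location": path + ext}
    ctype = "application/rdf+xml" if path.endswith(".rdf") else "text/html"
    return 200, {"Content-Type": ctype}

respond("/demothes/c1", "application/rdf+xml")
# a 303 redirect pointing at the RDF/XML document for concept c1
```

Had Bob chosen the #c1 hash-URI style instead, no redirect machinery would be needed at all, which is one of the pros-and-cons Alice alludes to.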
Re: Annotating IR of any relevance? (httpRange-14)
On 26 March 2012 08:51, Giovanni Tummarello giovanni.tummare...@deri.org wrote: Is annotating IRs of *any practical value and role today*? Anything of value and core interest to wikipedia, imdb, bestbuy, bbc, geonames, rottentomatoes, lastfm, facebook, whatever, is a NIR. We are talking people, products. Everything on the LOD cloud (for what it matters) is all NIR. Even pictures, comments, and text are easily seen and BEST INTERPRETED as NIR: they're not just the bytes they're composed of, they're the full record of their creation, the concept of a message. A facebook picture is a full record of content, comments, tags, multiple resolutions etc. The mere stream OF BYTES (the IR) IS JUST A DETAIL that, if it REALLY needs to be annotated, ... it can be, no problem, with proper attributes hasResolution, hasCopyright; ok, I guess that refers to an IR then.

I see where you're coming from here, but will be agnostic for now on that point. Instead, I'd like to draw attention to the distressing fact that we don't even seem as a community to be clear on what is meant by IR. Is IR the mere stream of BYTES, ... or some (slightly) higher abstraction? The OO picture of HTTP/REST I mentioned here recently, for example, has the IR be the hidden-behind-service object whose state we get authoritative samples of via HTTP messages. Making a new http-range-14 agreement without having a common terminology doesn't fill me with hope. Quite different notions of IR are bouncing around here. I tend to read 'IR' as something approximating 'Web-serializable networked entity'; sounds like you're equating it more directly with the content that is sent over the wire? Dan
Re: Annotating IR of any relevance? (httpRange-14)
On 26 March 2012 13:06, Michael Hopwood mich...@editeur.org wrote: Hi Dan, Giovanni, Thank you for this dialogue - I've been following this thread (or trying to!) for some days now and wondering "where is the data model in all this?". At the point where "Quite different notions of IR are bouncing around...", would it not make sense to focus on the fact that there are actually several well-established, intricately worked-out and *open* standard models that overlap at this domain, coming from different ends of the commerciality spectrum, and themselves based on consensus, pre-existing (for example, largely ISO) standards and solid database theory? I'm talking about CIDOC-CRM and Indecs, of course: www.cidoc-crm.org/ http://www.doi.org/topics/indecs/indecs_framework_2000.pdf The fact that these 2 models, apparently quite different in domain, converge on the event-based modelling approach, and both describe information resources and other types of real-world (it's fairly safe to say, all types of) resource in detail but without too much term bloat, would make them strong contenders for a consensus definition - or at the very least, to point towards the shape a consensus should take.

So I've been trying to drag FRBR into this conversation for some years now, http://www.frbr.org/2005/07/05/dan-brickley-and-the-w3c ... but not because it (or Indecs, CRM etc., which also have their charm) is good/better/best, ... rather to assert that different models, and levels of detail, make sense in different contexts. Simple flat records have their place; richer multi-entity structures have their place. If we can avoid the Web architecture itself picking a winner amongst these different ways of thinking about the results of content creation and publication activities, so much the better. The beauty of the Web architecture is its minimalism and pluralism; the challenge here is to bring more clarity to our discussion while preserving that.
But I quite agree that the terminologies from those models may help improve the quality of debate here... cheers, Dan
Re: The Battle for Linked Data
On 26 March 2012 16:49, Hugh Glaser h...@ecs.soton.ac.uk wrote: So What is Linked Data? I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model.

Considerations --- It's important to be open and inclusive. It's important to mention the webby graph data model without getting bogged down in the detail (RDF? which version? which format? OWL too?). It's important to mention standards. Sharing (intranets included!) is more important than 'publishing', or 'public', though the latter should be alluded to. If we stray too far from the graph data model and Web standards like URIs, we lose interop; if we stray too far into nerdy semweb RDF detail, we lose the mainstream audience. It's a balance, and it's for the market, not us, to say where the sweet spot lies. And if we start religiously forcing modeling idioms on the world, we lose credibility; no anti-bnode laws, or strictures about http-range-14. Some things are best left unspoken! Fashions will come and go; look at HTML frames and Flash splash screens. Good taste will triumph, without the Linked Data slogan needing to encode all its aspects.

'Linked Information', from a FOAFy perspective, is then the larger "let's share what we know" perspective (http://www.flickr.com/photos/danbri/4030764915/ etc.) in which we apply equal passion to the sharing of information that is in non-graph data formats, or in people's heads. Doing so brings the graph data model into a distinctively central role, since it can describe other data formats (GML files, spreadsheets, MP3s, videos, mysql dumps... RDF's original use case as metadata), and it can describe people and their characteristics. So we can be pro-RDF here without forcing it down people's throats... and we can be pro-data while admitting that there's vastly more to Web-based information sharing than triples, and more to 'sharing what we know' than sharing data.
cheers, Dan
Re: The Battle for Linked Data
On 26 March 2012 19:16, Dan Brickley dan...@danbri.org wrote: On 26 March 2012 16:49, Hugh Glaser h...@ecs.soton.ac.uk wrote: So What is Linked Data? I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model. Sorry, lost a bit. 'Linked Data' is the use of the Web and its standards to share documents that encode structured data, typically but not necessarily using a graph data model.
Re: The Battle for Linked Data
On 26 March 2012 20:13, Kingsley Idehen kide...@openlinksw.com wrote: On 3/26/12 2:16 PM, Dan Brickley wrote: I think this can be defused: 'Linked Data' is the use of the Web standards to share documents that encode structured data, typically but not necessarily using a graph data model. TimBL's Linked Data meme isn't about sharing, solely. What about whole data representation and the URI de-reference requirements? Ditto unambiguous URI-based naming etc.

Sure, but we don't need to pack our entire shopping list into one slogan. What's at the heart of it that gives it a distinctive character? The Web, ... Web-like data models (graphs), and the pragmatic use of standards to allow decentralised data to still be recombinable. I guess the problem is that Linked Data is quite generic when taken literally, and likewise in broader computer science discourse. Yup, taking just the words alone, all kinds of things could fit. We have to find the middle ground between overly specific and pointlessly vague. For me, that's something around the creative re-use of the standard Web infrastructure to exchange and interlink simple factual data expressed as graphs. Some might insist they're not just graphs, but RDF graphs. Others that CSV and random XML are fine (not least because they can be RDFized by consumers). But this is the territory we're marching up and down on. Thus, we have to deal with the question of what moniker best applies to the title of TimBL's Linked Data meme [1] and the best practices that it espouses. Maybe we'll end up referring to fine-grained structured data that adheres to said meme as *Hyperdata*. At the end of the day, that's a cleaner moniker anyway :-) That's a good one too, yep. Dan

Links: 1. http://www.w3.org/DesignIssues/LinkedData.html - original Linked Data meme. 2. http://en.wikipedia.org/wiki/Hyperdata - Wikipedia entry exists (it needs some cleaning up though).
-- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web:http://www.openlinksw.com Personal Weblog:http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca handle: @kidehen Google+ Profile:https://plus.google.com/112399767740508618350/about LinkedIn Profile:http://www.linkedin.com/in/kidehen
Re: Change Proposal for HttpRange-14
On 25 March 2012 11:03, Michael Brunnbauer bru...@netestate.de wrote: Hello Jeni, On Sun, Mar 25, 2012 at 10:13:09AM +0100, Jeni Tennison wrote: I agree we shouldn't blame publishers who conflate IRs and NIRs. That is not what happens at the moment. Therefore we need to change something. Do you think semantic web projects have been stopped because some purist involved did not see a way to bring httprange14 into agreement with the other intricacies of the project ? Those purists will still see the new options that the proposal offers as what they are: Suboptimal. Or do you think some purists have been actually blaming publishers ? [...] http://go-to-hellman.blogspot.co.uk/2009/10/new-york-times-blunders-into-linked.html comes close to doing so... though more around semantics of 'sameas' than IR/NIR. Dan
Re: Change Proposal for HttpRange-14
On 25 March 2012 20:26, Tim Berners-Lee ti...@w3.org wrote: On 2012-03-24, at 00:47, Pat Hayes wrote: I am sympathetic, but... On Mar 23, 2012, at 9:59 AM, Dave Reynolds wrote: The proposal is that URI X denotes what the publisher of X says it denotes, whether it returns 200 or not. And what if the publisher simply does not say anything about what the URI denotes? After all, something like 99.999% of the URIs on the planet lack this information. What, if anything, can be concluded about what they denote? The http-range-14 rule provides an answer to this which seems reasonably intuitive. What would be your answer? Or do you think there should not be any 'default' rule in such cases? Exactly. For example, to take an arbitrary one of the trillions out there, what does http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11 identify, there being no RDF in it? What can I possibly do with that URI if the publisher has not explicitly allowed me to use it to refer to the online book, under your proposal? Pat

Just to follow up on this specific example with the current actual details: (aside: in my mailer I'm replying to TimBL but all the most recent text seems attributed to Pat; maybe some mangling occurred?)
I can't see a mechanical way to find this, but I happened to know about http://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Project_Gutenberg_Catalog_in_RDF.2FXML_Format ...which guides us to http://www.gutenberg.org/ebooks/2701.rdf and via HTTP 302 from there to

  <p>The document has moved <a href="http://www.gutenberg.org/cache/epub/2701/pg2701.rdf">here</a>.</p>

It uses xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/" and other vocabs to say, amongst other things:

  <pgterms:ebook rdf:about="ebooks/2701">
    <dcterms:creator rdf:resource="2009/agents/9"/>
    <dcterms:description>See also Etext #2489, Etext #15, and a computer-generated audio file, Etext #9147.</dcterms:description>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.epub.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.kindle.noimages"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.plucker"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.qioo"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/ebooks/2701.txt.utf8"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h.zip"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701-h/2701-h.htm"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.txt"/>
    <dcterms:hasFormat rdf:resource="http://www.gutenberg.org/files/2701/2701.zip"/>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2001-07-01</dcterms:issued>
    <dcterms:language rdf:datatype="http://purl.org/dc/terms/RFC4646">en</dcterms:language>
    <dcterms:license rdf:resource="license"/>
    <dcterms:publisher>Project Gutenberg</dcterms:publisher>
    <dcterms:rights>Public domain in the USA.</dcterms:rights>
    <dcterms:subject>
      <rdf:Description>
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Adventure stories</rdf:value>
        <rdf:value>Ahab, Captain (Fictitious character) -- Fiction</rdf:value>
        <rdf:value>Allegories</rdf:value>
        <rdf:value>Epic literature</rdf:value>
        <rdf:value>Sea stories</rdf:value>
        <rdf:value>Whales -- Fiction</rdf:value>
        <rdf:value>Whaling -- Fiction</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>

  <pgterms:agent rdf:about="2009/agents/9">
    <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1819</pgterms:birthdate>
    <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1891</pgterms:deathdate>
    <pgterms:name>Melville, Herman</pgterms:name>
    <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/Herman_Melville"/>
  </pgterms:agent>

I found this by finding the item number 2701 from inspection of the original link, and plugging it into the metadata template from their human-oriented documentation. The RDF I found makes assertions about various related URLs and things, but nothing that ties directly back to the initial URL. Worse, we've not even any evidence that the RDF doc and the other docs are in the same voice, same publisher, or author etc. Seems a great shame they went to the trouble of publishing quite a rich description of this fine work, and yet it's not easy to find by the machines that could make use of it. Dan
Re: Change Proposal for HttpRange-14
2012/3/23 Melvin Carvalho melvincarva...@gmail.com: 2012/3/23 Giovanni Tummarello giovanni.tummare...@deri.org 2012/3/23 Sergio Fernández sergio.fernan...@fundacionctic.org: Do you really think that basing your proposal on the usage of a Powder annotation is a good idea? Sorry, but IMHO HttpRange-14 is a good enough agreement. yup, performed brilliantly so far, nothing to say. Industry is flocking to adoption, and what a consensus. +1 'Brilliantly' is an understatement :) And we're probably still only towards the beginning of the adoption cycle! I don't think even the wildest optimist could have predicted the success of the current architecture (both pre and post HR14).

Oh dear, so now I don't know any more if Gio was being sarcastic! Linked Data is a brilliant success, despite the burden of http-range-14. Is a SKOS Concept an Information Resource? Must its URIs 303 redirect? Is a # pointing into an RDFa page OK? We don't make this stuff easy. http-range-14 has long been an embarrassment. Just now all the critics get invited to try to do a better job, which isn't as easy as it looks :) Dan
Re: Change Proposal for HttpRange-14
On 23 March 2012 14:33, Pat Hayes pha...@ihmc.us wrote: On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote: I am a bit dismayed that nobody seems to be picking up on the point I've been hammering on (TimBL and others have also pointed it out), that, as shown by the Flickr and Jamendo examples, the real issue is not an IR/NIR type distinction, but rather a distinction in the *manner* in which a URI gets its meaning, via instantiation (of some generic IR) on the one hand, vs. description (of *any* resource, perhaps even an IR) on the other. The whole information-resource-as-type issue is a total red herring, perhaps the most destructive mistake made by the httpRange-14 resolution. +1000. There is no need for anyone to even talk about information resources. The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response.

Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived ('ebooks' will soon just be 'books', just as 'email' became 'mail'), and slipping into general opinion here that's not particularly directed at Pat: I assume you're emphasising the physical notion of book. Perhaps 'person' is even more obviously physical (though heavily tattoo'd people have some commonalities with books). The Web architecture that I first learned was explained to me (HTTP-NG WG era) in terms familiar from the Object Oriented style of thinking about computing (and a minor religion at the time too). The idea is that the Web interface is a kind of encapsulation.
External parties don't get direct access to the insides; it's always mediated by HTTP GET and other requests. Just as in Java you can expose an object's data internals directly, or hide them behind getters and setters, same with Web content. So a Web site might encapsulate a coffee machine, teapot or toaster; a CSV file, SGML repository, perl script or whatever. That pattern allowed the Web to get very big, very fast; you could wrap it around anything. In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on this view described, in which the hidden innards of a Web object are constrained to be 'data': "When we think of the Web today, the idea of a 'resource' comes to mind. In general, a resource is an Object that has some methods (e.g. in HTTP, Get, Head and Post) that can be invoked on it. Objects may be stateful in that they have some sort of opaque 'native data' that influences their behavior. The nature of this native data is unknown to the outside, unless the object explicitly makes it known somehow." (Note, this is from the failed HTTP-NG initiative, not the HTTP/webarch we currently enjoy.) So on this thinking, Dan's homepage is an item of Web content that is encapsulated inside the standard Web interface. It has HTTP-based getters and (potentially) setters, so you can ask for the default bytestream rendering of it, or perhaps content-negotiate with a different getter and get a PDF, or a version in another language. But on this OO style of thinking about Web content, you *never get the thing itself*. Only (possibly lossy, possibly on-the-fly generated) serializations of it. The notion of 'serialization' (also familiar to many coders) doesn't get used much in discussing http-range-14, yet it seems to be very close to our concerns here. Perhaps all the different public serializations of my homepage are so rich that they constitute full (potentially round-trippable) serializations of the secret internal state.
Or perhaps they're all lossy, because enough internals are never actually sent out over the wire. The Web design (as I understand/understood it) means that you'll never 100% know what's on the inside. My homepage might be generated by 1000 typing monkeys; or by pulling zeros and ones from the filesystem, or composed from a bunch of SQL database lookups. It might be generated by different methods from 2010 to 2012; or from minute to minute. All of this is my private webmasterly business: as far as the rest of the world is concerned, it's all the same thing, ... my homepage. I can move the internals from filesystem-based to WordPress to MediaWiki, and from provider to provider. I can choose to serve US IP addresses from a MediaWiki in Boston, and Japanese IP addresses from a customised MoinMoin wiki in Tokyo. Why? That's my business! But it's still my homepage. And you - the outside world - don't get to know how it's made. On that thinking, it might be sometimes useful to have clues as to whether sufficient of the secret internals of some Web page could be fully
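The OO/encapsulation picture above can be caricatured in a few lines; the class and field names here are illustrative only, not any real framework's API:

```python
# Illustrative sketch of the OO view of a Web resource: internal state
# stays private; the outside world only ever sees serializations via get().
class WebResource:
    def __init__(self, internal_state):
        self._state = internal_state        # never exposed directly

    def get(self, accept="text/html"):
        # a possibly lossy, on-the-fly serialization of the hidden state
        if accept == "application/rdf+xml":
            return f"<rdf:RDF><!-- about: {self._state['title']} --></rdf:RDF>"
        return f"<html><h1>{self._state['title']}</h1></html>"

homepage = WebResource({"title": "Dan's homepage", "secret_notes": "..."})
homepage.get()   # an HTML rendering; 'secret_notes' never leaves the object
```

Whether the set of all such get() outputs could ever round-trip the hidden state is exactly the lossy-vs-full-serialization question raised above.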
Re: Change Proposal for HttpRange-14
On 24 March 2012 17:36, Dave Reynolds dave.e.reyno...@gmail.com wrote: However, the data is not always under our complete control and there is no universal agreement on what default fragment to use. Leaving us either having to maintain mapping tables or try multiple probes (when asked for U, try U, then try U#id, then try ...). Not a fatal problem but certainly an inconvenience when managing large and complex stores.

Maybe we can come up with such a string? Something that isn't in current use, yet isn't too ugly? Maybe something that looks nice in UTF8 but obscure in ascii-fied form? I know well-known strings are frowned upon, but ... it's tempting. Are there values that would be legitimate as URI/IRI references, yet impossible to be HTML anchor targets? (and therefore avoid clashes?)

Problem 2: serialization. With a convention of a single standard fragment, prefix notation in Turtle and qname notation in RDF/XML become unusable. You would have to have a separate prefix/namespace for each resource. In Turtle you can just write out all URIs in full, inconvenient but not fatal. In RDF/XML you can do that for subjects/objects but not for properties (and not for classes if you want to use abbreviated syntax). Having to declare a new prefix for every property, and maybe every class, in a large ontology just so it can be serialized is a non-starter.

Good point. I'm mostly concerned with entity identification (people, movies etc.) rather than vocabulary, since the publishers are typically a bit less semweb-engaged. For entities, there's a bit less need to use a prefixing notation afaik. cheers, Dan
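The multiple-probe workaround Dave describes might look like this; "#id" and "#this" are placeholder fragments, since (as he notes) there is no agreed default:

```python
# Sketch of the probing Dave describes: given a URI that may name either a
# document or a thing, generate candidate forms to try in order. The
# "#id" / "#this" fragments are placeholder conventions, not a standard.
def probe_candidates(uri):
    if "#" in uri:
        return [uri]                        # already a fragment URI
    return [uri, uri + "#id", uri + "#this"]

probe_candidates("http://example.com/people/alice")
```

The inconvenience is plain even in the sketch: every lookup against a large store potentially triples into several lookups, which is exactly why a single well-known string is tempting.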
Re: ANN: Sudoc bibliographic and authority data
On 7 July 2011 23:17, Yann NICOLAS nico...@abes.fr wrote: Bonjour, Sudoc [1], the French academic union catalogue maintained by ABES [2], has just been released as linked open data. 10 million bibliographic records are now available as RDF/XML. Examples for the Sudoc record whose internal id is 132133520:

. Resource URI: http://www.sudoc.fr/132133520/id
. Generic document: http://www.sudoc.fr/132133520 (content negotiation is supported)
. RDF/XML page: http://www.sudoc.fr/132133520.rdf
. HTML pages with schema.org microdata [3] for search engines: http://www.sudoc.fr/132133520.html
. The users are not supposed to visit these microdata pages: they are redirected to the standard UI: http://www.sudoc.abes.fr/xslt/DB=2.1/SRCH?IKT=12&TRM=132133520

Sudoc RDF data are linked to http://lexvo.org and http://dewey.info/ . They are also linked to IdRef [4], i.e. the Sudoc authority file that ABES considers as a separate and open application. 2 million IdRef records are also available as RDF data (since October 2010). The links between Sudoc and IdRef are bidirectional. For example, http://www.sudoc.fr/110404416/id ("Rethinking symbolism" by Dan Sperber) links to D. Sperber's IdRef URI: http://www.idref.fr/027146030/id . But, in the other direction, http://www.idref.fr/027146030/id links to *all* the Sudoc documents that are linked to this authority. In the next months, we hope to add more links to our data, to OCLC and BnF resources among others. More info (in French) here: http://punktokomo.abes.fr/

Congratulations, this is fantastic news. And I think also a very timely test-case for how community-maintained and consortium-based standards (schema.org) can be deployed alongside each other. Could you say a little more about the subject classification aspects of this data? I don't know a lot about French cataloguing. In the sample URIs you give above, I find only Rameau. You mention also Dewey.info, so I guess there's Dewey in there. And Rameau also has some mappings to LCSH.
Are there other schemes? e.g. I'm interested in particular to find instance data for UDC and for Library of Congress Classification (LCC), but also anything else that has a SKOS expression. Thanks for any more info, cheers, Dan

ps. some Gremlin examples follow (see http://danbri.org/words/2011/05/10/675 ) ... it uses the Linked Data Sail to pull in pages on demand from the Web, as you explore into the graph.

g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
i1 = g.v('http://www.sudoc.fr/132133520/id')

gremlin> i1.out('dcterms:subject').out('skos:inScheme')
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]

gremlin> i1.out('dcterms:subject').out('skos:prefLabel')
==>v[Télécommunications@fr]
==>v[Thèses et écrits académiques@fr]
==>v[Nouvelles technologies de l'information et de la communication@fr]
==>v[Internet@fr]
==>v[]

gremlin> i2=g.v('http://www.sudoc.fr/110404416/id')

gremlin> i2.out('dcterms:subject').out('skos:inScheme')
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]
==>v[http://stitch.cs.vu.nl/vocabularies/rameau/autorites_matieres]

gremlin> i2.out('dcterms:subject').out('skos:prefLabel')
==>v[Signes et symboles@fr]
==>v[Anthropologie@fr]

[1] http://www.sudoc.abes.fr
[2] http://www.abes.fr
[3] Shame on us ;) (twice)
[4] http://www.idref.fr
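The content-negotiation pattern in the announcement above (one generic document URI resolving to .rdf or .html variants) can be sketched roughly as follows. This is an illustrative sketch only, not ABES's actual server logic; the function name is mine.

```python
# Hypothetical sketch of the conneg pattern behind the Sudoc URIs above:
# the generic document URI is resolved to an RDF/XML or HTML variant
# depending on the client's Accept header.

def negotiate(record_id, accept_header):
    """Map a Sudoc record id plus an Accept header to a representation URL."""
    base = "http://www.sudoc.fr/" + record_id
    if "application/rdf+xml" in accept_header:
        return base + ".rdf"
    # default: the HTML page carrying schema.org microdata
    return base + ".html"
```

A client asking for `application/rdf+xml` would be steered to the .rdf page, and everything else to the HTML variant.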
survey: who uses the triple foaf:name rdfs:subPropertyOf rdfs:label?
Dear all, The FOAF RDFS/OWL document currently includes the triple foaf:name rdfs:subPropertyOf rdfs:label . This is one of several things that OWL DL oriented tools (eg. http://www.mygrid.org.uk/OWL/Validator) don't seem to like, since it mixes application schemas with the W3C builtins. So for now, pure fact-finding. I would like to know if anyone is actively using this triple, eg. for Linked Data browsers. If we can avoid this degenerating into a thread about the merits or otherwise of description logic, I would be hugely grateful. So -

1. do you have code / applications that check to see if a property is rdfs:subPropertyOf rdfs:label ?
2. do you have any scope to change this behaviour (eg. it's a web service under your control, rather than shipping desktop software)?
3. would you consider checking for ?x rdf:type foaf:LabelProperty or other idioms instead (or rather, as well)?
4. would you object if the triple foaf:name rdfs:subPropertyOf rdfs:label is removed from a future version of the main FOAF RDFS/OWL schema? (it could be linked elsewhere, mind)

Thanks in advance, Dan
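For anyone wanting to try the checks from questions 1 and 3 concretely, here is a rough, library-independent sketch, with triples represented as plain (s, p, o) tuples of URI strings. The function name is mine; it combines the transitive rdfs:subPropertyOf walk with the foaf:LabelProperty typing idiom.

```python
# Illustrative sketch of the two "is this a labelling property?" checks
# discussed above. Triples are plain (subject, predicate, object) tuples.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_LABELPROP = "http://xmlns.com/foaf/0.1/LabelProperty"

def is_labelling_property(prop, triples):
    """True if prop reaches rdfs:label via rdfs:subPropertyOf links
    (transitively), or is explicitly typed foaf:LabelProperty."""
    if (prop, RDF_TYPE, FOAF_LABELPROP) in triples:
        return True
    seen, queue = set(), [prop]
    while queue:
        p = queue.pop()
        if p == RDFS_LABEL:
            return True
        if p in seen:
            continue
        seen.add(p)
        # follow declared super-properties
        queue.extend(o for s, pred, o in triples
                     if s == p and pred == RDFS_SUBPROP)
    return False
```

With the schema triple in place, foaf:name passes the test; if the triple were removed and replaced by a foaf:LabelProperty typing, the same function would still recognise it.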
Re: Correct Usage of rdfs:isDefinedBy in Vocabulary Specifications with a Hash-based URI Pattern
On Thu, Sep 30, 2010 at 9:06 AM, Martin Hepp martin.h...@ebusiness-unibw.org wrote:

Dear all: We use rdfs:isDefinedBy in all of our vocabularies (*) for linking between the conceptual elements and their specification. Now, there is a subtle question: Let's assume we have an ontology with the main URI http://purl.org/vso/ns . All conceptual elements are defined as hash fragment URIs (URI references), e.g. http://purl.org/vso/ns#Bike . The ontology itself (the instance of owl:Ontology) has the URI http://purl.org/vso/ns#

<http://purl.org/vso/ns#> a owl:Ontology ;
    owl:imports <http://purl.org/goodrelations/v1> ;
    dc:title "VSO: The Vehicle Sales Ontology for Semantic Web-based E-Commerce"@en .

So we have two URIs for the ontology:
1. http://purl.org/vso/ns# for the ontology as an abstract artefact
2. http://purl.org/vso/ns for the syntactical representation of the ontology (its serialization)

Shall the rdfs:isDefinedBy statements refer to #1 or #2 ?

#1
vso:Vehicle a owl:Class ;
    rdfs:subClassOf gr:ProductOrService ;
    rdfs:label "Vehicle (gr:ProductOrService)"@en ;
    rdfs:isDefinedBy <http://purl.org/vso/ns#> .

=== #1 gets my vote... (The isDefinedBy property originally had use cases in mind for situations where the URI of the vocab couldn't be discovered in Webby fashion through dereferencing, eg. uuid: or urn: -based identifiers for the terms or vocab). As it turned out, the world learned to live with using http: everywhere, so that particular need faded somewhat :) cheers, Dan

#2
vso:Vehicle a owl:Class ;
    rdfs:subClassOf gr:ProductOrService ;
    rdfs:label "Vehicle (gr:ProductOrService)"@en ;
    rdfs:isDefinedBy <http://purl.org/vso/ns> .

=== I had assumed they shall refer to #1, but that caused some debate within our group ;-) Opinions? Best Martin
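To make the two candidate values concrete, here is a tiny illustrative helper (the function name is mine, not part of any spec or the VSO toolchain) that derives each option from a hash-namespace term URI:

```python
# Hypothetical sketch: given a hash-fragment term URI, option #1 keeps the
# trailing '#' (the ontology as abstract artefact), option #2 drops it
# (the serialization of the ontology).

def ontology_uri(term_uri, keep_hash=True):
    """Derive the vocabulary URI from a hash-fragment term URI."""
    base, sep, _frag = term_uri.partition("#")
    if not sep:
        return term_uri  # not a hash namespace; leave untouched
    return base + "#" if keep_hash else base

# e.g. http://purl.org/vso/ns#Bike
#   option #1 -> http://purl.org/vso/ns#
#   option #2 -> http://purl.org/vso/ns
```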
Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.
On Thu, Sep 2, 2010 at 8:10 PM, Anja Jentzsch a...@anjeve.de wrote:

Hi all, we are in the process of drawing the next version of the LOD cloud diagram. This time it is likely to contain around 180 datasets altogether, having a size of around 20 billion RDF triples. For drawing the next version of the LOD cloud, we have started to collect meta-information about the datasets to be included on CKAN, a registry of open data and content packages provided by the Open Knowledge Foundation. The list of datasets about which we have already collected information can be found here: http://www4.wiwiss.fu-berlin.de/lodcloud/ In addition to basic meta-information about a dataset such as its size and the number of links pointing at other datasets, we also collect additional meta-information about the license of the dataset, alternative access options like SPARQL endpoints or dataset dumps, and whether there exists a voiD description of the dataset or a Semantic Web Sitemap. So if your dataset is not listed yet and you want to have it included in the next version of the LOD cloud, please add it to CKAN by next Wednesday (September 8th, 2010). Also, if we have collected wrong information about your dataset, or if your dataset is only partially described up till now, it would be great if you could add the missing information. Guidelines about how to add datasets to CKAN, as well as about the tags that we are using to annotate the datasets, are found here: http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation We thank all contributors in advance for their input and help, which hopefully will allow us to draw the next version of the LOD cloud as accurately as possible.

This is great! Glad to see this being updated :) One thing I would love in the next revision is for FOAF to also be presented as a vocabulary, rather than as if it were itself a distinct dataset.
While there are databases that expose as FOAF (LiveJournal etc.), and also a reasonable number of independently published 'FOAF files', the technical core of FOAF is really the vocabulary and the habit of linking things together. Having a FOAF 'blob' is great and all, but it doesn't help people understand that FOAF is used as a vocabulary by various of the other blobs too. And beyond FOAF, I'm wondering how we can visually represent the use of eg. Music Ontology, or Dublin Core, or Creative Commons vocabularies across different regions of the cloud. Maybe (later :) someone could make a view where each blob is a pie-chart showing which vocabularies it uses? As a vocabulary manager, it is pretty hard to understand the costs and benefits of possible changes to a widely deployed RDF vocabulary. I'm sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same regarding the Dublin Core terms. So if there could be some view of the new cloud diagram that showed us which blobs (er, datasets) used which vocabulary (and which terms), that would be really wonderful. On the Dublin Core side, it would be fascinating to see which datasets are using http://purl.org/dc/elements/1.1/ and which are using http://purl.org/dc/terms/ (and which are using both). Similarly with FOAF, I'd like to understand common deployment patterns better. I expect other vocab managers and dataset publishers are in a similar situation, and would appreciate a map of the wider territory, so they know how to fit in with trends and conventions, or what missing pieces of vocabulary might need more work... Thanks for any thoughts, Dan
Re: Predicate for external links on dbpedialite.org?
On Thu, Jul 15, 2010 at 6:09 PM, Nicholas Humfrey nicholas.humf...@bbc.co.uk wrote: Hello, I have added external links to dbpedialite, for example see Berlin: http://dbpedialite.org/things/3354 Is there a better predicate to use than rdfs:seeAlso? I am not sure if it is correct, because the link is just a random webpage rather than an rdfs:Resource, but I have not found anything better. Perhaps an openvocab subclass of rdfs:seeAlso?

If you're pointing at documents, you could use foaf:page (inverse of foaf:topic) to say that those pages have (the city) Berlin as a topic. Or if you're more confident, foaf:isPrimaryTopicOf (inverse of foaf:primaryTopic). Oh hey, in the 1/2 hour since I started drafting this reply, I see the conversation has gone in this direction. Yeah, it sounds like foaf:topic or foaf:page fit. I don't particularly enjoy RDF vocabs having inverses in them, but for that matter we do have both directions named in FOAF, so pick whatever suits your markup best. I lean towards 'topic' as the most intuitively named, but DBpedia uses 'page', which might be worth bearing in mind... Dan
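Because the FOAF properties mentioned above come in declared inverse pairs, a consumer can normalise data to whichever direction it prefers by materialising the missing direction. A minimal sketch (names of the helper are mine), with triples as plain (s, p, o) tuples:

```python
# Illustrative sketch: materialise the inverse form of the FOAF
# inverse-property pairs discussed above (page/topic,
# isPrimaryTopicOf/primaryTopic).

FOAF = "http://xmlns.com/foaf/0.1/"
INVERSES = {
    FOAF + "page": FOAF + "topic",
    FOAF + "topic": FOAF + "page",
    FOAF + "isPrimaryTopicOf": FOAF + "primaryTopic",
    FOAF + "primaryTopic": FOAF + "isPrimaryTopicOf",
}

def with_inverses(triples):
    """Return the input triples plus the inverse form of any known pair."""
    out = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            out.add((o, INVERSES[p], s))
    return out
```

This way an application can query for just one direction (say, foaf:topic) regardless of which direction the publisher chose.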
Re: Solving Real Problems with Linked Data: Verifiable Network Identity Single Sign On
On Sun, Jul 11, 2010 at 7:05 PM, Kingsley Idehen kide...@openlinksw.com wrote: Q: What about OpenID? A: The WebID Protocol embraces and extends OpenID via the WebID + OpenID That's an unfortunate turn of phrase. The intent I assume is to suggest that there are ways in which the two approaches can be used together, and ways in which they quite reasonably take differing approaches. When they differ, it's through genuine and transparent differences rather than industry mischief. The embrace and extend phrase is rather too closely associated with cynical manipulation of partial compatibility for commercial advantage. I suggest avoiding it here! From http://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish Embrace, extend and extinguish,[1] also known as Embrace, extend, and exterminate,[2] is a phrase that the U.S. Department of Justice found[3] was used internally by Microsoft[4] to describe its strategy for entering product categories involving widely used standards, extending those standards with proprietary capabilities, and then using those differences to disadvantage its competitors. [...] The strategy and phrase embrace and extend were first described outside Microsoft in a 1996 New York Times article entitled Microsoft Trying to Dominate the Internet,[5] in which writer John Markoff said, Rather than merely embrace and extend the Internet, the company's critics now fear, Microsoft intends to engulf it. The phrase embrace and extend also appears in a facetious motivational song by Microsoft employee Dean Ballard,[6] and in an interview of Steve Ballmer by the New York Times. I think we're doing something quite different here! cheers, Dan
Re: Subjects as Literals
On Tue, Jul 6, 2010 at 12:40 AM, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi Sampo. I venture in again... I have much enjoyed the interchanges, and they have illuminated a number of cultural differences for me, which have helped me understand why some people disagree with things that seem clear to me. A particular problem in this realm has been characterised as S-P-O v. O-R-O, and I suspect that this reflects a Semantic Web/Linked Data cultural difference, although the alignment will not be perfect. I see I am clearly in the latter camp. Some responses below.

imho RDF processing requires both perspectives, and neither is more semwebby or linky than the other. On a good day, we can believe what an RDF doc tells us. It does so in terms of objects/things and their properties and relationships (o-r-o i guess). On another day, we have larger collections of RDF to curate, and need to keep track more carefully of who is claiming what about these object properties; that's the provenance and quads perspective, s-p-o. Note that the subject/predicate/object terminology comes from the old M&S spec, which introduced reification in a ham-fisted attempt to handle some of this trust-ish stuff, and that most simple 'data'-oriented stuff uses SPARQL, the only W3C formal spec that covers quads rather than triples. So I don't think the community splits neatly into two on this, and that's probably for the best! RDF processing, specs and tooling are about being able to jump in a fluid and natural way between these two views of data; dipping down into the 'view from one graph', or zooming out to see the bigger picture of who says what. Neither is correct, and it is natural for the terminology to change to capture the shifting emphasis. But until we make this landscape clearer, people will be confused -- when is it an attribute or property, and when is it a predicate?
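The two perspectives above can coexist in one store: quads for curation and provenance, with the "view from one graph" recovered by projection. A minimal sketch (the helper names are mine, not from any particular quad-store API):

```python
# Illustrative sketch: quads as (s, p, o, source) tuples; "zooming in"
# projects one source's plain triples, "zooming out" asks who says what.

def triples_from(quads, source):
    """Zoom in: the plain s-p-o view of what one document claims."""
    return {(s, p, o) for s, p, o, g in quads if g == source}

def who_says(quads, s, p, o):
    """Zoom out: which sources claim this particular triple?"""
    return {g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o)}
```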
cheers, Dan -- There are two kinds of people in the world, those who believe there are two kinds of people in the world and those who don't. --Benchley
Re: RDF Extensibility
2010/7/6 Jiří Procházka oji...@gmail.com: On 07/06/2010 03:35 PM, Toby Inkster wrote: On Tue, 6 Jul 2010 14:03:19 +0200 Michael Schneider schn...@fzi.de wrote: So, if :s lit :o . must not have a semantic meaning, what about lit rdf:type rdf:Property . ? As, according to what you say above, you are willing to allow for literals in subject position, this triple is fine for you syntactically. But what about its meaning? Would this also be officially defined to have no meaning? It would have a meaning. It would just be a false statement. The same as the following is a false statement: foaf:Person a rdf:Property . Why do you think so? I believe it is valid RDF and even valid under RDFS semantic extension. Maybe OWL says something about disjointness of RDF properties and classes.

A URI can be many things. It just so happens, as a fact in the world, that the thing called foaf:Person isn't a property. It's a class. Some might argue that there are no things that are simultaneously RDF classes and properties, but that doesn't matter for the FOAF case. The RSS1 vocabulary btw tried to define something that was both, rss1:image I think; but this was a backwards-compatibility hack. cheers, Dan
Re: Subjects as Literals
On Tue, Jul 6, 2010 at 11:17 PM, Pat Hayes pha...@ihmc.us wrote: [...] This is the canonical way to find its meaning, and is the initial procedure we should use to arbitrate between competing understandings of its meaning. Whoo, I doubt if that idea is going to fly. I sincerely hope not. Using that, how would you determine the meaning of the DC vocabulary? It's also worth bearing in mind that Web sites get hacked from time to time. W3C gets attacked regularly (but is pretty robust). The FOAF servers were compromised a year or two back (but the xmlns.com site was untouched). For a while, foaf-project.org was serving evil PHP and ugly links, as was my own home page. This kind of mischief should be kept in mind by anyone building a system that assumes you'll get canonical meaning from an HTTP GET... cheers, Dan
Re: PRISM data on the LOD cloud?
On Fri, Jul 2, 2010 at 3:19 PM, Hammond, Tony t.hamm...@nature.com wrote: Hi Kingsley: Kill me with the PDF URL :-( I think we could have been a tad more gracious here. This kind of remark only serves to alienate the well intentioned. You know, it's not actually (yet) a crime to put out a PDF on the open Web. Yes, it may not be the most webby of document formats, but it does have certain viabilities. Re your question: Where can I see / GET the RDF/XML resource? There's RDF/XML XMP hidden inside the file, talking of XMP. Presumably Virtuoso has a sponger for it. Copied below as a reminder that rdf:Seq will be very hard to delete from the Web, since most files that pass through the Adobe toolchain have it stuffed inside... Dan

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.1-c036 46.277092, Fri Feb 23 2007 14:16:18">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:format>application/postscript</dc:format>
   <dc:title>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">Print</rdf:li>
    </rdf:Alt>
   </dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xap="http://ns.adobe.com/xap/1.0/" xmlns:xapGImg="http://ns.adobe.com/xap/1.0/g/img/">
   <xap:CreatorTool>Adobe Illustrator CS3</xap:CreatorTool>
   <xap:CreateDate>2008-10-10T11:07:02-04:00</xap:CreateDate>
   <xap:ModifyDate>2008-10-10T11:07:02-04:00</xap:ModifyDate>
   <xap:MetadataDate>2008-10-10T11:07:02-04:00</xap:MetadataDate>
   <xap:Thumbnails>
    <rdf:Alt>
     <rdf:li rdf:parseType="Resource">
      <xapGImg:width>256</xapGImg:width>
      <xapGImg:height>96</xapGImg:height>
      <xapGImg:format>JPEG</xapGImg:format>
      <xapGImg:image>[ big pile of hex snipped ]</xapGImg:image>
     </rdf:li>
    </rdf:Alt>
   </xap:Thumbnails>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#">
   <xapMM:DocumentID>uuid:ED37D99F4D98DD11B2AD92E8487485F8</xapMM:DocumentID>
   <xapMM:InstanceID>uuid:EE37D99F4D98DD11B2AD92E8487485F8</xapMM:InstanceID>
   <xapMM:DerivedFrom rdf:parseType="Resource">
    <stRef:instanceID>uuid:EC37D99F4D98DD11B2AD92E8487485F8</stRef:instanceID>
    <stRef:documentID>uuid:EB37D99F4D98DD11B2AD92E8487485F8</stRef:documentID>
   </xapMM:DerivedFrom>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:illustrator="http://ns.adobe.com/illustrator/1.0/">
   <illustrator:StartupProfile>Print</illustrator:StartupProfile>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xapTPg="http://ns.adobe.com/xap/1.0/t/pg/" xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#" xmlns:xapG="http://ns.adobe.com/xap/1.0/g/">
   <xapTPg:MaxPageSize rdf:parseType="Resource">
    <stDim:w>11.00</stDim:w>
    <stDim:h>8.50</stDim:h>
    <stDim:unit>Inches</stDim:unit>
   </xapTPg:MaxPageSize>
   <xapTPg:NPages>1</xapTPg:NPages>
   <xapTPg:HasVisibleTransparency>False</xapTPg:HasVisibleTransparency>
   <xapTPg:HasVisibleOverprint>False</xapTPg:HasVisibleOverprint>
   <xapTPg:PlateNames>
    <rdf:Seq>
     <rdf:li>Cyan</rdf:li>
     <rdf:li>Magenta</rdf:li>
     <rdf:li>Yellow</rdf:li>
     <rdf:li>Black</rdf:li>
     <rdf:li>C=100 M=10 Y=0 K=0 1</rdf:li>
    </rdf:Seq>
   </xapTPg:PlateNames>
   <xapTPg:SwatchGroups>
    <rdf:Seq>
     <rdf:li rdf:parseType="Resource">
      <xapG:groupName>Default Swatch Group</xapG:groupName>
      <xapG:groupType>0</xapG:groupType>
      <xapG:Colorants>
       <rdf:Seq>
        <rdf:li rdf:parseType="Resource">
         <xapG:swatchName>White</xapG:swatchName>
         <xapG:mode>CMYK</xapG:mode>
...etc etc (big file...)
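Since XMP is stored as a plain XML packet in the file's byte stream, a "sponger" can often locate it with a simple byte scan, without a PDF parser at all. A rough sketch of that idea (not Virtuoso's actual mechanism; the function name is mine):

```python
# Illustrative sketch: pull the first embedded XMP packet out of a PDF
# (or any Adobe-produced file) by scanning for the x:xmpmeta element.

def extract_xmp(data: bytes):
    """Return the first embedded XMP packet as a string, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end = data.find(b"</x:xmpmeta>", start)
    if end == -1:
        return None
    return data[start:end + len(b"</x:xmpmeta>")].decode("utf-8", "replace")
```

The returned packet is ordinary RDF/XML and can be handed to any RDF parser.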
Re: Subjects as Literals, [was Re: The Ordered List Ontology]
[snip] This is the second time in a few hours that a thread has degenerated into talk of accusations and insults. I don't care who started it. Sometimes email just isn't the best way to communicate. If people are feeling this way about an email discussion, it might be worth the respective parties spending a few minutes on the phone to try to smooth things over. Or not. I don't care, really. But each of these mail messages is getting distributed to several hundred readers. It would be good if we can find ways of using that bandwidth to solve problems rather than get into fights. Or maybe we should all just take a weekend break, mull things over for a couple of days, and start fresh on monday? That's my plan anyhow... cheers, Dan
An RDF wishlist
(rejigged subject line) On Thu, Jul 1, 2010 at 4:35 AM, Pat Hayes pha...@ihmc.us wrote: Pat, I wish you had been there. ;) I have very mixed views on this, I have to say. Part of me wanted badly to be present. But after reading the results of the straw poll, part of me wants to completely forget about RDF, never think about an ontology or a logic ever again, and go off and do something completely different, like art or philosophy. I have mixed feelings about missing the workshop too. Having been pushing this wheelbarrow uphill for far too long, it does seem a shame to have missed such an event. On the other hand, it is hard to know what to make of the workshop outcomes, since the participants form an unusually specialist subset of humanity, and the problem of what W3C next does with its RDF standard is such a small part of the larger problem. It's clear that many workshop participants were aware of the risk of destabilizing the core technologies just as we are gaining some very promising real-world traction. That was a relief to read. For those who have invested time and money in helping us get this far, and who had the resources to participate, this concern was probably enough to motivate participation. It's clear also that participants were aware of many of the little annoyances that bring friction and frustration to those working with RDF. What I'm less sure of is how to represent the perspective of those who have explored RDF and walked away. Over the years, many bright people have investigated RDF enthusiastically, and left disappointed. Those folk didn't come to the workshop, they didn't write a position paper, and they probably don't particularly care about its outcomes. But they're just the kind of people who will need to enjoy using RDF if we are to succeed. Is RDF hard to work with? I think the answer remains 'yes', but we lack consensus on why. And it seems even somehow disloyal to admit it.
If I had to list reasons, I'd leave nits like 'subjects as literals' pretty low down. Many of the reasons I think are unavoidable, and intrinsic to the kind of technology and problems we're dealing with. But there are also lots of areas for improvement. Most of these are nothing to do with fixups to W3C standards documentation. And finally, we can lessen the perception of pain by improving the other side: getting more decent linked data out there, so the suffering people go through is worth it. Some reasons why RDF is annoying and hard (a mildly ordered list):

* RDF data is gappy, chaotic, full of unexpected extensions and omissions - BY DESIGN

* RDF toolkits each offer different items from a large menu (syntaxes, storage, inference facilities), so even when you're getting a lot, you probably don't appreciate what you're getting, and we have no common checklist that helps non-guru developers understand this.

* RDF toolkit / library immaturity; eg1. I wasted half a weekend recently trying to find a decent Javascript system. eg2. I work in Python using the popular rdflib library, whose half-finished SPARQL support was recently removed and put into an 'extras' package; nobody seems quite sure how well it works. The Ruby landscape remains messy, although the public-rdf-ruby list have recently been collaborating actively to improve things. Broken, old and abandoned code litters the Web; good stuff remains on the bleeding edge and unpackaged. Great ideas, code and algorithms remain trapped in a single implementation language rather than transliterated to other widely deployed languages. Almost every toolkit's SQL backend is represented differently. Only a few serializers bother to prettify RDF/XML nicely, despite there being opensource code out there that could easily be copied.
* RDF is good for aggregation of externally managed data; managing data *as* RDF comes with certain complexities, since edit/delete operations on a connected graph aren't as intuitive as on a closed tree structure. If I delete a certain node from the graph, which others should be cleaned up too? Named graphs help somewhat there, but good habits aren't yet understood, much less documented.

* As a community, we have some standards for documenting the atomic terms in our vocabularies (ie. RDFS/OWL) but we tend to stop there, and not to document the larger graph patterns that are needed to really communicate using these structures, or the underlying use cases that motivated them in the first place. We also don't do nearly enough analytics and stats over the actual data out there to make it easier to consume, and for publishers to gravitate towards existing idioms rather than make up similar-but-different graph patterns that'll confuse the landscape further.

* Our small community (we are outnumbered by Visual Basic enthusiasts, let alone Javascripters) is fragmented and grumpy. OWL and Linked Data enthusiasts too often talk and think disparagingly about each others' work, or not-so-secretly wish the others would just go away and stop
Re: destabilizing core technologies: was Re: An RDF wishlist
Hi Patrick, On Thu, Jul 1, 2010 at 11:39 AM, Patrick Durusau patr...@durusau.net wrote: Dan, Just a quick response to only one of the interesting points you raise: It's clear that many workshop participants were aware of the risk of destabilizing the core technologies just as we are gaining some very promising real-world traction. That was a relief to read. For those who have invested time and money in helping us get this far, and who had the resources to participate, this concern was probably enough to motivate participation. It might be helpful to recall that destabilizing the core technologies was exactly the approach that SGML took when its little annoyances [brought] friction and frustration to those working with [SGML]... There was ...promising real-world traction. I don't know what else to call the US Department of Defense mandating the use of SGML for defense contracts. That is certainly real-world, and it seems hard to step on an economic map of the US without stepping in defense contracts of one sort or another. Yes, you are right. It is fair and interesting to bring up this analogy and associated history. SGML even got a namecheck in the original announcement of the Web, see http://groups.google.com/group/alt.hypertext/msg/395f282a67a1916c and even today HTML is not yet re-cast in terms of XML, much less SGML. Many today are looking to JSON rather than XML, perhaps because of a lack of courage/optimism amongst XML's creators that saddled it with more SGML heritage than it should now be carrying. These are all reasons for chopping away more bravely at things we might otherwise be afraid of breaking. But what if we chop so much the original is unrecognisable? Is that so wrong? What if RDF's biggest adoption burden is the open-world triples model? Clinging to decisions that seemed right at the time they were made is a real problem. It is only because we make decisions that we have the opportunity to look back and wish we had decided differently.
That is called experience. If we don't learn from experience, well, there are other words to describe that. :) So, I wouldn't object to a new RDF Core WG, to cleanups including eg. 'literals as subjects' in the core data model, or to see the formal semantics modernised/simplified according to the latest wisdom of the gurus. I do object to the idea that proposed changes are the kinds of thing that will make RDF significantly easier to deploy. The RDF family of specs is already pretty layered. You can do a lot without ever using or encountering rdf:Alt, or reification, or OWL DL reasoning, or RIF. Or reading a W3C spec. The basic idea of triples is pretty simple and even sometimes strangely attractive, however many things have been piled on top. But simplicity is a complex thing! Having a simple data model, even simple, easy to read specs, won't save RDF from being a complex-to-use technology. We have, I think, a reasonably simple data model. You can't take much away from the triples story and be left with anything sharing RDF's most attractive properties. The specs could be cleaner and more accessible. But I know plenty of former RDF enthusiasts who knew the specs and the tech inside out, and still ultimately abandoned it all. Making RDF simpler to use can't come just from simplifying the specs; when you look at the core, and it's the core that's the problem, there just isn't much left to throw out. Some of the audience for these postings will remember that the result of intransigence on the part of the SGML community was XML. XML was a giant gamble. It's instructive to look back at what happened, and to realise that we don't need a single answer (a single gamble) here. Part of the problem I was getting at earlier was one of dangerously elevated expectations... the argument that *all* data in the Web must be in RDF. We can remain fans of the triple model for simple factual data, even while acknowledging there will be other useful formats (XMLs, JSONs).
Some of us can gamble on "let's use RDF for everything". Some can retreat to the original, noble and neglected metadata use case, and use RDF to describe information, but leave the payload in other formats; others (myself at least) might spend their time trying to use triples as a way of getting people to share the information that's inside their heads rather than inside their computers. I am not advocating in favor of any specific changes. I am suggesting that clinging to prior decisions simply because they are prior decisions doesn't have a good track record. Learning from prior decisions, on the other hand, such as the reduced (in my opinion) feature set of XML, seems to have a better one. (Other examples left as an exercise for the reader.) So, I think I'm holding an awkward position here:

* massive feature change (ie. not using triples, URIs etc); or rather focus change: become a 'data sharing in the Web' community, not a 'doing stuff with triples' community
* cautious
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 5:38 PM, Jeremy Carroll jer...@topquadrant.com wrote: I am still not hearing any argument to justify the costs of literals as subjects. I have loads and loads of code, both open source and commercial, that assumes throughout that a node in a subject position is not a literal, and a node in a predicate position is a URI node. Of course, the correct thing to do is to allow all three node types in all three positions. (Well, four if we take the graph name as well!) But if we make a change, all of my code base will need to be checked for this issue. This costs my company maybe $100K (very roughly). No one has even showed me $1K of advantage for this change. It is a no-brainer not to do the fix, even if it is technically correct. Well said. Spend the money on a W3C-licensed javascript SPARQL engine, or on fixing and documenting and test-suiting what's out there already. And whatever's left on rewriting it in Ruby, Scala, Lua ... Better still, put the money up as a prize; then you only have to give it to one party, while dozens of others will slave away for free in pursuit of said loot ;) Dan
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 6:29 PM, Sandro Hawke san...@w3.org wrote: On Thu, 2010-07-01 at 17:10 +0100, Nathan wrote: In all honesty, if this doesn't happen, I personally will have no choice but to move to N3 for the bulk of things, and hope for other serializations of N3 to come along. RIF (which became a W3C Recommendation last week) is N3, mutated (in some good ways and some bad ways, I suppose) by the community consensus process. RIF is simultaneously the heir to N3 and a standard business rules format. RIF's central syntax is XML-based, but there's room for a presentation syntax that looks like N3. RIF includes triples which can have literals as subject, of course. (In RIF, these triples are called frames. Well, sets of triples with a shared subject are called frames, technically. But they are defined by the spec to be an extension of RDF triples.) Excellent, so there's no need to mess with RDF itself for a while? We can let RIF settle in for a couple years and see how it shapes up against people's RDFCore 2.0 aspirations? Dan
Re: Show me the money - (was Subjects as Literals)
(cc: list trimmed to LOD list.) On Thu, Jul 1, 2010 at 7:05 PM, Kingsley Idehen kide...@openlinksw.com wrote: Cut long story short. [-cut-] We have an EAV graph model, URIs, triples and a variety of data representation mechanisms. N3 is one of those, and it's basically the foundation that bootstrapped the House of HTTP based Linked Data. I have trouble believing that last point, so hopefully I am misunderstanding your point. Linked data in the public Web was bootstrapped using standard RDF, serialized primarily in RDF/XML, and initially deployed mostly by virtue of people enthusiastically publishing 'FOAF files' in the (RDF)Web. These files, for better or worse, were overwhelmingly in RDF/XML. When TimBL wrote http://www.w3.org/DesignIssues/LinkedData.html in 2006 he used what is retrospectively known as Notation 2, not its successor Notation 3. Notation 2 [*] was an unstriped XML syntax (see original in http://web.archive.org/web/20061115043657/http://www.w3.org/DesignIssues/LinkedData.html ). That DesignIssues note was largely a response to the FOAF deployment. This linking system was very successful, forming a growing social network, and dominating, in 2006, the linked data available on the web. The LinkedData design note argued that (post RDFCore cleanup and http-range discussions) we could now use URIs for non-Web things, and that this would be easier than dealing with bNode-heavy data. Much of the subsequent successes come from following that advice. Perhaps N3 played an educational role in showing that RDF had other representations; but by then, SPARQL, NTriples etc. were also around. As was RDFa, http://xtech06.usefulinc.com/schedule/paper/58 ... I have a hard time seeing N3 as the foundation that bootstrapped things. Most of the substantial linked RDF in the Web by 2006 was written in RDF/XML, and by then the substantive issues around linking, reference, aggregation, identification and linking etc. were pretty well understood.
I don't dislike N3; it was a good technology testbed and gave us the foundation for SPARQL's syntax, and for the Turtle subset. But its role outside our immediate community has been pretty limited in my experience. cheers, Dan [*] http://www.w3.org/DesignIssues/Syntax.html
Re: Show me the money - (was Subjects as Literals)
On Thu, Jul 1, 2010 at 11:35 PM, Kingsley Idehen kide...@openlinksw.com wrote: The sequence went something like this. TimBL Design Issues Note, and SPARQL emergence. Before that, RDF was simply in the dark ages. It's only simple if you weren't there :) You mean you didn't see me lurking in the dark? :-) Humor aside, pre-Linked Data meme, RDF just wasn't making any tangible progress (adoption- or comprehension-wise) beyond the inner sanctums of the Semantic Web community, you know what I mean when I say that, right? And all I'm saying is that it took a lot of work from a lot of people (most of whom are on these lists) to get to that stage where it was capable of breaking out. The state of RDF deployment, tooling, concepts, specs and community in 2006 was a significant improvement on what we had in, say, 1999. The Linked Data push was a breakthrough, but it didn't happen in a vacuum or overnight; neither did SPARQL... cheers, Dan
Re: The Ordered List Ontology
On Wed, Jun 30, 2010 at 6:34 PM, Pat Hayes pha...@ihmc.us wrote: On Jun 30, 2010, at 6:45 AM, Toby Inkster wrote: On Wed, 30 Jun 2010 10:54:20 +0100 Dan Brickley dan...@danbri.org wrote: That said, i'm sure sameAs and differentIndividual (or however it is called) claims could probably make a mess, if added or removed... You can create some pretty awesome messes even without OWL:

# An rdf:List that loops around...
<#mylist> a rdf:List ;
    rdf:first <#Alice> ;
    rdf:rest <#mylist> .

# A looping, branching mess...
<#anotherlist> a rdf:List ;
    rdf:first <#anotherlist> ;
    rdf:rest <#anotherlist> .

They might be messy, but they are *possible* structures using pointers, which is what the RDF vocabulary describes. It's just about impossible to guarantee that messes can't happen when all you are doing is describing structures in an open-world setting. But I think the cure is to stop thinking that possible-messes are a problem to be solved. So, there is dung in the road. Walk round it. Yes. So this is a point that probably needs careful presentation to new users of this technology. Educating people that they shouldn't believe any random RDF they find in the Web, ... now that is pretty easy. Still needs doing, but it shadows real-world intuitions pretty well. If in real life you think the Daily Mail is full of nonsense, then it isn't a huge leap to treat RDFized representations of their claims with similar skepticism (eg. see http://data.totl.net/cancer_causes.rdf for a great list of Things The Daily Mail Say Might Cause Cancer). *However* it is going to be tough to persuade developers to treat a basic data structure like List in the same way. Lists are the kinds of thing we expect to be communicated perfectly or to get some low-level error. A lot of developers will write RDF-consuming code that won't anticipate errors. Hopefully supporting software libraries can take some of the strain here... cheers, Dan
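[Editorial sketch, not part of the original thread.] The defensive list-consuming code Dan hopes libraries will supply is easy to illustrate. Here triples are modelled as a plain dict mapping each list node to its (rdf:first, rdf:rest) pair; all names and the `collect_list` helper are invented for the example, not any real library's API.

```python
# Editorial sketch: defensive traversal of an rdf:List-style structure.
# A dict maps each list node to its (rdf:first, rdf:rest) pair.
RDF_NIL = "rdf:nil"

def collect_list(node, cells, max_len=10_000):
    """Walk first/rest cells, refusing to loop forever on cyclic data."""
    seen, items = set(), []
    while node != RDF_NIL:
        if node in seen or len(items) >= max_len:
            raise ValueError(f"malformed rdf:List at {node!r}: cycle or runaway length")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items

# A well-formed two-element list...
ok = {"_:l1": ("#Alice", "_:l2"), "_:l2": ("#Bob", RDF_NIL)}
# ...and a list that loops around, like the mess above.
looped = {"#mylist": ("#Alice", "#mylist")}
```

A consumer calling `collect_list` gets either the items or a clear error, rather than an infinite loop, which is the kind of strain-taking a supporting library can do on the application's behalf.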
Re: ANNOUNCE: lod-announce list
On Sun, Jun 13, 2010 at 7:44 PM, Angelo Veltens angelo.velt...@online.de wrote: Hi, Ian Davis schrieb: Hi all, Now we are getting a steady growth in the number of Linked Data sites, products and services I thought it was time to create a low-volume announce list for Linked Data related announcements so people can keep up to date without needing to wade through the LOD discussion. You can join the list at http://groups.google.com/group/lod-announce Sounds fine, but is it possible to subscribe to the list without a google account? Yes. The Google Groups site doesn't make it particularly easy to find from the lod-announce group homepage, but see http://groups.google.com/support/bin/answer.py?answer=46606&cbid=-o2vzb2h0iyxw&src=cb&lev=index Q: How do I subscribe to a group? A: You can subscribe to a group through our web interface or via email. To subscribe to a group through our web interface, simply log in to your Google Account and visit the group of your choice. Then click the Join this group link on the right-hand side of the page under About this group. To subscribe to a group via email, send an email to [groupname]+subscr...@googlegroups.com. For example, if you wanted to join a group called google-friends, you'd send an email to google-friends+subscr...@googlegroups.com cheers, Dan
Re: Organizations changing status
On Tue, Jun 8, 2010 at 12:17 PM, William Waites william.wai...@okfn.org wrote: On 10-06-07 23:03, Emmanouil Batsis (Manos) wrote: b) what happens when organizations change legal status? I'm not certain but I don't think this ever really happens. In practice the old organisation ceases to exist and a new one comes into being, possibly with a period of overlap. They may share the same name and informally be referred to as the same, but technically they are different organisations. I think this suggests two predicates that are not present in the ontology -- org:successor and org:predecessor Here's a nice practical example: the Dublin Core Metadata Initiative. http://purl.org/dc/aboutdcmi - http://dublincore.org/DCMI.rdf

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Organization rdf:about="http://purl.org/dc/aboutdcmi#DCMI">
    <foaf:name>Dublin Core Metadata Initiative</foaf:name>
    <foaf:nick>DCMI</foaf:nick>
    <foaf:homepage rdf:resource="http://dublincore.org/" />
    <rdfs:seeAlso rdf:resource="http://purl.org/dc/aboutdcmi" />
    <dct:description>The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global conferences and workshops, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.</dct:description>
    <dct:created>1995-01-03</dct:created>
    <dct:subject rdf:resource="http://id.loc.gov/authorities/sh96000740#concept" />
    <dct:subject rdf:resource="http://id.loc.gov/authorities/sh98002267#concept" />
  </foaf:Organization>
</rdf:RDF>

There was a little discussion on this point: when was the Dublin Core created as an organization? It began in 1995 but as an informal internet-mediated community.
In recent years this has increasingly solidified until now there is a legal entity; http://dublincore.org/about-us/ The Dublin Core Metadata Initiative (DCMI) is an open organization, incorporated in Singapore as a public, not-for-profit Company limited by Guarantee (registration number 200823602C), engaged in the development of interoperable metadata standards that support a broad range of purposes and business models. RDF doesn't natively handle the representation of changes over time. In some contexts we'll want to talk as if there is a single thing that existed since 1995. In some other contexts we'll want to be precise, and talk of the legal entity in Singapore. RDF has the basics to allow this kind of separation and folding together of perspectives, but in everyday practice we don't yet do it very well, to be honest. I'd be interested to see proposals for refining the Dublin Core's self-description to include a more detailed picture using the Org: vocab... cheers, Dan
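[Editorial sketch, not part of the original thread.] The two-perspectives idea can be mocked up with triples as plain tuples: one resource for the informal 1995 community, another for the Singapore legal entity, linked by the org:successor / org:predecessor predicates Waites proposes above. Note those predicates are a proposal from this thread, not published org: terms, and the URIs below are hypothetical placeholders.

```python
# Editorial sketch: two distinct resources for DCMI linked by the
# org:successor predicate proposed upthread (NOT a published org: term).
# All URIs are hypothetical placeholders.
COMMUNITY = "http://example.org/dcmi#community1995"
COMPANY = "http://example.org/dcmi#singaporeCompany"

graph = [
    (COMMUNITY, "rdf:type", "foaf:Organization"),
    (COMMUNITY, "dct:created", "1995-01-03"),
    (COMPANY, "rdf:type", "org:FormalOrganization"),
    (COMMUNITY, "org:successor", COMPANY),
    (COMPANY, "org:predecessor", COMMUNITY),
]

def objects(graph, subject, predicate):
    """All objects of matching triples - a poor man's SPARQL pattern."""
    return [o for s, p, o in graph if s == subject and p == predicate]
```

Contexts that want the single folded-together "DCMI since 1995" view can follow the successor link; contexts that need legal precision can talk only about the second resource.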
Re: Organization types predicates vs classes
On Tue, Jun 8, 2010 at 12:21 PM, William Waites william.wai...@okfn.org wrote: On 10-06-08 04:27, Todd Vincent wrote: By adding OrganizationType to the Organization data model, you provide the ability to modify the type of organization and can then represent both (legal) entities and (legally unrecognized) organizations. :foo rdf:type SomeKindOfOrganisation . vs. :foo org:organisationType SomeKindOfOrganisation . I don't really see the need for an extra predicate with almost identical semantics to rdf:type. There is nothing stopping a subject from having more than one type. Yes, exactly. The schema guarantees things will have multiple types. The art is to know when to bother mentioning each type. Saying things are an rdfs:Resource is rarely interesting. Saying they're a foaf:Agent is also pretty bland and uninformative. The mid-level classes around Organization are generally more interesting, and folk using local / community-extended classes (foo:CultLikeOrganization, bar:SomePreciseSubClassOrg etc.) probably ought to mention mid-level classes too. Some day we'll get support for these distinctions from the big RDF aggregators and from analysis of code, SPARQL queries etc, so we know which terms are most likely to be understood. BTW the syntax of RDFa (compared to RDF/XML) makes it easy and much less ugly to mention extra types and relations. Mentioning a second relationship in the original RDF/XML syntax is particularly verbose. In RDFa we have space-separated lists of qualified names, which significantly reduces the cost of mixing general (widely understood) classes with precise (but more obscure) community extensions. This is a pretty good thing :) cheers, Dan
Re: Organization ontology
On Tue, Jun 8, 2010 at 12:54 PM, Kingsley Idehen kide...@openlinksw.com wrote: Peristeras, Vassilios wrote: Hello all, I have the feeling that we are (at least partly) reinventing the wheel here. There have been several initiatives drafting generic models and representations for organizations. Just two examples below [1][2] which go back to the 90s. More generally, an in-depth look at design and data patterns literature could also help a lot. I have the feeling that others before this group have defined concepts like organization, legal entity etc... We could re-use their conceptual (or data or formal) models, instead of starting the discussion from scratch. Best regards, Vassilios [1] http://www.aiai.ed.ac.uk/project/enterprise/enterprise/ontology.html [2] http://www.eil.utoronto.ca/enterprise-modelling/tove/ Both of your links point to PDFs or Postscript docs. Are there any actual ontology doc URLs? The enterprise ontology page is HTML and describes availability as The formal Ontolingua encoding of the Enterprise Ontology is held in the Library of Ontologies maintained by Stanford University's Knowledge Systems Lab (KSL). http://www-ksl-svc.stanford.edu:5915/FRAME-EDITOR/UID-15908?sid=ANONYMOUS&user-id=ALIEN Last modified: Monday, 31 May 2010 sounds fresher than I expected. There's LISP here: http://www-ksl-svc.stanford.edu:5915/FRAME-EDITOR/UID-15901?sid=ANONYMOUS&user-id=ALIEN#ENTERPRISE-ONTOLOGY I guess there must be an OWL conversion tool around somewhere. I've copied Mike Uschold who may have more to say on this... cheers, Dan
Re: Slideshare.net as Linked Data
On Mon, Jun 7, 2010 at 8:18 PM, Paul Groth pgr...@gmail.com wrote: Hi All, I've wrapped the Slideshare.net API to expose it as RDF. You can find a blog post about the service at [1] and the service itself at [2]. An interesting bit is how we deal with Slideshare's API limits by letting you use your own API key. It still needs to be properly linked (i.e. point to other resources on the WoD) but we're working on it. [1] http://thinklinks.wordpress.com/2010/06/07/linking-slideshare-data/ [2] http://linkeddata.few.vu.nl/slideshare/ Cool :) How does it relate to the RDFa they're embedding? (There's definitely a role for value-adding, even for sites that embed per-page RDF already...) cheers, Dan Let me know what you think, Thanks, Paul -- Dr. Paul Groth (pgr...@few.vu.nl) http://www.few.vu.nl/~pgroth/ Postdoc Knowledge Representation Reasoning Group Artificial Intelligence Section Department of Computer Science VU University Amsterdam
Re: Why should we publish ordered collections or indexes as RDF?
2010/6/3 Haijie.Peng haijie.p...@gmail.com: [Apologies for cross-posting] Why should we publish ordered collections or indexes as RDF? is it necessary? On the Web, very little is 'necessary'. But some things can be useful. Indexes and summaries can help software prioritise, and allow larger files to be loaded only when needed. It depends what you mean by 'ordered collections' and 'indexes'. But the reason for sitemap-style summaries is usually to help external sites monitor the content of the Web better. At http://www.sitemaps.org/ there is an explanation of the sitemaps format which several crawlers use. I believe the Google crawler will use it to help schedule activity on a site, and that -for example- it can help if you want your RDF/FOAF or XFN documents to be indexed by Google's Social Graph API - http://code.google.com/apis/socialgraph/ There is also a version of this format called Semantic Sitemaps, but http://sw.deri.org/2007/07/sitemapextension/ is offline right now. In other cases, RSS feeds (also Atom) do the same thing, and provide a 'What's new' feed for a site, letting everyone know which documents are new or updated, so that they can be (re-)indexed. For large collections of documents, it is useful sometimes to have smaller summary documents so that the bigger files can be fetched only when they are needed. Mobile apps that care about bandwidth are an example scenario there. Regarding Linked Data, what we do there is to link descriptions together. Each partial description often links to other documents that are about the same real-world thing. This addresses some of the same needs as a top level index or catalogue, because you can retrieve different levels of detail from different sites. So my small FOAF file is in some ways a top level entry (index?) for me, and it might point to larger files (eg. twitter or flickr datasets) that are maintained separately.
RDF aggregator sites like sindice.com can be used to link these together, even if the top level file does not contain links to every other file that mentions me. So in that scenario, it is not 100% necessary for the small file to be an index to the large files. The data can be linked together later if common identifiers are used in each data set. Hope this helps. Can you say more about the specific situation you have in mind? cheers, Dan
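[Editorial sketch, not part of the original thread.] A minimal sitemaps.org-style summary of the kind Dan describes can be produced with nothing but the standard library; the URLs and lastmod dates below are invented for illustration.

```python
# Editorial sketch: building a minimal sitemaps.org urlset document.
# URLs and lastmod dates are invented examples.
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs -> sitemap XML string."""
    ET.register_namespace("", SM)
    urlset = ET.Element(f"{{{SM}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

xml_doc = build_sitemap([("http://example.org/foaf.rdf", "2010-06-03")])
```

Serving such a file lets a crawler fetch one small summary and then schedule fetches of the larger RDF documents only when their lastmod changes.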
Re: Organization ontology
On Thu, Jun 3, 2010 at 8:47 AM, Stuart A. Yeates syea...@gmail.com wrote: On Wed, Jun 2, 2010 at 8:09 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: On Wed, 2010-06-02 at 17:06 +1200, Stuart A. Yeates wrote: On Tue, Jun 1, 2010 at 7:50 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: We would like to announce the availability of an ontology for description of organizational structures including government organizations. This was motivated by the needs of the data.gov.uk project. After some checking we were unable to find an existing ontology that precisely met our needs and so developed this generic core, intended to be extensible to particular domains of use. [1] http://www.epimorphics.com/public/vocabulary/org.html I think this is great, but I'm a little worried that a number of Western (and specifically Westminster) assumptions may have been built into it. Interesting. We tried to keep the ontology reasonably neutral, that's why, for example, there is no notion of a Government or Corporation. Could you say a little more about the specific Western Westminster assumptions that you feel are built into it? (*) that structure is relatively static with sharp transitions between states. This simplification pretty much comes 'out of the box' with the use of RDF or other simple logics (SQL too). Nothing we do here deals in a very fluid manner with an ever-changing, subtle and complex world. But still SQL and increasingly RDF can be useful tools, and used carefully I don't think they're instruments of western cultural imperialism. I don't find anything particularly troublesome about the org: vocab on this front. If you really want to critique culturally-loaded ontologies, I'd go find one that declares class hierarchies with terms like 'Terrorist' without giving any operational definitions...
(*) that an organisation has a single structure rather than a set of structures depending on the operations you are concerned with (finance, governance, authority, criminal justice, ...) Couldn't the subOrganizationOf construct be used to allow these different aspects to be described and then grouped loosely together? (*) that the structures are intended to be as they are, rather than being steps towards some kind of Platonic ideal I'm not getting that from the docs. For example, We felt that the best approach was to develop a small, generic, reusable core ontology for organizational information and then let developers extend and specialize it to particular domains. ...suggests a hope for incremental refinement / improvement, but also a hope that the basic pieces are likely to map onto multiple parties' situations at a higher level. Bit of both there, but no Plato. ... Modelling the crime organisations (the mafia, drug runners, Enron, identity crime syndicates) may also be helpful in exposing assumptions, particularly those in mapping the real-world to legal entities. I agree these are interesting areas to attempt to describe, but dealing with situations where obfuscation, secrecy and complexity are core business is a tough stress-test of any model. Ontology-style modeling works best when there is a shared conceptualisation of what's going on; even many direct participants in these complex crime situations lack that. So I'd suggest for those situations taking a more evidence-based social networks approach; instead of saying here's their org chart, build things up from raw data of who emails who, who knows who, who met who, where and when (or who claimed that they did), etc. RDF is ok for that task too. Those techniques are also useful when understanding how more legitimate organizations really function, but (as mentioned w.r.t. accountability) it can largely be broken out as a separate descriptive problem.
Alternatively, this may help in defining the subset of organisations that you're trying to model. Yup Control is a different issue from organizational structure. This ontology is not designed to support reasoning about authority and governance models. There are Enterprise Ontologies that explicitly model authority, accountability and empowerment flows and it would be possible to create a generic one which bolted alongside org but org is not such a beast :) I suspect I may have mis-understood the subset of problems you're trying to solve. A statement such as the above in the ontology document might save others making the same mistake. Perhaps the scope is organizations in which there is some ideal that all participants can share a common explicit understanding of (the basics of) how things work - who does roughly what, and what the main aggregations of activity are. Companies, clubs, societies, public sector bodies etc. Sure there will be old-boy networks, secret handshakes and all kinds of undocumented channels, but those are understood as routing-around the main transparent shared picture of how the organization works (or should work).
Re: Organization ontology
On Thu, Jun 3, 2010 at 3:07 PM, William Waites william.wai...@okfn.org wrote: On 10-06-03 09:01, Dan Brickley wrote: I don't find anything particularly troublesome about the org: vocab on this front. If you really want to critique culturally-loaded ontologies, I'd go find one that declares class hierarchies with terms like 'Terrorist' without giving any operational definitions... I must admit when I looked at the org vocabulary I had a feeling that there were some assumptions buried in it but discarded a couple of draft emails trying to articulate it. I think it stems from org:FormalOrganization being a thing that is legally recognized and org:OrganizationalUnit (btw, any particular reason for using the North American spelling here?) Re spelling - fair question. I think there are good reasons. British spelling accepts both. FOAF, which was made largely in Bristol UK but with international participants, has used 'Z' spelling for nearly a decade, http://xmlns.com/foaf/spec/#term_Organization ... as far as I know without any complaints. I'm really happy to see this detailed work happen and hope to nudge FOAF a little too, perhaps finding a common form of words to define the shared general Org class. It would be pretty unfortunate to have foaf:Organization and org:Organisation; much worse imho than the camel-case vs underscore differences that show up within and between vocabularies. Z seems the pragmatic choice. I don't know much about English usage outside the UK and the northern Americas, but I find 'z' is generally accepted in the UK, whereas in the US, 's' is seen as a mistake. 
This seems supported by whoever wrote this bit of wikipedia, http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences#-ise.2C_-ize_.28-isation.2C_-ization.29 American spelling accepts only -ize endings in most cases, such as organize, realize, and recognize.[53] British usage accepts both -ize and -ise (organize/organise, realize/realise, recognize/recognise).[53] British English using -ize is known as Oxford spelling, and is used in publications of the Oxford University Press, most notably the Oxford English Dictionary, as well as other authoritative British sources. being an entity that is not recognised outside of the FormalOrg Organisations can become recognised in some circumstances despite never having solicited outside recognition from a state -- this might happen in a court proceeding after some collective wrongdoing. Conversely you might have something that can behave like a kind of organisation, e.g. a class in a class-action lawsuit without the internal structure present in most organisations. Yes. In FOAF we have a class foaf:Project but it is not quite clear how best to characteri[sz]e it. In purely FOAF oriented scenarios, I believe it is hardly ever used (although humm stats below seem to contradict that). However, the pretty successful DOAP project ('description of a project') has made extensive use of a subclass, doap:Project in describing open source collaborative projects. These have something of the character of an organization, but are usually on the bazaar end of the cathedral/bazaar spectrum. Are some but not all projects also organizations? etc.
discuss :) See also http://xmlns.com/foaf/spec/#term_Project http://trac.usefulinc.com/doap http://sindice.com/search?q=foaf:project+qt=term Search results for terms “foaf:project ”, found about 13.0 thousand (sindice seems to require downcasing for some reason) http://sindice.com/search?q=doap:project+qt=term Search results for terms “doap:project ”, found about 8.41 thousand (I haven't time to dig into those results, probably the queries could be tuned better to filter out some misleading matches) Is a state an Organisation? It would be great to link if possible to FAO's Geopolitical ontology here, see http://en.wikipedia.org/wiki/Geopolitical_ontology ... this for example has a model for groupings that geo-political entities belong to (I'm handwaving a bit here on the detail). It also has a class Organization btw, as well as extensive mappings to different coding systems. Organisational units can often be semi-autonomous (e.g. legally recognised) subsidiaries of a parent or holding company. What about quangos or crown-corporations (e.g. corporations owned by the state). They have legal recognition but are really like subsidiaries or units. As an aside, I would like to have a way of representing boards of directors, to update the old (theyrule-derived) FOAFCorp data and schema. Ancient page here: http://rdfweb.org/foafcorp/intro.html schema http://xmlns.com/foaf/corp/ Some types of legally recognised organisations don't have a distinct legal personality, e.g. a partnership or unincorporated association so they cannot be said to have rights and responsibilities, rather the members have joint (or joint and several) rights and responsibilities. This may seem like splitting hairs but from
Re: UK Govt RDF Data Sets
On Sun, Apr 25, 2010 at 8:02 PM, Kingsley Idehen kide...@openlinksw.com wrote: Jeni Tennison wrote: Kingsley, On 15 Apr 2010, at 23:19, Kingsley Idehen wrote: Do you have any idea as to the whereabouts of RDF data sets for the SPARQL endpoints associated with data.gov.uk? [...] One thing I haven't been able to reconcile (in my head repeatedly) re. the above. If data provenance is the key concern behind the RDF dump releases, doesn't the same issue apply to CONSTRUCTs or DESCRIBE style crawls against the published endpoints? Basically, the very pattern exhibited by some user agents that hit the DBpedia endpoint (as per the DBpedia Endpoint Burden post). What makes a SPARQL endpoint safer than an RDF dump in this regard? For what it's worth, I've encountered very similar attitudes over the years in other environments. A good example is the digital library world; both regarding access to digital collections and online access to OPAC data, it was quite common to see Z39.50 search protocol access to the full collection, but accompanied by a rather cautious reluctance to also offer a simple data dump of the entire thing. Pointing out that you could do this via repeated Z39.50 searches was rarely helpful, and seemed more likely to encourage the search interface to be restricted than for data dumps to be made available. But hey, times are changing! I think it's just a matter of time... cheers, Dan
Fwd: backronym proposal: Universal Resource Linker
So - I'm serious. The term 'URI' has never really worked as something most Web users encounter and understand. For RDF, SemWeb and linked data efforts, this is a problem as our data model is built around URIs. If 'URL' can be brought back from limbo as a credible technical term, and rebranded around the concept of 'linkage', I think it'll go a long way towards explaining what we're up to with RDF. Thoughts? Dan -- Forwarded message -- From: Dan Brickley dan...@danbri.org Date: Sun, Apr 18, 2010 at 11:52 AM Subject: backronym proposal: Universal Resource Linker To: u...@w3.org Cc: Tim Berners-Lee ti...@w3.org I'll keep this short. The official term for Web identifiers, URI, isn't widely known or understood. The I18N-friendly variant IRI confuses many (are we all supposed to migrate to use it; or just in our specs?), while the most widely used, understood and (for many) easiest to pronounce, 'URL' (for Uniform Resource Locator) has been relegated to 'archaic form' status. At the slightest provocation this community disappears down the rathole of URI-versus-URN, and until this all settles down we are left with an uncomfortable disconnect between how those in-the-know talk about Web identifiers, and those many others who merely use them. As of yesterday, I've been asked but what is a URI? one too many times. I propose a simple-minded fix: restore 'URL' as the most general term for Web identifiers, and re-interpret 'URL' as Universal Resource Linker. Most people won't care, but if they investigate, they'll find out about the re-naming. This approach avoids URN vs URI kinds of distinction, scores 2 out of 3 for use of intelligible words, and is equally appropriate to classic browser/HTML, SemWeb and other technical uses. What's not to like? The Web is all about links, and urls are how we make them... cheers, Dan
Re: Fwd: backronym proposal: Universal Resource Linker
On Sun, Apr 18, 2010 at 3:42 PM, Nathan nat...@webr3.org wrote: Wonder what would happen if we just called them Links? I think that would confuse people. And would put stress just on the point where SemWeb and HTML notions of link diverge. An HTML page can have two (hyper-)links, <a href="/contactus/">contact us</a> in the header, and <a href="/contactus/">contacts</a> in the footer. Each of those chunks of markup is what we informally call a link; the relative URI reference inside the href attribute in both cases is what makes it possible for the link to be useful. I'm saying that http://example.com/contactus/ should be called a 'universal resource linker' instead of 'uniform resource locator'. Using 'universal resource link' for that instead has a different grammatical role and could confuse, since the page has two links (the bits that go blue in your browser usually), but they both point to the same URI/URL. Seems to be pretty unambiguous; if I say Link to TimBL or my Mum they both know what I mean, and it appears to produce the desired mental picture when used. There are two usages at least with link; 'pass me the link' versus 'click on the link'; the latter emphasises the occurrence as being the link. Link, short for HyperLink - Link as in Linked Data. Keep the URI/URL/IRI for those who need to know the exact syntax of a Link. So when the RDF perspective comes in, so do subtly different notions of link. This is why I think framing 'link' as a countable thing will lead to confusion. RDF links are a bit like relationships; so <a href="http://bob.example.com/" rel="xfn:coworker xfn:buddy">Bob</a> is a link expressing two relationships, er, links. If you poke too hard at the magic word link it kinda crumbles a bit. But it remains incredibly evocative and at the heart of both the Web and the SemWeb. Linker is non-committal enough that it allows a family of related readings; where the markup describes a pre-existing link/relationship (eg.
co-worker), and where the markup itself is the link we're interested in. If you check back to TimBL's original diagram in http://www.w3.org/History/1989/proposal.html the different flavours of 'link' were in there from the start; 'wrote' and 'refers to' for example; the former links a person to a document; the latter connects documents. So the linking story here is that identifiers for people and documents can share a notation, and become linkable. What exactly a link is, on the other hand, I think will always be a little bit slippery. cheers, Dan
Re: backronym proposal: Universal Resource Linker
On Sun, Apr 18, 2010 at 7:40 PM, Ian Davis m...@iandavis.com wrote: When talking to people who aren't semweb engineers then I use URL/URI/link interchangeably. I don't think it matters because the 1% that care will look it all up and get the distinction and the rest will just get on and use RDF as shown. Yeah, I find myself slipping between the two in the same sentence sometimes, even written or spoken. I don't think it really super matters which we use, but the confusion is costly and pointless. At the Augmented Reality Dev Camp here in Amsterdam yesterday, one of the comments was http://twitter.com/garciacity/status/12339906312 So what is an URI? mentioned by steven pemberton and hans overbeek #ardevcamp This is a perfectly reasonable question from an educated and technical audience member, and a perfectly avoidable one. I mean no disrespect to either of the fine speakers, or the audience member; the mess is not of their making. RDFa and Linked Data were presented to a mixed audience, some coders, some artists, game designers, augmented reality, mapping folk etc... a real big mix; and I think it went over well, but this silly issue of URI/URL is a bug worth fixing. We should be able to say URL unapologetically, correctly and without fear of contradiction. It's a fine acronym; it just has the wrong expansion. Easily fixed, since most people (as you say) won't even bother to look it up. My suggestion is that we flip things upside down. Too often URL comes across as being a kind of double-taboo (it's the old, incorrect name and it's (to URN-advocates) the crappy, lower quality form of linking, prone to breakage, 404 etc). People who use URL often do it in a sort of self-deprecating way; they know they should probably say URI or perhaps IRI; or maybe they really mean URI Reference or is that IRI Reference to be really inclusive and modern? [And are they called URI schemes now, or IRI schemes? I truly have no idea.]
So let's pull URL out from the bottom of the pile, reinstate it at the top, and rework the acronym to remove the most troublesome part: Locator. By flipping that to something link-centric, we re-emphasise the core value of the Web, and turn the conversation away from pointless ratholes like names/IDs vs addresses/locations to something potentially *much* more productive: different types of URL-based linking. * the whole mess around 'UR*' makes it hard for even technically aware observers to talk clearly * we don't have an actively used top term in the tech scene for all those identifying strings (URIs, URI Refs, IRIs, IRI Refs) * the deprecated nature of 'URL' means we don't reward people for using it; we make them feel dumber instead of smarter. We say 'URL? yeah kinda, you probably really ought to say URI but don't worry, you nearly got it' instead of 'Yeah, URLs - universal resource linkers - it's all about linking; if you understand URLs you understand the core idea behind the Web (and the Web of data, ... and the Web of things, ...)' There was a fuss a while back when the HTML5 spec was using URL instead of URI; however that was without the proposed reconceptualisation here. I'd hate to stir up a fuss, but I think we have a lot of interesting ingredients: * the term 'URL' isn't being used in a technical sense currently - I consider it available for careful redeployment * many of us are already using it informally as an overarching umbrella term ('cos we know it works) * it has massive market-presence and is understood pretty well by the public * we really badly need an umbrella term that hides the URI vs IRI vs *RI-Ref distinction from normal humans * 'universal resource linker' is loose and evocative enough to do the job, and makes people feel smarter not dumber... cheers, Dan
Re: UK Govt RDF Data Sets
On Fri, Apr 16, 2010 at 12:53 AM, Ian Davis li...@iandavis.com wrote: Kingsley, You should address your question directly to the project organisers, we're a technology provider and host some of the data but it is not up to us when or where the dumps get shared. My understanding is that because this is officially sanctioned data they want to ensure that the provenance is built into the datasets properly. My hope and wish is that the commitment to making dumps available will be built into the guidelines the UK Government are working on. But those won't be issued during this month because of the election. Re their provenance requirements, do you know if the right people are already engaged with the W3C Incubator on this topic; see various links fwd'd in http://lists.foaf-project.org/pipermail/foaf-dev/2010-April/010164.html It would be very interesting if someone were prepared to digitally sign the files; or at least to publish checksums on a trusted Web page in RDFa. Lots of options that could be explored. BTW I get the impression that similar concerns can be found in the library community too, when publishing SKOS and wanting to make sure that extensions and addons mixed into the data later are not mis-attributed to the original source. cheers, Dan
Re: twitter's annotation and metadata
+cc: Ed Summers On Fri, Apr 16, 2010 at 11:37 AM, Chris Sizemore chris.sizem...@bbc.co.uk wrote: the main problem is gonna be the cognitive dissonance over whether a tweet is an information or non-information resource and how many URIs are needed to fully rep a tweet... so, who's gonna volunteer to publish the linked data version of Twitter data, a la db/wiki[pedia] ... Based on http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/ it looks like the Library of Congress might be taking on that job. And on the strength of the LCSH RDF work, it might even be feasible... Dan
Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 11:50 PM, Daniel Koller dakol...@googlemail.com wrote: Dan, ...I just setup some torrent files containing the current english and german dbpedia content: (.. as a test/proof of concept, was just curious to see how fast a network effect via p2p networks). To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html. I presume to get it working you need just the first people downloading (and keep spreading it around w/ their Torrent-Clients)... as long as the *.torrent-files are consistent. (layout of the link page courtesy of the dbpedia-people) Thanks! OK, let's see if my laptop has enough disk space left ;) could you post an 'ls -l' too, so we have an idea of the file sizes? Transmission.app on OSX says Downloading from 1 of 1 peers now (for a few of them), and from 0 of 0 peers for others. Perhaps you have some limits/queue in place? Now this is where my grip on the protocol is weak --- I'm behind NAT currently, and I forget how this works - can other peers find my machine via your public seeder? I'll try this on an ubuntu box too. Would be nice if someone could join with a single simple script... cheers, Dan I was working my way down the list in http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html although when I got to Raw Infobox Property Definitions the first two links 404'd...
Re: DBpedia hosting burden
On Thu, Apr 15, 2010 at 9:57 PM, Kingsley Idehen kide...@openlinksw.com wrote: Ian Davis wrote: When you use the term: SPARQL Mirror (note: Leigh's comments yesterday re. not orienting towards this), you open up a different set of issues. I don't want to revisit SPARQL and SPARQL extensions debate etc.. Esp. as Virtuoso's SPARQL extensions are integral part of what makes the DBpedia SPARQL endpoint viable, amongst other things. Having the same dataset available via different implementations of SPARQL can only be healthy. If certain extensions are necessary, this will only highlight their importance. If there are public services offering SPARQL-based access to the DBpedia datasets (or subsets) out there on the Web, it would be rather useful if we could have them linked from a single easy to find page, along with information about any restrictions, quirks, subsetting, or value-adding features special to that service. I suggest using a section in http://en.wikipedia.org/wiki/DBpedia for this, unless someone cares to handle that on dbpedia.org. The burden issue is basically veering away from the key points, which are: 1. Use the DBpedia instance properly 2. When the instance enforces restrictions, understand that this is a Virtuoso *feature* not a bug or server shortcoming. Yes, the showcase implementation needs to be used properly if it is going to survive the increasing developer attention LOD is getting. It is perfectly reasonable of you to make clear, when there are limits, that they are for everyone's benefit. Beyond the dbpedia.org instance, there are other locations for: 1. Data Sets 2. SPARQL endpoints (like yours and a few others, where functionality mirroring isn't an expectation). Is there a list somewhere of related SPARQL endpoints? (also other Wikipedia-derived datasets in RDF) Descriptor Resource handling via mirrors, BitTorrents, Reverse Proxies, Cache directives, and some 303 heuristics etc. are the real issues of interest.
(am chatting with Daniel Koller in Skype now re the BitTorrent experiments...) Note: I can send wild SPARQL CONSTRUCTs, DESCRIBES, and HTTP GETs for Resource Descriptors to a zillion mirrors (maybe next year's April Fool's joke re. beauty of Linked Data crawling) and it will only broaden the scope of my dysfunctional behavior. The behavior itself has to be handled (one or a zillion mirrors). Sure. But on balance, more mirrors rather than fewer should benefit everyone, particularly if 'good behaviour' is documented and enforced... Anyway, we will publish our guide for working with DBpedia very soon. I believe this will add immense clarity to this matter. Great! cheers, Dan
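To make the 'list of related SPARQL endpoints' idea concrete: any endpoint speaking the standard SPARQL protocol can be queried with a plain HTTP GET carrying a query parameter. A minimal sketch, assuming the dbpedia.org endpoint URL and a made-up example query (no network call is made here):

```python
from urllib.parse import urlencode

# Illustrative endpoint; any SPARQL-protocol endpoint is queried the same way.
endpoint = "http://dbpedia.org/sparql"

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Amsterdam> rdfs:label ?label .
} LIMIT 5
"""

# The SPARQL protocol sends the query text as a 'query' parameter on a GET
# request; many endpoints also accept a hint for the results serialization.
url = endpoint + "?" + urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})
# 'url' can now be fetched with urllib.request.urlopen(url), curl, etc.
```

Documenting per-endpoint quirks (extensions, result limits, subsets loaded) alongside that base URL would be exactly the kind of thing the single easy-to-find page could carry.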
Re: DBpedia hosting burden
On Wed, Apr 14, 2010 at 8:11 PM, Kingsley Idehen kide...@openlinksw.com wrote: Some have cleaned up their act for sure. Problem is, there are others doing the same thing, who then complain about the instance in very generic fashion. They're lucky it exists at all. I'd refer them to this Louis CK sketch - http://videosift.com/video/Louie-CK-on-Conan-Oct-1st-2008?fromdupe=We-live-in-an-amazing-amazing-world-and-we-complain (if it stays online...). While it is a shame to say 'no' to people trying to use linked data, this would be more saying 'yes, but not like that...'. I think we have an outstanding blog post / technical note about the DBpedia instance that hasn't been published (possibly due to the 3.5 and DBpedia-Live work we are doing), said note will cover how to work with the instance etc.. [..] We do have a solution in mind, basically, we are going to have a different place for the descriptor resources and redirect crawlers there via 303's etc.. [...] We'll get the guide out. That sounds useful. As you mention, DBpedia is an important and central resource, thanks both to the work of the Wikipedia community, and those in the DBpedia project who enrich and make available all that information. It's therefore important that the SemWeb / Linked Data community takes care to remember that these things don't come for free, that bills need paying and that de-referencing is a privilege not a right. 'Bills' is the major operative word in a world where the Bill Payer and Database Maintainer is a footnote (at best) re. perception of what constitutes the DBpedia Project. Yes, I'm sure some are thoughtless and take it for granted; but also that others are well aware of the burdens. (For that matter, I'm not myself so sure how Wikipedia covers its costs or what their longer-term plan is...). For us, the most important thing is perspective.
DBpedia is another space on a public network, thus it can't magically rewrite the underlying physics of wide area networking where access is open to the world. Thus, we can make a note about proper behavior and explain how we protect the instance such that everyone has a chance of using it (rather than a select few resource guzzlers). This I think is something others can help with, when presenting LOD and related concepts: to encourage good habits that spread the cost of keeping this great dataset globally available. So all those making slides, tutorials, blog posts or software tools have a role to play here. Are there any scenarios around eg. BitTorrent that could be explored? What if each of the static files in http://dbpedia.org/sitemap.xml were available as torrents (or magnet: URIs)? When we set up the Descriptor Resource host, these would certainly be considered. Ok, let's take care to explore that then; it would probably help others too. There must be dozens of companies and research organizations who could put some bandwidth resources into this, if only there was a short guide to setting up a GUI-less bittorrent tool and configuring it appropriately. Are there any bittorrent experts on these mailing lists who could suggest next practical steps here (not necessarily dbpedia-specific)? (ah I see a reply from Ivan; copying it in here...) If I were The Emperor of LOD I'd ask all grand dukes of datasources to put fresh dumps at some torrent with control of UL/DL ratio :) For reasons I can't understand this idea is proposed a few times per year but never tried. I suspect BitTorrent is in some ways 'taboo' technology, since it is most famous for being used to distribute materials that copyright-owners often don't want distributed. I have no detailed idea how torrent files are made, how trackers work, etc. I started poking around magnet: a bit recently but haven't got a sense for how solid that work is yet.
Could a simple Wiki page be used for sharing torrents? (plus published hash of files elsewhere for integrity checks). What would it take to get started? Perhaps if http://wiki.dbpedia.org/Downloads35 had the sha1 for each download published (rdfa?), then others could experiment with torrents and downloaders could cross-check against an authoritative description of the file from dbpedia? I realise that would only address part of the problem/cost, but it's a widely used technology for distributing large files; can we bend it to our needs? Also, we encourage use of gzip over HTTP :-) Are there any RDF toolkits in need of a patch to their default setup in this regard? Tutorials that need fixing, etc? cheers, Dan ps. re big datasets, Library of Congress apparently are going to have complete twitter archive - see http://twitter.com/librarycongress/status/12172217971 - http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/
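On the checksum idea above: publishing a digest per dump file is trivial for a publisher, and lets torrent downloaders cross-check what they received against an authoritative description. A minimal sketch in Python (the filename in the usage comment is hypothetical; SHA-1 is used only because that's the digest suggested above):

```python
import hashlib

def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 hex digest of a dump file, reading in chunks
    so multi-gigabyte dumps never need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. print(file_sha1("dbpedia_infobox_properties.nt.bz2"))  # hypothetical file
```

The publisher would post the resulting hex string next to each download link (in RDFa or plain HTML); a downloader runs the same function over the file fetched via torrent and compares.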
XMP RDF extractors?
On Tue, Apr 13, 2010 at 3:56 PM, Leigh Dodds leigh.do...@talis.com wrote: Hi, Yes. PDF: http://patterns.dataincubator.org/book/linked-data-patterns.pdf EPUB: http://patterns.dataincubator.org/book/linked-data-patterns.epub Something of a tangent but this reminds me, what's the latest on RDF extractors for Adobe XMP? I always used to use 'strings' and a regex but I haven't tracked the spec and have found this trick working *less* well over time, not better.

strings linked-data-patterns.pdf | grep -i xmp

<?xpacket id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
    <xmp:CreateDate>2010-04-12T23:01:36+01:00</xmp:CreateDate>
  </rdf:Description>
</x:xmpmeta>
<?xpacket end="r"?>

By contrast, downloading the .epub file and unzipping you find this in content.opf:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" id="bookid">_id2880071</dc:identifier>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Linked Data Patterns</dc:title>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Dodds, Leigh">Leigh Dodds</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Davis, Ian">Ian Davis</dc:creator>
    <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This book lives at http://patterns.dataincubator.org. Check that website for the latest version. This work is licenced under the Creative Commons Attribution 2.0 UK: England & Wales License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/2.0/uk/. Thanks to members of the Linked Data mailing list for their feedback and input, and Sean Hannan for contributing some CSS to style the online book.</dc:description>
    <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
  </metadata>
  <manifest>
    <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item id="htmltoc" media-type="application/xhtml+xml" href="bk01-toc.html"/>
    <item id="id2880071" href="index.html" media-type="application/xhtml+xml"/>

Wouldn't it be nice if there were easy conventions for books about RDF to have Webby linked RDF bundled in the files? Both seem nearly there but not quite... (this not a complaint Leigh, I love this work btw!) cheers, Dan ps. re epub see also http://lists.w3.org/Archives/Public/public-lod/2010Jan/0121.html
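The 'strings plus a regex' trick can be done slightly more robustly by scanning the raw bytes for the xpacket processing instructions that frame each XMP packet. A rough sketch (my own approximation of the packet framing, not a substitute for a real XMP parser; real packets typically carry a begin= attribute in the opening PI):

```python
import re

# XMP packets are framed by <?xpacket begin=...?> ... <?xpacket end=...?>
# processing instructions embedded directly in the file's raw bytes.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=[^>]*\?>(.*?)<\?xpacket end=[^>]*\?>",
    re.DOTALL,
)

def extract_xmp(data: bytes) -> list[str]:
    """Return decoded XMP payloads found anywhere in raw file bytes."""
    return [
        m.group(1).decode("utf-8", errors="replace")
        for m in XMP_PACKET.finditer(data)
    ]

# e.g. packets = extract_xmp(open("linked-data-patterns.pdf", "rb").read())
```

This at least survives the payload being split across binary junk before and after it, though it will still miss packets in compressed PDF streams, which may be why the trick works less well over time.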
Re: XMP RDF extractors?
On Tue, Apr 13, 2010 at 6:31 PM, Pierre-Antoine Champin swlists-040...@champin.net wrote: Even more tangent, but when I read in detail the XMP spec last year (in relation to the Media Annotation WG), I came to two conclusions: - XMP specifies RDF at the level of the XML serialization, which is *ugly* (emphasis on *ugly*). Furthermore, it makes it unsafe to use standard RDF/XML serializers, as those may not enforce those syntactic constraints. - XMP interprets RDF/XML in a non-standard way, considering the two following tags as non equivalent: <ns1:bar xmlns:ns1="http://example.com/foo">... <ns2:foobar xmlns:ns2="http://example.com/">... (which is again, a syntax-only perspective). So it is not safe to use standard RDF/XML parsers, as they will produce a model which may be inconsistent with other XMP parsers. So you can neither use standard serializers nor standard parsers to handle XMP's RDF safely; as far as I'm concerned, XMP is not really RDF -- and Dan's problems to extract it strengthen this opinion of mine... That being said, the risks of inconsistency are minimal, especially for parsing. So I guess there is some value in pretending XMP is RDF ;) and using an RDF parser to extract it... I think we can and should be generous to Adobe here; they have been supportive of RDF since the late '90s - eg. Walter Chang's work on UML and RDF http://www.w3.org/TR/NOTE-rdf-uml/ - and committing to something that is embedded within files that will mostly *never* be re-generated (PDFs, JPEGs etc in the wild) makes for naturally conservative design. There are probably many kinds of improvement they could make, but being back-compatible with the large bulk of deployed XMP must be a major concern. Pushing out revisions to tools on the scale of Photoshop etc isn't easy, especially when the new stuff will also have to read/write properly in older deployed tools for unknown years to come.
That said I think we would do well to look around more actively at what's out there via XMP, and see how it hangs together when re-aggregated into a common SPARQL environment. In particular XMP pre-dates SKOS, and I imagine many of the environments where XMP matters would benefit from the kinds of integration SKOS can bring. So I'd love to see some exploration of that... cheers, Dan
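As a footnote to Pierre-Antoine's parsing point: the equivalence that standard RDF/XML sees (and a syntax-level XMP processor reportedly doesn't) is just namespace URI plus local name concatenation, which a few lines make concrete (the URIs are the example ones from his message):

```python
def rdfxml_property_uri(namespace_uri: str, local_name: str) -> str:
    # In standard RDF/XML, an element's property URI is the namespace URI
    # concatenated with the element's local name; prefixes are irrelevant.
    return namespace_uri + local_name

# Two syntactically different spellings of the same property:
a = rdfxml_property_uri("http://example.com/foo", "bar")   # ns1:bar
b = rdfxml_property_uri("http://example.com/", "foobar")   # ns2:foobar
# a and b are both "http://example.com/foobar": identical to an RDF
# parser, but distinct to a processor comparing at the tag level.
```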
Re: KIT releases 14 billion triples to the Linked Open Data cloud
But I love it :) Do the numbers include dates? Dan On Thu, Apr 1, 2010 at 12:30 PM, Matthias Samwald samw...@gmx.at wrote: Hi Denny, I am sorry, but I have to voice some criticism of this project. Over the past two years, I have become increasingly wary of the excitement over large numbers of triples in the LOD community. Large numbers of triples don't necessarily mean that a dataset enables us to do anything novel or significantly useful. I think there should be a shift from focusing on quantity to focusing on quality and usefulness. Now the project you describe seems to be well-made, but it also exemplifies this problem to a degree that I have not seen before. You basically published a huge dataset of numbers, for the sake of producing a large number of triples. Your announcement mainly emphasizes how huge the dataset is, and the corresponding paper does the same. The paper gives a few application scenarios; I quote: The added value of the paradigm shift initiated by our work cannot be underestimated. By endowing numbers with an own identity, the linked open data cloud will become a treasure trove for a variety of disciplines. By using elaborate data mining techniques, groundbreaking insights about deep mathematical correspondences can be obtained. As an example, using our sample dataset, we were able to discover that there are significantly more odd primes than even ones, and even more excitingly a number contains 2 as a prime factor exactly if its successor does not. I am sorry, but this sounds a bit overenthusiastic. I see no paradigm shift, and I also don't see why your findings about prime numbers required you to publish the dataset as linked data. I also have trouble seeing the practical value of looking at the resource pages for each number with a linked data browser, but I am also not a mathematician.
I am sorry for being a bit antagonistic, but we as a community should really try not to be seduced too easily by publishing ever-larger numbers of triples. Cheers, Matthias Samwald -- From: Denny Vrandecic denny.vrande...@kit.edu Sent: Thursday, April 01, 2010 12:01 PM To: public-lod@w3.org Subject: KIT releases 14 billion triples to the Linked Open Data cloud We are happy to announce that the Institute AIFB at the KIT is releasing the biggest dataset until now to the Linked Open Data cloud. The Linked Open Numbers project offers billions of facts about natural numbers, all readily available as Linked Data. Our accompanying peer-reviewed paper [1] gives further details on the background and implementation. We have integrated with external data sources (linking DBpedia to all their 335 number entities) and also directly link to the best-known linked open data browsers from the page. You can visit the Linked Open Numbers project at: http://km.aifb.kit.edu/projects/numbers/ Or point your linked open data browser directly at: http://km.aifb.kit.edu/projects/numbers/n1 We are happy to have increased the amount of triples on the Web by more than 14 billion triples, roughly 87.5% of the size of linked data web before this release (see paper for details). We hope that the data set will find its serendipitous use. The data set and the publication mechanism was checked pedantically, and we expect no errors in the triples. If you do find some, please let us know. We intend to be compatible with all major linked open data publication standards. About the AIFB The Institute AIFB (Applied Informatics and Formal Description Methods) at KIT is one of the world-leading institutions in Semantic Web technology. 
Approximately 20 researchers of the knowledge management research group are establishing theoretical results and scalable implementations for the field, closely collaborating with the sister institute KSRI (Karlsruhe Service Research Institute), the start-up company ontoprise GmbH, and the Knowledge Management group at the FZI Research Center for Information Technologies. Particular emphasis is given to areas such as logical foundations, Semantic Web mining, ontology creation engineering and management, RDF data management, semantic web search, and the implementation of interfaces and tools. The institute is involved in many industry-university co-operations, both on a European and a national level, including a number of intelligent Web systems case studies. Website: http://www.aifb.kit.edu About KIT The Karlsruhe Institute of Technology (KIT) is the merger of the former Universität Karlsruhe (TH) and the former Forschungszentrum Karlsruhe. With about 8000 employees and an annual budget of 700 million Euros, KIT is the largest technical research institution within Germany. KIT is both, a state university with research and teaching and, at the same time, a large-scale research institution of the Helmholtz Association. KIT has a strong reputation as
Re: KIT releases 14 billion triples to the Linked Open Data cloud
On Thu, Apr 1, 2010 at 6:25 PM, Martin Hepp (UniBW) martin.h...@ebusiness-unibw.org wrote: Hi Denny: Without spoiling your All Fools' Day joke: I think it is a dangerous one, because there is obviously a true core in the expected criticism. I think that without any need, you give outsiders additional ammunition to confirm other outsiders' prejudices against the value of linked data. I bet you will find lots of triples in the current LOD cloud that have information value close to the triples in your experiment. And many people communicating over the potential of the Web of Linked Data, and maybe deciding about business investments, will not see the joke in your page. On the contrary, I think it was both funny and healthy for the semweb community. My thought process when I carelessly saw the original blurb go past was as follows: * oh dear, more overblown hype for some semweb thing, that's not good * oh, it's quite stupid in fact * ah it's Denny, and I like everything he makes ... and ah yeah 2010-04-01, phew. The fact that I was even for a second prepared to entertain the idea that this was serious, worries me. And clearly a few others on the list went further before realising. Which is why I say this was a healthy exercise. If we as a community are so used to over-hyped folly that we could consider that this might have been a serious offering, then we ought to take more care of our habits during the other 364 days of the year. If I hadn't seen Denny's name against the project or actually read the paper, I'd probably have been taken in too... If we can't laugh at ourselves, we'll be ill prepared to deal with criticism. And criticism is healthy for any technology community, but especially one whose ambitions are as large as ours. We are trying to build a global, integrated system for planet-wide sharing of descriptions of all things and their interconnections. Described like that, it sounds like drug-addled idiocy, but that's what we're doing.
And the only way we'll manage it is if we do it in good humour. This means acting gracefully when fans of other technologies offer criticism, whether or not they are gentle in their words. And it means taking care to balance enthusiasm for the potential of this technology with a realisation that there's still a long way to go in making these tools and techniques a joy for non-enthusiasts to use... cheers, Dan
Re: Should dbpedia have stuff in that is not from wikipedia - was: Re: A URI(Web ID) for the semantic web community as a foaf:Group
[snip] Couple of almost-independent points - Re DBpedia, I share a concern that the Wikipedia turned into a database product remain fairly clearly defined, even though the RDFization naturally includes a bit of creativity. However even that has subtleties - there are the different language variants for example, plus outlying members of the Wikipedia family (wiktionary etc.). However I think we as a community should be prepared for an interesting trend, hopefully one that'll move faster with things like openid and RDF helping: I believe Wiki federation and cross-referencing will become a major trend over next few years. The stress and trauma that the Wikipedia community are currently feeling re scoping, ie. the Deletionism debate - http://meta.wikimedia.org/wiki/Deletionism - can only really be resolved by accepting that we'll have a Web of useful and overlapping wikis, treating various topics in more or less detail. Using common URIs (grounded in the central Wikipedia) makes this possible. And this means - by combining dbpedia's extraction technology, or the Semantic MediaWiki addons - that we can expect a lot more RDF data from other wikis over the coming years. It wouldn't be unreasonable for the DBpedia project to offer some aggregate of all this, if they chose to... Also re SWIG, considered as an entity in the W3C world and as a larger vaguer community. Some W3C Interest Groups have enumerated memberships; traditionally RDF IG and its successor, this SemWeb IG, didn't. There is no master list, just a collection of SWIG-related mailing lists and other channels. I wonder sometimes about changing that, so we had a stronger sense of who the members of W3C SWIG actually are (ie. who has committed to the group's charter; also db-backed profile pages at w3.org, etc.). There are also data sources like the mail archives and #swig IRC logs (see http://swig.xmlhack.com/), Twitter/Identi.ca etc that offer some sense of who the active members of the community are.
Also I made some experiments in http://danbri.org/words/2009/10/25/504 with exposing lists of OpenIDs from Wordpress, MediaWiki etc to show who is actively participating at some site. I think this evidence-driven approach is a stronger way of defining a network of overlapping foaf:Group descriptions, rather than having a single central list. I might for example want to see who was on the www-rdf-logic or www-rdf-rules lists and via their microblog posts, which amongst them were in the Netherlands. Or find microblog posts from the people who are actively contributing to the FOAF or ESW wikis. There are lots of overlapping communities; being 'in the Semantic Web community' isn't a simple boolean flag. So I'd rather surface the underlying data and allow people to compose views into it that suit particular use cases - find me things bookmarked by ontologists; what have members of public-lod been saying on Twitter this week?; find me DOAP descriptions of software associated with members of the #swig IRC channel, conferences with 2 or more editors of W3C SemWeb specs on the steering committee, etc etc... To relate these two points, I have started documenting bits of SemWeb history in the FOAF Wiki, since I really can't be bothered to fight deletionism wars on Wikipedia's main site. For example http://wiki.foaf-project.org/w/MCF describes Meta Content Format (and yep the CSS image right alignment has gone wrong there - help welcomed!). The FOAF wiki has OpenID support, and Semantic Media Wiki installed, so edits can be associated with OpenIDs. I would love to know how best to configure SMW so that we could figure out that http://wiki.foaf-project.org/w/MCF is talking about the same thing as http://en.wikipedia.org/wiki/Meta_Content_Framework so that folk who express their interest in the topic using either URI can be linked. What's the markup to put into the FOAF wiki entry which would express the appropriate sameAs?
Also of note, the FOAF Wiki is currently configured to consume a list of OpenIDs and add them to a MediaWiki trust group, Bureaucrat. http://wiki.foaf-project.org/w/FOAF_Wiki:Bureaucrats ... it currently gets this list just from my blog, ie. anyone who I have trusted enough to comment in my blog, gets added to this group. In future I would like to tune this to use more sources and more subtlety. Getting this kind of trust syndication in place I think will be a big part of helping smaller Wikis flourish, to connect back to the original point... cheers, Dan
Re: SKOS, owl:sameAs and DBpedia
On Wed, Mar 24, 2010 at 4:57 PM, Yves Raimond yves.raim...@gmail.com wrote: Hello! We are in the process of rolling out some links to DBpedia over in BBC Programmes. However, we are facing a small issue. We use our own categorisation scheme based on SKOS, and then want to add some sameAs links to DBpedia. For example, we currently publish the following statements: http://www.bbc.co.uk/programmes/places/france#place a skos:Concept ; a po:Place . And we want to add an extra statement: http://www.bbc.co.uk/programmes/places/france#place owl:sameAs http://dbpedia.org/resource/France. Is that an issue? Should we drop SKOS altogether if we go on with that, or should we use skos:exactMatch instead of owl:sameAs? see also http://wiki.foaf-project.org/w/term_focus I'm running out of excuses for not having added this already... Dan
Re: SKOS, owl:sameAs and DBpedia
On Wed, Mar 24, 2010 at 5:09 PM, Yves Raimond yves.raim...@gmail.com wrote: Is that an issue? Should we drop SKOS altogether if we go on with that, or should we use skos:exactMatch instead of owl:sameAs? see also http://wiki.foaf-project.org/w/term_focus I'm running out of excuses for not having added this already... Great, thanks for the link! However, I'd like to understand why a sameAs would be bad here, I have the intuition it might be, but am really not sure. It looks to me like there's no resource out there that couldn't be a SKOS concept as well (you may want to use anything for categorisation purpose --- the loose categorisation relationship being encoded in the predicate, not the type). If it can't be, then I am beginning to feel slightly uncomfortable about SKOS :-) Because conceptualisations of things as SKOS concept are distinct from the things themselves. If this weren't the case, we couldn't have diverse treatment of common people/places/artifacts in multiple SKOS thesauri, since sameAs merging would mangle the data. SKOS has lots of local administrative info attached to each concept which doesn't make sense when considered to be properties of the thing the concept is a conceptualization of. I am sure this problem must have been looked at before, e.g. within LCSH? Yes, this has been discussed since we brought SKOS into W3C from the SWAD-Europe project ~2004. There is some discussion in this old guide - http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmodellingrdf 'There is a subtle difference between SKOS Core and other RDF applications like FOAF [FOAF], in terms of what they allow you to model. SKOS Core allows you to model a set of concepts (essentially a set of meanings) as an RDF graph. Other RDF applications, such as FOAF, allow you to model things like people, organisations, places etc. as an RDF graph. Technically, SKOS Core introduces a layer of indirection into the modelling.' 
'The above graph describes a relationship between a concept, and the person who is the creator of that concept. This graph should be interpreted as saying, the person named 'Alistair Miles' is the creator of the concept denoted by the URI http://www.example.com/concepts#henry8. This concept was modified on 2005-02-06. This graph should probably not be interpreted as saying, the person named 'Alistair Miles' is the creator of King Henry VIII, or that, King Henry VIII was modified on 2005-02-06. 'This second graph should probably be interpreted as saying, the persons named 'King Henry VII' and 'Elizabeth of York' are the creators of the person named 'King Henry VIII'. So, for a resource of type skos:Concept, any properties of that resource (such as creator, date of modification, source etc.) should be interpreted as properties of a concept, and not as properties of some 'real world thing' that that resource may be a conceptualisation of. This layer of indirection allows thesaurus-like data to be expressed as an RDF graph. The conceptual content of any thesaurus can of course be remodelled as an RDFS/OWL ontology. However, this remodelling work can be a major undertaking, particularly for large and/or informal thesauri. A SKOS Core representation of a thesaurus maps fairly directly onto the original data structures, and can therefore be created without expensive remodelling and analysis. SKOS Core is intended to provide both a stable encoding of thesaurus-like data within the RDF graph formalism, as well as a migration path for exploring the costs and benefits of moving from thesaurus-like to RDFS/OWL-like modelling formalisms.' http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secidentity 'Concept Identity and Mapping The property owl:sameAs should not be used to express the fact that two conceptual resources (i.e. resources of type skos:Concept) share the same meaning. 
The property owl:sameAs implies that two resources are identical in every way (they are in fact the same resource). Although two conceptual resources may have the same meaning, they may have different owners, different labels, different documentation, different history, and of course a different future.' Hope this helps, Dan
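To make the contrast concrete, a sketch in Turtle (the BBC and DBpedia URIs are from Yves' example; the dct:modified date is invented for illustration):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://www.bbc.co.uk/programmes/places/france#place>
    a skos:Concept ;
    dct:modified "2010-03-24" ;
    # Safe: asserts only that the two concepts share a meaning.
    skos:exactMatch <http://dbpedia.org/resource/France> .

# owl:sameAs, by contrast, would make the BBC concept's administrative
# properties (dct:modified, labels, history...) properties of the DBpedia
# resource too, once data from the two sites gets merged:
# <http://www.bbc.co.uk/programmes/places/france#place>
#     owl:sameAs <http://dbpedia.org/resource/France> .
```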
Re: Improving Organization of Govt. based Linked Data Projects
On 21 Mar 2010, at 12:47, Hugh Glaser h...@ecs.soton.ac.uk wrote: Hi Kingsley, I am right with you - finding stuff is hard. But I do think we could make it easier for all of us. Just the esw wiki alone requires me to put every set I create into a bunch of places. 10 years ago, looking for RDF on the public Web was like looking for a needle in a haystack. There wasn't much out there and it was poorly linked. So a big part of the thinking that led to the foaf/rdfweb design was to make discovery easier: if you find one rdf doc, you should be able to find most of the rest by following seeAlso and other kinds of links. Why isn't this enough? Perhaps because many of the datasets are huge db exports, crawlers are often overwhelmed and disappear into depth-first holes? Or because we don't publish triples about doc- and dataset-types in a crawler-discoverable way? A wiki page is ok for initial bootstrap but we ought to outgrow that soon... Dan
Re: head/@profile needed in HTML 5? GRDDL in Linked Data community?
On Wed, Feb 24, 2010 at 5:55 PM, Dan Connolly conno...@w3.org wrote: The proposal from the editors and chairs is that it is not needed; i.e. not cost-effective. http://lists.w3.org/Archives/Public/public-html/2010Feb/0794.html Dan B., your message suggests (without actually saying so) that Dublin Core doesn't need it. Have you heard back from the Dublin Core decision-making authorities? http://lists.w3.org/Archives/Public/public-html/2010Jan/0576.html There was a little discussion on the Dublin Core Advisory Board list (not a public forum; sorry, no links). I don't believe we considered explicitly the scenario in which profile= gets lost, but something like RDFa is not permitted for HTML5. Maybe Pete or Tom (cc:'d) can comment further? My personal guess at a DC view would be something like well, if we don't get RDFa, then don't take @profile away!, the assumption being that RDFa would come with some namespace abbreviation mechanism, whether xmlns:-based or otherwise. I doubt the DC community would be satisfied by the current Microdata design in which each use of a DC property would be identified by its full URI. If you like, I can ask explicitly. The microformats community seems happy to explore alternatives. http://lists.w3.org/Archives/Public/public-html/2010Feb/0690.html I'm considering pushing back on the 0794 proposal, but it's only worth my time if somebody actually needs head/@profile to survive into HTML 5. Does anybody need it? That's a little like asking if someone needs the emergency life-raft before telling them whether they get to keep using the boat or not. Without RDFa, DC would have to use it. On a somewhat related topic... as RDFa matures, the need for GRDDL somewhat fades. I wonder, though... to what extent is GRDDL used in the linked data community? What tools consume it? What content providers produce it? I've never used GRDDL, and I don't know of anyone actively using it. That said, there are many things I don't know! 
I have tried to get Redland/Raptor working with it to consume POWDER a couple of times, but with no success. When I think about running GRDDL against wild Web content, I have some vague worry about whether untrusted XSLTs are sufficiently sandboxed, but I haven't investigated the risks very carefully. I remember Bijan raising similar concerns a while back. cheers, Dan See also: The details of data in documents: GRDDL, profiles, and HTML5 By Dan Connolly in HTML, Semantic Web, Web Architecture, XML on August 22, 2008 7:45 PM http://www.w3.org/QA/2008/08/the_details_of_data_in_documen.html -- Dan Connolly, W3C http://www.w3.org/People/Connolly/ gpg D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Re: Colors
On Wed, Feb 24, 2010 at 8:31 AM, Pat Hayes pha...@ihmc.us wrote: Does anyone know of URIs which identify colors? Umbel has the general notion of Color, but I want the actual colors, like, you know, red, white, blue and yellow. I can make up my own, but would rather use some already out there, if they exist. Many thanks for any pointers. How scruffy are you feeling? http://en.wikipedia.org/wiki/List_of_colors suggests you'll find a lot in Wikipedia / dbpedia... Dan
Re: Terminology when talking about Linked Data
On Wed, Feb 17, 2010 at 12:51 PM, Damian Steer d.st...@bristol.ac.uk wrote: Historical aside: On 17/02/10 11:20, Hugh Glaser wrote: More recently I have also badged as Web of Data; See [1], since 1998 :-) It's been used fairly regularly since then, although I'd highlight [2] as a particularly significant use of the term. Damian [1] http://www.w3.org/DesignIssues/Semantic.html [2] http://www.plasticbag.org/archives/2006/02/my_future_of_web_apps_slides/ Yes, any use of the phrase Web of data that excludes or sidelines work like Tom Coates' here ([2]) would be ... regrettable. There have already been unfortunate run-ins in blog land about whether you can do 'linked data' without using RDF in some LOD-approved manner. There is much much more to 'data' than RDF (or OWL, or triples, or W3C SemWeb). The Web's a big place and we have to be inclusive. RDF was originally standardised as a metadata system, a mechanism for finding stuff ... whether that stuff was photos, videos, HTML pages, excel spreadsheets, SQL databases, 3d models. It can also be used to provide summaries or normalisation of some of the information held in those data objects too. But we shouldn't forget the original use case, nor sideline it. Metadata about non-RDF documents is still linked data imho: all of those forms of Web information are 'linked data' if we use W3C information-linking technology to increase their findability. There's more information out there than fits comfortably in triples or quads; some of the best information is still in people's heads, after all. FOAF was always blurbed as an experimental linked information system; we should have been clearer that some of that info was in triples, some in human-oriented documents, and some ... critically ... was still in people's heads. The richness comes from the interplay between those three forms of information. But I guess that's why I still cling nostalgically to the word 'information' here, rather than just 'data'. 
BTW an early and important paper in the 'web of data' line, which tried to bring RDF and XML together as components of a larger ('Semantic Web') story is http://www.w3.org/1999/04/WebData ... it doesn't use the phrase explicitly (except in the url path maybe) but it is clear on the need for an inclusive approach. cheers, Dan
Re: Terminology when talking about Linked Data
On 17 Feb 2010, at 18:14, Pat Hayes pha...@ihmc.us wrote: On Feb 17, 2010, at 6:37 AM, Dan Brickley wrote: ... . RDF was originally standardised as a metadata system, a mechanism for finding stuff ... whether that stuff was photos, videos, HTML pages, excel spreadsheets, SQL databases, 3d models. ... Really? That was not the impression I got when I first got involved with it. In fact, I asked explicitly for clarification, at the first F2F in Sebastopol: is RDF intended to be metadata for Web 'objects', or is it supposed to be a notation for describing **things in general**? And the resounding chorus from the WG was the latter, most definitely not the former. (Which is also what Guha told me right after the very first RDF speclet was first released.) And that is why I designed the semantics based on a logical model theory rather than a computational annotation system. If RDF was supposed to be primarily a mechanism for finding stuff, then we designed it wrong. The original use cases were various flavours of 'metadata'; however that concept melts on closer inspection. We did the right thing by going with a general system; but we did lose touch a bit with some of the original scenarios which motivated W3C to standardise RDF in '97. MCF and RDF were never themselves technologies with a built-in scope of 'describing only data', and that was all fine and good. Whenever you dig into 'metadata' requirements you soon find that the whole world is soon in-scope. The gamble of course with a highly general standard is that it can be used in-principle for *everything* but risks in practice being used for nothing. It took us a while to find that niche... Dan Pat IHMC (850)434 8903 or (650)494 3973 40 South Alcaniz St. (850)202 4416 office Pensacola(850)202 4440 fax FL 32502 (850)291 0667 mobile phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Re: The status of Semantic Web community- perspective from Scopus and Web Of Science (WOS)
On Fri, Feb 12, 2010 at 8:22 PM, Ying Ding dingy...@indiana.edu wrote: Hi, If you are interested to know the Semantic Web: Who is who from the perspective of Scopus and Web Of Science, recently we conduct a bibliometric analysis in this field (http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf), which might be interesting to you. It's interesting to see what a traditional - ie. essentially pre-Web - citation analysis comes up with; however I wouldn't leap so quickly to claim that this results in 'identifying the most productive players'. A lot of key SemWeb infrastructure came about through non-academic collaboration; either industrial or what we might call collaborations conducted online informally, 'Internet-style'. In fact I'd argue that the needs of the academic publication process have often been a retarding factor on this collaborative work. The traditionally-published academic literature is of course a key part of the story, but if you look at it alone you will end up with both a misleading sense of how things got this way, and -worse- misleading intuitions about how to get more involved and help further the project. This is why I bother to make a little fuss here. The phrase 'Semantic Web' from ~2000 was essentially a rebranding of the then-unfashionable RDF technology. Prior to calling it RDF, the project was called PICS-NG. These days many call it 'Linked Data' instead. From http://lists.w3.org/Archives/Public/sw99/ - http://www.w3.org/1999/11/SW/Overview.html (Member-only link) 'We propose to continue the W3C Metadata Activity as a Semantic Web Development Initiative'. But by this point, the base technology was already out there, both as a W3C Recommendation and as something in use: Netscape - the Google of its time - was using RDF already. 
For example back in November 1998 http://web.archive.org/web/19991002043750/www.mailbase.ac.uk/lists/rdf-dev/1998-11/0004.html R.V. Guha, then at Netscape, wrote: 'I still see this as a big and important use of RDF. This server answers over 2 million requests in RDF every day. ... I do plan to fix the RDF, but thats with the next version of the browser (I have about 6M browsers out there which are depending on this older format).' Any narrative that puts the start of Semantic Web history in 2000/2001 will confuse people as to where it came from: we had major browser buy-in 2-3 years previously, after all. And any narrative that omits the role of MCF - simply because it didn't come through the academic publication process - risks misleading 'emerging stars' about how to make an impact on the world rather than just on the citation databases. Netscape bought into RDF because it grew from MCF, acquired from Apple with Guha. A reformulation of MCF to use an XML notation was one of the key inputs into the RDF design; see http://www.w3.org/TR/NOTE-MCF-XML/ and the earlier MCF White Paper http://www.guha.com/mcf/wp.html Now MCF had significant mind-share and presence in the tech world back in 1996 - http://web.archive.org/web/2815212707/http://www.xspace.net/hotsauce/ - and even grassroots adoption on sites that wanted to have a '3d fly thru' using Apple's then-cool visualization plugin. MCF was a direct ancestor to RSS (also originally an RDF-based Netscape product); it was triples-based, written in XML, and quite recognisable as RDF's precursor to anyone who reads the spec. The grassroots, information linking style of MCF was one of the inspirations behind FOAF too. However it did not leave any footprint in the academic literature. We might ask why. Like much of the work around W3C and tech industry standards, the artifacts it left behind don't often show up in the citation databases. A white paper here, a Web-based specification there, ... 
its influence cannot easily be measured through academic citation patterns, despite the fact that without it, the vast majority of papers mentioned in http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf would never have existed. In my experience, many of the discussions that shaped the early RDF and Semantic Web efforts were conducted online, using email, often also IRC chat, and as the years went by, increasingly in blogs and now microblogs. And many of the people who got a lot done were not employed in an academic setting where there was an institutionalised pressure to publish in certain kinds of places. This is not to belittle the critically important contributions that came from those employed in academia, just to note that the wave of interest and research funding that followed 2000/1 served largely to polish and promote ideas (and tools, specs) that had already reached prominence via Internet/Web/industry means. Without that academic buy-in and associated research funding, the Semantic Web project would surely be dead by now. However, there is a continuing danger of confusing the real project --- a global collaboration to improve the Web's information-linking facilities ---
Re: DBpedia-based entity recognition service / tool?
On Tue, Feb 2, 2010 at 4:47 PM, Georgi Kobilarov georgi.kobila...@gmx.de wrote: Hi Matthias, So you're asking for the perfect entity recognition service, applicable to the easy domain of scientific texts? Sure, I developed one in my spare time, it's much better than OpenCalais, I was just too lazy to publish it yet... ;-) Yes please, I'll take two :) Seriously, I think it might be time to look at having common REST APIs for these things, so we have a more fluid marketplace where servers can be swapped and composed. How similar are the existing interfaces? I have no idea... One idea I had on NoTube that is implemented experimentally in http://lupedia.ontotext.com/ is to use RDFa as an interop point. So one of the interfaces from the Ontotext demo there is to return RDFa markup - http://lupedia.ontotext.com/test-page4rdfa.html ... however this doesn't leave much scope for including confidence measures etc in the output. cheers, Dan
Can anyone help with an XSLT GRDDL conversion of Open Packaging Format (OPF) into RDF/XML Dublin Core
Hi all http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#AppendixA defines a Dublin Core-based XML metadata format used for ebooks. This is very nice but a little disconnected from other Dublin Core data in RDF. It would be great to have some XSLT to explore closer integration and use of newer Dublin Core idioms (including http://purl.org/dc/terms/). Anyone got the time / expertise to explore this? A related task would be to track down some actual OPF data to convert. You don't need to be an XSLT guru to do this :) There's a forum at http://www.idpf.org/forums/viewforum.php?f=5&sid=4b4d5b89baf1300bd0f258e0715610e5 with some pointers to data. For example: 'I am pleased to announce that Adobe InDesign CS3 now supports the direct generation of OCF-packaged OPS content. A sample generated directly from InDesign CS3 can be found at: http://www.idpf.org/2007/ops/samples/TwoYearsBeforeTheMast.epub' ...which is a .zip package containing a file content.opf, the beginning of which I'll excerpt below. Thanks for any help exploring this. I found 3 examples in the forum; the metadata sections of the .opf files are extracted below. As we think about RDFizing these, I think there are two aspects: firstly, getting modern RDF triples from the data as-is. This might take some care to figure out what role= should be, etc. But also secondly, thinking how the format could be enriched in future iterations, so that linked data URIs are used, eg. for those LCSH headings. At the moment they have <dc:subject>lcsh: Czech Americans—Fiction.</dc:subject> but it would be nice if http://id.loc.gov/authorities/sh2009122741#concept was in there somewhere (instead, as well?). I'm sure any help working through these practicalities would be appreciated both by the OPF folk and by Dublin Core... cheers, Dan

example 1: http://www.idpf.org/2007/ops/samples/TwoYearsBeforeTheMast.epub

<?xml version="1.1"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Two Years Before the Mast</dc:title>
    <dc:creator>Richard H. Dana Jr.</dc:creator>
    <dc:subject>19th Century</dc:subject>
    <dc:subject>California</dc:subject>
    <dc:subject>Sailors' life</dc:subject>
    <dc:subject>fur trade</dc:subject>
    <dc:description>Two years at sea on the coast of California</dc:description>
    <dc:identifier id="bookid">urn:uuid:4618c86c-f508-11db-8314-0800200c9a66</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="text/xml"/>
    <item id="introduction" href="Introduction.html" media-type="application/xhtml+xml"/>
    <item id="chapteri" href="ChapterI.html" media-type="application/xhtml+xml"/>
    ...

example 2: http://www.idpf.org/2007/ops/samples/hauy.epub

<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Valentin Haüy - the father of the education for the blind</dc:title>
    <dc:creator>Beatrice Christensen Sköld</dc:creator>
    <dc:publisher>TPB</dc:publisher>
    <dc:date opf:event="publication">2006-03-23</dc:date>
    <dc:date opf:event="creation">2007-08-09</dc:date>
    <dc:identifier id="uid">C0</dc:identifier>
    <dc:language>en</dc:language>
    <meta name="generator" content="Daisy Pipeline OPS Creator" />
  </metadata>

example 3: http://www.idpf.org/2007/ops/samples/myantonia.epub

<package version="2.0" unique-identifier="PrimaryID" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>My Ántonia</dc:title>
    <dc:identifier id="PrimaryID" opf:scheme="URN">urn:uuid:14c77a9a-e849-11db-8314-0800200c9a66</dc:identifier>
    <dc:language>en-US</dc:language>
    <dc:creator opf:role="aut" opf:file-as="Cather, Willa Sibert">Willa Cather</dc:creator>
    <dc:creator opf:role="ill" opf:file-as="Benda, Wladyslaw Theodor">W. T. Benda</dc:creator>
    <dc:contributor opf:role="edt" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:contributor opf:role="edt" opf:file-as="Menéndez, José">José Menéndez</dc:contributor>
    <dc:contributor opf:role="mdc" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:contributor opf:role="trc" opf:file-as="Noring, Jon E.">Jon E. Noring</dc:contributor>
    <dc:publisher>DigitalPulp Publishing</dc:publisher>
    <dc:description>My Ántonia is considered to be Willa S. Cather’s best work, first published in 1918. It is a fictional account (inspired by Cather’s childhood years) of the pioneer prairie settlers in late 19th century Nebraska. This version, intended for general readers, is a faithful, highly-proofed, and modestly modernized transcription of the First Edition, with text corrections by José Menéndez.</dc:description>
    <dc:coverage>Nebraska prairie, late 19th and early 20th Centuries C.E.</dc:coverage>
    <dc:source>First Edition of My Ántonia, published by the Riverside Press Cambridge, Houghton
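On the first aspect - getting modern RDF triples from the data as-is - here is a rough sketch of the extraction involved. This is illustrative Python rather than the XSLT asked for; the flat mapping of dc:* element names onto http://purl.org/dc/terms/ is a naive assumption, and opf:role / opf:event handling is deliberately left out:

```python
# Sketch: pull the dc:* children out of an OPF <metadata> block and emit
# N-Triples-style lines against http://purl.org/dc/terms/ (naive mapping).
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
DCTERMS = "http://purl.org/dc/terms/"
OPF_METADATA = "{http://www.idpf.org/2007/opf}metadata"

def opf_to_triples(opf_xml, subject):
    """Return (subject, predicate, literal) tuples for each dc:* element."""
    root = ET.fromstring(opf_xml)
    triples = []
    for md in root.iter(OPF_METADATA):
        for el in md:
            if el.tag.startswith(DC) and el.text:
                prop = DCTERMS + el.tag[len(DC):]
                triples.append((subject, prop, el.text.strip()))
    return triples

sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
  unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Two Years Before the Mast</dc:title>
    <dc:creator>Richard H. Dana Jr.</dc:creator>
  </metadata>
</package>"""

for s, p, o in opf_to_triples(sample, "urn:uuid:4618c86c-f508-11db-8314-0800200c9a66"):
    print(f'<{s}> <{p}> "{o}" .')
```

A real converter would also have to decide what opf:role="aut" etc. should become (dcterms:creator vs MARC relator properties), which is exactly the "some care" mentioned above.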
Re: Question about paths as URIs in the BBC RDF
On Thu, Jan 28, 2010 at 7:56 PM, Ross Singer rossfsin...@gmail.com wrote: Hi, I have a question about something I've run across when trying to parse the RDF coming from the BBC. If you take a document like: http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf notice how all of the URIs are paths, but there's no xml:base to declare where these actual paths may reside. If I point rapper at that URI, it brings me back fully qualified URIs: http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist but the only way I can figure it's able to do that is for the parser and the HTTP agent to be in cahoots somehow, which seems like a breakdown in the separation of concerns -- this document is useless, except in the context of living on www.bbc.co.uk. The moment I cache it to my local system, if I'm understanding it correctly, it's now asserting these things about my filesystem (effectively). Rapper now says: file:///music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist So my questions would be: 1) Is this valid? 2) If so, is there an expectation of the parser being aware of the URI of retrieval? (I have written my own set of parsers, so I'd need to rethink this assumption, if so) 3) How do other client libraries handle this? Hi Ross, The relevant specs are http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#section-Syntax-ID-xml-base 'The XML Infoset provides a base URI attribute xml:base that sets the base URI for resolving relative RDF URI references, otherwise the base URI is that of the document. The base URI applies to all RDF/XML attributes that deal with RDF URI references which are rdf:about, rdf:resource, rdf:ID and rdf:datatype.' and http://www.faqs.org/rfcs/rfc2396.html which specifies relative URI processing given a base URI. I think most of what you need is in section 5.1, 'Establishing a Base URI', there. cheers, Dan
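The behaviour Ross observed can be reproduced with nothing more than standard base-URI resolution; a minimal sketch using Python's stdlib urljoin (which follows the same relative-reference rules the RDF/XML spec defers to):

```python
# The parser isn't "in cahoots" with the HTTP agent in any special way:
# absent xml:base, the base URI is simply the URI the document was
# retrieved from, and relative references are resolved against it.
from urllib.parse import urljoin

retrieval_uri = "http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8.rdf"
relative_ref = "/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist"

resolved = urljoin(retrieval_uri, relative_ref)
print(resolved)
# http://www.bbc.co.uk/music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist

# Cache the same document to disk and re-parse against a file: base, and the
# identical relative reference resolves somewhere else entirely -- which is
# exactly rapper's file:///music/... output.
print(urljoin("file:///tmp/artist.rdf", relative_ref))
# file:///music/artists/72c536dc-7137-4477-a521-567eeb840fa8#artist
```

So yes, it's valid, and yes, a conforming parser is expected to know the retrieval URI (or be told a base URI explicitly, which is how most toolkits' APIs handle the cached-copy case).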
Re: ISBNs, owl:sameAs, etc
On Tue, Dec 29, 2009 at 4:47 AM, Daniel O'Connor daniel.ocon...@gmail.com wrote: Psst, Chris, Tobias - any chance of RDFBookMashup rendering 'owl:sameAs urn:isbn:12434567' ? I might see if I can glue freebase's 1.8 million or so ISBNs onto rdfbookmashup. It's probably common knowledge, but there's a few scripts here - http://wiki.foaf-project.org/w/DanBri/WikipediaISBNs - for extracting isbns from wikipedia dumps. It found about half a million last time I tried. Dan -- Forwarded message -- From: Daniel O'Connor daniel.ocon...@gmail.com Date: Tue, Dec 29, 2009 at 2:12 PM Subject: ISBNs, owl:sameAs, etc To: Discussion list for Freebase Experts freebase-expe...@freebase.com I don't suppose anyone wants to mint a whole bunch of URNs for ISBNs via a quick acre application? I'm upset that http://sameas.org/html?uri=urn%3Aisbn%3A9780670063260%0D%0Ax=0y=0 Doesn't give me http://www.freebase.com/view/soft/isbn/9780670063260/best (or its RDF friends) :( WOE.
Re: Creating JSON from RDF
On Mon, Dec 14, 2009 at 10:23 AM, Richard Light rich...@light.demon.co.uk wrote: In message c74badc3.20683%t.hamm...@nature.com, Hammond, Tony t.hamm...@nature.com writes: Normal developers will always want simple. Surely what normal developers actually want are simple commands whereby data can be streamed in, and become available programmatically within their chosen development environment, without any further effort on their part? Personally I don't see how providing a format which is easier for humans to read helps to achieve this. Do normal developers like writing text parsers so much? Give 'em RDF and tell them to develop better toolsets ... RDF tooling still has some rough edges, it must be said. I am as enthusiastic about RDF as anyone (having been involved since 1997) but I've also seen the predictable results where on occasion people (eg. standards groups) have been 'arm twisted' into using the technology against their judgement and preferences. We don't have a solid, well-packaged and tested RDF/XML parser for the Ruby language yet, for example. And while we do have librdfa integration into the Redland/Raptor C toolkit, it hasn't yet propagated into all the easy-install settings where we'll eventually find it - like my Amazon EC2 Ubuntu box, or the copy of Fink I installed recently on my MacBook Pro. And in PHP we have a fantastic RDF toolkit in ARC2, but it relies on MySQL for all complex querying. Plenty of scope for toolkit polish and improvement, nothing to worry massively about, but also lots of things that will cause pain if we take a stubborn 'RDF or nothing' approach. I wholeheartedly applaud the pragmatic approach from Jeni and others. Come to that, RDF-to-JSON conversion could be a downstream service that someone else offers. You don't have to do it all. That could be useful for some, and inappropriate for others. Every new step in the chain introduces potential problems with latency, bugs, security and so on... cheers, Dan
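For what it's worth, the core of such an RDF-to-JSON conversion needn't be complicated. A minimal sketch (the output shape is invented for illustration; a real design would also distinguish URIs from literals and handle datatypes, languages and bnodes):

```python
# Sketch: group subject-predicate-object triples into the kind of plain
# JSON structure a developer can consume without any RDF tooling.
import json
from collections import defaultdict

def triples_to_json(triples):
    """Map each subject to {predicate: [objects...]} and serialise as JSON."""
    out = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        out[s][p].append(o)
    return json.dumps(out, indent=2, sort_keys=True)

triples = [
    ("http://example.org/doc", "http://purl.org/dc/terms/title", "A title"),
    ("http://example.org/doc", "http://purl.org/dc/terms/creator", "Someone"),
]
print(triples_to_json(triples))
```

The hard part, as the thread suggests, is agreeing the conventions (prefixes, multi-valued properties, typed values) so different publishers' JSON looks alike.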
Re: Creating JSON from RDF
On Mon, Dec 14, 2009 at 10:37 AM, Jeni Tennison j...@jenitennison.com wrote: Richard, My opinion, based on the reactions that I've seen from enthusiastic, hard-working developers who just want to get things done, is that we (the data.gov.uk project in particular, linked data in general) are not providing them what they need. We can sit around and wait for other people to provide the simple, light-weight interfaces that those developers demand, or we can do it ourselves. I can predict with near certainty that if we do not do it ourselves, these developers will not use the linked data that we produce: they will download the original source data which is also being made available to them, and use that. We, here, on this list, understand the potential power of using linked data. The developers who want to use the data don't. (And the publishers producing the data don't.) We simply can't say but they can just build tools, they can just use SPARQL. They are not going to build bridges to us. We have to build bridges to them. My opinion. Opinion, sure. But absolutely correct, also! (Excuse me if a small rant is triggered by all this...) Why, twelve years, two months and twelve days after http://www.w3.org/TR/WD-rdf-syntax-971002/ was first published, do we not have well packaged, maintained and fully compliant RDF parsers available in every major programming language? And that is for just the smallest critical piece of software needed to do anything useful. Short answer: because people from these mailing lists didn't sit down and do the work. We waited for someone else to do it. Some of us did bits of it, but ... taken as a whole, there are still plenty of basic pieces unfinished, in various languages. 
Millions upon millions of euros and dollars have been spent on Semantic this and Semantic that, and now Linked this and Linked that; countless conferences, workshops and seminars, PDFs, PPTs and so on; but still such basic software components haven't been finished, polished, tested and distributed. I'm not speaking ill of anyone in particular here. Countless folk have worked hard and tirelessly to progress the state of the art, get tools matured and deployed. But there is plenty plenty more to do. I do fear that the structure of both academic and research (eg. EU) funding doesn't favour the kind of work and workplan we need. In the SWAD-Europe EU project we were very unusual to have explicit funding and plans that allowed - for example - Dave Beckett to work not only on the RDF Core standards, but on their opensource implementation in C; or Jan Grant and Dave to work on the RDF Test Cases, or Alistair Miles to take SKOS from a rough idea to something that's shaking up the whole library world. I wish that kind of funding was easy to come by, but it's not. A lot of the work we need to get done around here to speed up progress is pretty boring stuff. It's not cutting edge research, nor the core of a world-changing startup, nor a good topic for a phd. With every passing year the RDF tools do get a bit better, but also the old ones code rot a bit, or new things come along that need supporting (GRDDL, RDFa etc.). What can be done in the SemWeb and Linked Data scene so that it becomes a bigger part of people's real dayjobs to improve our core tooling? Are the resources already out there but poorly coordinated? Would some lightweight collective project management help? Are there things (eg. finalising a ruby parser toolkit) that are weekend-sized jobs, month sized jobs; do they look more like msc student summer projects or EU STREP / IP projects in scale? Could we do more by simply transliterating code between languages? ie. 
if something exists in Python it can be converted to Ruby or vice-versa...? Are funded grants available (eg. JISC in UK?) that would help polish, package, test and integrate basic entry-level RDF / linked data software tools? Back on the original thread, I am talking here so far only about core RDF tools, eg. having basic RDF -to- triples facility available reliably in some language of choice. As Jeni emphasises, there are lots of other pieces of bridging technology needed (eg. into modern JSON idioms). But when we are hoping to convert folk to use pure generic RDF tools, we better make sure they're in good shape. Some are, some aren't, and that lumpy experience can easily turn people away... cheers, Dan
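As a hedged illustration of how small the "basic RDF-to-triples facility" Dan mentions can be at its core, here is a toy parser for a tiny subset of N-Triples (URI nodes and plain literals only) in pure Python. It is a sketch for discussion, not a compliant parser: it ignores blank nodes, language tags, datatypes, and escape sequences, which is exactly the gap between a weekend hack and the finished, tested, packaged tools the post is asking for. The example data is invented.

```python
import re

# Toy parser for a *subset* of N-Triples: <uri> terms and "plain" literals.
# Deliberately non-compliant: no blank nodes, language tags, datatypes,
# or escape handling. Real applications should use a maintained parser.
TERM = re.compile(r'<([^>]*)>|"([^"]*)"')

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples from N-Triples-like lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        # Each match is (uri_group, literal_group); exactly one is non-empty.
        terms = [uri if uri else lit for uri, lit in TERM.findall(line)]
        if len(terms) == 3:
            yield tuple(terms)

data = '''
# hypothetical example data
<http://example.org/dan> <http://xmlns.com/foaf/0.1/name> "Dan" .
<http://example.org/dan> <http://xmlns.com/foaf/0.1/knows> <http://example.org/libby> .
'''
triples = list(parse_ntriples(data))
```

The point is not that such a toy is adequate, but that the distance between this and a "finished, polished, tested and distributed" component is where the unglamorous work lies.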
Re: Creating JSON from RDF
On Sun, Dec 13, 2009 at 8:03 PM, Dave Reynolds dave.e.reyno...@googlemail.com wrote: Hi Jeni, [Rest of post snipped for now, I'll respond properly later. Seems like we are on sufficiently similar wavelengths that it is just a matter of working the details.] I don't know where the best place is to work on this: I guess at some point it would be good to set up a Wiki page or something that we could use as a hub for discussion? I'd suggest setting up a Google Code area and making anyone who is interested a committer. That gives us a Wiki but also hosting for associated code for generating/navigating the format. I'd be happy to set one up. An alternative is the ESW Wiki but (a) that doesn't have an associated code area, (b) I don't personally have access right now (though I believe that is easily fixable) and (c) it might be presumptuous to associate it with W3C at this stage of baking. Ivan Herman (cc:'d) has been looking into a modernised general 'Semantic Web' wiki area on w3.org, ie. using (Semantic?) MediaWiki, rather than the old MoinMoin (for now and the foreseeable future ESW will remain using MoinMoin, since migration is non-trivial). There was also some recent discussion at W3C about opening up Git or Mercurial distributed versioning systems for the standards community, which sounds like it could be a good fit for SemWeb IG-and-nearby collaborations. However that is at an early stage. Google Code might be easiest for now... Ivan - care to comment? Dan
Re: Need help mapping two letter country code to URI
On Mon, Nov 9, 2009 at 10:47 PM, Aldo Bucchi aldo.buc...@gmail.com wrote: Hi, I found a dataset that represents countries as two letter country codes: DK, FI, NO, SE, UK. I would like to turn these into URIs of the actual countries they represent. ( I have no idea on whether this follows an ISO standard or is just some private key in this system ). Any ideas on a set of candidate URIs? I would like to run a complete coverage test and take care I don't introduce distortion ( that is pretty easy by doing some heuristic tests against labels, etc ). There are some border cases that suggest this isn't ISO3166-1, but I am not sure yet. ( and if it were, which widely used URIs are based on this standard? ). http://www.fao.org/countryprofiles/geoinfo.asp might have something useful for you? Dan
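One border case worth noting: ISO 3166-1 alpha-2 uses GB, not UK, for the United Kingdom, so the presence of "UK" in the dataset is itself a hint that this is a private code list rather than pure ISO. The complete-coverage test Aldo describes can be sketched in a few lines; the lookup table below is a hypothetical hand-built mapping (DBpedia URIs chosen purely for illustration), not a recommendation of any particular URI set.

```python
# Hypothetical hand-built lookup from the dataset's codes to candidate
# URIs. DBpedia resource URIs are used here only as an illustration.
CODE_TO_URI = {
    "DK": "http://dbpedia.org/resource/Denmark",
    "FI": "http://dbpedia.org/resource/Finland",
    "NO": "http://dbpedia.org/resource/Norway",
    "SE": "http://dbpedia.org/resource/Sweden",
    # Border case: ISO 3166-1 alpha-2 says "GB", so "UK" suggests the
    # source system uses a private code list, not strict ISO codes.
    "UK": "http://dbpedia.org/resource/United_Kingdom",
}

def coverage_report(codes_in_data):
    """Return (mapped, unmapped) so no code is silently dropped."""
    mapped = {c: CODE_TO_URI[c] for c in codes_in_data if c in CODE_TO_URI}
    unmapped = sorted(set(codes_in_data) - set(CODE_TO_URI))
    return mapped, unmapped

# "XX" stands in for any unexpected code the real data might contain.
mapped, unmapped = coverage_report(["DK", "FI", "NO", "SE", "UK", "XX"])
```

Reporting the unmapped remainder, rather than skipping it, is what makes this a coverage test instead of a silent lossy conversion.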
Re: temporary URLs on Second Life
On 20/7/09 11:01, Danny Ayers wrote: Second Life objects to become HTTP-aware : http://www.massively.com/2009/07/08/second-life-objects-to-become-http-aware/ cool, right? well not exactly, it uses shortlived-by-design URIs: http://wiki.secondlife.com/wiki/LSL_http_server Well, we can't have it both ways. Either we want everything of interest to have HTTP URIs. Or we want all HTTP URIs to de-reference usefully forever. But we won't easily get eternally-useful http URIs for everything useful that has ever been plugged into the 'net. Anyone building systems that assume otherwise is building something rather fragile. There are a *lot* of data objects in secondlife... cheers, Dan
Re: Dons flame resistant (3 hours) interface about Linked Data URIs
On 10/7/09 12:23, Juan Sequeda wrote: Steve is right. If I am not wrong, when TBL gave his talk at CERN for the 20th anniversary of the web, he said that he was amazed that people were hacking HTML by hand. He never expected it. Now... we are the geeks doing RDF, conneg, linked data by hand... In a couple of years we will create tools for the non-geeks. We have to learn from our history and not get ahead of ourselves. RDF has been a W3C Recommendation since February, 1999. The RDF work went public in Oct 1997. A lot has happened since then... Definitely we've done a lot of hacker-grade stuff in the meantime. But tools for going mainstream are getting overdue! Even tools for developers: eg. regular Redland builds on Windows; a solid packaged Ruby library, etc. Re tools for publishing, given the fiddliness of doing RDF right, my vote is for everything that allows tools on one site to post RDF into another. I've suggested before that AtomPub + OAuth would be a plausible starting point, but I'm open to suggestions. Re non-geeks, http://www.youtube.com/watch?v=o4MwTvtyrUQ is a must-watch... cheers, Dan
[Fwd: 2nd CFP: ISWC'09 workshop on Ontology Matching (OM-2009)]
I don't normally forward conference CFPs, but it seems it would be useful to build some links with this community. Aw crap, can't believe I typed that. But you know what I mean... Dan Original Message Subject:2nd CFP: ISWC'09 workshop on Ontology Matching (OM-2009) Date: Wed, 8 Jul 2009 09:28:34 +0200 From: Pavel Shvaiko pa...@dit.unitn.it To: pavel.shva...@infotn.it Apologies for cross-postings -- CALL FOR PAPERS -- The Fourth International Workshop on ONTOLOGY MATCHING (OM-2009) http://om2009.ontologymatching.org/ October 25, 2009, ISWC'09 Workshop Program, Fairfax, near Washington DC., USA BRIEF DESCRIPTION AND OBJECTIVES Ontology matching is a key interoperability enabler for the Semantic Web, as well as a useful tactic in some classical data integration tasks. It takes the ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging and data translation. Thus, matching ontologies enables the knowledge and data expressed in the matched ontologies to interoperate. The workshop has three goals: 1. To bring together leaders from academia, industry and user institutions to assess how academic advances are addressing real-world requirements. The workshop will strive to improve academic awareness of industrial and final user needs, and therefore, direct research towards those needs. Simultaneously, the workshop will serve to inform industry and user representatives about existing research efforts that may meet their requirements. The workshop will also investigate how the ontology matching technology is going to evolve. 2. 
To conduct an extensive and rigorous evaluation of ontology matching approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2009 campaign: http://oaei.ontologymatching.org/2009/ This year's OAEI campaign introduces two new tracks about oriented alignments and about instance matching (a timely topic for the linked data community). Therefore, the ontology matching evaluation initiative itself will provide a solid ground for discussion of how well the current approaches are meeting business needs. 3. To examine similarities and differences from database schema matching, which has received decades of attention but is just beginning to transition to mainstream tools. TOPICS of interest include but are not limited to: Business cases for matching; Requirements to matching from specific domains; Application of matching techniques in real-world scenarios; Formal foundations and frameworks for ontology matching; Large-scale ontology matching evaluation; Performance of matching techniques; Matcher selection and self-configuration; Uncertainty in ontology matching; User involvement (including both technical and organizational aspects); Explanations in matching; Social and collaborative matching; Alignment management; Reasoning with alignments; Matching for traditional applications (e.g., information integration); Matching for dynamic applications (e.g., peer-to-peer, web-services). SUBMISSIONS Contributions to the workshop can be made in terms of technical papers and posters/statements of interest addressing different issues of ontology matching as well as participating in the OAEI 2009 campaign. Technical papers should be not longer than 12 pages using the LNCS Style: http://www.springeronline.com/sgw/cda/frontpage/0,11855,5-164-2-72376-0,00.html Posters/statements of interest should not exceed 2 pages and should be handled according to the guidelines for technical papers. 
All contributions should be prepared in PDF format and should be submitted through the workshop submission site at: http://www.easychair.org/conferences/?conf=om20090 Contributors to the OAEI 2009 campaign have to follow the campaign conditions and schedule at http://oaei.ontologymatching.org/2009/. IMPORTANT DATES FOR TECHNICAL PAPERS: August 11, 2009: Deadline for the submission of papers. September 6, 2009: Deadline for the notification of acceptance/rejection. October 2, 2009: Workshop camera ready copy submission. October 25, 2009: OM-2009, Westfields Conference Center, Fairfax, near Washington DC., USA. ORGANIZING COMMITTEE 1. Pavel Shvaiko (Main contact) TasLab, Informatica Trentina SpA, Italy 2. Jérôme Euzenat INRIA LIG, France 3. Fausto Giunchiglia University of Trento, Italy 4. Heiner Stuckenschmidt University of Mannheim, Germany 5. Natasha Noy Stanford Center for Biomedical Informatics Research, USA 6. Arnon Rosenthal The MITRE Corporation, USA PROGRAM COMMITTEE Yuan An, Drexel University, USA Zohra Bellahsene, LIRMM, France Paolo Besana, University of Edinburgh, UK Olivier Bodenreider, National Library of Medicine, USA
Re: tutorial on Music and the Web of Data
On 1/7/09 17:51, Kingsley Idehen wrote: Linked Music Data or Linked Open Music Data, either provides a clear moniker for a music oriented Linked Data Space on the Web :-) It does rather suggest the music files are up there too. And I wouldn't complain if they were... :) Dan
Re: how do I report bad sameAs links? (dbpedia - Cyc)
On 30/6/09 13:33, Kingsley Idehen wrote: Dan Brickley wrote: (I was reminded about the SW bug tracker after posting this; good idea) http://sw.opencyc.org/2008/06/10/concept/Mx4rv8L0_JwpEbGdrcN5Y29ycA says it is owl:sameAs dbpedia:Spaced And DBpedia reports the same. They're both wrong! The DBpedia page is about a television situation comedy show; the Cyc page is about a freeware computer game. This is a problem in the OpenCyc data space (and the datasets generated from it). DBpedia doesn't reciprocate that claim :-) Yes it does! That's how I found the Cyc entry in the first place. Use case blogged here - http://danbri.org/words/2009/06/30/418 http://dbpedia.org/page/Spaced says owl:sameAs * fbase:Spaced * opencyc:en/Spaced_TheGame btw - we are on the verge of releasing DBpedia 3.3 (sometime today). Congratulations! :) Dan
how do I report bad sameAs links? (dbpedia - Cyc)
http://sw.opencyc.org/2008/06/10/concept/Mx4rv8L0_JwpEbGdrcN5Y29ycA says it is owl:sameAs dbpedia:Spaced And DBpedia reports the same. They're both wrong! The DBpedia page is about a television situation comedy show; the Cyc page is about a freeware computer game. cheers, Dan
Re: Visualization of domain and range
Interesting discussion! On 25/6/09 14:15, Simon Reinhardt wrote: Hi Bernhard Schandl wrote: [1] http://www.ifs.univie.ac.at/schandl/2009/06/domain+range_bad.png [2] http://www.ifs.univie.ac.at/schandl/2009/06/domain+range_better.png I like this. The former has several problems anyway: you have to repeat properties if they can hold between several classes [3] and you have to draw lines connecting lines for expressing sub-properties or inverse properties [4] which looks rather confusing and is not supported by many visual modelling tools. Yeah, my [4] is at my threshold of tolerance for chaos in a diagram. I wanted a way to show the core of the FOAF spec in a picture, so tried (despite similar concerns to those mentioned in this thread) the style of putting domain/range directly in an instance-like style. In http://www.flickr.com/photos/danbri/1856478164/ ([4]) I try to do too many things at once: * show the classes that each property is used with * show sub-property relationships * show sub-class relationships * show some typical properties * show attachment points for friends of FOAF namespaces (DOAP, SIOC, DC, Geo etc), with classes and with sample properties This is a lot of information. I did try to make a gradual reveal slideshow version, building up from something simple. It wasn't great. The layout was done by hand to minimise crossovers, and looking at it, I think the whole structure could be twisted/stretched to be more evenly presented. It was fiddly to do though. A sample of instance-data would probably convey most of the same information about domain/range, and would allow subclasses reasonably too. Sub-property would remain hard... 
If anyone wants to mess around with the FOAF example, source data in OmniGraffle format is here and also in SVG: just do svn co http://svn.foaf-project.org/foaf/trunk/xmlns.com/htdocs/foaf/spec/images/; [3] also shows a combination of the two problems: if you draw several lines for one property, you have to connect sub-properties to each of them or to an arbitrarily selected one. The only downside I see here is that adding ellipses for properties makes the diagram a bit more bloated. I don't find [3] very readable. There was another Harmony ABC diagram (I think from Carl Lagoze) in http://www.ilrt.bris.ac.uk/discovery/harmony/docs/abc/abc_draft.html#Simple%20Rules that uses dotted lines for implied types, I think this can work well in instance level presentations. cheers, Dan Regards, Simon [3] http://metadata.net/harmony/ABC_Class_Hierarchy_with_Properties.gif [4] http://www.flickr.com/photos/danbri/1856478164/ (sorry Dan!)
Re: .htaccess a major bottleneck to Semantic Web adoption / Was: Re: RDFa vs RDF/XML and content negotiation
On 26/6/09 10:51, Toby Inkster wrote: On Fri, 2009-06-26 at 09:35 +0200, Dan Brickley wrote: Does every major RDF toolkit have an integrated RDFa parser already? No - and even for those that do, it's often rather flaky. Sesame/Rio doesn't have one in its stable release, though I believe one is in development for 3.0. Redland/Raptor often (for me at least) seems to crash on RDFa. It also complains a lot when named entities are used (e.g. &nbsp;) even though the XHTML+RDFa 1.0 DTD does allow them. Jena (just testing on sparql.org) doesn't seem to handle RDFa at all. Not really toolkits per se, but cwm and the current release of Tabulator don't seem to have RDFa support. (Though I think support for the latter is being worked on.) For application developers who are specifically trying to support RDFa, none of this is a major problem - it's pretty easy to include a little content-type detection and pass the XHTML through an RDFa-XML converter prior to the rest of your code getting its hands on it - but this does require specific handling, which must be an obstacle to adoption. Yep, pretty much as I feared. Also the Google SGAPI currently only reads FOAF in RDF/XML form, not yet updated to use the RDFa support in Rapper. Re app developers, it depends a lot. If your app is built inside some framework - eg. Protege - RDFa might be quite hard to integrate. Some apps also store to local disk rather than HTTP space, and so using content-negotiation is tricky. RDFa files don't have any well known file-suffix patterns either. cheers, Dan
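The "little content-type detection" Toby describes amounts to a small dispatch table keyed on the response's media type. A minimal sketch follows; the parse_* functions are hypothetical placeholders standing in for whatever RDF/XML parser and RDFa-to-triples converter a given stack actually provides, and the media-type-to-parser assignments are assumptions for illustration.

```python
# Sketch of content-type based dispatch in front of an RDF toolkit.
# The parse_* functions are hypothetical placeholders; a real setup
# would call an actual RDF/XML parser and an RDFa extractor here.
def parse_rdfxml(body):
    return ("rdfxml", body)

def parse_rdfa(body):
    # In practice: run an RDFa-to-RDF/XML (or RDFa-to-triples) converter
    # before the rest of the pipeline sees the data, as Toby describes.
    return ("rdfa", body)

DISPATCH = {
    "application/rdf+xml": parse_rdfxml,
    "application/xhtml+xml": parse_rdfa,  # assumption: XHTML carries RDFa
    "text/html": parse_rdfa,
}

def handle(content_type, body):
    # Strip parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    parser = DISPATCH.get(media_type, parse_rdfxml)  # arbitrary fallback
    return parser(body)
```

The awkwardness Dan notes remains: this works for HTTP responses, but files saved to local disk have no Content-Type header and RDFa no conventional file suffix, so suffix- or sniffing-based fallbacks end up being needed too.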
Re: http://ld2sd.deri.org/lod-ng-tutorial/
On 22/6/09 23:16, Martin Hepp (UniBW) wrote: Yves Raimond wrote: Ontology modularization is a pretty difficult task, and people use various heuristics for deciding what to put in the subset being served for an element. There is no guarantee that the fragment you get contains everything that you need. There is no safe way of importing only parts of an ontology, unless you know that its modularization is 100% reliable. Serving fragments of likely relevant parts of an ontology for reducing the network overhead is not the same as proper modularization of the ontology. Can you give a concrete example of the danger described here? ie. the pair of a complete (safe) ontology file and a non-safe subset, and an explanation of the problems caused. I can understand there is no guarantee that the fragment you get contains everything you need, and I also remind everyone that dereferencing is a privilege not a right: sometimes the network won't give you what you want, when you want it. But I've yet to hear of anyone who has suffered due to term-oriented ontology fragment downloads. I guess medical ontologies would be the natural place for horror stories? cheers, Dan
Re: http://ld2sd.deri.org/lod-ng-tutorial/
On 23/6/09 09:33, Martin Hepp (UniBW) wrote: Hi Dan: I think Alan already gave examples this morning. An ontology can contain statements about the relationship between conceptual elements - classes, properties, individuals - that (1) influence the result to queries but (2) are not likely retrieved when you just dereference an element from that ontology. The more complex an ontology is, the more difficult it is to properly modularize it. Indeed, I missed his mail until after I'd sent mine. And the examples are helpful. However they are - for the non-SemWeb enthusiast - incredibly abstract: FunctionalObjectProperty(p) InverseFunctionalObjectProperty(p) ObjectPropertyDomain(:a) ObjectPropertyRange(:b) etc. What I'd love to see is some flesh on these bones: a wiki page that works through these cases in terms of a recognisable example. Products, people, documents, employees, access control, diseases, music, whatever. I want something I can point to that says "this is why it is important to take care of the formalisms...", "this is what we can do so that simple-minded but predictable machines do the hard work instead of us". But basically my main point is that the use of owl:imports is defined pretty well in http://www.w3.org/TR/owl-ref/#imports-def and there is no need to deviate from the spec just for the matter of gut feeling and annoyance about the past dominance of DL research in the field. And as the spec says - with a proper owl:imports statement, any application can decide if and what part of the imported ontologies are being included to the local model for the task at hand. +1 on respecting the specs, but we also all know that not every piece of specification finds itself useful in practice. Having a worked-through, instance-level account of why owl:imports is useful would help. There is no compulsion re standards here: if someone is happy publishing RDFS, we can't make them use OWL. If they're happy using OWL we can't make them use RIF.
If they're happy with RIF 1, we can't make them use RIF 2 etc. Or any particular chapter or verse of those specs. What we can do is ground our evangelism in practical examples. And for those to be compelling, they can't solely be at the level of properties of properties; we need an account in terms of instance level use cases too. cheers, Dan
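As a sketch of the instance-level walkthrough being asked for: owl:InverseFunctionalObjectProperty says that two individuals sharing a value for the property must denote the same thing; FOAF applies this to foaf:mbox so that agents can be merged on mailbox. The toy rule below (pure Python, not an OWL reasoner; individual names are invented) shows the concrete consequence of that one abstract axiom on recognisable data.

```python
# Toy illustration of OWL's InverseFunctionalObjectProperty at the
# instance level: equal values of an inverse-functional property
# (here foaf:mbox) entail that the subjects are the same individual.
# This is a hand-rolled sketch of one rule, not a general reasoner.
MBOX = "http://xmlns.com/foaf/0.1/mbox"

triples = [
    ("ex:dan1", MBOX, "mailto:danbri@example.org"),
    ("ex:dan2", MBOX, "mailto:danbri@example.org"),
    ("ex:libby", MBOX, "mailto:libby@example.org"),
]

def infer_same_as(triples, ifp):
    """Return owl:sameAs pairs implied by shared values of an IFP."""
    by_value = {}
    for s, p, o in triples:
        if p == ifp:
            by_value.setdefault(o, []).append(s)
    pairs = set()
    for subjects in by_value.values():
        for a in subjects:
            for b in subjects:
                if a < b:  # emit each unordered pair once
                    pairs.add((a, b))
    return pairs

same = infer_same_as(triples, MBOX)  # ex:dan1 and ex:dan2 get merged
```

The point for the owl:imports debate: this entailment only fires if the consuming application actually retrieved the axiom declaring the property inverse-functional, which is exactly what a partial, term-by-term fragment download might miss.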
Re: http://ld2sd.deri.org/lod-ng-tutorial/
[snip] Yup, re owl:imports, I enthusiastically added it to the FOAF spec when some OWL WG insider suggested it was the right thing to use, and dutifully removed it when someone (I forget who in both cases - quite possibly same person!) a few years later told me it had fallen from fashion within the OWL scene. Re attitudes to OWL ... I do agree there have in the distant past (ie. last year!) been a few casually dismissive remarks around here regarding OWL. It's all too easy for a healthy enthusiasm for practical tools to trick us into seeing tools that we're not so familiar with as impractical. I'm happy to have read plenty of useful discussion here and nearby about how best to use or augment owl:sameAs. FOAF is described using OWL. I expect some day in the not too distant future, Dublin Core Terms will be described in OWL too. And the community on public-lod@w3.org have been excellent champions of both. Things aren't too polarised, despite the occasional lapses into them and us-ism... Optimistically, Dan
Re: Common Tag, FOAF and Dublin Core Re: Common Tag - semantic tagging convention
On 18/6/09 13:31, Bernard Vatant wrote: Rob, Danny (and Dan) ... why not use simply dc:creator and dc:date to this effect? Right. dc:date would seem a good choice, though I reckon foaf:maker might be a better option than dc:creator as the object is a resource (a foaf:Agent) rather than a literal. While it's likely to mean an extra node in many current scenarios, it offers significantly more prospect for linking data (and less ambiguity). dcterms:creator would also allow for use of a resource. Bibliontology uses dcterms over dc. Well I actually meant dcterms:creator when I wrote dc:creator, sorry. So you can link your personal tags to your foaf profile, for example. And it's consistent even for tag:AutoTag, since the range of dcterms:creator is dcterms:Agent, including person, organisation and software agent as well. Unless I miss some subtle distinguo dcterms:Agent is equivalent to foaf:Agent, and dcterms:creator equivalent to foaf:maker. BTW, with due respect to danbri, I wish FOAF would be revised to align whenever possible on dcterms vocabulary, now that it has clean declarations of classes, domains and ranges ... http://dublincore.org/documents/dcmi-terms is worth (re)visiting :-) Completely agree. I'm very happy with the direction of DC terms. The foaf:maker property was essential for a while, until DC was cleaned up. I'll mark it as a sub-property of dcterms:creator. I hope we'll get reciprocal claims into the Dublin Core RDF files some day too... Copying Tom Baker here. Tom - what would the best process be for adding in mapping claims to the DC Terms RDF? Maybe we could draft some RDF, put it onto dublincore.org or elsewhere, and for now add a seeAlso from the namespace RDF? cheers, Dan
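Marking foaf:maker as an rdfs:subPropertyOf of dcterms:creator means every foaf:maker triple entails a corresponding dcterms:creator triple (the RDFS entailment rule usually labelled rdfs7). A minimal pure-Python sketch of that one rule, with invented example resources, shows what consumers gain from the mapping; this is a one-step illustration, not a full RDFS reasoner (it does not chase chains of sub-properties).

```python
# Minimal sketch of the RDFS sub-property entailment rule (rdfs7):
#   (p rdfs:subPropertyOf q) and (s p o)  =>  (s q o)
# One-step only: sub-property chains are not followed in this toy.
MAKER = "http://xmlns.com/foaf/0.1/maker"
CREATOR = "http://purl.org/dc/terms/creator"

# The proposed mapping: foaf:maker rdfs:subPropertyOf dcterms:creator.
SUBPROP = {MAKER: CREATOR}

def rdfs7(triples, subprop):
    """Close a triple set under one application of the rdfs7 rule."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in subprop:
            inferred.add((s, subprop[p], o))
    return inferred

data = {("ex:doc", MAKER, "ex:dan")}  # invented example triple
closed = rdfs7(data, SUBPROP)
```

In other words, once the mapping is published, a DC-only consumer querying for dcterms:creator can (with RDFS inference) pick up data that was published using only foaf:maker.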
Re: Common Tag, FOAF and Dublin Core Re: Common Tag - semantic tagging convention
On 18/6/09 15:07, Thomas Baker wrote: On Thu, Jun 18, 2009 at 01:49:56PM +0200, Dan Brickley wrote: Well I actually meant dcterms:creator when I wrote dc:creator, sorry. So you can link your personal tags to your foaf profile, for example. And it's consistent even for tag:AutoTag, since the range of dcterms:creator is dcterms:Agent, including person, organisation and software agent as well. Unless I miss some subtle distinguo dcterms:Agent is equivalent to foaf:Agent, and dcterms:creator equivalent to foaf:maker. BTW, with due respect to danbri, I wish FOAF would be revised to align whenever possible on dcterms vocabulary, now that it has clean declarations of classes, domains and ranges ... http://dublincore.org/documents/dcmi-terms is worth (re)visiting :-) Completely agree. I'm very happy with the direction of DC terms. The foaf:maker property was essential for a while, until DC was cleaned up. I'll mark it as a sub-property of dcterms:creator. I hope we'll get reciprocal claims into the Dublin Core RDF files some day too... Copying Tom Baker here. Tom - what would the best process be for adding in mapping claims to the DC Terms RDF? Maybe we could draft some RDF, put it onto dublincore.org or elsewhere, and for now add a seeAlso from the namespace RDF? Hi Dan, If you could write up a short proposal -- how the properties are defined, with a proposed mapping claim -- we could discuss this in the DCMI Usage Board and take a decision. We associate changes in the namespace RDF (and related namespace documentation) with formal decisions so we would need to follow a process. Sounds like a plan! Thanks. I'll take it to DC lists and report back here as things progress. cheers, Dan